
Showing posts from December, 2017

INSTANT OR SLOW SCRAPING?

A. Collect images and text using a PHP proxy

Chinese is hard to learn, and it is hard to look Chinese words up in dictionaries. Tools called annotators help with this: you copy and paste a string of Chinese text, and the annotator shows the words it contains translated into English (or other languages). Now suppose you find a comic book about "The Novel of Three Kingdoms" with explanations in Chinese, and you want to read it with an annotator. Unfortunately, most of the illustrated "3 Kingdoms" books on the internet store the text as part of the picture, so it cannot be copied as a string.

But I happened to find the website http://www.e3ol.com, where each page contains only one picture and one block of "real" (selectable) text. On these pages you can copy the text with the mouse, but copying the image is very difficult, because clicking an image replaces it with the next one. To capture images and text easily, I used jQuery and a little PHP to create a PHP proxy web page to
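The post builds its proxy in PHP with jQuery, and that code is not shown here. The sketch below swaps in Python and is not the author's implementation. It illustrates the two jobs such a proxy typically does: fetching the page on the server side (a script running in the browser normally cannot read another site's HTML because of the same-origin policy, which is a common reason for using a proxy at all) and pulling the image URL(s) and selectable text out of the HTML. The names `PageExtractor`, `extract` and `fetch` are hypothetical, and the assumption that the image is a plain `<img src="...">` tag is not confirmed for e3ol.com. The page encoding is also an assumption; check the charset the site actually declares.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects <img> URLs and visible text from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []      # absolute image URLs, in page order
        self.text_parts = []  # non-empty text fragments outside <script>/<style>
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "pic/001.jpg" against the page URL.
                self.images.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Return (image_urls, text) for one page's HTML."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.images, " ".join(parser.text_parts)


def fetch(url, encoding="utf-8"):
    """Server-side fetch, as a proxy would do it (the utf-8 default is an assumption)."""
    with urlopen(url) as resp:
        return resp.read().decode(encoding, errors="replace")


if __name__ == "__main__":
    # Made-up sample page; a real run would use extract(fetch(page_url), page_url).
    sample = '<html><body><img src="pic/001.jpg"><p>Peach Garden</p></body></html>'
    print(extract(sample, "http://example.com/book/page1.html"))
```

Keeping the parsing in a pure function that takes an HTML string means it can be tried on saved pages without touching the network. The proxy part then only has to call `fetch` and pass the result to `extract`.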