Getting IOException: Premature EOF when running import.io

- January 15, 2011

I have created a crawler using import.io. The first issue I faced was that import.io did not identify the data on the webpage after I clicked "Detect optimal settings". It asks "Is the data you want to extract still in the browser?" The data is not highlighted, so I click No, and the data is still not highlighted. The same thing happens with the extractor. I got past this issue by clicking Yes when asked "Is the data you want to extract still in the browser?", even though the data was not highlighted. I then went on to build the crawler, and it works fine. I put around 15k URLs in the start URLs, with page depth 0.

What happens is that, out of the 15k pages, around 10% are not crawled. I checked the log file, and it shows IOException: Premature EOF against the rows that were not crawled.

If I manually go to one of those pages in the browser, it loads fine and is in the same format as the pages I trained the crawler on. I also tried training on the pages that showed the error, but that doesn't help.

How can I get around this error?

As I responded in your support ticket, I thought I would put the information here as well. The error is related to the website detecting that you are using a crawler and blocking the URLs. I suggest rerunning the crawler with an increased "pause between pages", since you are passing through so many pages, so that the site does not block you.
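
import.io exposes the pause as a crawler setting, so no code is needed there. For anyone scripting their own fetches, the same idea (wait between pages, and retry the ones that fail) might look roughly like the Python sketch below. It is not import.io's implementation; the names `crawl_politely`, `fetch`, `pause_seconds` and `max_attempts` are made up for illustration, and the 5-second default is only a guess to tune against the target site.

```python
import http.client
import time
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one page using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl_politely(
    urls: Iterable[str],
    fetch_page: Callable[[str], bytes] = fetch,
    pause_seconds: float = 5.0,   # assumed value; tune per site
    max_attempts: int = 3,        # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch each URL with a pause between pages and a few retries.

    Returns the downloaded pages and the list of URLs that still
    failed after max_attempts tries.
    """
    pages: Dict[str, bytes] = {}
    failed: List[str] = []
    for url in urls:
        for attempt in range(1, max_attempts + 1):
            try:
                pages[url] = fetch_page(url)
                break
            except (OSError, http.client.HTTPException):
                # A dropped or truncated response: wait longer after
                # each failed attempt before trying again.
                sleep(pause_seconds * attempt)
        else:
            # The loop never hit "break", so every attempt failed.
            failed.append(url)
        # The "pause between pages" itself.
        sleep(pause_seconds)
    return pages, failed
```

Calling `pages, failed = crawl_politely(list_of_urls, pause_seconds=10)` would leave the URLs that never loaded in `failed`, so they can be rerun later with a longer pause instead of repeating the whole 15k batch.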
