Python mechanize returns HTTP 429 error
I am trying to automate a task in Python with the mechanize module:
- enter a keyword in a web form and submit the form;
- look for a specific element in the response.
This works when run once. Now I want to repeat the task for a list of keywords, and I am getting HTTP error 429 (Too Many Requests). I tried the following workarounds:
- Adding custom headers (I noted them down from the website using a proxy) so that the request looks like a legitimate browser request:
    import mechanize

    br = mechanize.Browser()
    # all headers go into a single addheaders list; assigning it repeatedly
    # would just overwrite the previous value
    br.addheaders = [
        ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'),
        ('Connection', 'keep-alive'),
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
        ('Upgrade-Insecure-Requests', '1'),
        ('Accept-Encoding', 'gzip, deflate, sdch'),
        ('Accept-Language', 'en-US,en;q=0.8'),
    ]
- Since the blocked response was coming on every 5th request, I tried sleeping for 20 seconds after every 5 requests.

Neither of the two methods worked.
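For reference, this is roughly the loop I am running (a minimal sketch; the URL, form index, field name and target string below are placeholders, not the real ones):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)

    keywords = ['keyword1', 'keyword2', 'keyword3']   # placeholder keyword list
    for keyword in keywords:
        br.open('http://example.com/search')          # placeholder URL
        br.select_form(nr=0)                          # assumes the search form is the first form
        br['q'] = keyword                             # placeholder field name
        html = br.submit().read()
        if b'target element' in html:                 # placeholder element check
            print('match for %s' % keyword)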
You need to limit the rate of your requests to whatever the server's configuration permits (the question "Web scraper: limit requests per minute/hour on single domain?" may show how to find the permitted rate).
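Whatever that permitted rate turns out to be, the simplest way to respect it is to space requests out per domain. A minimal sketch, assuming a fixed per-domain interval (the 12-second value is only a placeholder):

    import time

    MIN_INTERVAL = 12.0   # placeholder: seconds to wait between requests to one domain
    _last_hit = {}        # domain -> timestamp of the previous request


    def wait_for_slot(domain):
        """Sleep until MIN_INTERVAL has passed since the last request to domain."""
        earliest = _last_hit.get(domain, 0.0) + MIN_INTERVAL
        delay = earliest - time.time()
        if delay > 0:
            time.sleep(delay)
        _last_hit[domain] = time.time()

Calling wait_for_slot('example.com') before every br.open()/br.submit() is the crude version of the throttling handler described below.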
mechanize uses a heavily patched version of urllib2 (lib/site-packages/mechanize/_urllib2.py) for network operations, and the Browser class is a descendant of _urllib2_fork.OpenerDirector.
So, the simplest method to patch that logic in seems to be to add a handler to the Browser object (see the sketch after this list):
- with a default_open method and an appropriate handler_order to place it before the others (lower means higher priority);
- that stalls until the request becomes eligible, e.g. with a token bucket or leaky bucket algorithm, e.g. as implemented in Throttling with urllib2; note that the bucket should be per-domain or per-IP;
- and that returns None to push the request on to the following handlers.
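A minimal sketch of such a handler, assuming a per-host token bucket, the urllib2-style Request.get_host() that mechanize's fork keeps, and that Browser still exposes OpenerDirector.add_handler() (the rate and burst values are placeholders):

    import time

    import mechanize


    class ThrottleHandler(mechanize.BaseHandler):
        # a low handler_order gives high priority, so this runs before the
        # handlers that actually open the connection
        handler_order = 100

        def __init__(self, rate=0.2, burst=2):
            self.rate = rate      # tokens refilled per second (0.2 = one request per 5 s)
            self.burst = burst    # bucket capacity
            self._buckets = {}    # host -> (tokens, last_refill_time), one bucket per domain

        def default_open(self, req):
            host = req.get_host()
            tokens, last = self._buckets.get(host, (float(self.burst), time.time()))
            while True:
                now = time.time()
                tokens = min(self.burst, tokens + (now - last) * self.rate)
                last = now
                if tokens >= 1.0:
                    break
                time.sleep((1.0 - tokens) / self.rate)   # stall until a token is available
            self._buckets[host] = (tokens - 1.0, last)
            return None   # push the request on to the following handlers


    br = mechanize.Browser()
    br.add_handler(ThrottleHandler(rate=0.2, burst=2))

With this in place the loop over keywords needs no explicit sleeps; every request made through br is delayed just enough to stay under the configured rate.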
Since this is a common need, someone should publish such an implementation as an installable package.