Python mechanize returns HTTP 429 error


I am trying to automate a task in Python using the mechanize module:

  1. Enter a keyword in a web form and submit the form.
  2. Look for a specific element in the response.

This works for a single keyword. Now I want to repeat the task for a list of keywords.

When I do, I get HTTP Error 429 (Too Many Requests).

I tried the following to work around this:

  1. Adding custom headers (I noted them down from the website using a proxy) so that the request looks like a legitimate browser request:

    import mechanize

    br = mechanize.Browser()
    # Set all headers in one list: each separate `br.addheaders = [...]`
    # assignment replaces the previous one, leaving only the last header.
    br.addheaders = [
        ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'),
        ('Connection', 'keep-alive'),
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
        ('Upgrade-Insecure-Requests', '1'),
        ('Accept-Encoding', 'gzip, deflate, sdch'),
        ('Accept-Language', 'en-US,en;q=0.8'),
    ]

  2. Since the blocked response comes on every 5th request, I tried sleeping for 20 seconds after every 5 requests.

Neither of these two methods worked.

You need to limit the rate of your requests to whatever the server's configuration permits. (The question "Web scraper: limit requests per minute/hour on single domain?" may show the permitted rate.)
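Besides pacing requests, a client can also back off when it does receive a 429. The sketch below is a complementary technique, not something this answer proposes: it retries a hypothetical zero-argument `fetch` callable with exponential backoff and honors a numeric `Retry-After` header. It assumes the raised error carries `code` and `hdrs` attributes the way `urllib.error.HTTPError` does; whether mechanize's own `HTTPError` exposes the same attributes is an assumption.

```python
import time


def retry_on_429(fetch, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch() and retry with exponential backoff while the server answers 429.

    `fetch` is a hypothetical zero-argument callable, e.g. `lambda: br.open(url)`.
    The raised error is assumed to expose `code` and `hdrs` like urllib's HTTPError.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as exc:
            # Re-raise anything that is not a 429, or a 429 on the final attempt.
            if getattr(exc, "code", None) != 429 or attempt == max_retries:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ... capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # If the server sent Retry-After in seconds, wait at least that long.
            # (HTTP-date values are ignored here and the backoff delay is used.)
            headers = getattr(exc, "hdrs", None) or {}
            retry_after = headers.get("Retry-After")
            if retry_after is not None and str(retry_after).isdigit():
                delay = max(delay, float(retry_after))
            sleep(delay)
```

With mechanize this would be called as something like `retry_on_429(lambda: br.open(url))`, where `br` and `url` are placeholders. Backoff only reacts to blocks after they happen, so it works best alongside a client-side rate limit.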

mechanize uses a heavily patched version of urllib2 (lib/site-packages/mechanize/_urllib2.py) for network operations, and its Browser class is a descendant of _urllib2_fork.OpenerDirector.

So the simplest way to patch its logic seems to be adding a handler to the Browser object:

  • with a default_open method and an appropriate handler_order to place it before all the others (lower means higher priority),
  • that stalls until the request is eligible, e.g. using a token bucket or leaky bucket algorithm, as implemented in "Throttling with urllib2". Note that the bucket should probably be per-domain or per-IP,
  • and that then returns None to pass the request on to the following handlers.
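The handler described above can be sketched as follows. This version is written against Python 3's `urllib.request`, on the assumption that mechanize's fork keeps the same handler protocol (`default_open` chain, `handler_order`, returning `None` to pass the request on). `TokenBucket` and `ThrottlingHandler` are hypothetical names, and the `rate`/`capacity` values are illustrative; the rate a given server tolerates has to be found empirically.

```python
import threading
import time
import urllib.request
from urllib.parse import urlsplit


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def wait_time(self):
        """Take a token if one is available (return 0.0); else return seconds to wait."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 0.0
        return (1 - self.tokens) / self.rate


class ThrottlingHandler(urllib.request.BaseHandler):
    # Lower runs earlier; BaseHandler's default handler_order is 500.
    handler_order = 10

    def __init__(self, rate, capacity=1, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.sleep = sleep
        self.buckets = {}  # one bucket per host name
        self.lock = threading.Lock()

    def default_open(self, req):
        host = urlsplit(req.get_full_url()).hostname or ""
        with self.lock:
            bucket = self.buckets.get(host)
            if bucket is None:
                bucket = self.buckets[host] = TokenBucket(self.rate, self.capacity, self.clock)
        # Stall until this host's bucket yields a token.
        while True:
            with self.lock:
                delay = bucket.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        # Returning None lets the next handler actually perform the request.
        return None
```

With mechanize, installing it would presumably look like `br.add_handler(ThrottlingHandler(rate=0.2))` (at most one request every 5 seconds per host), possibly with the class deriving from `mechanize.BaseHandler` instead; both details are assumptions about mechanize's fork rather than something verified here. Keying buckets by host name is the simplest choice; keying by resolved IP would also throttle different host names that share a server.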

Since this is a common need, you should consider publishing your implementation as an installable package.

