python - Unable to decode HTML page with urllib.request

May 15, 2015

I've written the following piece of code, which fetches a URL and saves the HTML to a text file. However, I have two issues:

Most importantly, it does not save € and £ in the HTML correctly. This is a decoding issue that I've tried to fix, so far without success.
The following code does not replace "\n" in the HTML with "". This isn't as important to me, but I'm curious why it is not working.

Any ideas?

import urllib.request

while True: # infinite loop
    with urllib.request.urlopen('website_url') as f:
        fdecoded = f.read().decode('utf-8')
        data = str(fdecoded .read()).replace('\n', '') # does not seem to work?

    myfile = open("testfile.txt", "r+")
    myfile.write(data)
    print ('----------------')

When you do:

fdecoded = f.read().decode('utf-8')

fdecoded is of type str, because you are reading a byte string from the request and decoding it to str using the utf-8 encoding.

After that, you cannot call:

str(fdecoded .read()).replace('\n', '')

str has no method read(), and you do not need to convert it to str again. Just do:

data = fdecoded.replace('\n', '')
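Putting the fix together, here is a minimal sketch of how the whole flow might look. The helper names (fetch_bytes, decode_and_flatten, save_text) are hypothetical and not from the original code. The explicit encoding="utf-8" on the output file and the charset lookup from the response headers are assumptions about what could be behind the € and £ problem; the question does not confirm the cause. The sample bytes stand in for f.read() so the example runs without a network connection.

```python
import os
import tempfile
import urllib.request


def fetch_bytes(url):
    """Return (raw bytes, charset) for a URL. Defined for illustration, not called below."""
    with urllib.request.urlopen(url) as f:
        raw = f.read()
        # Use the charset declared by the server, falling back to UTF-8 (assumption).
        charset = f.headers.get_content_charset() or "utf-8"
    return raw, charset


def decode_and_flatten(raw, charset="utf-8"):
    """Decode the response bytes once, then remove newline characters."""
    text = raw.decode(charset)      # bytes -> str; the result has no .read()
    return text.replace("\n", "")   # str.replace returns a new string


def save_text(path, text):
    """Write text with an explicit encoding instead of the platform default."""
    # "w" creates the file if needed; the original "r+" requires it to exist already.
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)


# Hypothetical sample standing in for the bytes returned by f.read()
raw = "<p>Price: €5\nor £4</p>\n".encode("utf-8")
data = decode_and_flatten(raw)

path = os.path.join(tempfile.gettempdir(), "testfile.txt")
save_text(path, data)
```

If the file is later read back, opening it with the same encoding="utf-8" keeps the currency symbols intact. If the page is not actually UTF-8, decoding with the charset reported by the server (as in fetch_bytes) may be needed instead of a hard-coded 'utf-8'.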
