python - Unable to decode HTML page with urllib.request
I've written the following piece of code, which fetches a URL and saves the HTML to a text file. However, I have two issues:
- Most importantly, it does not save € and £ in the HTML as those characters. This is likely a decoding issue, which I've tried to fix, so far without success.
- The following code does not replace "\n" in the HTML with "". This isn't as important to me, but I'm curious why it is not working.
Any ideas?
    import urllib.request

    while True:  # infinite loop
        with urllib.request.urlopen('website_url') as f:
            fdecoded = f.read().decode('utf-8')
            data = str(fdecoded .read()).replace('\n', '')  # does not seem to work?
            myfile = open("testfile.txt", "r+")
            myfile.write(data)
            print ('----------------')
When you do -
    fdecoded = f.read().decode('utf-8')
fdecoded is of type str: you are reading a byte string from the request and decoding it into a str using the utf-8 encoding.
After that, you cannot call -
    str(fdecoded .read()).replace('\n', '')
because str has no method read(), and you do not need to convert it to str again. Just do -
    data = fdecoded.replace('\n', '')
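
As a minimal sketch of the fix described above: decode the response bytes once, call replace() directly on the resulting str, and write the file with an explicit encoding. The helper names clean_html and save_text and the sample bytes are hypothetical, used only for illustration; the sample stands in for f.read() so the snippet runs without a network connection. Passing encoding="utf-8" when writing is an assumption about why € and £ might not be saved correctly (the platform default encoding can differ), not something the answer confirms, and it also assumes the page really is UTF-8.

```python
from pathlib import Path


def clean_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw response bytes into a str and strip newline characters."""
    text = raw.decode(encoding)  # bytes -> str; this only needs to happen once
    return text.replace("\n", "")  # str.replace works directly on the decoded text


def save_text(path: str, text: str) -> None:
    """Write text to a file using UTF-8 rather than the platform default."""
    # Assumption: an explicit encoding keeps characters such as € and £ intact.
    Path(path).write_text(text, encoding="utf-8")


# Hypothetical stand-in for the bytes returned by f.read() on the response.
sample = "<p>Price: €5\nor £4</p>\n".encode("utf-8")
cleaned = clean_html(sample)
```

With a real request, the same helpers would be called as clean_html(f.read()) inside the with block, followed by save_text("testfile.txt", cleaned).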