python - Unable to decode HTML page with urllib.request -


I've written the following piece of code, which searches a URL and saves the HTML to a text file. However, I have 2 issues:

  1. Most importantly, it does not save characters such as € and £ correctly in the HTML. This is a decoding issue that I've tried to fix, so far without success.
  2. The following code does not replace "\n" in the HTML with "". This isn't as important to me, but I'm curious why it is not working.

Any ideas?

import urllib.request

while True:  # infinite loop
    with urllib.request.urlopen('website_url') as f:
        fdecoded = f.read().decode('utf-8')
        data = str(fdecoded .read()).replace('\n', '')  # does not seem to work?

    myfile = open("testfile.txt", "r+")
    myfile.write(data)
    print ('----------------')

When you do -

fdecoded = f.read().decode('utf-8') 

fdecoded is already of type str, because you are reading the byte string from the request and decoding it into a str using the UTF-8 encoding.
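As a small illustration of that point, the sketch below decodes a made-up byte string (standing in for what f.read() would return; the sample HTML is an assumption, not taken from the real page) and checks the resulting type:

```python
# Sample bytes standing in for the result of f.read(); the content is made up.
raw = "<p>Price: €5 or £4</p>\n".encode("utf-8")

# decode() turns the bytes into a str.
text = raw.decode("utf-8")

print(type(raw).__name__)     # name of the type before decoding
print(type(text).__name__)    # name of the type after decoding
print(hasattr(text, "read"))  # whether the decoded value has a read() method
```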

After that, you cannot call -

str(fdecoded .read()).replace('\n', '') 

because str has no read() method, and you do not need to convert it to str again. Just do -

data = fdecoded.replace('\n', '') 
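Putting the pieces together, a minimal sketch of the corrected flow might look like the following. The helper names html_to_single_line and save_page are hypothetical, and falling back to UTF-8 when the server reports no charset is an assumption. Opening the output file with an explicit encoding="utf-8" is not part of the fix above. It is added here because the platform's default file encoding may not be able to represent € and £, which could be the cause of the first issue.

```python
import urllib.request


def html_to_single_line(raw: bytes, charset: str = "utf-8") -> str:
    """Decode the response bytes once, then drop the newline characters."""
    text = raw.decode(charset)      # bytes -> str
    return text.replace("\n", "")   # str.replace works directly on the str


def save_page(url: str, path: str) -> None:
    """Fetch url once and write its HTML to path as UTF-8 (hypothetical helper)."""
    with urllib.request.urlopen(url) as f:
        # Prefer the charset from the Content-Type header;
        # the UTF-8 fallback is an assumption.
        charset = f.headers.get_content_charset() or "utf-8"
        data = html_to_single_line(f.read(), charset)

    # "w" creates or truncates the file; the explicit encoding avoids
    # relying on the platform default when writing characters like € and £.
    with open(path, "w", encoding="utf-8") as out:
        out.write(data)
```

With the placeholders from the question, this would be called as save_page('website_url', 'testfile.txt'). Note that the original "r+" mode requires the file to already exist and overwrites from the start without truncating, which is why "w" is used in this sketch.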
