Python 3.4 - reading data from a webpage


I'm trying to learn how to read a webpage, and have tried the following:

    >>> import urllib.request
    >>> page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
    >>> contents = page.read()
    >>> lines = contents.split('\n')

This gives the following error:

    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        lines = contents.split('\n')
    TypeError: Type str doesn't support the buffer API

Now, I assumed that reading a URL would be pretty similar to reading a text file, and that `contents` would be of type str. Is that not the case?

When I try `>>> contents`, I can see that `contents` holds the HTML document, so why doesn't `.split('\n')` work? How can I make it work?

Please note that I'm splitting at newline characters so that I can print the webpage line by line.
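A small offline sketch of the distinction being asked about: the literal `raw` below is a hypothetical stand-in for what `page.read()` returns, so no network access is needed. A bytes object can only be split on a bytes separator; decoding it to str first (UTF-8 is assumed here, and a real page may use another encoding) makes the usual str methods available for printing line by line.

```python
# Hypothetical stand-in for the bytes returned by page.read()
raw = b"<html>\n<body>Hello</body>\n</html>"
print(type(raw))  # <class 'bytes'>

# Splitting bytes with the str separator "\n" raises TypeError
split_failed = False
try:
    raw.split("\n")
except TypeError:
    split_failed = True

# Option 1: split the bytes on a bytes separator; each piece is still bytes
byte_lines = raw.split(b"\n")

# Option 2: decode to str first (assuming UTF-8), then use str methods
text_lines = raw.decode("utf-8").splitlines()
for line in text_lines:
    print(line)
```

Option 2 is usually the one you want for printing, since it yields plain str lines.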

Following the same train of thought, I tried `contents.readlines()`, but that gave this error:

    Traceback (most recent call last):
      File "<pyshell#8>", line 1, in <module>
        contents.readlines()
    AttributeError: 'bytes' object has no attribute 'readlines'

Is the webpage stored in some object called 'bytes'?

Can someone explain to me what is happening here? And how do I read a webpage properly?
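For illustration, `readlines()` is a method of the binary file-like object returned by `urlopen()`, not of the bytes that `read()` hands back, so it would have to be called on `page` rather than on `contents`. The sketch below uses `io.BytesIO` as a hypothetical offline stand-in for the response, since both are binary file-like objects; each line it returns is still bytes and keeps its trailing newline.

```python
import io

# Hypothetical offline stand-in for the object returned by urlopen();
# like the real response, it is a binary file-like object
page = io.BytesIO(b"first line\nsecond line\n")

# readlines() belongs to the file-like object, not to bytes
raw_lines = page.readlines()
print(raw_lines)  # [b'first line\n', b'second line\n']

# Each element is bytes; decode (assuming UTF-8) and drop the newline
for raw_line in raw_lines:
    print(raw_line.decode("utf-8").rstrip("\n"))
```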

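You need to wrap the response in an io.TextIOWrapper() object and decode it: urlopen() returns a binary file-like object, so read() gives you bytes rather than str. UTF-8 is used here as a widely used default, but you can change it to the page's proper encoding: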

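    import io
    import urllib.request

    u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
    # Wrap the binary response so that reading it yields decoded str, not bytes
    f = io.TextIOWrapper(u, encoding='utf-8')
    text = f.read()
    # text is now a str, so text.split('\n') works as expected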
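Building on the answer above, a sketch that takes the charset declared in the response's Content-Type header via `headers.get_content_charset()`, falls back to UTF-8 when none is declared, and then splits the decoded text into lines as the question wanted. The helper names `read_text_lines` and `print_page` are hypothetical, and the header is only a best-effort source: pages sometimes declare their encoding in a `<meta>` tag instead, which this does not inspect.

```python
import io
import urllib.request


def read_text_lines(binary_stream, charset=None):
    """Hypothetical helper: decode a binary file-like object into str lines.

    Falls back to UTF-8 when no charset is known (an assumption, not a rule).
    """
    encoding = charset or "utf-8"
    # Wrap the byte stream so that read() returns a decoded str
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding)
    # splitlines() drops the line endings, ready for printing
    return wrapper.read().splitlines()


def print_page(url):
    """Hypothetical helper: print a web page line by line (needs network)."""
    page = urllib.request.urlopen(url)
    # Charset from the Content-Type header, or None if the server omits it
    charset = page.headers.get_content_charset()
    for line in read_text_lines(page, charset):
        print(line)


# Offline usage with a stand-in stream whose bytes are Latin-1 encoded
sample = io.BytesIO("caf\u00e9\nna\u00efve\n".encode("latin-1"))
lines = read_text_lines(sample, charset="latin-1")
print(lines)  # ['café', 'naïve']
```

Only the offline part runs as written; calling `print_page("http://docs.python-requests.org/en/latest/")` needs network access.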
