html - Python 3.4 - reading data from a webpage
I'm trying to learn how to read a webpage, and have tried the following:
>>> import urllib.request
>>> page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
>>> contents = page.read()
>>> lines = contents.split('\n')
This gives the following error:
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API
Now, I assumed that reading from a URL would be pretty similar to reading from a text file, and that the contents of `contents` would be of type `str`. Is this not the case?
When I evaluate `contents` at the `>>>` prompt, I can see that the contents of `contents` is just an HTML document, so why doesn't `.split('\n')` work? How can I make it work?
Please note that I'm splitting at the newline characters so that I can print the webpage line by line.
Following the same train of thought, I then tried `contents.readlines()`, which gave this error:
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'
Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how do I read the webpage properly?
You need to wrap the response in an `io.TextIOWrapper()` object and specify the encoding used to decode it (UTF-8 is the most common choice, but you can change it to the page's actual encoding too):
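import urllib.request
import io

# urlopen() returns a binary stream, so reading it directly yields bytes
u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
# TextIOWrapper decodes those bytes to str as they are read
f = io.TextIOWrapper(u, encoding='utf-8')
text = f.read()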
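To make the bytes/str distinction concrete without needing network access, here is a minimal sketch. The helper names `decode_lines` and `wrapped_lines` and the `sample` bytes are hypothetical and introduced only for illustration. `sample` stands in for whatever `page.read()` would return, and UTF-8 is an assumed encoding, not something confirmed for the page in the question.

```python
import io


def decode_lines(raw, encoding="utf-8"):
    """Decode raw bytes (e.g. the result of page.read()) and split into lines."""
    # bytes -> str; errors="replace" avoids an exception if the encoding guess is wrong
    text = raw.decode(encoding, errors="replace")
    return text.splitlines()


def wrapped_lines(binary_stream, encoding="utf-8"):
    """Read lines from a binary file-like object through io.TextIOWrapper."""
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding)
    # Iterating a text stream yields str lines that still end in "\n"
    return [line.rstrip("\n") for line in wrapper]


# Hypothetical stand-in for the bytes a server might send (not the real page).
sample = b"<html>\n<body>caf\xc3\xa9</body>\n</html>\n"

# Splitting bytes with a str separator fails, as in the question.
try:
    sample.split("\n")
    split_error = None
except TypeError as exc:
    split_error = exc

# A bytes separator works, but the pieces are still bytes, not str.
bytes_lines = sample.split(b"\n")

# Either approach below yields a list of str lines.
lines_a = decode_lines(sample)
lines_b = wrapped_lines(io.BytesIO(sample))

for line in lines_a:
    print(line)
```

With a real response, the same idea would be `decode_lines(page.read())`. If the server declares a charset, `page.headers.get_content_charset()` may return it and can be passed as `encoding`. It returns `None` when no charset is declared, so a fallback such as UTF-8 is still needed.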