html - Python 3.4 - reading data from a webpage
I'm trying to learn how to read a webpage, and I have tried the following:
>>> import urllib.request
>>> page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
>>> contents = page.read()
>>> lines = contents.split('\n')

This gives the following error:
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API

Now, I assumed that reading from a URL would be pretty similar to reading from a text file, and that the contents of contents would be of type str. Is this not the case?
When I try >>> contents I can see that the contents of contents is just an HTML document, so why doesn't `.split('\n')` work? How can I make it work?
Please note that I'm splitting at the newline characters so that I can print the webpage line by line.
Following the same train of thought, I then tried contents.readlines(), which gave this error:
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'

Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how do I read the webpage properly?
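You need to wrap the response in an io.TextIOWrapper() object and specify the encoding so the bytes are decoded to text (utf-8 is the most universal choice, but you can change it to the page's proper encoding too):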
import urllib.request
import io

u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
f = io.TextIOWrapper(u, encoding='utf-8')
text = f.read()
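
To make the bytes/str distinction behind both errors concrete, here is a minimal offline sketch. It assumes the page is UTF-8 encoded, and `raw` is a made-up sample standing in for what `page.read()` returns; `io.BytesIO` stands in for the response object so no network access is needed.

```python
import io

# Hypothetical sample standing in for the bytes returned by page.read().
raw = b"<html>\n<body>\n<p>caf\xc3\xa9</p>\n</body>\n</html>\n"

# read() returns bytes, and bytes.split() rejects a str separator.
try:
    raw.split("\n")
    str_separator_failed = False
except TypeError:
    str_separator_failed = True

# Option 1: stay in bytes and split on a bytes separator.
byte_lines = raw.split(b"\n")

# Option 2: decode to str first (utf-8 is an assumption), then split into lines.
text_lines = raw.decode("utf-8").splitlines()

# Option 3: wrap a binary stream in TextIOWrapper and iterate line by line.
# BytesIO is a stand-in for the object returned by urlopen().
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
wrapped_lines = [line.rstrip("\n") for line in stream]

# Print the page line by line, as the question asks.
for line in text_lines:
    print(line)
```

With a real response, `page.headers.get_content_charset()` may report the charset the server declared (it returns None when none is sent), which can be used in place of the hard-coded 'utf-8'.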