unicode - How can I handle mal-encoded characters with Python 2?

The HTML file I am fetching contains characters that are not supported by the encoding specified in the HTML header.

I found several characters that are not supported by the shift_jis encoding the page uses, yet a browser displays them correctly.

When I try to read the HTML file and decode it for processing, I get a UnicodeDecodeError.

import urllib2

url = 'http://matsucon.net/material/dic/kao09.html'
response = urllib2.urlopen(url)
response.read().decode('shift_jis_2004')

Is there any way to process HTML that has mal-encoded characters without getting an error?

Try this:

response.read().decode('shift_jis_2004', errors='ignore')
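To see what the error handler does without hitting the network, here is a small sketch. The sample bytes and the helper name `decode_lenient` are made up for illustration; the byte 0xFF is assumed to stand in for whatever undecodable bytes the real page contains. The handler is passed positionally because keyword arguments to `decode` need Python 2.7 or later.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: two valid characters with one invalid byte between them.
raw = (u'\u3042'.encode('shift_jis_2004')
       + b'\xff'                      # assumed stand-in for a mal-encoded byte
       + u'\u3044'.encode('shift_jis_2004'))


def decode_lenient(data, encoding='shift_jis_2004', errors='replace'):
    """Decode bytes, handling undecodable bytes as `errors` says.

    'ignore' silently drops the bad bytes; 'replace' substitutes U+FFFD
    so you can still see where the damage was.
    """
    return data.decode(encoding, errors)


dropped = decode_lenient(raw, errors='ignore')    # bad byte removed
marked = decode_lenient(raw, errors='replace')    # bad byte becomes U+FFFD

try:
    raw.decode('shift_jis_2004')                  # default is 'strict'
except UnicodeDecodeError as exc:
    print('strict decoding failed at byte offset %d' % exc.start)
```

Note that 'ignore' loses data silently, so 'replace' may be the better choice if you need to know which characters could not be decoded.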
