python - What is the difference between a UTF-8 literal and a Unicode code point?
I came across a website that shows the Unicode table.
When I print the word 'ספר':
>>> x = 'ספר'
>>> x
'\xd7\xa1\xd7\xa4\xd7\xa8'
I get the characters '\xd7\xa1\xd7\xa4\xd7\xa8'.
I think Python encodes the word 'ספר' as UTF-8, because that's the default, right?
But when I run this code:
>>> x = u'ספר'
>>> x
u'\u05e1\u05e4\u05e8'
I get u'\u05e1\u05e4\u05e8', which are the Unicode code points, right?
How can I convert a UTF-8 literal to Unicode code points?
In the first sample you created a byte string (type str). Your terminal determined the encoding (UTF-8 in this case).
In your second sample, you created a Unicode string (type unicode). Python auto-detected the encoding your terminal uses (from sys.stdin.encoding) and decoded the bytes from UTF-8 to Unicode code points.
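To make the two types concrete, here is a small sketch. It uses explicit b'...' and u'...' prefixes so that it should run on Python 2.7 and on Python 3 alike (the question itself uses a Python 2 terminal), and the variable names are only illustrative.

```python
# Byte string: the six UTF-8 bytes the terminal produced for the three Hebrew letters.
bytestring_x = b'\xd7\xa1\xd7\xa4\xd7\xa8'

# Unicode string: three code points, one per letter.
unicode_x = u'\u05e1\u05e4\u05e8'

# Each of these letters takes two bytes in UTF-8, so the lengths differ.
print(len(bytestring_x))  # 6
print(len(unicode_x))     # 3
```

The byte string counts bytes while the Unicode string counts code points, which is why the same word has two different lengths.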
You can make the same conversion from a byte string to a Unicode string by decoding:
unicode_x = bytestring_x.decode('utf8')
To go in the other direction, you need to encode:
bytestring_x = unicode_x.encode('utf8')
You specified your literals using the actual UTF-8 bytes for the characters; that works fine in your terminal, but not in Python source code, because Python 2 source code is loaded as ASCII text only. You can change that by setting a source code encoding declaration. This is specified in PEP 263; it has to be on the first or second line of your source file. For example:
# encoding: utf-8
Or you can stick to using \uhhhh and \xhh escape sequences to represent non-ASCII characters.
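The decode/encode round trip described above can be sketched as follows. This is a minimal example, assuming the bytes really are UTF-8; it sticks to \xhh and \uhhhh escapes so the source file stays pure ASCII and needs no encoding declaration. The names roundtrip and code_points are hypothetical helpers introduced only for illustration.

```python
# UTF-8 bytes for the word, written with \xhh escapes.
bytestring_x = b'\xd7\xa1\xd7\xa4\xd7\xa8'

# Decode: bytes -> Unicode code points.
unicode_x = bytestring_x.decode('utf8')
print(unicode_x == u'\u05e1\u05e4\u05e8')  # True

# Encode: Unicode code points -> bytes.
roundtrip = unicode_x.encode('utf8')
print(roundtrip == bytestring_x)  # True

# List each code point in the usual U+XXXX notation.
code_points = ['U+%04X' % ord(ch) for ch in unicode_x]
print(code_points)  # ['U+05E1', 'U+05E4', 'U+05E8']
```

If the bytes were in some other encoding, decoding them as 'utf8' would either raise UnicodeDecodeError or give the wrong characters, so the codec name has to match how the bytes were produced.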
You may want to read up on the difference between Unicode and encoded (binary) byte strings, and how this relates to Python.