python - Converting incorrectly encoded Chinese characters in MySQL to UTF-8

June 15, 2010

I have a large MySQL table filled with Chinese characters in an incorrect encoding. I believe they are supposed to be encoded in latin1 (ISO-8859-1), but I can't find a way to get the Chinese characters back from the contents of the database rows.

Converting between latin1 and utf8 doesn't help; the fields remain unchanged. I've tried re-importing the database with various encodings, with the same results.

Some examples of the current contents and what they should be:

æƒ¨äº‹ should be 惨事
ä¸ should be 不
æœ€ should be 最

I've tried using Python to 'decode' the contents, but again without success. I've tried various combinations of this:

databasefield.decode('iso-8859-1').encode('utf8')

but I can't get it to work either.

Sorry for asking such a vague question, but I don't know how to continue trying to figure this out!

Does anyone know what the problem is here?

You are looking at UTF-8 that was decoded as Windows codepage 1252 instead:

>>> print u'惨事'.encode('utf8').decode('cp1252')
æƒ¨äº‹
>>> print u'最'.encode('utf8').decode('cp1252')
æœ€

Fixing this requires going the other way:

>>> print u'æƒ¨äº‹'.encode('cp1252').decode('utf8')
惨事
>>> print u'æœ€'.encode('cp1252').decode('utf8')
最
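
The round trip above can be wrapped in a small helper. This is a minimal sketch, assuming Python 3 (the sessions above are Python 2) and assuming the garbled text has already been read from the database as a `str`. The name `fix_mojibake` and its `wrong_codec` parameter are hypothetical, introduced only for illustration.

```python
def fix_mojibake(text, wrong_codec="cp1252"):
    """Undo UTF-8 bytes that were mistakenly decoded with `wrong_codec`.

    Returns the text unchanged when the round trip is not possible,
    for example when the value was never garbled in the first place.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_mojibake("æƒ¨äº‹"))  # 惨事
print(fix_mojibake("æœ€"))     # 最
```

Returning the input unchanged on failure keeps already-correct rows intact, but it also hides rows that could not be repaired, so logging those cases may be worthwhile on a real table.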

There may have been some loss there though, as the UTF-8 encoding of 不 uses a byte (0x8D) that codepage 1252 leaves undefined:

>>> u'不'.encode('utf8')
'\xe4\xb8\x8d'
>>> print u'不'.encode('utf8').decode('cp1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/users/mj/development/venvs/stackoverflow-2.7/lib/python2.7/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 2: character maps to <undefined>
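
Whether such characters can still be recovered depends on what the database did with the undefined byte. The sketch below, written for Python 3, assumes the byte survived as the control character with the same number (U+008D); that is an assumption about the stored data, not something the question confirms. The function names are hypothetical. If the byte was dropped or replaced with `?`, the character cannot be recovered this way.

```python
def to_original_bytes(text):
    """Encode with cp1252, falling back to latin-1 for characters that
    cp1252 cannot encode (such as U+008D).

    Assumption: undefined cp1252 bytes were stored as the code point
    with the same number. Characters above U+00FF that cp1252 does not
    know still raise UnicodeEncodeError.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out += ch.encode("latin-1")
    return bytes(out)


def fix_mojibake_lenient(text):
    """Rebuild the original bytes and decode them as UTF-8."""
    return to_original_bytes(text).decode("utf-8")


garbled = "ä¸\x8d"  # the third character is the invisible U+008D
print(fix_mojibake_lenient(garbled))  # 不
```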

There are several other Windows codepage candidates that can be tried here though; 1254 would result in similar output, for example, with only minor differences.
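
One way to compare candidates is to attempt the reverse conversion with each and keep the ones that succeed. This is a rough sketch for Python 3; the candidate list and the name `try_codecs` are illustrative choices, not a recommendation of specific codepages.

```python
CANDIDATES = ["cp1252", "cp1254", "latin-1", "cp1250"]


def try_codecs(text, candidates=CANDIDATES):
    """Return {codec: repaired_text} for every codec whose round trip works."""
    results = {}
    for codec in candidates:
        try:
            results[codec] = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # This codec cannot have produced the garbled text; skip it.
            continue
    return results


for codec, repaired in try_codecs("æœ€").items():
    print(codec, repaired)
```

A codec that succeeds on one sample is only a candidate; running the check over many rows and counting failures per codec gives a better signal.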
