Python3 - ascii/utf-8/iso-8859-1 can't decode byte 0xe5 (Swedish characters)
I've tried io, repr(), etc.; none of them work!
The problem is inputting å (\xe5). None of these work:
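    import io
    import sys

    # Attempt 1: read one character through the default text layer
    print(sys.stdin.read(1))

    # Attempt 2: rewrap stdin as ISO-8859-1 text
    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), errors='replace',
                                 encoding='iso-8859-1', newline='\n')
    print(sys.stdin.read(1))

    # Attempt 3: read one raw byte and decode it as UTF-8
    x = sys.stdin.buffer.read(1)
    print(x.decode('utf-8'))

They give me:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data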
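The error message itself points at the cause: 0xE5 is what a terminal that is not in UTF-8 mode sends for å (its Latin-1 encoding), while UTF-8 encodes å as the two bytes 0xC3 0xA5. In UTF-8, 0xE5 can only start a three-byte sequence, so a lone 0xE5 ends "unexpectedly". A small self-contained check, with both byte strings written out by hand rather than read from a real terminal:

```python
# 'å' as sent by a Latin-1 terminal vs. what a UTF-8 terminal would send.
latin1_bytes = b"\xe5"
utf8_bytes = "\u00e5".encode("utf-8")

print(utf8_bytes)                                  # b'\xc3\xa5'
print(latin1_bytes.decode("latin-1") == "\u00e5")  # True: Latin-1 maps 0xE5 to U+00E5

# Decoding the Latin-1 byte as UTF-8 reproduces the error from the question.
try:
    latin1_bytes.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)                                      # unexpected end of data
```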
I also tried starting Python after export PYTHONIOENCODING=utf-8; that doesn't work either.
Now, here's where I'm at:

    import sys, codecs
    sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
    # stdin wrapped in a writer: .read() falls through to the raw byte stream
    sys.stdin = codecs.getwriter("utf-8")(sys.stdin.detach())
    x = sys.stdin.read(1)          # bytes, e.g. b'\xe5'
    print(x.decode('utf-8', 'replace'))

This gives me: �
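Part of why this attempt behaves oddly: codecs.getwriter() returns a StreamWriter class, and a writer wrapped around stdin does not decode anything on read. StreamWriter forwards attributes it does not define (such as read) to the wrapped stream, so x is still the raw byte b'\xe5', and decoding that as UTF-8 with 'replace' yields U+FFFD, the � shown. The codecs.getreader() counterpart decodes on read. A sketch using io.BytesIO as a stand-in for the detached stdin, assuming the terminal sends the Latin-1 byte from the question:

```python
import codecs
import io

# Stand-in for sys.stdin.detach(): the raw byte a Latin-1 xterm sends for 'å'.
raw = io.BytesIO(b"\xe5")

# A StreamWriter has no read() of its own; the call is forwarded to the byte stream.
as_writer = codecs.getwriter("utf-8")(raw)
data = as_writer.read(1)
print(type(data), data)      # <class 'bytes'> b'\xe5'

# A StreamReader's read() returns decoded str instead.
raw.seek(0)
as_reader = codecs.getreader("latin-1")(raw)
ch = as_reader.read(1)
print(hex(ord(ch)))          # 0xe5, i.e. U+00E5 'å'
```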
It's close...

How can I take \xe5 and turn it into å in the console, without breaking input() as well? (My current solution breaks it.)

Note: I know this has been asked before, but none of the answers solve it, not even the io ones.
Some info about my system:

    os.environ['LANG'] == 'C'
    sys.getdefaultencoding() == 'utf-8'
    sys.stdout.encoding == 'ANSI_X3.4-1968'
    sys.stdin.encoding == 'ANSI_X3.4-1968'

My OS: Arch Linux, running xterm.
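These values suggest why the attempts go through ASCII: with LANG set to C, Python chose ANSI_X3.4-1968 for the standard streams, and that is simply the formal name of ASCII, which has no mapping for bytes above 0x7F. A quick check of that alias with the codecs module (the encoding string is copied from the output above):

```python
import codecs

# The encoding Python reported for sys.stdin/sys.stdout under LANG=C.
reported = "ANSI_X3.4-1968"
print(codecs.lookup(reported).name)   # ascii

# ASCII cannot decode 0xE5 at all.
try:
    b"\xe5".decode(reported)
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)                         # ordinal not in range(128)
```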
Running locale -a gives me: C | POSIX | sv_SE.utf8
I've followed these:
- Python 3: How to specify stdin encoding
- http://python-notes.curiousefficiency.org/en/latest/python3/binary_protocols.html
- http://wolfprojects.altervista.org/talks/unicode-and-python-3/
- http://getpython3.com/diveintopython3/strings.html
- Python 3 - Encode/Decode vs Bytes/Str
- How to set sys.stdout encoding in Python 3?
- http://docs.python.org/3.0/howto/unicode.html
(and about 50 more)
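Solution (sort of; it still breaks input()):

    import sys, codecs
    sys.stdout = codecs.getwriter("latin-1")(sys.stdout.detach())
    # a writer around stdin: .read() falls through to the raw byte stream
    sys.stdin = codecs.getwriter("latin-1")(sys.stdin.detach())
    x = sys.stdin.read(1)          # bytes, e.g. b'\xe5'
    print(x.decode('latin-1', 'replace'))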
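One way to keep input() working, sketched under the assumption that the terminal really sends and displays Latin-1: put an io.TextIOWrapper with encoding="latin-1" around the byte streams instead of a codecs writer. The replacements are ordinary text streams, so input() and print() keep dealing in str. The helper names latin1_text_stream and install_latin1_stdio are hypothetical; on Python 3.7+, sys.stdin.reconfigure(encoding="latin-1") (before anything has been read) is a shorter alternative.

```python
import io
import sys


def latin1_text_stream(binary, *, line_buffering=False):
    """Wrap a binary stream in a text layer that decodes/encodes Latin-1.

    Every byte 0x00-0xFF maps to exactly one code point, so decoding never fails.
    """
    return io.TextIOWrapper(binary, encoding="latin-1", newline="\n",
                            line_buffering=line_buffering)


def install_latin1_stdio():
    """Replace sys.stdin/sys.stdout with Latin-1 text streams (hypothetical helper).

    Assumes the terminal is a Latin-1 one (e.g. plain xterm under LANG=C).
    Call once at start-up, before anything has been read from stdin.
    """
    sys.stdout.flush()
    sys.stdin = latin1_text_stream(sys.stdin.detach())
    sys.stdout = latin1_text_stream(sys.stdout.detach(), line_buffering=True)
```

After calling install_latin1_stdio(), something like name = input("Namn: ") should return "å" as a one-character str, and print(name) writes the byte 0xE5 back, which the same Latin-1 terminal shows as å.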
You are running in xterm, which does not support UTF-8 by default. Run xterm -u8 or use uxterm to fix that.
The other way to work around that is to use a different locale; set your locale to Latin-1, for example:

    export LANG=sv_SE.ISO-8859-1

but then you are limited to 256 codepoints, versus the full range (over a million) of the Unicode standard.
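To make the size difference concrete: Latin-1 maps each byte value 0x00 to 0xFF straight onto U+0000 to U+00FF, so it never fails to decode but cannot encode anything past U+00FF, whereas Unicode's code space runs to U+10FFFF (1,114,112 code points). A small standard-library illustration; the euro sign is just an arbitrary example of a character outside Latin-1:

```python
import sys

# All 256 byte values decode, one code point each.
latin1_chars = bytes(range(256)).decode("latin-1")
print(len(latin1_chars))                 # 256
print("\u00e5".encode("latin-1"))        # b'\xe5'

# Anything above U+00FF cannot be represented, e.g. the euro sign U+20AC.
try:
    "\u20ac".encode("latin-1")
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)                            # ordinal not in range(256)

# Unicode's code space, by contrast, ends at U+10FFFF.
print(hex(sys.maxunicode))               # 0x10ffff
```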
Note that Python 2 never decoded input; writing out what you read from the terminal works fine because the raw bytes you read are interpreted by the terminal in the same locale. Reading and writing Latin-1 bytes works fine, but that's not quite the same as processing Unicode data.
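The Python 2 behaviour can still be had in Python 3 by staying on the binary layer (sys.stdin.buffer and sys.stdout.buffer), which is roughly what the "sort of" solution in the question does. The catch is that such code counts and slices bytes, not characters. A sketch with a hypothetical echo_raw helper, exercised on in-memory streams instead of the real terminal:

```python
import io


def echo_raw(src, dst, size=-1):
    """Copy bytes from src to dst without decoding them (hypothetical helper).

    The bytes only become characters when the terminal renders them in its own locale.
    """
    data = src.read(size)
    dst.write(data)
    dst.flush()
    return data


out = io.BytesIO()
echoed = echo_raw(io.BytesIO(b"\xe5\n"), out)  # a Latin-1 terminal's 'å' plus newline
print(out.getvalue() == echoed)                # True: bytes round-trip untouched

# The same 'å' from a UTF-8 terminal is two bytes, so byte-level code miscounts it.
utf8_input = "\u00e5".encode("utf-8")
print(len(utf8_input))                   # 2
print(len(utf8_input.decode("utf-8")))   # 1
```

Against a real terminal the call would be echo_raw(sys.stdin.buffer, sys.stdout.buffer).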