Python 3: ascii/utf-8/iso-8859-1 can't decode byte 0xe5 (Swedish characters)
I've tried io, repr(), etc.; none of them work!
The problem is inputting å (\xe5). None of these work:
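    # Attempt 1: plain stdin
    import sys
    print(sys.stdin.read(1))

    # Attempt 2: re-wrap stdin as ISO-8859-1
    import io
    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), errors='replace', encoding='iso-8859-1', newline='\n')
    print(sys.stdin.read(1))

    # Attempt 3: read one raw byte and decode it as UTF-8
    x = sys.stdin.buffer.read(1)
    print(x.decode('utf-8'))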
They give me:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data
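The error message already hints at what the terminal is sending. The UTF-8 encoding of å is two bytes, 0xc3 0xa5, while a lone 0xe5 is how Latin-1 (ISO-8859-1) encodes å. In UTF-8, 0xe5 is the first byte of a three-byte sequence, so decoding it by itself fails with "unexpected end of data". A minimal sketch that reproduces this without a terminal:

```python
# How "å" is encoded in UTF-8 versus Latin-1 (ISO-8859-1).
as_utf8 = "å".encode("utf-8")      # b'\xc3\xa5': two bytes
as_latin1 = "å".encode("latin-1")  # b'\xe5': one byte
print(as_utf8, as_latin1)

# The single byte 0xe5 decodes cleanly as Latin-1 ...
print(as_latin1.decode("latin-1"))  # å

# ... but in UTF-8, 0xe5 starts a three-byte sequence, so on its own
# it is incomplete.
reason = None
try:
    as_latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)  # unexpected end of data

# With errors="replace" the incomplete sequence becomes U+FFFD.
replaced = as_latin1.decode("utf-8", "replace")
print(replaced == "\ufffd")  # True
```

The last line is also a plausible source of the replacement character reported further down.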
I also tried starting Python after running:

    export PYTHONIOENCODING=utf-8

That doesn't work either.
Now, here's where I'm at:
    import sys, codecs
    sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
    sys.stdin = codecs.getwriter("utf-8")(sys.stdin.detach())
    x = sys.stdin.read(1)
    print(x.decode('utf-8', 'replace'))
This gives me: �

It's close...
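One reason this version behaves strangely: codecs.getwriter() returns a StreamWriter, which only encodes text passed to write(). It defines no read(), so that call is forwarded to the wrapped binary stream and returns bytes (which is why x.decode(...) works here), and the wrapper has no encoding attribute. Since input() expects sys.stdin to be a text stream, this is likely why it stops working. A sketch with io.BytesIO standing in for the detached stdin buffer (the stand-in is an assumption for illustration):

```python
import codecs
import io

# Stand-in for sys.stdin.detach(): a binary stream holding the byte
# a Latin-1 terminal sends for å, followed by a newline.
raw = io.BytesIO(b"\xe5\n")

# Wrapping an *input* stream in a StreamWriter, as in the code above.
wrapped = codecs.getwriter("utf-8")(raw)

# StreamWriter has no read() of its own; the call is forwarded to the
# underlying binary stream, so the result is bytes, not str.
first = wrapped.read(1)
print(type(first), first)            # <class 'bytes'> b'\xe5'

# A text stream would expose .encoding; this wrapper does not.
print(hasattr(wrapped, "encoding"))  # False
```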
How can I take \xe5 and turn it into å in the console, without breaking input()? (I ask because my solution breaks it.)
Note: I know this has been asked before, but none of the answers solve it, not even the io approach.
Some info about my system:

    os.environ['LANG'] == 'C'
    sys.getdefaultencoding() == 'utf-8'
    sys.stdout.encoding == 'ANSI_X3.4-1968'
    sys.stdin.encoding == 'ANSI_X3.4-1968'

My OS: Arch Linux, running xterm.

Running locale -a gives me: C | POSIX | sv_SE.utf8
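ANSI_X3.4-1968 is the glibc name for plain ASCII. Python 3 picks the encoding of sys.stdin and sys.stdout from the locale (unless PYTHONIOENCODING is set), so with LANG=C it ends up with ASCII, which has no å at all; Python 3.7 and later may instead coerce the C locale to UTF-8 (PEP 538, PEP 540). Fixing the terminal alone may therefore not be enough while LANG is C. A sketch that gathers the relevant settings in one place; encoding_report is a hypothetical helper name:

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide how Python decodes terminal input.

    Hypothetical helper for diagnosis; it only reads existing values.
    """
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "preferred": locale.getpreferredencoding(False),
        # getattr() guards against stdin/stdout being None or replaced.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # sys.flags.utf8_mode only exists on Python 3.7+.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key:>16}: {value}")
```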
I've followed these:
- Python 3: How to specify stdin encoding
- http://python-notes.curiousefficiency.org/en/latest/python3/binary_protocols.html
- http://wolfprojects.altervista.org/talks/unicode-and-python-3/
- http://getpython3.com/diveintopython3/strings.html
- Python 3 - encode/decode vs bytes/str
- How to set sys.stdout encoding in Python 3?
- http://docs.python.org/3.0/howto/unicode.html
(and about 50 more)
Solution (sort of; it still breaks input()):

    import sys, codecs
    sys.stdout = codecs.getwriter("latin-1")(sys.stdout.detach())
    sys.stdin = codecs.getwriter("latin-1")(sys.stdin.detach())
    x = sys.stdin.read(1)
    print(x.decode('latin-1', 'replace'))
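A variant that should keep input() working, assuming the terminal really does send Latin-1: wrap both binary buffers in io.TextIOWrapper instead of a codecs stream. That gives real text streams with the encoding and errors attributes input() relies on, and wrapping stdout as well matters because an ASCII stdout cannot print å. On Python 3.7+, sys.stdin.reconfigure(encoding="latin-1") should achieve the same at startup. latin1_text_stream is a hypothetical helper name:

```python
import io


def latin1_text_stream(binary_stream):
    """Wrap a binary stream in a text stream that uses Latin-1.

    Hypothetical helper; assumes the terminal sends and expects Latin-1.
    Every byte is valid Latin-1, so errors="replace" only matters when
    writing characters above U+00FF, which become '?'.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding="latin-1",
        errors="replace",
        line_buffering=True,  # flush output at each newline, like a console
    )
```

At the top of the script, assign sys.stdin = latin1_text_stream(sys.stdin.detach()) and sys.stdout = latin1_text_stream(sys.stdout.detach()); input() should then return å as a one-character str.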
You are running in xterm, which does not support UTF-8 by default. Run xterm -u8 or use uxterm to fix that.
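Once the terminal sends UTF-8, å arrives as two bytes (0xc3 0xa5), so code that reads one byte and decodes it on its own, like attempt 3 in the question, still fails. If byte-at-a-time reading is really wanted, an incremental decoder holds on to an incomplete sequence until the rest arrives. A sketch, with io.BytesIO standing in for sys.stdin.buffer; read_chars_bytewise is a hypothetical helper name:

```python
import codecs
import io


def read_chars_bytewise(binary_stream, encoding="utf-8"):
    """Yield decoded text while reading the stream one byte at a time.

    The incremental decoder buffers a partial multi-byte sequence and
    returns '' until the sequence is complete.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    while True:
        byte = binary_stream.read(1)
        if not byte:
            # End of input: flush anything still buffered.
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail
            return
        text = decoder.decode(byte)
        if text:
            yield text


# "å" typed in a UTF-8 terminal, followed by Enter.
chars = list(read_chars_bytewise(io.BytesIO("å\n".encode("utf-8"))))
print(chars)  # ['å', '\n']
```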
The other way to work around that is to use a different locale; set your locale to Latin-1, for example:

    export LANG=sv_SE.ISO-8859-1
But then you are limited to 256 codepoints, versus the full range (over a million) of the Unicode standard.
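The 256-codepoint limit follows from how Latin-1 works: each byte 0x00 to 0xFF maps straight to the code point with the same number, so å (U+00E5) is the byte 0xe5, while anything above U+00FF, such as €, has no Latin-1 encoding. Unicode itself runs up to U+10FFFF. A small sketch; latin1_bytes is a hypothetical helper name:

```python
import sys


def latin1_bytes(ch):
    """Return the Latin-1 encoding of ch, or None if it has none."""
    try:
        return ch.encode("latin-1")
    except UnicodeEncodeError:
        return None


for ch in ["å", "ä", "ö", "€"]:
    print(ch, hex(ord(ch)), latin1_bytes(ch))

# Latin-1 covers 256 code points; Unicode covers U+0000..U+10FFFF.
print(256, sys.maxunicode + 1)  # 256 1114112
```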
Note that Python 2 never decoded input; echoing what you read from the terminal worked fine because the raw bytes you read are interpreted by the terminal in the same locale, so reading and writing Latin-1 bytes just works. That's not quite the same as processing Unicode data, however.
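In Python 3 the same byte-for-byte behaviour is available on the binary layer: sys.stdin.buffer and sys.stdout.buffer pass bytes through without decoding, so a Latin-1 terminal gets back exactly what it sent. As noted above, this is not Unicode processing. echo_bytes is a hypothetical helper; io.BytesIO stands in for the real buffers:

```python
import io


def echo_bytes(src, dst, chunk_size=1):
    """Copy raw bytes from src to dst without decoding them."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
    dst.flush()


# In-memory streams standing in for sys.stdin.buffer / sys.stdout.buffer.
out = io.BytesIO()
echo_bytes(io.BytesIO(b"\xe5\n"), out)
print(out.getvalue())  # b'\xe5\n'
```

In a real script the call would be echo_bytes(sys.stdin.buffer, sys.stdout.buffer).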