Problem printing UTF-8 text to the command line with Python 3

test.py
file='u8.html' # a UTF-8 encoded file
text=open(file).read()

Running python3.exe test.py from the command line
gives:

UnicodeDecodeError: 'gbk' codec can't decode bytes in position 2-3: illegal multibyte sequence

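After changing it to text=open(file, 'r', encoding='utf-8').read() the error goes away.
Then I print the result with print(text),
and the error becomes:
UnicodeEncodeError: 'gbk' codec can't encode character '\ufeff' in position 0: illegal multibyte sequence

How can I print the contents of a UTF-8 encoded file to the console?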
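
A side note on the '\ufeff' in the second error: that character is a byte order mark (BOM) at the start of the file. A minimal sketch, assuming the file really begins with a UTF-8 BOM; the file name u8_demo.html is made up for the demo. Reading with 'utf-8-sig' instead of 'utf-8' drops the BOM, though it does not help with other characters that GBK cannot encode.

```python
import os
import tempfile

# Hypothetical sample file: UTF-8 content preceded by a BOM (EF BB BF).
path = os.path.join(tempfile.mkdtemp(), "u8_demo.html")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf" + "中文".encode("utf-8"))

# 'utf-8' keeps the BOM as the character '\ufeff' at position 0.
with open(path, encoding="utf-8") as f:
    with_bom = f.read()

# 'utf-8-sig' decodes the same bytes but strips a leading BOM.
with open(path, encoding="utf-8-sig") as f:
    without_bom = f.read()

# ascii() keeps the demo printable even on a GBK console.
print(ascii(with_bom[0]))
print(ascii(without_bom[0]))
```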

23 replies • 2017-02-24 22:22:32 +08:00

yakczh

February 3, 2014

It looks like the problem is in the file content.

# -*- encoding=utf-8 -*-
text='中文524μg/m³'
print(text)

Hard-coding the string directly in the program gives the same error.

sillyousu

February 3, 2014

Try print(text.encode('utf8'))

yakczh

February 3, 2014

@sillyousu

# -*- encoding=utf-8 -*-
text="中文524μg/m³"
print(text.encode('utf8'))

c:\python32\python.exe test.py
Result:
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte

yakczh

February 3, 2014

The output should be b'\xe4\xb8\xad\xe6\x96\x87524\xce\xbcg/m\xc2\xb3'; the previous run was a mistake on my side.
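
One plausible reading of that SyntaxError, offered as an assumption rather than something the thread confirms: test.py was saved in GBK while its header declared UTF-8. The byte 0xd6 in the message is the first GBK byte of '中', and the sketch below reproduces the same decode failure.

```python
text = "中文524μg/m³"

# '中' is the two bytes D6 D0 in GBK.
gbk_bytes = "中".encode("gbk")
print(gbk_bytes)

# Decoding those GBK bytes as UTF-8 fails: 0xd6 announces a two-byte
# sequence, but 0xd0 is not a valid continuation byte.
try:
    gbk_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)

# With the source saved as UTF-8, encode() yields the bytes quoted above.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```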

ritksm

February 3, 2014

http://docs.python.org/3/library/functions.html#open

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used. See the codecs module for the list of supported encodings.

So your system encoding is probably GBK (Windows?).
Try open(file, encoding='utf8').read()

yakczh

February 3, 2014

The problem is the superscript 3 (the cube sign); once I delete that final ³, everything works.
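
To find this kind of culprit without deleting characters one by one, something like the following sketch could be used. The helper name unencodable is made up for illustration; it simply tries to encode each character with the console's encoding.

```python
def unencodable(text, encoding):
    """Return the characters of text that cannot be encoded with encoding."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

# ascii() keeps the report itself printable on a GBK console.
print(ascii(unencodable("中文524μg/m³", "gbk")))
print(ascii(unencodable("中文524μg/m³", "utf-8")))
```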

ritksm

February 3, 2014

oops... I didn't read the question carefully again... ignore me

yakczh

February 3, 2014

I tested it: both Java code and the browser output text="中文524μg/m³" correctly. Python really is a pain of a language; so much time gets wasted in potholes like this.

phyng

February 3, 2014

@yakczh You are probably using the Windows Command Prompt, which outputs in GBK; there is no way around that.
In Python 2 you can force GBK encoding and ignore errors like this, which prints the text with the unencodable characters dropped:
# -*- encoding=utf-8 -*-
text='中文524μg/m³'.decode('utf-8').encode('gbk', 'ignore')
print text
Output:
中文524μg/m

Python 3 no longer has str.decode, so:
# -*- encoding=utf-8 -*-
text='中文524μg/m³'.encode('gbk', 'ignore').decode('gbk')
print(text)
Output:
中文524μg/m

sillyousu

February 3, 2014

@yakczh I don't have Python 3 at hand, but Python 2 prints ³ correctly.

yakczh

February 4, 2014

@phyng The Windows console can use UTF-8: just run chcp 65001
M3.java
class M3
{
    public static void main(String[] args)
    {
        String text="中文524μg/m³";
        System.out.print(text);
        System.out.println("Hello World!");
    }
}

java M3
中文524μg/m³g/m³Hello World!

python32\python.exe m3.py
gives:
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
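
The LookupError says this Python 3.2 build has no codec registered under the name cp65001; as far as I know, that codec only arrived in Python 3.3. A small sketch for checking which encoding names an interpreter knows (the helper name encoding_known is made up for illustration):

```python
import codecs

def encoding_known(name):
    """Return True if this interpreter has a codec registered for name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

# The result for cp65001 depends on the Python version and platform.
for name in ("gbk", "utf-8", "cp65001"):
    print(name, encoding_known(name))
```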

ritksm

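February 4, 2014

The following was tested on Unix.

http://stackoverflow.com/questions/3597480/how-to-make-python-3-print-utf8 Either the first or the second method there works.

There is also another way: run the script as PYTHONIOENCODING='utf8' python xxx.py

Python 3 seems to have made this part more complicated - -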
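
To see what PYTHONIOENCODING does without touching the console settings, here is a rough sketch that starts a child interpreter with the variable set and captures its raw output bytes. It assumes Python 3.5 or later for subprocess.run; the child script is a made-up one-liner.

```python
import os
import subprocess
import sys

# Copy the current environment and force UTF-8 for the child's stdio.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child prints the test string; escapes keep the command line ASCII.
child_code = "print('\\u4e2d\\u6587524\\u03bcg/m\\u00b3')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
)
raw = result.stdout

# The captured bytes are UTF-8, whatever the parent console uses.
print(raw)
```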

yakczh

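February 4, 2014

@phyng This is a scraped HTML file. To debug the program and look at the output, I have to add encode('gbk', 'ignore').decode('gbk') at every print call.
Once debugging is done, I have to delete all that junk again. I'd rather you just killed me.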
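
One way to avoid touching every print call might be to set the error handler once on the output stream. A sketch under assumptions: the BytesIO below stands in for a GBK console, and in a real script the same wrapper would be built around sys.stdout.buffer (on Python 3.7 and later, sys.stdout.reconfigure(errors='replace') should do the same in one line).

```python
import io

# Stand-in for the console's byte stream, so the demo runs anywhere.
raw = io.BytesIO()

# A text layer that encodes to GBK and substitutes '?' for anything GBK
# cannot represent, instead of raising UnicodeEncodeError.
console = io.TextIOWrapper(raw, encoding="gbk", errors="replace", newline="\n")

print("中文524μg/m³", file=console)  # no per-call encode/decode needed
console.flush()

shown = raw.getvalue().decode("gbk")
print(ascii(shown))

# In a real script (assumption, not tested on the poster's setup):
# sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
#                               encoding=sys.stdout.encoding,
#                               errors="replace", line_buffering=True)
```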

yakczh

February 4, 2014

@ritksm
import sys
sys.stdout.buffer.write(text)

gives TypeError: 'str' does not support the buffer interface

The other two methods, combined with chcp 65001, both print correctly.
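
That TypeError comes from sys.stdout.buffer being a binary stream: it accepts bytes, not str, so the text has to be encoded first. A minimal sketch, where write_utf8 is a made-up helper and a BytesIO stands in for sys.stdout.buffer:

```python
import io

def write_utf8(binary_stream, text):
    """Encode text as UTF-8 and write the resulting bytes to a binary stream."""
    binary_stream.write(text.encode("utf-8"))

# Stand-in for sys.stdout.buffer so the result can be inspected.
demo = io.BytesIO()
write_utf8(demo, "中文524μg/m³\n")
print(demo.getvalue())

# In a script: write_utf8(sys.stdout.buffer, text)
```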

phyng

February 4, 2014 via Android

@yakczh Thanks for pointing that out. I later found this article: http://apoo.bokee.com/7028948.html

iambear

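February 4, 2014

I've run into this problem too. My solution:
open the script in IDLE and press F5 to run it; that way there are no encoding problems.

Generally, when I need to see output while debugging a script, I use IDLE; for real runs I just write to a log and print nothing.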

yakczh

February 4, 2014

@iambear PyScripter works too. It seems only editors written in Python display the output correctly; ordinary editors like Notepad++ all have problems.

9hills

February 4, 2014

Please don't use Python on Windows.

tywtyw2002

February 4, 2014

On Windows, just don't run it from cmd; cmd as a whole is one big pitfall.

est

February 4, 2014 via Android

This isn't a Python pitfall, it's a Windows cmd pitfall....

yakczh

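February 4, 2014

@est The same string prints fine in Java. In memory the string is Unicode there too, just as in Python 3, and the source is compiled to bytecode just the same.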

yakczh

February 4, 2014

And it claims to be cross-platform just the same.

Vonex

February 24, 2017

In Tools -> Build System, edit Python3.sublime-build
and add this setting:
"env": {"LANG": "en_US.UTF-8"}

```
{
    "cmd": ["/usr/local/bin/python3","-u","$file"],
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python",
    "encoding": "utf-8",
    "env": {"LANG": "en_US.UTF-8"}
}
```