Totally confused by Python encodings

Errors like this one: UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 1: ordinal not in range(128)

I can't keep gbk, utf-8, and ascii straight.
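
For reference, a minimal Python 3 sketch of what that message is saying. The byte string below is made up to match the error text (0xcb at index 1); it is not taken from the poster's program. It shows that the same bytes fail under ASCII yet decode to different text under GBK and UTF-8.

```python
# Bytes chosen to match the error message: 0xcb sits at index 1.
data = b"a\xcb\xb5"

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    # ASCII only covers byte values 0-127, so 0xcb (203) is rejected.
    ascii_error = exc
    print(exc)

# The same bytes are valid in both GBK and UTF-8, but mean different text.
as_gbk = data.decode("gbk")
as_utf8 = data.decode("utf-8")

# ascii() escapes non-ASCII characters, so printing is safe on any console.
print(ascii(as_gbk), ascii(as_utf8))
```

Bytes carry no label saying which encoding produced them, so the program has to be told which one to use.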

21 replies • 2017-12-08 10:21:31 +08:00

cls1991

Dec 7, 2017

Please post your code.

leavic

Dec 7, 2017

Switch to Python 3.

p2pCoder

Dec 7, 2017

It's almost 2019, just go straight to Python 3.

regicide

Dec 7, 2017

import sys
reload(sys)
sys.setdefaultencoding('utf8')
Have you tried this? If it doesn't work, you might as well jump over to Python 3.
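
The snippet above only works on Python 2. As a rough Python 3 comparison (my own sketch, not something the poster ran): `sys.setdefaultencoding` no longer exists there, and the usual alternative is to name the encoding at the point where bytes become text. The GBK sample bytes are an assumption for illustration.

```python
import sys

# Python 3 has no sys.setdefaultencoding; the default is fixed at UTF-8.
has_setter = hasattr(sys, "setdefaultencoding")
default_encoding = sys.getdefaultencoding()
print(has_setter, default_encoding)

# Instead of changing a global default, decode explicitly where bytes arrive.
raw = "编码".encode("gbk")   # stand-in for bytes read from a GBK source
text = raw.decode("gbk")     # explicit decode, no hidden ASCII step
print(ascii(text))
```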

marcong95

Dec 7, 2017 via Android

@p2pCoder It isn't even 2018 yet. Did you time travel?

p2pCoder

Dec 7, 2017

@marcong95 ... I skipped my midday nap, so I'm a bit spaced out this afternoon.

livexia

Dec 7, 2017 via Android

A web scraper, I guess? You have to identify the source encoding first.
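
One simple way to do that, as a sketch: try a short list of candidate encodings in order and keep the first that decodes cleanly. `decode_with_fallback` is a hypothetical helper name and the candidate list is an assumption. A real scraper would normally also check the charset declared by the server or the page.

```python
def decode_with_fallback(data, encodings=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# The same text, as two differently encoded downloads might deliver it.
gbk_page = "中文网页".encode("gbk")
utf8_page = "中文网页".encode("utf-8")

gbk_result = decode_with_fallback(gbk_page)
utf8_result = decode_with_fallback(utf8_page)
print(gbk_result[1], utf8_result[1])
```

UTF-8 goes first because it is strict and rejects most non-UTF-8 input. GBK accepts many byte sequences, so trying it first could silently produce wrong text.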

Shura

Dec 7, 2017

It's the year 7102 already, switch to Python 3.

lhx2008

Dec 7, 2017 via Android

Declare the file encoding at the top of the source file.
Use decode and encode, and put a u prefix on text literals.
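
A small sketch of that advice in Python 3 syntax, where the `u` prefix is no longer needed because `str` is already Unicode. The file name and sample text are made up for illustration.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

text = "编码测试"  # Python 3 str is Unicode; Python 2 would need u"..."

# Encode once on the way out: write the text as GBK bytes.
path = os.path.join(tempfile.mkdtemp(), "sample_gbk.txt")
with open(path, "w", encoding="gbk") as handle:
    handle.write(text)

# Decode once on the way in: name the same encoding when reading.
with open(path, "r", encoding="gbk") as handle:
    round_trip = handle.read()

# Decoding the same bytes with the wrong codec is what triggers the errors.
with open(path, "rb") as handle:
    raw = handle.read()
try:
    raw.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(round_trip == text, wrong_codec_failed)
```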

johnsonqrr

Dec 7, 2017

Py3, please.

DongDongXie

Dec 7, 2017

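@cls1991 I installed Anaconda 2.7 and set up the environment variables. All I wanted was to run pip list, but it threw an error at line 87 of D:\Anaconda2\Lib\ntpath.py, right at result_path = result_path + p_path. After I added "reload(sys)
sys.setdefaultencoding('gbk')" it worked.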

# Join two (or more) paths.
def join(path, *paths):
    reload(sys)
    sys.setdefaultencoding('gbk')
    """Join two or more pathname components, inserting "\\" as needed."""
    result_drive, result_path = splitdrive(path)
    for p in paths:
        p_drive, p_path = splitdrive(p)
        if p_path and p_path[0] in '\\/':
            # Second path is absolute
            if p_drive or not result_drive:
                result_drive = p_drive
            result_path = p_path
            continue
        elif p_drive and p_drive != result_drive:
            if p_drive.lower() != result_drive.lower():
                # Different drives => ignore the first path entirely
                result_drive = p_drive
                result_path = p_path
                continue
            # Same drive in different case
            result_drive = p_drive
        # Second path is relative to the first
        if result_path and result_path[-1] not in '\\/':
            result_path = result_path + '\\'
        result_path = result_path + p_path
    ## add separator between UNC and non-absolute path
    if (result_path and result_path[0] not in '\\/' and
        result_drive and result_drive[-1:] != ':'):

DongDongXie

Dec 7, 2017

It feels like beginners fall into the encoding pitfalls very easily.

ltux

Dec 7, 2017

If you're confused, go study.

wolong

Dec 7, 2017

I've run into this too when running a .py file from the command line on Windows.
Running the file by double-clicking it instead fixed it.

maidou931019

Dec 7, 2017

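In Python 2, str holds bytes and unicode holds Unicode text (code points).
In Python 3, str holds Unicode text and bytes holds bytes.

Python 2 blurs the line between bytes and unicode: u'hello' + 'hi' raises no error and returns a unicode object, because the str is implicitly decoded as ASCII first. That hidden decode is what raises UnicodeDecodeError once the bytes are not pure ASCII.
Python 3 strictly separates unicode from bytes, characters from raw bytes, so mixing them fails immediately: 'hello' + b'hi' raises a TypeError.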
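
To make the difference concrete, here is a Python 3 sketch. `py2_style_concat` is a hypothetical helper that only imitates Python 2's implicit ASCII decode, and the Windows-style path is made up. A path join that mixes unicode with GBK bytes, as in the ntpath.py report earlier in the thread, plausibly fails the same way, though that is my guess.

```python
def py2_style_concat(text, raw):
    """Imitate Python 2: decode the byte string as ASCII, then concatenate."""
    return text + raw.decode("ascii")


# Pure-ASCII bytes: the implicit decode succeeds, so the problem stays hidden.
ok = py2_style_concat("hello", b"hi")

# Non-ASCII bytes (GBK here): the hidden decode raises UnicodeDecodeError.
try:
    py2_style_concat("C:\\Users\\", "用户".encode("gbk"))
    hidden_decode_failed = False
except UnicodeDecodeError:
    hidden_decode_failed = True

# Python 3 refuses to guess and rejects the mix outright.
try:
    "hello" + b"hi"
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True

# The explicit fix: decode with the real encoding first, then join.
fixed = "C:\\Users\\" + "用户".encode("gbk").decode("gbk")
print(ok, hidden_decode_failed, mixed_types_failed, ascii(fixed))
```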

justou

Dec 7, 2017

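Don't limit the encoding question to Py2 versus Py3. Learn systematically how strings are represented in a computer and how encodings work. Once the principles are clear, practice them in a specific language and environment to deepen your understanding. Otherwise, even if you get used to how Python handles encodings, you'll be confused again as soon as the environment changes. Without understanding the principles, any fix only treats the symptoms, not the cause.
Some references on the principles:
Computer Systems: A Programmer's Perspective, Chapter 2, Representing and Manipulating Information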
http://unicodebook.readthedocs.io/
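
The core idea in those references fits in a few lines of Python 3. This is only an illustrative sketch: a character is an abstract code point, and each encoding is a different way of storing it as bytes.

```python
char = "中"

code_point = ord(char)             # the abstract Unicode number
utf8_bytes = char.encode("utf-8")  # one way to store it: three bytes
gbk_bytes = char.encode("gbk")     # another way to store it: two bytes

print(hex(code_point), utf8_bytes, gbk_bytes)

# Decoding with the matching codec recovers the same character either way.
same_char = utf8_bytes.decode("utf-8") == gbk_bytes.decode("gbk") == char
print(same_char)
```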