How do I convert hexadecimal to Chinese in Python?

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests,urllib2
from bs4 import BeautifulSoup
import sys

# Return the src of the first image inside #Zoom, or False if there is none.
def get_img(url):
    wb = requests.get(url)
    wb.encoding = "utf-8"
    soup = BeautifulSoup(wb.text,'lxml')
    img_url = soup.select('#Zoom img')
    if img_url == []:
        return False
    else:
        return img_url[0].get('src')

def get_title(url):
    wb = requests.get(url)
    wb.encoding = "utf-8"
    soup = BeautifulSoup(wb.text,'lxml')
    title = soup.select("a[href='#']")
    return title

# Collect the title and link of every .ulink entry on a list page.
def get_url(Url):
    wb = requests.get(Url)
    wb.encoding = 'gb2312'
    soup = BeautifulSoup(wb.text,'lxml')
    title = soup.select('.ulink')
    url = soup.select('.ulink')
    titles_urls = []
    for x,y in zip(title,url):
        reload(sys)
        sys.setdefaultencoding('utf-8')
        data = {
            # keep only the text between 《 and the first / or 》
            'title': x.get_text().split("《")[1].split("/")[0].split("》")[0],
            'url':y.get('href'),
        }

        titles_urls.append(data)
    return titles_urls

for z in range(1,100):
    url = 'http://www.ygdy8.net/html/gndy/dyzz/list_23_%d.html' %z
    for x in get_url(url):
        u = get_img("http://www.ygdy8.net"+str(x['url']))
        if u != False:
            print u
            print x['title']
            y = str(x['title'])
            with open('imgs/'+str(y)+'.jpg', "wb") as f:
                f.write(requests.get(u).content)
    print "第%d 页" %z  # prints the page number

Output:

https://ws3.sinaimg.cn/mw690/81298caagw1f7mwfo0otfj21kw29ku0x.jpg
琼斯的自由国度
Traceback (most recent call last):
  with open('imgs/'+str(y)+'.jpg', "wb") as f:
IOError: [Errno 2] No such file or directory: 'imgs/\xe7\x90\xbc\xe6\x96\xaf\xe7\x9a\x84\xe8\x87\xaa\xe7\x94\xb1\xe5\x9b\xbd\xe5\xba\xa6.jpg'
[Finished in 0.7s with exit code 1]


12 replies • 2016-10-05 12:41:55 +08:00

codepurple

Sep 14, 2016

This isn't a Chinese-conversion problem. The error is raised because the imgs directory hasn't been created.
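
A minimal sketch of that directory fix, assuming the images are meant to go into a relative `imgs` folder as in the script. `ensure_dir` is a hypothetical helper name, not part of the original code, and the demo uses a temporary directory so it does not touch the working directory. It should run on both Python 2 and Python 3.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parents) unless it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # An already existing directory is fine; re-raise anything else.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path


# Demo: create an 'imgs' folder under a scratch directory, twice.
base = tempfile.mkdtemp()
target = os.path.join(base, 'imgs')
ensure_dir(target)
ensure_dir(target)  # the second call is a no-op
print(os.path.isdir(target))
```

In the script above, calling `ensure_dir('imgs')` once before the download loop would address the `[Errno 2]` case. It does not address the `[Errno 22]` error reported later in the thread.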

Arnie97

Sep 14, 2016 via Android

reload(sys); sys.setdefaultencoding('utf-8') gets a thumbs-down from me.

NLL

Sep 14, 2016

@Arnie97 -。-# Isn't that needed?

qqmishi

Sep 14, 2016 via Android

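It has nothing to do with Chinese. The error says there is no such file or directory, so either the directory hasn't been created or the path is wrong.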

NLL

Sep 14, 2016

@qqmishi
Traceback (most recent call last):

琼斯的自由国度

File "C:\Users\123\Desktop\dianyingtiant.py", line 51, in <module>
with open('imgs/'+str(y)+'.jpg', "wb") as f:
IOError: [Errno 22] invalid mode ('wb') or filename: 'imgs/\xe7\x90\xbc\xe6\x96\xaf\xe7\x9a\x84\xe8\x87\xaa\xe7\x94\xb1\xe5\x9b\xbd\xe5\xba\xa6.jpg'
[Finished in 0.7s with exit code 1]

qqmishi

Sep 14, 2016

@zhijiansha It ran fine for me on Ubuntu, so this is probably down to Windows itself.

qqmishi

Sep 14, 2016

@zhijiansha Now I get it. You adapted this from sample code, didn't you? Windows and Linux use opposite path separators. See http://blog.csdn.net/kazeik/article/details/8742953 for reference.

Arthur2e5

Sep 15, 2016

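@qqmishi It's not a separator problem. From the \x... sequence and the way get_url processes the text, you can tell this is UTF-8. On Windows, str goes through the ANSI API by default, so of course it chokes.

The fix is simple: don't use the python2 str type, whose encoding logic is a mess. Even if you stay on py2, go and use unicode.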
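
To make the point about the \x... sequence concrete, here is a small sketch that decodes the bytes copied from the traceback (without the `imgs/` prefix and the `.jpg` suffix). The variable names `raw` and `title` are only illustrative. It should behave the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Byte sequence taken from the IOError message in the traceback.
raw = (b'\xe7\x90\xbc\xe6\x96\xaf\xe7\x9a\x84'
       b'\xe8\x87\xaa\xe7\x94\xb1\xe5\x9b\xbd\xe5\xba\xa6')

# Decoding as UTF-8 yields text: unicode on Python 2, str on Python 3.
title = raw.decode('utf-8')

print(len(raw))    # 21 bytes
print(len(title))  # 7 characters
```

So the "hexadecimal" in the error message is just the UTF-8 byte representation of the title that was printed one line earlier, 琼斯的自由国度.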

qqmishi

Sep 16, 2016

@Arthur2e5 You're right. After changing it to with open('imgs/'+unicode(y).encode('gbk')+'.jpg', "wb") as f: it runs on Windows.

Arthur2e5

Sep 27, 2016

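@qqmishi I'm begging you, really don't encode to gbk. If you insist on py2, can't you just use the unicode data type properly?

You're using gbk to cope with the cp936 ANSI API, right? A single euro sign is enough to kill that.
Not to mention non-Chinese editions of Windows.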
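
The euro-sign objection can be checked with Python's own `gbk` codec. This is a sketch only: `can_encode` is a hypothetical helper, `euro_title` is a made-up title, and the snippet says nothing about how the Windows ANSI API itself treats the character.

```python
# -*- coding: utf-8 -*-
def can_encode(text, encoding):
    """Return True if `text` can be encoded to `encoding` without error."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


chinese_title = u'\u743c\u65af\u7684\u81ea\u7531\u56fd\u5ea6'  # 琼斯的自由国度
euro_title = u'\u20ac100'  # hypothetical title containing a euro sign

print(can_encode(chinese_title, 'gbk'))   # True
print(can_encode(euro_title, 'gbk'))      # False
print(can_encode(euro_title, 'utf-8'))    # True
```

A filename built with `.encode('gbk')` therefore works only as long as every title happens to fit into that codec.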

NLL

Sep 30, 2016

@Arthur2e5 So what would be the most appropriate way to handle it?

Arthur2e5

Oct 5, 2016

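@zhijiansha Switch to python3 and you reach enlightenment on the spot, or stay on python2 and dutifully use the unicode data type. I feel like I've said this many times already.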
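
One way the python3 route could look, as a rough sketch rather than a drop-in replacement for the script: `save_image` is a hypothetical helper, the bytes are a placeholder instead of a real download, and the demo writes into a temporary directory. It assumes Python 3, where filenames are text strings and no manual encoding step is involved.

```python
import tempfile
from pathlib import Path


def save_image(directory, title, content):
    """Write `content` (bytes) to <directory>/<title>.jpg and return the path."""
    target_dir = Path(directory)
    # Creating the folder first avoids the "No such file or directory" case.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (title + '.jpg')  # the title stays a text string
    target.write_bytes(content)
    return target


# Demo with placeholder bytes in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(Path(tmp) / 'imgs', '琼斯的自由国度', b'not a real jpeg')
    round_trip = saved.read_bytes()

print(saved.name)
```

In the original loop, the third argument would be `requests.get(u).content` and the second would be `x['title']` passed through unchanged, without `str()` or `.encode('gbk')`.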