1
function007 2015-11-20 21:47:37 +08:00
Try converting the encoding and see.
|
2
haozhang 2015-11-20 21:57:29 +08:00
Check which encoding is given in the Content-Type header of the HTTP response.
|
3
haozhang 2015-11-20 21:57:58 +08:00
Then set the same encoding when you parse it.
|
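The advice in replies 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not code from the thread: `charset_from_content_type` and `decode_body` are hypothetical helper names, and falling back to UTF-8 when the header names no charset is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the name and returns `default`
    # when the header carries no charset parameter.
    return msg.get_content_charset(default)


def decode_body(raw_bytes, content_type):
    """Decode response bytes with the encoding the server declared."""
    return raw_bytes.decode(charset_from_content_type(content_type))
```

With the header quoted later in this thread, `charset_from_content_type('text/html; charset=UTF-8')` gives `'utf-8'`, so the body would be decoded as UTF-8.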
4
loudis 2015-11-20 22:30:35 +08:00
gzip?
|
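If the question here is whether the body arrived gzip-compressed, a check along these lines would tell. It is only a sketch under assumptions: `maybe_gunzip` is a hypothetical helper, and it expects the value of the response's Content-Encoding header to be passed in as a plain string (or None when the header is absent).

```python
import gzip
import io


def maybe_gunzip(body, content_encoding):
    """Decompress `body` when the Content-Encoding header says gzip."""
    if (content_encoding or "").strip().lower() == "gzip":
        with gzip.GzipFile(fileobj=io.BytesIO(body)) as fh:
            return fh.read()
    # Anything else is returned unchanged.
    return body
```

The returned bytes still have to be decoded with the page's charset before they are text.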
5
ericls 2015-11-20 22:51:10 +08:00
Use cchardet to check the encoding.
|
6
meloncrashed 2015-11-20 23:05:14 +08:00 via iPhone
Using Python 3.4 might work better.
|
7
xu123456789 OP
'Content-Type': 'text/html; charset=UTF-8'
Decoding with html.content.decode('utf-8') raises the following error: UnicodeEncodeError: 'gbk' codec can't encode character u'\u203a' in position 15440: illegal multibyte sequence
|
8
xu123456789 OP
@haozhang 'Content-Type': 'text/html; charset=UTF-8'
Decoding with html.content.decode('utf-8') raises the following error: UnicodeEncodeError: 'gbk' codec can't encode character u'\u203a' in position 15440: illegal multibyte sequence
|
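The traceback is an encode error from the 'gbk' codec, not a decode error, so `decode('utf-8')` most likely succeeded and the failure came afterwards, for instance when the decoded text was printed to a console whose encoding is GBK. That is an assumption, since the thread does not show the print call. A small sketch of the failure and one possible workaround; `can_encode` and `encode_for_console` are hypothetical helpers:

```python
def can_encode(text, encoding):
    """Report whether every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def encode_for_console(text, console_encoding="gbk"):
    """Encode text for a console, replacing unsupported characters with '?'."""
    return text.encode(console_encoding, errors="replace")
```

`can_encode(u'\u203a', 'gbk')` is False, which matches the character named in the error, while the same character encodes fine as UTF-8. Encoding with `errors="replace"` loses that character but avoids the exception.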
9
xu123456789 OP
@meloncrashed A lot of things aren't supported on Python 3.
|
10
xu123456789 OP
Could someone write me code that simulates logging in to V2EX without producing garbled text?
|
11
xu123456789 OP
@ericls How do I use it?
|
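A minimal sketch of how cchardet is typically called, assuming the third-party package is installed (`pip install cchardet`). `guess_encoding` is a hypothetical wrapper; `cchardet.detect` takes bytes and returns a dict with `encoding` and `confidence` keys. The result is a statistical guess, and the encoding may come back as None when the input is too short or ambiguous.

```python
try:
    import cchardet  # third-party package, not part of the standard library
except ImportError:
    cchardet = None


def guess_encoding(raw_bytes):
    """Return cchardet's best guess for the encoding, or None if unavailable."""
    if cchardet is None:
        return None
    result = cchardet.detect(raw_bytes)  # e.g. {'encoding': ..., 'confidence': ...}
    return result["encoding"]
```

The raw response bytes, before any decoding, are what should be passed in.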
12
DEMONHUNTER 2015-11-21 11:14:32 +08:00
# -*- coding:utf8 -*-
import urllib
import urllib2


def login(username, password):
    url = 'http://www.v2ex.com/signin'
    param = {
        'u': username,
        'p': password,
        # One-time token read from the sign-in form; a hard-coded value
        # has probably expired and would need to be fetched again.
        'once': 48203,
        'next': '/'
    }
    data = urllib.urlencode(param)
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        # 'Accept-Encoding' is left out: asking for gzip without
        # decompressing the body would print binary garbage.
        'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4,zh-TW;q=0.2,ko;q=0.2,ja;q=0.2',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        # 'Content-Length' is filled in by urllib2 from the POST data.
        'Content-Type': 'application/x-www-form-urlencoded',
        'Host': 'www.v2ex.com',
        'Origin': 'http://www.v2ex.com',
        'Referer': 'http://www.v2ex.com/signin',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'
    }
    # Attach the headers to a Request so they are actually sent.
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request, timeout=10)
    the_page = response.read()
    print the_page


if __name__ == '__main__':
    username = 'xu123456789'
    password = '***********'
    login(username, password)
|
13
DEMONHUNTER 2015-11-21 11:16:54 +08:00
Damn, the indentation got all messed up. Sorry, just make do with it; it's only a few lines.
|
14
vitovan 2015-11-21 12:09:13 +08:00
|
15
silentsolo 2015-11-23 00:39:10 +08:00 via iPad
Mm-hm.
|