Python Scrapy: crawling a site, I hit a link containing Chinese characters that won't save to the database. How do you handle this? - V2EX

This topic was created 3674 days ago; the information in it may since have changed.

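I'm crawling a site with Scrapy, for example http://xxx.org/, and ran into a link like http://xxx.org/新闻.

Printing the link from Scrapy shows it in the shell correctly, and sel.xpath("//@href") in scrapy shell picks it up too, but when I insert it into MySQL only the part of the URL before the Chinese characters gets stored. Is this a Python 2 unicode problem?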

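import MySQLdb

def parse(self, response):
    # connect with the driver's default connection charset (no charset argument)
    conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="url")
    cur = conn.cursor()
    # extract() returns every href attribute on the page as a unicode string
    for url in response.xpath('//@href').extract():
        # query parameters are passed as a tuple
        cur.execute('insert into urlsinfo (url) values (%s)', (url,))
        conn.commit()
    cur.close()
    conn.close()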

The code is a bit ugly, go easy on me. I've only just started learning Scrapy; how do you all solve this?

11 replies • 2016-06-03 18:03:32 +08:00

1

phantomer

OP

Jun 3, 2016

Base64-encoding the URL before inserting didn't get it into the database either.

2

fengxiang

Jun 3, 2016

charset='utf8'

3

annielong

Jun 3, 2016

Or the problem is on the MySQL insert side; I believe you also need to set the encoding when inserting Chinese text.

4

besttime

Jun 3, 2016

Looks like a database problem to me. Did you change MySQL's default encoding before using it? If not, that's definitely it. Don't blame Python.

5

WangYanjie

Jun 3, 2016

Probably a database problem.

6

phantomer

OP

Jun 3, 2016

@fengxiang
@annielong
@besttime
@WangYanjie
The database is set to UTF-8.
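
Worth noting that in MySQL the database's charset and the connection's charset are separate settings, so a UTF-8 database does not by itself mean the client is sending UTF-8. A minimal sketch for checking both, assuming an open DB-API cursor such as one from MySQLdb (the helper name `charset_settings` is made up for illustration):

```python
def charset_settings(cur):
    """Return the MySQL variables matching 'character_set_%' as a dict.

    `cur` is assumed to be an open cursor, e.g. from MySQLdb.
    """
    # No parameters are passed, so the % in the LIKE pattern is sent as-is.
    cur.execute("SHOW VARIABLES LIKE 'character_set_%'")
    # Each row is a (variable_name, value) pair.
    return dict(cur.fetchall())
```

If `character_set_database` reports utf8 while `character_set_client` or `character_set_connection` report something else, the mismatch is on the connection side rather than in the table.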

7

fengxiang

Jun 3, 2016

You need to tell Python to use UTF-8 when talking to the database:
MySQLdb.connect(host="localhost", user="root", passwd="root", db="url", charset="utf8")
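
Applied to the spider code from the question, a rough sketch might look like the following. The helper names `connect_utf8` and `save_urls` are invented for illustration; the host, user, password, database, and table are the ones from the question. MySQLdb is imported inside the function only so that `save_urls` can be tried against any DB-API connection.

```python
def connect_utf8(**overrides):
    """Open a MySQLdb connection whose client charset is UTF-8."""
    import MySQLdb  # lazy import: save_urls below works without the driver

    params = dict(host="localhost", user="root", passwd="root", db="url",
                  charset="utf8", use_unicode=True)
    params.update(overrides)
    return MySQLdb.connect(**params)


def save_urls(conn, urls):
    """Insert each URL as one row and commit once at the end."""
    cur = conn.cursor()
    try:
        # Each row's parameters are passed as a one-element tuple.
        cur.executemany("INSERT INTO urlsinfo (url) VALUES (%s)",
                        [(u,) for u in urls])
        conn.commit()
    finally:
        cur.close()
```

Inside the spider this would be called as `save_urls(conn, response.xpath('//@href').extract())`, with `conn = connect_utf8()` opened once rather than on every page.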

8

phantomer

OP

Jun 3, 2016

@fengxiang I'll give it a try.

9

whnzy

Jun 3, 2016

Wrap it in try/except and print out the error.
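
Something along these lines, as a sketch: it assumes a DB-API cursor and the `urlsinfo` table from the question, and the `try_insert` name and the logger name are made up.

```python
import logging

logger = logging.getLogger("url_store")  # hypothetical logger name


def try_insert(cur, url):
    """Run one insert; log the full traceback and return the exception, if any."""
    try:
        cur.execute("INSERT INTO urlsinfo (url) VALUES (%s)", (url,))
    except Exception as exc:  # with MySQLdb, MySQLdb.Error would be a narrower choice
        logger.exception("insert failed for %r", url)
        return exc
    return None
```

Returning the exception instead of swallowing it lets the caller decide whether to skip the row or stop the crawl.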

10

phantomer

OP

Jun 3, 2016

@fengxiang Thanks, that solved my problem. Closing the thread.

11

mactaew

Jun 3, 2016

As a PHPer, the first thing that came to mind was urlencode() and urldecode()...
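
The rough Python counterpart is percent-encoding with `quote` and `unquote`. The sketch below uses Python 3's `urllib.parse` (in the Python 2 used in this thread the functions live in `urllib` instead); the helper names are made up, and leaving only ':' and '/' unescaped is a simplification that would also escape query-string characters such as '?' and '='.

```python
from urllib.parse import quote, unquote


def encode_url(url):
    """Percent-encode non-ASCII characters as UTF-8, leaving ':' and '/' alone."""
    return quote(url, safe=":/")


def decode_url(encoded):
    """Reverse encode_url, decoding the escapes as UTF-8."""
    return unquote(encoded)


encoded = encode_url("http://xxx.org/新闻")
print(encoded)  # http://xxx.org/%E6%96%B0%E9%97%BB
assert decode_url(encoded) == "http://xxx.org/新闻"
```

The encoded form is plain ASCII, so it stores the same way regardless of the connection charset, at the cost of being less readable in the table.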
