I'm writing a crawler (gevent + requests + redis-py) and have run into some problems. Does anyone have a good solution? - V2EX

This topic was created 4414 days ago; the information in it may have changed or become outdated.

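The basic idea of my crawler is this: I want to collect every list URL on a set of listing pages. There are many pages, so I iterate over them and fetch the content of each one. Whenever a request for a page fails, I save that page to a database; on the next pass I pull the failed pages back out of the database and process them again, repeating until everything has been processed. Enough talk, here's the code (see the comments in the code for a more detailed description of the problem):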
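The poster's code is not included here, so the following is only a sketch of the retry loop described above, not the poster's implementation. `crawl_with_retries`, `fetch` and `max_rounds` are hypothetical names. The in-memory `failed` list stands in for the "error database"; in a real crawler a Redis list (redis-py's `rpush` / `lpop`) could play that role.

```python
def crawl_with_retries(urls, fetch, max_rounds=5):
    """Fetch every URL; collect failures and retry them in later rounds.

    `fetch` is any callable that returns page content or raises on error.
    The `failed` list stands in for the external store of failed pages.
    """
    results = {}
    pending = list(urls)
    for _ in range(max_rounds):
        failed = []
        for url in pending:
            try:
                results[url] = fetch(url)
            except Exception:
                # Remember the failure; it is retried in the next round.
                failed.append(url)
        pending = failed
        if not pending:
            break
    # Whatever is still pending failed in every round. Return it instead of
    # looping forever on a page that can never succeed (e.g. a dead link).
    return results, pending
```

The `max_rounds` cap is the main design choice: "loop until everything is processed" never terminates if some page fails permanently, so a bounded number of passes plus a report of the leftovers is safer.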

What do you think of this code? Criticism is welcome too. I've looked through some related material on my own, without much success.

10 replies • 2014-06-05 15:07:59 +08:00

1

jander

Jun 5, 2014

You should add:
from gevent import monkey; monkey.patch_socket()

2

penkchow

OP

Jun 5, 2014

Oh, I'm calling monkey.patch_all(), so that should already cover it; the official docs explain this: http://www.gevent.org/gevent.monkey.html
@jander
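Both replies agree in substance: per the gevent docs, `patch_all()` patches `socket` by default, so it includes what `patch_socket()` does. What matters in practice is that the patch runs before requests or redis-py import `socket`, otherwise their I/O blocks the whole process. A minimal sketch, assuming a hypothetical `crawl_concurrently` helper and an illustrative limit of 20 greenlets; `fetch` would be something like a function wrapping `requests.get(url, timeout=...)`.

```python
# Patch the standard library first, before anything imports socket/ssl
# (requests and redis-py both do), so their blocking I/O yields to other greenlets.
from gevent import monkey
monkey.patch_all()  # socket is patched by default, among other modules

from gevent.pool import Pool


def crawl_concurrently(urls, fetch, concurrency=20):
    """Run fetch(url) for every URL with at most `concurrency` greenlets at once.

    Results are returned as a list in the same order as `urls`.
    """
    pool = Pool(concurrency)
    return pool.map(fetch, urls)
```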

3

jander

Jun 5, 2014

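Oh, I didn't read carefully.
The error is in the Redis connection. Your code uses redis.ConnectionPool, but you can connect to Redis directly; the client already uses a pool internally:
redis.StrictRedis(host='localhost', port=6379, db=0)
Try connecting directly.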

4

jsonline

Jun 5, 2014

Every month someone shows up asking about crawlers.

5

penkchow

OP

Jun 5, 2014

@jander Hmm, isn't a connection pool better for concurrency than connecting directly?

6

penkchow

OP

Jun 5, 2014

@jsonline Every month there are always those few days when... someone asks about crawlers.

7

jander

Jun 5, 2014

@penkchow There's already a default connection pool, so you don't need to manage one yourself.

8

penkchow

OP

Jun 5, 2014

@jander So you mean the problem is caused by my using this connection pool?

9

jander

Jun 5, 2014

@penkchow What I mean is: something is wrong with your Redis connection, so try a different approach.

10

penkchow

OP

Jun 5, 2014

Okay, I'll try it.
