A Treat for Otaku - An Image Crawler Built on Tornado Coroutines - V2EX




I wrote an image crawler based on Tornado: https://github.com/RealHacker/python-gems/tree/master/image_crawler

Sample results:

It takes just two steps:
- Set a few options in the ini file:

; start url for crawler
starturl  = http://pic.kdslife.com/

; regexes for links and image urls
linkregex=http://pic.kdslife.com/content_.*.html
imgregex=http://img.club.pchome.net/.*.jpg

; integer>=1, larger politeness means slower crawling
; but also less likely to be denied service
politeness=3

; the directory to store the downloaded images
imgdir=E:/kds/

; the min size of images that you want to download
minwidth=200
minheight=200

- Run python crawler.py http://start-url-to-crawl
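
For readers wondering how options like these might be consumed, here is a minimal sketch using Python's standard `configparser`. It is not taken from the repository: the `[crawler]` section name, the `load_settings` and `classify` helpers, and the choice of `fullmatch` are all assumptions made for illustration, and the real crawler.py may read the file differently.

```python
import configparser
import re

# Options copied from the post; the real file may contain more.
SAMPLE_INI = """\
; start url for crawler
starturl  = http://pic.kdslife.com/

; regexes for links and image urls
linkregex=http://pic.kdslife.com/content_.*.html
imgregex=http://img.club.pchome.net/.*.jpg

politeness=3
imgdir=E:/kds/
minwidth=200
minheight=200
"""


def load_settings(ini_text):
    """Parse the option text into a dict (hypothetical helper)."""
    # interpolation=None keeps any '%' inside a regex literal.
    parser = configparser.ConfigParser(interpolation=None)
    # The snippet shows no [section] header, which configparser requires,
    # so a made-up one is prepended here.
    parser.read_string("[crawler]\n" + ini_text)
    opts = parser["crawler"]
    return {
        "starturl": opts["starturl"],
        "linkregex": re.compile(opts["linkregex"]),
        "imgregex": re.compile(opts["imgregex"]),
        "politeness": opts.getint("politeness"),
        "imgdir": opts["imgdir"],
        "min_size": (opts.getint("minwidth"), opts.getint("minheight")),
    }


def classify(url, settings):
    """Return 'image', 'link', or None depending on which regex matches."""
    # Assumption: a URL must match the whole pattern to count.
    if settings["imgregex"].fullmatch(url):
        return "image"
    if settings["linkregex"].fullmatch(url):
        return "link"
    return None
```

Calling `classify(url, load_settings(SAMPLE_INI))` on each URL found in a page would separate pages to follow from images to download. Note that the `.` characters in the two patterns are unescaped, so they match any character; writing `\.` would make the patterns stricter.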

Then just sit back and wait for the harvest!
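
The `politeness` option trades crawl speed for a lower chance of being refused service. The sketch below shows one way such a setting could throttle a coroutine-based fetch loop. It uses the standard library's `asyncio` rather than Tornado so that it runs without extra packages, and the rule "pause `politeness * base_delay` seconds between requests" is purely an assumption; the post does not say how the crawler interprets the value. `polite_crawl`, `fake_fetch`, and `base_delay` are hypothetical names.

```python
import asyncio


async def polite_crawl(urls, fetch, politeness=3, base_delay=1.0,
                       sleep=asyncio.sleep):
    """Fetch URLs in order, pausing between consecutive requests.

    `fetch` is any coroutine function that takes a URL. The pause length
    (politeness * base_delay seconds) is an assumed rule, not the
    crawler's documented behaviour.
    """
    if politeness < 1:
        raise ValueError("politeness must be an integer >= 1")
    pages = {}
    for index, url in enumerate(urls):
        if index:  # no pause needed before the very first request
            await sleep(politeness * base_delay)
        pages[url] = await fetch(url)
    return pages


async def fake_fetch(url):
    """Stand-in for a real HTTP request so the sketch runs offline."""
    return "<html>page for %s</html>" % url


if __name__ == "__main__":
    result = asyncio.run(
        polite_crawl(["http://example.com/1", "http://example.com/2"],
                     fake_fetch, politeness=3, base_delay=0.01)
    )
    print(sorted(result))
```

In a Tornado program, `fetch` could be replaced by `tornado.httpclient.AsyncHTTPClient().fetch` and the pause by `tornado.gen.sleep`; a larger `politeness` then simply stretches the gap between requests.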

Bug reports and feature requests are welcome.

17 replies • 2015-09-23 22:34:52 +08:00

1

Tink

Sep 21, 2015

So everyone is into this sort of thing.

2

wangleineo

OP

Sep 21, 2015

@Tink I'm only here to study crawlers. The images all get deleted without so much as a glance :)

3

Tink

Sep 22, 2015

@wangleineo We all get it, lol

4

kchum

Sep 22, 2015 via iPad

Bookmarking this for now 😁

5

veau

Sep 22, 2015

So everyone is into this sort of thing.

6

vietor

Sep 22, 2015 via Android

It needs database support, keyword search, and a web preview to be any good.

7

radio777

Sep 22, 2015

My hard drive isn't big enough.

8

alohathomas

Sep 22, 2015

As a newbie, I have no idea how to use this.

9

nisnaker

Sep 22, 2015

As a beginner, I'd like to practice too. Please feel free to recommend some image sites~~
@all

10

nisnaker

Sep 22, 2015

Whoa, there really is a V2EX user named all.

11

jamesfuxk

Sep 22, 2015

May I ask which site you are crawling?

12

zkzipoo

Sep 22, 2015

1. A login module?
2. The file naming scheme?

13

zhajming

Sep 22, 2015

http://pic.kdslife.com/ ??

14

onlyxuyang

Sep 22, 2015 via Android

@zhajming Watermarked and not high-res... thumbs down... not crawling it...

15

wangleineo

OP

Sep 22, 2015

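@vietor Uh, I don't think even Scrapy has that many features.
@jamesfuxk Image sites.
@zkzipoo For now the naming scheme is just a simple 4-digit number.
@zhajming @onlyxuyang I only used that site as an example. Change the config and it can crawl other sites.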

16

scenix

Sep 23, 2015

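Haha, it looks like you are set on crawling the entire site.

When I had some spare time I wrote a tool that scrapes the images from a given 1024 thread and converts them into a PDF. For reasons everyone knows, it also supports SOCKS5 proxies.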

https://github.com/scenix007/1024toPDF

17

gaocegege

Sep 23, 2015

Why not use Scrapy or something like that~
