I wrote a Tornado-based image crawler: https://github.com/RealHacker/python-gems/tree/master/image_crawler

It takes just two steps:
- Set a few options in the ini file:
; start url for crawler
starturl = http://pic.kdslife.com/
; regexes for links and image urls
linkregex=http://pic\.kdslife\.com/content_.*\.html
imgregex=http://img\.club\.pchome\.net/.*\.jpg
; integer>=1, larger politeness means slower crawling
; but also less likely to be denied service
politeness=3
; the directory to store the downloaded images
imgdir=E:/kds/
; the min size of images that you want to download
minwidth=200
minheight=200
- Run:
python crawler.py http://start-url-to-crawl
Then just sit back and wait for the images to come in!
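For anyone curious how options like these could drive a crawler, here is a rough sketch. It is not the code from the repo. The `[crawler]` section header, the helper names `load_options`, `should_follow` and `should_download`, the use of full-string regex matching, and treating the minimum size as inclusive are all my own assumptions for illustration.

```python
import configparser
import re

# Options modeled on the ones in the post, with dots escaped so they match
# literally. The [crawler] section name is an assumption: configparser
# needs a section header, and the post does not show one.
SAMPLE_INI = r"""
[crawler]
starturl = http://pic.kdslife.com/
linkregex=http://pic\.kdslife\.com/content_.*\.html
imgregex=http://img\.club\.pchome\.net/.*\.jpg
politeness=3
imgdir=E:/kds/
minwidth=200
minheight=200
"""


def load_options(text):
    """Parse the ini text into a dict with compiled regexes and ints."""
    # interpolation=None keeps regex text untouched (no % expansion).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["crawler"]
    return {
        "starturl": section["starturl"],
        "linkregex": re.compile(section["linkregex"]),
        "imgregex": re.compile(section["imgregex"]),
        "politeness": section.getint("politeness"),
        "imgdir": section["imgdir"],
        "minwidth": section.getint("minwidth"),
        "minheight": section.getint("minheight"),
    }


def should_follow(opts, url):
    """Hypothetical check: follow a link only if it matches linkregex."""
    return opts["linkregex"].fullmatch(url) is not None


def should_download(opts, url, width, height):
    """Hypothetical check: image URL matches and meets the minimum size."""
    return (
        opts["imgregex"].fullmatch(url) is not None
        and width >= opts["minwidth"]
        and height >= opts["minheight"]
    )
```

With these sample values, `should_follow` accepts `http://pic.kdslife.com/content_123.html` and rejects the site's other pages, while `should_download` skips any matching image smaller than 200x200.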
Bug reports and feature requests are welcome.