A question about defining MongoDBPipeline in Pipeline.py under Scrapy

http://doc.scrapy.org/en/latest/topics/item-pipeline.html?highlight=from_crawler#from_crawler

import pymongo


class MongoDBPipeline(object):
    # Name of the MongoDB collection that scraped items are written to
    collection_name = 'test_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the pipeline from the project settings
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        # Open the connection once when the spider starts
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # insert_one replaces Collection.insert, which was removed in PyMongo 4
        self.db[self.collection_name].insert_one(dict(item))
        return item

I have a few questions about this line:
mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')

a) crawler.settings.get
The documentation for get on crawler.settings reads:
get(name, default=None)
Get a setting value without affecting its original type.

Parameters:
name (string) – the setting name
default (any) – the value to return if no setting is found

Why does mongo_db need to fetch two things, MONGO_DATABASE and items? Wouldn't it be enough to just get the name of the database directly?

b) In practice, should the 'items' in this statement be replaced with the name of the class I defined in Items.py? What role does items play here?
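Going by the get(name, default=None) signature quoted above, the second argument reads as a fallback value rather than a second setting to fetch. Here is a minimal sketch of that behaviour, using a plain dict as a stand-in for crawler.settings (an assumption for illustration, not Scrapy's actual Settings class; the setting values are made up):

```python
# Stand-in for crawler.settings: a plain dict offers the same
# get(name, default) behaviour. The values here are hypothetical.
project_settings = {"MONGO_URI": "mongodb://localhost:27017"}

# MONGO_DATABASE is not set, so the fallback 'items' is returned.
db_name = project_settings.get("MONGO_DATABASE", "items")

# Once the setting is defined, the fallback is ignored.
project_settings["MONGO_DATABASE"] = "my_scrapy_db"
db_name_after = project_settings.get("MONGO_DATABASE", "items")

# A missing setting with no default given yields None.
missing = project_settings.get("MONGO_COLLECTION")
```

Read this way, 'items' would just be the database name used when MONGO_DATABASE is absent from the settings, and it would have no connection to the item classes in Items.py.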

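I still don't fully understand the statement as a whole. Right now show dbs shows no data, and I don't know how the MongoDB part of the pipeline should be written. I'd appreciate it if someone could explain. Thanks!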
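One possible cause of the empty show dbs output, offered as a guess: MongoDB only creates a database on its first write, so if the pipeline is never enabled, process_item never runs and nothing appears. Below is a sketch of the settings a project might need; the module path "myproject.pipelines" and both MONGO_ values are placeholders, not taken from the original post:

```python
# settings.py (sketch; "myproject" and the values below are placeholders)

# Enable the pipeline. The integer (conventionally 0-1000) sets the
# order in which pipelines run; lower values run first.
ITEM_PIPELINES = {
    "myproject.pipelines.MongoDBPipeline": 300,
}

# Read in from_crawler() through crawler.settings.get(...)
MONGO_URI = "mongodb://localhost:27017"
MONGO_DATABASE = "scrapy_test"
```

With MONGO_DATABASE set like this, the 'items' fallback in from_crawler would not be used.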

1 reply • 2016-02-29 22:43:21 +08:00

KittySYSU

2016-02-29 22:43:21 +08:00

The code is the example given in the official Scrapy documentation.