1
kingxsp 2013-06-10 21:35:37 +08:00
I recommend the pybloomfiltermmap library.
2
binux 2013-06-10 21:53:18 +08:00
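import hashlib

bloom = 0  # bits of every hash seen so far, OR-ed together

def check(url):
    """Return True if url was probably seen before; otherwise record it and return False."""
    global bloom
    h = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16)
    # Seen only if every bit of this hash is already set in bloom.
    # With a single 128-bit hash the filter saturates after a handful of URLs.
    if bloom & h == h:
        return True
    bloom |= h
    return False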
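A single 128-bit integer fills up very quickly, so for a real crawl you would probably want a larger bit array and several probe positions per URL. Here is a minimal stdlib-only sketch of that idea; the class name `BloomFilter`, the defaults `num_bits` and `num_hashes`, and the use of SHA-256 with double hashing are all assumptions made for illustration, not how pybloomfiltermmap does it.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: num_bits bits, num_hashes probes per item."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all probe positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        """Insert item; return True if it was probably already present."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


# Usage: keep only URLs the filter has not seen yet.
seen = BloomFilter()
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
unique = [u for u in urls if not seen.add(u)]
```

False positives are still possible (a new URL may occasionally be reported as seen), so the bit count should be sized to the number of URLs you expect.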
4
xavierskip 2013-06-11 00:25:12 +08:00
Filtering duplicate URLs
Would this work? list(set(urls))
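For a list that fits in memory, a set does remove the duplicates exactly, but it does not keep the original order. If order matters, something like the following sketch might do, assuming Python 3.7+ (where dicts keep insertion order); `dedupe_keep_order` is just a name made up here.

```python
def dedupe_keep_order(urls):
    """Drop repeated URLs, keeping the first occurrence of each."""
    # dict keys are unique and, on Python 3.7+, remember insertion order.
    return list(dict.fromkeys(urls))


urls = ["http://example.com/b", "http://example.com/a", "http://example.com/b"]
unique_urls = dedupe_keep_order(urls)
```

Unlike a Bloom filter this keeps every URL in memory, so it trades space for having no false positives.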