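My machine is too underpowered: with multiprocessing plus multithreading I could only process a few hundred URIs per second, so I decided to try async coroutines instead. Following the docs, I wrote the ugly code below. I tested it with tens of thousands of URIs and it seems to work fine; without printing results to the screen, it handles roughly 1000 per second. The approach is roughly: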
- 1. uris.txt contains tens of millions of URIs, so I read 1000 lines at a time to keep memory usage down
- 2. Use a Semaphore to cap concurrency at 100, so the API side doesn't ban me
- 3. Reuse the session
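For step 2, a minimal sketch that measures how many calls are actually in flight at once under a Semaphore. `fake_request` and `measure_peak` are hypothetical names, and the real HTTP call is replaced by a short sleep, so this only illustrates the limiting behaviour, not the original request code.

```python
import asyncio


async def fake_request(sem: asyncio.Semaphore, state: dict) -> None:
    # Hypothetical stand-in for one HTTP call; no network is involved.
    async with sem:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.001)  # simulated I/O wait
        state["in_flight"] -= 1


async def measure_peak(total: int, limit: int) -> int:
    # Create the semaphore inside the running loop, as the post's main() does.
    sem = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0}
    # All `total` coroutines are scheduled at once; the semaphore only
    # limits how many are past the `async with sem` line at the same time.
    await asyncio.gather(*(fake_request(sem, state) for _ in range(total)))
    return state["peak"]


if __name__ == "__main__":
    print(asyncio.run(measure_peak(total=1000, limit=100)))
```

Note that the semaphore limits in-flight requests, not the number of task objects: every batch still creates one task per URI up front.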
What I'm unsure about:
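- 1. Written this way, do the three points above actually achieve what I intended?
- 2. The chunk under `if __name__ == '__main__'` at the end: calling asyncio.run() inside a for loop feels odd, but I don't know how else to write it.
- 3. A newbie question: isn't the event loop supposed to be the core of asyncio? I never call asyncio.get_event_loop() here, so why does it still run so smoothly?
- 4. How should I change the code to make it more elegant?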
Sorry to take up a bit of your Friday afternoon; I'd really appreciate it if someone could take a look!
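On the third question, a small sketch may help. It relies only on the documented behaviour of `asyncio.run()`, which creates a new event loop, runs the coroutine on it, and closes the loop afterwards. `grab_loop` and `run_twice` are hypothetical helper names.

```python
import asyncio


async def grab_loop() -> asyncio.AbstractEventLoop:
    # Inside a coroutine started by asyncio.run(), a loop is already running,
    # even though the script never called asyncio.get_event_loop().
    return asyncio.get_running_loop()


def run_twice():
    # Each asyncio.run() call sets up its own loop and closes it on exit.
    first = asyncio.run(grab_loop())
    second = asyncio.run(grab_loop())
    return first, second


if __name__ == "__main__":
    first, second = run_twice()
    print(first is second)
    print(first.is_closed(), second.is_closed())
```

So the event loop is still there; `asyncio.run()` just manages it, once per call.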
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import asyncio
from asyncio import Semaphore
from aiohttp import ClientSession
from itertools import islice
def get_lines_iterator(filename, n=1000):
    """Yield lists of up to n lines so the whole file is never in memory at once."""
    with open(filename) as fp:
        while True:
            lines = list(islice(fp, n))
            if lines:
                yield lines
            else:
                break
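async def delete_file(uri: str,
                      session: ClientSession,
                      sem: Semaphore) -> tuple[str, int]:
    headers = {'Authorization': 'xxxxxxxxxxx'}
    # `api` (the API base URL) is not defined anywhere in this snippet;
    # it has to exist as a module-level string.
    url = api + uri
    # Hold the semaphore for the whole request so its limit caps in-flight calls.
    async with sem, session.delete(url, headers=headers) as response:
        return uri, response.status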
# Variant 1:
# async def main(uris):
#     sem = Semaphore(100)
#     async with ClientSession() as session:
#         tasks = [delete_file(uri, session, sem) for uri in uris]
#         await asyncio.gather(*tasks)
# Variant 2 (asyncio.TaskGroup requires Python 3.11+):
async def main(uris):
    sem = Semaphore(100)
    async with ClientSession() as session:
        async with asyncio.TaskGroup() as group:
            result = [group.create_task(delete_file(
                uri, session, sem)) for uri in uris]
    return result
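if __name__ == '__main__':
    for lines in get_lines_iterator("uris.txt"):
        uris = [uri.strip() for uri in lines]
        # Each asyncio.run() call creates a fresh event loop and, via main(),
        # a new ClientSession, so the session is reused only within one batch.
        result = asyncio.run(main(uris))
        for x in result:
            print(x.result())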
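One possible way to avoid calling asyncio.run() in a loop is a single run that feeds a bounded queue drained by a fixed pool of workers. The sketch below is only an illustration under assumptions: `process_all` and its `handler` parameter are hypothetical names, the handler stands in for the real delete call, and the worker count takes over the role of the Semaphore.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def process_all(
    items: Iterable[str],
    handler: Callable[[str], Awaitable[object]],
    concurrency: int = 100,
) -> tuple[int, int]:
    """Run `handler` on every item with at most `concurrency` calls in flight.

    Returns (succeeded, failed) counts.
    """
    # Bounded queue: the producer waits when workers fall behind, so only
    # about 2 * concurrency items (queued plus in flight) are held at a time.
    queue: asyncio.Queue = asyncio.Queue(maxsize=concurrency)
    ok = 0
    failed = 0

    async def worker() -> None:
        nonlocal ok, failed
        while True:
            item = await queue.get()
            try:
                await handler(item)
                ok += 1
            except Exception:
                # Count the failure and keep the worker alive.
                failed += 1
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    try:
        for item in items:  # an open file works here: it yields lines lazily
            await queue.put(item.strip())
        await queue.join()  # wait until every queued item has been handled
    finally:
        for w in workers:
            w.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
    return ok, failed
```

With the names from the post, the handler could be a small coroutine that wraps `session.delete(...)` inside one `async with ClientSession() as session:` block, and `items` could be the open `uris.txt` file object, so one session and one event loop would serve the whole file. Iterating a file inside a coroutine is still blocking I/O, which is assumed to be negligible next to the network calls.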