V2EX  ›  Python

My crawler errors out after running for 4 to 5 hours, and I can't tell what's causing it. Could someone take a look?

    wsds · 2018-06-13 11:34:39 +08:00 · 7816 views
    This topic was created 2340 days ago; the information in it may have changed since.

    The error output is long, but it seems to come down to this: socket.gaierror: [Errno -3] Temporary failure in name resolution

    It's running on Alibaba Cloud.

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
        for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
      File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
        body=body, headers=headers)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/lib/python3.5/http/client.py", line 1106, in request
        self._send_request(method, url, body, headers)
      File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
        self.endheaders(body)
      File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
        self._send_output(message_body)
      File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
        self.send(msg)
      File "/usr/lib/python3.5/http/client.py", line 877, in send
        self.connect()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
        conn = self._new_conn()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
        timeout=timeout
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "getimg.py", line 102, in <module>
        GetImg().getdata()
      File "getimg.py", line 76, in getdata
        base_url + j['href'], headers=self.headers)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in get
        return self.request('GET', url, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
        r = adapter.send(request, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    
    21 replies · last reply 2018-06-14 17:07:44 +08:00
    #1 golmic · 2018-06-13 11:39:12 +08:00 via Android
    Does every request fail, or only some of them now and then? It looks like you're triggering anti-scraping measures.
    #2 wsds (OP) · 2018-06-13 11:43:14 +08:00
    @golmic It pretty much always throws this error after a few hours of crawling.
    #3 wsds (OP) · 2018-06-13 11:43:29 +08:00
    @golmic And I'd only scraped fewer than 10,000 images.
    #4 lululau · 2018-06-13 11:44:04 +08:00
    Looks like DNS resolution glitching every now and then.
    #5 xxxy · 2018-06-13 11:48:58 +08:00
    DNS lookups are rate-limited too.
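    #6 golmic · 2018-06-13 11:49:55 +08:00
    @lululau #4 Wouldn't a resolution failure show up as a DNS error?

    If it fails a lot, deal with the anti-scraping measures; if it only fails occasionally, just retry.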
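
golmic's advice to simply retry occasional failures can be sketched as a small wrapper with exponential backoff. `retry_call` and its parameters are hypothetical names, not part of requests; the injectable `sleep` argument exists only so the delays can be checked without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(func: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
               max_delay: float = 60.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: call func() and retry it when it raises OSError.

    OSError covers socket.gaierror as well as requests.exceptions.ConnectionError
    (requests' RequestException derives from IOError, an alias of OSError).
    The wait doubles after each failure: base_delay, 2*base_delay, ...,
    capped at max_delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

In getimg.py the request on line 76 could be wrapped as `retry_call(lambda: session.get(base_url + j['href'], headers=self.headers, timeout=10))`, where `session` stands for whatever requests session the script already uses and `timeout=10` is an assumed value. As an alternative, requests can retry failed connections itself if you mount an `HTTPAdapter(max_retries=...)` on the session with `session.mount("http://", adapter)`.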
    #7 Cooky · 2018-06-13 11:52:52 +08:00
    Switch to a better DNS server?
    #8 lerry · 2018-06-13 11:54:59 +08:00
    Install dnsmasq locally and make it the system's default DNS resolver; that can improve DNS lookups.
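
dnsmasq caches lookups at the system level. A rough in-process analogue, a different technique swapped in here, is to cache `getaddrinfo` results inside the crawler so repeated requests to the same host stop hitting the resolver. As an extra assumption of this sketch, a temporary resolver failure falls back to the last answer that worked. `make_cached_getaddrinfo`, `ttl` and `clock` are hypothetical names. Patching `socket.getaddrinfo` reaches urllib3 because urllib3 calls `socket.getaddrinfo` through the module, as the `util/connection.py` frame in the traceback shows.

```python
import socket
import threading
import time


def make_cached_getaddrinfo(resolver=socket.getaddrinfo, ttl: float = 300.0,
                            clock=time.monotonic):
    """Hypothetical helper: wrap a getaddrinfo-style function so identical
    lookups within `ttl` seconds are answered from memory. If a fresh lookup
    fails with socket.gaierror, an expired entry is returned instead."""
    cache = {}
    lock = threading.Lock()

    # Parameter names mirror socket.getaddrinfo so keyword callers still work.
    def cached_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        key = (host, port, family, type, proto, flags)
        now = clock()
        with lock:
            hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl:
            return hit[1]  # fresh cache entry: no DNS query at all
        try:
            result = resolver(host, port, family, type, proto, flags)
        except socket.gaierror:
            if hit is not None:
                return hit[1]  # resolver hiccup: reuse the last good answer
            raise
        with lock:
            cache[key] = (now, result)
        return result

    return cached_getaddrinfo
```

To install it for the whole process, assign `socket.getaddrinfo = make_cached_getaddrinfo()` once at startup; the default `resolver` argument keeps a reference to the original function.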
    #9 baday · 2018-06-13 11:57:31 +08:00
    Try setting the Connection request header to close.
    #10 wsds (OP) · 2018-06-13 11:58:49 +08:00
    @lululau I looked it up online, and that's what people say it is.
    #11 wsds (OP) · 2018-06-13 11:59:10 +08:00
    @Cooky Which ones count as better?
    #12 wsds (OP) · 2018-06-13 11:59:20 +08:00
    @lerry This is on Alibaba Cloud.
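    #13 ihancheng · 2018-06-13 12:01:41 +08:00 via Android
    I'm not even going to rant about "Trick Cloud" (Alibaba Cloud) anymore. I'm learning Python crawling: on Tencent Cloud I had no problems, but on Alibaba Cloud it throws exceptions I just can't get rid of. Maybe it's my own fault, but none of the fixes I found online worked.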
    #14 owenliang · 2018-06-13 12:03:09 +08:00 via Android
    Exceptions can be caught.
    #15 wsds (OP) · 2018-06-13 12:07:28 +08:00
    @owenliang This one was already caught and re-raised. Didn't you see all those "another exception occurred" lines?
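
The repeated "During handling of the above exception, another exception occurred" blocks are Python's implicit exception chaining: when an exception is raised inside an `except` block, the one being handled is kept on the new exception's `__context__` (`raise ... from ...` sets `__cause__` instead). In the traceback above those raises happen inside urllib3 and requests (`connection.py`, `retry.py`, `adapters.py`), so the final `requests.exceptions.ConnectionError` can still be caught in `getimg.py`. This sketch, with hypothetical helper names, walks that chain to tell a DNS failure (`socket.gaierror`) apart from other errors, for example to choose between a quick retry and a longer pause.

```python
import socket
from typing import Iterator


def iter_exception_chain(exc: BaseException) -> Iterator[BaseException]:
    """Hypothetical helper: yield exc, then each earlier exception linked
    through __cause__ ("raise ... from ...") or __context__ ("During handling
    of the above exception, another exception occurred")."""
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cycles in the chain
        yield current
        current = current.__cause__ or current.__context__


def is_dns_failure(exc: BaseException) -> bool:
    """True if a socket.gaierror (failed name resolution) appears in the chain."""
    return any(isinstance(e, socket.gaierror) for e in iter_exception_chain(exc))
```

In the crawl loop one might catch `requests.exceptions.ConnectionError as err` and sleep longer when `is_dns_failure(err)` is true; the exact policy is up to you.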
    #16 Cooky · 2018-06-13 12:38:02 +08:00
    @wsds Can't you install dnsmasq on Alibaba Cloud?
    #17 hicdn · 2018-06-13 13:27:26 +08:00
    It's a DNS resolution problem. If you're only crawling a few fixed domains, put them in the hosts file.
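
The hosts-file idea can also be approximated inside the crawler process, which helps when you can't or don't want to edit `/etc/hosts` on the server. This sketch pins chosen hostnames to fixed IPs and passes every other lookup through unchanged. The function name and the mapping are hypothetical; `203.0.113.10` is a documentation address (RFC 5737), not the real IP of any site, so you would look up the actual address yourself.

```python
import socket


def make_pinned_getaddrinfo(overrides, resolver=socket.getaddrinfo):
    """Hypothetical helper: return a getaddrinfo-style function that maps
    hosts listed in `overrides` (hostname -> IP string) to the pinned IP,
    like an /etc/hosts entry, and resolves all other hosts normally."""

    # Parameter names mirror socket.getaddrinfo so keyword callers still work.
    def pinned_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        ip = overrides.get(host)
        if ip is not None:
            # AI_NUMERICHOST: parse the literal IP without sending any DNS query.
            return resolver(ip, port, family, type, proto,
                            flags | socket.AI_NUMERICHOST)
        return resolver(host, port, family, type, proto, flags)

    return pinned_getaddrinfo


# Hypothetical pin table; replace with the real hostname and IP you need.
HOST_OVERRIDES = {"www.example.com": "203.0.113.10"}
```

Installing it would be `socket.getaddrinfo = make_pinned_getaddrinfo(HOST_OVERRIDES)` at startup. The HTTP `Host` header still carries the hostname, since only the address lookup is replaced.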
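    #18 dapengzhao · 2018-06-13 15:36:25 +08:00
    My crawler also hits this error after running for a while. My workaround: as long as the IP hasn't been banned, I catch this exception, sleep for a bit, then break out of the current pass inside a while True loop and start again.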
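
A sketch of that while True pattern, assuming (as dapengzhao does) that the IP is not banned. One adjustment, an assumption of this sketch rather than part of the reply: the loop remembers how far it got, so a restart resumes at the URL that failed instead of re-fetching finished pages. `crawl`, `fetch` and the parameters are hypothetical names; `fetch` stands for whatever function downloads one page.

```python
import time


def crawl(urls, fetch, pause=30.0, max_restarts=10, sleep=time.sleep):
    """Hypothetical helper: fetch every URL in order. On a connection or DNS
    error (OSError), sleep `pause` seconds, abandon the current pass and
    restart it from the URL that failed. Gives up after `max_restarts`
    restarts in total."""
    results = []
    next_index = 0
    restarts = 0
    while True:
        try:
            for i in range(next_index, len(urls)):
                results.append(fetch(urls[i]))
                next_index = i + 1  # checkpoint: everything before i + 1 is done
            return results  # the pass completed without errors
        except OSError:
            # The exception ends the inner for loop (the "break").
            restarts += 1
            if restarts > max_restarts:
                raise
            sleep(pause)
```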
    #19 gamecreating · 2018-06-13 15:40:41 +08:00
    Just catch the exception and handle it...

    A crawler can never guarantee that every connection and every fetch succeeds anyway.
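    #20 JCZ2MkKb5S8ZX9pq · 2018-06-13 20:48:43 +08:00
    Write your own request function that wraps requests and bundles the usual stuff: exception handling, retries, random User-Agents, automatic proxies and so on. Do it once and you're set for good.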
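
One possible shape for such a wrapper, under stated assumptions: the class name, the `send(url, headers, proxy)` callable and the User-Agent and proxy pools are all hypothetical placeholders. Keeping the actual HTTP call behind `send` lets the retry and rotation logic be exercised without the network.

```python
import random
import time
from typing import Callable, Optional, Sequence

# Hypothetical placeholder pool; fill in real User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5)",
]


class Fetcher:
    """Hypothetical wrapper: each attempt picks a random User-Agent and a
    random proxy from the pools, then calls send(url, headers, proxy).
    Failures raising OSError are retried up to `retries` extra times."""

    def __init__(self, send: Callable, user_agents: Sequence[str] = USER_AGENTS,
                 proxies: Sequence[Optional[str]] = (None,), retries: int = 3,
                 delay: float = 5.0, sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None):
        self.send = send
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)  # None means "no proxy"
        self.retries = retries
        self.delay = delay
        self.sleep = sleep
        self.rng = rng or random.Random()

    def get(self, url: str, headers: Optional[dict] = None):
        for attempt in range(self.retries + 1):
            merged = dict(headers or {})
            # Overrides any User-Agent the caller passed in.
            merged["User-Agent"] = self.rng.choice(self.user_agents)
            proxy = self.rng.choice(self.proxies)
            try:
                return self.send(url, merged, proxy)
            except OSError:  # includes requests.exceptions.ConnectionError
                if attempt == self.retries:
                    raise
                self.sleep(self.delay)
```

With requests, `send` could be a function calling `session.get(url, headers=headers, proxies={"http": proxy, "https": proxy} if proxy else None, timeout=10)`, where `session` and the timeout value are assumptions.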
    #21 beforeuwait · 2018-06-14 17:07:44 +08:00
    "Failed to establish a new connection": when you run into this, wrap the request in try/except; on an error, pause for 20 seconds and then send the request again.