V2EX  ›  Python

Crawler runs for 4 to 5 hours and then fails with an error; I can't tell what is causing it, please take a look

    wsds · Jun 13, 2018 · 8828 views

    The error output is long, but the cause appears to be: socket.gaierror: [Errno -3] Temporary failure in name resolution

    It is running on Alibaba Cloud.

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
        for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
      File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
        body=body, headers=headers)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/lib/python3.5/http/client.py", line 1106, in request
        self._send_request(method, url, body, headers)
      File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
        self.endheaders(body)
      File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
        self._send_output(message_body)
      File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
        self.send(msg)
      File "/usr/lib/python3.5/http/client.py", line 877, in send
        self.connect()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
        conn = self._new_conn()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
        timeout=timeout
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "getimg.py", line 102, in <module>
        GetImg().getdata()
      File "getimg.py", line 76, in getdata
        base_url + j['href'], headers=self.headers)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in get
        return self.request('GET', url, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
        r = adapter.send(request, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    
    21 replies    2018-06-14 17:07:44 +08:00
    golmic · #1 · Jun 13, 2018 via Android
    Is every request failing, or only some of them? It looks like you triggered anti-crawling measures.
    wsds (OP) · #2 · Jun 13, 2018
    @golmic It basically throws this error after a few hours of crawling.
    wsds (OP) · #3 · Jun 13, 2018
    @golmic It had crawled fewer than 10,000 images.
    lululau · #4 · Jun 13, 2018
    Looks like DNS resolution failing intermittently.
    xxxy · #5 · Jun 13, 2018
    DNS has rate limits too.
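    golmic · #6 · Jun 13, 2018
    @lululau #4 A resolution failure would be reported as a DNS error, wouldn't it?

    If you see a lot of errors, deal with the anti-crawling measures; if they are only occasional, a retry is enough.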
    Cooky · #7 · Jun 13, 2018
    Switch to a better DNS server?
    lerry · #8 · Jun 13, 2018
    Install dnsmasq locally and configure it as the system default DNS; that can improve DNS lookups.
    baday · #9 · Jun 13, 2018
    Try setting the Connection request header to close.
    wsds (OP) · #10 · Jun 13, 2018
    @lululau I looked around online, and that is what people say it is.
    wsds (OP) · #11 · Jun 13, 2018
    @Cooky What counts as a better one?
    wsds (OP) · #12 · Jun 13, 2018
    @lerry This is on Alibaba Cloud.
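    ihancheng · #13 · Jun 13, 2018 via Android
    I'd rather not rant about Alibaba Cloud. I'm learning to write Python crawlers; on Tencent Cloud I have no problems, but on Alibaba Cloud it throws exceptions that I just can't get rid of... I don't know whether it's my own mistake, but the fixes I found online still didn't solve it.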
    owenliang · #14 · Jun 13, 2018 via Android
    Exceptions can be caught.
    wsds (OP) · #15 · Jun 13, 2018
    @owenliang This one has already been caught and re-raised; didn't you see the several "another exception occurred" lines?
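The chained "another exception occurred" blocks in the traceback mean each exception keeps a reference to the one it was raised from, so code that catches the outer `requests.exceptions.ConnectionError` can still tell whether DNS was the root cause. A minimal sketch of that check follows; the helper names `iter_exception_chain` and `is_dns_failure` are made up for illustration and are not part of requests or urllib3.

```python
import socket


def iter_exception_chain(exc):
    """Yield exc and every exception linked to it via __cause__ or __context__."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        # An explicit cause ("raise ... from ...") wins; otherwise follow the
        # implicit context set when an exception is raised inside an except block.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__


def is_dns_failure(exc):
    """Return True if a socket.gaierror appears anywhere in the chain."""
    return any(isinstance(e, socket.gaierror) for e in iter_exception_chain(exc))
```

Inside an `except` block around the failing call, `is_dns_failure(exc)` could decide between waiting for DNS to recover and treating the error as something else, such as a block by the site.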
    Cooky · #16 · Jun 13, 2018
    @wsds You can't install dnsmasq on Alibaba Cloud?
    hicdn · #17 · Jun 13, 2018
    It's a DNS resolution problem. If you are crawling a few fixed domains, edit the hosts file.
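If editing the hosts file or installing dnsmasq is not an option, a similar effect can be approximated inside the crawler process by caching lookups. The sketch below is one possible approach, not something proposed in the thread: `make_cached_resolver` is a hypothetical helper that wraps any `getaddrinfo`-style callable, serves recent answers from memory, and falls back to a stale answer when a lookup raises `socket.gaierror`. The 300-second TTL is an arbitrary assumption.

```python
import socket
import time


def make_cached_resolver(resolve, ttl=300.0, clock=time.monotonic):
    """Wrap a getaddrinfo-style callable with a small in-memory cache."""
    cache = {}  # call arguments -> (timestamp, result)

    def cached(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        now = clock()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl:
            return hit[1]  # fresh entry, skip the lookup
        try:
            result = resolve(*args, **kwargs)
        except socket.gaierror:
            if hit is not None:
                return hit[1]  # lookup failed: reuse the stale answer
            raise  # nothing cached for this name, let the error through
        cache[key] = (now, result)
        return result

    return cached
```

Assigning `socket.getaddrinfo = make_cached_resolver(socket.getaddrinfo)` at startup would apply it process-wide. That is a monkeypatch and affects every library in the process, so it is best treated as a stopgap; a stale address also stops working if the site changes IPs.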
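    dapengzhao · #18 · Jun 13, 2018
    My crawler also hits this error after running for a while. My fix: as long as the IP hasn't been banned, catch the exception, sleep for a bit, then break inside the while True to end the current loop and start over.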
    gamecreating · #19 · Jun 13, 2018
    Just catch the exception and handle it...

    A crawler can never guarantee that every connection and every fetch succeeds.
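    JCZ2MkKb5S8ZX9pq · #20 · Jun 13, 2018
    Write your own request wrapper around requests and build in the usual pieces: exception handling, retries, random UA, automatic proxies and so on. Do it once and you're set.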
    beforeuwait · #21 · Jun 14, 2018
    Failed to establish a new connection
    When I hit this kind of problem, I write a try/except, rest for 20 seconds on error, then send the request again.
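Several replies converge on the same fix: catch the error, pause, and try again, ideally inside one wrapper that every request goes through. A minimal sketch under stated assumptions: `get_with_retry` and the `USER_AGENTS` values are hypothetical, the 20-second pause comes from the reply above, and `get` is any callable with the shape of `requests.Session().get`. Catching `OSError` is intended to cover both `socket.gaierror` and `requests.exceptions.ConnectionError`, since requests declares its base `RequestException` as a subclass of `IOError`.

```python
import random
import time

# Hypothetical placeholder values; a real crawler would list real browser strings.
USER_AGENTS = [
    "example-crawler/1.0 (variant A)",
    "example-crawler/1.0 (variant B)",
]


def get_with_retry(get, url, retries=5, delay=20.0, sleep=time.sleep, **kwargs):
    """Call get(url, headers=..., **kwargs); on OSError pause and try again."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    headers = dict(kwargs.pop("headers", None) or {})
    last_error = None
    for attempt in range(1, retries + 1):
        # Pick a User-Agent per attempt, as the thread suggests.
        headers["User-Agent"] = random.choice(USER_AGENTS)
        try:
            return get(url, headers=headers, **kwargs)
        except OSError as exc:
            last_error = exc
            if attempt < retries:
                sleep(delay)  # give DNS or the network time to recover
    raise last_error  # every attempt failed; surface the last error
```

With requests installed, a call might look like `get_with_retry(session.get, url, timeout=10)`, where `session` is a `requests.Session`. Proxy rotation, which the thread also mentions, is left out here.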