When you use requests to hit the same IP repeatedly, especially at a high request rate, it is easy to run into the error "Max retries exceeded with url".
Two things help: first, call close() promptly to release the connection; second, wrap the request in try/except and retry in a loop, using sleep() to keep the request rate low.
import logging
import requests

logger = logging.getLogger(__name__)

def get(url):
    try:
        res = requests.get(url)
        # Raise an exception if the status code is not 200
        res.raise_for_status()
        # Close the connection -- very important!
        res.close()
    except Exception as e:
        logger.error(e)
    else:
        return res.json()
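Putting the two ideas together (closing the response and retrying after a pause), a minimal sketch might look like the following; the function name get_with_retry and the retry count and delay are just illustrative assumptions:

import time
import requests

def get_with_retry(url, max_tries=3, delay=5):
    # max_tries and delay are arbitrary example values
    for attempt in range(max_tries):
        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            data = res.json()
            res.close()  # release the connection promptly
            return data
        except Exception as e:
            print("attempt {} failed: {}".format(attempt + 1, e))
            time.sleep(delay)  # keep the request rate low before retrying
    return None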
Someone else's Zhihu scraper uses the same approach:
html="" while html == "":#因为请求可能被知乎拒绝,采用循环+sleep的方式重复发送,但保持频率不太高 try: proxies = get_random_ip(ipList) print("这次试用ip:{}".format(proxies)) r = requests.request("GET", url, headers=headers, params=querystring, proxies=proxies) r.encoding = 'utf-8' html = r.text return html except: print("Connection refused by the server..") print("Let me sleep for 5 seconds") print("ZZzzzz...") sleep(5) print("Was a nice sleep, now let me continue...") continue
How do you keep the whole scraping run going? In other words, when Python 3 raises an error, how do you skip only the current iteration and let the loop continue?
import time

if __name__ == '__main__':
    for i in range(80):
        url = 'https://www.xxxx.com/qiye/{}.htm'.format(1000 - i)
        try:
            getData(url)  # put your scraping function here; errors are caught below
        except:
            print(url + " Let me sleep for 5 seconds")
            print("ZZzzzz...")
            time.sleep(5)
            print("Was a nice sleep, now let me continue...")
            continue
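As an alternative to hand-rolled retry loops, requests can also retry failed connections at the transport level by mounting urllib3's Retry on a Session; a sketch with example parameter values:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,                                     # at most 3 retries per request
    backoff_factor=1,                            # exponential backoff between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # also retry on these status codes
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

resp = session.get('https://www.xxxx.com/qiye/1000.htm', timeout=10)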