Downloading Files with Multiple Threads in Python
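The script reads image URLs and file names from a text file and downloads the file behind each URL. Each line of the input file holds one URL and one file name, separated by a tab.
1. Request the URL with requests and download the file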
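    import os
    from contextlib import closing

    import requests

    out_dir = './output'  # output directory; must already exist

    def download(img_url, img_name):
        # stream=True defers the body so it can be written in 1 KB chunks
        with closing(requests.get(img_url, stream=True)) as r:
            with open(os.path.join(out_dir, img_name), 'wb') as f:
                for data in r.iter_content(1024):
                    f.write(data)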
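The snippet above writes straight to the final file name, so an interrupted download leaves a truncated file behind. One possible way around that is sketched below. It assumes the chunks come from something like `r.iter_content(1024)`; the helper name `save_chunks` and the `.part` suffix are hypothetical and do not appear in the original article.

```python
import os


def save_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path via a temporary file.

    `save_chunks` and the ".part" suffix are illustrative assumptions,
    not part of the original article.
    """
    tmp_path = dest_path + ".part"
    try:
        with open(tmp_path, "wb") as f:
            for data in chunks:
                if data:  # ignore empty chunks
                    f.write(data)
        # Rename only after every chunk was written. On POSIX the rename is
        # atomic when both paths are on the same filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial file so it is never mistaken for a finished one.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Inside `download`, a call such as `save_chunks(r.iter_content(1024), os.path.join(out_dir, img_name))` could then stand in for the inner `with open(...)` block.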
2. Read the URLs from the file. Because the file may be large, read it lazily with a generator.
    def get_imgurl_generate():
        with open('./example.txt', 'r') as f:
            for line in f:
                line = line.strip()
                # each line is "<url>\t<file name>"
                imgs = line.split('\t')
                yield imgs
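To see the lazy behaviour in isolation, here is a small sketch of the same idea that accepts any iterable of lines instead of a fixed file path, so it can be tried without a file on disk. The name `iter_url_pairs` and the sample URLs are made up for illustration.

```python
def iter_url_pairs(lines):
    """Lazily yield (url, name) pairs from tab-separated lines.

    Lines that do not hold exactly two non-empty fields are skipped.
    `iter_url_pairs` is a hypothetical name, not from the article.
    """
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) == 2 and all(fields):
            yield fields[0], fields[1]


sample = [
    "http://example.com/a.jpg\ta.jpg\n",
    "\n",                             # blank line, skipped
    "http://example.com/b.jpg\n",     # file name missing, skipped
    "http://example.com/c.jpg\tc.jpg\n",
]
pairs = iter_url_pairs(sample)
print(next(pairs))   # only the first line has been consumed at this point
print(list(pairs))   # the rest is read on demand
```

An open file object is itself an iterable of lines, so `iter_url_pairs(f)` inside a `with open(...) as f:` block should read the file one line at a time rather than loading it whole.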
3. Download with multiple threads
    import threading

    thread_num = 20  # number of worker threads
    lock = threading.Lock()

    def loop(imgs):
        while True:
            try:
                # the generator is shared, so only one thread may advance it at a time
                with lock:
                    img_url, img_name = next(imgs)
            except StopIteration:
                break
            download(img_url, img_name)

    img_gen = get_imgurl_generate()
    for i in range(0, thread_num):
        t = threading.Thread(target=loop, args=(img_gen,))
        t.start()
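The lock matters because a generator cannot be advanced by two threads at once: a concurrent `next()` on a generator that is already running raises `ValueError: generator already executing`. The sketch below isolates the pattern with a stand-in work function so it runs without any network access. `run_workers`, `work`, and `record` are hypothetical names, and the explicit `join()` is an addition the article's code does not have.

```python
import threading


def run_workers(items, work, thread_num=4):
    """Feed items from one shared iterator to several threads.

    `run_workers` and `work` are illustrative names; `work` stands in
    for the article's download function.
    """
    it = iter(items)
    lock = threading.Lock()

    def loop():
        while True:
            # Serialise next() so the shared iterator is advanced by one thread at a time.
            with lock:
                try:
                    item = next(it)
                except StopIteration:
                    return
            work(item)  # runs outside the lock, so the workers overlap here

    threads = [threading.Thread(target=loop) for _ in range(thread_num)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every worker has finished


results = []
results_lock = threading.Lock()


def record(n):
    with results_lock:
        results.append(n * 2)


run_workers((n for n in range(100)), record)
print(len(results))  # every item was handled exactly once
```

Only the `next()` call sits inside the lock. Holding the lock during the download itself would serialise the threads and remove the benefit of using them.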
Complete code, with exception handling added
    # -*- coding: utf-8 -*-
    import os
    from contextlib import closing
    import threading
    import requests
    import time


    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    }

    # output directory
    out_dir = './output'
    # number of threads
    thread_num = 20
    # HTTP request timeout in seconds
    timeout = 5

    if not os.path.exists(out_dir):
        os.mkdir(out_dir)


    def download(img_url, img_name):
        # skip files that were already downloaded
        if os.path.isfile(os.path.join(out_dir, img_name)):
            return
        with closing(requests.get(img_url, stream=True, headers=headers, timeout=timeout)) as r:
            rc = r.status_code
            if 299 < rc or rc < 200:
                print('returnCode%s\t%s' % (rc, img_url))
                return
            content_length = int(r.headers.get('content-length', '0'))
            if content_length == 0:
                print('size0\t%s' % img_url)
                return
            try:
                with open(os.path.join(out_dir, img_name), 'wb') as f:
                    for data in r.iter_content(1024):
                        f.write(data)
            except Exception:
                print('savefail\t%s' % img_url)


    def get_imgurl_generate():
        with open('./final.scp', 'r') as f:
            index = 0
            for line in f:
                index += 1
                if index % 500 == 0:
                    print('execute %s line at %s' % (index, time.time()))
                # strip first, otherwise the trailing newline hides empty lines
                line = line.strip()
                if not line:
                    print('line %s is empty' % index)
                    continue
                try:
                    imgs = line.split('\t')
                    if len(imgs) != 2:
                        print('line %s split error' % index)
                        continue
                    if not imgs[0] or not imgs[1]:
                        print('line %s img is empty' % index)
                        continue
                    yield imgs
                except Exception:
                    print(r'line %s can not split by "\t"' % index)


    lock = threading.Lock()


    def loop(imgs):
        print('thread %s is running...' % threading.current_thread().name)

        while True:
            try:
                # the generator is shared between threads, so guard next() with the lock
                with lock:
                    img_url, img_name = next(imgs)
            except StopIteration:
                break
            try:
                download(img_url, img_name)
            except Exception:
                print('exceptfail\t%s' % img_url)
        print('thread %s is end...' % threading.current_thread().name)


    img_gen = get_imgurl_generate()

    for i in range(0, thread_num):
        t = threading.Thread(target=loop, name='LoopThread %s' % i, args=(img_gen,))
        t.start()
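As an alternative to the hand-written thread loop, the standard library's `concurrent.futures.ThreadPoolExecutor` can manage the worker threads. This is a different technique from the lock-and-generator loop the article uses, shown only as a sketch. Because `Executor.map` consumes its input up front, the version below pulls the items in batches with `itertools.islice` to keep the input lazy. `run_in_batches`, `work`, `batch_size`, and `flaky` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def run_in_batches(items, work, thread_num=4, batch_size=100):
    """Run work(item) on a thread pool, one batch at a time.

    Returns the items whose call raised an exception. All names here
    are illustrative and not part of the original article.
    """
    it = iter(items)
    failed = []
    with ThreadPoolExecutor(max_workers=thread_num) as pool:
        while True:
            # Pull at most batch_size items so a large input stays lazy.
            batch = list(islice(it, batch_size))
            if not batch:
                break
            futures = [(item, pool.submit(work, item)) for item in batch]
            for item, future in futures:
                # exception() waits for the task and returns None on success.
                if future.exception() is not None:
                    failed.append(item)
    return failed


def flaky(n):
    # Stand-in for a download that sometimes fails.
    if n % 10 == 0:
        raise ValueError(n)


print(run_in_batches(range(35), flaky, thread_num=3, batch_size=8))
```

Returning the failed items makes a retry pass possible, which the `exceptfail` log lines in the complete code only allow by parsing the output. The trade-off is that each batch waits for its slowest task before the next batch starts.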
Source: http://www.cnblogs.com/lilinwei340/p/6793796.html