python - Scrapy error: exceptions.IOError: cannot identify image file
I keep getting the following error, with no indication of the image filename or the response URL that would let me trace it:
    2012-08-20 08:14:34+0000 [spider] Unhandled Error
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
        self._startRunCallbacks(result)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
        self._runCallbacks()
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
    --- <exception caught here> ---
      File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 204, in media_downloaded
        checksum = self.image_downloaded(response, request, info)
      File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 252, in image_downloaded
        for key, image, buf in self.get_images(response, request, info):
      File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 261, in get_images
        orig_image = Image.open(StringIO(response.body))
      File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
        raise IOError("cannot identify image file")
    exceptions.IOError: cannot identify image file
So how can I fix this? It stops my spider once the number of errors reaches a limit I have defined in settings.py.
Solution:
The offending line is the call to PIL's Image.open() in scrapy.contrib.pipeline.images.ImagesPipeline:
    def get_images(self, response, request, info):
        key = self.image_key(request.url)
        orig_image = Image.open(StringIO(response.body))
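To see why Image.open() might reject a body, it can help to look at the first bytes of response.body. The sketch below is a minimal, standard-library-only illustration; `sniff_image_format` and `_SIGNATURES` are hypothetical names, not part of Scrapy or PIL, and the signature list covers only a few common formats. It assumes the failing responses are things like HTML error pages or empty bodies, which may not be the case for every site.

```python
# Hypothetical helper, not part of Scrapy or PIL: guess an image format
# from the leading "magic" bytes of a downloaded body.
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_format(body):
    """Return a format name if body starts with a known signature, else None."""
    for magic, name in _SIGNATURES:
        if body.startswith(magic):
            return name
    # WebP files start with "RIFF", four size bytes, then "WEBP".
    if body[:4] == b"RIFF" and body[8:12] == b"WEBP":
        return "webp"
    return None
```

A body for which this returns None (for example one starting with `<html>`) is a likely candidate for the "cannot identify image file" error, though a matching signature does not guarantee that the rest of the file is intact.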
The try block in media_downloaded() catches the IOError, but then logs an error itself:
    except Exception:
        log.err(spider=info.spider)
You could hack the try block of media_downloaded() in that file as follows:
    try:
        key = self.image_key(request.url)
        checksum = self.image_downloaded(response, request, info)
    except ImageException as ex:
        log.msg(str(ex), level=log.WARNING, spider=info.spider)
        raise
    except IOError as ex:
        # Unreadable image data: log a warning instead of an unhandled error
        log.msg(str(ex), level=log.WARNING, spider=info.spider)
        raise ImageException
    except Exception:
        log.err(spider=info.spider)
        raise ImageException
A better option, though, is to create your own pipeline and override the image_downloaded() method in your pipelines.py file:
    from scrapy import log
    from scrapy.contrib.pipeline.images import ImagesPipeline


    class BkamImagesPipeline(ImagesPipeline):

        def image_downloaded(self, response, request, info):
            try:
                # Return the checksum so media_downloaded() still receives it
                return super(BkamImagesPipeline, self).image_downloaded(response, request, info)
            except IOError as ex:
                # Bad image data: log a warning and carry on
                log.msg(str(ex), level=log.WARNING, spider=info.spider)
Make sure you declare this pipeline in your settings file:
    ITEM_PIPELINES = [
        'bkam.pipelines.BkamImagesPipeline',
    ]
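The warning logged above contains only str(ex), so it still does not say which URL produced the bad image, which was part of the original question. One possible way to address that is to build the message from request.url and response.body as well. The helper below is a hypothetical sketch (the name `describe_bad_image` and the message layout are assumptions, not Scrapy API) using only the standard library:

```python
def describe_bad_image(url, body, error):
    """Build a one-line warning for a response that failed image decoding.

    Hypothetical helper: the name and the message format are illustrative.
    """
    # A short preview of the body often shows whether it is HTML, empty, etc.
    preview = body[:16]
    return "%s: url=%s size=%d first_bytes=%r" % (error, url, len(body), preview)
```

Inside the except IOError branch of the custom pipeline, the call would presumably look like `log.msg(describe_bad_image(request.url, response.body, ex), level=log.WARNING, spider=info.spider)`, so that each warning names the URL that triggered it.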