Python Crawler Framework Scrapy Learning Notes 6 ------- Basic Commands
1. Some scrapy commands are only available inside the root directory of a Scrapy project, e.g. the crawl command.
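Scrapy decides whether it is inside a project by walking up from the current directory until it finds a scrapy.cfg file. A minimal sketch of that lookup (the function name is illustrative, not Scrapy's actual API):

```python
import os

def find_project_root(path="."):
    """Walk upward from *path* until a directory containing scrapy.cfg
    is found; return that directory, or None when outside any project."""
    path = os.path.abspath(path)
    while True:
        if os.path.exists(os.path.join(path, "scrapy.cfg")):
            return path
        parent = os.path.dirname(path)
        if parent == path:        # reached the filesystem root
            return None
        path = parent
```

This is why running crawl from an arbitrary directory fails: no scrapy.cfg is found on the way up, so Scrapy cannot locate the project settings.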
2. scrapy genspider taobao http://detail.tmall.com/item.htm?id=12577759834
This automatically generates taobao.py in the spiders directory:
# -*- coding: utf-8 -*-
import scrapy


class TaobaoSpider(scrapy.Spider):
    name = "taobao"
    allowed_domains = ["http://detail.tmall.com/item.htm?id=12577759834"]
    start_urls = (
        'http://www.http://detail.tmall.com/item.htm?id=12577759834/',
    )

    def parse(self, response):
        pass
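Note that because a full URL was passed to genspider instead of a domain, the generated allowed_domains and start_urls are mangled; offsite filtering compares hostnames, not URLs, so the entry should really be just detail.tmall.com. The comparison works roughly like this (a simplified sketch, not Scrapy's actual implementation):

```python
from urllib.parse import urlparse

def url_is_allowed(url, allowed_domains):
    """Simplified offsite check: the URL's hostname must equal an
    allowed domain or be a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

With the full URL in allowed_domains, no request hostname ever matches, so every followed link would be filtered as offsite.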
Other templates are also available:
scrapy genspider taobao2 http://detail.tmall.com/item.htm?id=12577759834 --template=crawl
# -*- coding: utf-8 -*-
import scrapy
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from project004.items import Project004Item


class Taobao2Spider(CrawlSpider):
    name = 'taobao2'
    allowed_domains = ['http://detail.tmall.com/item.htm?id=12577759834']
    start_urls = ['http://www.http://detail.tmall.com/item.htm?id=12577759834/']

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = Project004Item()
        #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = response.xpath('//div[@id="name"]').extract()
        #i['description'] = response.xpath('//div[@id="description"]').extract()
        return i
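The Rule above tells the CrawlSpider to extract every link whose URL matches the allow pattern, follow it, and hand the responses to parse_item. Conceptually the allow filter works like this (a stdlib sketch for illustration, not the real LinkExtractor):

```python
import re
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect every href attribute from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def extract_links(html, allow):
    """Return the hrefs in *html* whose URL matches the *allow* regex."""
    parser = HrefCollector()
    parser.feed(html)
    pattern = re.compile(allow)
    return [h for h in parser.hrefs if pattern.search(h)]
```

So with allow=r'Items/', only links whose URL contains "Items/" are followed; everything else on the page is ignored.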
3. List all spiders in the current project: scrapy list
4. Usage of the fetch command
A. scrapy fetch --nolog http://www.example.com/some/page.html  (prints the page body)
B. scrapy fetch --nolog --headers http://www.example.com/  (prints the response headers instead)
5. The view command shows a page in the browser as Scrapy sees it
scrapy view http://www.example.com/some/page.html
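Under the hood, view downloads the page with the Scrapy downloader, saves the body to a temporary .html file and opens that file in the default browser, which is useful for spotting pages that render differently for crawlers. A rough sketch of the save-and-open step (not Scrapy's implementation):

```python
import tempfile
import webbrowser

def open_in_browser(body, open_browser=True):
    """Write *body* bytes to a temporary .html file and optionally
    open it in the default browser; return the file path."""
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as f:
        f.write(body)
        path = f.name
    if open_browser:
        webbrowser.open("file://" + path)
    return path
```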
6. Inspect settings
scrapy settings --get BOT_NAME
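Scrapy resolves each setting from several layers (built-in defaults, the project's settings.py, per-command options, command-line overrides), with higher-priority layers winning. A minimal sketch of that priority model (names and levels are illustrative, not Scrapy's BaseSettings API):

```python
# Priority levels loosely modeled on Scrapy's settings layers.
PRIORITIES = {"default": 0, "project": 20, "spider": 30, "cmdline": 40}

class MiniSettings:
    """Each setting keeps the value set at the highest priority so far."""
    def __init__(self):
        self._values = {}   # name -> (value, priority level)

    def set(self, name, value, priority="project"):
        level = PRIORITIES[priority]
        if name not in self._values or level >= self._values[name][1]:
            self._values[name] = (value, level)

    def get(self, name, default=None):
        return self._values.get(name, (default, None))[0]
```

This is why `scrapy settings --get BOT_NAME` prints the project's bot name inside a project directory but the built-in default outside of one.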
7. Run a self-contained spider without creating a project
scrapy runspider <spider_file.py>
8. Deploying a scrapy project: scrapy deploy
Deploying spiders first requires a server environment for them; scrapyd is commonly used.
Install scrapyd: pip install scrapyd
Documentation: http://scrapyd.readthedocs.org/en/latest/install.html
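Deploy targets are declared in the project's scrapy.cfg. A typical fragment, assuming a scrapyd instance running on localhost (the target name and values here are illustrative):

```ini
; scrapy.cfg -- [deploy] section read by `scrapy deploy`
[deploy:local]
url = http://localhost:6800/
project = project004
```

With this in place, `scrapy deploy local` packages the project as an egg and uploads it to the scrapyd server at that URL.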
9. All available commands
C:\Users\IBM_ADMIN\PycharmProjects\pycrawl\project004>scrapy
Scrapy 0.24.4 - project: project004
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
check Check spider contracts
crawl Run a spider
deploy Deploy project in Scrapyd target
edit Edit spider
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
list List available spiders
parse Parse URL (using its spider) and print the results
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
Original article: http://dingbo.blog.51cto.com/8808323/1600296