Python Web Scraping: Requests


The Requests module

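The Python standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests (in Python 3, urllib2 and httplib were reorganized into urllib.request and http.client), but their APIs are painful to use.

They were designed for a different era and a different Web, and even the simplest task takes a lot of work, sometimes including overriding methods.

Requests is an Apache2-licensed HTTP library written in Python. It sits on top of urllib3 (which itself uses the standard library's http.client) and wraps these low-level pieces in a high-level API.

This makes network requests far more convenient for Python developers: with Requests you can easily reproduce the HTTP traffic a browser sends, such as form submissions, cookies, custom headers, and file uploads, although it does not render pages or run JavaScript.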

GET requests

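# 1. Without parameters
import requests

# Note: GitHub has retired this endpoint, so it may no longer return timeline data
ret = requests.get('https://github.com/timeline.json')
print(ret.url)
print(ret.text)

# 2. With parameters
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)

print(ret.url)   # the query string ?key1=value1&key2=value2 has been appended
print(ret.text)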
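Because Requests builds the query string itself, it helps to see exactly what it produces. The sketch below only prepares the request with requests.Request(...).prepare(), so nothing is sent; build_get_url is a hypothetical helper name. It illustrates two behaviours of the params dict: a list value repeats the key, and a None value is left out.

```python
import requests


def build_get_url(base_url, params):
    """Return the URL Requests would request for a GET with these params.

    Only prepares the request; nothing is sent over the network.
    """
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


url = build_get_url(
    "http://httpbin.org/get",
    {"key1": "value1", "key2": None, "key3": ["a", "b"]},
)
print(url)  # http://httpbin.org/get?key1=value1&key3=a&key3=b
```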

POST requests

# 1. Basic POST
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)

print(ret.text)


# 2. Sending custom headers and a JSON body
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)
# Shortcut with the same effect: requests.post(url, json=payload)

print(ret.text)
print(ret.cookies)
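To compare the two POST styles above without touching the network, the sketch below prepares three requests and inspects their bodies and headers. The URL is only a placeholder. It assumes a current Requests 2.x release, where json= serializes the object and sets Content-Type: application/json itself, so it should be interchangeable with the hand-written json.dumps version.

```python
import json

import requests

URL = "https://httpbin.org/post"  # only used to build the requests; nothing is sent
payload = {"some": "data"}

# data=<dict>: form-encoded body, Content-Type set automatically
form = requests.Request("POST", URL, data=payload).prepare()

# data=<str> plus an explicit header: what the second example does by hand
manual = requests.Request(
    "POST",
    URL,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
).prepare()

# json=<obj>: Requests serializes the object and sets the header itself
auto = requests.Request("POST", URL, json=payload).prepare()

print(form.headers["Content-Type"], form.body)
print(manual.headers["Content-Type"], manual.body)
print(auto.headers["Content-Type"], auto.body)
```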

Other request methods

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
  
# All of the methods above are built on top of this one
requests.request(method, url, **kwargs)
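Since each helper hands its arguments to requests.request, the verb can be chosen at run time as a plain string. The hypothetical prepare_any helper below builds (but does not send) one request per verb and shows that Requests upper-cases the method name, so 'get' and 'GET' are equivalent.

```python
import requests

VERBS = ("get", "post", "put", "head", "delete", "patch", "options")


def prepare_any(method, url, **kwargs):
    """Hypothetical helper: build (but do not send) a request for any verb.

    The method is just a string, as it is for requests.request().
    """
    return requests.Request(method, url, **kwargs).prepare()


prepared = [prepare_any(verb, "http://httpbin.org/anything") for verb in VERBS]
for p in prepared:
    print(p.method, p.url)
```

To actually send one of these, pass it to requests.Session().send().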

More parameters

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content_type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      >>> req
      <Response [200]>
    """

Parameter examples

import requests


def param_method_url():
    # method: the HTTP verb; url: the target address
    # requests.request(method='get', url='http://127.0.0.1:8000/test/')
    # requests.request(method='post', url='http://127.0.0.1:8000/test/')
    pass


def param_param():
    # params (appended to the URL as the query string) can be:
    # - a dict
    # - a string
    # - bytes (ASCII characters only)

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params="k1=v1&k2=水电费&k3=v3&k3=vv3")

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

    # Wrong: bytes containing non-ASCII characters raise UnicodeDecodeError
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
    pass
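The rules in param_param can be checked offline. This sketch (prepared_url is a made-up helper, and the 127.0.0.1 URL is never contacted) shows that dict and string params both end up percent-encoded as UTF-8 in the URL, ASCII bytes pass through unchanged, and non-ASCII bytes fail. The UnicodeDecodeError comes from Requests decoding byte params as ASCII; treat that as an observation about current 2.x releases rather than a documented guarantee.

```python
import requests

BASE = "http://127.0.0.1:8000/test/"  # placeholder; nothing is sent


def prepared_url(params):
    """Return the URL Requests would build for these params (no network I/O)."""
    return requests.Request("GET", BASE, params=params).prepare().url


from_dict = prepared_url({"k1": "v1", "k2": "水电费"})
from_str = prepared_url("k1=v1&k2=水电费&k3=v3&k3=vv3")
from_ascii_bytes = prepared_url(bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding="utf8"))

try:
    prepared_url(bytes("k1=v1&k2=水电费", encoding="utf8"))
    bytes_error = None
except UnicodeDecodeError as exc:
    bytes_error = exc  # non-ASCII bytes cannot be decoded as ASCII

print(from_dict)
print(from_str)
print(from_ascii_bytes)
print(type(bytes_error).__name__)
```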
def param_data():
    # data (sent in the request body) can be:
    # - a dict
    # - a string
    # - bytes
    # - a file object

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data={'k1': 'v1', 'k2': '水电费'})

    # A string body is sent as-is, and no Content-Type is added for it
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1; k2=v2; k3=v3; k3=v4"
    #                  )

    # ...so set Content-Type yourself if the server needs it
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4",
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )

    # Open files in binary mode; file contents: k1=v1;k2=v2;k3=v3;k3=v4
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data=open('data_file.py', mode='rb'),
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )
    pass
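How Requests treats each kind of data value shows up in the prepared body and headers. In this offline sketch (prepare_post is a hypothetical helper, and io.BytesIO stands in for an open file), a dict is form-encoded and labelled application/x-www-form-urlencoded, while a string or file-like body is sent as-is with no Content-Type at all, which is why the examples above set that header by hand.

```python
import io

import requests

URL = "http://127.0.0.1:8000/test/"  # placeholder; nothing is sent


def prepare_post(**kwargs):
    """Hypothetical helper: prepare (not send) a POST so body and headers can be inspected."""
    return requests.Request("POST", URL, **kwargs).prepare()


as_dict = prepare_post(data={"k1": "v1", "k2": "v2"})
as_str = prepare_post(data="k1=v1;k2=v2;k3=v3;k3=v4")
# io.BytesIO stands in for open('data_file.py', 'rb')
as_file = prepare_post(data=io.BytesIO(b"k1=v1;k2=v2;k3=v3;k3=v4"))

for name, p in [("dict", as_dict), ("str", as_str), ("file", as_file)]:
    print(name, p.headers.get("Content-Type"), p.headers.get("Content-Length"))
```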
def param_json():
    # json: the object is serialized with json.dumps(...) into the request body,
    # and Content-Type is set to 'application/json'
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'})
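Two details of json= are easy to miss, as this offline sketch shows: non-ASCII text is escaped by json.dumps (so 水电费 travels as \u6c34\u7535\u8d39), and if data is passed as well, json is ignored. The URL is a placeholder and nothing is sent.

```python
import json

import requests

URL = "http://127.0.0.1:8000/test/"  # placeholder; nothing is sent
payload = {"k1": "v1", "k2": "水电费"}

json_only = requests.Request("POST", URL, json=payload).prepare()
# When data is supplied as well, Requests uses data and ignores json
both = requests.Request("POST", URL, data={"a": "1"}, json=payload).prepare()

# Recent 2.x releases store the JSON body as UTF-8 bytes; normalize to text for printing
body_text = json_only.body.decode("utf-8") if isinstance(json_only.body, bytes) else json_only.body

print(json_only.headers["Content-Type"], body_text)
print(both.headers["Content-Type"], both.body)
```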


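def param_headers():
    # Send request headers to the server.
    # An explicit Content-Type overrides the 'application/json' that json= would set.
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'},
                     headers={'Content-Type': 'application/x-www-form-urlencoded'}
                     )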
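Headers passed explicitly take precedence over the ones Requests would add, which is what param_headers relies on. The offline sketch below confirms that the body is still JSON while the Content-Type now claims form data, so a server that trusts the header may fail to parse it. The URL is a placeholder.

```python
import json

import requests

URL = "http://127.0.0.1:8000/test/"  # placeholder; nothing is sent

overridden = requests.Request(
    "POST",
    URL,
    json={"k1": "v1"},
    headers={"Content-Type": "application/x-www-form-urlencoded"},
).prepare()

# The header you passed wins, but the body is still JSON
print(overridden.headers["Content-Type"])
print(overridden.body)
```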


def param_cookies():
    # Send cookies to the server
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies={'cook1': 'value1'},
                     )
    # A CookieJar also works (the dict form is converted into one internally)
    from http.cookiejar import CookieJar
    from http.cookiejar import Cookie

    obj = CookieJar()
    obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                          discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                          port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
                   )
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies=obj)
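Both cookie forms end up as a single Cookie header. This offline sketch swaps the hand-built http.cookiejar.Cookie for RequestsCookieJar.set(), which fills in the same default fields (version 0, empty domain, path '/'); it is an alternative, not what the original code does. The URL is a placeholder.

```python
import requests
from requests.cookies import RequestsCookieJar

URL = "http://127.0.0.1:8000/test/"  # placeholder; nothing is sent

# 1. A plain dict is converted to a cookie jar internally
from_dict = requests.Request("POST", URL, cookies={"cook1": "value1"}).prepare()

# 2. RequestsCookieJar.set() builds the http.cookiejar.Cookie for you
jar = RequestsCookieJar()
jar.set("c1", "v1", path="/")
from_jar = requests.Request("POST", URL, cookies=jar).prepare()

print(from_dict.headers["Cookie"])  # cook1=value1
print(from_jar.headers["Cookie"])   # c1=v1
```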


def param_files():
    # Send a file
    # file_dict = {
    #     'f1': open('readme', 'rb')
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send a file with a custom filename
    # file_dict = {
    #     'f1': ('test.txt', open('readme', 'rb'))
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send a string as the file content, with a custom filename
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Custom filename, content type, and extra per-file headers
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)
    pass


def param_auth():
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    # Note: GitHub's API no longer accepts account passwords; a personal access token is required
    ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
    print(ret.text)

    # ret = requests.get('http://192.168.1.1',
    #                    auth=HTTPBasicAuth('admin', 'admin'))
    # ret.encoding = 'gbk'
    # print(ret.text)

    # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
    # print(ret)
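The multipart body and the Basic auth header can both be inspected before anything is sent. In this offline sketch the 4-tuple mirrors the last file_dict example (with text/plain swapped in for application/text), and the 'user'/'pass' credentials and the URL are placeholders. A plain (user, password) tuple is shorthand for HTTPBasicAuth.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

URL = "http://127.0.0.1:8000/test/"  # placeholder; nothing is sent

# files=: a 4-tuple (filename, content, content_type, extra_headers)
upload = requests.Request(
    "POST",
    URL,
    files={"f1": ("test.txt", "hahsfaksfa9kasdjflaksdjf", "text/plain", {"k1": "0"})},
).prepare()

# auth=: HTTPBasicAuth, or a plain (user, password) tuple, adds an Authorization header
basic = requests.Request("GET", URL, auth=HTTPBasicAuth("user", "pass")).prepare()
basic_tuple = requests.Request("GET", URL, auth=("user", "pass")).prepare()

print(upload.headers["Content-Type"])
print(upload.body.decode("utf-8"))
print(basic.headers["Authorization"])
```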
def param_timeout():
    # A single number applies to both the connect and the read timeout
    # ret = requests.get('http://google.com/', timeout=1)
    # print(ret)

    # A tuple sets (connect timeout, read timeout) separately
    # ret = requests.get('http://google.com/', timeout=(5, 1))
    # print(ret)
    pass


def param_allow_redirects():
    ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
    print(ret.text)
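To watch allow_redirects (and a (connect, read) timeout tuple) in action without depending on an outside site, the sketch below starts a throwaway http.server in a background thread. RedirectHandler and the /test/ and /final/ paths are invented for the demo; trust_env is switched off so proxy environment variables cannot reroute the localhost traffic.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy handler made up for this demo: /test/ -> 302 -> /final/."""

    def do_GET(self):
        if self.path == "/test/":
            self.send_response(302)
            self.send_header("Location", "/final/")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"final page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_both_ways():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/test/"
    session = requests.Session()
    session.trust_env = False  # ignore HTTP(S)_PROXY so localhost is reached directly
    try:
        # timeout=(connect, read) in seconds
        raw = session.get(url, allow_redirects=False, timeout=(3, 5))
        followed = session.get(url, timeout=(3, 5))
    finally:
        session.close()
        server.shutdown()
        server.server_close()
    return raw, followed


raw, followed = fetch_both_ways()
print(raw.status_code, raw.headers["Location"])
print(followed.status_code, followed.text)
print([r.status_code for r in followed.history])
```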


def param_proxies():
    # Keys can be a scheme ('http', 'https') or 'scheme://host' for a single host
    # proxies = {
    #     "http": "61.172.249.96:80",
    #     "https": "http://61.185.219.126:3128",
    # }

    # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

    # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
    # print(ret.headers)

    # from requests.auth import HTTPProxyAuth
    #
    # proxyDict = {
    #     'http': '77.75.105.165',
    #     'https': '77.75.105.165'
    # }
    # auth = HTTPProxyAuth('username', 'mypassword')
    #
    # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
    # print(r.text)
    pass


def param_stream():
    # stream=True defers downloading the body until it is accessed
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)
    ret.close()

    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # Process the response here.
    #     for i in r.iter_content():
    #         print(i)


def requests_session():
    import requests

    session = requests.Session()

    ### 1. First visit any page to obtain a cookie
    i1 = session.get(url="http://dig.chouti.com/help/service")

    ### 2. Log in; the session sends the cookie from step 1, and the server authorizes the gpsd value in it
    i2 = session.post(
        url="http://dig.chouti.com/login",
        data={
            'phone': "8615131255089",
            'password': "xxxxxx",
            'oneMonth': ""
        }
    )

    ### 3. Later requests in the same session carry the authorized cookie
    i3 = session.post(
        url="http://dig.chouti.com/link/vote?linksId=8589623",
    )
    print(i3.text)
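The chouti.com flow above depends on a live third-party site. The sketch below imitates the same pattern against a tiny local server: one GET hands out a gpsd cookie, and a later POST only succeeds if that cookie comes back. LoginHandler, the /help and /vote paths, and the cookie value are made up for the demo; the point is that a Session stores and resends cookies, while a fresh session does not.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Toy server: GET sets a gpsd cookie, POST checks whether it came back."""

    def _reply(self, body, extra_headers=()):
        self.send_response(200)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Step 1: any page hands out the cookie
        self._reply(b"hello", [("Set-Cookie", "gpsd=abc123; Path=/")])

    def do_POST(self):
        # Read and discard the request body, if there is one
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        cookie = self.headers.get("Cookie", "")
        self._reply(b"voted" if "gpsd=abc123" in cookie else b"login required")

    def log_message(self, *args):
        pass  # silence per-request logging


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        with requests.Session() as session:
            session.trust_env = False  # ignore proxy environment variables
            session.get(f"{base}/help", timeout=5)
            with_cookie = session.post(f"{base}/vote", timeout=5).text
            stored = session.cookies.get("gpsd")
        with requests.Session() as fresh:
            fresh.trust_env = False
            without_cookie = fresh.post(f"{base}/vote", timeout=5).text
    finally:
        server.shutdown()
        server.server_close()
    return with_cookie, without_cookie, stored


with_cookie, without_cookie, stored = run_demo()
print(with_cookie)
print(without_cookie)
print(stored)
```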


Original article: https://www.cnblogs.com/bigtreei/p/9026393.html
