Scraping Wallhaven.cc Images with Python 3

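https://wallhaven.cc/ hosts a large number of excellent wallpapers, but the site is somewhat slow to load, so it is more convenient to download the images and browse them locally.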

1. Install Python 3.

2. Install requests and lxml with pip.

3. Run the code below.


# -*- coding: utf-8 -*-
# Wallhaven scraper
import os
import random
import time
from urllib.parse import urlencode

import requests
from lxml import etree
from requests import codes

# Create the directory that downloaded files are saved to, if it does not exist yet.
def CreatePath(filepath):
    if not os.path.exists(filepath):
        os.makedirs(filepath)

# Build the search URL by joining the query parameters with urlencode.
# With the parameters below, the result looks like:
# https://wallhaven.cc/search?q=girls&categories=111&purity=110&sorting=favorites&topRange=1y&order=desc
def GetUrl(keyword, category):
    params = {
        'q': keyword,
        'categories': category,
        'purity': '110',         # 100 / 010 / 110
        'sorting': 'favorites',  # relevance / random / date_added / views / favorites / toplist / toplist-beta
        'topRange': '1y',        # 1y / 6M / 3M / 1w / 3d / 1d
        'order': 'desc',
    }
    base_url = 'https://wallhaven.cc/search?'
    url = base_url + urlencode(params)
    print(url)
    return url

# Get the number of images found by the search.
def GetPictureNum(url):
    try:
        html = requests.get(url)
        if codes.ok == html.status_code:
            selector = etree.HTML(html.text)
            # Extract the text of the listing header.
            pageInfo = selector.xpath('//header[@class="listing-header"]/h1[1]/text()')
            string = str(pageInfo[0])  # the image count is in the first text node
            # Counts can contain thousands separators (e.g. 11,123), so keep only the digits.
            allpic = "".join(filter(str.isdigit, string))
            totalPicNum = int(allpic)  # convert the joined digits to an integer
            return totalPicNum
    except requests.ConnectionError:
        return None
        
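# Get the links to the image detail pages on one page of results.
def GetLinks(url, number):
    urls = url + '&page=' + str(number)
    PicLink = []  # returned empty if the request or parsing fails
    try:
        html = requests.get(urls)
        selector = etree.HTML(html.text)
        # Find the detail-page links, from which the image IDs are taken later.
        PicLink = selector.xpath('//a[@class="preview"]/@href')
    except Exception as e:
        print('Error', e.args)
    return PicLink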
    
# Download one image. url is a detail-page address such as https://wallhaven.cc/w/eyyoj8;
# count is the running counter passed in by main() and is not used here.
# The script assumes full-size images on wallhaven are stored as either .jpg or .png,
# so it tries both addresses and treats status code 200 as success.
def Download(filepath, keyword, url, count, headers):
    string = url.replace('https://wallhaven.cc/w/', '')  # image ID, e.g. eyyoj8
    HtmlJpg = 'https://w.wallhaven.cc/full/' + string[0:2] + '/wallhaven-' + string + '.jpg'
    HtmlPng = 'https://w.wallhaven.cc/full/' + string[0:2] + '/wallhaven-' + string + '.png'
    try:
        pic = requests.get(HtmlJpg, headers=headers)
        if codes.ok == pic.status_code:
            pic_path = os.path.join(filepath, 'wallhaven-' + string + '.jpg')
        else:
            pic = requests.get(HtmlPng, headers=headers)
            if codes.ok == pic.status_code:
                pic_path = os.path.join(filepath, 'wallhaven-' + string + '.png')
            else:
                print("Download error:", string)
                return
        with open(pic_path, 'wb') as f:
            f.write(pic.content)
        print("Downloaded image:", string)
        # Pause briefly after each image to reduce the chance of triggering anti-scraping measures.
        time.sleep(random.uniform(0, 3))
    except Exception as e:
        print(repr(e))
        
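# Main function.
def main():
    headers = {
        # Request header; you can copy the User-Agent from your own browser.
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36",
    }
    filepath = '/wallpaper/Pictures/'  # save directory
    keyword = input('Enter a keyword: ')
    category = input('Enter the image categories. There are three: General, Anime and People. '
                     'Enter 010 for Anime only, 111 for all three, and so on: ')
    CreatePath(filepath)  # create the save directory
    url = GetUrl(keyword, category)  # build the search URL
    PicNum = GetPictureNum(url)  # total number of images
    if not PicNum:
        print("No images found, or the request failed.")
        return
    pageNum = (PicNum + 23) // 24  # total number of pages, at 24 images per page
    print("We found:{} images.".format(PicNum))

    j = 1
    Arr = input("Enter how many images to download (no more than the number found). "
                "To set a start page, separate it with |, e.g. 50|10 "
                "(start from page 10 and take 50): ").split('|')
    Num = int(Arr[0])
    pageStart = 1  # pages are numbered from 1
    if len(Arr) == 2:
        pageStart = int(Arr[1])

    for i in range(pageStart, pageNum + 1):
        PicUrl = GetLinks(url, i)
        for item in PicUrl:
            Download(filepath, keyword, item, j, headers)
            j += 1
            if j > Num:  # enough images have been downloaded, so stop
                return


if __name__ == '__main__':
    main()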
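
The pure string and arithmetic steps of the script (building the search URL, turning the header text into a count, working out the number of pages, and deriving the full-size image addresses from a detail-page link) can be tried without touching the network. The sketch below is only an illustration: the helper names build_search_url, parse_count, page_count and full_image_urls are hypothetical and not part of the original script, the sample header text is made up, and the 24-images-per-page figure and the w.wallhaven.cc/full/ address pattern are simply taken from the script above.

```python
import math
import re
from urllib.parse import urlencode

PER_PAGE = 24  # page size assumed by the script above


def build_search_url(keyword, category, page=1):
    """Hypothetical helper: same parameters as GetUrl, plus a page number."""
    params = {
        "q": keyword,
        "categories": category,
        "purity": "110",
        "sorting": "favorites",
        "topRange": "1y",
        "order": "desc",
        "page": page,
    }
    return "https://wallhaven.cc/search?" + urlencode(params)


def parse_count(header_text):
    """Drop every non-digit character, so '11,123' becomes 11123."""
    digits = re.sub(r"\D", "", header_text)
    return int(digits) if digits else 0


def page_count(total, per_page=PER_PAGE):
    """Ceiling division: 24 images fit on 1 page, 25 images need 2."""
    return math.ceil(total / per_page)


def full_image_urls(detail_url):
    """Return the .jpg and .png candidates for a link like .../w/eyyoj8."""
    image_id = detail_url.rstrip("/").rsplit("/", 1)[-1]
    base = "https://w.wallhaven.cc/full/{}/wallhaven-{}".format(image_id[:2], image_id)
    return [base + ".jpg", base + ".png"]


if __name__ == "__main__":
    print(build_search_url("girls", "111", page=2))
    print(parse_count("11,123 Wallpapers found"))  # made-up sample header text
    print(page_count(11123))
    print(full_image_urls("https://wallhaven.cc/w/eyyoj8"))
```

Keeping these steps in small functions that take and return plain values makes it possible to check them before sending any requests to the site.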

Reference: https://www.jianshu.com/p/90f734cb895d

Original article: https://www.cnblogs.com/kuangxiangnice/p/12046059.html
