Python Web Crawler

A small image-gallery crawler: it walks a list page, extracts the gallery links, records each one in MySQL, and downloads every image while sending a Referer header.
`saveDownedurl` records each crawled gallery URL in a MySQL table (credentials are hard-coded, as in the original):

```python
import mysql.connector

def saveDownedurl(downedurl):
    # Record a crawled gallery URL in the 'downedurl' table of the 'picurl' database.
    url = downedurl
    conn = mysql.connector.connect(user='root', password='694521', database='picurl')
    cursor = conn.cursor()
    # Parameterized insert: the driver escapes the value safely.
    sql = "INSERT INTO downedurl (picurl) VALUES (%s)"
    cursor.execute(sql, [url])
    conn.commit()
    print(cursor.rowcount, "record inserted.")
    conn.close()
```
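The parameterized `VALUES (%s)` form is what keeps a URL containing quotes from breaking the statement. A minimal sketch of the same pattern, using the standard-library `sqlite3` module as a stand-in (its placeholder syntax is `?` instead of `%s`) so it can be tried without a MySQL server:

```python
import sqlite3

# In-memory stand-in for the MySQL 'picurl' database used above.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE downedurl (picurl TEXT)")

# Parameterized insert: the driver escapes the value, so a URL containing
# a quote character cannot break out of the statement.
tricky_url = "http://example.com/a'b.html"
cursor.execute("INSERT INTO downedurl (picurl) VALUES (?)", [tricky_url])
conn.commit()

rows = cursor.execute("SELECT picurl FROM downedurl").fetchall()
print(rows)
```

The `tricky_url` value is purely illustrative; the point is that the placeholder, not string concatenation, carries the value into the SQL statement.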
`download_pic` fetches one image, sending a Referer header so the image host does not reject the request as hotlinking, and names the file after a running counter:

```python
from urllib.request import Request, urlopen

def download_pic(pic_url, root_url, down_times):
    # Download a single image; the gallery page URL is passed as the Referer.
    url = pic_url
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0',
        'Referer': root_url
    }
    down_path = str(down_times) + '.jpg'
    print(down_path)
    request = Request(url, headers=headers)
    data = urlopen(request).read()
    with open(down_path, 'wb') as f:
        f.write(data)  # the with block closes the file automatically
    return down_times + 1
```
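The `Request` object only builds the HTTP message; nothing is sent until `urlopen` is called, so the headers can be inspected offline. A small check with hypothetical URLs on the article's host:

```python
from urllib.request import Request

# Hypothetical URLs for illustration; no network request is made here.
pic_url = 'http://mmff30.com/pic/0001.jpg'
referer = 'http://mmff30.com/rnyy123.html'

req = Request(pic_url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0',
    'Referer': referer,
})

# get_header looks headers up by their capitalized form.
print(req.get_header('Referer'))
```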
`jiexi_rootPic_url` fetches one gallery page, extracts every `<img src=...>` URL with a regular expression, and downloads each image:

```python
import re
import time
from urllib.request import Request, urlopen

def jiexi_rootPic_url(next_rootUrl, down_times):
    # Crawl one gallery page and download all images it contains.
    url = next_rootUrl
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
    }
    downtime = down_times
    request_url = Request(url, headers=headers)
    response = urlopen(request_url).read().decode("utf-8")
    pattern = re.compile('<img src="(.*?)"', re.IGNORECASE)
    pic_path = pattern.findall(response)
    for i in pic_path:
        print('download_prepare')
        downtime = download_pic(i, url, downtime)
        print(i)
    time.sleep(2)  # pause between pages to avoid hammering the server
    return downtime
```
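The non-greedy `<img src="(.*?)"` pattern does the extraction. A self-contained check against a hypothetical HTML fragment shows what `findall` returns (note the pattern only matches tags written exactly as `<img src=`, not e.g. `<img class="..." src=`):

```python
import re

# Hypothetical page fragment; real pages from the site may differ.
html = '''
<div><img src="http://img.example.com/1.jpg" alt="a"></div>
<div><img src="http://img.example.com/2.jpg" alt="b"></div>
'''

# Non-greedy (.*?) stops at the first closing quote of each src attribute.
pattern = re.compile('<img src="(.*?)"', re.IGNORECASE)
pic_path = pattern.findall(html)
print(pic_path)
```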
`jiexi_url` fetches the list page, extracts the links to the individual galleries, records each one in the database, and crawls its images:

```python
import re
from urllib.request import Request, urlopen

def jiexi_url(root_url, down_times):
    # Crawl the list page and visit every gallery it links to.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
    }
    downtime = down_times
    url = root_url
    request_url = Request(url, headers=headers)
    html = urlopen(request_url).read().decode("utf-8")
    # Capture the part of each link between '/rnyy' and '.html'.
    response = re.compile('/rnyy(.*?).html', re.IGNORECASE)
    all_next_root = response.findall(html)
    for i in all_next_root:
        path = 'http://mmff30.com/rnyy' + i + '.html'
        print(path)
        saveDownedurl(path)
        downtime = jiexi_rootPic_url(path, downtime)
```
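Because the pattern captures only the middle of each link, the full gallery URL has to be rebuilt by re-attaching the prefix and suffix. A check with a hypothetical list-page fragment:

```python
import re

# Hypothetical list-page fragment; the real site's markup may differ.
html = '<a href="/rnyy101.html">p1</a> <a href="/rnyy102.html">p2</a>'

# findall returns only the captured group, i.e. the part between '/rnyy' and '.html'.
response = re.compile('/rnyy(.*?).html', re.IGNORECASE)
all_next_root = response.findall(html)

# Rebuild the absolute URLs the same way jiexi_url does.
paths = ['http://mmff30.com/rnyy' + i + '.html' for i in all_next_root]
print(paths)
```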
Start the crawl from the first list page, with the file counter starting at 4000:

```python
jiexi_url('http://mmff30.com/rwmy_9_3.html', 4000)
```