2021-03-10 Python: Batch-Downloading Paper PDFs

# -*- coding: utf-8 -*-
"""
Created on  Mar  10 21:22:22 2021
@author: kimol_love & solar2030
This code is based on kimol_love's code from his blog, https://blog.csdn.net/kimol_justdo/article/details/112996678?spm=1001.2014.3001.5501 . Many thanks to him. Here, a for loop is used so that a series of papers can be downloaded with one click. All we need to prepare is a text file listing the paper titles, one per line. I also fixed a bug caused by '/' in paper titles: '/' cannot be used in file names, so str.replace is used to replace each '/' with '_'.
"""
import os
import time
import requests
from bs4 import BeautifulSoup
from tkinter.filedialog import askopenfilename

path_and_name = askopenfilename(title='Paper list: titles or DOIs', filetypes=[('TXT', '*.txt')],
                                initialdir='D:\\')

# Load the paper list: one title or DOI per line; blank lines are skipped.
with open(path_and_name) as txt_file:
    data = [row.strip() for row in txt_file if row.strip()]
if data:
    print(data[0])



def search_article(artName):
    '''
    Search for a paper
    ---------------
    Input: paper title
    ---------------
    Output: search result ('' if nothing is found, otherwise the PDF link)
    '''
    url = 'https://www.sci-hub.ren/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
               'Accept-Encoding': 'gzip, deflate, br',
               'Content-Type': 'application/x-www-form-urlencoded',
               'Origin': 'https://www.sci-hub.ren',
               'Connection': 'keep-alive',
               'Upgrade-Insecure-Requests': '1'}
    data = {'sci-hub-plugin-check': '',
            'request': artName}
    res = requests.post(url, headers=headers, data=data)
    html = res.text
    soup = BeautifulSoup(html, 'html.parser')
    iframe = soup.find(id='pdf')
    if iframe is None:  # no matching article found
        return ''
    else:
        downUrl = iframe['src']
        if 'http' not in downUrl:
            downUrl = 'https:' + downUrl
        return downUrl


def download_article(downUrl):
    '''
    Download a paper from its link
    ----------------------
    Input: paper link
    ----------------------
    Output: the PDF file as bytes
    '''
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
               'Accept-Encoding': 'gzip, deflate, br',
               'Connection': 'keep-alive',
               'Upgrade-Insecure-Requests': '1'}
    res = requests.get(downUrl, headers=headers)
    return res.content


def welcome():
    '''
    Welcome screen
    '''
    os.system('cls' if os.name == 'nt' else 'clear')
    title = r'''
               _____  _____ _____      _    _ _    _ ____
              / ____|/ ____|_   _|    | |  | | |  | |  _ \
             | (___ | |      | |______| |__| | |  | | |_) |
              \___ \| |      | |______|  __  | |  | |  _ <
              ____) | |____ _| |_     | |  | | |__| | |_) |
             |_____/ \_____|_____|    |_|  |_|\____/|____/


            '''
    print(title)


if __name__ == '__main__':
    # Output folder for the PDFs; it must already exist.
    save_dir = r'D:\doc_E\papers'
    results = []  # '1' = downloaded, '0' = not found
    for line in data:
        welcome()
        # request = input('Enter a URL, PMID, DOI or paper title: ')
        request = line.strip()
        # '/' cannot appear in a file name, so replace it with '_'.
        title = request.replace('/', '_')
        print('Searching...')
        downUrl = search_article(request)
        if downUrl == '':
            print('No matching paper found; please try another query.')
            results.append('0')
        else:
            print('Paper link: %s' % downUrl)
            print('Downloading...')
            pdf = download_article(downUrl)
            with open(os.path.join(save_dir, '%s.pdf' % title), 'wb') as f:
                f.write(pdf)
            print('---Download complete---')
            results.append('1')

        time.sleep(0.8)
    print('Download summary: %s' % results)
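
The script above only replaces '/' in titles, but Windows also rejects several other characters in file names (for example ':' and '?'), which often appear in paper titles. The following is a minimal sketch of a more general clean-up step. The helper names safe_filename and pdf_path, the 150-character limit, and the sample DOI are illustrative assumptions and are not part of the original script.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement='_', max_length=150):
    """Return a version of `title` usable as a file name (hypothetical helper)."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end with a dot or a space.
    cleaned = cleaned.strip().rstrip('. ')
    # Assumed limit: keep the name well below common path-length limits.
    cleaned = cleaned[:max_length].rstrip('. ')
    return cleaned or 'untitled'


def pdf_path(folder, title):
    """Build the output path for one paper inside `folder` (hypothetical helper)."""
    return Path(folder) / (safe_filename(title) + '.pdf')


if __name__ == '__main__':
    # '10.1000/example.doi' is a made-up DOI used only for illustration.
    print(safe_filename('10.1000/example.doi'))
    print(pdf_path('papers', 'Li/S batteries: what is next?'))
```

In the main loop, such a helper could stand in for the request.replace("/", "_") call, so that titles containing other reserved characters do not make open() fail.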


