Efficient queries over large multi-list data in Python
I am trying to put together an example of how to manipulate a huge database made of CSV tables with Python. I would like to find a way to simulate an efficient indexed query on tables spread across several list()s.

The following example (Python 2) takes 24 seconds on a 3.2 GHz Core i5:
```
#!/usr/bin/env python
import csv

MAINDIR = "../"
pf = open(MAINDIR + "atp_players.csv")
players = [p for p in csv.reader(pf)]
rf = open(MAINDIR + "atp_rankings_current.csv")
rankings = [r for r in csv.reader(rf)]

for i in rankings[:10]:
    player = filter(lambda x: x[0] == i[2], players)[0]
    print "%s(%s),(%s) Points: %s" % (player[2], player[5], player[3], i[3])
```
This is run against this dataset.

A more efficient or more Pythonic approach would be much appreciated.
Answer:

Instead of reading all the ranking rows, you can use itertools.islice, and use itertools.ifilter rather than building a filtered list:
```
import csv
from itertools import islice, ifilter

MAINDIR = "../"
with open(MAINDIR + "atp_players.csv") as pf, open(MAINDIR + "atp_rankings_current.csv") as rf:
    players = list(csv.reader(pf))
    rankings = csv.reader(rf)
    # only get the first ten rows using islice
    for i in islice(rankings, None, 10):
        # ifilter won't create a list; it yields values on the fly
        player = next(ifilter(lambda x: x[0] == i[2], players), "")
```
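For reference, itertools.ifilter no longer exists in Python 3; the built-in filter is already lazy there. A minimal Python 3 equivalent of the loop above, using made-up in-memory rows instead of the real CSV files:

```python
import csv
from io import StringIO
from itertools import islice

# Hypothetical stand-ins for the two CSV files (column layout assumed:
# players keyed by column 0, rankings referencing that key in column 2).
players = list(csv.reader(StringIO("100,x,Djokovic\n101,x,Federer\n")))
rankings = csv.reader(StringIO("2015,1,100,11360\n2015,2,101,9625\n"))

results = []
for i in islice(rankings, None, 10):
    # In Python 3 the built-in filter() is lazy, just like ifilter was
    player = next(filter(lambda x: x[0] == i[2], players), "")
    results.append(player)
```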
I am not quite sure what filter(lambda x: x[0] == i[2], players)[0] is supposed to be doing; you seem to search the whole players list for every ranking row and keep only the first match. It might pay to sort the list once on the first element and use a binary search, or to build a dict with the first element as the key and the row as the value and then just do lookups:
```
import csv
from itertools import islice
from collections import OrderedDict

MAINDIR = "../"
with open(MAINDIR + "atp_players.csv") as pf, open(MAINDIR + "atp_rankings_current.csv") as rf:
    players = OrderedDict((row[0], row) for row in csv.reader(pf))
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 10):
        # now getting a row is constant-time work as opposed to O(n)
        player = players.get(i[2])
```
You will have to decide what default value to use with .get, or whether you need one at all.
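To illustrate the default-value choice, a tiny sketch with one hypothetical row: .get returns None for a missing key unless you pass an explicit fallback.

```python
# The lookup dict from above, shrunk to one hypothetical row.
players = {"100": ["100", "x", "Djokovic"]}

# .get() returns None for a missing key by default...
missing_none = players.get("999")

# ...or you can supply an explicit sentinel row instead.
placeholder = ["999", "", "Unknown player"]
missing_row = players.get("999", placeholder)
```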
If there can be duplicate keys at the start of the rows and you only want to keep the first occurrence:
```
import csv
from itertools import islice

MAINDIR = "../"
with open(MAINDIR + "atp_players.csv") as pf, open(MAINDIR + "atp_rankings_current.csv") as rf:
    players = {}
    for row in csv.reader(pf):
        key = row[0]
        if key in players:
            continue
        players[key] = row
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 10):
        player = players.get(i[2])
```
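The explicit membership test above can also be written with dict.setdefault, which only stores a value when the key is absent. A small sketch with hypothetical rows, two of which share a key:

```python
# Hypothetical rows: two share the key "100"; the first one should win.
rows = [["100", "Djokovic"], ["100", "duplicate"], ["101", "Federer"]]

players = {}
for row in rows:
    # setdefault stores row only if the key is not yet present,
    # so the first occurrence wins without an explicit `in` check
    players.setdefault(row[0], row)
```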
Output:
```
Djokovic(SRB),(R) Points: 11360
Federer(SUI),(R) Points: 9625
Nadal(ESP),(L) Points: 6585
Wawrinka(SUI),(R) Points: 5120
Nishikori(JPN),(R) Points: 5025
Murray(GBR),(R) Points: 4675
Berdych(CZE),(R) Points: 4600
Raonic(CAN),(R) Points: 4440
Cilic(CRO),(R) Points: 4150
Ferrer(ESP),(R) Points: 4045
```
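The other option mentioned earlier, sorting once and binary-searching, was not shown above. A minimal Python 3 sketch of it with made-up rows (column 0 assumed to be the lookup key, as in the question):

```python
import bisect

# Hypothetical players rows, column 0 being the id used for lookups.
players = [["101", "x", "Federer"], ["100", "x", "Djokovic"], ["102", "x", "Nadal"]]

# Sort once on the key column, then keep a parallel list of keys for bisect.
players.sort(key=lambda row: row[0])
keys = [row[0] for row in players]

def lookup(player_id):
    """Return the row for player_id in O(log n), or None if absent."""
    i = bisect.bisect_left(keys, player_id)
    if i < len(keys) and keys[i] == player_id:
        return players[i]
    return None
```

This keeps memory flat (no second dict) at the cost of O(log n) rather than O(1) lookups.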
Timing the code for ten ranking rows shows ifilter is the fastest, but as we take more rows the dict wins, and you can see how badly your own code scales:
```
In [33]: %%timeit
MAINDIR = "tennis_atp-master/"
pf = open("/tennis_atp-master/atp_players.csv")
players = [p for p in csv.reader(pf)]
rf = open("/tennis_atp-master/atp_rankings_current.csv")
rankings = [r for r in csv.reader(rf)]
for i in rankings[:10]:
    player = filter(lambda x: x[0] == i[2], players)[0]
   ....:
10 loops, best of 3: 123 ms per loop
```
```
In [34]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = list(csv.reader(pf))
    rankings = csv.reader(rf)
    # only get the first ten rows using islice
    for i in islice(rankings, None, 10):
        # ifilter won't create a list; it yields values on the fly
        player = next(ifilter(lambda x: x[0] == i[2], players), "")
   ....:
10 loops, best of 3: 43.6 ms per loop
```
```
In [35]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = {}
    for row in csv.reader(pf):
        key = row[0]
        if key in players:
            continue
        players[key] = row
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 10):
        player = players.get(i[2])
        pass
   ....:
10 loops, best of 3: 50.7 ms per loop
```
Now with 100 ranking rows you see the dict is as fast as it was for 10; the one-off cost of building the dict has been offset by the constant-time lookups:
```
In [38]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = list(csv.reader(pf))
    rankings = csv.reader(rf)
    # only get the first hundred rows using islice
    for i in islice(rankings, None, 100):
        # ifilter won't create a list; it yields values on the fly
        player = next(ifilter(lambda x: x[0] == i[2], players), "")
   ....:
10 loops, best of 3: 120 ms per loop
```
```
In [39]: %%timeit
with open("/tennis_atp-master/atp_players.csv") as pf, open("/tennis_atp-master/atp_rankings_current.csv") as rf:
    players = {}
    for row in csv.reader(pf):
        key = row[0]
        if key in players:
            continue
        players[key] = row
    rankings = csv.reader(rf)
    for i in islice(rankings, None, 100):
        player = players.get(i[2])
        pass
   ....:
10 loops, best of 3: 50.7 ms per loop
```
```
In [40]: %%timeit
MAINDIR = "tennis_atp-master/"
pf = open("/tennis_atp-master/atp_players.csv")
players = [p for p in csv.reader(pf)]
rf = open("/tennis_atp-master/atp_rankings_current.csv")
rankings = [r for r in csv.reader(rf)]
for i in rankings[:100]:
    player = filter(lambda x: x[0] == i[2], players)[0]
   ....:
1 loops, best of 3: 806 ms per loop
```
For 250 ranking rows:

```
# your code
1 loops, best of 3: 1.86 s per loop
# dict
10 loops, best of 3: 50.7 ms per loop
# ifilter
10 loops, best of 3: 483 ms per loop
```
A final test looping over the whole rankings file:

```
# your code
1 loops, best of 3: 2min 40s per loop
# dict
10 loops, best of 3: 67 ms per loop
# ifilter
1 loops, best of 3: 1min 3s per loop
```
So you can see that as we loop over more of the rankings, the dict option is by far the most efficient at runtime and will scale very well.
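The same scaling effect can be reproduced without the dataset. A minimal Python 3 sketch with synthetic rows (sizes and names made up) that compares one linear scan per lookup against a one-off dict build plus constant-time access:

```python
import timeit

# Synthetic stand-ins for the CSV rows: 10,000 rows keyed by column 0.
rows = [[str(n), "player%d" % n] for n in range(10000)]
index = {row[0]: row for row in rows}   # one-off O(n) build
target = "9999"                          # worst case for a linear scan

def scan():
    # O(n) per lookup: effectively what filter(...)[0] does
    return next(r for r in rows if r[0] == target)

def lookup():
    # O(1) per lookup: a plain dict access
    return index[target]

assert scan() == lookup()
scan_t = timeit.timeit(scan, number=200)
look_t = timeit.timeit(lookup, number=200)
```

Repeated lookups amortize the dict's build cost very quickly, which is what the timings above show.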