Python Pandas自合并以合并笛卡尔积,以产生所有组合和总和
内容导读
互联网集市收集整理的这篇技术教程文章主要介绍了Python Pandas自合并以合并笛卡尔积,以产生所有组合和总和,小编现在分享给大家,供广大互联网技能从业者学习和参考。文章包含5895字,纯文字阅读大概需要9分钟。
内容图文
我是Python的新手,似乎它具有很大的灵活性,并且比传统的RDBMS系统快.
建立一个非常简单的过程以创建随机的幻想团队.我来自RDBMS背景(Oracle SQL),对于这种数据处理来说似乎并不是最佳选择.
我使用从csv文件读取的熊猫制作了一个数据框,现在有一个包含两列的简单数据框-Player,Salary:
` Name Salary
0 Jason Day 11700
1 Dustin Johnson 11600
2 Rory McIlroy 11400
3 Jordan Spieth 11100
4 Henrik Stenson 10500
5 Phil Mickelson 10200
6 Justin Rose 9800
7 Adam Scott 9600
8 Sergio Garcia 9400
9 Rickie Fowler 9200`
我想通过python(pandas)做的是产生6位玩家的所有组合,其薪水在一定的范围45000-50000之间.
在查找python选项时,我发现itertools组合很有趣,但是它会生成大量的组合列表,而不会过滤薪水总和.
在传统的SQL中,我将使用SUM进行大规模的合并笛卡尔联接,但随后又使播放器处于不同位置.
如A,B,C然后,C,B,A.
我的传统SQL表现不佳,如下所示:
` SELECT distinct
ONE.name AS "1",
TWO.name AS "2",
THREE.name AS "3",
FOUR.name AS "4",
FIVE.name AS "5",
SIX.name AS "6",
sum(one.salary + two.salary + three.salary + four.salary + five.salary + six.salary) as salary
FROM
nl.pgachamp2 ONE,nl.pgachamp2 TWO,nl.pgachamp2 THREE, nl.pgachamp2 FOUR,nl.pgachamp2 FIVE,nl.pgachamp2 SIX
where ONE.name != TWO.name
and ONE.name != THREE.name
and one.name != four.name
and one.name != five.name
and TWO.name != THREE.name
and TWO.name != four.name
and two.name != five.name
and TWO.name != six.name
and THREE.name != four.name
and THREE.name != five.name
and three.name != six.name
and five.name != six.name
and four.name != six.name
and four.name != five.name
and one.name != six.name
group by ONE.name, TWO.name, THREE.name, FOUR.name, FIVE.name, SIX.name`
有没有办法在Pandas / Python中做到这一点?
可以指向的任何文档都很棒!
解决方法:
这是使用简单算法的非熊猫解决方案.它从按薪水排序的球员列表中递归生成组合.这样就可以跳过生成超过薪资上限的组合.
正如piRSquared所提到的,在问题中所述的薪水限制内没有6人的球队,因此我选择了限制以产生少量球队.
#!/usr/bin/env python3
''' Limited combinations
Generate combinations of players whose combined salaries fall within given limits
See https://stackoverflow.com/q/38636460/4014959
Written by PM 2Ring 2016.07.28
'''
data = '''\
0 Jason Day 11700
1 Dustin Johnson 11600
2 Rory McIlroy 11400
3 Jordan Spieth 11100
4 Henrik Stenson 10500
5 Phil Mickelson 10200
6 Justin Rose 9800
7 Adam Scott 9600
8 Sergio Garcia 9400
9 Rickie Fowler 9200
'''
data = [s.split() for s in data.splitlines()]
all_players = [(' '.join(u[1:-1]), int(u[-1])) for u in data]
all_players.sort(key=lambda t: t[1])
for i, row in enumerate(all_players):
print(i, row)
print('- '*40)
def choose_teams(free, num, team=(), value=0):
num -= 1
for i, p in enumerate(free):
salary = all_players[p][1]
newvalue = value + salary
if newvalue <= hi:
newteam = team + (p,)
if num == 0:
if newvalue >= lo:
yield newteam, newvalue
else:
yield from choose_teams(free[i+1:], num, newteam, newvalue)
else:
break
#Salary limits
lo, hi = 55000, 60500
#Indices of players that can be chosen for a team
free = tuple(range(len(all_players)))
for i, (t, s) in enumerate(choose_teams(free, 6), 1):
team = [all_players[p] for p in t]
names, sals = zip(*team)
assert sum(sals) == s
print(i, t, names, s)
输出
0 ('Rickie Fowler', 9200)
1 ('Sergio Garcia', 9400)
2 ('Adam Scott', 9600)
3 ('Justin Rose', 9800)
4 ('Phil Mickelson', 10200)
5 ('Henrik Stenson', 10500)
6 ('Jordan Spieth', 11100)
7 ('Rory McIlroy', 11400)
8 ('Dustin Johnson', 11600)
9 ('Jason Day', 11700)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 (0, 1, 2, 3, 4, 5) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson') 58700
2 (0, 1, 2, 3, 4, 6) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Jordan Spieth') 59300
3 (0, 1, 2, 3, 4, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Rory McIlroy') 59600
4 (0, 1, 2, 3, 4, 8) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Dustin Johnson') 59800
5 (0, 1, 2, 3, 4, 9) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Jason Day') 59900
6 (0, 1, 2, 3, 5, 6) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Jordan Spieth') 59600
7 (0, 1, 2, 3, 5, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Rory McIlroy') 59900
8 (0, 1, 2, 3, 5, 8) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Dustin Johnson') 60100
9 (0, 1, 2, 3, 5, 9) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Henrik Stenson', 'Jason Day') 60200
10 (0, 1, 2, 3, 6, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Justin Rose', 'Jordan Spieth', 'Rory McIlroy') 60500
11 (0, 1, 2, 4, 5, 6) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Phil Mickelson', 'Henrik Stenson', 'Jordan Spieth') 60000
12 (0, 1, 2, 4, 5, 7) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Phil Mickelson', 'Henrik Stenson', 'Rory McIlroy') 60300
13 (0, 1, 2, 4, 5, 8) ('Rickie Fowler', 'Sergio Garcia', 'Adam Scott', 'Phil Mickelson', 'Henrik Stenson', 'Dustin Johnson') 60500
14 (0, 1, 3, 4, 5, 6) ('Rickie Fowler', 'Sergio Garcia', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson', 'Jordan Spieth') 60200
15 (0, 1, 3, 4, 5, 7) ('Rickie Fowler', 'Sergio Garcia', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson', 'Rory McIlroy') 60500
16 (0, 2, 3, 4, 5, 6) ('Rickie Fowler', 'Adam Scott', 'Justin Rose', 'Phil Mickelson', 'Henrik Stenson', 'Jordan Spieth') 60400
如果您使用的旧版Python不支持语法产生的收益,则可以替换
yield from choose_teams(free[i+1:], num, newteam, newvalue)
与
for t, v in choose_teams(free[i+1:], num, newteam, newvalue):
yield t, v
内容总结
以上是互联网集市为您收集整理的Python Pandas自合并以合并笛卡尔积,以产生所有组合和总和全部内容,希望文章能够帮你解决Python Pandas自合并以合并笛卡尔积,以产生所有组合和总和所遇到的程序开发问题。 如果觉得互联网集市技术教程内容还不错,欢迎将互联网集市网站推荐给程序员好友。
内容备注
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 gblab@vip.qq.com 举报,一经查实,本站将立刻删除。
内容手机端
扫描二维码推送至手机访问。