python – KeyError when using a boolean filter on a pandas DataFrame
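I'm trying to combine two data frames where the datetime objects from one data frame fall within a range of datetime objects from the other data frame.
I keep getting: KeyError: 'cannot use a single bool to index into setitem' for this line of code in the second block I posted: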
gametaxidf.loc[arrivemask, 'relevant'] = 1
I assume the same thing would happen on the next line, which uses the same command.
This is the part that is giving me trouble:
with open('/Users/benjaminprice/Desktop/TaxiCombined/Data/combinedtaxifiltered.csv', 'w') as csvfile:
    fieldnames1 = ['index','pickup_datetime', 'dropoff_datetime', 'pickup_long', 'pickup_lat','dropoff_long','dropoff_lat','passenger_count','trip_distance','fare_amount','tip_amount','total_amount','stadium_code']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames1)
    writer.writeheader()
    for index, row in baseballdf.iterrows():
        gametimestart = row['Start.Time']
        gametimeend = row['End.Time']
        arrivemin = gametimestart - datetime.timedelta(minutes=120)
        arrivemax = gametimeend - datetime.timedelta(minutes = 30)
        departmin = gametimeend - datetime.timedelta(minutes = 60)
        departmax = gametimeend + datetime.timedelta(minutes = 90)
        gametaxidf = combineddf[combineddf.DATE==row.DATE]
        gametaxidf['relevant']=0
        for index, row in gametaxidf.iterrows():
            arrivemask = (arrivemin < row['dropoff_datetime']) and (row['dropoff_datetime'] < arrivemax)
            departmask = (departmin < row['pickup_datetime']) and (row['pickup_datetime'] < departmax)
            gametaxidf.loc[arrivemask, 'relevant'] = 1
            gametaxidf.loc[departmask, 'relevant'] = 1
        with open('/Users/benjaminprice/Desktop/TaxiCombined/Data/combinedtaxifiltered.csv','a') as combinedtaxi:
            gametaxidf.to_csv(combinedtaxi,header=None)
        print(str(index) + "done")
Output of gametaxidf.head(5):
   index      pickup_datetime     dropoff_datetime  pickup_long  pickup_lat  \
0    195  2014-04-01 00:08:13  2014-04-01 00:15:32   -73.922218   40.827557
1    344  2014-04-01 00:16:30  2014-04-01 00:20:38   -73.846046   40.754566
2    558  2014-04-01 00:28:59  2014-04-01 00:36:36   -73.921692   40.831394
3    744  2014-04-01 00:42:00  2014-04-01 00:49:46   -73.938080   40.804646
4    776  2014-04-01 00:43:54  2014-04-01 00:53:22   -73.952652   40.810577

   dropoff_long  dropoff_lat  passenger_count  trip_distance  fare_amount  \
0    -73.900620    40.856174                1           2.30          9.0
1    -73.890259    40.753246                1           0.56          4.5
2    -73.942719    40.823257                1           1.53          7.0
3    -73.928490    40.830433                1           2.96         11.0
4    -73.924332    40.827320                1           2.28         10.5

   tip_amount  total_amount  stadium_code        DATE  relevant
0           0          10.0           1.1  2014-04-01         0
1           0           5.5           2.1  2014-04-01         0
2           0           8.0           1.1  2014-04-01         0
3           0          12.0           1.0  2014-04-01         0
4           0          11.5           1.0  2014-04-01         0
I also get this warning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
But it lets me continue past that... any help would be great.
Solution:
Here,
gametaxidf.loc[arrivemask, 'relevant'] = 1
you are trying to set DataFrame values through the .loc indexer. The pandas docs for selecting rows say:
.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:
- A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
- A list or array of labels ['a', 'b', 'c']
- A slice object with labels 'a':'f', (note that contrary to usual python slices, both the start and the stop are included!)
- A boolean array
You are trying to use the last kind of input, but this
arrivemask = (arrivemin < row['dropoff_datetime']) and (row['dropoff_datetime'] < arrivemax)
is a scalar boolean, not an array.
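To make the difference concrete, here is a minimal sketch on a made-up three-row frame. The toy data and the names `df`, `lo`, and `hi` are assumptions for illustration only, not the asker's data. Comparing a value taken from a single row yields one bool, while comparing the whole column yields a boolean Series with one entry per row, which is the kind of input .loc accepts.

```python
import pandas as pd

# Hypothetical toy frame; only the column name comes from the question.
df = pd.DataFrame({
    "dropoff_datetime": pd.to_datetime([
        "2014-04-01 00:15:32",
        "2014-04-01 00:20:38",
        "2014-04-01 00:36:36",
    ]),
})
lo = pd.Timestamp("2014-04-01 00:18:00")  # assumed window start
hi = pd.Timestamp("2014-04-01 00:30:00")  # assumed window end

# Row-wise comparison: one value in, one bool out.
first = df["dropoff_datetime"].iloc[0]
scalar_mask = (lo < first) and (first < hi)

# Column-wise comparison: a boolean Series aligned with df's index.
# Element-wise & replaces the Python keyword `and`, and each
# comparison needs its own parentheses.
series_mask = (lo < df["dropoff_datetime"]) & (df["dropoff_datetime"] < hi)

print(scalar_mask)
print(series_mask.tolist())  # [False, True, False]
```

Passing `series_mask` to .loc selects the rows where it is True; passing `scalar_mask` is what raises the KeyError from the question.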
You don't need to iterate over the data frame. pandas does that for you. Just use:
gametaxidf.loc[
(arrivemin < gametaxidf['dropoff_datetime'])
&
(gametaxidf['dropoff_datetime'] < arrivemax)
, 'relevant'] = 1
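Extending the same idea to both masks, the per-game step could be sketched as follows. The helper name `flag_relevant` and the toy rides are hypothetical; the column names and time windows are copied from the question. Taking an explicit `.copy()` of the filtered frame is one common way to avoid the "A value is trying to be set on a copy of a slice from a DataFrame" warning mentioned in the question. That part is an addition here, not something the answer above covers.

```python
import datetime
import pandas as pd


def flag_relevant(taxidf, gametimestart, gametimeend):
    """Return a copy of taxidf with relevant=1 for rides near the game.

    Hypothetical helper; the windows mirror the ones in the question.
    """
    arrivemin = gametimestart - datetime.timedelta(minutes=120)
    arrivemax = gametimeend - datetime.timedelta(minutes=30)
    departmin = gametimeend - datetime.timedelta(minutes=60)
    departmax = gametimeend + datetime.timedelta(minutes=90)

    # Work on an explicit copy so the assignments below do not
    # target a slice of the caller's frame.
    out = taxidf.copy()
    out["relevant"] = 0

    # Both masks are boolean Series, one entry per ride.
    arrivemask = (arrivemin < out["dropoff_datetime"]) & (out["dropoff_datetime"] < arrivemax)
    departmask = (departmin < out["pickup_datetime"]) & (out["pickup_datetime"] < departmax)

    # A ride is relevant if it matches either window.
    out.loc[arrivemask | departmask, "relevant"] = 1
    return out


# Made-up rides for a game assumed to run from 19:00 to 22:00.
rides = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2014-04-01 10:00:00", "2014-04-01 17:30:00", "2014-04-01 21:45:00",
    ]),
    "dropoff_datetime": pd.to_datetime([
        "2014-04-01 10:20:00", "2014-04-01 17:50:00", "2014-04-01 22:05:00",
    ]),
})
flagged = flag_relevant(
    rides,
    pd.Timestamp("2014-04-01 19:00:00"),
    pd.Timestamp("2014-04-01 22:00:00"),
)
print(flagged["relevant"].tolist())  # [0, 1, 1]
```

In the question's outer loop, this would stand in for the inner `iterrows()` loop, called once per game on `combineddf[combineddf.DATE == row.DATE]`.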