python - Pandas merge error: TypeError: '>' not supported between instances of 'int' and 'str'
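I have a dataset made up of several tables, each with a country, a year, and some indicators. I converted all the Excel sheets to CSV files and then merged them into a single table.
The problem is that some tables refuse to merge and fail with the message TypeError: '>' not supported between instances of 'int' and 'str'
I have tried everything I could think of, with no luck: the same error still appears!
I have also tried hundreds of different files, and dozens of them still hit this problem.
The sample files are file17.csv and file35.csv (in case anyone needs to reproduce it). Here is the code I used: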
# To load the first file
import pandas as pd
filename1 = 'file17.csv'
df1 = pd.read_csv(filename1, encoding='cp1252', low_memory=False)
df1.set_index(['Country', 'Year'], inplace=True)
df1.dropna(axis=0, how='all', inplace=True)
df1.head()
Out >>>
+-------------+------+--------+--------+
| | | ind500 | ind356 |
| Country | Year | | |
| Afghanistan | 1800 | 603.0 | NaN |
| | 1801 | 603.0 | NaN |
| | 1802 | 603.0 | NaN |
| | 1803 | 603.0 | NaN |
| | 1804 | 603.0 | NaN |
+-------------+------+--------+--------+
In >>>
# To load the second file
filename2 = 'file35.csv'
df2 = pd.read_csv(filename2, encoding='cp1252', low_memory=False)
df2.set_index(['Country', 'Year'], inplace=True)
df2.dropna(axis=0, how='all', inplace=True)
df2.head()
Out >>>
# To merge the two dataframes
gross_df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
gross_df.dropna(axis=0, how='all', inplace=True)
print (gross_df.shape)
gross_df.to_csv('merged.csv')
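Important note:
I noticed that in all the files that merge successfully, the column names appear in ascending order (e.g. ind001, ind009, ind012), because they are sorted automatically. The files that fail have one or more columns out of order, such as ind500 followed by ind356 in the first table; the same applies to the second sample provided.
Note that both DataFrames use the same two index levels (Country and Year).
Error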
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\algorithms.py in safe_sort(values, labels, na_sentinel, assume_unique)
480 try:
--> 481 sorter = values.argsort()
482 ordered = values.take(sorter)
TypeError: '>' not supported between instances of 'int' and 'str'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-11-960b2698de60> in <module>()
----> 1 gross_df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer', sort=False)
2 gross_df.dropna(axis=0, how='all', inplace=True)
3 print (gross_df.shape)
4 gross_df.to_csv('merged.csv')
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
52 right_index=right_index, sort=sort, suffixes=suffixes,
53 copy=copy, indicator=indicator)
---> 54 return op.get_result()
55
56
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in get_result(self)
567 self.left, self.right)
568
--> 569 join_index, left_indexer, right_indexer = self._get_join_info()
570
571 ldata, rdata = self.left._data, self.right._data
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _get_join_info(self)
720 join_index, left_indexer, right_indexer = \
721 left_ax.join(right_ax, how=self.how, return_indexers=True,
--> 722 sort=self.sort)
723 elif self.right_index and self.how == 'left':
724 join_index, left_indexer, right_indexer = \
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\indexes\base.py in join(self, other, how, level, return_indexers, sort)
2995 else:
2996 return self._join_non_unique(other, how=how,
-> 2997 return_indexers=return_indexers)
2998 elif self.is_monotonic and other.is_monotonic:
2999 try:
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\indexes\base.py in _join_non_unique(self, other, how, return_indexers)
3076 left_idx, right_idx = _get_join_indexers([self.values],
3077 [other._values], how=how,
-> 3078 sort=True)
3079
3080 left_idx = _ensure_platform_int(left_idx)
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
980
981 # get left & right join labels and num. of levels at each location
--> 982 llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
983
984 # get flat i8 keys from label lists
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _factorize_keys(lk, rk, sort)
1409 if sort:
1410 uniques = rizer.uniques.to_array()
-> 1411 llab, rlab = _sort_labels(uniques, llab, rlab)
1412
1413 # NA group
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _sort_labels(uniques, left, right)
1435 labels = np.concatenate([left, right])
1436
-> 1437 _, new_labels = algos.safe_sort(uniques, labels, na_sentinel=-1)
1438 new_labels = _ensure_int64(new_labels)
1439 new_left, new_right = new_labels[:l], new_labels[l:]
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\algorithms.py in safe_sort(values, labels, na_sentinel, assume_unique)
483 except TypeError:
484 # try this anyway
--> 485 ordered = sort_mixed(values)
486
487 # labels:
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\algorithms.py in sort_mixed(values)
469 str_pos = np.array([isinstance(x, string_types) for x in values],
470 dtype=bool)
--> 471 nums = np.sort(values[~str_pos])
472 strs = np.sort(values[str_pos])
473 return _ensure_object(np.concatenate([nums, strs]))
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\numpy\core\fromnumeric.py in sort(a, axis, kind, order)
820 else:
821 a = asanyarray(a).copy(order="K")
--> 822 a.sort(axis=axis, kind=kind, order=order)
823 return a
824
TypeError: '>' not supported between instances of 'int' and 'str'
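Solution:
This error indicates that the indexes of the DataFrames being merged have different dtypes: one of them contains integers and the other contains strings, so pandas cannot sort the combined keys.
Demo - how to convert a string index level to int: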
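Before converting anything, it can help to confirm which index level is affected and which values are to blame. The following is a minimal diagnostic sketch, not part of the original answer. The inline CSV text stands in for the real files, the repeated header row is only one possible cause of a mixed-type column, and the helper names `level_dtypes` and `non_numeric_values` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files. The stray "Country,Year,..."
# row in csv_b mimics a repeated header line, which forces the Year column
# to be read as strings instead of integers.
csv_a = "Country,Year,ind500\nAfghanistan,1800,603.0\nAfghanistan,1801,603.0\n"
csv_b = (
    "Country,Year,ind456\n"
    "Albania,1967,2015.9\n"
    "Country,Year,ind456\n"
    "Albania,1968,2100.0\n"
)

df_a = pd.read_csv(io.StringIO(csv_a)).set_index(["Country", "Year"])
df_b = pd.read_csv(io.StringIO(csv_b)).set_index(["Country", "Year"])


def level_dtypes(df):
    """Return {level name: dtype string} for every index level."""
    return {
        name: str(df.index.get_level_values(name).dtype)
        for name in df.index.names
    }


def non_numeric_values(df, level):
    """Return the distinct values in an index level that do not parse as numbers."""
    values = df.index.get_level_values(level)
    mask = pd.to_numeric(values, errors="coerce").isna()
    return sorted(set(values[mask]))


print(level_dtypes(df_a))
print(level_dtypes(df_b))
print(non_numeric_values(df_b, "Year"))
```

If the Year level is an integer dtype in one frame and a string dtype in the other, that mismatch is the likely trigger for the TypeError, and the values listed by `non_numeric_values` are the rows to clean or drop.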
In [183]: df
Out[183]:
0 1 2 3
bar 1 -0.205037 0.762509 0.816608 -1.057907
2 1.249104 0.338777 -0.982084 0.329330
baz 1 0.845695 -0.996365 0.548100 -0.113733
2 1.247092 -2.674061 -0.071993 -0.734242
foo 1 -1.233825 -0.195377 -0.240303 1.168055
2 -0.108942 -0.615612 -1.299512 0.908641
qux 1 0.844421 0.251425 -0.506877 1.307800
2 0.038580 0.045072 -0.262974 0.629804
In [184]: df.index
Out[184]:
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['1', '2']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]])
In [185]: df.index.get_level_values(1)
Out[185]: Index(['1', '2', '1', '2', '1', '2', '1', '2'], dtype='object')
In [187]: df.index = df.index.set_levels(df.index.get_level_values(1) \
.map(lambda x: pd.to_numeric(x, errors='coerce')), level=1)
Result:
In [189]: df.index.get_level_values(1)
Out[189]: Int64Index([1, 2, 1, 2, 1, 2, 1, 2], dtype='int64')
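An alternative way to get the same result is to move the index back into ordinary columns, convert the column, and rebuild the index. This is a sketch under assumptions rather than the answerer's method: the small frame and the helper name `index_level_to_numeric` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose second index level holds strings, as in the demo.
idx = pd.MultiIndex.from_product(
    [["bar", "baz"], ["1", "2"]], names=["key", "num"]
)
df = pd.DataFrame({"value": [10.0, 20.0, 30.0, 40.0]}, index=idx)


def index_level_to_numeric(frame, level):
    """Return a copy of `frame` with one named index level parsed as numbers."""
    names = list(frame.index.names)
    flat = frame.reset_index()
    # Values that cannot be parsed become NaN instead of raising.
    flat[level] = pd.to_numeric(flat[level], errors="coerce")
    return flat.set_index(names)


converted = index_level_to_numeric(df, "num")
print(converted.index.get_level_values("num").dtype)
```

Working on a plain column avoids having to reason about the levels and codes of the MultiIndex, at the cost of rebuilding the index once.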
Update: try this:
In [247]: d1 = pd.read_csv('https://docs.google.com/uc?id=1jUsbr5pw6sUMvewI4fmbpssroG4RZ7LE&export=download', index_col=[0,1])
In [248]: d2 = pd.read_csv('https://docs.google.com/uc?id=1Ufx6pvnSC6zQdTAj05ObmV027fA4-Mr3&export=download', index_col=[0,1])
In [249]: d2 = d2[pd.to_numeric(d2.index.get_level_values(1), errors='coerce').notna()]
In [250]: d2.index = d2.index.set_levels(d2.index.get_level_values(1).map(lambda x: pd.to_numeric(x, errors='coerce')), level=1)
In [251]: d1.reset_index().merge(d2.reset_index(), on=['Country','Year'], how='outer').set_index(['Country','Year'])
Out[251]:
ind500 ind356 ind475 ind476 ind456
Country Year
Afghanistan 1800 603.0 NaN NaN NaN NaN
1801 603.0 NaN NaN NaN NaN
1802 603.0 NaN NaN NaN NaN
1803 603.0 NaN NaN NaN NaN
1804 603.0 NaN NaN NaN NaN
1805 603.0 NaN NaN NaN NaN
1806 603.0 NaN NaN NaN NaN
1807 603.0 NaN NaN NaN NaN
1808 603.0 NaN NaN NaN NaN
1809 603.0 NaN NaN NaN NaN
... ... ... ... ... ...
Bahamas, The 1967 NaN NaN NaN NaN 18381.131314
Gambia, The 1967 NaN NaN NaN NaN 937.355288
Korea, Dem. Rep. 1967 NaN NaN NaN NaN 1428.689253
Lao PDR 1967 NaN NaN NaN NaN 1412.359955
Netherlands Antilles 1967 NaN NaN NaN NaN 14076.731352
Russian Federation 1967 NaN NaN NaN NaN 11794.726437
Serbia and Montenegro 1967 NaN NaN NaN NaN 2987.080489
Syrian Arab Republic 1967 NaN NaN NaN NaN 2015.913906
Yemen, Rep. 1967 NaN NaN NaN NaN 1075.693355
Bahamas, The 1968 NaN NaN NaN NaN 18712.082830
[46607 rows x 5 columns]
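The steps above (drop rows whose Year is not numeric, convert Year to a number, then merge on the key columns) could be wrapped into small reusable helpers. The sketch below is an illustration only: the inline CSV text, the "unknown" Year value, and the function names `load_clean` and `merge_tables` are assumptions, not taken from the original files.

```python
import io
import pandas as pd

KEYS = ["Country", "Year"]


def load_clean(source):
    """Read a CSV, parse Year as a number, and drop rows with no usable Year."""
    df = pd.read_csv(source)
    df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
    # Rows whose Year could not be parsed are NaN now; remove them, then
    # store Year as an integer so both tables share the same key dtype.
    return df.dropna(subset=["Year"]).astype({"Year": "int64"})


def merge_tables(left, right):
    """Outer-merge two cleaned tables on Country and Year."""
    return left.merge(right, on=KEYS, how="outer").set_index(KEYS)


# Hypothetical sample data; "unknown" stands in for whatever non-numeric
# value made the real Year column a string column.
csv_left = "Country,Year,ind500\nAfghanistan,1800,603.0\nAfghanistan,1801,603.0\n"
csv_right = (
    "Country,Year,ind456\n"
    "Afghanistan,1801,1.5\n"
    "Albania,unknown,9.9\n"
    "Albania,1967,2015.9\n"
)

left = load_clean(io.StringIO(csv_left))
right = load_clean(io.StringIO(csv_right))
merged = merge_tables(left, right)
print(merged)
```

Cleaning each file as it is loaded keeps the merge step itself simple, which may be convenient when the same merge has to be repeated across many files.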