python – How do I preprocess new instances for classification so that the feature encoding matches the model's in scikit-learn?

I am building a multi-class classification model on data that has 6 features. I preprocess the data with LabelEncoder using the code below.

import numpy as np
from sklearn import preprocessing

# Encodes the data for each categorical column.
def pre_process_data(self):
    self.encode_column('feedback_rating')
    self.encode_column('location')
    self.encode_column('condition_id')
    self.encode_column('auction_length')
    self.encode_column('model')
    self.encode_column('gb')

# Gets the column by name, label-encodes its values and writes the encoded
# column back. Note: the fitted LabelEncoder is local and is discarded afterwards.
def encode_column(self, name):
    le = preprocessing.LabelEncoder()
    current_column = np.array(self.X_df[name]).tolist()
    self.X_df[name] = le.fit_transform(current_column)

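When I want to predict a new instance, I need to transform the new instance's data so that its features use the same encoding as those in the model. Is there a simple way to achieve this?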
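To illustrate why the stored encoding matters: LabelEncoder assigns codes by sorting the labels it sees during fit, so refitting a fresh encoder on a new instance alone generally produces different codes. A small sketch with hypothetical values for one categorical column:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training values for one categorical column
train_models = ["iphone", "galaxy", "pixel"]
# A hypothetical new instance to classify
new_models = ["pixel"]

# Encoder fitted on the training data: classes are sorted -> galaxy=0, iphone=1, pixel=2
train_le = LabelEncoder().fit(train_models)
print(train_le.transform(new_models))  # [2]

# Refitting a fresh encoder on the new instance alone assigns a different code
fresh_le = LabelEncoder().fit(new_models)
print(fresh_le.transform(new_models))  # [0]
```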

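Also, if I want to persist the model and retrieve it later, is there a simple way to save the encoding so that I can use it to transform new instances for the retrieved model?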

Solution:

When I want to predict a new instance I need to transform the data of the new instance so that the features match the same encoding as those in the model. Is there a simple way of achieving this?

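I'm not entirely sure how your classification "pipeline" works, but you can call transform on your already-fitted LabelEncoder with new data: the new data is transformed as long as its labels were present in the training set.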
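For the question's multi-column setup, this means keeping one fitted encoder per column instead of discarding it. A minimal sketch, assuming the features live in a pandas DataFrame; the column names and values are hypothetical, and encode_new is a helper introduced here for illustration, not a scikit-learn API:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical columns, loosely modelled on the question
COLUMNS = ["location", "model"]

train_df = pd.DataFrame({
    "location": ["US", "UK", "US", "DE"],
    "model": ["galaxy", "iphone", "pixel", "iphone"],
})

# Fit one LabelEncoder per column and keep it, keyed by column name
encoders = {}
for name in COLUMNS:
    le = LabelEncoder()
    train_df[name] = le.fit_transform(train_df[name])
    encoders[name] = le

def encode_new(df, encoders):
    """Apply the stored, already-fitted encoders to new rows (transform only, no refit)."""
    df = df.copy()
    for name, le in encoders.items():
        df[name] = le.transform(df[name])
    return df

# A new instance is encoded with the same mapping as the training data
new_df = pd.DataFrame({"location": ["UK"], "model": ["pixel"]})
print(encode_new(new_df, encoders))
```

The encoders dictionary is also the object you would persist alongside the model.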

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

# training data (NumPy converts this mixed list to strings: '0', '1', ..., 'false')
train_x = [0,1,2,6,'true','false']
le.fit_transform(train_x)
# array([0, 1, 2, 3, 5, 4])

# transform some new data
new_x = [0,0,0,2,2,2,'false']
le.transform(new_x)
# array([0, 0, 0, 2, 2, 2, 4])

# transform data containing a label not seen during fit
bad_x = [0,2,6,'new_word']
le.transform(bad_x)
# raises ValueError: 'new_word' is a previously unseen label (exact message varies by version)
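If new instances may contain categories that never appeared in training, one alternative (not part of the original answer) is OrdinalEncoder. scikit-learn's documentation recommends it over LabelEncoder for input features, and since version 0.24 it can map unseen categories to a sentinel value instead of raising. A sketch with hypothetical data:

```python
from sklearn.preprocessing import OrdinalEncoder

# OrdinalEncoder expects 2-D input: one column per feature
train_X = [["US", "galaxy"], ["UK", "iphone"], ["US", "pixel"]]
# The second new row contains categories never seen during fit
new_X = [["UK", "pixel"], ["FR", "nokia"]]

# Unseen categories become -1 instead of raising (requires scikit-learn >= 0.24)
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train_X)
print(enc.transform(new_X))
```

Whether -1 is a sensible input for the downstream classifier depends on the model; it only avoids the ValueError.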

Also if I want to persist the model and retrieve it, then is there a simple way of saving the encoding format, in order to use it to transform new instances on the retrieved model?

You can save the model, or individual parts of it such as the encoder, like this:

import pickle
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the joblib package
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
train_x = [0,1,2,6,'true','false']
le.fit(train_x)

# Save your encoding
joblib.dump(le, '/path/to/save/model')
# OR
with open('/path/to/model', 'wb') as f:
    pickle.dump(le, f)

# Load those encodings
le = joblib.load('/path/to/save/model')
# OR
with open('/path/to/model', 'rb') as f:
    le = pickle.load(f)

# Then use as normal
new_x = [0,0,0,2,2,2,'false']
le.transform(new_x)
# array([0, 0, 0, 2, 2, 2, 4])
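A variation, not from the original answer: wrapping the encoder and the classifier in a single Pipeline means one joblib.dump saves both, so the reloaded model always applies the training-time encoding to raw new instances. A sketch under assumptions: the column names and data are hypothetical, LogisticRegression stands in for whatever classifier is actually used, and OrdinalEncoder with handle_unknown="use_encoded_value" needs scikit-learn 0.24 or later.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training data: two categorical features and a class label
X = pd.DataFrame({
    "location": ["US", "UK", "US", "DE", "UK", "DE"],
    "model": ["galaxy", "iphone", "pixel", "iphone", "galaxy", "pixel"],
})
y = [0, 1, 0, 2, 1, 2]

# The encoder and the classifier are fitted together and stored as one object
pipe = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
          ["location", "model"])]
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Persist and reload the whole pipeline (a temporary path stands in for a real one)
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
loaded = joblib.load(path)

# Raw new instances are encoded with the stored mapping before prediction
new = pd.DataFrame({"location": ["UK"], "model": ["pixel"]})
print(loaded.predict(new))
```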


Article link: https://qyyshop.com/info/823564.html

Source: anonymous
