python - AttributeError: 'numpy.ndarray' object has no attribute 'toarray'
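I am extracting features from a text corpus, using a tf-idf vectorizer and truncated singular value decomposition from scikit-learn. The algorithm I want to try requires dense matrices and the vectorizer returns sparse matrices, so I need to convert those matrices to dense arrays. However, whenever I try to convert them, I get an error telling me that my numpy array object has no attribute 'toarray'. What am I doing wrong?
The function: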
def feature_extraction(train, train_test, test_set):
    vectorizer = TfidfVectorizer(min_df=3, strip_accents="unicode", analyzer="word", token_pattern=r'\w{1,}', ngram_range=(1, 2))
    print("fitting Vectorizer")
    vectorizer.fit(train)
    print("transforming text")
    train = vectorizer.transform(train)
    train_test = vectorizer.transform(train_test)
    test_set = vectorizer.transform(test_set)
    print("Dimensionality reduction")
    svd = TruncatedSVD(n_components=100)
    svd.fit(train)
    train = svd.transform(train)
    train_test = svd.transform(train_test)
    test_set = svd.transform(test_set)
    print("convert to dense array")
    train = train.toarray()
    test_set = test_set.toarray()
    train_test = train_test.toarray()
    print(train.shape)
    return train, train_test, test_set
Traceback:
Traceback (most recent call last):
  File "C:\Users\Anonymous\workspace\final_submission\src\linearSVM.py", line 24, in <module>
    x_train,x_test,test_set = feature_extraction(x_train,x_test,test_set)
  File "C:\Users\Anonymous\workspace\final_submission\src\Preprocessing.py", line 57, in feature_extraction
    train = train.toarray()
AttributeError: 'numpy.ndarray' object has no attribute 'toarray'
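Update:
Willy pointed out that my assumption that the matrix is sparse might be wrong. So I tried feeding my data to the algorithm with dimensionality reduction, and it actually worked without any conversion. However, when I leave out the dimensionality reduction, which gives me about 53k features, I get the following error: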
Traceback (most recent call last):
  File "C:\Users\Anonymous\workspace\final_submission\src\linearSVM.py", line 28, in <module>
    result = bayesian_ridge(x_train,x_test,y_train,y_test,test_set)
  File "C:\Users\Anonymous\workspace\final_submission\src\Algorithms.py", line 84, in bayesian_ridge
    algo = algo.fit(x_train,y_train[:,i])
  File "C:\Python27\lib\site-packages\sklearn\linear_model\bayes.py", line 136, in fit
    dtype=np.float)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 220, in check_arrays
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
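Can someone explain this?
Update 2:
As requested, here is all the code involved. Since it is spread across different files, I will post it step by step. For clarity, I will leave out all the module imports.
This is how I preprocess the text: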
def regexp(data):
    for row in range(len(data)):
        data[row] = re.sub(r'[\W_]+', " ", data[row])
    return data

def clean_the_text(data):
    alist = []
    data = nltk.word_tokenize(data)
    for j in data:
        j = j.lower()
        alist.append(j.rstrip('\n'))
    alist = " ".join(alist)
    return alist

def loop_data(data):
    for i in range(len(data)):
        data[i] = clean_the_text(data[i])
    return data

if __name__ == "__main__":
    print("loading train")
    train_text = porter_stemmer(loop_data(regexp(list(np.array(p.read_csv(os.path.join(dir,"train.csv")))[:,1]))))
    print("loading test_set")
    test_set = porter_stemmer(loop_data(regexp(list(np.array(p.read_csv(os.path.join(dir,"test.csv")))[:,1]))))
After splitting train_set into x_train and x_test for cross-validation, I transform the data with the feature_extraction function above.
x_train,x_test,test_set = feature_extraction(x_train,x_test,test_set)
Finally, I feed them into my algorithm:
def bayesian_ridge(x_train, x_test, y_train, y_test, test_set):
    algo = linear_model.BayesianRidge()
    algo = algo.fit(x_train, y_train)
    pred = algo.predict(x_test)
    error = pred - y_test
    result.append(algo.predict(test_set))
    print("Bayes_error: ", cross_val(error))
    return result
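Answer:
TruncatedSVD.transform returns an array, not a sparse matrix, so the toarray() calls after the SVD step are unnecessary and can simply be removed. In fact, in the version of scikit-learn current at the time of writing, only the vectorizers return sparse matrices. This also accounts for the second traceback: without the SVD step, the data is still the sparse matrix produced by TfidfVectorizer, and BayesianRidge requires dense input.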
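To make the difference concrete, here is a minimal sketch on a tiny made-up corpus. The `ensure_dense` helper is hypothetical (it is not part of scikit-learn or of the question's code), and the corpus and `n_components=2` are chosen only so the example runs quickly. It assumes a reasonably recent scikit-learn, SciPy, and NumPy.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def ensure_dense(X):
    """Return X as a dense numpy array, converting only when X is sparse.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    if sparse.issparse(X):
        return X.toarray()
    return np.asarray(X)


# Tiny made-up corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "the mat and the log",
]

# The vectorizer output is a SciPy sparse matrix ...
tfidf = TfidfVectorizer().fit_transform(docs)

# ... while TruncatedSVD hands back a plain numpy array.
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

print(type(tfidf), sparse.issparse(tfidf))
print(type(reduced), sparse.issparse(reduced))

# Safe for both kinds of input: converts tfidf, passes reduced through.
dense_tfidf = ensure_dense(tfidf)
dense_reduced = ensure_dense(reduced)
```

Checking with `sparse.issparse` before calling `toarray()` lets the same code path work with or without the SVD step. Note that densifying a matrix with tens of thousands of tf-idf columns can use a lot of memory, so keeping the dimensionality reduction is usually the more practical route.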