python – Loading pickled classifier data: "Vocabulary not fitted" error
I have read all the related questions here but could not find a working solution.
This is how I create my classifier:
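import csv
import Stemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.externals import joblib

english_stemmer = Stemmer.Stemmer('en')

class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        # Wrap the standard analyzer so every token is stemmed.
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: english_stemmer.stemWords(analyzer(doc))

tf = StemmedTfidfVectorizer(analyzer='word', ngram_range=(1,2), min_df = 0, max_features=200000, stop_words = 'english')

def create_tfidf(f):
    docs = []
    targets = []
    with open(f, "r") as sentences_file:
        reader = csv.reader(sentences_file, delimiter=';')
        next(reader)  # skip the header row
        for row in reader:
            docs.append(row[1])
            targets.append(row[0])
    # Fitting learns the vocabulary; it lives only in this `tf` object.
    tfidf_matrix = tf.fit_transform(docs)
    print tfidf_matrix.shape
    # print tf.get_feature_names()
    return tfidf_matrix, targets

X,y = create_tfidf("l0.csv")
clf = LinearSVC().fit(X,y)
# Only the classifier is saved here, not the fitted vectorizer.
_ = joblib.dump(clf, 'linearL0_3gram_100K.pkl', compress=9)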
This part works and produces the .pkl, which I then try to use in a different script:
import Stemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.externals import joblib

english_stemmer = Stemmer.Stemmer('en')

class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: english_stemmer.stemWords(analyzer(doc))

tf = StemmedTfidfVectorizer(analyzer='word', ngram_range=(1,2), min_df = 0, max_features=200000, stop_words = 'english')
clf = joblib.load('linearL0_3gram_100K.pkl')
print clf
test = "My super elaborate test string to test predictions"
# This is the line that raises the error shown below.
print test + clf.predict(tf.transform([test]))[0]
I get ValueError: Vocabulary wasn't fitted or is empty!
EDIT: error traceback, as requested:
  File "classifier.py", line 27, in <module>
    print test + clf.predict(tf.transform([test]))[0]
  File "/home/ec2-user/.local/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 1313, in transform
    X = super(TfidfVectorizer, self).transform(raw_documents)
  File "/home/ec2-user/.local/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 850, in transform
    self._check_vocabulary()
  File "/home/ec2-user/.local/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 271, in _check_vocabulary
    check_is_fitted(self, 'vocabulary_', msg=msg),
  File "/home/ec2-user/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 627, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.utils.validation.NotFittedError: StemmedTfidfVectorizer - Vocabulary wasn't fitted.
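For reference, the same kind of error can be reproduced in isolation. The sketch below is not the asker's code: it assumes Python 3 and a recent scikit-learn, uses a plain `TfidfVectorizer` instead of the stemmed subclass, and the toy documents and the `try_transform` helper are made up for illustration. It shows that a vectorizer constructed with the same settings but never fitted has no vocabulary to transform with.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up toy documents standing in for the training data.
train_docs = ["the cat sat on the mat", "dogs chase cats"]

fitted = TfidfVectorizer().fit(train_docs)  # has learned a vocabulary
fresh = TfidfVectorizer()                   # same settings, never fitted


def try_transform(vectorizer, text):
    """Return the number of feature columns, or None if the vectorizer is unfitted."""
    try:
        return vectorizer.transform([text]).shape[1]
    except NotFittedError:
        return None


fitted_cols = try_transform(fitted, "a cat")
fresh_cols = try_transform(fresh, "a cat")
print(fitted_cols, fresh_cols)
```

The fitted vectorizer returns one column per vocabulary entry, while the fresh one cannot transform anything, which is the situation in the second script above.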
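Solution:
OK, I solved the problem by using a pipeline, so that my vectorizer is saved in the .pkl together with the classifier.
Here is what it looks like (it is also simpler):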
import csv
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.externals import joblib
from sklearn.pipeline import Pipeline
import Stemmer

english_stemmer = Stemmer.Stemmer('en')

class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        # Wrap the standard analyzer so every token is stemmed.
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: english_stemmer.stemWords(analyzer(doc))

def create_tfidf(f):
    # Despite the name, this now only reads the raw documents and labels;
    # the vectorizing happens inside the pipeline.
    docs = []
    targets = []
    with open(f, "r") as sentences_file:
        reader = csv.reader(sentences_file, delimiter=';')
        next(reader)  # skip the header row
        for row in reader:
            docs.append(row[1])
            targets.append(row[0])
    return docs, targets

docs,y = create_tfidf("l1.csv")
tf = StemmedTfidfVectorizer(analyzer='word', ngram_range=(1,2), min_df = 0, max_features=200000, stop_words = 'english')
clf = LinearSVC()
# The pipeline fits the vectorizer and the classifier together.
vec_clf = Pipeline([('tfvec', tf), ('svm', clf)])
vec_clf.fit(docs,y)
# Saving the pipeline saves the fitted vocabulary as well.
_ = joblib.dump(vec_clf, 'linearL0_3gram_100K.pkl', compress=9)
And on the other side, in the script that loads the model:
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.externals import joblib
import Stemmer

english_stemmer = Stemmer.Stemmer('en')

# The class must be defined here too so the pickled pipeline can be loaded.
class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: english_stemmer.stemWords(analyzer(doc))

clf = joblib.load('linearL0_3gram_100K.pkl')
test = ["My super elaborate test string to test predictions"]
# `test` is a list, so print its first element next to the predicted label.
print test[0] + " -> " + clf.predict(test)[0]
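Important things to mention:
The transformer (tf) is part of the pipeline, so there is no need to declare a new vectorizer in the loading script (that was the failure point before, because a new vectorizer lacks the vocabulary learned from the training data), nor to call .transform() on the test string: the pipeline's predict() takes the raw text directly.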
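The round trip can be sketched end to end in one file. This is a minimal sketch under stated assumptions, not the answer's exact code: it assumes Python 3, a recent scikit-learn, and the standalone `joblib` package (`sklearn.externals.joblib` no longer exists in recent releases); it uses a plain `TfidfVectorizer` so that PyStemmer is not required; and the toy documents, labels, and temporary file path are invented for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up toy data standing in for the rows read from the CSV file.
docs = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting at noon tomorrow",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

# Vectorizer and classifier are fitted together as one object.
pipeline = Pipeline([
    ("tfvec", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
pipeline.fit(docs, labels)

# Persist the whole pipeline, so the learned vocabulary is saved too.
model_path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl")
joblib.dump(pipeline, model_path, compress=9)

# Later (normally in another script): load and predict on raw text.
loaded = joblib.load(model_path)
test = ["buy cheap pills"]
prediction = loaded.predict(test)[0]
print(test[0], "->", prediction)
```

With a custom vectorizer subclass such as `StemmedTfidfVectorizer`, the class definition still has to be available in the loading script, as in the answer above, because pickle stores a reference to the class rather than its code.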