[PYTHON-TSNE] Visualizing Word Vectors
Required files:
1. wordList.txt, the list of words you want to convert to vectors:
spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
test
unit
2. label.txt, the labels shown in the plot; these may differ from the words in wordList.txt:
spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
test
unit
3. model, a word2vec model generated with gensim;
4. Run buildWordVectorFromW2V.py to generate the word vector list:
    from gensim.models.word2vec import Word2Vec

    modelpath = 'XXX/model'
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    vectorFilePath = 'word2vec.txt'

    lines = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            for word in line.strip().split():
                # Report and skip words missing from the vocabulary
                # instead of crashing on the lookup below
                if word not in model.wv:
                    print('error!', word)
                    continue
                vec = model.wv[word]
                lines.append(' '.join(str(v) for v in vec))

    # One vector per line, components separated by spaces
    with open(vectorFilePath, 'w') as f:
        f.write('\n'.join(lines))
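The file written in this step holds one vector per line, with components separated by spaces, so it can be read back with NumPy. The following is a minimal sketch rather than part of the original scripts: the sample values and the temporary path are made up for illustration, and only the file layout is taken from the step above.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample in the same layout as word2vec.txt:
# one vector per line, components separated by spaces
sample = "0.1 0.2 0.3\n-0.4 0.5 0.6\n"

# Write the sample to a temporary file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'word2vec.txt')
with open(path, 'w') as f:
    f.write(sample)

# ndmin=2 keeps a 2-D array even if the file holds a single vector
vectors = np.loadtxt(path, ndmin=2)
print(vectors.shape)  # (2, 3)
```

The rows come back in the order in which the vectors were written, so row i belongs to the i-th word that was found in the model.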
5. Run visualization.py to generate the plot:
    import numpy as np
    import matplotlib.pyplot as plt
    from gensim.models.word2vec import Word2Vec

    modelpath = 'XXX/model'
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    labelFilePath = 'label.txt'  # labels from step 2, in the same order as the words

    # Look up the vector of every word (lowercased)
    visualizeVecs = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            vec = model.wv[word.lower()]
            visualizeVecs.append(vec)

    # Read the label to draw for each word
    visualizeWords = []
    with open(labelFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            visualizeWords.append(word.lower())

    visualizeVecs = np.array(visualizeVecs).astype(np.float64)

    # Disabled t-SNE variant (tsne is not defined in this script):
    # Y = tsne(visualizeVecs, 2, 200, 20.0)
    # # ChineseFont1 = FontProperties('SimHei')
    # for i in range(len(visualizeWords)):
    #     color = 'red'
    #     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1))
    # plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
    # plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
    # plt.show()

    # PCA: center the vectors, then project them onto the two
    # leading principal directions of the covariance matrix
    temp = visualizeVecs - np.mean(visualizeVecs, axis=0)
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])

    for i in range(len(visualizeWords)):
        print(i, coord[i, 0], coord[i, 1])
        color = 'red'
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i],
                 bbox=dict(facecolor=color, alpha=0.1), fontsize=22)
        # fontproperties=ChineseFont1

    plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
    plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
    plt.show()
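Note that the script above actually plots a PCA projection. Its t-SNE call is commented out and refers to a `tsne` function that the script neither defines nor imports. As one possible way to get a t-SNE layout, the sketch below uses `TSNE` from scikit-learn. This is a substitute, not the author's method: the helper name `tsne_coords`, the perplexity of 5, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_coords(vectors, perplexity=5.0, seed=0):
    """Project row vectors to 2-D with scikit-learn's t-SNE (hypothetical helper)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # t-SNE needs a perplexity smaller than the number of samples
    perplexity = min(perplexity, vectors.shape[0] - 1)
    model = TSNE(n_components=2, perplexity=perplexity,
                 init='pca', random_state=seed)
    return model.fit_transform(vectors)


# Random stand-in for visualizeVecs: 19 words, 50 dimensions (assumed sizes)
rng = np.random.RandomState(0)
demo_vecs = rng.normal(size=(19, 50))

Y = tsne_coords(demo_vecs)
print(Y.shape)  # (19, 2)
```

With real data, `Y` could take the place of `coord` in the plotting loop. With only a handful of words the layout depends strongly on the perplexity and the random seed, so it is worth trying a few values.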
Result:
Original article: http://www.cnblogs.com/XBWer/p/6961960.html