python - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)
I'm trying to save specific contents of a dictionary to a file, but when I try to write them I get the following error:
```
Traceback (most recent call last):
  File "P4.py", line 83, in <module>
    outfile.write(u"{}\t{}\n".format(keyword, str(tagSugerido)).encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)
```
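What the message itself means: ASCII only covers byte values 0 to 127, and 0xc3 (195) is outside that range. It is the lead byte of the two-byte UTF-8 encoding of characters such as "ñ" or "é". A minimal Python 3 reproduction of the same message (the helper name and the word "año" are only illustrative):

```python
def ascii_decode_error(text):
    """Encode text as UTF-8, then try to decode those bytes as ASCII.

    Returns the raw bytes and the resulting UnicodeDecodeError (None if none).
    """
    data = text.encode("utf-8")
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return data, exc
    return data, None


data, err = ascii_decode_error("año")
print(data)  # b'a\xc3\xb1o'  ('ñ' is the two UTF-8 bytes C3 B1)
print(err)   # 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
```

In Python 2 the same decode also happens implicitly: combining a byte string with a unicode string, as in `u"{}".format(some_bytes)`, first decodes the bytes with the default ASCII codec.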
Here is the code:
```python
from collections import Counter

with open("corpus.txt") as inf:
    wordtagcount = Counter(line.decode("latin_1").rstrip() for line in inf)

with open("lexic.txt", "w") as outf:
    outf.write('Palabra\tTag\tApariciones\n'.encode("utf-8"))
    for word, count in wordtagcount.iteritems():
        outf.write(u"{}\t{}\n".format(word, count).encode("utf-8"))

"""
2) TAGGING USING THE MODEL
Given the test files, assign each word its most probable tag according to
the model. Save the result in files with this format on each line:
Palabra Prediccion
"""
file = open("lexic.txt", "r")  # open the lexic file (our model) (try with this one)
data = file.readlines()
file.close()

diccionario = {}
"""
In this portion of code we iterate over the lines of the .txt document and create
a dictionary with a word as the key and a list as the value
Key: word
Value: List ([tag, #ocurrencesWithTheTag])
"""
for linea in data:
    aux = linea.decode('latin_1').encode('utf-8')
    sintagma = aux.split('\t')  # Split into a list: [word, tag, occurrences]; word=sintagma[0], tag=sintagma[1], occurrences=sintagma[2]
    if (sintagma[0] != "Palabra" and sintagma[1] != "Tag"):  # Skip the first (header) line of the file
        if (diccionario.has_key(sintagma[0])):  # Is the word already in the dictionary?
            aux_list = diccionario.get(sintagma[0])  # The word exists, so get its current list
            aux_list.append([sintagma[1], sintagma[2]])  # Append the tag and the occurrences for this word
            diccionario.update({sintagma[0]: aux_list})  # Store the updated list (previous list + the appended element)
        else:  # New word: store [tag, occurrences] directly (no need to append)
            aux_list_else = ([sintagma[1], sintagma[2]])
            diccionario.update({sintagma[0]: aux_list_else})

"""
Here we create a new dictionary (diccionario2) based on diccionario, keeping the following
information:
Key: word
Value: List ([suggestedTag, #ocurrencesOfTheWordInTheDocument, probability])
To retrieve the information from diccionario, keep in mind:
If more than one tag is associated with a word (keyword), the first tag is accessed with
diccionario.get(keyword)[0] and its ocurrencesWithTheTag with diccionario.get(keyword)[1];
from the second tag onward, the information is accessed this way:
diccionario.get(keyword)[2][0] -> the second tag
diccionario.get(keyword)[2][1] -> the second ocurrencesWithTheTag
diccionario.get(keyword)[3][0] -> the third tag
... etc.
"""
diccionario2 = dict.fromkeys(diccionario.keys())  # Same keys as diccionario, all values set to None
with open("estimation.txt", "w") as outfile:
    for keyword in diccionario:
        tagSugerido = unicode(diccionario.get(keyword)[0].decode('utf-8'))  # tagSugerido: the tag with the most occurrences for this keyword
        maximo = float(diccionario.get(keyword)[1])  # maximo: the highest occurrence count seen so far for this keyword
        if ((len(diccionario.get(keyword))) > 2):  # the word has more than one tag
            suma = float(diccionario.get(keyword)[1])
            for i in range(2, len(diccionario.get(keyword))):
                suma += float(diccionario.get(keyword)[i][1])
                if (float(diccionario.get(keyword)[i][1]) > maximo):
                    tagSugerido = unicode(diccionario.get(keyword)[i][0].decode('utf-8'))
                    maximo = float(diccionario.get(keyword)[i][1])
            probabilidad = float(maximo / suma)
            diccionario2.update({keyword: ([tagSugerido, suma, probabilidad])})
        else:
            diccionario2.update({keyword: ([diccionario.get(keyword)[0], diccionario.get(keyword)[1], 1])})
        outfile.write(u"{}\t{}\n".format(keyword, tagSugerido).encode("utf-8"))
```
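The docstrings above describe a word → list of (tag, occurrences) mapping, from which the most frequent tag and its probability (maximum count / total count) are derived. A minimal Python 3 sketch of that logic, assuming lexic.txt lines hold a word, a tag and a count separated by tabs; it uses a uniform list of (tag, count) pairs instead of the mixed flat/nested list in the question, and the function names and sample lines are hypothetical:

```python
from collections import defaultdict


def build_model(lines):
    """Map each word to a list of (tag, count) pairs.

    Each line is assumed to hold a word, a tag and a count separated by tabs;
    the 'Palabra/Tag/Apariciones' header line is skipped.
    """
    model = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[:2] == ["Palabra", "Tag"]:
            continue  # header line
        word, tag, count = fields
        model[word].append((tag, int(count)))
    return model


def estimate(model):
    """Return {word: (suggested_tag, total_count, probability)}."""
    result = {}
    for word, pairs in model.items():
        # max() keeps the first pair on ties, like the strict '>' in the original
        tag, best = max(pairs, key=lambda pair: pair[1])
        total = sum(count for _, count in pairs)
        result[word] = (tag, total, best / total)
    return result


lines = [
    "Palabra\tTag\tApariciones\n",
    "Hello\tNC\t3\n",
    "Hello\tV\t1\n",
    "Run\tV\t2\n",
]
for word, (tag, total, prob) in sorted(estimate(build_model(lines)).items()):
    print("{}\t{}\t{}\t{:.2f}".format(word, tag, total, prob))
```

Keeping every entry as a (tag, count) pair removes the special case for the first tag, which the original structure stores flat at indices 0 and 1.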
The desired output would look like this:

```
keyword(String) tagSugerido(String):
Hello NC
Friend N
Run V
...etc
```
The offending line is:

```python
outfile.write(u"{}\t{}\n".format(keyword, str(tagSugerido)).encode("utf-8"))
```
Thanks.
Solution:
Since you haven't provided a short, clear example that reproduces your problem, I'll give you general advice on what the error most likely is:
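A UnicodeDecodeError means that a byte string (for example tagSugerido) is being treated as ASCII instead of Unicode. To fix this, decode it explicitly:

```python
tagSugerido = diccionario.get(keyword)[0].decode('utf-8')
```

so that it is stored as unicode.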
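A related detail in the question's code: lexic.txt is written as UTF-8 in the first part of the script but decoded as latin_1 when it is read back, so non-ASCII words are already garbled before they are re-encoded. A minimal Python 3 sketch of decoding the same bytes with the matching codec and with the wrong one (the helper read_fields and the sample line are hypothetical):

```python
def read_fields(raw_line, encoding):
    """Decode one raw line with the given codec and split it into tab-separated fields."""
    return raw_line.decode(encoding).rstrip("\n").split("\t")


# The bytes that u"año\tNC\t3\n".encode("utf-8") puts into lexic.txt.
raw = "año\tNC\t3\n".encode("utf-8")

good = read_fields(raw, "utf-8")    # ['año', 'NC', '3']: decoded with the codec it was written in
bad = read_fields(raw, "latin-1")   # ['aÃ±o', 'NC', '3']: bytes C3 B1 read as two Latin-1 characters
```

Decoding once with the codec the file was actually written in, and keeping the result as text, avoids both the garbling and the later implicit ASCII decode.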
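You may then run into an encoding error at the write() stage (calling str() on a unicode value that contains non-ASCII characters raises UnicodeEncodeError). The rule is to keep everything as unicode inside the program and encode only once, on the finished line. Note that keyword is still a UTF-8 byte string, because each line of lexic.txt is re-encoded with .encode('utf-8') when it is read, so it has to be decoded as well. Instead of:

```python
outfile.write(u"{}\t{}\n".format(keyword, str(tagSugerido)).encode("utf-8"))
```

it should be:

```python
outfile.write(u"{}\t{}\n".format(keyword.decode("utf-8"), tagSugerido).encode("utf-8"))
```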
I answered a very similar question moments ago. When working with unicode strings, switch to Python 3; it will make your life much easier!
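In Python 3, str is always text and the encoding is given once, to open(), so the explicit .encode()/.decode() calls disappear from the write step. A minimal sketch of writing estimation.txt, assuming a dict of word to suggested tag already exists (the function name, sample entries and temporary path are hypothetical):

```python
import os
import tempfile


def write_estimation(path, suggested):
    """Write one 'word<TAB>tag' line per entry; open() encodes the text as UTF-8."""
    # newline="\n" disables newline translation so the bytes are the same on every OS
    with open(path, "w", encoding="utf-8", newline="\n") as outfile:
        for keyword, tag in sorted(suggested.items()):
            outfile.write("{}\t{}\n".format(keyword, tag))


suggested = {"año": "NC", "Run": "V"}  # hypothetical sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "estimation.txt")
    write_estimation(path, suggested)
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # b'Run\tV\na\xc3\xb1o\tNC\n'
```

Reading the file back in binary mode shows that "ñ" was stored as the UTF-8 bytes C3 B1 without any manual encoding.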
If you still can't switch to Python 3, you can use `__future__` imports to make Python 2 behave almost like Python 3:

```python
from __future__ import absolute_import, division, print_function, unicode_literals
```
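N.B.: instead of:

```python
file=open("lexic.txt", "r") # open the lexic file (our model) (try with this one)
data=file.readlines()
file.close()
```

which will not close the file descriptor properly if reading the lines fails, you are better off writing:

```python
with open("lexic.txt", "r") as f:
    data = f.readlines()
```

This guarantees that the file is always closed, even when something goes wrong.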
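A minimal Python 3 check of that guarantee: an exception is raised while the file is still open inside the with block, and the file object is closed afterwards anyway (the temporary file and its contents are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lexic.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("Palabra\tTag\tApariciones\n")

    handle = None
    try:
        with open(path, "r", encoding="utf-8") as f:
            handle = f
            data = f.readlines()
            # Simulate an error raised while the file is still open.
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    print(handle.closed)  # True: leaving the with block closed the file despite the error
```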
N.B.2: avoid naming a variable `file`, since that shadows a built-in Python type; use `f` or `lexic_file` instead…