python - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)

I am trying to save specific contents of a dictionary to a file, but when I try to write it I get the following error:

Traceback (most recent call last):
  File "P4.py", line 83, in <module>
    outfile.write(u"{}\t{}\n".format(keyword, str(tagSugerido)).encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)

Here is the code:

from collections import Counter

with open("corpus.txt") as inf:
    wordtagcount = Counter(line.decode("latin_1").rstrip() for line in inf)

with open("lexic.txt", "w") as outf:
    outf.write('Palabra\tTag\tApariciones\n'.encode("utf-8"))
    for word,count in wordtagcount.iteritems():
        outf.write(u"{}\t{}\n".format(word, count).encode("utf-8"))
"""
2) TAGGING USING THE MODEL
Given the test files, assign each word its most probable tag
according to the model. Save the result to files that use
this format for each line: Word  Prediction
"""
file=open("lexic.txt", "r") # open the lexic file (our model) (try with this one)
data=file.readlines()
file.close()
diccionario = {}

"""
In this part of the code we iterate over the lines of the .txt document and build a dictionary with a word as key and a list as value
Key: word
Value: List ([tag, #ocurrencesWithTheTag])
"""
for linea in data:
    aux = linea.decode('latin_1').encode('utf-8')
    sintagma = aux.split('\t')  # Split the string into a list: [word, tag, occurrences], word=sintagma[0], tag=sintagma[1], occurrences=sintagma[2]
    if (sintagma[0] != "Palabra" and sintagma[1] != "Tag"): # Skip the first line of the file (the header); this is the filter
        if (diccionario.has_key(sintagma[0])): # Check whether the word was already added to the dictionary
            aux_list = diccionario.get(sintagma[0]) # The word already exists in the dict, so we fetch its current list
            aux_list.append([sintagma[1], sintagma[2]]) # Append the tag and the occurrences for this word
            diccionario.update({sintagma[0]:aux_list}) # Update the value with the new list (previous list + newly appended element)
        else: # If the key does not exist in the dict, store the values as a new list (no need to append)
            aux_list_else = ([sintagma[1],sintagma[2]])
            diccionario.update({sintagma[0]:aux_list_else})

"""
Here we create a new dictionary based on the dictionary created before. In this new dictionary (diccionario2) we want to keep the following
information:
Key: word
Value: List ([suggestedTag, #ocurrencesOfTheWordInTheDocument, probability])

To retrieve the information from diccionario, we have to keep in mind:

If more than one tag is associated with a word (keyword), we access the first tag with keyword[0] and its ocurrencesWithTheTag with keyword[1];
from the second tag onwards, we access the information this way:

diccionario.get(keyword)[2][0] -> with this we access to the second tag
diccionario.get(keyword)[2][1] -> with this we access to the second ocurrencesWithTheTag
diccionario.get(keyword)[3][0] -> with this we access to the third tag
...
..
.
etc.
"""
diccionario2 = dict.fromkeys(diccionario.keys())#We create a dictionary with the keys from diccionario and we set all the values to None
with open("estimation.txt", "w") as outfile:
    for keyword in diccionario:
        tagSugerido = unicode(diccionario.get(keyword[0]).decode('utf-8')) #tagSugerido is the tag with more ocurrences for a concrete keyword
        maximo = float(diccionario.get(keyword)[1]) #maximo is a variable for the maximum number of ocurrences in a keyword
        if ((len(diccionario.get(keyword))) > 2): #in case we have > 2 tags for a concrete word
            suma = float(diccionario.get(keyword)[1])
            for i in range (2, len(diccionario.get(keyword))):
                suma += float(diccionario.get(keyword)[i][1])
                if (diccionario.get(keyword)[i][1] > maximo):
                    tagSugerido = unicode(diccionario.get(keyword)[i][0].decode('utf-8'))
                    maximo = float(diccionario.get(keyword)[i][1])
            probabilidad = float(maximo/suma);
            diccionario2.update({keyword:([tagSugerido, suma, probabilidad])})

        else:
            diccionario2.update({keyword:([diccionario.get(keyword)[0],diccionario.get(keyword)[1], 1])})

        outfile.write(u"{}\t{}\n".format(keyword, tagSugerido).encode("utf-8"))

The desired output would look like this:

keyword(String)  tagSugerido(String):
Hello    NC
Friend   N
Run      V
...etc

The offending line is:

outfile.write(u"{}\t{}\n".format(keyword, str(tagSugerido)).encode("utf-8"))

Thanks.

Answer:

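Since you did not provide a short, simple piece of code that illustrates your problem, I will give you general advice on what the error is likely to be: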

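If you get a decode error, it is because tagSugerido is a byte string that Python 2 implicitly decodes as ASCII when it is combined with a unicode string, instead of already being unicode. To fix that, you should do: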

tagSugerido = unicode(diccionario.get(keyword)[0].decode('utf-8'))

to store it as unicode.
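The mechanism can be reproduced in isolation. The sketch below is written so that it runs on Python 3, where the conversion that Python 2 performs implicitly has to be spelled out. The sample word `año` is an assumption, chosen only because its UTF-8 encoding contains the byte 0xc3 from the traceback.

```python
# -*- coding: utf-8 -*-
# Illustrative sample (assumption): "año" encoded as UTF-8 contains byte 0xc3.
raw = u"año".encode("utf-8")   # the bytes a \xc3 \xb1 o

# Decoding with the codec the bytes were written in gives back the text.
text = raw.decode("utf-8")

# Decoding as ASCII is what Python 2 does implicitly when a byte string
# is mixed with a unicode string; it fails on the first byte >= 0x80.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(repr(raw))
print(error)
```

The caught message reads `'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)`, which is the error from the traceback apart from the position.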

Then you will probably get an encode error at the write() stage, and you should fix your write as follows:

outfile.write(u"{}\t{}\n".format(keyword, str(tagSugerido)).encode("utf-8"))

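should become the following, which encodes the whole formatted unicode string once instead of passing through str(); note that keyword must also be a unicode object by this point (decode it first if it is still a byte string):

outfile.write(u"{}\t{}\n".format(keyword, tagSugerido).encode("utf-8"))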
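An alternative that avoids calling .encode() by hand is to let the file object do the encoding. This is a minimal sketch, assuming the values are already unicode text. `io.open` is available in both Python 2.6+ and Python 3; the temporary path and the sample rows are placeholders, not the asker's data.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample rows: (keyword, suggested tag), both unicode text.
rows = [(u"niño", u"NC"), (u"correr", u"V")]

# Placeholder location standing in for estimation.txt.
path = os.path.join(tempfile.mkdtemp(), "estimation.txt")

# The file object encodes to UTF-8 on write, so only unicode goes in.
with io.open(path, "w", encoding="utf-8") as outfile:
    for keyword, tag in rows:
        outfile.write(u"{}\t{}\n".format(keyword, tag))

# Reading back with the same encoding returns unicode text again.
with io.open(path, "r", encoding="utf-8") as infile:
    lines = infile.read().splitlines()

print(lines)
```

With this arrangement the rule is simple: decode bytes as soon as they are read, keep unicode everywhere inside the program, and let the file object encode on the way out.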

I answered a very similar question moments ago. When working with unicode strings, switch to python3; it will make your life easier!
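To illustrate what the switch looks like, here is a minimal Python 3 sketch of the tagging step. It assumes each line of lexic.txt has the form word, tab, tag, tab, count, as the first part of the question's code writes it. The function name `best_tags` and the sample rows are hypothetical, and this is not the asker's exact data structure.

```python
from collections import defaultdict


def best_tags(lines):
    """Map each word to (most frequent tag, total count, probability).

    Assumes each line is 'word<TAB>tag<TAB>count', as in lexic.txt.
    """
    counts = defaultdict(dict)
    for line in lines:
        word, tag, count = line.rstrip("\n").split("\t")
        if (word, tag) == ("Palabra", "Tag"):  # skip the header row
            continue
        counts[word][tag] = counts[word].get(tag, 0) + int(count)

    result = {}
    for word, tags in counts.items():
        total = sum(tags.values())
        tag, top = max(tags.items(), key=lambda item: item[1])
        result[word] = (tag, total, top / total)
    return result


# Hypothetical sample data standing in for the contents of lexic.txt.
sample = [
    "Palabra\tTag\tApariciones\n",
    "canción\tNC\t3\n",
    "canción\tV\t1\n",
    "correr\tV\t2\n",
]
estimates = best_tags(sample)
for word, (tag, total, prob) in sorted(estimates.items()):
    print("{}\t{}".format(word, tag))
```

With a real file, the lines would come from `open("lexic.txt", encoding="utf-8")`, and the output file would be opened the same way in "w" mode, so no explicit .decode() or .encode() calls are needed.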

If you cannot switch to python3 yet, you can make python2 behave almost like python3 by using the python-future import statement:

from __future__ import absolute_import, division, print_function, unicode_literals

N.B.: instead of:

file=open("lexic.txt", "r") # open the lexic file (our model) (try with this one)
data=file.readlines()
file.close()

which will fail to close the file descriptor properly if reading the lines fails, you'd better do:

with open("lexic.txt", "r") as f:
    data=f.readlines()

This makes sure the file is always closed, even on failure.
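The closing behaviour can be checked directly. In this small sketch the temporary file and the simulated RuntimeError are assumptions made for the demonstration; the point is only that the file object reports itself closed after the with block exits through an exception.

```python
import os
import tempfile

# Placeholder file standing in for lexic.txt.
path = os.path.join(tempfile.mkdtemp(), "lexic.txt")
with open(path, "w") as f:
    f.write("Palabra\tTag\tApariciones\n")

handle = None
try:
    with open(path, "r") as f:
        handle = f              # keep a reference to inspect afterwards
        data = f.readlines()
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
print(handle.closed)
```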

N.B.2: avoid using file as a variable name, because it is a python type that you would be shadowing; use f or lexic_file instead…


Article link: https://qyyshop.com/info/679932.html

Source: [Anonymous]
