python – ElementTree Unicode encode/decode error

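For a project, I am supposed to enhance some XML and store it in a file. The problem I am running into is that I keep getting the following error: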

Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Bart\Dropbox\Studie\2013-2014\BSc-KI\cite_parser\parser.py", line 193, in parse_references
    outputXML = ET.tostring(root, encoding='utf8', method='xml')
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1126, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
 ECLI:NL:RVS:2012:BY1564
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128)

The error is raised by this line:

outputXML = ET.tostring(root, encoding='utf8', method='xml')

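While searching for a solution, I found suggestions that I should add .decode('utf-8') to the call, but that results in an encode error from the write call (where it was a decode error before), so that does not work either...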

The encode error:

Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Bart\Dropbox\Studie\2013-2014\BSc-KI\cite_parser\parser.py", line 197, in parse_references
    myfile.write(outputXML)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 13559: ordinal not in range(128)

It is produced by the following code:

outputXML = ET.tostring(root, encoding='utf8', method='xml').decode('utf-8')

Source (or at least the relevant parts):

# URL encodes the parameters
encoded_parameters = urllib.urlencode({'id':ecli})

# Opens XML file
feed = urllib2.urlopen("http://data.rechtspraak.nl/uitspraken/content?"+encoded_parameters, timeout = 3)

# Parses the XML
ecliFile = ET.parse(feed)

# Fetches root element of current tree
root = ecliFile.getroot()

# Write the XML to a file without any extra indents or newlines
outputXML = ET.tostring(root, encoding='utf8', method='xml')

# Write the XML to the file
with open(file, "w") as myfile:
    myfile.write(outputXML)

Last but not least, the URL of an example XML document: http://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:RVS:2012:BY1542

Solution:

The exception is caused by a byte string value.

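The text variable in the traceback is supposed to be a unicode value. If it is a plain byte string instead, Python 2 will implicitly decode it to Unicode first (using the ASCII codec), just so that it can then encode it again.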

It is that implicit decoding step that fails.
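The implicit step can be spelled out explicitly. The following is a minimal sketch, not the ElementTree source: implicit_encode is a hypothetical helper that mimics what Python 2 does when .encode() is called on a byte string, and the name "Zoë" is an illustrative value chosen because its UTF-8 form starts with the byte 0xc3 seen in the question's traceback.

```python
# -*- coding: utf-8 -*-
# Illustrative value: "Zoë" as UTF-8 bytes (contains 0xc3 0xab).
raw = u"Zo\u00eb".encode("utf-8")


def implicit_encode(value, encoding):
    """Hypothetical helper: mimic Python 2's str.encode() on a byte string.

    Python 2 first decodes the bytes with the ASCII codec, then encodes
    the resulting unicode value with the requested encoding.
    """
    return value.decode("ascii").encode(encoding, "xmlcharrefreplace")


try:
    implicit_encode(raw, "utf-8")
    message = None
except UnicodeDecodeError as exc:
    # The failure comes from the ASCII *decode*, even though the caller
    # only asked for an *encode*.
    message = str(exc)

print(message)

# Decoding with the codec the bytes were really written in succeeds.
fixed = raw.decode("utf-8").encode("utf-8", "xmlcharrefreplace")
```

This is why a call that only asks to encode can still raise a UnicodeDecodeError: the value was never unicode to begin with.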

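Because you did not actually show us what you insert into the XML tree, it is hard to tell you what to fix, other than to make sure you always use Unicode values when inserting text.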
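One way to follow that advice is to normalise every value at the point where it enters the tree. The sketch below assumes the incoming byte strings are UTF-8; to_text is a hypothetical helper, and the element names uitspraak and naam are made up for illustration rather than taken from the question's data.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return a unicode/text value.

    Byte strings are decoded with the given encoding (assumed UTF-8 here);
    values that are already text are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Illustrative tree; the element names are assumptions.
root = ET.Element("uitspraak")
child = ET.SubElement(root, "naam")

# Both values arrive as UTF-8 byte strings and are decoded before insertion.
child.text = to_text(u"Zo\u00eb".encode("utf-8"))
root.set("oops", to_text(b"Data with non-ASCII codepoints \xe2\x80\x94 (em dash)"))

# Serialising now works because the tree only holds text values.
output = ET.tostring(root, encoding="utf-8", method="xml")
print(output)
```

Decoding at the boundary keeps the rest of the code free of encoding concerns, at the cost of having to know which encoding the incoming bytes actually use.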

Demo:

>>> root.attrib['oops'] = u'Data with non-ASCII codepoints \u2014 (em dash)'.encode('utf8')
>>> ET.tostring(root, encoding='utf8', method='xml')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1126, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
    v = _escape_attrib(v, encoding)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1090, in _escape_attrib
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 31: ordinal not in range(128)
>>> root.attrib['oops'] = u'Data with non-ASCII codepoints \u2014 (em dash)'
>>> ET.tostring(root, encoding='utf8', method='xml')
'<?xml version=\'1.0\' encoding=\'utf8\'?> ...'

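Setting a bytestring attribute that contains bytes outside the ASCII range triggers the exception; using a unicode value instead ensures the result can be produced.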
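The second error in the question, the UnicodeEncodeError from myfile.write, comes from a related mismatch: a decoded (unicode) string is written to a file that expects bytes. A minimal sketch of two ways around it follows, assuming the tree holds only text values. The temporary directory, the file names, and the element name are assumptions for illustration.

```python
# -*- coding: utf-8 -*-
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative tree with a non-ASCII text value.
root = ET.Element("uitspraak")
root.text = u"Zo\u00eb"

out_dir = tempfile.mkdtemp()

# Option 1: tostring() with an explicit encoding yields encoded bytes,
# so write them unchanged to a file opened in binary mode.
path_bytes = os.path.join(out_dir, "output_bytes.xml")
xml_bytes = ET.tostring(root, encoding="utf-8", method="xml")
with open(path_bytes, "wb") as handle:
    handle.write(xml_bytes)

# Option 2: let ElementTree handle encoding and file output itself.
path_tree = os.path.join(out_dir, "output_tree.xml")
ET.ElementTree(root).write(path_tree, encoding="utf-8", xml_declaration=True)

# Read both files back to confirm the text survived the round trip.
with open(path_bytes, "rb") as handle:
    from_bytes = ET.fromstring(handle.read())
from_tree = ET.parse(path_tree).getroot()
```

Either way, the encoding happens exactly once, and no .decode('utf-8') call is needed between serialising and writing.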
