python – ElementTree Unicode编码/解码错误
内容导读
互联网集市收集整理的这篇技术教程文章主要介绍了python – ElementTree Unicode编码/解码错误,小编现在分享给大家,供广大互联网技能从业者学习和参考。文章包含4481字,纯文字阅读大概需要7分钟。
内容图文
![python – ElementTree Unicode编码/解码错误](/upload/InfoBanner/zyjiaocheng/733/3d2c323580cd40efb40c6d58dc0ecd09.jpg)
对于一个项目,我应该增强一些XML并将其存储在一个文件中.我遇到的问题是我不断收到以下错误:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Bart\Dropbox\Studie\2013-2014\BSc-KI\cite_parser\parser.py", line 193, in parse_references
outputXML = ET.tostring(root, encoding='utf8', method='xml')
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1126, in tostring
ElementTree(element).write(file, encoding, method=method)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
ECLI:NL:RVS:2012:BY1564
File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
write(_escape_cdata(text, encoding))
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128)
该错误产生于:
outputXML = ET.tostring(root, encoding='utf8', method='xml')
在寻找这个问题的解决方案时,我发现了一些建议,说我应该在函数中添加.decode(‘utf-8’),但这会导致写入函数出现编码错误(首先是解码),所以不会工作…
编码错误:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Bart\Dropbox\Studie\2013-2014\BSc-KI\cite_parser\parser.py", line 197, in parse_references
myfile.write(outputXML)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 13559: ordinal not in range(128)
它由以下代码生成:
outputXML = ET.tostring(root, encoding='utf8', method='xml').decode('utf-8')
来源(或至少相关部分):
# URL encodes the parameters
encoded_parameters = urllib.urlencode({'id':ecli})
# Opens XML file
feed = urllib2.urlopen("http://data.rechtspraak.nl/uitspraken/content?"+encoded_parameters, timeout = 3)
# Parses the XML
ecliFile = ET.parse(feed)
# Fetches root element of current tree
root = ecliFile.getroot()
# Write the XML to a file without any extra indents or newlines
outputXML = ET.tostring(root, encoding='utf8', method='xml')
# Write the XML to the file
with open(file, "w") as myfile:
myfile.write(outputXML)
最后但并非最不重要的是XML示例的URL:http://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:RVS:2012:BY1542
解决方法:
异常是由字节字符串值引起的.
回溯中的文本应该是一个unicode值,但如果它是一个普通的字节字符串,Python将隐式地首先将它(使用ASCII编解码器)解码为Unicode,这样您就可以再次编码它.
这是解码失败.
因为您实际上没有向我们展示您插入到XML树中的内容,所以除了确保在插入文本时始终使用Unicode值时,很难告诉您要修复的内容.
演示:
>>> root.attrib['oops'] = u'Data with non-ASCII codepoints \u2014 (em dash)'.encode('utf8')
>>> ET.tostring(root, encoding='utf8', method='xml')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1126, in tostring
ElementTree(element).write(file, encoding, method=method)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 820, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
v = _escape_attrib(v, encoding)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1090, in _escape_attrib
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 31: ordinal not in range(128)
>>> root.attrib['oops'] = u'Data with non-ASCII codepoints \u2014 (em dash)'
>>> ET.tostring(root, encoding='utf8', method='xml')
'<?xml version=\'1.0\' encoding=\'utf8\'?> ...'
设置包含ASCII范围之外的字节的bytestring属性会触发异常;使用unicode值确保可以生成结果.
内容总结
以上是互联网集市为您收集整理的python – ElementTree Unicode编码/解码错误全部内容,希望文章能够帮你解决python – ElementTree Unicode编码/解码错误所遇到的程序开发问题。 如果觉得互联网集市技术教程内容还不错,欢迎将互联网集市网站推荐给程序员好友。
内容备注
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 gblab@vip.qq.com 举报,一经查实,本站将立刻删除。
内容手机端
扫描二维码推送至手机访问。