Python requests response encoded in utf-8 but cannot be decoded
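I am trying to scrape my messenger.com (Facebook Messenger) chats using Python. I used Google Chrome's developer tools to inspect the POST request for the chat history, and I copied the entire headers and body into a format that requests can use.
I get HTTP code 200, implying that the request at least got something back, and I can print res.encoding to see the encoding it returned, which says utf-8. But I can't decode it!
Here is the function: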
def download_thread(self, limit, offset, message_timestamp):
    """Download the specified number of messages from the
    provided thread, with an optional offset
    """
    data = request_data(self.thread, offset=offset,
                        limit=limit, group=self.group,
                        timestamp=message_timestamp)
    res = self.ses.post(url_thread, data=data, headers=headers)
    print(res.content)
    thread_contents = json.loads(res.content)
    print(thread_contents)
    return thread_contents
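This yields
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte
when it tries to json.load (or loads) the data,
but res.encoding does return utf-8.
I tried decompressing it with gzip, but it says the content is not gzip-compressed.
If I just try print(res.content) I get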
Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
0f\x82\x048\xbb\xb9=\x87\xebK0.\xff\x90\xdd\xeb\xfa\x16\xc6\xbbz\x8b\x82)\xe8\xaaV\x01^\xda\x8b\xbd\x15d-\xb1\x10@\x17\\\xd43\xa8\x92w\xe8\xc0\xcdU\xc4\xff\xc7\xfa\x90\xb2\xb3\xf5\x84\x11u\x0b\t\x8f\x83r\xf3}\xe5!y$\xe6\xf6c0\xf0\xb4\x98\xcat_\x0c\x08\xb5\xdd\x8ctx\x91\xa9\x95\rB%\xe2\x93\xa52\x85_\xa6\x10\xc2\xc9\xa3\xee4SDb\xa5\x18QJ\x83X\x19)\xaa$\xf4\xb4\xb7\x0b\x84\x15&\x88\x08L\xc9iP\xa2\xb9\xf2\xaf\x96\x96N\xd8\xcf=\x05\xc1\x18\x8d\xa0\xf2Y\x8e\n\xcf\xc8\x0fE4\xd6)\xa1\xd4\xb7D\xd6{i\xc8P\x96R\x11HC\xac\xbcKyT#~}\x93\xf7@K\xc7r/\x82\xb0\xe4\xefX\xf9j\x08\xa6Hp\xfcn\x06\xfdo\x9a\xd0wJ\xb4fJ(\x89+\x1c\xf6\x0eOI\x90\xac\x9eDD\xfd,\xa5\xe9\x89\x1blh\x86Z\x98\x05\xdd9\xc7\xf4\x80\xfcY\x8e\xad\xee\x99!\x15\x13+\x9b\x07\xe8Fdj\xfc\x11\xfc\xfe7\x06h\x02\x00@>]W\x92\xc9\x02\xb1c3\x82\xcd\xa4\xefN9\x90\xe6\x81y\x9c\x84er\xd4\xc3\x06\x1c\x06\x14\xcf\xc7\x07hj\xbfH\xdc\xf5~\xf7z\x18Ce\xaf^\x8c\xab \xdfV\xce\xb8\x11\xf8\x06\x03'
Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
  File "FBChatScraper.py", line 43, in run
    thread_contents = self.download_thread(limit, offset, message_timestamp)
  File "FBChatScraper.py", line 74, in download_thread
    thread_contents = json.loads(res.content)
  File "/Users/silman/anaconda/lib/python3.6/json/__init__.py", line 349, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte
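The strange way the content is printed in the middle of the traceback makes me think there are some invisible characters throwing it off.
I can't load the response as JSON, because no matter what I do with the response content, it is not formatted correctly for the json library to interpret.
Also, if I just print(res.text) I get garbage: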
Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
}sP???c???f?u0???\? QZed?C??? M$x???H?????eǘ?]???5???^?*??aM?Y??b???/??JW/???>H6z?\??l4????t=i??%?u?x??%?x?
F <???{1i?#%;?r?=Rχm??1B?Z(+?(S-???#??\v?{b??
? f/V?i???_??83? ?_????*??O??
??????Z??i-?TVeaG54?!v?a??|gu-g??.???"J$?L`&?t?#s)?H????s???q???^?0??[)???j???T???U???J?ЁwW???!eg?#j ??r??$y???3?4??4.??M?@Kb?AX?SDb?QJ?X)?,???a? "Sp?h?????sOA0Vé|???????:%?rKdKC???@ M??.?^
? ?g???SWQH?.??B?G?,????@E????????
nras??L?/??ch@>]W???c3???N9??y??er????hj?H??~?zCe?^?? ?Vθ?
Traceback (most recent call last):
  File "FBChatScraper.py", line 200, in <module>
    main()
  File "FBChatScraper.py", line 134, in main
    fbms.run()
  File "FBChatScraper.py", line 43, in run
    thread_contents = self.download_thread(limit, offset, message_timestamp)
  File "FBChatScraper.py", line 74, in download_thread
    thread_contents = json.loads(res.content)
  File "/Users/silman/anaconda/lib/python3.6/json/__init__.py", line 349, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte
Edit:
Here is an MWE as best I can manage. I'm not sure which data in my POST request is private, so I left some of it out.
Using this data:
url_thread = "https://www.messenger.com/api/graphqlbatch/"
request_data = {
    "batch_name": "MessengerGraphQLThreadFetcher",
    "__user": "<user_id>",
    "__a": "1",
    "__dyn": "<dyn>",
    "__req": "9",
    '__be' : '-1',
    '__pc' : 'PHASED:messengerdotcom_pkg',
    "fb_dtsg": "AQFni7TU2nes:AQGSC8FSDqyw",
    "ttstamp": "265817254666710077746711957586581715370521181008510710777",
    "__rev": "3791607",
    "jazoest": "<jazoest>",
    "queries": '<queries>'
}
headers = {
    "authority": "www.messenger.com",
    "method": "POST",
    "path": "/api/graphqlbatch/",
    "scheme": "https",
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "no-cache",
    "content-length": "754",
    "content-type" : "application/x-www-form-urlencoded",
    "cookie": "<cookies>",
    "origin": "https://www.messenger.com",
    "pragma": "no-cache",
    "referer": "https://www.messenger.com/t/<chatID>",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
}
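You can obtain all of the <item> values by using Chrome developer tools and looking in the Network tab for a POST request to the request URL: https://www.messenger.com/api/graphqlbatch/.
It is easy to find if you scroll up to reload older messages while Chrome dev tools is recording.
Then put together a simple request in Python: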
import json  # needed for json.loads below
import time

import requests as rq

ses = rq.Session()
# The two angle-bracket values are placeholders to fill in before running
thread = <ID of thread found in URL of messenger.com>
conversation_type = <'thread_fbids' if group chat else 'user_ids'>
data = request_data
data['messages[{}][{}][offset]'.format(conversation_type, thread)] = 0
data['messages[{}][{}][timestamp]'.format(conversation_type, thread)] = int(time.time())
data['messages[{}][{}][limit]'.format(conversation_type, thread)] = 2000
res = ses.post(url_thread, data=data, headers=headers)
print(res.content)
thread_contents = json.loads(res.content)
print(thread_contents)
You can see the beginning of the JSON, as my dev tools retrieved it, here.
Solution:
The problem is this line in the request headers:
"accept-encoding": "gzip, deflate, br",
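That br asks for Brotli compression, a newer compression standard (see RFC 7932) that Google is pushing as a replacement for gzip on the web. Chrome asks for Brotli because recent versions of Chrome understand it natively. You are asking for Brotli because you copied the headers from Chrome. But requests does not understand Brotli natively.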
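One way to check this diagnosis is to compare the response's Content-Encoding header (the compression layer) with whether the raw body decodes as UTF-8. Note that res.encoding describes the character set, which is a separate thing from compression, so it can say utf-8 even when the bytes are still compressed. The helper below is only a sketch; the function name is made up for illustration, and it works on any header mapping plus a bytes body.

```python
def looks_like_undecoded_brotli(response_headers, body):
    """Return True if the server labelled the body as Brotli ("br")
    and the bytes we hold are not valid UTF-8, i.e. still compressed."""
    content_encoding = ""
    for name, value in response_headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "content-encoding":
            content_encoding = value.strip().lower()
    try:
        body.decode("utf-8")
        decodes_as_utf8 = True
    except UnicodeDecodeError:
        decodes_as_utf8 = False
    return content_encoding == "br" and not decodes_as_utf8
```

With the code from the question this would be called as looks_like_undecoded_brotli(res.headers, res.content).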
You can pip install brotli and register the decompressor, or just call it manually on res.content. But the simpler solution is to remove br:
"accept-encoding": "gzip, deflate",
…and then you should get gzip, which you and requests already know how to handle.
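Both options can be sketched roughly as follows. The helper names without_brotli and maybe_unbrotli are hypothetical and not part of requests. The first strips br from copied headers without modifying the original dict. The second covers the manual route and assumes the third-party brotli package is installed and that the HTTP stack has not already decompressed the body. As far as I know, newer urllib3 releases decode br on their own when a Brotli package is present, in which case the manual step is unnecessary.

```python
def without_brotli(headers):
    """Return a copy of headers with "br" removed from Accept-Encoding."""
    cleaned = dict(headers)
    for name, value in headers.items():
        if name.lower() == "accept-encoding":
            codings = [c.strip() for c in value.split(",")]
            # Compare only the coding name, ignoring any ";q=..." weight.
            kept = [c for c in codings
                    if c.split(";")[0].strip().lower() != "br"]
            cleaned[name] = ", ".join(kept)
    return cleaned


def maybe_unbrotli(raw, content_encoding):
    """Decompress raw only when the server labelled it as Brotli.

    gzip and deflate are left alone because requests already decodes
    those before exposing res.content.
    """
    if (content_encoding or "").strip().lower() == "br":
        import brotli  # assumption: installed via `pip install brotli`
        return brotli.decompress(raw)
    return raw
```

In the question's code the first helper would be used as ses.post(url_thread, data=data, headers=without_brotli(headers)), and the second as json.loads(maybe_unbrotli(res.content, res.headers.get("Content-Encoding"))).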