java – byte[] to String with UTF-8 gives different results on Android than on the Windows JVM
I'm trying to convert a byte array to a String in Java with the following code:
byte[] myArray = {25, -50, -86, 81, 47, 44, 97, -5, 69, -4, 87, -114, -47, 62, -113, -64, 58, -32, -121, -102, 53, -89, -122, 12, -2, -23, -127, 111, -100, 53, -87, -23, -44, -28, 4, -21, -42, 75, 87, -112, -38, 118, 54, 92, -116, 4, -118, 110, -87, 7, -13, 3, -72, -63, -69, 123, 92, 94, 56, 61, 120, -52, 98, -17, 5, 41, 101, -3, 121, 81, -90, 12, -35, -21, -24, 112, -94, 123, 62, 8, 27, 54, 107, -77, 64, 8, -102, -99, -1, 119, 127, 43, 12, -31, -1, 51, -15, 83, -4, -68, -30, 91, -104, 84, 18, -122, -120, 66, 116, -17, -101, -24, 105, -112, -116, -64, -108, 112, -35, 61, 66, 100, 5, -24, -26, -44, 81, -84}; // Bytes from Byte.MIN_VALUE to Byte.MAX_VALUE
String result = new String(myArray, StandardCharsets.UTF_8);
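The problem is that I get different results when I run the code on Windows (JVM 1.8.0_112) and on my Android device (tested on Android 5.1 and 6.0). I'm testing with a byte array of length 128. On Android I get a String of length 120, while on Windows I get a String of length 125. I guess it has to do with some of the bytes not being valid UTF-8, but it is still strange to get different results depending on the platform.
If I change the encoding to US-ASCII, I get the same result on both platforms, as expected: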
String result = new String(myArray, StandardCharsets.US_ASCII);
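For comparison, a US-ASCII decoder only ever looks at one byte at a time. Bytes 0 to 127 map directly to a char, and every negative Java byte (0x80 to 0xFF) is replaced by a single U+FFFD. The decoded length therefore always equals the array length, which is consistent with both platforms agreeing here. The following is a minimal sketch checked against the desktop JVM's behaviour (the class name `AsciiLengthCheck` is only for illustration):

```java
import java.nio.charset.StandardCharsets;

public class AsciiLengthCheck {
    public static void main(String[] args) {
        // Mix of ASCII bytes and bytes >= 0x80 (negative as Java bytes)
        byte[] bytes = { 65, -63, -69, 66, -32, -121, -102 };
        String s = new String(bytes, StandardCharsets.US_ASCII);
        // One char per input byte: each non-ASCII byte becomes one replacement char
        System.out.println(s.length());              // prints 7
        System.out.println(s.charAt(1) == '\uFFFD'); // prints true
    }
}
```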
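Edit: Sorry for the confusion. I'm not generating the array randomly each time. I only meant that the bytes don't form meaningful UTF-8. This is the byte array I'm testing with: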
System.out.println(Arrays.toString(myArray)): [25, -50, -86, 81, 47, 44, 97, -5, 69, -4, 87, -114, -47, 62, -113, -64, 58, -32, -121, -102, 53, -89, -122, 12, -2, -23, -127, 111, -100, 53, -87, -23, -44, -28, 4, -21, -42, 75, 87, -112, -38, 118, 54, 92, -116, 4, -118, 110, -87, 7, -13, 3, -72, -63, -69, 123, 92, 94, 56, 61, 120, -52, 98, -17, 5, 41, 101, -3, 121, 81, -90, 12, -35, -21, -24, 112, -94, 123, 62, 8, 27, 54, 107, -77, 64, 8, -102, -99, -1, 119, 127, 43, 12, -31, -1, 51, -15, 83, -4, -68, -30, 91, -104, 84, 18, -122, -120, 66, 116, -17, -101, -24, 105, -112, -116, -64, -108, 112, -35, 61, 66, 100, 5, -24, -26, -44, 81, -84]
Edit 2:
Windows result:
System.out.println(new String(myArray, StandardCharsets.UTF_8).length()): 125
System.out.println(new String(myArray, StandardCharsets.UTF_8)): ?Q/,a?E?W??>??:???5????o?5??????KW??v6\??n?????{\^8=x?b?)e?yQ????p?{6k????w+??3?S???[?T??Bt??i????p?=Bd???Q?
System.out.println(toUnicode(new String(myArray, StandardCharsets.UTF_8))): \u0019\u03aa\u0051\u002f\u002c\u0061\ufffd\u0045\ufffd\u0057\ufffd\ufffd>\ufffd\ufffd\u003a\ufffd\ufffd\ufffd\u0035\ufffd\ufffd\u000c\ufffd\ufffd\u006f\ufffd\u0035\ufffd\ufffd\ufffd\ufffd\u0004\ufffd\ufffd\u004b\u0057\ufffd\ufffd\u0076\u0036\u005c\ufffd\u0004\ufffd\u006e\ufffd\u0007\ufffd\u0003\ufffd\ufffd\ufffd\u007b\u005c\u005e\u0038\u003d\u0078\ufffd\u0062\ufffd\u0005\u0029\u0065\ufffd\u0079\u0051\ufffd\u000c\ufffd\ufffd\ufffd\u0070\ufffd\u007b>\u0008\u001b\u0036\u006b\ufffd\u0040\u0008\ufffd\ufffd\ufffd\u0077\u007f\u002b\u000c\ufffd\ufffd\u0033\ufffd\u0053\ufffd\ufffd\ufffd\u005b\ufffd\u0054\u0012\ufffd\ufffd\u0042\u0074\ufffd\ufffd\u0069\ufffd\ufffd\ufffd\ufffd\u0070\ufffd\u003d\u0042\u0064\u0005\ufffd\ufffd\ufffd\u0051\ufffd
Android result:
System.out.println(new String(myArray, StandardCharsets.UTF_8).length()): 120
System.out.println(new String(myArray, StandardCharsets.UTF_8)): ?Q/,a?E?W??>??:ǚ5????o?5??????KW??v6\??n???{{\^8=x?b?)e?yQ????p?{>6k?@???w+?
System.out.println(toUnicode(new String(myArray, StandardCharsets.UTF_8))): \u0019\u03aa\u0051\u002f\u002c\u0061\ufffd\u0045\ufffd\u0057\ufffd\ufffd>\ufffd\ufffd\u003a\u01da\u0035\ufffd\ufffd\u000c\ufffd\ufffd\u006f\ufffd\u0035\ufffd\ufffd\ufffd\ufffd\u0004\ufffd\ufffd\u004b\u0057\ufffd\ufffd\u0076\u0036\u005c\ufffd\u0004\ufffd\u006e\ufffd\u0007\ufffd\u0003\ufffd\u007b\u007b\u005c\u005e\u0038\u003d\u0078\ufffd\u0062\ufffd\u0005\u0029\u0065\ufffd\u0079\u0051\ufffd\u000c\ufffd\ufffd\ufffd\u0070\ufffd\u007b>\u0008\u001b\u0036\u006b\ufffd\u0040\u0008\ufffd\ufffd\ufffd\u0077\u007f\u002b\u000c\ufffd\ufffd\u0033\ufffd\u0053\ufffd\ufffd\u005b\ufffd\u0054\u0012\ufffd\ufffd\u0042\u0074\ufffd\ufffd\u0069\ufffd\ufffd\u0014\u0070\ufffd\u003d\u0042\u0064\u0005\ufffd\ufffd\ufffd\u0051\ufffd
Edit 3: added the correct UTF-16 strings.
Edit 4: changed the code to a working example.
Answer:
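The question does not show the `toUnicode` helper used in Edit 2. Its output leaves `>` unescaped, so it apparently skips some characters. The sketch below is a hypothetical stand-in that escapes every UTF-16 code unit and adds a counter for U+FFFD, which makes the two platform outputs easier to compare. The names `toUnicodeEscapes` and `countReplacements` are made up for this sketch:

```java
import java.nio.charset.StandardCharsets;

public class UnicodeEscapes {
    // Hypothetical helper: writes every UTF-16 code unit as a 4-digit hex escape
    public static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper: counts the replacement characters (code unit 0xFFFD)
    public static long countReplacements(String s) {
        return s.chars().filter(c -> c == 0xFFFD).count();
    }

    public static void main(String[] args) {
        byte[] bytes = { 25, -50, -86, -63, -69 };
        String s = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(toUnicodeEscapes(s));
        System.out.println(countReplacements(s));
    }
}
```

On the desktop JVM, the sample bytes decode to U+0019, U+03AA and two U+FFFD characters, so the second line prints 2.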
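It seems that Android is a bit sloppy when interpreting UTF-8 sequences. The relevant part of the standard is definition D92 of the Unicode Standard: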
Before the Unicode Standard, Version 3.1, the problematic “non-shortest form”
byte sequences in UTF-8 were those where BMP characters could be represented
in more than one way. These sequences are ill-formed, because they are
not allowed by Table 3-7.
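The ranges in the standard's table of well-formed UTF-8 byte sequences (Table 3-7 in the quoted text) exclude overlong forms directly. C0 and C1 can never be lead bytes, and after E0 or F0 the second byte must be at least A0 or 90 respectively. The checker below is written from those ranges as an illustrative sketch only. It is not how the JDK or Android decoders are implemented, and the class name is hypothetical:

```java
public class Utf8WellFormed {
    // Returns true only if every sequence in bytes matches one of the
    // well-formed UTF-8 byte ranges; overlong forms, surrogates and
    // values above U+10FFFF are all rejected.
    public static boolean isWellFormed(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            int b0 = bytes[i] & 0xFF;
            if (b0 <= 0x7F) {           // single-byte ASCII
                i++;
                continue;
            }
            int len;
            int lo = 0x80, hi = 0xBF;   // allowed range for the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                len = 2;
            } else if (b0 == 0xE0) {
                len = 3; lo = 0xA0;     // excludes overlong 3-byte forms
            } else if (b0 >= 0xE1 && b0 <= 0xEC) {
                len = 3;
            } else if (b0 == 0xED) {
                len = 3; hi = 0x9F;     // excludes surrogates D800..DFFF
            } else if (b0 == 0xEE || b0 == 0xEF) {
                len = 3;
            } else if (b0 == 0xF0) {
                len = 4; lo = 0x90;     // excludes overlong 4-byte forms
            } else if (b0 >= 0xF1 && b0 <= 0xF3) {
                len = 4;
            } else if (b0 == 0xF4) {
                len = 4; hi = 0x8F;     // excludes values above U+10FFFF
            } else {
                return false;           // 80..C1 and F5..FF never start a sequence
            }
            if (i + len > bytes.length) {
                return false;           // truncated sequence
            }
            int b1 = bytes[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) {
                return false;
            }
            for (int k = 2; k < len; k++) {
                int bk = bytes[i + k] & 0xFF;
                if (bk < 0x80 || bk > 0xBF) {
                    return false;       // remaining bytes must be continuation bytes
                }
            }
            i += len;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(new byte[] { -63, -69 }));        // false: C1 BB is overlong
        System.out.println(isWellFormed(new byte[] { -32, -121, -102 })); // false: E0 87 9A is overlong
        System.out.println(isWellFormed(new byte[] { -57, -102 }));       // true: C7 9A is U+01DA
    }
}
```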
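Your input contains such "non-shortest form" sequences, for example -32, -121, -102 and -63, -69. Android interprets each of these sequences as a single character. Java correctly rejects them and replaces each byte of the malformed input with its own replacement character (U+FFFD), which results in a longer string.
You can demonstrate the lenient interpretation in Java by using a parser that reads "Modified UTF-8" (DataInputStream.readUTF), which does not check for the shortest form: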
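`new String(byte[], Charset)` always substitutes replacement characters, so the only visible symptom is the different length. If you would rather detect malformed input than have it silently replaced, you can use a `CharsetDecoder` set to `CodingErrorAction.REPORT`, which throws a `CharacterCodingException` instead. In the minimal sketch below, the rejection of `-63, -69` reflects the desktop JVM behaviour. Whether Android's decoder rejects overlong sequences the same way is an assumption you would need to test on the device. The class name `StrictUtf8` is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Decodes strictly: throws instead of substituting replacement characters
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[][] samples = { { -57, -102 }, { -63, -69 } };
        for (byte[] sample : samples) {
            try {
                System.out.println("decoded: " + decodeStrict(sample));
            } catch (CharacterCodingException e) {
                // MalformedInputException is a subclass of CharacterCodingException
                System.out.println("rejected: " + e);
            }
        }
    }
}
```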
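import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Demo {
    public static void main(String[] args) throws IOException {
        // Overlong ("non-shortest form") sequences taken from the input
        byte[][] samples = {
            { -32, -121, -102 },
            { -63, -69 }
        };
        for (byte[] array : samples) {
            System.out.println("source: " + Arrays.toString(array));
            // Strict decoding: each malformed byte becomes a replacement character
            String string = new String(array, StandardCharsets.UTF_8);
            System.out.println("strictly interpreted: " + string);
            System.out.println("length: " + string.length());
            // readUTF() expects a 2-byte length prefix followed by Modified UTF-8 data
            ByteBuffer bb = ByteBuffer.allocate(array.length + 2);
            bb.putShort((short) array.length).put(array);
            DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bb.array()));
            string = dis.readUTF();
            System.out.println("sloppily interpreted: " + string);
            System.out.println("length: " + string.length());
            // Re-encoding yields the correct shortest-form byte sequence
            byte[] actual = string.getBytes(StandardCharsets.UTF_8);
            System.out.println("correct sequence: " + Arrays.toString(actual));
            System.out.println();
        }
    }
}
This prints: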
source: [-32, -121, -102]
strictly interpreted: ???
length: 3
sloppily interpreted: ǚ
length: 1
correct sequence: [-57, -102]
source: [-63, -69]
strictly interpreted: ??
length: 2
sloppily interpreted: {
length: 1
correct sequence: [123]
It also shows the correct "shortest form" byte sequence for each character.
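To see why a lenient decoder produces these characters, you can extract the payload bits by hand. The lead byte of a two-byte sequence contributes 5 bits, a three-byte lead contributes 4, and each continuation byte contributes 6. Nothing in that arithmetic forbids spreading a small value over more bytes than needed. The minimal sketch below uses hypothetical helper names. It decodes both samples without any shortest-form check, then re-encodes each result in shortest form:

```java
import java.util.Arrays;

public class OverlongDemo {
    // Lenient decode of a 2- or 3-byte sequence: extracts payload bits only,
    // with no check that the value actually needed that many bytes
    public static int decodeLenient(byte[] seq) {
        if (seq.length == 2) {
            return ((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F);
        }
        if (seq.length == 3) {
            return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F);
        }
        throw new IllegalArgumentException("only 2- or 3-byte sequences are handled");
    }

    // Shortest-form encoding for code points U+0000..U+FFFF (surrogates are not rejected)
    public static byte[] encodeShortest(int cp) {
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {
            return new byte[] { (byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F)) };
        }
        return new byte[] {
            (byte) (0xE0 | (cp >> 12)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    public static void main(String[] args) {
        byte[][] samples = { { -32, -121, -102 }, { -63, -69 } };
        // prints:
        // [-32, -121, -102] -> U+01DA -> [-57, -102]
        // [-63, -69] -> U+007B -> [123]
        for (byte[] seq : samples) {
            int cp = decodeLenient(seq);
            System.out.printf("%s -> U+%04X -> %s%n",
                    Arrays.toString(seq), cp, Arrays.toString(encodeShortest(cp)));
        }
    }
}
```

The re-encoded bytes match the "correct sequence" lines printed by the `readUTF` demonstration.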