PLUM
Garbled box character at the end of a Chinese string?
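Reposted from: https://blog.csdn.net/liupan_664021/article/details/88998154 Author: panyliu

- First, here is my test code. Each comment shows the value observed at run time, on a machine whose default charset is GBK.

```java
import java.io.UnsupportedEncodingException;

public class EncoderError {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String gbk = "我是谁";
        // getBytes() with no argument uses the platform default charset (GBK here)
        byte[] gbkGetGBKBytes = gbk.getBytes();//[-50, -46, -54, -57, -53, -83]
        byte[] gbkGetUTF8Bytes = gbk.getBytes("UTF-8");//[-26, -120, -111, -26, -104, -81, -24, -80, -127]
        // UTF-8 bytes decoded with the default charset (GBK): garbled
        String utf8 = new String(gbkGetUTF8Bytes);//鎴戞槸璋?
        byte[] utf8GetGBKBytes = utf8.getBytes();//[-26, -120, -111, -26, -104, -81, -24, -80, 63]
        byte[] utf8GetUTF8Bytes = utf8.getBytes("UTF-8");//[-23, -114, -76, -26, -120, -98, -26, -89, -72, -25, -110, -117, -17, -65, -67]
        String print = new String(utf8GetGBKBytes, "UTF-8");
        System.out.println(print);//我是??
        // Encoded and decoded with the same charset: round-trips correctly
        System.out.println(new String(gbk.getBytes("UTF-8"), "UTF-8"));//我是谁
        System.out.println(new String(new String(gbk.getBytes("UTF-8")).getBytes(), "UTF-8"));//我是??
    }
}
```

- The variable is created directly as "我是谁". A Java String holds UTF-16 characters and only becomes GBK bytes when it is encoded: `gbk.getBytes()` uses the platform default charset (GBK here) and yields 6 bytes, 2 per character, while `gbk.getBytes("UTF-8")` yields the UTF-8 encoding, 9 bytes, 3 per character. Now pay attention to this line:

```java
String utf8 = new String(gbkGetUTF8Bytes);//鎴戞槸璋?
```

- This line is the odd one out. It takes the UTF-8 bytes and builds a string from them with the default charset (GBK), so the result is obviously garbled: "鎴戞槸璋?". gbkGetUTF8Bytes holds 9 bytes, and GBK decodes Chinese characters two bytes at a time. After four pairs, the ninth byte, -127 (0x81), is a GBK lead byte with no second byte, i.e. malformed input. The decoder drops it and inserts the replacement character U+FFFD in its place; that is the box or '?' at the end (utf8GetUTF8Bytes ends with -17, -65, -67, which is U+FFFD in UTF-8). When `utf8.getBytes()` encodes the string back to GBK, U+FFFD has no GBK mapping, so the encoder substitutes '?', byte 63. Seen from UTF-8, the data has now changed from [-26, -120, -111, -26, -104, -81, -24, -80, -127] to [-26, -120, -111, -26, -104, -81, -24, -80, 63]. We then decode these bytes as UTF-8 again and print the string:

```java
byte[] utf8GetGBKBytes = utf8.getBytes();
String print = new String(utf8GetGBKBytes, "UTF-8");
System.out.println(print);//我是??
```

- This prints garbled text: -24, -80 (the start of 谁) are now followed by 63 instead of -127, so the UTF-8 decoder cannot rebuild the last character. If the number of Chinese characters is even, the UTF-8 byte count is even as well, so no odd trailing byte gets dropped and replaced with 63; for a string such as "我是" the GBK/UTF-8 round trip comes back intact. An even count only avoids this particular loss, though; decoding bytes with the wrong charset is still not guaranteed to be reversible.
- In short, the problem is forcing UTF-8 encoded data to be decoded as GBK. That step changes the data, so converting back to UTF-8 can no longer restore the original text. As the second `println` in the test code shows, decoding with the same charset that was used for encoding (`new String(gbk.getBytes("UTF-8"), "UTF-8")`) prints 我是谁 correctly.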
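The test code above depends on the machine's default charset, which varies between systems (and has been UTF-8 by default since Java 18, JEP 400), so it may not reproduce elsewhere. Below is a minimal sketch that passes GBK and UTF-8 explicitly to reproduce both the odd-count failure and the even-count round trip. The class and method names (`GbkUtf8RoundTrip`, `misreadAsGbk`, `roundTrip`) are hypothetical, it assumes the JDK ships the GBK charset (standard builds do, though GBK is not one of the charsets every JVM must provide), and the Chinese strings are written as Unicode escapes (`\u6211\u662F\u8C01` is 我是谁) so the file compiles whatever javac's source encoding is.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GbkUtf8RoundTrip {
    // Assumption: the JDK provides GBK (standard JDK builds include it).
    static final Charset GBK = Charset.forName("GBK");

    // The article's test string (3 CJK characters) and its first two characters.
    static final String ODD = "\u6211\u662F\u8C01";
    static final String EVEN = "\u6211\u662F";

    /** The wrong step: UTF-8 bytes decoded as GBK, producing the garbled text. */
    static String misreadAsGbk(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), GBK);
    }

    /** Re-encode the misread string as GBK, then try to recover it by decoding as UTF-8. */
    static String roundTrip(String s) {
        // U+FFFD has no GBK mapping, so getBytes substitutes '?' (63) for it
        byte[] backToGbk = misreadAsGbk(s).getBytes(GBK);
        return new String(backToGbk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String misread = misreadAsGbk(ODD);
        // The dangling 9th byte (0x81) was replaced by U+FFFD during decoding
        System.out.printf("last char: U+%04X%n", (int) misread.charAt(misread.length() - 1)); // U+FFFD
        // Encoding back to GBK turns that U+FFFD into 63
        System.out.println(Arrays.toString(misread.getBytes(GBK))); // [-26, -120, -111, -26, -104, -81, -24, -80, 63]
        System.out.println("odd count survives:  " + ODD.equals(roundTrip(ODD)));   // false
        System.out.println("even count survives: " + EVEN.equals(roundTrip(EVEN))); // true
    }
}
```

Passing the `Charset` object instead of a name string also removes the checked `UnsupportedEncodingException` from the signatures.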
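What hides the damage is the silent substitution: `new String(bytes, charset)` always replaces malformed input with U+FFFD, so the wrong-charset step never fails visibly. If you would rather have it fail loudly, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead. This is a sketch of that alternative, not something the article uses; `StrictDecode` and `decodeStrict` are hypothetical names, and GBK availability is assumed as above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    /** Decodes bytes, throwing on malformed or unmappable input instead of inserting U+FFFD. */
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the article's 3-character test string: 9 bytes
        byte[] utf8 = "\u6211\u662F\u8C01".getBytes(StandardCharsets.UTF_8);
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, Charset.forName("GBK")}) {
            try {
                String s = decodeStrict(utf8, cs);
                System.out.println(cs + ": decoded " + s.length() + " chars");
            } catch (CharacterCodingException e) {
                // GBK: the trailing 0x81 is a lead byte with no partner
                System.out.println(cs + ": rejected (" + e + ")");
            }
        }
    }
}
```

With UTF-8 the 9 bytes decode to three characters; with GBK the dangling last byte raises a `MalformedInputException` instead of quietly becoming U+FFFD.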