UTF-8 problem

Source: Internet. Published: 2015-04-28

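Summary of the problem: I am converting Chinese strings between gb2312 and utf-8. I used the same approach that works for converting between gb2312 and iso-8859-1, but it has no effect on Chinese characters.

See the following sample program: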

public class test {
  public static void main(String[] args) {
    toUTF8("abc");
    toUTF8("汉字");
  }

  static void toUTF8(String s) {
    System.out.println("Source string: " + s);
    try {
      // getBytes() with no argument uses the platform default charset (GBK here),
      // so the next line decodes GBK bytes as if they were UTF-8.
      s = new String(s.getBytes(), "UTF8");
      byte[] bytes = s.getBytes("UTF8");
      System.out.println("length: " + bytes.length);
    } catch (java.io.UnsupportedEncodingException uee) {
      uee.printStackTrace();
    }
  }
}

Output:
Source string: abc
length: 3
Source string: 汉字
length: 0

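I found that after converting the Chinese string from gb2312 to utf-8, it became an empty string.

Environment: Chinese Windows 2000 + JB6; the system default encoding is GBK.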
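
A likely explanation for the result above: on this system s.getBytes() produces GBK bytes, and those bytes are then decoded as if they were UTF-8. The sketch below reproduces that mismatch with explicit charsets, so it does not depend on the platform default. The names MismatchDemo and gbkBytesReadAsUtf8 are made up for illustration, and the code assumes the JDK in use ships the GBK charset. On current JDKs the invalid bytes are replaced with U+FFFD instead of being dropped, so the result is garbage text and not an empty string; the older JDK in the question evidently behaved differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // The two Chinese characters from the question, written as Unicode
    // escapes so the encoding of this source file does not matter.
    static final String HANZI = "\u6c49\u5b57";

    // Encodes with GBK, then (incorrectly) decodes those bytes as UTF-8.
    // This mirrors new String(s.getBytes(), "UTF8") on a GBK-default system.
    static String gbkBytesReadAsUtf8(String s) {
        byte[] gbk = s.getBytes(Charset.forName("GBK"));
        return new String(gbk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = gbkBytesReadAsUtf8(HANZI);
        System.out.println("Chinese text preserved: " + broken.equals(HANZI));
        // ASCII is encoded identically in GBK and UTF-8, which is why "abc" looked fine.
        System.out.println("ASCII text preserved: " + gbkBytesReadAsUtf8("abc").equals("abc"));
    }
}
```

This also accounts for why "abc" gave the expected length of 3 in the question: ASCII characters have the same byte values in both encodings.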

try {
  s = new String(s.getBytes(), "UTF8");
  byte[] bytes = s.getBytes("UTF8");
  System.out.println("length: " + bytes.length);
} catch (java.io.UnsupportedEncodingException uee) {
  uee.printStackTrace();
}

Change it to:

try {
  s = new String(s.getBytes(), "GBK");
  byte[] bytes = s.getBytes("UTF8");
  System.out.println("length: " + bytes.length);
} catch (java.io.UnsupportedEncodingException uee) {
  uee.printStackTrace();
}

and see what happens.

static void toUTF8(String s) {
  System.out.println("Source string: " + s);
  try {
    byte[] bytes = s.getBytes("ISO-8859-1");
    String strReturn = new String(bytes, "UTF-8");
    System.out.println(strReturn);
    System.out.println("length: " + strReturn.length());
  } catch (java.io.UnsupportedEncodingException uee) {
    uee.printStackTrace();
  }
}
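
The ISO-8859-1 route in the reply above is a repair technique: it only helps when the string in hand was produced by decoding UTF-8 bytes as ISO-8859-1 somewhere upstream. Applied to a correctly decoded string such as the literal "汉字", getBytes("ISO-8859-1") cannot represent the characters and substitutes '?' for each. A minimal sketch of the case where the repair does work follows; the names Latin1RepairDemo, misdecode and repair are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RepairDemo {
    // The two Chinese characters from the question, as Unicode escapes.
    static final String HANZI = "\u6c49\u5b57";

    // Simulates the upstream mistake: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Recovers the raw bytes (ISO-8859-1 maps every byte value to one char)
    // and decodes them with the charset they were really written in.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(HANZI);
        System.out.println("garbled length in chars: " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(HANZI));
    }
}
```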

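You should just call
byte[] bytes = s.getBytes("UTF8");
directly. The extra new String(...) is what causes this; it is not needed for a character conversion.
Done this way, I get a length of 4 (presumably the GBK byte count; the UTF-8 encoding of "汉字" is 6 bytes, 3 per character).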
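
To check byte counts like the one mentioned above, the same string can be encoded with each charset and the array lengths compared. This is only a sketch: ByteLengthDemo and encodedLength are illustrative names, and it assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;

public class ByteLengthDemo {
    // The two Chinese characters from the question, as Unicode escapes.
    static final String HANZI = "\u6c49\u5b57";

    // Returns how many bytes the string occupies in the named charset.
    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes: " + encodedLength(HANZI, "UTF-8")); // 3 bytes per character
        System.out.println("GBK bytes:   " + encodedLength(HANZI, "GBK"));   // 2 bytes per character
        System.out.println("chars:       " + HANZI.length());
    }
}
```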

public byte[] getBytes(String enc)
throws UnsupportedEncodingException
Convert this String into bytes according to the specified character encoding,
storing the result into a new byte array.
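This method encodes the string using the specified encoding and returns the encoded bytes.
So if you want the UTF8 encoding of a string, you can use getBytes("UTF8"):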
byte[] bytes = "中文".getBytes("UTF8");
System.out.println("length: "+bytes.length);
>>length: 6
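
Putting the replies together: a Java String is already Unicode, so converting between gb2312/GBK and utf-8 only matters at the byte level. Decode the incoming bytes with the charset they were written in, then encode the resulting String with the target charset. The sketch below shows this under the assumption that the input really is GBK; TranscodeDemo and transcode are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // The two Chinese characters from the question, as Unicode escapes.
    static final String HANZI = "\u6c49\u5b57";

    // Decodes the bytes using the charset they were written in,
    // then re-encodes the text in the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] gbkBytes = HANZI.getBytes(gbk);
        byte[] utf8Bytes = transcode(gbkBytes, gbk, StandardCharsets.UTF_8);
        System.out.println("GBK length:   " + gbkBytes.length);
        System.out.println("UTF-8 length: " + utf8Bytes.length);
    }
}
```

Passing Charset objects instead of charset names also avoids the checked UnsupportedEncodingException that the original code had to catch.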
