Fixing Garbled Japanese and Chinese Text in Exported CSV Files

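Garbled text is a headache, and garbled Japanese text is a bigger one. Given the Japanese reputation for meticulous craftsmanship, you would expect a search on a problem specific to Japanese-language applications to turn up good solutions. The results were surprising: most of the hits for garbled Japanese text were in English and Chinese, with Japanese coming last. The Japanese articles I opened were a letdown. They simply copied the most basic test case and called it done, which hardly lives up to that reputation. Back to the problem itself. Exporting Japanese text to CSV raises two issues that need to be solved: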

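Both of the issues above come down to character sets; once the character set is handled correctly, they go away. A related issue is the BOM that comes into play when UTF-8 is used, so this is a good moment to review what a BOM is.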
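BOM (byte-order mark) is the name of the Unicode character at code point U+FEFF. When a string of UCS/Unicode characters is encoded as UTF-16 or UTF-32, this character indicates the byte order. It is also commonly used as a signature marking a file as UTF-8, UTF-16, or UTF-32. The BOM is part of the Unicode standard and has a specific scope of use: it typically marks Unicode plain-text byte streams, giving text-processing programs a convenient way to tell which Unicode encoding (UTF-8, UTF-16BE, UTF-16LE) a .txt file they are reading uses. Windows software handles the BOM comparatively well because Unicode detection is built into the platform, so the BOM is recognized and stripped when a text file is opened. (This is sometimes credited to CreateFile(), although that call only opens the file; the detection happens in the code that reads the text.) Windows relies on the BOM for historical reasons: it grew out of a multi-code-page environment, and when Unicode was introduced its designers wanted to support both Unicode and non-Unicode (multi-byte) text files without users noticing, which left them with this kind of trick. By contrast, systems such as Linux have spent less time steeped in multi-locale environments, and the community had enough motivation to travel light, so they moved straight to UTF-8. (An aside: Microsoft's insistence on compatibility borders on the obsessive. Nothing that breaks compatibility is allowed, so it often ends up tying its own hands.)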
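
To make the byte-level picture concrete, here is a minimal sketch that prints how U+FEFF is encoded in the three encodings named above. The class name `BomBytes` and the helper `bomHex` are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytes {

    // Encodes the single character U+FEFF in the given charset
    // and renders the resulting bytes as space-separated hex.
    public static String bomHex(Charset charset) {
        byte[] bytes = "\uFEFF".getBytes(charset);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    : " + bomHex(StandardCharsets.UTF_8));
        System.out.println("UTF-16BE : " + bomHex(StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE : " + bomHex(StandardCharsets.UTF_16LE));
    }
}
```

Running it prints EF BB BF for UTF-8, FE FF for UTF-16BE, and FF FE for UTF-16LE. These leading bytes are the signature a BOM-aware program looks for at the start of a file.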


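With the BOM covered, back to the main topic: how to fix garbled Japanese and Chinese text.
Because the problem might have been down to library (jar) support, I tried two open-source components: opencsv and Super CSV.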

[xml]
<dependency>
<groupId>au.com.bytecode</groupId>
<artifactId>opencsv</artifactId>
<version>2.4</version>
</dependency>
<dependency>
<groupId>net.sf.supercsv</groupId>
<artifactId>super-csv</artifactId>
<version>2.4.0</version>
</dependency>
[/xml]

Here is the code for the concrete implementation:

[java]
package com.yneit.test;

import au.com.bytecode.opencsv.CSVWriter;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.math.NumberUtils;
import org.supercsv.cellprocessor.FmtBool;
import org.supercsv.cellprocessor.FmtDate;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.LMinMax;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.UniqueHashCode;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvListWriter;
import org.supercsv.io.ICsvListWriter;
import org.supercsv.prefs.CsvPreference;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;

/**
* Fix for garbled Japanese text in CSV output
* Created by 还在路上 on 2016/12/6.
*/
public class Test {

// Character sets used for Japanese text
public static String UTF_16LE = "UTF-16LE";
public static String UTF_8 = "UTF-8";

public static void main(String[] args) throws IOException {
String path = "G:\\Project_Java\\MyProject\\src\\com\\yneit\\work\\test.csv";
String encoder = UTF_8;

OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(path), encoder);
// Write U+FEFF before any data; the UTF-8 encoder emits it as the BOM bytes EF BB BF
out.write(0xFEFF);
CSVWriter writer = new CSVWriter(out, CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER, CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END);

String[] entries = {"1", "fir", "リージョン", "6", "人民", "ond,政府\"ird"};
String[] datas = new String[entries.length];
for (int i = 0; i < entries.length; i++) {
String item = verityCell(entries[i]);
datas[i] = item;
}

writer.writeNext(datas);
writer.close();

System.out.println("over");

try {
writeWithCsvListWriter();
} catch (Exception e) {
e.printStackTrace();
}
}

/**
* Sets up the processors used for the examples. There are 10 CSV columns, so 10 processors are defined. All values
* are converted to Strings before writing (there’s no need to convert them), and null values will be written as
* empty columns (no need to convert them to "").
*
* @return the cell processors
*/
private static CellProcessor[] getProcessors() {

final CellProcessor[] processors = new CellProcessor[]{
new UniqueHashCode(), // customerNo (must be unique)
new NotNull(), // firstName
new NotNull(), // lastName
new FmtDate("dd/MM/yyyy"), // birthDate
new NotNull(), // mailingAddress
new Optional(new FmtBool("Y", "N")), // married
new Optional(), // numberOfKids
new NotNull(), // favouriteQuote
new NotNull(), // email
new LMinMax(0L, LMinMax.MAX_LONG) // loyaltyPoints
};

return processors;
}

/**
* An example of writing using CsvListWriter.
*/
private static void writeWithCsvListWriter() throws Exception {

// create the customer Lists (CsvListWriter also accepts arrays!)
final List<Object> john = Arrays.asList(new Object[]{"1", "リージョン", "リー",
new GregorianCalendar(1945, Calendar.JUNE, 13).getTime(),
"1600 リー Parkway\nz中国现实 View, CA 94043\nUnited States", null, null,
"\"May the Force リー with you.\" – Star Wars", "jdunbar@gmail.com", 0L});

final List<Object> bob = Arrays.asList(new Object[]{"2", "Bob", "Down",
new GregorianCalendar(1919, Calendar.FEBRUARY, 25).getTime(),
"1601 Willow Rd.\nリー Park, CA 94025\nUnited States", true, 0,
"\"Frankly, my dear, I don’t give a damn.\" – Gone With The Wind", "天朝@hotmail.com", 123456L});

ICsvListWriter listWriter = null;
try {
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream("G:\\Project_Java\\MyProject\\src\\com\\yneit\\work\\writeWithCsvListWriter.csv"), Charset.forName(UTF_8));
// Write the BOM before handing the stream to Super CSV
out.write('\uFEFF');
listWriter = new CsvListWriter(out, CsvPreference.STANDARD_PREFERENCE);

final CellProcessor[] processors = getProcessors();
final String[] header = new String[]{"customerNo", "firstName", "lastName", "birthDate",
"mailingAddress", "married", "numberOfKids", "favouriteQuote", "email", "loyaltyPoints"};

// write the header
listWriter.writeHeader(header);

// write the customer lists
listWriter.write(john, processors);
listWriter.write(bob, processors);

} finally {
if (listWriter != null) {
listWriter.close();
}
}
}

/**
* Prepares a cell for writing: numeric values get a leading tab, embedded quotes are doubled,
* and every value is wrapped in double quotes.
*/
public static String verityCell(String content) {
if (content == null) {
return "";
}

// Prefix numeric cells with a tab (presumably so spreadsheet applications keep them as text)
if (NumberUtils.isNumber(content)) {
content = "\t" + content;
}

// Double any embedded quote characters, then wrap the whole cell in quotes
String quote = String.valueOf(CSVWriter.DEFAULT_QUOTE_CHARACTER);
content = StringUtils.replace(content, quote, quote + quote);
return quote + content + quote;
}
}

[/java]
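
Both writers above put U+FEFF in front of the data by hand. One point worth keeping in mind is that Java's UTF-8 decoder does not drop that character when the file is read back, so a program that re-imports the CSV will find it attached to the first cell. The sketch below uses only the JDK and works in memory. `BomRoundTrip`, `writeWithBom` and `readFirstLine` are hypothetical names, not part of opencsv or Super CSV.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomRoundTrip {

    // Writes one CSV line as UTF-8, preceded by the BOM, into a byte array.
    public static byte[] writeWithBom(String line) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // encoded as EF BB BF
            out.write(line);
            out.write("\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the first line back and removes a leading U+FEFF if one is present.
    public static String readFirstLine(byte[] data) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line = in.readLine();
            if (line != null && line.startsWith("\uFEFF")) {
                line = line.substring(1);
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample cells from the opencsv example, written with Unicode escapes
        // so that this source file stays plain ASCII.
        byte[] data = writeWithBom("1,\u30EA\u30FC\u30B8\u30E7\u30F3,\u4EBA\u6C11");
        System.out.printf("first bytes: %02X %02X %02X%n",
                data[0] & 0xFF, data[1] & 0xFF, data[2] & 0xFF);
        System.out.println(readFirstLine(data));
    }
}
```

The first three bytes come out as EF BB BF, and the line read back matches the input only because the reader strips the leading U+FEFF itself.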

References:
Handling garbled CSV output
http://super-csv.github.io/super-csv/index.html
http://stackoverflow.com/questions/32072017/write-utf-8-bom-with-supercsv
https://www.zhihu.com/question/20167122/answer/14199022
