Java - UTF-8 - I don't understand this byte sequence
I have a data provider that sends me data that is supposed to be encoded in UTF-8. The data contains this sequence of bytes:

    28 49 4e 54 e2 80 99 4c 29 20  =>  "(INT’L) "

For some reason, when my Java program fetches this data and stores it in the database, the above sequence becomes:

    28 49 4e 54 19 4c 29 20  =>  "(INT\u0019L) "

The Java program is built on top of Hibernate. It first fetches the data from the provider, stores it in an entity, and then the entity is persisted in the database (PostgreSQL).

Why am I losing bytes (e2 80 99 becomes 19)?
How can I avoid this?

Here is the core method used to transfer the data fetched from the provider into the entity:
    import java.sql.Clob;
    //...
    public static String convertStreamToString(Clob clob) throws SQLException {
        if (clob == null) {
            return "";
        }
        BufferedReader br = null;
        StringBuilder result = new StringBuilder();
        try {
            br = new BufferedReader(new InputStreamReader(clob.getAsciiStream(), Charset.forName("UTF-8")));
            String lig;
            int n = 0;
            while ((lig = br.readLine()) != null) {
                if (n > 0) {
                    result.append("\n");
                }
                result.append(lig);
                n++;
            }
        } catch (IOException ioe) {
            // exception handling code ...
        } catch (SQLException sqlex) {
            // exception handling code ...
        } finally {
            IOUtil.close(br);
        }
        return result.toString();
    }
    // ...
    MyEntity entity = ...
    oracle.sql.NCLOB clob = ...
    entity.setProperty(convertStreamToString(clob));

    @Entity
    class MyEntity {
        @Column(name="prop", length=100000)
        private String prop;

        public void setProperty(String value) {
            this.prop = value;
        }
    }
You are using getAsciiStream() to read the contents of the Clob. As the name says, this method is only usable for ASCII; it breaks non-ASCII characters.
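To make the breakage concrete, here is a small sketch of what probably happens to the ’ character (U+2019). It assumes the lossy path keeps only the low 8 bits of each character. That is a guess that matches the bytes in the question, not documented driver behaviour. The class and helper names (LowByteDemo, hex, lowBytes) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class LowByteDemo {
    /** Formats bytes as space-separated lowercase hex, e.g. "e2 80 99". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Keeps only the low 8 bits of each char (assumed model of the lossy path). */
    static byte[] lowBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // U+2019 -> 0x19
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "(INT\u2019L) ";
        // Proper UTF-8 encoding: the quote takes three bytes.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8))); // 28 49 4e 54 e2 80 99 4c 29 20
        // One byte per char: the quote collapses to 0x19.
        System.out.println(hex(lowBytes(text)));                        // 28 49 4e 54 19 4c 29 20
    }
}
```

The second line of output is exactly the sequence that ended up in the database. Once the character has been cut down to 0x19, decoding those bytes as UTF-8 cannot bring it back.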
Use the Clob's getCharacterStream() method instead of getAsciiStream():
    BufferedReader br = null;
    StringBuilder result = new StringBuilder();
    try {
        br = new BufferedReader(clob.getCharacterStream());
        ....
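For reference, a complete version of the method built on getCharacterStream() might look like the sketch below. It keeps the line-joining behaviour of the code in the question. Two things are my own choices and not part of the original: try-with-resources replaces the explicit close, and an IOException is rethrown as UncheckedIOException. The class name ClobText is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.sql.Clob;
import java.sql.SQLException;

final class ClobText {
    private ClobText() {
    }

    /** Reads the whole CLOB as text, joining lines with "\n" as in the question's code. */
    static String convertStreamToString(Clob clob) throws SQLException {
        if (clob == null) {
            return "";
        }
        StringBuilder result = new StringBuilder();
        // getCharacterStream() returns a Reader, so no charset needs to be named here.
        try (BufferedReader br = new BufferedReader(clob.getCharacterStream())) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (!first) {
                    result.append('\n');
                }
                result.append(line);
                first = false;
            }
        } catch (IOException ioe) {
            // Illustrative choice: surface the failure instead of swallowing it.
            throw new UncheckedIOException(ioe);
        }
        return result.toString();
    }
}
```

Because the reader already delivers characters, the InputStreamReader and the Charset.forName("UTF-8") argument from the original method are no longer needed.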