Set proper encoding on Detection.EncodingGuessingDefaultEncoding
Last Updated December 13, 2011
Detection.EncodingGuessingDefaultEncoding is set to an encoding for the default language, but content detection still fails for some content in that language.
Some languages have multiple encodings. Usually one of them is the commonly used standard encoding, but it may not include platform-dependent characters (characters specific to Microsoft Windows, Mac OS, UNIX, mainframes, etc.).
For example, in Japanese, Shift_JIS does not include vendor-specific characters (Microsoft Windows, IBM, and NEC extensions). Windows-31J includes them, but even Windows-31J does not include the vendor-specific characters of Mac, Fujitsu, and other vendors. In Chinese, GB2312 is superseded by GBK and GB18030, both of which are supersets of GB2312.
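The following is a minimal sketch that illustrates the difference using the standard `java.nio.charset` API. It asks each JDK charset whether it can encode a given character. The test characters are assumptions chosen for illustration. U+2460 (①, CIRCLED DIGIT ONE) is one of the NEC special characters. U+9555 (镕) is a commonly cited ideograph that is missing from GB2312 but present in GBK and GB18030. The class name `EncodingCoverage` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class, for illustration only.
public class EncodingCoverage {
    // Returns true if every character in text can be encoded in the named charset.
    static boolean canEncode(String charsetName, String text) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String necSpecial = "\u2460";   // ① CIRCLED DIGIT ONE (NEC special character)
        String gbkOnly = "\u9555";      // 镕, assumed here to be absent from GB2312

        System.out.println("Shift_JIS   ①: " + canEncode("Shift_JIS", necSpecial));
        System.out.println("Windows-31J ①: " + canEncode("Windows-31J", necSpecial));
        System.out.println("GB2312      镕: " + canEncode("GB2312", gbkOnly));
        System.out.println("GBK         镕: " + canEncode("GBK", gbkOnly));
        System.out.println("GB18030     镕: " + canEncode("GB18030", gbkOnly));
    }
}
```

In each pair, the base encoding reports that it cannot encode the character, while the extended encoding can. This is the kind of gap that can make detection based on the base encoding fail.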
If content detection fails, check whether the file contains such platform-dependent characters. Detection may fail when it encounters them.
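One way to check a file for such characters outside the product is to decode its bytes with a strict JDK decoder and report where decoding first fails. The sketch below does this with `CharsetDecoder`. It assumes the whole file fits in memory. The class and method names are hypothetical. The sample bytes are "A" followed by 0x87 0x40, which Windows-31J maps to ① (an NEC special character) and plain Shift_JIS does not map at all.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, for illustration only.
public class UndecodableByteFinder {
    // Returns the offset of the first byte sequence that the named charset
    // cannot decode (malformed or unmappable), or -1 if all input decodes.
    static int firstUndecodableOffset(byte[] data, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // Size the output for the worst case so decoding cannot overflow.
        CharBuffer out = CharBuffer.allocate(
                (int) Math.ceil(data.length * (double) decoder.maxCharsPerByte()) + 1);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            // On an error result, the bad sequence starts at the input position.
            return in.position();
        }
        decoder.flush(out);
        return -1;
    }

    public static void main(String[] args) {
        // "A" followed by 0x87 0x40 (CIRCLED DIGIT ONE in Windows-31J).
        byte[] sample = {0x41, (byte) 0x87, 0x40};
        System.out.println("Shift_JIS:   " + firstUndecodableOffset(sample, "Shift_JIS"));
        System.out.println("Windows-31J: " + firstUndecodableOffset(sample, "Windows-31J"));
    }
}
```

For a real file, the bytes could be read with `java.nio.file.Files.readAllBytes`. A non-negative offset under the base encoding, combined with -1 under an extended encoding, suggests that the extended encoding is the better choice.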
An extended encoding such as those mentioned above may include these platform-dependent characters. In that case, try setting Detection.EncodingGuessingDefaultEncoding to the appropriate encoding name. Detection.EncodingGuessingDefaultEncoding accepts the encoding names used by Java (JDK).
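Before entering a value, it may help to confirm that the JDK recognizes the name. This article does not describe how the product resolves the setting internally. The sketch below assumes only that a name the JDK's `Charset` lookup accepts is a valid candidate. It uses the standard `Charset.isSupported` and `Charset.forName` methods. The class name is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class EncodingNameCheck {
    // Returns the JDK's canonical name for an encoding name or alias,
    // or null if this JDK does not support it.
    static String canonicalName(String encodingName) {
        try {
            return Charset.isSupported(encodingName)
                    ? Charset.forName(encodingName).name()
                    : null;
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names (e.g. containing spaces).
            return null;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"Windows-31J", "MS932", "GB18030", "Not A Charset"};
        for (String name : candidates) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

Lookup is case-insensitive and accepts aliases, so "MS932" and "Windows-31J" resolve to the same charset. Whether the product also accepts aliases, rather than only canonical names, is not stated here, so the canonical name is the safer value to use.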
Imported Document ID: TECH220902