Apache OpenOffice (AOO) Bugzilla – Issue 43666
Add TIS620 encoding missing from sal/textenc/tencinfo.c
Last modified: 2013-08-07 15:00:08 UTC
The encoding TIS620 is missing from the tables in sal/textenc/tencinfo.c. Among other things, this prevents Thai spell-check dictionary files encoded in TIS-620 from working in OOo. The attached patch adds support for ISO8859-11, TIS620, TIS620.2529 and TIS620.2533. See http://linux.thai.net/~thep/th-xwindow/#Charsets for info on Thai encodings.
Created attachment 23106: Patch to add TIS620 to sal/textenc/tencinfo.c
Confirmed, with patch.
change owner.
accepted
Can you get it into OOo 2.0?
- The patch is provided.
- The patch is tested and used in OfficeTLE, a well-known local version of OOo 1.1.x.
- Without it, users or localization packs can't add a Thai dictionary.
The inability to use a Thai dictionary is critical because it disables an important feature (spell checking) for Thai.
No problem.
sb->samphan: Looking at your patch, I'm not sure how exactly rtl_getTextEncodingFromUnixCharset should behave. Applying your patch directly, it would map
(1) "TIS620" -> DONTKNOW
(2) "TIS620-2529" -> TIS_620
(3) "TIS620-2533" -> TIS_620
(4) "TIS620-1234" -> TIS_620
(5) "TIS620.2529" -> DONTKNOW
(6) "TIS620.2529-foobar" -> TIS_620
I wonder whether (4) and (6) are as expected. Can you specify exactly which input shall be accepted as TIS_620 (<ftp://ftp.x.org/pub/DOCS/registry> does not mention TIS620, so I assume the values in question are nonstandard but in general use in Thailand)?
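The six results above are what one would get from a lookup that splits the name at its first hyphen and compares only the part before it. The following is a minimal sketch of such a lookup, not the code in sal/textenc/tencinfo.c: the enum values, the function names and the family table are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; these are not the real rtl_TextEncoding values. */
enum sketch_encoding { SKETCH_DONTKNOW, SKETCH_TIS_620 };

/* Compare the first n bytes of a with the whole of b, ignoring letter case. */
static int prefix_equals_nocase(const char *a, size_t n, const char *b)
{
    size_t i;
    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; ++i)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Assumed behaviour: split at the first hyphen and match only what precedes it. */
static enum sketch_encoding family_only_lookup(const char *name)
{
    /* Assumed family table, taken from the names mentioned in this issue. */
    static const char *const families[] = {
        "tis620", "tis620.2529", "tis620.2533"
    };
    const char *hyphen = strchr(name, '-');
    size_t i;

    if (hyphen == NULL)
        return SKETCH_DONTKNOW;      /* cases (1) and (5): no hyphen at all */
    for (i = 0; i < sizeof families / sizeof families[0]; ++i)
        if (prefix_equals_nocase(name, (size_t)(hyphen - name), families[i]))
            return SKETCH_TIS_620;   /* the text after the hyphen is ignored */
    return SKETCH_DONTKNOW;
}
```

Because the text after the hyphen is never inspected, "TIS620-1234" and "TIS620.2529-foobar" are accepted in this sketch, which is the concern raised in (4) and (6).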
Sorry, I may not understand the code thoroughly. X does support tis-620. See /usr/X11R6/lib/X11/fonts/encoding/tis620-2.enc or http://cvs.freedesktop.org/xorg/xc/fonts/encodings/iso8859-11.enc?rev=1.1.1.1&view=markup
STARTENCODING iso8859-11
ALIAS tis620-0
ALIAS tis620.2529-1
ALIAS tis620.2533-1
ALIAS tis620.2533-0
----
glibc also supports tis-620. See /usr/share/i18n/charmaps/TIS-620.gz:
% alias TIS620
% alias TIS620-0
% alias TIS620.2529-1
% alias TIS620.2533-0
% alias ISO-IR-166
----
Can you modify the patch to accept these values?
I adapted the patch so that now exactly
  TIS620-0
  TIS620.2529-1
  TIS620.2533-0
  TIS620.2533-1
(ignoring letter case) are accepted. Additionally accepting the glibc variants that do not have exactly one hyphen would be more tricky; if they turn out to be needed in practice, we have to reopen this issue. Tests are in sal/qa/rtl/textenc/rtl_tencinfo.cxx.
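The acceptance rule described above (exactly four names, compared without regard to letter case) can be illustrated with a small standalone check. This is only a sketch under that stated rule; the function names are hypothetical and it is not the adapted patch itself.

```c
#include <ctype.h>
#include <stddef.h>

/* Whole-string comparison that ignores ASCII letter case. */
static int equals_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* true only if both strings ended together */
}

/* Hypothetical helper: is the name one of the four accepted spellings? */
static int is_accepted_tis620_name(const char *name)
{
    static const char *const accepted[] = {
        "tis620-0", "tis620.2529-1", "tis620.2533-0", "tis620.2533-1"
    };
    size_t i;

    for (i = 0; i < sizeof accepted / sizeof accepted[0]; ++i)
        if (equals_nocase(name, accepted[i]))
            return 1;
    return 0;
}
```

With an exact match like this, inputs such as "TIS620-1234" and "TIS620.2529-foobar" are rejected, and so is the hyphen-free glibc alias "TIS620".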
I have no idea about the X.org registry. But TIS-620 is an official industrial standard in Thailand, and it is also registered with IANA: http://www.iana.org/assignments/character-sets Solaris (and its CDE) has had TIS-620 since version 7: http://docs.sun.com/app/docs/doc/806-1360/6jalch36t?a=view
Maybe we have to register TIS-620 (along with other standard encodings that are currently not there) with xregistry@x.org.
sb->arthit: Some clarification: The patch in this issue was only about rtl_getTextEncodingFromUnixCharset, which "obviously" (the documentation unfortunately is little more than a bad joke) is about the final two segments of those long X11 font names with lots of hyphens in them (e.g., "...-iso8859-1", "...-tis620.2533-0"). Thus, IANA charset names are irrelevant here (however, OOo does know the MIME charset name "TIS-620", see rtl_getTextEncodingFromMimeCharset); and all the X11 font names listed at <http://docs.sun.com/app/docs/doc/806-1360/6jalch36t?a=view> indeed end in "tis620.2533-0", which is now understood by rtl_getTextEncodingFromUnixCharset.
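To make "the final two segments" concrete, here is a rough sketch of how a caller might cut that suffix out of a full X11 font name before handing it to a charset lookup. The helper name is hypothetical and nothing here is taken from OOo; it only assumes that segments are separated by hyphens.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the last two hyphen-separated
 * segments of an X11 font name (for example "tis620.2533-0"), or NULL if
 * the name contains fewer than two hyphens.
 */
static const char *xlfd_charset_suffix(const char *xlfd)
{
    const char *last = strrchr(xlfd, '-');   /* hyphen before the final segment */
    const char *p;

    if (last == NULL)
        return NULL;
    /* Walk backwards to the hyphen that precedes the second-to-last segment. */
    for (p = last; p != xlfd; ) {
        --p;
        if (*p == '-')
            return p + 1;
    }
    return NULL;
}
```

For an illustrative font name such as "-misc-fixed-medium-r-normal--16-160-72-72-c-80-tis620.2533-0" (made up for this example), the helper points at "tis620.2533-0", which is the kind of string rtl_getTextEncodingFromUnixCharset is described as handling.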
arthit->sb: Sorry. Now I got the point. Thank you :)
verified
close