Apache OpenOffice (AOO) Bugzilla – Issue 31230
THAI spell checking
Last modified: 2013-02-24 20:34:28 UTC
Dear all, I've implemented a Thai dictionary for Thai spell checking in OpenOffice.org. I will attach the Thai dictionary and a patch file for OpenOffice.org. Thank you.
Created attachment 16310 [details] THAI dictionary and patch file
James->hin: Apart from adding Thai dictionary support, the patch includes a fix to sw/source/core/txtnode/txtedt.cxx. I'm not sure exactly what it's doing. It looks like it tries to fix an issue with mixed-language spell checking; this ought to be a separate issue.
The patch of sw/source/core/txtnode/txtedt.cxx should not be necessary for OOo 2.0 anymore. The patched function SwScanner::NextWord( LanguageType ) has been replaced with SwScanner::NextWord() which already works correctly for all script types and languages.
set target to 2.0
Hi, I have now appended a new Thai word list to the th_TH.dic file. I am attaching the updated file. Thank you.
Created attachment 26670 [details] new thai dictionary for spell check
Set target to OOo 2.0.1 (I have to make more changes in the area of spellchecking to make it easier to integrate them).
README says:
name : th_TH
0.2 version of the thai dictionary
date : 2005.05.30
License: LGPL
Copyright 2005 by NECTEC, Thailand
mh: the license is LGPL. OK to integrate? If so, reassign back to me. If any paperwork needs to be done, hin will surely do that.
TL made me aware that this issue contains a patch for sal/textenc/tencinfo.c that collides with the fix of issue 43666. Please see issue 43666 for a description of what TIS620-related input to rtl_getTextEncodingFromUnixCharset is mapped to what output. If the current behaviour (i.e., including the fix for issue 43666) is not acceptable, please give a list of what additional inputs should map to what outputs.
OK, to make the spellchecker happy, TL and I just added the two mappings (case insensitive)
"TIS620-2529" -> RTL_TEXTENCODING_TIS_620
"TIS620-2533" -> RTL_TEXTENCODING_TIS_620
to rtl_getTextEncodingFromUnixCharset (as suggested by the patch).
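The two mappings can be pictured as a case-insensitive exact lookup. The following is only a minimal sketch in Python, not the C code in sal/textenc/tencinfo.c: the function name `encoding_from_unix_charset`, the table `_THAI_CHARSETS`, and the string constants are hypothetical stand-ins, and the real function handles many more charsets than the two listed here.

```python
# Hypothetical stand-ins for the rtl text-encoding constants (assumption:
# the real values are numeric constants defined in the sal headers).
RTL_TEXTENCODING_UNKNOWN = "RTL_TEXTENCODING_UNKNOWN"
RTL_TEXTENCODING_TIS_620 = "RTL_TEXTENCODING_TIS_620"

# Only the two mappings named in the comment above; keys are stored in
# lower case so that the lookup can be case insensitive.
_THAI_CHARSETS = {
    "tis620-2529": RTL_TEXTENCODING_TIS_620,
    "tis620-2533": RTL_TEXTENCODING_TIS_620,
}


def encoding_from_unix_charset(name: str) -> str:
    """Return the encoding for an exact, case-insensitive charset name."""
    return _THAI_CHARSETS.get(name.lower(), RTL_TEXTENCODING_UNKNOWN)
```

With an exact lookup like this, any name outside the table, for instance a charset with a different suffix, falls through to the unknown encoding.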
Fixed in CWS thaidict. Files changed:
- sal/textenc/tencinfo.c new revision: 1.26.18.1
- sal/qa/rtl/textenc/rtl_tencinfo.cxx new revision: 1.3.18.1
- dictionaries/prj/build.lst new revision: 1.2.104.1
- dictionaries/th_TH/makefile.mk new file
- dictionaries/th_TH/dictionaty.lst new file
- dictionaries/th_TH/README_th_TH.txt new file
- dictionaries/th_TH/th_TH.aff new file
- dictionaries/th_TH/th_TH.dic new file -kb
Please be aware of issue 53168: if a text font is set that is not available on the system, the glyph fallback may in some circumstances replace it with a symbol font. If that happens, spellchecking will not work at all, since symbol fonts are excluded from spellchecking. So be sure to have a proper font set!
Note: There are only OOo installation sets.
TL->MH: Since you said you'll probably have to arrange for external testing I'm giving this one to you. re-open issue and reassign to mh@openoffice.org
reassign to mh@openoffice.org
reset resolution to FIXED
SBA: Verified in CWS thaidict. Spellcheck works in general, but I cannot judge the correctness of the proposals :-) Set to verified.
Hi all, I have added new words to the dictionary file, such as Thai names for countries and cities and some basic Thai words. Please update to the new Thai dictionary file. Thank you.
Created attachment 29590 [details] new thai dictionary for spell check, add country/city name and more.
seen THAI spell checker in m130. For updates, please file new issue. We can't update the file every week though... Please concentrate on QA of it and target your new developments to 2.0.1.
pjanik, I'm the one who tested m126 with Thai spellcheck. That dictionary lacks many common Thai words and is actually unusable for Thai people. Any Thai users who use 2.0 with the current (old) dictionary will have a bad experience and blame OOo 2.0 for poor-quality Thai spellcheck. I discussed this issue with hin and he has already fixed the dictionary (see the comment above). I think the new dictionary should be included in 2.0, not 2.0.1, and replacing it won't cost us much in QA terms.
OK, so file *new issue* to me and I'll check it into 2.0.
Began commit to prepforooo20final. The new README states that the license is now GPL; until now it was LGPL. @hin: please clarify the license of the new dictionary.
I'm very sorry. The Thai dictionary license is LGPL. I have corrected the license and am sending the file to you again. Thank you.
Created attachment 29695 [details] Edit license for THAI dictionary
Committed updated README with LGPL license.
@mh: I cannot set the build option for the Thai dictionary in the configure step (--with-dict="THAI"). Please add the Thai locale to the OOo build configuration.
This is fixed in dicconfigure. It should be THTH, BTW ;-)
Created attachment 29799 [details] Configure for THAI dictionary.
@pjanik: I cannot use 'THTH' to build the Thai dictionary. I see a 'DIC_THAI' variable in 'dictionaries/th_TH/makefile.mk'. Please check again.
hin: please check the cws I mentioned again.
I wrote earlier in this issue:
<quote>
OK, to make the spellchecker happy TL and I just added the two mappings (case insensitive)
"TIS620-2529" -> RTL_TEXTENCODING_TIS_620
"TIS620-2533" -> RTL_TEXTENCODING_TIS_620
to rtl_getTextEncodingFromUnixCharset (as suggested by the patch).
</quote>
Now I have found out that what we actually committed then was a slightly larger change: for example, rtl_getTextEncodingFromUnixCharset("TIS620-1") or rtl_getTextEncodingFromUnixCharset("TIS620-nonsense") now also return RTL_TEXTENCODING_TIS_620 instead of RTL_TEXTENCODING_UNKNOWN, which breaks the unit tests at sal/qa/rtl/textenc, see issue 61507. I assume that this was a mistake, but I do not want to break anything when fixing issue 61507. Who would be a good QA person to verify that Thai spell checking still works the same way as before once issue 61507 is fixed (and "TIS620-1", "TIS620-nonsense" etc. map to RTL_TEXTENCODING_UNKNOWN again)?
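The difference between the intended and the committed behaviour can be illustrated with a small Python sketch. It assumes that the broader matching acts like a prefix comparison on "TIS620-"; the actual change in sal/textenc/tencinfo.c is not shown in this issue, so `lookup_prefix`, `lookup_exact`, and `differing_inputs` are hypothetical helpers, not the real implementation.

```python
# Hypothetical stand-ins for the rtl text-encoding constants.
UNKNOWN = "RTL_TEXTENCODING_UNKNOWN"
TIS_620 = "RTL_TEXTENCODING_TIS_620"


def lookup_prefix(name: str) -> str:
    """Assumed model of the too-broad behaviour: any 'TIS620-' name matches."""
    return TIS_620 if name.lower().startswith("tis620-") else UNKNOWN


def lookup_exact(name: str) -> str:
    """Intended behaviour: only the two names from the original comment match."""
    return TIS_620 if name.lower() in ("tis620-2529", "tis620-2533") else UNKNOWN


def differing_inputs(names):
    """Return the charset names on which the two lookups disagree."""
    return [n for n in names if lookup_prefix(n) != lookup_exact(n)]
```

For the inputs named in this issue, `differing_inputs(["TIS620-2529", "TIS620-2533", "TIS620-1", "TIS620-nonsense"])` returns `["TIS620-1", "TIS620-nonsense"]`. Those are the two names that would map to RTL_TEXTENCODING_UNKNOWN again after the fix, so a QA check for Thai spell checking would need to confirm that nothing depends on them.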
close issue.