Apache OpenOffice (AOO) Bugzilla – Issue 54739
Need Thai support in IndexEntrySupplier
Last modified: 2013-08-07 15:01:25 UTC
For Thai, when extracting the initial letter of an index entry, it is not sufficient to take the first letter: leading vowels must be skipped. For example, given an index entry of เจมส์, the initial letter should be จ not เ. This ensures that all entries with the same initial letter will be adjacent in the sort order.
Created attachment 29633: Alphabetic index sorted incorrectly
FT: Please take over. Thx a lot.
I have extended the IndexKey string format: you can now list the initial characters you want to skip in square brackets, e.g. <IndexKey unoid="alphanumeric" default="true" phonetic="false">ก-ฮ[ฯ]</IndexKey>
I have changed the IndexKey field in th_TH.xml to <IndexKey unoid="alphanumeric" default="true" phonetic="false">ก-ฮ[เ-ไ]</IndexKey>, which lists the 5 leading vowels (เ through ไ) as characters to skip.
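To illustrate the bracket syntax described above, here is a minimal Python sketch of how an index key could be derived from such a spec. It is not the actual IndexEntrySupplier code: the function names (`expand_ranges`, `parse_index_key`, `index_key`) are hypothetical, and the upper-casing step is an assumption made only to reproduce the "clear ==> C" behaviour for Latin entries.

```python
import re


def expand_ranges(spec):
    """Expand a range list such as 'ก-ฮ' or 'A-Z' into a set of characters."""
    chars = set()
    i = 0
    while i < len(spec):
        if spec[i].isspace():
            i += 1
        elif i + 2 < len(spec) and spec[i + 1] == "-":
            # 'x-y' covers every code point from x to y inclusive.
            chars.update(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars


def parse_index_key(spec):
    """Split 'keys[skips]' into (key_chars, skip_chars); the bracket part is optional."""
    match = re.fullmatch(r"([^\[\]]*)(?:\[([^\[\]]*)\])?", spec)
    if match is None:
        raise ValueError(f"unsupported IndexKey spec: {spec!r}")
    return expand_ranges(match.group(1)), expand_ranges(match.group(2) or "")


def index_key(entry, spec):
    """Return the index key for an entry, ignoring leading skip characters."""
    key_chars, skip_chars = parse_index_key(spec)
    for ch in entry:
        if ch in skip_chars:
            continue  # leading character listed in [...]: ignore it
        upper = ch.upper()  # assumption: Latin keys are case-folded to upper case
        return upper if upper in key_chars else ch
    return ""
```

With the spec ก-ฮ[เ-ไ], both เจมส์ and จอย map to the key จ; without the bracket part, เจมส์ keeps เ as its key, which is the behaviour reported in this issue.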
No. This is not the way to handle that. I think James's description of the problem may not be accurate. What you need in order to build an index of Thai words correctly is Thai collation order, not just ignoring the initial vowel. The algorithm is a bit more complex than that (swapping of the initial vowel and multi-level weights), but it is all specified in the UCA and implemented in ICU. You can just call ICU; Calc, for example, sorts Thai text correctly.
We do use the ICU collator to sort index entries. The IndexKey field in the locale data is used to generate an index key from an index entry. For example,

index entry ==> index key
About ==> A
Cat ==> C
clear ==> C

The collator does not generate index keys; it only sorts index entries. You have to tell me how to generate the index key from the index entry. For the attached example, the index section is as below,

Alphabetical Index
จ
จอย 1
ส
สมชัย 1
เ
เจมส์ 1
เสรี 1

After applying the fix, it becomes,

Alphabetical Index
จ
จอย 1
เจมส์ 1
ส
สมชัย 1
เสรี 1

Does it make sense?
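The split between sorting and key generation can be sketched as follows. This is a rough Python illustration, not the OpenOffice code: it assumes the entries have already been put in Thai collation order (the job of the ICU collator, which is not reimplemented here), and `LEADING_VOWELS`, `index_key` and `build_index` are hypothetical names.

```python
from itertools import groupby

# The five Thai leading vowels, U+0E40 through U+0E44.
LEADING_VOWELS = set("เแโใไ")


def index_key(entry):
    """Return the first character of the entry that is not a leading vowel."""
    for ch in entry:
        if ch not in LEADING_VOWELS:
            return ch
    return ""


def build_index(sorted_entries):
    """Group entries that are already in collation order under their index keys."""
    return [(key, list(group)) for key, group in groupby(sorted_entries, key=index_key)]


# Entries in the order shown in the fixed index above.
for key, entries in build_index(["จอย", "เจมส์", "สมชัย", "เสรี"]):
    print(key, entries)
```

Because `groupby` only merges adjacent items, the headings come out right only when the key agrees with the sort order, which is why skipping the leading vowel matters here.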
I don't understand why you need to explicitly list the skipping characters. Can't you just search for the first occurrence of ก-ฮ?
khong: I've just understood. Yes, your fix should work correctly.
james, if I wrote a Thai-specific implementation, I could search for the first occurrence of a character listed in the IndexKey field. But I would like to implement a language-neutral algorithm, and your suggestion would not work for other languages. For English, for example, IndexKey lists A-Z, and I must not skip a lower-case 'a'.
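A small Python sketch of the "first occurrence in the listed range" idea shows why it is not language neutral. The helper name `first_in_range` and the sample entries are made up for illustration only.

```python
def first_in_range(entry, low, high):
    """Return the first character of entry that lies in [low, high], or None."""
    for ch in entry:
        if low <= ch <= high:
            return ch
    return None


# Thai: the leading vowel falls outside ก-ฮ, so the search lands on the consonant.
thai_key = first_in_range("เจมส์", "ก", "ฮ")

# English with A-Z: the lower-case initial is skipped as if it were a leading vowel.
english_key = first_in_range("apple Pie", "A", "Z")
no_key = first_in_range("apple", "A", "Z")
```

Here `thai_key` is จ as wanted, but `english_key` is P rather than A, and `no_key` is `None`, so an explicit skip list is the safer language-neutral choice.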
fixed in cws i18n25
Ready for QA. re-open issue and reassign to sba@openoffice.org
reassign to sba@openoffice.org
reset resolution to FIXED
SBA: Verified in CWS i18n25
SBA: OK in 680m5. Closed.