Apache OpenOffice (AOO) Bugzilla – Issue 47752
Allow conversion to Unicode beyond base plane
Last modified: 2017-05-20 11:13:22 UTC
Currently, some conversions to Unicode from encodings such as Big5 map characters whose Unicode code points lie outside the base plane (BMP) into the base plane's Private Use Area (PUA) instead. Once the rest of OOo can properly handle surrogate pairs, this workaround is no longer needed.
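For reference, the reason non-BMP characters need surrogate pairs: a code point above U+FFFF does not fit in one 16-bit unit, so UTF-16 splits it into a high surrogate and a low surrogate, while a PUA code point (U+E000 to U+F8FF) fits in a single unit. The sketch below implements the standard UTF-16 encoding rule in Python; `to_utf16_units` is a hypothetical helper for illustration, not part of the OOo sal API.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode scalar value."""
    # Reject values outside Unicode and lone surrogate code points.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0xFFFF:
        # BMP (including the PUA): a single 16-bit unit.
        return [cp]
    v = cp - 0x10000              # 20-bit offset above the BMP
    high = 0xD800 | (v >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 | (v & 0x3FF)    # bottom 10 bits -> low surrogate
    return [high, low]
```

For example, U+20000 becomes the pair 0xD840 0xDC00, whereas a PUA substitute such as U+E000 stays a single unit, which is why the PUA workaround avoids surrogate handling.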
The encoding in question is Big5HKSCS, see $surrogates in sal/textenc/generate/big5hkscs2001.pl 1.3.
Maybe this should not be a compile-time constant. For example, a Big5 document may have been imported and then exported PUA-encoded. Anyone who later wants to work with real Unicode then needs a converter from the "PUA Unicode" version.
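The converter from the "PUA Unicode" version mentioned above amounts to a table-driven code point substitution. A minimal Python sketch, assuming such a PUA-to-non-BMP table exists; the entries below are HYPOTHETICAL placeholders, not real Big5HKSCS assignments, and `pua_to_non_bmp` is an illustrative name only.

```python
# HYPOTHETICAL table: placeholder entries only, NOT real Big5HKSCS data.
# A real converter would use the table generated from the HKSCS mappings.
PUA_TO_NON_BMP: dict[int, int] = {
    0xE000: 0x20000,
    0xE001: 0x20001,
}


def pua_to_non_bmp(text: str, table: dict[int, int] = PUA_TO_NON_BMP) -> str:
    """Replace PUA code points that have a known non-BMP counterpart.

    Code points without a table entry (including other PUA use)
    are left unchanged.
    """
    # str.translate maps ordinals to ordinals; Python strings hold
    # whole code points, so no surrogate handling is needed here.
    return text.translate(table)
```

Note that unmapped PUA code points pass through untouched, since a document may also use the PUA for purposes unrelated to Big5HKSCS.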
The conversion from Unicode to Big5HKSCS handles both PUA and non-BMP input, regardless of $surrogates. If you want two different conversions from Big5HKSCS to Unicode at runtime, one using PUA and the other using non-BMP, then I think it would be better to have two different RTL_TEXTENCODINGs for them (as the sal/textcvt.h interface does not easily allow making this distinction).
Yes, having two different target encodings is a better idea than the compile-time option. What would it look like, though? RTL_TEXTENCODING_UCS2 and RTL_TEXTENCODING_UTF16, where RTL_TEXTENCODING_UCS2 is a synonym for RTL_TEXTENCODING_UNICODE? That way, encodings other than Big5 could also benefit from choosing between surrogate and non-surrogate output at runtime.
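The two-target proposal in this comment can be sketched as one source table that records both Unicode forms, with the target chosen per conversion. In this minimal Python sketch the `Target` enum only mirrors the proposed RTL_TEXTENCODING_UCS2 / RTL_TEXTENCODING_UTF16 names; it is not the sal/textcvt.h API, and the byte pair and code points in `DUAL_MAP` are HYPOTHETICAL placeholders, not real Big5HKSCS data.

```python
from enum import Enum


class Target(Enum):
    UCS2 = "ucs2"    # BMP only: non-BMP characters go to the PUA
    UTF16 = "utf16"  # full Unicode: non-BMP characters kept as such


# HYPOTHETICAL table: maps a placeholder double-byte sequence to
# (PUA code point, non-BMP code point). NOT real Big5HKSCS data.
DUAL_MAP: dict[bytes, tuple[int, int]] = {
    b"\x87\x40": (0xE000, 0x20000),
}


def decode(data: bytes, target: Target) -> str:
    """Decode ASCII plus DUAL_MAP double-byte sequences.

    Unknown double-byte sequences raise KeyError; a real converter
    would apply a proper invalid-input policy instead.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII passes through unchanged.
            out.append(chr(b))
            i += 1
        else:
            # Lead byte: look up the two-byte sequence.
            pua, non_bmp = DUAL_MAP[data[i:i + 2]]
            out.append(chr(pua if target is Target.UCS2 else non_bmp))
            i += 2
    return "".join(out)
```

With the UCS2 target every output character fits in one UTF-16 unit; with the UTF16 target the same input yields a surrogate pair once the string is encoded as UTF-16.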
Clarified offline with hdu that the last three comments (April 20, 2005 to May 30, 2005) went in the wrong direction and should be ignored. However, when we eventually flip the $surrogates switch for Big5HKSCS (so that some Big5HKSCS characters map to Unicode non-BMP instead of PUA), the following problem probably needs to be addressed: according to hdu, some fonts for Big5HKSCS have their glyphs ordered according to "Unicode PUA," not "Unicode non-BMP." To work correctly with those fonts, some function is then needed to map "Unicode non-BMP" to "Big5HKSCS-specific Unicode PUA."
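The mapping function described above could be applied at glyph lookup time, only for fonts known to be ordered by PUA. A minimal Python sketch, assuming a per-font flag (`font_uses_pua`, hypothetical) tells whether the font uses the PUA ordering; how such fonts would be detected is not specified here. The table entry is a HYPOTHETICAL placeholder, not real Big5HKSCS data.

```python
# HYPOTHETICAL table: placeholder entry only, NOT real Big5HKSCS data.
NON_BMP_TO_PUA: dict[int, int] = {
    0x20000: 0xE000,
}


def font_lookup_code_point(cp: int, font_uses_pua: bool) -> int:
    """Return the code point to look up in a font's character map.

    For fonts whose Big5HKSCS glyphs are ordered by "Unicode PUA",
    non-BMP code points with a known PUA counterpart are redirected;
    everything else is looked up as-is.
    """
    if font_uses_pua and cp > 0xFFFF:
        return NON_BMP_TO_PUA.get(cp, cp)
    return cp
```

Keeping the remapping at the font layer lets the text itself stay in real Unicode, so only rendering has to know about the Big5HKSCS-specific PUA layout.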
Reset assignee to the default "issues@openoffice.apache.org".