Issue 47752 - Allow conversion to unicode beyond base plane
Summary: Allow conversion to unicode beyond base plane
Status: ACCEPTED
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: OOo 2.0
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-04-19 14:30 UTC by hdu@apache.org
Modified: 2017-05-20 11:13 UTC
CC List: 2 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description hdu@apache.org 2005-04-19 14:30:20 UTC
Currently, some encoding conversions to Unicode (e.g. from Big5) map code points
that would lie outside the base plane into the base plane's private use area
(PUA). Once the rest of OOo can handle surrogate pairs properly, this workaround
will no longer be needed.
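For illustration, a minimal sketch of what "handling surrogate pairs" involves:
a code point outside the base plane is split into two 16-bit units when stored
as UTF-16. The function names are hypothetical and not part of the OOo code; the
arithmetic is the standard UTF-16 encoding rule.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point outside the base plane into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the base plane")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits -> low surrogate
    return high, low


def is_bmp_private_use(code_point: int) -> bool:
    """True if the code point lies in the base plane's private use area."""
    return 0xE000 <= code_point <= 0xF8FF


# U+20000 is used only as an arbitrary example of a non-BMP code point.
high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))
```

Code that assumes one 16-bit unit per character would see two units here, which
is presumably why the PUA workaround was chosen in the first place.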
Comment 1 Stephan Bergmann 2005-04-19 15:35:56 UTC
The encoding in question is Big5HKSCS, see $surrogates in
sal/textenc/generate/big5hkscs2001.pl 1.3.
Comment 2 hdu@apache.org 2005-04-20 17:20:40 UTC
Maybe this should not be a compile-time constant. E.g. a Big5 document might be
imported and then exported in the PUA-encoded form. Anyone who then wants to
work with the real Unicode encoding needs a converter from the "PUA Unicode"
version.
Comment 3 Stephan Bergmann 2005-04-21 08:00:37 UTC
The conversion from Unicode to Big5HKSCS handles both PUA and non-BMP,
regardless of $surrogates.  If you want two different conversions from Big5HKSCS
to Unicode at runtime, one using PUA, the other using non-BMP, then I think it
would be better to have two different RTL_TEXTENCODINGs for them (as the
sal/textcvt.h interface does not easily allow making this distinction).
Comment 4 hdu@apache.org 2005-05-30 10:08:08 UTC
Yes, having two different target encodings is a better idea than the
compile-time option. What would it look like though? RTL_TEXTENCODING_UCS2 and
RTL_TEXTENCODING_UTF16, where RTL_TEXTENCODING_UCS2 is a synonym for
RTL_TEXTENCODING_UNICODE? That way, encodings other than Big5 could also benefit
from choosing between surrogate and non-surrogate encoding at runtime.
Comment 5 Stephan Bergmann 2005-05-31 08:38:34 UTC
Clarified offline with hdu that the last three comments (April 20, 2005 to May
30, 2005) went in the wrong direction and should be ignored.

However, when we eventually do the $surrogates switch for Big5HKSCS (so that
some Big5HKSCS characters then map to Unicode non-BMP instead of PUA), the
following
problem probably needs to be addressed:  According to hdu, some fonts for
Big5HKSCS have their glyphs ordered according to "Unicode PUA," not "Unicode
non-BMP."  To work correctly with those fonts then, some function is needed to
map "Unicode non-BMP" to "Big5HKSCS-specific Unicode PUA."
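A rough sketch of such a mapping function, assuming a lookup table keyed by
non-BMP code point. The table entries below are placeholders, not real Big5HKSCS
data, and the names are hypothetical; a real table would have to come from the
actual Big5HKSCS mapping data.

```python
# Placeholder entries only: these are NOT real Big5HKSCS mappings.
# Keys are non-BMP code points, values are base-plane PUA code points.
HYPOTHETICAL_NON_BMP_TO_PUA = {
    0x20000: 0xE000,
    0x20001: 0xE001,
}


def remap_for_pua_font(text, table=HYPOTHETICAL_NON_BMP_TO_PUA):
    """Replace non-BMP characters found in the table with their PUA
    equivalents; every other character is left unchanged."""
    return text.translate(table)


sample = "A" + chr(0x20000) + chr(0x20002)
remapped = remap_for_pua_font(sample)
print([hex(ord(ch)) for ch in remapped])
```

Such a remapping would only be applied when rendering with a font whose glyphs
are ordered by PUA code point, so that the document text itself keeps the
non-BMP code points.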
Comment 6 Marcus 2017-05-20 11:13:22 UTC
Reset assignee to the default "issues@openoffice.apache.org".