Apache OpenOffice (AOO) Bugzilla – Issue 105571
Surrogate Pair Character selection handling can lose characters while formatting
Last modified: 2013-08-07 14:44:00 UTC
1. Open "SurrogatePair.odt".
2. Select the center character.
3. Press Ctrl + B (Bold). (Screenshot attached as "SurrogatePair_2.png".)
4. The character is rendered abnormally. (Screenshot attached as "SurrogatePair_3.png".)
5. Save and close the document.
6. Reopen the document.
7. The center character is lost. (Screenshot attached as "SurrogatePair_4.png".)
Created attachment 65108: SurrogatePair.odt
Created attachment 65109: SurrogatePair_2.png
Created attachment 65110: SurrogatePair_3.png
Created attachment 65111: SurrogatePair_4.png
Created attachment 65112: SurrogatePair_after.odt
Indeed, Writer really loses the second character. It is no longer in the content.xml. This might be related to issue 78162, where Writer doesn't treat surrogate pairs as a unit.
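To illustrate what "not treating a surrogate pair as a unit" means, here is a minimal Python sketch. It is not Writer's code: the sample string (U+20BB7 between two ASCII letters) is a hypothetical stand-in for the attached document, and `utf16_units` and `is_well_formed` are illustrative helper names. It shows that cutting a text run between the two halves of a pair leaves two runs that are each ill-formed UTF-16, so neither half is a character that could be stored on its own.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def is_well_formed(units):
    """True if every surrogate in units belongs to a complete high+low pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:            # high (leading) surrogate
            has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
            if not has_low:
                return False
            i += 2                            # skip the whole pair
        elif 0xDC00 <= u <= 0xDFFF:          # low surrogate with no leader
            return False
        else:
            i += 1
    return True


# Hypothetical sample: one supplementary-plane character between two letters.
sample = "A\U00020BB7B"
units = utf16_units(sample)      # four code units for three characters

# Splitting between the two halves of the pair (index 2) breaks both runs.
left, right = units[:2], units[2:]
whole_ok = is_well_formed(units)
left_ok = is_well_formed(left)
right_ok = is_well_formed(right)
```

In this sketch `whole_ok` is true while `left_ok` and `right_ok` are false, which is the situation a formatting boundary in the middle of the pair would create.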
data loss issue ?
First investigation with the English OOo version and without the correct font reveals that the loss of the character, which has been formatted as Bold and is part of a surrogate pair, has occurred at least since OOo 2.0.1.
OD->MH: Yes, this is a data loss issue in my opinion.
set target 3.2 because of data loss
Deeper investigation reveals the following:

- If the selection is made via "cursor-traveling":
  (a) Open the attached document; the cursor is at the beginning of the document.
  (b) Move the cursor with the "Right" key to just in front of the center character.
  (c) Hold "Shift" and select the center character with the "Right" key --> center character selected.
  (d) Click the "Bold" button in the toolbar or press Ctrl + B.
  --> Everything is fine, even after a save-and-load cycle.

- If the selection is made via double-click with the mouse:
  (a) Open the attached document; the cursor is at the beginning of the document.
  (b) Move the mouse pointer over the center character.
  (c) Double-click with the mouse --> center character selected.
  (d) Click the "Bold" button in the toolbar or press Ctrl + B.
  --> Everything is fine, even after a save-and-load cycle.

- If the selection is made via "mouse-movement":
  (a) Open the attached document; the cursor is at the beginning of the document.
  (b) Move the mouse pointer into the area between the first character and the center character.
  (c) Press the mouse button and hold it.
  (d) Move the mouse pointer into the area between the center character and the third character --> center character selected.
  (e) Click the "Bold" button in the toolbar or press Ctrl + B.
  --> The described defect occurs.

Thus, a workaround until this issue is fixed: to select a surrogate pair character for formatting, use the cursor keys or a mouse double-click.
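The difference between the three selection methods can be pictured with a small Python sketch. This is only an assumption about the mechanism, not the actual Writer code: it supposes that a mouse-drag selection can end on an index between the two UTF-16 code units of a pair, and shows how such a boundary would be moved outward to the nearest character boundary. The names `snap_boundary` and `snap_selection` and the sample code units are hypothetical.

```python
def is_high(u):
    """True for a UTF-16 high (leading) surrogate."""
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    """True for a UTF-16 low (trailing) surrogate."""
    return 0xDC00 <= u <= 0xDFFF


def snap_boundary(units, index, forward):
    """Move index off the middle of a surrogate pair, otherwise keep it."""
    inside_pair = (0 < index < len(units)
                   and is_high(units[index - 1])
                   and is_low(units[index]))
    if not inside_pair:
        return index
    return index + 1 if forward else index - 1


def snap_selection(units, start, end):
    """Widen the half-open range [start, end) so it never cuts a pair."""
    return (snap_boundary(units, start, forward=False),
            snap_boundary(units, end, forward=True))


# Hypothetical text: 'A', one surrogate pair (U+20BB7), 'B'.
units = [0x0041, 0xD842, 0xDFB7, 0x0042]

# A selection that covers only the first half of the pair is widened
# so that it covers both code units of the center character.
half_selected = snap_selection(units, 1, 2)
```

A selection that already sits on character boundaries, such as `(1, 3)`, is returned unchanged, which matches the cases above where formatting worked.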
Adjusted summary to reflect the findings. Put myself on CC.
fixed in cws oooimprovement5 - changed file: /sw/source/core/txtnode/fntcache.cxx, rev. 276898
> fntcache.cxx, rev. 276898
Looking at the diff, it seems that the proper break iterator is now used in that case; previously it was only triggered for CTL scripts, and now it also handles CJK scripts. This is good but not good enough, since surrogate pairs can occur regardless of script type (e.g. "Gothic" is considered a Roman script, but its codepoints lie beyond the base plane, U+10330..U+1034A). I suggest getting rid of the script-type test altogether and always using the proper break iterator.
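The Gothic example can be checked with a short Python sketch. It only demonstrates the encoding argument and is not taken from fntcache.cxx; `to_utf16` and `has_surrogate_pairs` are hypothetical helper names. The point is that whether a text needs pair-aware handling depends on its code points, not on the script class it is assigned to.

```python
def to_utf16(cp):
    """Return the UTF-16 code unit(s) for code point cp as a tuple."""
    if cp < 0x10000:
        return (cp,)                      # base plane: a single code unit
    offset = cp - 0x10000                 # 20-bit offset into the upper planes
    return (0xD800 + (offset >> 10),      # high surrogate: top 10 bits
            0xDC00 + (offset & 0x3FF))    # low surrogate: bottom 10 bits


def has_surrogate_pairs(text):
    """Script-independent check: does any character lie beyond U+FFFF?"""
    return any(ord(ch) > 0xFFFF for ch in text)


# Both ends of the Gothic range quoted above need two code units each.
gothic_first = to_utf16(0x10330)
gothic_last = to_utf16(0x1034A)

# A Latin letter needs one code unit.
latin_a = to_utf16(0x0041)

needs_care = has_surrogate_pairs("Gothic: \U00010330")
```

A check like `has_surrogate_pairs` looks at the text itself, so it would cover Gothic and any other supplementary-plane characters without a per-script condition.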
HDU, you are right, but because this fix is a show-stopper fix I decided the following:
- Provide a fix for this issue and ensure that its effects are as small as possible. Thus, the fix stays as it is.
- Submit a new issue for the next release to generalize this fix for all script types.
OD->SBA: Checked in internal installation set of cws oooimprovement5 - please verify.
Verified in CWS oooimprovement5.
OK in OOO320_m3. Closed.