Apache OpenOffice (AOO) Bugzilla – Issue 42660
[Calc] Need feature for easy manual override of incorrect word breaking
Last modified: 2013-08-07 15:03:05 UTC
Algorithms for finding word breaks in Thai text are not 100% accurate. The dictionary-based algorithm used by OOo's ICU-based line breaker gives poor results when the text contains words that are not in the dictionary. This happens easily, for example with new words or with transliterations of English words. Better algorithms or better dictionaries can alleviate this to some extent, but no algorithm is likely to be 100% accurate in the foreseeable future. It is therefore important that users have an easy way to manually override the word breaks that are found automatically.

Unicode defines two characters for this purpose: "zero-width space" (ZWSP: U+200B) and "word joiner" (WJ: U+2060).

8<-- From Unicode 4.0 - Chapter 15 -->8
Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as Thai, Khmer, or Japanese. When text is justified, ZWSP has no effect on letter spacing—for example, in English or Japanese usage.
Word Joiner. U+2060 WORD JOINER behaves like U+00A0 NO-BREAK SPACE in that it indicates the absence of word boundaries; however, the word joiner has no width. The function of the character is to indicate that line breaks are not allowed between the adjoining characters, except next to hard line breaks.
8<----------------------------->8

So users should be able to insert a ZWSP to add a break opportunity and a WJ to prevent a break at a given position. I think ICU should already handle these two characters. However, users need some way to enter them into the document. For example:
- Ctrl-space = Non-breaking space (normal OOo shortcut key)
- Shift-space = Zero-width space
- Ctrl-shift-space = Word joiner

This would let users easily adjust where the word breaker breaks lines, whatever the language of the text.
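To make the intended semantics concrete, here is a minimal Python sketch of how manual overrides could be applied on top of break opportunities found automatically. It is not ICU's implementation (ICU follows the Unicode line-breaking algorithm, UAX #14). The automatic offsets are a hypothetical input standing in for a dictionary-based breaker, and `break_opportunities`, `segments` and `SHORTCUTS` are illustrative names only. The shortcut table simply mirrors the key bindings proposed in this issue and is not an actual OOo binding.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: a line break is allowed after it
WJ = "\u2060"    # WORD JOINER: no line break on either side of it
NBSP = "\u00a0"  # NO-BREAK SPACE, for comparison

# Hypothetical key bindings mirroring the proposal in this issue.
SHORTCUTS = {
    "Ctrl+Space": NBSP,
    "Shift+Space": ZWSP,
    "Ctrl+Shift+Space": WJ,
}


def break_opportunities(text, auto_breaks):
    """Apply manual WJ/ZWSP overrides to automatically found breaks.

    An offset b means "a line may break before text[b]". auto_breaks is
    assumed to come from some automatic (e.g. dictionary-based) breaker.
    """
    breaks = set(auto_breaks)
    # Word joiners remove break opportunities on both sides.
    for i, ch in enumerate(text):
        if ch == WJ:
            breaks.discard(i)      # no break before the word joiner
            breaks.discard(i + 1)  # ... and none after it
    # Zero-width spaces are applied last, so an explicit ZWSP wins
    # if it sits right next to a word joiner.
    for i, ch in enumerate(text):
        if ch == ZWSP:
            breaks.add(i + 1)      # user-requested break opportunity
    # Breaks at the very start or end of the text are meaningless.
    return sorted(b for b in breaks if 0 < b < len(text))


def segments(text, breaks):
    """Split text into unbreakable chunks at the given break offsets."""
    chunks, start = [], 0
    for b in list(breaks) + [len(text)]:
        chunks.append(text[start:b])
        start = b
    return chunks


# Latin placeholders stand in for Thai text to keep the offsets readable.
text = "abc" + ZWSP + "defgh" + WJ + "ij"
b = break_opportunities(text, auto_breaks=[6, 10])
print(b)                  # [4, 6]
print(segments(text, b))  # ['abc\u200b', 'de', 'fgh\u2060ij']
```

Here the hypothetical automatic breaker proposed breaks before "f" and before "i". The word joiner suppresses the second one, and the zero-width space adds a new opportunity after "abc". Applying ZWSP after WJ is a design choice intended to follow UAX #14, where the zero-width-space rule takes precedence over the word-joiner rule; in practice OOo would simply rely on ICU for this.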
Microsoft Office Word 2003 has exactly this feature. The zero-width space is called "No-Width Optional Break" and the word joiner is called "No-Width Non Break". Both can be reached from Insert->Symbol->Special Characters. There are no default shortcut keys associated with them, but you can define your own. In Word 2003 I can insert a 'no-width optional break' inside an English word at the beginning of the following line, and the previous line will break there, much like a soft hyphen. And I can insert a 'no-width non break' at the end of a line to stop the line from breaking there, much like a non-breaking space. However, the feature does not seem to work reliably with Thai. This looks like an MS Office bug resulting from special handling of Thai text.
confirmed.
A bug in Issue Tracker? The time stamps are in reverse order!
--- Additional comments from arthit Fri Apr 1 15:36:08 -0800 2005 ---
--- Additional comments from arthit Fri Apr 1 15:38:07 -0800 2005 ---
Set to FIXED, and will set it back to NEW (instead of STARTED, as it is now).
reopen
set target to 2.0.1
Created attachment 29699 [details] Spec draft
->FT: How about an entry for the non-breaking hyphen?
Created attachment 29747 [details] Spec update!
Implemented in:
sw/inc/cmdid.h
sw/inc/swtypes.hxx
sw/sdi/_textsh.sdi
sw/sdi/swriter.sdi
sw/sdi/swslots.src
sw/source/ui/shells/textsh.cxx
sw/source/ui/shells/textsh1.cxx
sw/uiconfig/sglobal/menubar/menubar.xml
sw/uiconfig/sweb/menubar/menubar.xml
sw/uiconfig/swriter/menubar/menubar.xml
officecfg/registry/data/org/openoffice/Office/UI/WriterCommands.xcu
officecfg/registry/data/org/openoffice/Office/UI/GenericCommands.xcu
The spec doesn't make clear which apps this feature is supposed to be implemented for. It is supposed to work not just in Writer but in the other OOo applications, in particular Impress. The feature was designed so that it can work uniformly across Writer/Impress/Calc/Draw. From the files you mention, it looks like it's implemented only in Writer, so I'm reopening.
->FT: It looks as if you have to change the spec.
FT: The specification is now checked into CVS and available through _see URL in URL-field of this issue_. Please disregard the attached early draft from now on. Thanks.
FT->James: you are right, it wasn't obvious enough, so I updated the spec. FT->DR: Please have a look at the spec and implement the feature in question in Calc, thanks.
FT: Spec updated, please refer only to the updated spec (dated 29.09.05).
fixed in SRC680/thaiissues
back to QA: reopen issue and reassign to oc@openoffice.org
reassign to oc@openoffice.org
reset resolution to FIXED
Hi Stefan, please take over: reopen issue and reassign to sba@openoffice.org
reassign to sba@openoffice.org
SBA: Reopened to reassign.
SBA: Reassigned to OC.
SBA: Resolution set back to "Fixed".
verified in internal build cws_thaiissues
closed because fix available in OOo2.0m142
Created new help file text/shared/01/formatting_mark.xhp and added links from text/swriter/main0104.xhp, text/scalc/main0104.xhp, text/sdraw/main0104.xhp and text/simpress/main0104.xhp.