Apache OpenOffice (AOO) Bugzilla – Issue 42909
Interoperability with MS Word - text language doesn't convert properly
Last modified: 2013-08-07 15:00:08 UTC
Both MS Word and OOo have a concept of text language, which is a property of the characters. In OOo, you set the text language in 'Format->Character->Fonts'. In MS Word, you set it in 'Tools->Language->Set Language'. The reason is that some features need to know the language of the text to work correctly. For example, Thai text has no spaces between words, yet line-breaking is still done at word boundaries. In OOo you must set the text language correctly (CTL font->Language = Thai) for Thai line-breaking to work. However, even when the text language is set correctly in OOo, if you save the OOo document as a Word XP document, the language information saved by OOo is not recognized by Word.
Test case:
- The attached Writer document contains a line with 16 copies of the 3-character Thai word 'การ', with a space in the middle. The line is formatted as CTL=Thai.
การการการการการการการการ การการการการการการการการ
- I saved the Writer document as a MS Word XP .doc file (also attached).
1) Load the .doc file in MS Word XP/2003. The line could be broken after every 3 characters, but Word breaks it at the space in the middle. That's because Word doesn't think the text is Thai. You can verify this by checking the current language in 'Tools->Language->Set Language': it will be 'English (U.S.)'.
2) Worse, you can't even change the language to Thai to make line-breaking behave correctly. Select all the text, open the 'Set Language' dialog box, and choose Thai. Nothing changes. If you check the current language in the 'Set Language' dialog box again, it is still 'English (U.S.)'.
3) Load the .doc back into Writer: the language information is still there ('Format->Character->CTL Font' is still Thai). So the information is saved, but in a way that MS Word doesn't recognize or use.
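To make the line-breaking problem in the test case concrete, here is a minimal Python sketch (not OOo's or Word's actual algorithm) comparing a breaker that only knows about spaces with a toy longest-match word segmenter. The one-word dictionary is an assumption that only fits this test line; real Thai breakers, such as ICU's dictionary-based break iterator, use large dictionaries and more careful matching.

```python
# The test line from the report: 16 copies of the 3-character Thai word "การ"
# with a single ASCII space in the middle.
WORD = "การ"
LINE = WORD * 8 + " " + WORD * 8


def space_only_chunks(text: str) -> list[str]:
    """What a breaker that ignores the text language can wrap at: spaces only."""
    return text.split(" ")


def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Toy longest-match word segmentation (illustrative, not production Thai breaking)."""
    max_len = max(len(w) for w in dictionary)
    words: list[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # spaces separate words but are not words themselves
            i += 1
            continue
        # Try the longest dictionary word that starts at position i.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in dictionary:
                words.append(text[i:i + n])
                i += n
                break
        else:
            # Unknown character: emit it on its own so the loop always advances.
            words.append(text[i])
            i += 1
    return words


chunks = space_only_chunks(LINE)
words = longest_match_segment(LINE, {WORD})
print(len(chunks), [len(c) for c in chunks])  # 2 [24, 24]
print(len(words))  # 16
```

Without the Thai tag, Word effectively behaves like `space_only_chunks`: the line consists of two unbreakable 24-character chunks, whereas word-boundary breaking offers 16 words to wrap between.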
This bug is very serious because it makes it impossible to convert Thai documents from OOo to MS Word, or to create a MS Word document in Thai using OOo. I don't know whether this happens with other languages too, but I suspect it does.
Created attachment 22701: The original Writer document for the line-breaking test
Created attachment 22702: The same document converted to MS Word XP format
This looks similar to issue #23784, which sba concluded was a feature request for more text alignment options, but I don't see what it has to do with text alignment. (From a user's perspective, it's a serious bug, not a missing feature.)
Confirmed. Raised priority to P2: data loss (language information) and basic functionality not working properly (document export). Set target milestone to OOo 2.0; please change this if you find it inappropriate.
comment from james_clark "issue 42909 : this hasn't been analyzed yet; it makes the OOo functionality of exporting to .doc format effectively non-functional for Thai users. It affects other languages as well, but the effects are much more serious for Thai: text is not properly tagged with its language, and the language of text cannot be changed in Word; *** this is critical for Thai because line-breaking does not work in Word if text is not properly tagged as Thai. *** There's no known workaround."
FT: Andreas, please check. I consider this a serious issue, too.
Yes, I agree, it's a serious issue. We have to find a solution by OOo 2.0.1 at the latest. BTW: if you use the RTF format instead of .doc, the language information is recognized by Word.
If within Word you save the document (e.g. LineBreakTest.doc) as XML, close the document, and then open the XML version of the document, the text is tagged as Thai. This is in Word 2003, Thai edition.
AFAIK, FME and you have already done some investigation into this issue.
The fix to issue #46087 will fix this issue as well. See that issue for more details. Runs are in fact correctly tagged with the language. The problem is that runs are not marked as being complex script: Word evidently doesn't allow something that's not complex script to be tagged with a complex script language.
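The distinction between tagging a run with a language and marking it as complex script can be sketched as follows. This is a hypothetical Python illustration, not the code from the #46087 patch: it splits text into runs and flags those made of characters from the Thai Unicode block (U+0E00 to U+0E7F) as needing the complex-script marking. Only Thai is checked, and spaces, digits and punctuation are simply treated as non-Thai; a real exporter would also handle other complex scripts and script-neutral characters.

```python
from itertools import groupby

# Thai Unicode block: U+0E00..U+0E7F.
THAI_BLOCK = range(0x0E00, 0x0E80)


def is_thai(ch: str) -> bool:
    """True if the character lies in the Thai Unicode block."""
    return ord(ch) in THAI_BLOCK


def split_runs(text: str) -> list[tuple[bool, str]]:
    """Split text into maximal runs, each tagged True if it consists of Thai
    characters and should therefore be marked as complex script.

    Hypothetical helper: per the analysis above, Word evidently ignores a
    complex-script language (such as Thai) on a run that is not also marked
    as complex script, so both pieces of information have to be exported.
    """
    return [(flag, "".join(chars)) for flag, chars in groupby(text, key=is_thai)]


print(split_runs("abcการ d"))
# [(False, 'abc'), (True, 'การ'), (False, ' d')]
```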
We will investigate to find a solution for OOo2.0
Created attachment 24528: Bugdoc with correct language settings - but still not working
flr: The problem is *not* the language setting. I have attached a .DOC file, generated with a modified Writer, whose language is correctly set to Thai. However, WW still does not break the lines correctly. I suspect a Unicode export problem. The .DOC format has a strange "chp.idctHint" flag...
Solved with the patch from james_clark for #i46087#.
Solved with the patch from james_clark for #i46087#. Fixed in dvoqbfix2. *** This issue has been marked as a duplicate of 46087 ***
flr: The patch from james leads to correct language attributes. However, my version of Word still does *not* perform the line break. Can you try it with your version of Word? Perhaps my complex-script settings are incorrect. The patch from james is applied in dvoqbfix2.
Created attachment 24531: Bugdoc exported with the patch from james - attributes set correctly, but the line break is not performed in my version of Word
I can confirm that Word 2003 (Thai edition) does not perform correct line-breaking on LineBreakTest_expored_with_patch_from_james.doc. Some possible clues:
a) Saving this to XML in Word 2003 and reopening it solves the problem; if the file is then saved again as .doc, it still works correctly when the .doc file is reopened.
b) If in Word you switch the keyboard layout to Thai and then type a space (with the cursor still before the first character), Word performs correct line-breaking; if you then press Backspace (or Ctrl-Z), the correct line-breaking remains.
c) If you do b) but with the US keyboard layout, Word doesn't do correct line-breaking.
If after b) and c) (using Backspace rather than Ctrl-Z) you resave the file as .doc, you get two very similar .doc files: Word does correct line-breaking for one of them and not for the other. Analyzing the difference between these files may tell us what the problem is. Unfortunately, wv2 debug dumps show no difference.
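One way to follow up on that suggestion, when structure-level dumps such as wv2's show nothing, is a raw byte-level comparison of the two .doc files. The sketch below is a hypothetical Python helper, not part of any tool mentioned in this issue; expect some noise, since Word may also change unrelated data (for example save timestamps) between the two saves.

```python
from pathlib import Path
from typing import Optional

Diff = tuple[int, Optional[int], Optional[int]]


def diff_bytes(a: bytes, b: bytes) -> list[Diff]:
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b differ.

    If one input is longer, the missing bytes are reported as None.
    """
    diffs: list[Diff] = []
    for offset in range(max(len(a), len(b))):
        x = a[offset] if offset < len(a) else None
        y = b[offset] if offset < len(b) else None
        if x != y:
            diffs.append((offset, x, y))
    return diffs


def diff_files(path_a: str, path_b: str) -> list[Diff]:
    """Compare two files byte by byte (e.g. the bad-breaks and good-breaks .doc files)."""
    return diff_bytes(Path(path_a).read_bytes(), Path(path_b).read_bytes())


def format_diff(d: Diff) -> str:
    """Render one difference as 'offset: a b' in hex, with '--' for a missing byte."""
    offset, x, y = d
    show = lambda v: "--" if v is None else f"{v:02x}"
    return f"0x{offset:08x}: {show(x)} {show(y)}"
```

For example, `for d in diff_files("bad_breaks.doc", "good_breaks.doc"): print(format_diff(d))`, where the file names are placeholders for the two attachments.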
Created attachment 24534: Bugdoc saved from Word, after space/backspace with US keyboard, with bad breaks
Created attachment 24535: Bugdoc saved from Word, after space/backspace with TH keyboard, with good breaks
I think I've figured it out. The problem is a missing document property. On the Compatibility tab of the Options dialog there should be an option called something like "Apply breaking rules" (I only have the Thai-language version, so I'm not sure what it's called in English). OOo isn't setting this property, so Word doesn't apply Thai breaking rules. Word is smart enough to set the property automatically when you enter Thai text or open an XML file containing Thai, but not when you open a .doc file containing Thai. In the Word XML format this corresponds to the <w:applyBreakingRules/> element. In the .doc format it's near the end of the DOP structure: bit 0x20 in the byte immediately after the byte holding fDontUseHTMLAutoSpacing (0x04).
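The two encodings of the flag described above can be sketched in Python as follows. This is a hypothetical illustration, not OOo's export code or the attached patch. The DOP helpers take the offset of the byte holding fDontUseHTMLAutoSpacing as a caller-supplied assumption, since its absolute position within the DOP isn't given here; the XML check assumes the Word 2003 XML (WordprocessingML) namespace and searches the whole document for the element instead of assuming where it sits.

```python
import xml.etree.ElementTree as ET

# Per the analysis above: the "apply breaking rules" flag is bit 0x20 in the
# byte immediately after the DOP byte that holds fDontUseHTMLAutoSpacing (0x04).
APPLY_BREAKING_RULES_BIT = 0x20

# Assumed namespace of Word 2003 XML documents (the "w:" prefix).
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def set_apply_breaking_rules(dop: bytearray, autospacing_byte: int) -> None:
    """Set the flag in a DOP buffer, in place.

    `autospacing_byte` is the offset of the fDontUseHTMLAutoSpacing byte;
    it is an assumption supplied by the caller, not computed here.
    """
    target = autospacing_byte + 1
    if target >= len(dop):
        raise IndexError("DOP buffer too short for the given offset")
    dop[target] |= APPLY_BREAKING_RULES_BIT  # leave the other bits untouched


def has_apply_breaking_rules(dop: bytes, autospacing_byte: int) -> bool:
    """Check whether the flag is set in a DOP buffer."""
    return bool(dop[autospacing_byte + 1] & APPLY_BREAKING_RULES_BIT)


def xml_has_apply_breaking_rules(xml_text: str) -> bool:
    """True if a Word 2003 XML document contains <w:applyBreakingRules/> anywhere."""
    root = ET.fromstring(xml_text)
    return root.find(f".//{{{WORDML_NS}}}applyBreakingRules") is not None
```

For example, with a zeroed 8-byte buffer whose byte 3 holds 0x04, `set_apply_breaking_rules(dop, 3)` leaves byte 3 unchanged and sets byte 4 to 0x20. Using `|=` rather than assignment keeps any other flags stored in that byte.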
Created attachment 24537: Manually hacked version of flr's bugdoc with the applyBreakingRules flag set
Created attachment 24538: Untested patch to unconditionally set the applyBreakRules flag on export
flr: Duplicate of #i46732#. Fixed in fr8fix1 (with appropriate language tests...) *** This issue has been marked as a duplicate of 46732 ***
*** Issue 23784 has been marked as a duplicate of this issue. ***
closing