Apache OpenOffice (AOO) Bugzilla – Issue 84981
spellcheck do wrong in English version of writer with Chinese character
Last modified: 2013-08-07 14:44:07 UTC
1. Open the attachment writer document; 2. Click Spellcheck button; 3. Always click Ignore button in the Spellcheck dialog; 4. You can see the text changed, some duplicated characters inserted in the text. It works correctly in Chinese version.
Created attachment 50656 [details] Sample of spellcheck error
Created attachment 50657 [details] before spellcheck
Created attachment 50658 [details] after spellcheck, some duplicated characters inserted
Reassigned to SBA.
I cannot reproduce it on the OOo 2.3.0 that comes with Ubuntu 7.10. So it's either a regression or we need some special conditions to get this result. Peter
Hi, Peter, It's reproducible on my English version of OOo2.3.1 on a Simplified Chinese XP system. I tried it in OOo2.4-DEV and it's not reproducible there. Maybe it's fixed in OOo2.4? But it does reproduce in my environment. Best regards, Zhu Lihua
Hi, It's caused by some language settings, but I can't tell which setting causes this issue. I did find a file that causes it, in C:\Documents and Settings\Administrator\Application Data\OpenOffice.org2\user\registry\data\org\openoffice\Office. The filename is "Linguistic.xcu". I'll upload this file to the issue tracker; maybe it helps. If you replace your file with this one, the issue will come up. A longer document will be attached as well.
Created attachment 50750 [details] a longer text sample
Created attachment 50751 [details] replace your file this file, the issue will come up.
Hi Zhu Lihua, please check the following:
- Close all soffice processes including the quick starter
- Then either
  + rename the folder 'OpenOffice.org2' or
  + backup and delete it
- Restart OOo
Now OOo should be in a state as if it is freshly installed. Is the issue still reproducible now? Best regards Peter
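The "rename the folder" variant of the reset above could be scripted roughly as follows. This is only a sketch: the function name `backup_profile` and the timestamped suffix are made up for illustration, and the real location of the 'OpenOffice.org2' folder depends on the installation. All soffice processes must be closed before running it.

```python
import tempfile
import time
from pathlib import Path


def backup_profile(profile_dir):
    """Rename the user profile folder so OOo starts with a fresh one.

    Hypothetical helper: returns the path of the renamed (backup) folder.
    """
    profile_dir = Path(profile_dir)
    if not profile_dir.is_dir():
        raise FileNotFoundError(f"no such profile folder: {profile_dir}")
    # Timestamped suffix is an arbitrary choice to avoid clobbering old backups.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = profile_dir.with_name(f"{profile_dir.name}.bak-{stamp}")
    profile_dir.rename(backup)
    return backup


if __name__ == "__main__":
    # Demo on a throwaway directory instead of a real profile.
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp) / "OpenOffice.org2"
        profile.mkdir()
        print("profile moved to", backup_profile(profile))
```

Restoring the old state is then just a matter of renaming the backup folder back to 'OpenOffice.org2'.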
Hi, Peter, I've already tried this. The issue didn't reproduce after I did so. But if I replace the "Linguistic.xcu" file with the original file, the issue comes up again. So I think some setting has modified the file and causes this issue. But I don't know which setting in this file causes the problem. Best regards, Zhu Lihua
SBA->TL: Please take over. Maybe the current changes in Linguistic modules will solve this "along the way".
TL->redflagzhulihua: Could you please state if the Linguistic.xcu is from the share or the user layer? Also please add the other one as well.
TL->redflagzhulihua: Also by just copying the Linguistic.xcu in my installation I can not reproduce the problem. Actually since the sample text is completely in Chinese simplified and StarOffice / OpenOffice does not have a spell checker for that by default it seems that you guys must have one implemented (or at least in use) for that language. Without that spell checker nothing at all will happen! The spell checking dialog will just report that it has finished checking the document. In order to see the problem I need your spell checker implementation for Chinese simplified! Can you provide it? Or is there a workaround to still reproduce the problem?
Adding a new sample document that reproduces the problem even with DEV300 build. As one can see in the dialog there are two characters added to the sentence that do not exist in the document. And those two characters are the problem.
Created attachment 51906 [details] Sample bugdoc to reproduce the problem with DEV300 build
Created attachment 52047 [details] Reduced sampledocument with problems
Created attachment 52049 [details] Yet another reduced sample document with problems
A first inspection showed that there might be at least two problems: 1) the Chinese text gets spell checked with the English spell checker 2) the spell check dialog displays characters that are not part of the document. Debugging into 1) it turns out that the CJK Language attribute of the respective text is set to English, which should never have been the case. Thus of course there is at least a workaround for that problem: Just select the problem text and assign the correct Asian language in the Format/Character dialog. Problem case 2) still needs to be inspected. TL->redflagzhulihua: However since in case 1) the document should never have had a CJK language attribute set to English please describe step by step how that document was created. If the problem lies in the creation of the document it can probably not be fixed anymore for existing documents. But we would like to know if at least the creation of such documents with broken language settings can be avoided in the future.
Hi, TL, In fact, I found this issue by chance. The document was created in a very ordinary way: I just created a document and typed Chinese characters into it. It also often occurs with text copied from elsewhere. Later I found it is caused by a file named "Linguistic.xcu". I compared this file to one from a newly installed OOo. Several sections differ, but I don't know which one causes the problem, or what operation in OOo can result in these modifications to the file.
TL->redflagzhulihua : - Does the problem also occur if you type in characters via the keyboard only? - Or does the problem ONLY occur if you paste text? In the latter case the problem might be in the paste operation and I may need the original document the text was copied from. From what you have said so far I still do not see how the Linguistic.xcu you attached could cause this effect, since there are no wrong assignments made to the 3 entries 'DefaultLocale_CJK', 'DefaultLocale' and 'DefaultLocale_CTL'. (It would be different if e.g. DefaultLocale_CJK was set to en-US). Also copying this xcu file into an existing installation does not cause the effect. So far the problem (for case 1) ) lies in the document itself, by having western languages assigned to the CJK language attribute and Chinese simplified assigned to the Western language attribute. And as said above I can imagine this happening because of pasted text. But I have no idea how that could happen with text typed via the keyboard only. Thus it would be great if you could check both cases and confirm or deny the occurrence of the problem in each case.
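The check of the three default-locale entries could be sketched like this. The sample XML layout (an `oor:component-data` root with unqualified `node`/`prop`/`value` elements) is an assumption about how a user-layer Linguistic.xcu looks, not a copy of the attached file, and the helper names and the zh/ja/ko set are made up for illustration.

```python
import xml.etree.ElementTree as ET

OOR_NS = "http://openoffice.org/2001/registry"
LOCALE_PROPS = {"DefaultLocale", "DefaultLocale_CJK", "DefaultLocale_CTL"}
# Assumption: only these primary language codes count as CJK here.
CJK_LANGUAGES = {"zh", "ja", "ko"}


def read_default_locales(xcu_text):
    """Return {property name: value} for the three default-locale entries."""
    root = ET.fromstring(xcu_text)
    found = {}
    for prop in root.iter("prop"):
        name = prop.get(f"{{{OOR_NS}}}name")
        if name in LOCALE_PROPS:
            value = prop.find("value")
            found[name] = (value.text or "").strip() if value is not None else ""
    return found


def cjk_default_is_suspicious(locales):
    """True if DefaultLocale_CJK is set to a non-CJK locale such as en-US."""
    cjk = locales.get("DefaultLocale_CJK", "")
    return bool(cjk) and cjk.split("-")[0].lower() not in CJK_LANGUAGES


# Hypothetical sample, written by hand for this sketch.
SAMPLE_XCU = """<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry"
                    oor:name="Linguistic" oor:package="org.openoffice.Office">
  <node oor:name="General">
    <prop oor:name="DefaultLocale"><value>en-US</value></prop>
    <prop oor:name="DefaultLocale_CJK"><value>zh-CN</value></prop>
    <prop oor:name="DefaultLocale_CTL"><value></value></prop>
  </node>
</oor:component-data>"""

if __name__ == "__main__":
    locales = read_default_locales(SAMPLE_XCU)
    print(locales, cjk_default_is_suspicious(locales))
```

With values like those in the sample, nothing is flagged; only a western locale in DefaultLocale_CJK would be.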
Hi, I would strongly recommend closing this as 'wontfix'. Besides the discussion of whether this is a bug or not, we also have to talk about practical relevance. So, how likely is it that a real user copies Linguistic.xcu into his installation and uses the English spell checker on Chinese text? My guess is almost zero. Consequently, even if this is a bug, the described behavior is probably completely irrelevant. Or did somebody notice any other bug reports like this? If not, please stop the discussion and close the issue. Best regards, Peter
I agree with Peter. This issue can be reproduced under the conditions I described, but it seems hard to find out which operation causes this effect. And it really is an occasional case, so it may take too much effort. If we find the operation that causes this effect later, we can reopen it.
TL->PJ: It is already clear that this issue can not be properly fixed or worked around later. BUT: - the question still remains how such a document can be created! (Aside from manually using the API, where it can easily be done) - and I once more voice the opinion that I can see no reason for the attached Linguistic.xcu to cause such a problem. From my view it lies exclusively with the document itself. But I'm also against closing this issue right away before it is verified that at least two easily thinkable ways to create such a problem do not cause it. What I would like to have checked is what happens if you have a foreign document (e.g. MS Word or whatever) that holds Chinese text but has the language attribute set to English US. And there are two cases to it: a) what happens when importing that document b) what happens when copying and pasting from it. If the problem does not occur in those two cases then we will just close this issue. If it does occur one must at least think once more if the import of such text could and should be improved. E.g. by forcefully setting the language attribute to none if e.g. a western language is assigned to the CJK language attribute or by ignoring such assignments and keeping the default value. TL->PJ: Can you have this checked out quickly? Otherwise I will do so myself sometime next week.
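The two import strategies mentioned at the end (set the attribute to none, or ignore the assignment and keep the default) could look roughly like this. It is a sketch only: `sanitize_asian_language` is a made-up helper, not an existing filter function, and treating only zh/ja/ko as valid for the CJK slot is an assumption.

```python
# Assumption: only these primary language codes are valid for the CJK slot.
CJK_LANGUAGES = {"zh", "ja", "ko"}


def sanitize_asian_language(language, default=None):
    """Return `language` if it is plausible for the CJK attribute.

    Otherwise return `default`: pass None to model "set to none", or the
    document default locale to model "ignore the assignment".
    """
    if not language:
        return default
    primary = language.split("-")[0].lower()
    return language if primary in CJK_LANGUAGES else default


if __name__ == "__main__":
    print(sanitize_asian_language("en-US"))                   # dropped
    print(sanitize_asian_language("en-US", default="zh-CN"))  # default kept
    print(sanitize_asian_language("zh-CN"))                   # unchanged
```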
> TL->PJ: Can you have this checked out quickly? > Otherwise I will do so myself sometime next week. Hi Zhu Lihua, please go ahead with the checks TL suggested. Best regards, Peter
Hi, TL, I tested again. It does reproduce in a document typed in via the keyboard.
1. Replace the Linguistic.xcu with the one I attached;
2. Create a new writer document and type in a paragraph of Chinese text. In case it is not convenient for you to type Chinese, you can simply enter a short paragraph like: "天气预报说明天下雪"
3. Click the spellcheck button. On my computer the spell check dialog doesn't respond for a while. After a while it starts to spell check, and the dictionary language is Danish!
4. Always click the "ignore once" button.
The result is "天气预报说说明天天下雪". In Chinese "说明" is a word, and "天下" is also a word; the extra characters inserted are "说" and "天", each the first character of one of the 2 words. I found that most of the duplicated characters are the first character of a word, but not all words behave like this. Maybe it has something to do with Chinese word breaking? I tried to create an MS Word document which contains Chinese characters with a western language attribute, but I can't find a way to do so. MS Word's character dialog can't choose any language attribute for characters; it seems it's always the default. Even in OOo, we can only set it to Chinese or Korean or Japanese or none. And is your operating system a Chinese version? Maybe it only reproduces on a Chinese version of the OS?
I don't have a copy of an English version of Windows, so I've called for help on zh.openoffice.org. Hopefully somebody there uses an English version of Windows. I'll test in an English-locale Linux soon.
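To see exactly which characters were added, the text before and after the spell check can be compared. The following is a minimal sketch; `inserted_characters` is a made-up helper that assumes the result differs from the original only by insertions.

```python
def inserted_characters(original, result):
    """Return the characters of `result` that are not matched in `original`.

    Walks `result` left to right and greedily matches it against
    `original`; anything that does not match is treated as inserted.
    """
    inserted = []
    i = 0
    for ch in result:
        if i < len(original) and ch == original[i]:
            i += 1
        else:
            inserted.append(ch)
    if i != len(original):
        raise ValueError("result is not the original plus insertions")
    return inserted


if __name__ == "__main__":
    before = "天气预报说明天下雪"
    after = "天气预报说说明天天下雪"
    print(inserted_characters(before, after))
```

For the sample sentence this reports "说" and "天", the first characters of "说明" and "天下".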
In MS Word the way to set the language is "Tools/Language/Set Language"...
Hi, TL, I've tried this; it always changes back to Chinese automatically, even if I uncheck the auto check box.
Hello guys! My testing environment:
OOo_2.4.0rc6_20080314_Win32Intel_install_en-US.exe + Windows XP SP3 English OS
OOo_2.4.0rc6_20080314_Win32Intel_install_en-US.exe + Windows XP SP2 Chinese OS
This bug no longer exists with the original OOo 2.4.0rc6 installation, but with the "Linguistic.xcu" file provided by redflagzhulihua it is reproducible! On both OSes, the Spellcheck popup window freezes for a couple of seconds, then begins to check the spelling. After clicking the "Ignore Once" button several times, the duplication issue did happen. As redflagzhulihua said, it duplicates the first Chinese character of a phrase. I think it has something to do with the Linguistic module. Regards, ZDYX
Zhu Lihua->ZDYX: Thank you very much! It reproduced in both the Chinese and English locales of Fedora 8. There are some differences in the Linux version of OOo: no text is shown in the "Not in dictionary" field of the spellcheck dialog, and the "Dictionary language" is German (Germany). After spell checking with several clicks on the "Ignore Once" button, the text changed.
I just checked this with DEV300_m41 again. I still see the problem in the long document and in spellcheck4.odt. Setting target to 3.2.
Created attachment 60741 [details] Short example that shows the problem when starting the spell check dialog
Looking into this it turned out the problem is that the language attributes for the respective text parts are invalid. Usually the language attributes should look like this for such a short text:
<style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard">
  <style:text-properties fo:language="en" fo:country="ZW" style:language-asian="zh" style:country-asian="TW" style:language-complex="te" style:country-complex="IN"/>
</style:style>
Here fo:language and fo:country contain the western locale, style:language-asian and style:country-asian the Asian locale, and lastly style:language-complex and style:country-complex the CTL locale. But when looking at the actual content in your file it is like this:
<style:style style:name="T1" style:family="text">
  <style:text-properties fo:language="zh" fo:country="CN" style:language-asian="en" style:country-asian="US"/>
</style:style>
That is, fo:language and fo:country, the Western locale, are set to Chinese simplified(!) and the Asian locale is set to English US(!). Neither of those should ever be possible. And because of that, if you select e.g. the first single character you can't see those attributes in the Format/Character dialog because the list boxes there support only the correct locales. Thus the language display for Western and Asian is empty. To fix the problem simply select all the text (e.g. by using CTRL-A), open the Format/Character dialog and set the Western and the Asian language to their correct values. That will solve the issue! However, the one interesting problem is how did you manage to get a document with these broken language settings? If it is done by importing a document the respective filter needs to be fixed. If it was done by using some input method editor (IME) either that one or the code handling that IME needs to be fixed.
Thus, basically this issue is invalid but if you can tell us how the documents got created the way they are it might be useful to look more closely into what happens at that time. TL->SBA: Please take over for the time being, until we may get some more info about this.
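To look for such swapped language attributes in other documents, the style XML (content.xml or styles.xml inside the .odt) can be scanned. This is a sketch under assumptions: the helper name is made up, only zh/ja/ko are treated as CJK languages, and "zxx" is assumed to be how the [None] language is written.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
# Assumptions: zh/ja/ko are the CJK languages; "zxx" stands for [None].
CJK_LANGUAGES = {"zh", "ja", "ko"}
ASIAN_ALLOWED = CJK_LANGUAGES | {"zxx"}


def find_swapped_language_styles(xml_text):
    """Return (style name, attribute, value) for each implausible language."""
    root = ET.fromstring(xml_text)
    problems = []
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        name = style.get(f"{{{STYLE_NS}}}name")
        for props in style.findall(f"{{{STYLE_NS}}}text-properties"):
            western = props.get(f"{{{FO_NS}}}language")
            asian = props.get(f"{{{STYLE_NS}}}language-asian")
            if western in CJK_LANGUAGES:
                problems.append((name, "fo:language", western))
            if asian is not None and asian not in ASIAN_ALLOWED:
                problems.append((name, "style:language-asian", asian))
    return problems


# The two style snippets quoted above, wrapped in a root element that
# declares the namespaces so the fragment parses on its own.
SAMPLE = """<office:automatic-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard">
    <style:text-properties fo:language="en" fo:country="ZW" style:language-asian="zh" style:country-asian="TW" style:language-complex="te" style:country-complex="IN"/>
  </style:style>
  <style:style style:name="T1" style:family="text">
    <style:text-properties fo:language="zh" fo:country="CN" style:language-asian="en" style:country-asian="US"/>
  </style:style>
</office:automatic-styles>"""

if __name__ == "__main__":
    for problem in find_swapped_language_styles(SAMPLE):
        print(problem)
```

On the sample, only style T1 is reported, once for each of its two swapped attributes.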
TL->redflagzhulihua: Can you add some original (MS Word, or whatever) document where you copy the text from? Maybe this will be helpful.
Hi, The short sample document is input with keyboard. and the longer sample is copied from a topic in a forum, pasted with unformated text. the URL: http://www.mychery.net/forum/bin/ut/topic_show.cgi?id=688022&pg=1&age=0&bpg=1&del=&stamp=1236570329#12685329
SBA->TL: The office "believing" (or at least showing) that it can spell-check Chinese can only lead to unwanted results. So some "strange settings" seem to be in effect. This is my scenario with OOO320_m1:
- Start a newly installed office (or remove user tree first) (my Office has German and English spellcheck among other Western ones)
- Tools - Options - Language Settings - Languages
- Check CTL and Asian language support (Note that in the Asian Language list, there is no spell check check mark at all)
- Download bugdoc "Spellcheck3" and open it in Office (to have it editable)
  -> See that there are red wavy underlines under Chinese characters
  -> Put cursor at end of line, see that language "German" is set
- Launch spellcheck (<F7>)
  -> Note that German spellcheck highlights Chinese characters
- Click "Ignore once" two times
  -> Message box "The spellcheck is complete" comes up
- Format - Character
  -> See that there is a blue check mark in front of "Chinese simplified"...
Please proceed.
Findings in DEV300_m84: With bugdoc "Spellcheck3" I still get wavy underlines under Chinese characters. TL told me that the language attributes in this document are set to "invalid values". Even with this document, the "Fake to be able to spellcheck Chinese" problem does not occur anymore. That got fixed by issue 106497. Set to "Worksforme".
Closed.