Issue 84981 - spellcheck do wrong in English version of writer with Chinese character
Summary: spellcheck do wrong in English version of writer with Chinese character
Status: CLOSED IRREPRODUCIBLE
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 2.3.1 RC1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: thomas.lange
QA Contact: issues@sw
URL:
Keywords: CJK
Depends on:
Blocks: 84405
Reported: 2008-01-04 09:00 UTC by redflagzhulihua
Modified: 2013-08-07 14:44 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample of spellcheck error (7.07 KB, application/vnd.sun.xml.writer)
2008-01-04 09:16 UTC, redflagzhulihua
no flags Details
before spellcheck (42.57 KB, image/jpeg)
2008-01-04 09:35 UTC, redflagzhulihua
no flags Details
after spellcheck, some duplicated characters inserted (30.62 KB, image/jpeg)
2008-01-04 09:36 UTC, redflagzhulihua
no flags Details
a longer text sample (26.27 KB, application/vnd.oasis.opendocument.text)
2008-01-09 03:27 UTC, redflagzhulihua
no flags Details
replace your file this file, the issue will come up. (13.50 KB, text/plain)
2008-01-09 03:29 UTC, redflagzhulihua
no flags Details
Sample bugdoc to reproduce the problem with DEV300 build (7.07 KB, application/octet-stream)
2008-03-05 14:20 UTC, thomas.lange
no flags Details
Reduced sampledocument with problems (7.30 KB, text/plain)
2008-03-12 10:16 UTC, thomas.lange
no flags Details
Yet another reduced sample document with problems (7.48 KB, application/octet-stream)
2008-03-12 10:17 UTC, thomas.lange
no flags Details
Short example that shows the problem when starting the spell check dialog (7.30 KB, application/octet-stream)
2009-03-05 12:00 UTC, thomas.lange
no flags Details

Description redflagzhulihua 2008-01-04 09:00:49 UTC
1. Open the attached Writer document;
2. Click the Spellcheck button;
3. Keep clicking the Ignore button in the Spellcheck dialog;
4. You can see that the text has changed: some duplicated characters have been
inserted into the text.

It works correctly in Chinese version.
Comment 1 redflagzhulihua 2008-01-04 09:16:43 UTC
Created attachment 50656 [details]
Sample of spellcheck error
Comment 2 redflagzhulihua 2008-01-04 09:35:41 UTC
Created attachment 50657 [details]
before spellcheck
Comment 3 redflagzhulihua 2008-01-04 09:36:53 UTC
Created attachment 50658 [details]
after spellcheck, some duplicated characters inserted
Comment 4 michael.ruess 2008-01-04 09:47:49 UTC
Reassigned to SBA.
Comment 5 peter.junge 2008-01-05 08:43:23 UTC
I cannot reproduce it on the OOo2.3.0 that comes with Ubuntu 7.10.
So it's either a regression or we need some special basic conditions to
experience this result.

Peter
Comment 6 redflagzhulihua 2008-01-07 01:42:06 UTC
Hi, Peter,

It reproduces on my English version of OOo2.3.1 on a Simplified Chinese XP
system.

I tried it in OOo2.4-DEV, and it does not reproduce there.

Maybe it's fixed in OOo2.4? But it does reproduce in my environment.

Best regards,
Zhu Lihua
Comment 7 redflagzhulihua 2008-01-09 03:15:57 UTC
Hi,
It's caused by some language settings, but I can't tell which setting
causes this issue. I did find a file that causes it, in
C:\Documents and Settings\Administrator\Application Data\OpenOffice.org2\user\registry\data\org\openoffice\Office.
The filename is "Linguistic.xcu".
I'll upload this file to the issue tracker; maybe it helps. If you replace your
file with this one, the issue will come up.
A longer document will be attached as well.
Comment 8 redflagzhulihua 2008-01-09 03:27:56 UTC
Created attachment 50750 [details]
a longer text sample
Comment 9 redflagzhulihua 2008-01-09 03:29:53 UTC
Created attachment 50751 [details]
replace your file this file, the issue will come up.
Comment 10 peter.junge 2008-01-09 03:35:40 UTC
Hi Zhu Lihua,

please check the following:
- Close all soffice processes including the quick starter
- Then either
  + rename the folder 'OpenOffice.org2'
  or
  + backup and delete it
- Restart OOo
Now OOo should be in a state as if it is freshly installed.
Is the issue still reproducible now?

Best regards
Peter
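
The reset described above can be scripted. The following is only a minimal sketch: the function name `backup_profile` is hypothetical, and the profile location has to be supplied by the caller (on the reporter's machine it would presumably be the 'OpenOffice.org2' folder under 'Application Data'). All soffice processes, including the quick starter, must be closed first.

```python
import time
from pathlib import Path


def backup_profile(profile_dir: Path) -> Path:
    """Rename the user profile folder so that OOo creates a fresh one on restart.

    Hypothetical helper: profile_dir must point at the user's 'OpenOffice.org2'
    folder. The old folder is kept under a new name, not deleted.
    """
    if not profile_dir.is_dir():
        raise FileNotFoundError(f"profile folder not found: {profile_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = profile_dir.with_name(f"{profile_dir.name}.backup-{stamp}")
    # Renaming keeps the old settings (including Linguistic.xcu) available
    # for a later comparison with the freshly generated ones.
    profile_dir.rename(backup_dir)
    return backup_dir
```

Renaming instead of deleting means the old Linguistic.xcu can later be copied back to check whether the issue returns.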
Comment 11 redflagzhulihua 2008-01-09 03:48:43 UTC
Hi, Peter,

I've already tried this.

The issue didn't reproduce after I did this. But if I replace
the "Linguistic.xcu" file with the original file, the issue comes up again.

So I think some setting has modified the file and causes this issue.

But I don't know which setting in this file causes the problem.

Best regards,
Zhu Lihua
Comment 12 stefan.baltzer 2008-03-05 13:24:51 UTC
SBA->TL: Please take over. Maybe the current changes in Linguistic modules will
solve this "along the way".

 
Comment 13 thomas.lange 2008-03-05 13:55:53 UTC
TL->redflagzhulihua:
Could you please state whether the Linguistic.xcu is from the share or the user
layer? Also, please attach the other one as well.
Comment 14 thomas.lange 2008-03-05 14:02:49 UTC
TL->redflagzhulihua:

Also by just copying the Linguistic.xcu in my installation I can not reproduce
the problem.


Actually since the sample text is completely in Chinese simplified and
StarOffice / OpenOffice does not have a spell checker for that by default it
seems that you guys must have one implemented (or at least in use) for that
language.

Without that spell checker nothing at all will happen!
The spell checking dialog will just report that it has finished checking the
document.

In order to see the problem I need your spell checker implementation for Chinese
simplified! Can you provide it?
Or is there a workaround to still reproduce the problem?
Comment 15 thomas.lange 2008-03-05 14:19:39 UTC
Adding a new sample document that reproduces the problem even with DEV300 build.
As one can see in the dialog there are two characters added to the sentence that
do not exist in the document. And those two characters are the problem.


Comment 16 thomas.lange 2008-03-05 14:20:50 UTC
Created attachment 51906 [details]
Sample bugdoc to reproduce the problem with DEV300 build
Comment 17 thomas.lange 2008-03-06 12:58:42 UTC
.
Comment 18 thomas.lange 2008-03-12 10:16:36 UTC
Created attachment 52047 [details]
Reduced sampledocument with problems
Comment 19 thomas.lange 2008-03-12 10:17:35 UTC
Created attachment 52049 [details]
Yet another reduced sample document with problems
Comment 20 thomas.lange 2008-03-12 10:27:03 UTC
A first inspection showed that there might be at least two problems:
1) the Chinese text gets spell checked with the English spell checker
2) the spell check dialog displays characters that are not part of the document.

Debugging into 1) it turns out that the CJK Language attribute of the respective
text is set to English, which should never have been the case.

Thus of course there is at least a workaround for that problem:
Just select the problem text and assign the correct Asian language in the
Format/Character dialog.

Problem case 2) needs still to be inspected.


TL->redflagzhulihua: 
However since in case 1) the document should never have had a CJK language
attribute set to English please describe step by step how that document was
created. If the problem lies in the creation of the document it can probably not
be fixed anymore for existing documents.
But we like to know if at least the creation of such documents with broken
language settings can be avoided in the future.
Comment 21 redflagzhulihua 2008-03-12 11:29:00 UTC
Hi, TL,

In fact, I found this issue by chance. The document was created in a very
ordinary way: I just created a document and typed Chinese characters into it.
It also often occurs with text copied from anywhere.

Later I found it is caused by a file named "Linguistic.xcu". I have compared
this file to one from a newly installed OOo. Several sections
differ, but I don't know which one causes the problem or what operation in
OOo can result in these modifications to the file.
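
The comparison with a freshly installed OOo can be repeated with a plain text diff. This is a minimal sketch using Python's standard `difflib`; the function name `diff_xcu` and both file paths are hypothetical, and it assumes both files are UTF-8 encoded.

```python
import difflib
from pathlib import Path
from typing import List


def diff_xcu(fresh: Path, suspect: Path) -> List[str]:
    """Return a unified diff between two Linguistic.xcu files.

    fresh is the file from a newly installed profile, suspect the one that
    triggers the problem. Both paths are supplied by the caller.
    """
    fresh_lines = fresh.read_text(encoding="utf-8").splitlines()
    suspect_lines = suspect.read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            fresh_lines,
            suspect_lines,
            fromfile=str(fresh),
            tofile=str(suspect),
            lineterm="",
        )
    )
```

Printing the returned lines one per row shows which sections differ. It cannot tell which of them, if any, is responsible.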
Comment 22 thomas.lange 2008-03-12 12:37:23 UTC
TL->redflagzhulihua :
- Does the problem also occur if you type in characters via the keyboard only?
- Or does the problem ONLY occur if you paste text?
In the latter case the problem might be in the paste operation and I may need
the original document where the text was copied from and then pasted.

Currently, from what you have said so far, I still do not see how the
Linguistic.xcu you attached could cause this effect, since there are no wrong
assignments made to the 3 entries 'DefaultLocale_CJK', 'DefaultLocale' and
'DefaultLocale_CTL'. (It would be different if e.g. DefaultLocale_CJK was set to
en-US).
Also copying this xcu file into an existing installation does not cause the
effect. 
So far the problem (for case 1) ) lies in the document itself, which has Western
languages assigned to the CJK language attribute and Chinese
simplified assigned to the Western language attribute.

And as said above I can imagine this happening because of pasted text. But I do
not have an idea how that could happen with text typed via the keyboard only.

Thus it would be great if you could check both cases and confirm or deny the
occurrence of the problem in each case.
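
The check of the three default locale entries can be sketched as follows. It assumes that the xcu file stores each entry as a `<prop oor:name="...">` element with a `<value>` child, with `oor` bound to the registry namespace shown in the code. The helper names and the CJK language set are illustrative, not taken from the OOo sources.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the oor: prefix in .xcu files.
OOR_NAME = "{http://openoffice.org/2001/registry}name"
LOCALE_PROPS = ("DefaultLocale", "DefaultLocale_CJK", "DefaultLocale_CTL")
# Simplification: only these language codes count as valid CJK defaults.
CJK_LANGUAGES = {"zh", "ja", "ko"}


def read_default_locales(xcu_data):
    """Collect the default locale entries found in xcu content (str or bytes)."""
    root = ET.fromstring(xcu_data)
    found = {}
    for prop in root.iter("prop"):
        name = prop.get(OOR_NAME)
        if name in LOCALE_PROPS:
            value = prop.find("value")
            found[name] = value.text if value is not None else None
    return found


def cjk_default_looks_wrong(locales):
    """True if DefaultLocale_CJK is set to a non-CJK language such as en-US."""
    value = locales.get("DefaultLocale_CJK")
    if not value:
        return False  # entry absent or empty: nothing to complain about
    return value.split("-")[0].lower() not in CJK_LANGUAGES
```

Passing the raw file bytes, for example from `Path(...).read_bytes()`, lets the XML parser handle the encoding declaration itself.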
Comment 23 peter.junge 2008-03-14 04:29:07 UTC
Hi,

I would strongly recommend closing this as 'wontfix'. Besides the discussion
of whether this is a bug or not, we also have to talk about practical relevance.
So, how likely is it that a real user copies Linguistic.xcu into his
installation and uses the English spell checker on Chinese text? My guess is
it's almost zero. Consequently, even if this is a bug, the described behavior is
probably completely irrelevant. Or did somebody notice any other bug reports
like this? If not, please stop the discussion and close the issue.

Best regards,
Peter
Comment 24 redflagzhulihua 2008-03-14 09:29:24 UTC
I agree with Peter.

Although this issue can be reproduced under the conditions I described, it seems
hard to find out which operation causes this effect. And it really is an
occasional case, so it may cost too much effort. If we find the operation that
causes this effect later, we can reopen it.
Comment 25 thomas.lange 2008-03-14 10:08:48 UTC
TL->PJ: It is already clear that this issue can not be properly fixed or worked
around later. BUT:

- the question still remains how such a document can be created! 
  (Aside from manually using API where it can easily be done)
- and I once more voice the opinion, that I can see no reason for 
  the attached Linguistic.xcu to cause such a problem.
  From my view it lies exclusively with the document itself.

But I'm also against closing this issue right away before it is verified that at
least two easily conceivable ways to create such a document do not cause this
problem.

What I would like to have checked is what happens if you have a foreign document
(e.g. MS Word or whatever) that holds Chinese text but has the language attribute
set to English US. And there are two cases to it:
a) what happens when importing that document
b) what happens when copying and pasting from it

If the problem does not occur in those two cases then we will just close this issue.
If it does occur one must at least think once more about whether the import of
such text could and should be improved, e.g. by forcefully setting the language
attribute to none if e.g. a western language is assigned to the CJK language
attribute or by ignoring such assignments and keeping the default value.


TL->PJ: Can you have this checked out quickly?
Otherwise I will do so myself sometime next week.
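
The two import strategies mentioned above (reset to none, or keep the default) could look roughly like the sketch below. It is not how any OOo import filter is implemented. The function name, the CJK language set and the use of `None` to stand for "no language" are all assumptions made for illustration.

```python
from typing import Optional

# Simplification: languages accepted for the Asian (CJK) language attribute.
CJK_LANGUAGES = {"zh", "ja", "ko"}


def sanitize_asian_language(
    imported: Optional[str], default: Optional[str] = None
) -> Optional[str]:
    """Decide what to store in the CJK language attribute during import.

    imported: locale found in the foreign document, e.g. "en-US" or "zh-CN".
    default:  value to fall back to. None models "set the attribute to none";
              passing the document default models "ignore the assignment".
    """
    if imported is None:
        return default
    language = imported.split("-")[0].lower()
    if language in CJK_LANGUAGES:
        return imported  # valid assignment, keep it untouched
    return default  # Western language on the CJK attribute: do not accept it
```

With this sketch, `sanitize_asian_language("en-US")` drops the assignment, while `sanitize_asian_language("en-US", default="zh-CN")` keeps the default value instead.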
Comment 26 peter.junge 2008-03-17 02:23:51 UTC
> TL->PJ: Can you have this checked out quickly?
> Otherwise I will do so myself sometime next week.

Hi Zhu Lihua,

please go ahead with the checks TL suggested.

Best regards,
Peter
Comment 27 redflagzhulihua 2008-03-18 09:47:09 UTC
Hi, TL,

I tested again. It does reproduce in a document typed via the keyboard.

1. Replace the Linguistic.xcu with the one I attached;
2. Create a new Writer document and type a paragraph of Chinese text;
     It may not be convenient for you to type Chinese, so you can simply enter
     a short paragraph like: "天气预报说明天下雪"
3. Click the spellcheck button;
     On my computer, the spell check dialog doesn't respond for a while.
     After a while it starts to spell check, and the dictionary language is Danish!
4. Keep clicking the "Ignore Once" button.

The result is "天气预报说说明天天下雪". In Chinese "说明" is a word, and "天下"
is also a word. The extra characters inserted are "说" and "天", which are the
first characters of those 2 words. I found that most of the
duplicated characters are the first character of a word, but not all words
behave like this. Maybe it has something to do with the Chinese word break?

I tried to create an MS Word document that contains Chinese characters with a
Western language attribute, but I can't find a way to do so. MS Word's character
dialog can't choose any language attribute for characters; it seems it's always
the default. Even in OOo, we can only set it to Chinese, Korean, Japanese or none.

And is your operating system a Chinese version? Maybe it only reproduces on a
Chinese version OS?
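
To locate the inserted characters in longer samples, the text before and after the spell check can be compared mechanically. This is a minimal sketch using Python's standard `difflib`; the function name is hypothetical, and the exact positions reported for repeated characters depend on how `SequenceMatcher` aligns the two strings.

```python
import difflib


def inserted_characters(before, after):
    """Return (index in after, character) for every character added to the text."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    inserted = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            # Record each added character with its position in the new text.
            inserted.extend((j1 + k, ch) for k, ch in enumerate(after[j1:j2]))
    return inserted
```

Calling `inserted_characters("天气预报说明天下雪", "天气预报说说明天天下雪")` reports the two extra characters "说" and "天".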
Comment 28 redflagzhulihua 2008-03-18 10:05:20 UTC
I don't have a copy of an English version of Windows, so I've called for help on
zh.openoffice.org. I hope somebody there uses an English version of Windows.

I'll test on an English locale Linux soon.
Comment 29 thomas.lange 2008-03-18 11:44:16 UTC
In MS Word the way to set the language is "Tools/Language/Set Language"...
Comment 30 redflagzhulihua 2008-03-18 11:55:25 UTC
Hi, TL,

I've tried this; it always changes back to Chinese automatically, even if I
uncheck the auto check box.
Comment 31 zdyx 2008-03-19 02:42:53 UTC
Hello guys!

My testing environment:
OOo_2.4.0rc6_20080314_Win32Intel_install_en-US.exe + Windows XP SP3 English OS
OOo_2.4.0rc6_20080314_Win32Intel_install_en-US.exe + Windows XP SP2 Chinese OS

This bug no longer exists with the original OOo 2.4.0rc6 installation, but, with the 
"Linguistic.xcu" file provided by redflagzhulihua, it is reproducible!

On both OS, the Spellcheck popup window will be frozen for a couple of seconds, then begin 
to check the spelling.

After clicking the "Ignore Once" button several times, the duplication issue did happen. As 
redflagzhulihua said, it duplicates the first Chinese character of a phrase.

I think it has something to do with Linguistic Module.

Regards,
ZDYX
Comment 32 redflagzhulihua 2008-03-19 03:44:03 UTC
Zhu Lihua->ZDYX: Thank you very much!

It reproduces in both the Chinese and English locales of Fedora 8.

Some differences in the Linux version of OOo:
No text is shown in the "Not in dictionary" field of the spellcheck dialog, and
the "Dictionary language" is German (Germany). After spell checking with several
clicks on the "Ignore Once" button, the text has changed.
Comment 33 thomas.lange 2009-03-05 09:25:22 UTC
I just checked this with DEV300_m41 again. I still see the problem in the long
document and in spellcheck4.odt.
Setting target to 3.2.
Comment 34 thomas.lange 2009-03-05 09:25:56 UTC
.
Comment 35 thomas.lange 2009-03-05 12:00:26 UTC
Created attachment 60741 [details]
Short example that shows the problem when starting the spell check dialog
Comment 36 thomas.lange 2009-03-05 12:12:11 UTC
Looking into this, it turned out the problem is that the language attributes for
the respective text parts are invalid.

Usually the language attributes should look like this for such a short text:
<style:style style:name="P1" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties fo:language="en" fo:country="ZW"
style:language-asian="zh" style:country-asian="TW" style:language-complex="te"
style:country-complex="IN"/>
</style:style>

Here fo:language and fo:country contain the western locale, style:language-asian
and style:country-asian the Asian locale and last style:language-complex and
style:country-complex the CTL locale.


But when looking at the actual content in your file it is like this:
<style:style style:name="T1" style:family="text">
<style:text-properties fo:language="zh" fo:country="CN"
style:language-asian="en" style:country-asian="US"/>
</style:style>

That is, fo:language and fo:country (the Western locale) are set to Chinese
simplified(!) and the Asian locale is set to English US(!). Neither of those
should ever be possible. 
And because of that, if you select e.g. the first single character you can't see
those attributes in the Format/Character dialog because the list boxes there
support only the correct locales. Thus the language display for Western and
Asian is empty.
To fix the problem simply select all the text (e.g. by using CTRL-A), open the
Format/Character dialog and set the Western and the Asian language to their
correct values. That will solve the issue!

However, the one interesting problem is how did you manage to get a document
with these broken language settings? If it is done by importing a document the
respective filter needs to be fixed. If it was done by using some input method
editor (IME) either that one or the code handling that IME needs to be fixed.

Thus, basically this issue is invalid but if you can tell us how the documents
got created the way they are it might be useful to look more closely into what
happens at that time.

TL->SBA: Please take over for the time being, until we may get some more info
about this.
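
Documents with such swapped language attributes can be found without opening them in Writer. The sketch below assumes an ODF text document (.odt), whose styles live in content.xml and styles.xml inside the zip package. The function names, the CJK language set and the treatment of "none"/"zxx" as harmless are assumptions made for illustration. Older .sxw files use different XML namespaces and are not covered.

```python
import xml.etree.ElementTree as ET
import zipfile

FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
# Simplification: languages expected on the Asian attribute only.
CJK_LANGUAGES = {"zh", "ja", "ko"}
# Assumption: these values mean "no language" and are not an error.
NO_LANGUAGE = {"none", "zxx"}


def find_swapped_locales(xml_data):
    """Return (style name, problem) pairs for styles with suspicious locales."""
    root = ET.fromstring(xml_data)
    problems = []
    for style in root.iter(f"{{{STYLE}}}style"):
        name = style.get(f"{{{STYLE}}}name")
        for props in style.findall(f"{{{STYLE}}}text-properties"):
            western = props.get(f"{{{FO}}}language")
            asian = props.get(f"{{{STYLE}}}language-asian")
            if western in CJK_LANGUAGES:
                problems.append((name, f"Western language set to '{western}'"))
            if (
                asian is not None
                and asian not in CJK_LANGUAGES
                and asian not in NO_LANGUAGE
            ):
                problems.append((name, f"Asian language set to '{asian}'"))
    return problems


def scan_odt(path):
    """Check the style definitions of an .odt file for swapped locales."""
    problems = []
    with zipfile.ZipFile(path) as odt:
        names = set(odt.namelist())
        for member in ("content.xml", "styles.xml"):
            if member in names:
                problems.extend(find_swapped_locales(odt.read(member)))
    return problems
```

For the two style fragments quoted above, this sketch reports nothing for "P1" and two problems for "T1". Affected text still has to be corrected by hand in the Format/Character dialog.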

Comment 37 thomas.lange 2009-03-05 12:16:06 UTC
TL->redflagzhulihua: Can you add some original (MS Word, or whatever) document
where you copy the text from? Maybe this will be helpful.
Comment 38 redflagzhulihua 2009-03-09 04:51:23 UTC
Hi,

The short sample document was typed via the keyboard,

and the longer sample was copied from a topic in a forum and pasted as
unformatted text.

the URL:

http://www.mychery.net/forum/bin/ut/topic_show.cgi?id=688022&pg=1&age=0&bpg=1&del=&stamp=1236570329#12685329
Comment 39 stefan.baltzer 2009-10-16 14:47:03 UTC
SBA->TL: The office "believing" (at least showing) that it can spell-check
Chinese can only lead to unwanted results. So some "strange settings" seem to be
in effect.
This is my scenario with OOO320_m1:
 - Start a newly installed office (or remove user tree first)
(my Office has German And English spellcheck among other Western ones)
 - Tools - Options - Language Settings - Languages
 - Check CTL and Asian language support
(Note that in Asian Language list, there is no spell check check mark at all)
 - Download bugdoc "Spellcheck3" and open it in Office (to have it editable)
-> See that there are red wavy underlines under chinese characters
-> Put cursor at end of line, see that language "German" is set
 - Launch spellcheck (<F7>)
-> note that German spellcheck highlights Chinese characters
 - Click "Ignore once" two times
-> Message box "The spellcheck is complete" comes up
 - Format-Character
-> See that there is a blue check mark in front of "Chinese simplified"...
Please proceed.
Comment 40 stefan.baltzer 2010-07-02 10:57:25 UTC
Findings in DEV300_m84:
With bugdoc "Spellcheck3" I still get wavy underlines under Chinese characters. 
TL told me that the language attributes in this document are set to "invalid values".

Even with this document, the "Fake to be able to spellcheck Chinese" Problem
does not occur anymore. That got fixed by issue 106497.

Set to "Worksforme". 
Comment 41 stefan.baltzer 2010-07-02 10:58:03 UTC
Closed.