Issue 80888 - Titles of some HTML files become garbled.
Summary: Titles of some HTML files become garbled.
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: 680m222
Hardware: PC All
Importance: P3 Trivial
Target Milestone: ---
Assignee: andreas.martens
QA Contact: issues@sw
URL:
Keywords: CJK, needmoreinfo, oooqa
Depends on:
Blocks:
 
Reported: 2007-08-21 12:42 UTC by redflagzhulihua
Modified: 2013-08-07 14:38 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
The Chinese-character title becomes garbled (54.38 KB, image/jpeg)
2007-08-21 12:44 UTC, redflagzhulihua
HTML file which shows the problem (3.62 KB, application/octet-stream)
2007-08-23 10:04 UTC, hdu@apache.org
Appearance in the Simplified Chinese version (XP) (34.98 KB, image/jpeg)
2007-11-16 02:51 UTC, redflagzhulihua

Description redflagzhulihua 2007-08-21 12:42:21 UTC
Enter "http://www.baidu.com" in the address field of IE.

In the menu, choose File -> Save As and save the page to the local disk.

Open Writer, choose File -> Open and open the HTML file just saved.

The title is garbled.

See the attachment for details.
Comment 1 redflagzhulihua 2007-08-21 12:44:35 UTC
Created attachment 47668
The Chinese-character title becomes garbled
Comment 2 michael.ruess 2007-08-21 12:48:53 UTC
Reassigned to ES.
Comment 3 eric.savary 2007-08-22 15:44:49 UTC
@HDU: can you help analyze this, please? I don't know if it's a question of
encoding (gb2312), fonts or locale... IE and Firefox also display garbage in the
title, but Firefox on the SunRay displays the title correctly...
Comment 4 hdu@apache.org 2007-08-23 10:04:51 UTC
Created attachment 47743
HTML file which shows the problem
Comment 5 hdu@apache.org 2007-08-23 10:25:41 UTC
The HTML file starts directly with
  <html><head><title>GB2312_ENCODED_TITLE_BYTES</title>
  <meta http-equiv=Content-Type content="text/html;charset=gb2312">

Since the title in the HTML header does not declare its own encoding (the charset meta tag only follows it), OOo's HTML import code
has to guess the encoding. The import code seems to simply use the thread-specific encoding, which is obviously not
sufficient in this case. Tweaking the HTML import heuristic to work around this particular problem doesn't
sound too difficult.
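One possible tweak, sketched here only for illustration, is to pre-scan the start of the byte stream for a charset declaration before decoding anything, so that a title which precedes its meta tag is still decoded with the declared encoding. This is NOT OOo's actual import code: the function names (sniff_charset, extract_title), the 1024-byte window (the same size the HTML standard's prescan uses) and the latin-1 fallback are all assumptions, and the regular expressions are deliberately simplistic.

```python
import codecs
import re

# Assumption: only look for a charset declaration in the first 1024 bytes.
PRESCAN_BYTES = 1024

# Simplistic patterns for illustration; a real importer would tokenize the markup.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-:.]+)', re.IGNORECASE)
_TITLE_RE = re.compile(rb'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def sniff_charset(raw: bytes, fallback: str) -> str:
    """Return the charset declared within the prescan window, else the fallback."""
    match = _CHARSET_RE.search(raw[:PRESCAN_BYTES])
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # accept only encodings Python knows about
            return name
        except LookupError:
            pass
    return fallback


def extract_title(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode the <title> text using the sniffed charset, even if the
    declaration appears after the title in the document."""
    encoding = sniff_charset(raw, fallback)
    match = _TITLE_RE.search(raw)
    if not match:
        return ""
    return match.group(1).decode(encoding, errors="replace").strip()
```

With a file shaped like the one in attachment 47743 (title bytes in GB2312, followed by the meta tag), the title is decoded as GB2312 rather than with a locale-dependent default; a file without any declaration falls back to the given encoding.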
Comment 6 eric.savary 2007-08-23 15:32:11 UTC
@AMA: please have a look.
Comment 7 eric.savary 2007-08-23 15:32:37 UTC
.
Comment 8 redflagzhulihua 2007-11-16 02:51:57 UTC
Created attachment 49680
Appearance in the Simplified Chinese version (XP)
Comment 9 thackert 2010-08-29 08:20:41 UTC
Hello redflagzhulihua, *,
during my TCM test I stumbled (again ... ;) ) upon one of your issues ... ;) I
tested it with the German version of OOO330m4 under Debian
SID/Experimental AMD64, and here it looks like your attached Snap1.jpg :) I should
add that I went to www.baidu.com with Firefox 4.0b4, saved the whole page to
my hard disk and then opened it in OOo. Would you be so kind as to test it on your
system(s?) as well and report back whether this issue is fixed there, too? And it
would be nice if you could close this issue if that is the case ... ;)
HTH
Thomas.
Comment 10 redflagzhulihua 2010-08-30 03:12:28 UTC
Hi Thomas,

Again, thank you for your concern. I tested again and can no longer see the
problem. I'll close this issue.
Comment 11 redflagzhulihua 2010-08-30 03:13:28 UTC
closing...