Apache OpenOffice (AOO) Bugzilla – Issue 19514
html export: charset utf-8 uses named entities
Last modified: 2013-08-07 14:38:26 UTC
In "Importing and Exporting in HTML Format" the online help says, "When exporting to HTML, the character set selected in Tools - Options - Load/Save - HTML Compatibility is used. Characters not present there are written in a substitute form, which is displayed correctly in modern web browsers. When exporting such characters, you will receive an appropriate warning." But when I use UTF-8, my "ä", "ü" and so on are turned into named entities although they are present in UTF-8, and no warning appears. I would prefer that the export be changed, not the help text. In UTF-8 you need neither named entities nor &#xnn;. Even in ISO-8859-1 they are not necessary for "ä" and "ü". Anyone who wants such substitutes can choose ASCII/US. Kind regards, Regina
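The behavior the help text describes can be illustrated with a minimal Python sketch. This is not the OpenOffice.org export code; the function name `export_text` is hypothetical, and the sketch assumes that only markup-significant characters are escaped and that characters missing from the selected character set fall back to numeric character references.

```python
import html


def export_text(text: str, charset: str) -> bytes:
    """Encode text for an HTML body in the given charset (hypothetical helper).

    Characters the charset can represent are written as-is; all others
    become numeric character references such as &#8364;.
    """
    # Escape only the characters that are significant in HTML markup.
    escaped = html.escape(text, quote=False)
    # Python's built-in 'xmlcharrefreplace' handler substitutes a decimal
    # character reference for anything the charset cannot encode.
    return escaped.encode(charset, errors="xmlcharrefreplace")


print(export_text("ä ü", "utf-8"))       # raw UTF-8 bytes, no entities
print(export_text("ä ü", "iso-8859-1"))  # raw Latin-1 bytes, no entities
print(export_text("ä ü", "ascii"))       # numeric references only here
```

Under this assumption, "ä" and "ü" are substituted only when ASCII is selected, which matches the request above.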
Please attach a document that reproduces this problem so we can test and confirm it faster. (Without the document, confirming the problem takes more time.) Please also cut out the unrelated parts of the document so the file stays small but still shows the problem.
Created attachment 9949 [details] example produced HTML
The attached file was produced with 'New HTML Document'. The behavior is independent of the 'Export' field in the 'HTML Compatibility' dialog. 'Character Set' in that dialog was set to 'UTF-8'.
Created attachment 9950 [details] how the document should be
The document testspecialcharacter_correct shows the correct coding of umlaut 'ü' in UTF-8.
Confirming. But since this doesn't produce broken HTML, I set this to Prio5 (changed subject, OS to ALL). Original summary: html export: information in help doesn't fit to behavior
Reassigned to ES
ES->MIB: Please evaluate
To offer as much compatibility as possible, the HTML export in fact uses (named) entities for as many characters as possible. One can consider this to be a bug or a feature ...
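For comparison, the strategy described in this comment can be sketched roughly as follows. This is only an illustration, not the actual export filter: the function name `to_named_entities` is hypothetical, and Python's HTML 4 entity table (`html.entities.codepoint2name`) stands in for whatever table the exporter uses.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace every character that has an HTML 4 named entity (hypothetical helper).

    The result is pure ASCII, so it displays the same regardless of the
    charset declared for the page.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code in codepoint2name:
            # A named entity exists, e.g. 252 -> "uuml".
            out.append(f"&{codepoint2name[code]};")
        elif code < 128:
            # Plain ASCII without a named entity is kept as-is.
            out.append(ch)
        else:
            # No named entity available: fall back to a numeric reference.
            out.append(f"&#{code};")
    return "".join(out)


print(to_named_entities("für"))
```

The output is independent of the selected character set, which is the compatibility argument; the cost is that UTF-8 documents carry entities they do not need.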
Created attachment 15618 [details] This patch fixes the problem for ISO8859-1 and MS 1250
A similar patch which fixes the problem for ISO 8859-7 and MS 1253 is mentioned in the Issue #28241.
*** Issue 53483 has been marked as a duplicate of this issue. ***
The problem still occurs in OOo 2.0. And there is yet another problem, which maybe should be a new bug/issue: "To offer as much compatibility as possible, the HTML export in fact uses (named) entities for as many characters as possible. One can consider this to be a bug or a feature ..." That's not true at all! OOo replaces „ “ and others with " (99 down) and " (66 up)! I don't like (I hate) that OOo replaces code that was written manually beforehand.