Apache OpenOffice (AOO) Bugzilla – Issue 80789
Use XLIFF as translation format for OpenOffice.org
Last modified: 2017-05-20 11:31:41 UTC
XLIFF is the standard for professional translation, and will become the standard for free-software translation. It is XML for translators: a standard created specifically for localization. XLIFF has powerful metadata capabilities that both simplify and support localization. Compared even to PO format, XLIFF has major advantages. It is exactly what we need to manage the complex background of an OpenOffice.org translation file. XLIFF is extremely easy to manipulate. All the professional editors handle it, but you can also translate it in a text editor if you like. My translation editor, LocFactoryEditor for OSX, is based on XLIFF. Pootle already converts to and from XLIFF. Pootling, the Wordforge offline editor, will be based on XLIFF. The XLIFF Tools project and XLIFF RoundTrip Tool are other free XLIFF tools available. http://xliff-tools.freedesktop.org/wiki/Projects/xlifftool http://sourceforge.net/projects/xliffroundtrip The SDF format is cumbersome, fragile and has absolutely no metadata capability. I translate for over 20 other projects, and I've never seen such a useless format. Every single string in the current file retains a date of "2002-02-02 02:02:02" which appears to have no utility whatsoever. The SDF file stores no localization metadata. We can't store dates of translation or update, or names of translators working on the file, much less character sets, plural expressions, contextual information, alternative translations or translation memory keys. PO does some of this quite well. XLIFF does it all, much better. Optional conversion to PO format has brought us more translators, more contribution. However, having to convert back to SDF not only dumps all our PO metadata, it's a barrier to participation. People just don't have time to mess around like that. We want to be able to use translation memory, to be able to track modifications, to be able to handle plural cases for different languages. We want a professional translation format. 
SDF is a problem, not a solution. XLIFF is the standard for professional translators. The sooner we adopt it, the sooner we will have more professional translators donating time to OpenOffice.org. XLIFF is efficient and robust, maximizing the effect of input data. The sooner we adopt it, the more work we can get done in the same amount of time. XLIFF is an open standard. If OpenOffice.org supports OpenDocument, it should certainly support XLIFF. I formally request that the OpenOffice.org project change translation format to XLIFF. I want to be able to do my job properly. Clytie Siddall, Vietnamese Free-Software Translation Team.
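To make the metadata argument concrete, here is a minimal sketch of what one translation unit could carry in XLIFF. It assumes XLIFF 1.2 element names (`phase`, `context-group`, `alt-trans`, `note`); the file name, string ID, translator name, date and context value are all hypothetical placeholders, not taken from any real OpenOffice.org file.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 (assumed version for this illustration).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def q(tag):
    """Qualify a tag name with the XLIFF namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_example_document():
    """Build a one-string XLIFF document that carries translation metadata."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": "example.sdf",       # hypothetical source file name
        "source-language": "en-US",
        "target-language": "vi",
        "datatype": "plaintext",
    })

    # Who worked on the file and when: metadata that SDF cannot store.
    header = ET.SubElement(file_el, q("header"))
    phases = ET.SubElement(header, q("phase-group"))
    ET.SubElement(phases, q("phase"), {
        "phase-name": "translation-1",
        "process-name": "translation",
        "contact-name": "Example Translator",   # placeholder name
        "date": "2007-01-01T00:00:00Z",         # placeholder date
    })

    body = ET.SubElement(file_el, q("body"))
    unit = ET.SubElement(body, q("trans-unit"), {
        "id": "example.string.1",               # hypothetical string ID
        "phase-name": "translation-1",
    })
    ET.SubElement(unit, q("source")).text = "Save"
    ET.SubElement(unit, q("target"), {"state": "translated"}).text = "Lưu"

    # Contextual information for the translator.
    ctx_group = ET.SubElement(unit, q("context-group"), {"purpose": "information"})
    ET.SubElement(ctx_group, q("context"), {"context-type": "x-menu"}).text = "File menu"

    # An alternative translation, e.g. from a translation memory.
    alt = ET.SubElement(unit, q("alt-trans"))
    ET.SubElement(alt, q("target")).text = "Lưu lại"

    # A free-text note from the translator.
    ET.SubElement(unit, q("note"), {"from": "translator"}).text = "Menu item; keep it short."
    return root
```

Calling `ET.tostring(build_example_document(), encoding="unicode")` serializes the document, so the result can be inspected in any text editor, as the comment above suggests.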
There are several ways to translate OOo. One of them is the PO path. Another could be XLIFF, and volunteers can work on that path. OOo's format for export and import of translations is SDF. There are no plans to change it right now. Personally, I will not spend any time on it in the near future.
This is a feature or enhancement rather than a defect.
i) This is an issue not in the sense that something is broken but in the sense of "a vital or unsettled matter" (Merriam-Webster). ii) I support Pavel in his position of "I won't fix what's not broken", although I have to mention that SDF is NOT a format for keeping multilingual information. We don't write our books in txt format anymore either. iii) I support Clytie's common-sense suggestion; I just don't know how to proceed. But I can offer my free time. One last comment: this issue belongs somewhere else, possibly in some powwow round at Sun.
Since the original documentation is authored in XML, it is trivial to provide XLIFF conversion with open tools. Sun's OpenLanguageTools (OLT) filters provide a configurable XML filter, but the output (an xlz file) is not compatible with tools other than OLT itself. It is necessary to open the xlz package and extract the xlf file for translation. The Okapi Project (a .NET 2.0 GPL project hosted on SourceForge) has a full-fledged conversion filter set that allows for PO/XLIFF/XLIFF (for OmegaT) exports. The exported files are standard XLIFF files and can be used in most tools, except in OLT, since OLT does not support XLIFF 1.1... The Okapi Project is waiting for Mono to be fully .NET 2.0 compatible and is already readying the code for an easy transition, so as to be available on OSX and Linux as well. The format of the XLIFF output (either diffs or full source) should not be an issue: properly using translation memories created from previous versions (TMX, another XML-based industry standard) should make updating unmodified segments a trivial operation. XLIFF output of source files _should_ come with TMX creation based on the most recent sets of source files. Alignment tools for sub-paragraph segments exist and should be used to further leverage the existing corpus. See the NetBeans translation workflow for further information.
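The point about translation memories making unmodified segments trivial to update can be sketched as an exact-match lookup. This is only an illustration: a plain dictionary stands in for a real TMX file, and the function name and status labels are hypothetical.

```python
def leverage_exact_matches(previous_pairs, new_sources):
    """Pre-fill translations for source strings that have not changed.

    previous_pairs: iterable of (source, target) pairs from the last release,
                    standing in for a translation memory (real TMX is XML).
    new_sources:    source strings of the new release, in order.
    Returns a list of (source, target_or_None, status) tuples.
    """
    memory = dict(previous_pairs)
    result = []
    for source in new_sources:
        target = memory.get(source)
        # Unmodified segments are reused as-is; everything else goes to a translator.
        status = "reused" if target is not None else "needs-translation"
        result.append((source, target, status))
    return result
```

Only the strings marked `needs-translation` require human work, which is why it matters little whether the XLIFF export contains diffs or the full source. Real tools also offer fuzzy matching and sub-paragraph alignment, which this sketch leaves out.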
I do consider SDF format a defect in localization. I would be interested in any reasons for using it, bar preserving the status quo. What are its advantages in localization?
Clytie: ? Your logic is broken. You should provide reasons for XLIFF (technical ones, not marketing buzzwords like "industry standards", etc.).
The SDF format is a feature, good or bad (rather bad, I think). But labelling this as a defect is a bit emotional and subjective. We are talking about adding a new feature or enhancing an existing one. :)
I use the word "defect" because the format doesn't work for localization. If a feature doesn't work, that is described as a defect. It's not an enhancement to request that a feature do its original job. Industry standards are important, not just buzzwords, or we wouldn't support them. They provide consistency and portability, and they are designed for the jobs they do. OpenDocument has been designed as an open document format, and OpenOffice.org has adopted it because (at the very least) it allows OpenOffice.org to reach a wider audience. XLIFF has been designed as an open translation format. OpenOffice.org should adopt it because it will allow both professional and volunteer translators to do a better job in the same amount of time. It will reduce barriers to translation, and attract more contribution. We will also be able to monitor and improve our localization practices when we use a real L10N format. SDF format does not have any localization features, bar the capacity to list an original string and a translation, with a string ID. Localization is much more than that (and I shouldn't have to explain that to a translator). We need to be able to use translation memory, to be able to compare similar translations, to be able to store metadata about who translated what, when and how, in what language, with what plural behaviour and regarding what context, how translations have been updated, when they have been updated, when they have been submitted etc. XLIFF can handle any amount of specialized i18n metadata. PO handles some. SDF does not handle any metadata at all, so we lose it all the minute we convert back to SDF. In this issue, I have given several functional reasons why we should switch to XLIFF. Please respond to those, and to my query why SDF should be used as an l10n format.
I can see the advantages that XLIFF offers for localisation. Right now you can convert SDF to PO and to XLIFF if you like. From my point of view it does not make sense to use XLIFF to store our translations, because the millions of strings would cause a huge increase in build time, as XML access is pretty slow. The SDF file format is very basic, easy and fast to parse during build time.
SDF format may be quick to build, but it is not an effective translation format. Although we can convert to effective translation formats like PO or XLIFF, we lose all our metadata every time we convert back. So retaining SDF as the official translation format does not provide the translation features we need. Friedel, can you provide info on building with XLIFF? Is it likely to be a lot slower? If so, can we reduce the size of the file to build, e.g. automatically strip the metadata out of an XLIFF file before build, while retaining it in the translation file(s)?
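The idea of stripping metadata before the build, while keeping it in the translators' files, could look roughly like the sketch below. It assumes XLIFF 1.2 and treats `note`, `alt-trans`, `context-group`, `prop-group` and `count-group` as the metadata to drop; that selection and the function name are assumptions for illustration, not an existing OpenOffice.org build step.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Elements treated as translator-facing metadata (an assumed selection).
METADATA_TAGS = {"note", "alt-trans", "context-group", "prop-group", "count-group"}


def strip_metadata(xliff_text):
    """Return a copy of an XLIFF document without translator-facing metadata.

    Source and target strings are kept, so the build reads a smaller file;
    the full file stays with the translators.
    """
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    # Snapshot the tree first, because children are removed while walking it.
    for parent in list(root.iter()):
        for child in list(parent):
            local_name = child.tag.rsplit("}", 1)[-1]
            if local_name in METADATA_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Whether this would bring XLIFF build times close enough to SDF is exactly the open question in this comment; the sketch only shows that the stripping step itself is small.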
Setting to started - we are actually testing XLIFF in Pootle.
Reset assignee to the default "issues@openoffice.apache.org".