Issue 80789 - Use XLIFF as translation format for OpenOffice.org
Summary: Use XLIFF as translation format for OpenOffice.org
Status: ACCEPTED
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: current
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL: http://www.oasis-open.org/committees/...
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-17 14:26 UTC by clytie
Modified: 2017-05-20 11:31 UTC
CC List: 5 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description clytie 2007-08-17 14:26:24 UTC
XLIFF is the standard for professional translation, and will become the standard for free-software 
translation. It is XML for translators: a standard created specifically for localization.

XLIFF has powerful metadata capabilities that both simplify and support localization. Compared even to 
PO format, XLIFF has major advantages. It is exactly what we need to manage the complex background 
of an OpenOffice.org translation file.

XLIFF is extremely easy to manipulate. All the professional editors handle it, but you can also translate it 
in a text editor if you like. My translation editor, LocFactoryEditor for OSX, is based on XLIFF. Pootle 
already converts to and from XLIFF. Pootling, the Wordforge offline editor, will be based on XLIFF. The 
XLIFF Tools project and XLIFF RoundTrip Tool are other free XLIFF tools available.

http://xliff-tools.freedesktop.org/wiki/Projects/xlifftool
http://sourceforge.net/projects/xliffroundtrip

The SDF format is cumbersome, fragile and has absolutely no metadata capability. I translate for over 20 
other projects, and I've never seen such a useless format. Every single string in the current file retains a 
date of "2002-02-02 02:02:02" which appears to have no utility whatsoever. 

The SDF file stores no localization metadata. We can't store dates of translation or update, or names of 
translators working on the file, much less character sets, plural expressions, contextual information, 
alternative translations or translation memory keys.

PO does some of this quite well. XLIFF does it all, much better.

Optional conversion to PO format has brought us more translators, more contribution. However, having 
to convert back to SDF not only dumps all our PO metadata, it's a barrier to participation. People just 
don't have time to mess around like that.

We want to be able to use translation memory, to be able to track modifications, to be able to handle 
plural cases for different languages. We want a professional translation format. SDF is a problem, not a 
solution.

XLIFF is the standard for professional translators. The sooner we adopt it, the sooner we will have more 
professional translators donating time to OpenOffice.org. 

XLIFF is efficient and robust, maximizing the effect of input data. The sooner we adopt it, the more work 
we can get done in the same amount of time.

XLIFF is an open standard. If OpenOffice.org supports OpenDocument, it should certainly support XLIFF.

I formally request that the OpenOffice.org project change translation format to XLIFF. I want to be able 
to do my job properly.

Clytie Siddall, Vietnamese Free-Software Translation Team.
Comment 1 pavel 2007-08-17 19:18:52 UTC
There are several ways to translate OOo. One of them is the PO path. Another one could be XLIFF. 
Volunteers can work on this path.

OOo's format for export and import of translations is SDF. There are no plans to change it right now.

I will personally not spend any time on it in the near future.
Comment 2 avagula 2007-08-17 19:39:14 UTC
This is a feature or enhancement rather than a defect.
Comment 3 smolejv 2007-08-17 22:37:37 UTC
i) This is an issue not in the sense that something is broken but as "a vital
or unsettled matter" (Merriam-Webster).
ii) I support Pavel in his position of "I won't fix what's not broken", although
I have to mention the fact that SDF is NOT the format to keep multilingual
information. We don't write our books in txt format anymore either.
iii) I support Clytie's common-sense suggestion.

I just don't know how to proceed. But I can offer my free time.

One last comment - this issue belongs somewhere else. Possibly into some powwow
round at Sun.

Comment 4 jean.christophe.helary 2007-08-18 01:22:03 UTC
Since the original documentation is authored in XML, it is trivial to provide XLIFF conversion with open 
tools.

Sun's OpenLanguageTools filters provide a configurable XML filter, but the output (an xlz file) is not 
compatible with any tool other than OLT itself. It is necessary to open the xlz package and extract the xlf 
file for translation.

The Okapi Project (a .NET 2.0 GPL project hosted on SourceForge) has a full-fledged conversion filter set that 
allows for PO/XLIFF/XLIFF (for OmegaT) exports. The exported files are standard XLIFF files and can be 
used in most tools, except in OLT since OLT does not support XLIFF 1.1... The Okapi Project is waiting 
for Mono to be fully .NET 2.0 compatible and is already readying the code for an easy transition so as to 
be available on OSX and Linux as well.

The format of the XLIFF output (either diffs or full source) should not be an issue, since 
properly using translation memories created from previous versions (TMX, another XML-based industry 
standard) should make updating unmodified segments a trivial operation.

XLIFF output of source files _should_ come with TMX creation based on the most recent source file 
sets. Alignment tools for sub-paragraph segments exist and should be used to further leverage the 
existing corpus.

See the NetBeans translation workflow for further information.
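
The translation-memory reuse described in this comment can be illustrated with a minimal sketch. It assumes the memory has already been loaded into a plain dictionary mapping source strings to translations (a real memory would be read from a TMX file), and it only handles exact matches. The function name `pretranslate` and the sample strings are hypothetical, not part of any OOo or OLT tooling.

```python
def pretranslate(new_sources, memory):
    """Split new source strings into reused translations and strings needing work.

    `memory` is an assumed stand-in for a translation memory: a dict that maps
    a source string from a previous version to its existing translation.
    """
    reused = {}   # unmodified segments: translation copied from the memory
    pending = []  # new or changed segments: still need a translator
    for source in new_sources:
        if source in memory:
            reused[source] = memory[source]
        else:
            pending.append(source)
    return reused, pending


# Hypothetical example data: two strings carried over, one new string.
memory = {"File": "Fichier", "Edit": "Édition"}
reused, pending = pretranslate(["File", "Edit", "View"], memory)
```

Only the strings left in `pending` would need attention from a translator; everything in `reused` is carried over unchanged from the previous version.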
Comment 5 clytie 2007-08-18 06:33:24 UTC
I do consider SDF format a defect in localization. I would be interested in any reasons for using it, bar 
preserving the status quo. What are its advantages in localization?
Comment 6 pavel 2007-08-18 09:12:34 UTC
Clytie: ? Your logic is broken. You should provide reasons (technical, not marketing buzzwords like 
industry standards etc) for XLIFF.
Comment 7 avagula 2007-08-18 09:23:49 UTC
The SDF format is a feature, good or bad (rather bad, I think). But labelling this 
as a defect is a bit emotional and subjective. We are talking about adding a new feature 
or enhancing an existing feature. :)
Comment 8 clytie 2007-08-18 11:15:07 UTC
I use the word "defect" because the format doesn't work for localization. If a feature doesn't work, that is 
described as a defect. It's not an enhancement to request that a feature do its original job.

Industry standards are important, not just buzzwords, or we wouldn't support them. They provide 
consistency and portability, and they are designed for the jobs they do. OpenDocument has been 
designed as an open document format, and OpenOffice.org has adopted it because (at the very least) it 
allows OpenOffice.org to reach a wider audience. XLIFF has been designed as an open translation 
format. OpenOffice.org should adopt it because it will allow both professional and volunteer translators 
to do a better job in the same amount of time. It will reduce barriers to translation, and attract more 
contribution. We will also be able to monitor and improve our localization practices, when we use a real 
L10N format.

SDF format does not have any localization features, bar the capacity to list an original string and a 
translation, with a string ID. Localization is much more than that (and I shouldn't have to explain that to 
a translator). We need to be able to use translation memory, to be able to compare similar translations, 
to be able to store metadata about who translated what, when and how, in what language, with what 
plural behaviour and regarding what context, how translations have been updated, when they have 
been updated, when they have been submitted etc. XLIFF can handle any amount of specialized i18n 
metadata. PO handles some. SDF does not handle any metadata at all, so we lose it all the minute we 
convert back to SDF.

In this issue, I have given several functional reasons why we should switch to XLIFF. Please respond to 
those, and to my query why SDF should be used as a l10n format.
Comment 9 ivo.hinkelmann 2007-08-20 11:53:51 UTC
I can see the advantages that XLIFF offers for localisation. Right now you
can convert SDF to PO and to XLIFF if you like. From my point of view it does
not make sense to use XLIFF to store our translations, because the millions of
strings would cause a huge increase in build time, as XML access is pretty
slow. The SDF file format is very basic, easy and fast to parse during build time.
Comment 10 clytie 2007-08-21 09:39:33 UTC
SDF format may be quick to build, but it is not an effective translation format. Although we can convert 
to effective translation formats like PO or XLIFF, we lose all our metadata every time we convert back to SDF. So 
retaining SDF as the official translation format does not provide the translation features we need.

Friedel, can you provide info on building with XLIFF? Is it likely to be a lot slower? If so, can we reduce 
the size of the file to build, e.g. automatically strip the metadata out of an XLIFF file before build, while 
retaining it in the translation file(s)?
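
The idea of stripping metadata before the build can be sketched roughly as follows. This is only an illustration, not part of the OOo build: it assumes XLIFF 1.2 input, treats `note`, `alt-trans` and `context-group` as the metadata to drop, and uses a hypothetical helper name `strip_metadata`. The original translation file is left untouched; only the returned copy would go to the build.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; the input version is an assumption of this sketch.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Child elements of <trans-unit> that this sketch treats as metadata.
METADATA_TAGS = {
    f"{{{XLIFF_NS}}}{name}" for name in ("note", "alt-trans", "context-group")
}


def strip_metadata(xliff_text: str) -> str:
    """Return a copy of the XLIFF text with metadata children removed."""
    ET.register_namespace("", XLIFF_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        for child in list(unit):  # copy the list: we remove while iterating
            if child.tag in METADATA_TAGS:
                unit.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample file with one translated string and two metadata items.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.src" source-language="en-US" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Open</source>
        <target>Ouvrir</target>
        <note from="translator">Menu item</note>
        <alt-trans><target>Ouverture</target></alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""

build_copy = strip_metadata(SAMPLE)
```

Whether this would make an XLIFF-based build fast enough is a separate question that the sketch does not answer; it only shows that the metadata can be removed mechanically while the source and target strings are kept.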
Comment 11 andreschnabel 2008-11-14 18:20:51 UTC
Setting to started - we are currently testing XLIFF in Pootle.
Comment 12 Marcus 2017-05-20 11:31:41 UTC
Reset assignee to the default "issues@openoffice.apache.org".