Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | CSV import could ignore leading spaces if the field content without them is quoted. | ||||||
---|---|---|---|---|---|---|---|
Product: | Calc | Reporter: | guidogam <g.gambardella> | ||||
Component: | open-import | Assignee: | AOO issues mailing list <issues> | ||||
Status: | CONFIRMED --- | QA Contact: | |||||
Severity: | Trivial | ||||||
Priority: | P4 | CC: | issues, oliver.brinzing, ooo | ||||
Version: | OOo 3.2.1 | ||||||
Target Milestone: | --- | ||||||
Hardware: | All | ||||||
OS: | All | ||||||
Issue Type: | ENHANCEMENT | Latest Confirmation in: | --- | ||||
Developer Difficulty: | --- | ||||||
Description
guidogam
2010-07-26 10:54:43 UTC
Created attachment 70816 [details]
The second field cannot be imported with European settings
The behavior is caused by the leading space: , "3,14159", If that is changed to ,"3,14159", instead, it works as expected. It is not true that a field containing spaces must be enclosed in quotation marks. Actually the import _does_ follow the standard http://tools.ietf.org/html/rfc4180, which in 2.4 says "Spaces are considered part of a field and should not be ignored.", and hence the field _does not_ start quoted. In the first place, the file content does not follow the standard. I agree that, for convenience, the import could check whether the field content would be quoted without the leading spaces, and treat it as such. However, this is of low priority. And no, the import dialog is not useless, as it is used for any type of text file import: apart from comma-separated files there are also tab-separated, semicolon-separated, fixed-width and other files. For now, the example file can be loaded if you select both comma and space as delimiters, and "Merge delimiters", in the dialog.

Thanks for letting me know about RFC4180. It's weird that the original spec clearly stated that leading and trailing blanks (I should have specified) were going to be trimmed, while the RFC states the opposite. But at least it's a formal spec (although with that RFC name...), and so it's better to stick to it. But your implementation not only does not follow RFC 793 [8], but also violates RFC4180 [5]: "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." It means that the program should have taken everything between the quotes and treated it as a single field. The import dialog for CSV files is useless, since a CSV file is what we are talking about. Other formats that use tabs, semicolons or other field delimiters are not CSV. Thanks for your prompt and informative answer.

@guidogam:
> But your implementation not only does not follow RFC 793 [8],
How should this be related to RFC 793 TCP?
> but also violates RFC4180 [5]: "If fields are not enclosed with double
> quotes, then double quotes may not appear inside the fields." It means
> that the program should have taken everything between the quotes and
> treated it as a single field.
No. It means that this is file content that is not defined by the standard, as the field is not enclosed in double quotes. Field content starts right after the comma, and there is a space. In the first place, it is the generator that didn't follow the standard, not our import implementation. Since we take the entire content between commas as one field and do not detect quoted content in this case, we take all characters up to the next delimiter, including any quotes encountered. We do so because too many implementations write broken files that have quotes in non-quoted field content, and even unescaped quotes in quoted content. This is a long-running discussion and I won't repeat it here. See issue 78926 for details.

In the "Interoperability considerations:" of RFC4180 there is a quotation of RFC793 about being liberal in what you accept. In fact you are more liberal than what the standard says, since you are able to handle the relatively common broken files with single quotation marks. Thanks for your time.

*** Issue 116209 has been marked as a duplicate of this issue. ***

*** Issue 116919 has been marked as a duplicate of this issue. ***

The summary for this issue does not match the content. Having read through the comments, I see why my bug (116919) was marked as a duplicate, but when I originally searched for the problem I missed this issue because the summary doesn't directly mention Calc ignoring text delimiters in the CSV import. I recommend the summary be changed to: "CSV import ignores text delimiters when there's a leading space". Thanks for the prompt response; knowing about the leading space resolves my issue.
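
For readers who want to reproduce the leading-space behavior outside Calc, here is a minimal sketch using Python's standard csv module. It is only an illustration and says nothing about how Calc's import is implemented. The sample line is hypothetical (it is not the content of attachment 70816), and the helper name parse is made up for this example. By default the module, like the import described above, treats a quote that follows a space as ordinary field content; its skipinitialspace option corresponds roughly to the requested enhancement.

```python
import csv
import io

# Hypothetical sample line (NOT the real attachment): the second field is
# quoted, but a space sits between the comma and the opening quote.
SAMPLE = '1, "3,14159",x\n'


def parse(text, skip_leading_space=False):
    """Parse CSV text; optionally drop spaces that directly follow a delimiter."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skip_leading_space)
    return list(reader)


if __name__ == "__main__":
    # The space makes the field start unquoted, so the quotes stay literal
    # and the decimal comma splits the value into two fields.
    print(parse(SAMPLE))
    # -> [['1', ' "3', '14159"', 'x']]

    # Skipping the leading space lets the quote open a quoted field.
    print(parse(SAMPLE, skip_leading_space=True))
    # -> [['1', '3,14159', 'x']]
```

The first result matches the reading of RFC 4180 given in the comments (spaces are part of the field, so the field does not start quoted), while the second shows the more lenient handling that the summary asks for.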