Apache OpenOffice (AOO) Bugzilla – Issue 76874
Calc imports a CSV file larger than 1 MB only incompletely
Last modified: 2007-07-11 11:59:50 UTC
I have a CSV file with 1820 lines but quite large fields; the whole file exceeds 1 MB. Calc terminated the import with "max. number of lines" after only 1700 lines. Martin. In case you want to see what it was: http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=ACT_OTH_CLS_DLD&StrNom=CN_2007&StrFormat=CSV&StrLanguageCode=DE&IntKey=17692716&IntLevel=&bExport=
I can confirm the faulty behavior for OOo2.3.0m210 on WinXP. It is not issue 75199; there are only 9 columns here. The line break is CR, but the same error occurs if the line break is CR LF. The error seems to be in the CSV import, because you get the same problem if you open the file in Writer, copy it and paste it as "unformatted text", which brings up the CSV dialog. You can open the file without problems in Base, with all rows. You can then copy the Base table and paste it into Calc; both RTF and HTML work, and all rows are there. Therefore the amount of text is not too much for Calc itself.
This is probably NOT valid CSV, so the bug report is probably invalid. For details about the CSV format, see e.g. http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs (actually the whole document is interesting).
1.) Fields that contain double-quote characters must be surrounded by double quotes, AND each embedded double quote MUST be represented by a pair of consecutive double quotes. This CSV file contains lots of fields like the following one: ...;"some text"> more text"even more text";... Those inner quotes are probably invalid.
2.) Fields that contain embedded line breaks must be surrounded by double quotes. This CSV file contains many lines with an ODD number of quotes, so the field does NOT end on that line but extends onto the next line until a closing quote is reached! See e.g. line 184 in the original CSV file: it contains 21 quotes.
I will attach a simple gawk script that detects lines with an ODD number of quotes; such files are probably INVALID CSV files (IF one line is supposed to be equivalent to ONE record)!
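The attached gawk script is not reproduced here. The same odd-quote check can be sketched in Python, assuming one physical line is meant to be one record. The function names `odd_quote_lines` and `check_file` are hypothetical, and decoding as Latin-1 is an assumption (the encoding of the export is not stated); it never fails on arbitrary bytes, and `"` is the same byte in ASCII-compatible encodings.

```python
def odd_quote_lines(lines, quote='"'):
    """Return the 1-based numbers of lines containing an odd number of quote characters."""
    return [n for n, line in enumerate(lines, start=1) if line.count(quote) % 2 == 1]


def check_file(path):
    # Latin-1 is an assumption: it decodes any byte sequence without errors.
    with open(path, encoding="latin-1", newline="") as f:
        # splitlines() treats CR, LF and CR LF line breaks alike.
        return odd_quote_lines(f.read().splitlines())
```

Calling `check_file(...)` on the downloaded file lists the line numbers to inspect; line 184, with its 21 quotes, would be among them. Note that an even count does not prove a line is valid: `"some text"> more text"even more text"` has four quotes but still contains unescaped inner quotes.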
Created attachment 44846 [details] Simple GAWK-script to test CSV validity
Hi, the problem is in the text delimiters, which make the number of columns exceed the limit. So this issue is a duplicate of issue 75199. Frank
*** This issue has been marked as a duplicate of 75199 ***
Setting the text delimiter to none in the CSV import dialog solves the problem.
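As an analogy only (this is Python's `csv` module, not the Calc import filter), disabling the text delimiter corresponds to `quoting=csv.QUOTE_NONE`: quote characters become ordinary text, so each physical line is split on `;` alone. The helper name `split_records` is hypothetical.

```python
import csv
import io


def split_records(text, delimiter=";"):
    """Split each physical line on the delimiter; quote characters are kept as plain text."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)  # no special handling of '"'
    return list(reader)
```

The trade-off: a field that legitimately contains `;` inside quotes would now be split into extra columns, and the quotes remain part of the cell contents.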
@discoleo Well, that is easy to do, but it looks as if you did not really get the problem: setting the single delimiter to ";" should work in any case. The user setting has to overrule everything, and OOo must not mix it up with the delimiter ";".
@fst Sorry, I did not get what you meant. However, issue 75199 is not a duplicate of this one. If, as you suggested, it is a problem in the internal logic for reading lines (a delimiter and/or column-count problem), you should try to solve these two issues first. But I still treat this issue as a kind of I/O error related to file size/file handling as well, and that should be solved separately. I intend to reopen this issue, but I would appreciate seeing your responses first. Martin
Hi, this *is* a duplicate of issue 75199, because it is not the rows that exceed the limit but the columns. The message box is therefore wrong when it tells you that the number of lines is limited. If the text delimiters are unbalanced, all text after an opening delimiter is treated as text, turning field separators into normal text and vice versa. So the problem is in the file, as it does not conform to the standards for CSV files. Therefore this issue will be closed again if you reopen it. Frank
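The effect of an unbalanced text delimiter can be reproduced with a standard-conforming parser. This sketch uses Python's `csv` module as a stand-in (it is not OOo's import code, and its details differ): a lone opening quote makes the following separators and line breaks part of one field until the next quote appears. The helper name `parse` is hypothetical.

```python
import csv
import io


def parse(text, delimiter=";"):
    """Parse text with default CSV quoting ('"' opens and closes a text field)."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Three physical lines, but the quote opened on line 1 only closes on line 3,
# so the separators and line breaks in between become field content.
records = parse('1;"a;b\n2;c;d\n3;e";f\n')
```

Here `records` holds a single record whose second field spans all three physical lines, which is why line and column counts reported by an importer no longer match what a human sees in the file.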
Hello Frank, as said before, I am reopening this issue, since the text delimiters quote and double quote are treated by OOo in a rather proprietary way, and this rigid behavior was introduced only in one of the latest versions. This issue must be reopened as a defect, because the behavior cannot be disabled by the user, which at that point is an unwanted imposition on the user. By the way: did I mention that the original CSV file comes from a government source? I have no intention of teaching them how to handle CSV files better; I just want/need to read those files. And: you should not compare a badly built CSV with a correctly built CSV, so this issue cannot be a duplicate of issue 75199, even if the result (the file is not read successfully) looks the same. Hope you can manage this. Martin
OOo doesn't treat the quotes in a proprietary way. The data _is_ broken: record 184 contains unescaped quotes in the last field's data:
"Garne, ungezwirnt, aus gekämmten Baumwollfasern, mit einem Anteil an Baumwolle von >= 85 GHT und mit einem Titer von 106,38 dtex bis < 125 dtex "> Nm 80" bis Nm 94" (ausg. Nähgarne sowie Garne in Aufmachungen für den Einzelverkauf)"
*** This issue has been marked as a duplicate of 78926 ***
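For comparison, a correctly escaped version of such a field doubles every embedded quote. This sketch lets Python's `csv.writer` do the escaping on a fragment of record 184; the helper name `to_csv_line` and the `;` delimiter are assumptions for illustration, not part of the Eurostat export format.

```python
import csv
import io


def to_csv_line(fields, delimiter=";"):
    """Serialize one record; fields containing '"' are quoted and inner quotes doubled."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()


fragment = '125 dtex "> Nm 80" bis Nm 94"'
line = to_csv_line([fragment])
```

`line` becomes `"125 dtex ""> Nm 80"" bis Nm 94"""` followed by a newline, and reading it back with `csv.reader` recovers the original fragment unchanged. The exported file instead contains the single, undoubled quotes.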
Closing dup.