Issue 76874 - Calc imports csv of +1MB only incomplete
Summary: Calc imports csv of +1MB only incomplete
Status: CLOSED DUPLICATE of issue 78926
Alias: None
Product: Calc
Classification: Application
Component: code
Version: OOo 2.2.1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: spreadsheet
QA Contact: issues@sc
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2007-05-02 18:54 UTC by mhatheoo
Modified: 2007-07-11 11:59 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Simple GAWK-script to test CSV validity (383 bytes, text/plain)
2007-05-03 19:38 UTC, discoleo

Description mhatheoo 2007-05-02 18:54:02 UTC
I have a CSV with 1820 lines but quite big fields - the whole file exceeds 1 MB.

Calc terminated the import with "max. number of lines" at only 1700 lines.


martin


In case you want to know which file it was:
http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=ACT_OTH_CLS_DLD&StrNom=CN_2007&StrFormat=CSV&StrLanguageCode=DE&IntKey=17692716&IntLevel=&bExport=
Comment 1 Regina Henschel 2007-05-02 19:50:44 UTC
I can confirm the faulty behavior for OOo2.3.0m210 on WinXP.

It is not issue 75199; there are only 9 columns here. The line break is CR, but
the same error occurs if the line break is CR LF.

The error seems to be in the CSV import, because you get the same problem if
you open the file in Writer, copy it and paste it as "unformatted text", which
brings up the CSV dialog.

You can open the file without problems in Base, with all rows. Then you can copy
the Base table and insert it into Calc; both RTF and HTML will work. Then all
rows are there. Therefore the amount of text is not too much for Calc itself.
Comment 2 discoleo 2007-05-03 19:35:25 UTC
This is probably NOT valid CSV.

The reported ERROR is probably invalid.

For details about the CSV format, see e.g.
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs (actually the
whole document is interesting).

1.) Fields that contain double-quote characters must be surrounded by
    double quotes
    AND
    the embedded double quotes MUST each be represented by a pair of
    consecutive double quotes
  - this csv-file contains lots of fields like the following one:
     ...;"some text"> more text"even more text";...
    those inner quotes are probably invalid

2.) Fields that contain embedded line breaks must be surrounded by double quotes
  - this csv-file contains many lines that contain an ODD number of quotes;
    therefore, the field DOES NOT end on that line, but would extend onto the
    next line until an ending quote is reached!!!
    see e.g. line 184 in the original csv-file:
    - it contains 21 quotes

I will attach a simple gawk script that can be used to detect lines with an
ODD number of quotes; such files are probably INVALID CSV files (IF one
line is supposed to be equivalent to ONE record)!
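
For readers without gawk, the odd-quote check described above can be sketched
in a few lines of Python. This is not the attached script (its contents are not
reproduced in this issue); the function name odd_quote_lines and the sample
lines are made up for illustration only.

```python
def odd_quote_lines(lines, quote='"'):
    """Return the 1-based numbers of lines holding an odd number of quotes.

    Such lines are suspicious only under the assumption that one physical
    line is supposed to be exactly one CSV record.
    """
    return [n for n, line in enumerate(lines, start=1)
            if line.count(quote) % 2 == 1]


# Hypothetical sample data, loosely modelled on the field quoted in this issue.
sample = [
    'a;"b";c',                                # 2 quotes: balanced
    'x;"dtex "> Nm 80" bis Nm 94" (ausg.)"',  # 5 quotes: unbalanced
    'y;plain',                                # no quotes at all
]

suspicious = odd_quote_lines(sample)
```

To check a real file, pass the open file object instead of the sample list,
e.g. odd_quote_lines(open(path, encoding=...)) with whatever encoding the file
actually uses.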
Comment 3 discoleo 2007-05-03 19:38:43 UTC
Created attachment 44846
Simple GAWK-script to test CSV validity
Comment 4 frank 2007-06-27 16:24:20 UTC
Hi,

the problem is in the text delimiters, which make the columns exceed the
possible maximum. So this issue is a duplicate of Issue 75199.

Frank

*** This issue has been marked as a duplicate of 75199 ***
Comment 5 frank 2007-06-27 16:25:15 UTC
Setting the text delimiter to none in the CSV import dialog solves the problem.
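
As a rough analogy outside of Calc, the same idea (no text delimiter at all,
split on ";" only) can be sketched with Python's csv module. This only
illustrates the principle; it says nothing about how the Calc import dialog is
implemented, and the two sample rows are invented.

```python
import csv
import io

# Invented sample: the first row carries stray, unescaped quotes.
raw = 'A;"dtex "> Nm 80" bis Nm 94" (ausg.)"\nB;plain text\n'

# QUOTE_NONE tells the reader to treat quote characters as ordinary data,
# so only the field separator and the line break structure the input.
reader = csv.reader(io.StringIO(raw), delimiter=';', quoting=csv.QUOTE_NONE)
rows = list(reader)
```

Both lines come back as separate two-field records, with the quote characters
kept verbatim inside the second field of the first row.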
Comment 6 mhatheoo 2007-07-07 23:57:14 UTC
@ discoleo

Well, easy to say, but it looks as if you did not really get the problem:
setting the single delimiter to ; should work in any case. The user setting has
to overrule everything, and OO.o must not mix it up with the delimiter ";".


@fst

Sorry, I did not get what you meant. However, Issue 75199 is not a duplicate of
this one.

If - as you suggested - it is a problem related to the internal logic for
reading lines (a delimiter and/or column-count problem), you should try to solve
these two issues first.
But I still regard this issue as a kind of I/O error related to file size/file
handling too, and that should be solved separately.

I intend to reopen this issue, but I would appreciate seeing your response first.

Martin
Comment 7 frank 2007-07-09 08:21:47 UTC
Hi,

this *is* a duplicate of Issue 75199, as it is not the rows that exceed the
limits but the columns. Therefore the message box is not correct in telling you
that the lines are limited.

If the text delimiters are unbalanced, all text after the starting delimiter is
treated as text, turning a field separator into normal text and vice versa. So
the problem is in the file, as it does not conform to the standards for CSV
files.

Therefore this issue will be closed again if you re-open it.

Frank
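
The effect described above can be illustrated with a toy parser in Python. This
is a deliberately simplified sketch, not OOo's import code: the function
split_records and the sample strings are hypothetical, and every quote simply
toggles an "inside text" state.

```python
def split_records(text, sep=';', quote='"'):
    """Toy CSV splitter: a quote toggles text mode; separators and line
    breaks only structure the data while text mode is off."""
    records, fields, buf = [], [], []
    in_text = False
    for ch in text:
        if ch == quote:
            in_text = not in_text          # opening or closing text delimiter
        elif ch == sep and not in_text:
            fields.append(''.join(buf))    # end of field
            buf = []
        elif ch == '\n' and not in_text:
            fields.append(''.join(buf))    # end of field and of record
            buf = []
            records.append(fields)
            fields = []
        else:
            buf.append(ch)                 # ordinary data, incl. sep in text mode
    if buf or fields:                      # flush an unterminated last record
        fields.append(''.join(buf))
        records.append(fields)
    return records


balanced = split_records('1;"a;b"\n2;"c"\n')
unbalanced = split_records('1;"a\n2;b\n3;c\n')
```

With balanced quotes the input yields two records; with a single unclosed quote
the rest of the input, separators and line breaks included, ends up in one
field of one record.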
Comment 8 mhatheoo 2007-07-10 17:29:37 UTC
Hello Frank,

As said before, I am opening this issue again.

Since the text delimiters quote and double quote are treated in OO.o in a rather
proprietary way - and this rigid behavior was introduced only in one of the
latest versions - this issue must be re-opened as a defect, as the behavior
cannot be disabled by the user, which is unwanted domination of the user at
that point.

By the way: did I mention that the original CSV comes from a government source?
I have no intention of teaching them how to deal better with CSV files. I just
want/need to read those files.
And: you should not compare a wrongly built CSV with a correctly built CSV, so
the issue cannot be a duplicate of Issue 75199, even when the result - not read
successfully - looks the same.

Hope you can manage this.

Martin
Comment 9 ooo 2007-07-11 11:59:08 UTC
OOo doesn't treat the quotes in a proprietary way. The data _is_ broken, record
184 contains unescaped quotes in the last field's data:

"Garne, ungezwirnt, aus gekämmten Baumwollfasern, mit einem Anteil an Baumwolle
von >= 85 GHT und mit einem Titer von 106,38 dtex bis < 125 dtex "> Nm 80" bis
Nm 94" (ausg. Nähgarne sowie Garne in Aufmachungen für den Einzelverkauf)"
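
For comparison, a minimal Python sketch of how such a field would look with the
embedded quotes escaped by doubling. The field text is a shortened excerpt of
the record above, and the first column value "184" is only a placeholder; the
csv module is used here purely as a reference for the common CSV quoting
convention, not as a statement about how the source file was produced.

```python
import csv
import io

# Shortened excerpt of the problematic field, with its three inner quotes.
field = 'bis < 125 dtex "> Nm 80" bis Nm 94" (ausg. Nähgarne)'

# Write one record with ';' as field separator; the writer wraps the field
# in quotes and doubles each embedded quote.
out = io.StringIO()
csv.writer(out, delimiter=';', lineterminator='\n').writerow(['184', field])
line = out.getvalue()

# Reading the escaped line back recovers the original field unchanged.
back = next(csv.reader(io.StringIO(line), delimiter=';'))
```

The escaped line then holds an even number of quote characters, so it passes
the odd/even check mentioned earlier in this issue.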



*** This issue has been marked as a duplicate of 78926 ***
Comment 10 ooo 2007-07-11 11:59:50 UTC
Closing dup.