Issue 111579 - Opening large html excel document from SAS
Summary: Opening large html excel document from SAS
Status: CLOSED DUPLICATE of issue 57176
Alias: None
Product: Calc
Classification: Application
Component: open-import
Version: OOO320m13
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: data_loss, oooqa
Depends on:
Blocks:
 
Reported: 2010-05-13 14:38 UTC by gioppo
Modified: 2023-01-04 22:15 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: 4.2.0-dev
Developer Difficulty: ---


Attachments
the file that generate the problem (410.67 KB, application/octet-stream)
2010-05-13 14:39 UTC, gioppo

Description gioppo 2010-05-13 14:38:37 UTC
I've got a big Excel sheet exported from SAS; it is an HTML file using the
xmlns:x="urn:schemas-microsoft-com:office:excel" schema.
It contains around 26900 rows, but at around row 5600 it breaks and the rest of
the rows are merged into a single cell.
Is there a known bug for this?
Comment 1 gioppo 2010-05-13 14:39:56 UTC
Created attachment 69445
the file that generates the problem
Comment 2 Rainer Bielefeld 2010-05-14 07:47:51 UTC
My MS EXCEL-Viewer will not open "dettaglio_rendiconti_dip_fittizio.xls" at all
(without explaining what the problem might be), so I believe "opens with errors"
is not such a bad result.

My "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" hangs when I try to open the sample
document from Explorer (or maybe I was just too impatient).

My Seamonkey 2.0.4 browser opens a renamed
"dettaglio_rendiconti_dip_fittizio.html" without problems.

"Ooo-Dev 3.2.1 multilingual version English UI WIN XP: [OOO320m16 (Build 9497)]"
will open it as an (HTML?) text document with file type "*"; the document looks like
the renamed .html document in OOo and shows the reported error.

Related to Issue 89332?

@gioppo:
Please explain SAS: Scandinavian Airlines? Société par Actions Simplifiée? ;-)
In which OOo component did you see the document: really CALC, or WRITER?
Comment 3 gioppo 2010-05-14 08:21:08 UTC
SAS is a business analytics and business intelligence tool.
The file opens correctly with Excel 2003.
It is produced by a SAS reporting web application (it is an "Excel export").
This is a problem, since big vendors usually do not ship an OOo export option,
only an Excel one.
The program used was Calc, since it was an Excel export.
I tried renaming it to .html and opening it with Writer: it takes A LOT of time to
open (have a couple of coffees and a chat). It opens in Writer/Web but is
SSSLLOOOWWWW, painfully slow, and shows the same error: at some point the table
breaks and everything ends up in the same paragraph.
I believe the relation to issue 89332 could be correct.
It seems that on large documents the XML/HTML parser goes mad.
I understand that the formatting is the usual M$ mess, but it is just a big table,
something a browser and any HTML editor digest with no problem.
Could there be a Java out-of-memory problem involved?
Comment 4 Rainer Bielefeld 2010-05-14 08:21:59 UTC
I forgot to mention: " ... open sample document from explorer" BY DOUBLE CLICK.

I tried EXCEL VIEWER 8.0 and 12.0

Result with "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" and right click "open
with SCALC" from WIN EXPLORER: OOo hung (I stopped the attempt after 15
minutes).
Comment 5 gioppo 2010-05-14 08:33:15 UTC
Some new info.
Firefox also takes a bit of time to open the file.
Also, if I open the file with Excel (no problem and really quick) and save it in
Excel format instead of HTML, Calc opens the file without any trouble.
So the problem is somewhere in the parsing of the file (HTML/XML).
Comment 6 gioppo 2010-05-14 08:34:35 UTC
You have to wait more than 15 minutes; just take a lunch ;-) and have faith.
Comment 7 gioppo 2010-05-14 08:37:31 UTC
Also, if you leave the extension as xls, Calc proposes an import dialog much like the CSV
one ... no good.
If the file is renamed to xlsx, it opens without any problem.
OOo version used: 3.2m13 from go-oo; I will also check on the Sun (oops, Oracle) build.
Comment 8 Regina Henschel 2010-05-14 09:51:59 UTC
The file is not in xls format. Excel detects this and selects an import filter
for you. Because OOo is not only a spreadsheet application like Excel but a
suite with other modules, it cannot know which module you want, so it picks the
module that best fits the content. Your file starts with <html>, so
Writer/Web is used.
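
The content-based detection described here can be illustrated with a minimal Python sketch. It is not OOo's actual type-detection code; the function name `sniff_spreadsheet_container` and the returned labels are hypothetical. It only shows why a file named .xls that starts with <html> is treated as a web page: the leading bytes, not the extension, decide.

```python
# Illustrative only: a tiny content sniffer, not OOo's real filter detection.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # header of binary .xls (OLE2) files
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet_container(head: bytes) -> str:
    """Classify a file by its first bytes, ignoring the file extension."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"  # a genuine binary Excel workbook
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    text = head.lstrip().lower()
    if text.startswith((b"<html", b"<!doctype html")):
        return "html"  # HTML content, whatever the extension says
    return "unknown"


# A SAS-style "Excel export" begins with an <html> element:
print(sniff_spreadsheet_container(
    b'<html xmlns:x="urn:schemas-microsoft-com:office:excel">'
))
```

In practice one would read only the first few hundred bytes of the file and pass them to the function.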
 
To open such files in Calc, you have to select the suitable filter yourself.
Start OOo and open the file from inside OOo. First select the file and then
select the file type. You need the type "Web Page Query (OpenOffice.org Calc)
(*.htm;*.html)". The file will open at normal speed; at least it does in my
DEV300m77.

The dialog box that then appears helps ensure that numbers and dates are imported
correctly. This is needed because, for example, in German 1.000 means one thousand
and in US English only one; or in US English 2/3/2010 means 3 February and in GB
English it means 2 March.
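
The ambiguity can be reproduced with a short Python sketch. The helpers `parse_number` and `parse_date` are hypothetical and only mimic what a locale choice in an import dialog has to decide; they are not how Calc implements it.

```python
from datetime import date, datetime


def parse_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Parse a number given explicit decimal and thousands separators."""
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


def parse_date(text: str, day_first: bool) -> date:
    """Parse a slash-separated date as day/month/year or month/day/year."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


# Same text, different locale conventions:
print(parse_number("1.000", decimal_sep=",", group_sep="."))  # German reading
print(parse_number("1.000", decimal_sep=".", group_sep=","))  # US English reading
print(parse_date("2/3/2010", day_first=False))  # US English reading
print(parse_date("2/3/2010", day_first=True))   # GB English reading
```

The same cell text yields different values depending on the chosen language, which is why the importer has to ask.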

Issue 89332 is about the feature that OOo will automatically open files in Calc
that are HTML in content but have xls as extension. But the solution is
not integrated yet.

*** This issue has been marked as a duplicate of 89332 ***
Comment 9 gioppo 2010-05-14 13:32:29 UTC
Mind that the problem IS NOT in the opening of the file, but in the visualization.
It breaks at row 6000 or so.
This is the real problem, making Calc open it is just a detail.
Comment 10 Regina Henschel 2010-05-14 14:27:59 UTC
OK, I see. It opens at normal speed using the Web Page Query filter, but reads
only about 6712 lines of the original 26902 lines.
Comment 11 Regina Henschel 2010-05-14 14:30:34 UTC
The file opens with all lines in Gnumeric too. The error remains for me on
WinXP even if I replace the Unix line endings with DOS line endings.
Comment 12 damjan 2023-01-03 11:31:08 UTC
This is the infamous 16-bit paragraph limit from bug 57176, which only allows importing the first 65534 cells. Resolving duplicate.

Thank you for your bug report and sample file!

A patch to fix this is available, and a release with it will hopefully be out soon.

*** This issue has been marked as a duplicate of issue 57176 ***
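
To check whether a given HTML export would run into the 65534-cell figure quoted in comment 12, a rough Python sketch like the following can count table cells and report the row in which the limit is first exceeded. The names `CellCounter` and `count_cells` are hypothetical, and the assumption that the importer counts one unit per <td>/<th> element is mine, not something confirmed in this issue.

```python
from html.parser import HTMLParser

CELL_LIMIT = 65534  # figure quoted in comment 12


class CellCounter(HTMLParser):
    """Count <tr> rows and <td>/<th> cells in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0
        self.cells = 0
        self.limit_row = None  # row in which CELL_LIMIT is first exceeded

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1
        elif tag in ("td", "th"):
            self.cells += 1
            if self.cells > CELL_LIMIT and self.limit_row is None:
                self.limit_row = self.rows


def count_cells(html_text: str):
    """Return (rows, cells, row where the limit is first exceeded or None)."""
    counter = CellCounter()
    counter.feed(html_text)
    counter.close()
    return counter.rows, counter.cells, counter.limit_row


# Synthetic example: 7000 rows of 10 cells each.
row = "<tr>" + "<td>x</td>" * 10 + "</tr>"
sample = "<html><body><table>" + row * 7000 + "</table></body></html>"
print(count_cells(sample))
```

For a real file, read it as text and pass the contents to `count_cells`; cells in nested tables are counted as well, so treat the result as an estimate.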