Issue 87672 - autocorrect limit. acor.dat with entry 65535: Loop and/or loss of acor data
Summary: autocorrect limit. acor.dat with entry 65535: Loop and/or loss of acor data
Status: ACCEPTED
Alias: None
Product: General
Classification: Code
Component: code
Version: OOo 2.4.1
Hardware: All All
Importance: P2 Trivial with 5 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL: http://www.oooforum.org/forum/viewtop...
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-31 20:28 UTC by tommy27
Modified: 2017-05-20 10:47 UTC
CC List: 7 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
the .zip contains the acor_it-IT.dat and documentlist.xml files that crash in my OOo 2.4 (703.47 KB, application/x-compressed)
2008-03-31 20:34 UTC, tommy27

Description tommy27 2008-03-31 20:28:27 UTC
unfortunately it seems that the number of autocorrect entries is not unlimited and 
a limit exists. 

i was able to calculate the exact number of entries inside my acor_it-IT.dat 
file as follows: 
- open the .dat file with 7-Zip ( http://www.7-zip.org/ ) 
- extract the documentlist.xml and create a backup copy 
- open that documentlist.xml with OpenOffice Writer 
- do a series of search-and-replace-with-nothing passes to strip away the XML code and 
leave just a list of abbreviations and replacements 
- at the end of the operation an OOo window will tell you the total number of 
search & replace operations done, which corresponds to the total number of autocorrect entries
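
The manual count above could probably be automated. The following is only a minimal sketch in Python: it assumes the .dat file is a ZIP package (it opens with 7-Zip), that it holds a member named documentlist.xml (matched case-insensitively), and that every autocorrect entry is one XML element whose local name is "block". The element name and the function name are assumptions, not something confirmed in this report.

```python
import zipfile
import xml.etree.ElementTree as ET


def count_autocorrect_entries(dat_path):
    """Count replacement entries in an autocorrect .dat package.

    Assumptions (not confirmed by the report): the package is a ZIP
    archive, the list lives in a member called documentlist.xml, and
    each entry is an element whose local name is "block".
    """
    with zipfile.ZipFile(dat_path) as package:
        # Match the member name case-insensitively.
        member = next(
            name for name in package.namelist()
            if name.lower() == "documentlist.xml"
        )
        with package.open(member) as xml_file:
            root = ET.parse(xml_file).getroot()
    # Drop the "{namespace}" prefix so the count does not depend on
    # the exact namespace URI used in the file.
    return sum(
        1 for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "block"
    )
```

Calling `count_autocorrect_entries("acor_it-IT.dat")` on a backup copy would return the entry count without touching the file.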

i have 65213 entries. 
i believe this corresponds to the upper limit OOo can handle (both with version 2.3 
and 2.4). 

if you add another autocorrect entry by misspelling a word and selecting the 
suggested autocorrect entry from the right-click menu, something weird happens. 

the acor_it-IT.dat "implodes"... you lose all the previous 65213 corrections 
and you are left with only the last one you inserted (i had a backup copy so i 
did not lose it). 
the .dat file, which is 399 KB with 65213 entries, is then replaced by a new file 
of 57 KB. 

it seems to me that something prevents OOo from handling more than 65213 entries per 
language. 
adding new corrections to my acor_en-US.dat file (which has just a few entries) 
has no effect on the acor_it-IT.dat file. 

so, i'm gonna ask you for some help: 
- what is the cause of the acor.dat file crash?
- can you come up with any alternative method to keep adding new autocorrect 
entries to my italian language file without losing the previously inserted 
ones? (i know dictionaries have an upper limit of 2000 words, after which you have to 
start a new dictionary; can the same be done for autocorrect files?) 

i uploaded here ( http://www.sendspace.com/file/nlpcqj ) my acor_it-IT.dat and 
documentlist.xml files if you wanna use them for testing. 

thank you for your help. 

----------------- 

Related topics: 

http://www.oooforum.org/forum/viewtopic.phtml?t=66128&start=15 

http://user.services.openoffice.org/en/forum/viewtopic.php?f=7&t=3156&sid=82bd0184693e60dae5586893d33ec089

http://www.openoffice.org/servlets/ReadMsg?list=users&msgNo=176711
Comment 1 tommy27 2008-03-31 20:34:05 UTC
Created attachment 52416
the .zip contains the acor_it-IT.dat and documentlist.xml files that crash in my OOo 2.4
Comment 2 michael.ruess 2008-04-01 07:27:11 UTC
Reassigned to SBA.
Comment 3 tommy27 2008-04-11 21:05:45 UTC
i found that MS Word officially has no fixed limit on autocorrect entries:

"Maximum number of AutoCorrect entries. Limited to memory/hard disk space"

Source: http://support.microsoft.com/kb/109296

it would be great to see something like that for OOo as well.
Comment 4 tommy27 2008-04-18 06:19:46 UTC
user Jim Plante (and other users as well) suggested this as an explanation for 
the autocorrect entries limit and crash.

http://www.oooforum.org/forum/viewtopic.phtml?p=282038#282038

"Just to add some information, 64K bytes is actually 65,535, not 64,000. This 
is just a wild guess, but I would surmise that the autocorrect entries are held 
in an array. Arrays are indexed with integers. The biggest integer allowable is 
65,535 (as a regular integer). So I suspect that in trying to access that extra 
entry, you're running the array out of range. Without digging into the source 
code (which I am NOT going to do), we can't tell."

the only thing that doesn't fit with this theory is that the limit seems to be 65,213 
and not 65,535.

could you confirm or rule out this hypothesis? thanks.



Comment 5 tommy27 2008-06-10 20:07:13 UTC
i still haven't found an explanation for why OOo crashes and erases the 
database at entry 65213.

however i found a workaround to keep adding new entries without losing the old 
database. the trick is to start a new autocorrect database which works for any 
language. 

let's see it step by step:

1- opened the autocorrect entries table, clicked on the first tab, which is 
"Language", and selected "All languages", which is at the top of the list. A new 
acor_.dat file was created.

2- closed OpenOffice

3- swapped the filenames of the 2 acor.dat files:
acor_it-IT.dat (the saturated "65213 entries" file) is renamed to 
acor_.dat (the name of the new empty one) and vice versa.
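
The rename in step 3 can be sketched as a small Python helper. This is just an illustration: the function name and the ".swap" scratch suffix are made up for the example, and it should only be run while OpenOffice is closed and after making a backup of both files.

```python
import os


def swap_files(path_a, path_b):
    """Exchange the names of two existing files.

    Uses a temporary third name so that neither file is overwritten.
    The ".swap" suffix is a hypothetical scratch name.
    """
    temporary = path_a + ".swap"
    if os.path.exists(temporary):
        raise FileExistsError(temporary)
    os.rename(path_a, temporary)   # a -> scratch
    os.rename(path_b, path_a)      # b -> a
    os.rename(temporary, path_b)   # scratch (old a) -> b
```

For the case described here the call would be `swap_files("acor_it-IT.dat", "acor_.dat")`, run from inside the autocorr folder of the user profile.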


this means that the old autocorrect entries are stored and untouched, while the 
new database is brand new and can be opened with Ctrl+H in a few seconds 
without waiting many minutes (remember, the more entries you have, the slower 
the autocorrect replacement table opens).

as i said this is a workaround, not a definitive solution.
once the 65213 entries limit is reached in the new database i'll be in 
trouble again.

i took 2 years to fill the first database.
i hope the OOo team will find a definitive solution before that time.


Comment 6 tommy27 2008-06-30 08:41:09 UTC
the "acor.dat file crash with entry 65214" that is reported in OOo 2.4.0 is 
present also in OOo 2.4.1.
Comment 7 tommy27 2008-08-25 17:40:58 UTC
just a few questions:

1- has any research been done into the cause of the "autocorrect limit crash 
bug"?

2- does the "P2" priority level mean that this issue will be addressed in OOo 
3.0?

3- why is the status still "UNCONFIRMED"? did you run some tests with the 
files i uploaded?

thanks for your attention, hoping for a reply.
Comment 8 stefan.baltzer 2008-10-27 12:55:26 UTC
SBA: My findings:
(1) Have the (obviously already too big) attached acor file in the
/.../user/../autocorr path
(2) Launch Office (Writer)
(3) Tools-AutoCorrect
(4) On tab "replace", select language "Italian"
-> Loop in OOo 2.4.1 and OOo 3.0

Variation: 
(1) Have the attached acor file in the /.../share/../autocorr path
(2) Launch Office (Writer), AutoSpellCheck ON
(3) Set text language to Italian
(4) In a misspelled word, call context -> AutoCorr... Add
-> The word in the text gets changed.
Office does NOT crash
-> Look at tab "Replace", the list is EMPTY
-> Look in file system: Acor file has shrunk to ~59kB (Data Loss)

Confirming issue. Adjusting summary to reflect the findings.
SBA->OS: Please proceed, thank you.
Comment 9 stefan.baltzer 2008-10-27 15:34:10 UTC
Reassigned. 
Comment 10 tommy27 2008-10-27 21:24:43 UTC
i hope you guys will find the reason for (and hopefully the solution to) this bug.

i still have room for another 40000 autocorrections in my new (swapped 
filename) acor_it-IT.dat but i'm going to saturate it too one day.
Comment 11 Oliver Specht 2008-10-28 12:56:44 UTC
The cause of the problem is that data structures based on 16-bit data
are used. This naturally produces problems if there are more than 65535 entries
in the list. 
Subject changed (now is 65535), component changed to framework, platform+os
changed to All
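
As a generic illustration of what happens to a 16-bit unsigned value at this boundary, here is a small Python sketch. It is not OOo's actual code, the function name is invented for the example, and the report does not state that this exact wrap-around is the mechanism behind the data loss.

```python
UINT16_MAX = 0xFFFF  # 65535, the largest value 16 unsigned bits can hold


def increment_uint16(count):
    """Return count + 1 as an unsigned 16-bit variable would store it."""
    # Masking to 16 bits mimics the overflow of a fixed-width integer.
    return (count + 1) & UINT16_MAX


print(increment_uint16(65534))  # 65535
print(increment_uint16(65535))  # 0
```

A counter or index that silently restarts at 0 would at least be consistent with the findings above, where the list appears empty after one more entry is added.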
Comment 12 tommy27 2008-10-28 14:38:30 UTC
65535 ?

my OOo crashed with entry number 65214 but i suppose the method i used to 
calculate the autocorrect entries number (see my first post) was not 100% 
accurate.

anyway, now i hope there will be a way to solve this.
with time other people will saturate their acor.dat files and the loss of data would 
be very annoying
Comment 13 tommy27 2008-11-01 11:01:24 UTC
@os

you said: "data structures based on 16 bit data
are used. This naturally produces problems if there are more than 65535 entries
in the list".

i have a suggestion: OOo should be able to handle more than 1 acor.dat file for 
the same language.

currently if you saturate the acor_it-IT.dat you cannot enter more italian 
autocorrections. the only way to do it is to use the workaround i described 
before, using a universal acor_.dat for all languages.

however once you have saturated that file you are again at the starting point.

my suggestion is that OOo should be able to handle more acor.dat files for the 
same language.

i.e. acor_it-IT1.dat , acor_it-IT3.dat , acor_it-IT4.dat etc. etc.
each one can contain 65535 entries

this is just like with dictionaries in which you have a 2000 word limit. once 
you fill the first standard.dic you cannot enter new words inside it but you 
can start a standard2.dic or a standard3.dic.

autocorrect should work the same way.
once you reach the 65535 limit a warning should alert that the file is full 
(data loss crash should be prevented) and a new acor.dat file with proper 
language should be started.
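
A rough sketch of this idea in Python, with plain dictionaries standing in for the individual acor files. The class and method names are hypothetical and nothing here reflects how OOo stores its tables; it only shows the "start a new table when the current one is full" behaviour.

```python
MAX_ENTRIES = 65535  # per-file cap discussed in this issue


class ShardedAutocorrect:
    """Several capped replacement tables for one language (hypothetical)."""

    def __init__(self, max_entries=MAX_ENTRIES):
        self.max_entries = max_entries
        self.tables = [{}]  # each dict stands in for one acor file

    def add(self, wrong, right):
        # Update in place if the misspelling is already known.
        for table in self.tables:
            if wrong in table:
                table[wrong] = right
                return
        # Start a new table instead of overflowing the current one.
        if len(self.tables[-1]) >= self.max_entries:
            self.tables.append({})
        self.tables[-1][wrong] = right

    def lookup(self, wrong):
        # Consult every table, oldest first.
        for table in self.tables:
            if wrong in table:
                return table[wrong]
        return None
```

With a small cap such as `ShardedAutocorrect(max_entries=2)`, the third distinct entry lands in a second table while the first two stay untouched.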

do you think this is possible?


Comment 14 Oliver Specht 2008-11-01 11:45:13 UTC
->tommy27: We should fix it. Integrating a workaround takes the same amount of
work as fixing it. 
The difference to dictionaries is that there the size has direct influence on
the performance as the spell checking searches in these dictionaries while
spelling. The auto correction file is only checked at the time you write a new
word. So in this case size doesn't matter that much.
Comment 15 tommy27 2008-11-01 12:11:18 UTC
i understand, however i noticed that the performance of manual insertion of 
corrections in the autocorrect replacement table with the Ctrl+H hotkey is 
affected by the acor.dat file size.

if you have few entries in it the replacement table opens in a few seconds, but 
the more entries you put inside it, the slower it opens.

with my previous acor.dat file it took "15 minutes" to access the replacement 
table after pressing Ctrl+H. that was due to its enormous number of entries.

i was thinking that using more acor.dat files as i suggested could also solve 
this issue.

if you do remove the 65535 limit, having more entries in a single acor.dat 
file (e.g. 75000, 90000, etc.) will make the process even slower.
Comment 16 tommy27 2008-12-08 22:30:48 UTC
to os.

i respectfully ask you again to consider the "handle more acor.dat files for  
each language" solution.

i.e. acor_it-IT1.dat , acor_it-IT3.dat , acor_it-IT4.dat etc. etc.

there are people who are submitting their own autocorrect entry databases as 
extensions (e.g. http://extensions.services.openoffice.org/project/AutocorrectRomanian ).

if you want to use one you have to overwrite your pre-existing 
acor_yourlanguage.dat file

handling more than one .dat file for each language (just like dictionaries)
would have several advantages:

- no overwriting of pre-existing "acor_yourlanguage.dat file" allowing sharing 
of .dat files among users, just like dictionaries

- working around the 65535 limit (you should however prevent the data loss, and 
not let users add more entries when it's full... just like the 2000 limit in 
dictionaries)

- faster loading of Ctrl+H replacement table (as i said with 65535 is terribly 
slow, allowing more entries in the same file would make it even slower. it 
would be much better to start with an empty and faster one once the first one 
is full, bloated and slow as a turtle)

these are just my 3 cents.
i'd like to know what you think about it.

Comment 17 tommy27 2009-01-22 17:36:35 UTC
has any progress been made on this issue in the meantime?

i suggested it as a 3.1 blocker but i think it didn't make the cut before the 
code freeze.

what about 3.2? do you think there is a chance to see it fixed for that release?

please understand that my questions are only driven by curiosity... i'm not complaining 
at all about my issue not being fixed yet.

i'm just wondering what kind of strategy you are thinking of using to prevent the 
data crash and overcome the 16-bit limitation
Comment 18 tommy27 2009-02-26 17:01:50 UTC
i counted again my autocorrect entries...

from june 2008 to february 2009 i collected another 26600 items...

with this pace i think i'll reach the 65535 limit at the end of 2009...

do you think that OOo 3.2 (which is planned for september 2009) is going to fix 
this issue?
Comment 19 tommy27 2009-03-30 14:36:38 UTC
new update... my count is now 31667 which means that almost 50% of my database 
is already full...

i was thinking that a workaround could be to enable sharing of autocorrect 
entries among “same language subgroups”...

OOo has indeed separate replacement tables for UK English, USA English, AUS 
English...
it has also “Italian (Italy)” and “Italian (Swiss)”...

however currently they are not shared... if you add an entry to the UK database 
it won't correct that mistake when you are writing a US English document.

do you think it is possible to make them shared, or at least to create a new common 
replacement table for all of the “same language subgroups”?

something like: 

acor_it-ALL.dat   working on both Italian (Italy) and Italian (Swiss)
acor_en-ALL.dat   working on the UK, US, AUS etc. English variants 

that would mean another 65535 entries available once the original 
acor_it-IT.dat file is completely full.
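
To make the proposal concrete, here is a hedged Python sketch of the lookup order it implies. The "-ALL" file name is the one proposed in this comment and does not exist in OOo, acor_.dat is the "All languages" file from Comment 5, and the function name is invented for the example.

```python
def candidate_tables(locale):
    """List replacement-table file names to consult, most specific first."""
    language = locale.split("-")[0]
    return [
        f"acor_{locale}.dat",        # exact locale, e.g. acor_it-CH.dat
        f"acor_{language}-ALL.dat",  # proposed shared table (hypothetical)
        "acor_.dat",                 # "All languages" table from Comment 5
    ]


print(candidate_tables("it-CH"))
# ['acor_it-CH.dat', 'acor_it-ALL.dat', 'acor_.dat']
```

An entry stored in the shared table would then be found for both it-IT and it-CH documents, because both locales fall back to the same second file.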


i opened a topic on OOo Forum about it:

http://www.oooforum.org/forum/viewtopic.phtml?t=81211&highlight=&sid=af038b278131c0743dbca648347f05ec

hope somebody will have an idea about it.
Comment 20 tommy27 2009-04-07 20:10:33 UTC
excuse me, what's the meaning of Status whiteboard: RM-S4?
Comment 21 tommy27 2009-04-20 15:18:10 UTC
in the meantime i reached 33960 entries in my acor.dat file

that means that it's half full now.

i also opened a new issue 101224 asking for the "common replacement table for 
similar language groups" as a new feature.
Comment 22 tommy27 2009-05-07 08:10:57 UTC
i'm curious about the status of this issue.

1- did you find a way to try to fix it?

2- what's the meaning of Status whiteboard RM-S4?

3- is there any chance to see the issue fixed in OOo 3.1.1 or at least 3.2?

thank you
Comment 23 Oliver Specht 2009-05-07 09:02:46 UTC
1- did you find a way to try to fix it?
No, I didn't. This issue is not the one with the highest priority.
The way you are using this feature is not common practice, as your list is
more or less a mechanism to correct the spelling of lots of words with lots of
alternatives for each of them.
2- what's the meaning of Status whiteboard RM-S4?
mh added this status (now cc'ed)

3- is there any chance to see the issue fixed in OOo 3.1.1 or at least 3.2?
Not a real chance. Extending the number of entries in the list is not that
complicated. But the user interface is not even capable of handling the currently
possible number of entries. Opening the dialog with 64 K entries takes minutes
to fill the ListBox. 
Comment 24 tommy27 2009-05-07 17:24:22 UTC
Thanks for the feedback... however i respectfully disagree with some of the things 
you said:

a- issue priority
maybe is not the highest, but it's still a P2 issue which is higher than many 
P3 and P4 issues i see round here.

b- the way i use autocorrect
well, i think my use of autocorrect is not that strange. 
The whole database is a list of the many typing errors i made in my daily work.

I write medical reports all the day and i have to write them fast... that's why 
i have collected so many mistakes and i entered in the replacement table to 
avoid future errors.

With my massive everyday use of Writer i saturated the OOo acor.dat file with 
65534 entries in 2 years.  You should expect that other users will reach that 
limit one day... maybe they will take longer than me but day after day they 
will get at that point.

I think the problem is that:

1- autocorrect entries number is limited

entries are not infinite unlike in MS Word (as far as i know - read previous 
posts) and this is a weak point of OOo.

2- reaching the limit causes data loss

as i reported, once you add entry number 65535 the acor.dat file crashes and 
the whole database is permanently erased... this is really bad... 
i was lucky i had backed it up before so i did not lose my precious 
autocorrect databases... i would have been very pissed off if that had ever happened

POSSIBLE SOLUTIONS

i think you should find a way to address at first the data loss issue and, if 
possible, also the entry limit issue.

Solution A: lock the acor.dat file once entry 65534 is added.

you should prevent users from adding entry 65535, which causes the crash and data loss.
I remember that OOo user custom dictionaries had a 2000 word limit (which has 
now been raised to 30000); once you reached that limit and tried to add a new 
word, you were alerted with an error message telling you “dictionary is full”.

Solution B: let acor.dat file store more than 65K entries.
I know that would make OOo replacement table open very slowly (even 15 minutes 
with the actual 65K limit)

Solution C (which would be my favourite one and which i suggested in earlier 
posts): allow creation of additional acor.dat files. 
once an acor.dat file is full, the user should be allowed to start a new 
replacement table database just like with dictionaries. Old autocorrections 
would still be active and new ones would be added to the brand new acor.dat 
file, which would also load faster since it is almost empty.

Issue 101224, which i opened a few days ago, could be a kind of workaround for this.
Comment 25 tommy27 2010-01-22 07:23:55 UTC
hi guys.
this morning while i was shaving i came out with this idea.

what about splitting each language acor.dat file, into subfiles according to 
first letter?

let me try to explain:

- currently the acor_en-US.dat contains a list of all spelling errors from A to 
Z and has an upper limit of 65535 entries

- would it be possible to do something like:
acor_en-US_ABCD.dat   --> aople --> apple  
acor_en-US_EFGH.dat   --> eplehant --> elephant
acor_en-US_IJKL.dat   --> juce --> juice
.....
.....
acor_en-US_WXYZ.dat

that would split the database into smaller ones according to the first letter 
of the spelling error.

each one of these smaller databases would still have the 65535 entry limit, but 
this would dramatically increase the number of available autocorrect entries per 
language: the user could store 65K entries in the acor_en-US_ABCD.dat, another 65K 
entries in the acor_en-US_EFGH.dat etc. etc.

moreover the loading of the replacement table would be easier and faster if 
OOo doesn't have to load all the entries from A to Z but only a limited set (A 
to D or E to H etc. etc.)
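
The mapping from a misspelling to its sub-file could look roughly like this in Python. It is only a sketch of the proposal: the post elides the groups between IJKL and WXYZ, so the middle groups below are a guess, and the "OTHER" bucket for words that do not start with A-Z, like the function name, is an assumption added for the example.

```python
# Letter groups; only ABCD, EFGH, IJKL and WXYZ come from the post,
# the two middle groups are guessed.
GROUPS = ("ABCD", "EFGH", "IJKL", "MNOP", "QRSTUV", "WXYZ")


def bucket_file(misspelling, locale="en-US"):
    """Return the sub-file that would hold the given misspelling."""
    first = misspelling[:1].upper()
    for group in GROUPS:
        if len(first) == 1 and first in group:
            return f"acor_{locale}_{group}.dat"
    # Digits, symbols and non A-Z letters go to a catch-all file.
    return f"acor_{locale}_OTHER.dat"


print(bucket_file("aople"))     # acor_en-US_ABCD.dat
print(bucket_file("eplehant"))  # acor_en-US_EFGH.dat
print(bucket_file("juce"))      # acor_en-US_IJKL.dat
```

Only the sub-file for the typed word's first letter would need to be loaded, which is the speed-up suggested above.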

do you think that this workaround would be a feasible solution?


Comment 26 tommy27 2010-06-20 11:10:55 UTC
@os

hi, I've heard that OOo 3.3 will enhance Calc, allowing it to handle up to 2^20 
rows instead of the current upper limit, which is 2^16 (65536 rows)

2^16 is actually the same upper limit as for autocorrect database entries.

do you think that the Writer autocorrect database could be upgraded from 2^16 to 2^20 
as well, like Calc?
Comment 27 tommy27 2011-01-21 16:19:08 UTC
Hello, is there any chance this issue will be considered for a fix in OOo 3.4?

http://wiki.services.openoffice.org/wiki/OOoRelease34

this issue has P2 priority and causes data loss.
Comment 28 tommy27 2012-05-14 21:02:03 UTC
the problem here is that the .xml containing all the autocorrect entries is still a 16-bit encoded structure.

isn't it a little anachronistic in the era of 64-bit computers?
Comment 29 tommy27 2012-12-13 21:21:44 UTC
this issue has been fixed in LibreOffice and will be available in LibO 4.0 (february 2013)

autocorrect capacity has been expanded to 4 million entries

see: https://bugs.freedesktop.org/show_bug.cgi?id=48729
Comment 30 Ariel Constenla-Haile 2012-12-13 22:01:43 UTC
(In reply to comment #29)
> this issue has been fixed in LibreOffice and will be available in LibO 4.0
> (february 2013)

nice way of promoting another project's release in a bug tracker
Comment 31 tommy27 2012-12-28 19:01:56 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > this issue has been fixed in LibreOffice and will be available in LibO 4.0
> > (february 2013)
> 
> nice way of promoting another project's release in a bug tracker


just facing the fact that the bug report I opened in 2008 in the OOo bug tracker has been fixed by LibO. I hope AOO will fix it as well.

by the way, let me correct the info I gave in Comment 29...
the new limit is not 4 million, but 4 billion (2^32 = 4294967296)
Comment 32 Marcus 2017-05-20 10:47:51 UTC
Reset assignee to the default "issues@openoffice.apache.org".