Apache OpenOffice (AOO) Bugzilla – Issue 87672
autocorrect limit. acor.dat with entry 65535: Loop and/or loss of acor data
Last modified: 2017-05-20 10:47:51 UTC
unfortunately it seems that the number of autocorrect entries is not unlimited and that a limit exists.

i was able to calculate the exact number of entries inside my acor_it-IT.dat file as follows:
- open the .dat file with 7-Zip ( http://www.7-zip.org/ )
- extract documentlist.xml and create a backup copy
- open that documentlist.xml with OpenOffice Writer
- do a series of search-and-replace-with-nothing operations to strip away the XML code and leave just a list of abbreviations and replacements
- at the end of the operation an OOo window tells you the total number of search & replace operations done, which corresponds to the total number of autocorrect entries

i have 65213 entries. i believe this corresponds to the upper limit OOo can handle (both in version 2.3 and 2.4).

if you add another autocorrect entry by misspelling a word and selecting the suggested autocorrect entry from the right-click menu, something weird happens: the acor_it-IT.dat "implodes"... you lose all the previous 65213 corrections and you are left with only the last one you inserted (i had a backup copy so i did not lose it). the .dat file, which is 399 kb with 65213 entries, is replaced by a new file of 57 kb.

it seems to me that something prevents OOo from handling more than 65213 entries per language. adding new corrections to my acor_en-US.dat file (which has just a few entries) has no effect on the acor_it-IT.dat file.

so, i'm gonna ask you for some help:
- what is the cause of the acor.dat file crash?
- can you come up with any alternative method to keep adding new autocorrect entries to my italian language file without losing the previously inserted ones? (i know dictionaries have an upper limit of 2000 words, then you have to start a new dictionary; can it be done the same way for autocorrect files?)

i uploaded here ( http://www.sendspace.com/file/nlpcqj ) my acor_it-IT.dat and documentlist.xml files if you wanna use them for testing. thank you for your help.
-----------------
Related topics:
http://www.oooforum.org/forum/viewtopic.phtml?t=66128&start=15
http://user.services.openoffice.org/en/forum/viewtopic.php?f=7&t=3156&sid=82bd0184693e60dae5586893d33ec089
http://www.openoffice.org/servlets/ReadMsg?list=users&msgNo=176711
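The manual counting procedure from the first comment (unzip the .dat file, strip the XML, count what is left) can also be scripted. The following is only a minimal sketch: it assumes the .dat file is a ZIP archive containing a documentlist.xml member (as the 7-Zip step suggests) and that every autocorrect entry is one XML element whose local name is "block". The element name, the namespace URI used in the sample data, and the helper names `count_autocorrect_entries` and `make_sample_dat` are assumptions for illustration, not taken from this report.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def count_autocorrect_entries(dat_file):
    """Count entries in an acor_*.dat file, given a path or a file-like object."""
    with zipfile.ZipFile(dat_file) as archive:
        # Match the member name case-insensitively.
        name = next(n for n in archive.namelist()
                    if n.lower() == "documentlist.xml")
        root = ET.fromstring(archive.read(name))
    # Assumption: one entry == one element whose local name is "block".
    return sum(1 for el in root.iter()
               if el.tag.rsplit("}", 1)[-1] == "block")


def make_sample_dat(pairs):
    """Build a tiny in-memory stand-in for a .dat file (illustration only)."""
    ns = "http://openoffice.org/2001/block-list"  # assumed namespace
    rows = "".join(
        f'<bl:block bl:abbreviated-name="{wrong}" bl:name="{right}"/>'
        for wrong, right in pairs
    )
    xml = f'<bl:block-list xmlns:bl="{ns}">{rows}</bl:block-list>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("DocumentList.xml", xml)
    buf.seek(0)
    return buf


sample = make_sample_dat([("aople", "apple"), ("juce", "juice")])
print(count_autocorrect_entries(sample))  # prints 2
```

Counting elements directly avoids the inaccuracy that a chain of search-and-replace passes can introduce. Work on a backup copy of the real file, as the original steps recommend.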
Created attachment 52416: the .zip contains the acor_it-IT.dat and documentlist.xml files that crash my OOo 2.4
Reassigned to SBA.
i found that MS Word officially has no fixed limit on the number of autocorrect entries:
"Maximum number of AutoCorrect entries. Limited to memory/hard disk space"
Source: http://support.microsoft.com/kb/109296
it would be great to see something like that in OOo as well.
user Jim Plante (and other users as well) suggested this as an explanation for the autocorrect entries limit and crash:
http://www.oooforum.org/forum/viewtopic.phtml?p=282038#282038

"Just to add some information, 64K bytes is actually 65,535, not 64,000. This is just a wild guess, but I would surmise that the autocorrect entries are held in an array. Arrays are indexed with integers. The biggest integer allowable is 65,535 (as a regular integer). So I suspect that in trying to access that extra entry, you're running the array out of range. Without digging into the source code (which I am NOT going to do), we can't tell."

the only thing that doesn't fit with this theory is that the limit seems to be 65,213 and not 65,535. could you confirm or refute this hypothesis? thanks.
i still haven't found an explanation of why OOo crashes and erases the database at entry 65213. however i found a workaround to keep adding new italian autocorrect entries without losing the old database, even though the acor_it-IT.dat file has been saturated by the 65213 entries i already put inside it. the trick is to start a new autocorrect database which works for any language. let's see it step by step:
1- opened the autocorrect entries table, clicked on the first tab, which is "Language", and selected "All languages", which is at the top of the list. a new acor_.dat file was created.
2- closed OpenOffice
3- swapped filenames between the 2 acor.dat files: acor_it-IT.dat (the saturated "65213 entries" file) is renamed to acor_.dat, and the new empty acor_.dat is renamed to acor_it-IT.dat.

this means that the old autocorrect entries are stored and untouched, while the new database is brand new and can be opened with Ctrl+H in a few seconds instead of many minutes (remember, the more entries you have, the slower the autocorrect replacement table opens).

as i said this is a workaround, not a definitive solution. once the 65213 entries limit is reached in the new database i'll be in trouble again. it took me 2 years to fill the first database. i hope the OOo team will find a definitive solution before that time.
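The filename swap in step 3 can be sketched as a small script. This is only an illustration under stated assumptions: the helper name `swap_files` and the temporary suffix are hypothetical, the demo below runs on throwaway files in a temporary directory rather than in the real autocorr folder, and on real files it should only be run with OpenOffice closed and a backup at hand.

```python
import os
import tempfile


def swap_files(path_a, path_b):
    """Exchange the names of two files by going through a temporary name."""
    tmp = path_a + ".swap_tmp"  # hypothetical temporary name
    os.replace(path_a, tmp)
    os.replace(path_b, path_a)
    os.replace(tmp, path_b)


# Demo on throwaway files standing in for the two replacement tables.
with tempfile.TemporaryDirectory() as folder:
    full = os.path.join(folder, "acor_it-IT.dat")
    empty = os.path.join(folder, "acor_.dat")
    with open(full, "w") as f:
        f.write("saturated table")
    with open(empty, "w") as f:
        f.write("new table")

    swap_files(full, empty)

    with open(full) as f:
        print(f.read())  # prints: new table
    with open(empty) as f:
        print(f.read())  # prints: saturated table
```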
the "acor.dat file crash with entry 65214" that is reported in OOo 2.4.0 is present also in OOo 2.4.1.
just some questions:
1- has any research been done into the cause of the "autocorrect limit crash bug"?
2- does the "P2" priority level mean that this issue will be addressed in OOo 3.0?
3- why is the status still "UNCONFIRMED"? did you run some tests with the files i uploaded?
thanks for your attention, hoping for a reply.
SBA: My findings:
(1) Have the (obviously already too big) attached acor file in the /.../user/../autocorr path
(2) Launch Office (Writer)
(3) Tools-AutoCorrect
(4) On tab "Replace", select language "Italian"
-> Loop in OOo 2.4.1 and OOo 3.0

Variation:
(1) Have the attached acor file in the /.../share/../autocorr path
(2) Launch Office (Writer), AutoSpellCheck ON
(3) Set text language to Italian
(4) In a misspelled word, call context -> AutoCorr... Add
-> The word in the text gets changed. Office does NOT crash
-> Look at tab "Replace", the list is EMPTY
-> Look in file system: Acor file has shrunk to ~59kB (Data Loss)

Confirming issue. Adjusting summary to reflect the findings.
SBA->OS: Please proceed, thank you.
Reassigned.
i hope you guys will find the reason for (and hopefully the solution to) this bug. i still have room for another 40000 autocorrections in my new (swapped filename) acor_it-IT.dat but i'm going to saturate it too one day.
The reason for the problem is the fact that data structures based on 16 bit data are used. This naturally produces problems if there are more than 65535 entries in the list.
Subject changed (now is 65535), component changed to framework, platform+os changed to All.
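To illustrate why a 16-bit limit matters: an unsigned 16-bit value can hold 0 to 65535, and adding one more wraps around to 0. The sketch below only demonstrates that arithmetic. It is not the actual OOo code, and the function name `next_count_16bit` is hypothetical.

```python
UINT16_MAX = 0xFFFF  # 65535, the largest unsigned 16-bit value


def next_count_16bit(count):
    """Increment a counter the way an unsigned 16-bit integer would."""
    return (count + 1) & UINT16_MAX


print(next_count_16bit(65534))  # prints 65535
print(next_count_16bit(65535))  # prints 0
```

Any code that trusts such a counter after the wrap-around sees a nearly empty list, which is at least consistent with the shrunken file described in the findings above.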
65535? my OOo crashed with entry number 65214, but i suppose the method i used to calculate the number of autocorrect entries (see my first post) was not 100% accurate. anyway, now i hope there will be a way to solve this. with time other people will saturate their acor.dat files, and the loss of data would be very annoying.
@os you said: "data structures based on 16 bit data are used. This naturally produces problems if there are more than 65535 entries in the list".

i have a suggestion: OOo should be able to handle more than 1 acor.dat file for the same language. currently, if you saturate the acor_it-IT.dat you cannot enter any more italian autocorrections. the only way to do it is the workaround i described before, using a universal acor_.dat for all languages. however, once you have saturated that file too you are back at the starting point.

my suggestion is that OOo should be able to handle more acor.dat files for the same language, i.e. acor_it-IT1.dat , acor_it-IT3.dat , acor_it-IT4.dat etc. etc., each of which can contain 65535 entries.

this is just like dictionaries, where you have a 2000 word limit: once you fill the first standard.dic you cannot enter new words in it, but you can start a standard2.dic or a standard3.dic. autocorrect should work the same way. once you reach the 65535 limit, a warning should alert you that the file is full (the data loss crash should be prevented) and a new acor.dat file for the proper language should be started.

do you think this is possible?
->tommy27: We should fix it. Integrating a workaround takes the same amount of work as fixing it. The difference to dictionaries is that there the size has direct influence on the performance as the spell checking searches in these dictionaries while spelling. The auto correction file is only checked at the time you write a new word. So in this case size doesn't matter that much.
i understand, however i noticed that the performance of manual insertion of corrections in the autocorrect replacement table with the "ctrl + h" hotkey is affected by the acor.dat file size. if you have few entries in it the replacement table opens in a few seconds, but the more entries you put inside it the slower it opens. with my previous acor.dat file it took "15 minutes" to access the replacement table after pressing ctrl+h. that was due to its enormous number of entries.

i was thinking that using more acor.dat files as i suggested could also solve this issue. if you simply remove the 65535 limit, having more entries in a single acor.dat file (i.e. 75000, 90000, etc.) will make the process even slower.
to os. i respectfully ask you again to consider the "handle more acor.dat files for each language" solution, i.e. acor_it-IT1.dat , acor_it-IT3.dat , acor_it-IT4.dat etc. etc.

there are people who are submitting their own autocorrect entry databases as extensions (i.e. http://extensions.services.openoffice.org/project/AutocorrectRomanian ). if you wanna use one, you have to overwrite your pre-existing acor_yourlanguage.dat file.

handling more than one .dat file for each language (just like dictionaries) would have several advantages:
- no overwriting of the pre-existing "acor_yourlanguage.dat" file, allowing sharing of .dat files among users, just like dictionaries
- working around the 65535 limit (you should however prevent the data loss, and not allow more entries to be added when the file is full... just like the 2000 limit in dictionaries)
- faster loading of the Ctrl+H replacement table (as i said, with 65535 entries it is terribly slow, and allowing more entries in the same file would make it even slower. it would be much better to start with an empty and faster one once the first one is full, bloated and slow as a turtle)

these are just my 3 cents. i'd like to know what you think about it.
has any progress been made on this issue in the meantime? i suggested it as a 3.1 blocker but i think it didn't make the cut before the code freeze. what about 3.2? do you think there is a chance to see it fixed for that release?

please understand that my questions are only driven by curiosity... i'm not complaining at all about my issue not having been fixed yet. i'm just wondering what kind of strategy you are thinking of using to prevent the data crash and overcome the 16-bit limitation.
i counted my autocorrect entries again... from june 2008 to february 2009 i collected another 26600 items... at this pace i think i'll reach the 65535 limit at the end of 2009... do you think that OOo 3.2 (which is planned for september 2009) is going to fix this issue?
new update... my count is now 31667, which means that almost 50% of my database is already full...

i was thinking that a workaround could be to enable sharing of autocorrect entries among "same language subgroups"... OOo indeed has separate replacement tables for UK English, USA English, AUS English... it also has "Italian (Italy)" and "Italian (Swiss)"... however currently they are not shared... if you add an entry to the UK database it won't correct that mistake when you are writing in a US English document.

do you think it is possible to make them shared, or at least to create a new common replacement table for all of the "same language subgroups"? something like:
acor_it-ALL.dat working on both italian and swiss italian
acor_en-ALL.dat working on UK, US, AUS etc. etc. English variants

that would mean another 65535 entries available once the original acor_it-IT.dat file is completely full.

i opened a topic on OOo Forum about it: http://www.oooforum.org/forum/viewtopic.phtml?t=81211&highlight=&sid=af038b278131c0743dbca648347f05ec
hope somebody will have an idea about it.
excuse me, what's the meaning of Status whiteboard: RM-S4?
in the meantime i reached 33960 entries in my acor.dat file that means that it's half full now. i also opened a new issue 101224 asking for the "common replacement table for similar language groups" as a new feature.
i'm curious about the status of this issue.
1- did you find a way to try to fix it?
2- what's the meaning of Status whiteboard RM-S4?
3- is there any chance to see the issue fixed in OOo 3.1.1 or at least 3.2?
thank you
1- did you find a way to try to fix it?
No, I didn't. This issue is not the one with the highest priority. The way you are using this feature is not common practice, as your list is more or less a mechanism to correct the spelling of lots of words with lots of alternatives for each of them.

2- what's the meaning of Status whiteboard RM-S4?
mh added this status (now cc'ed)

3- is there any chance to see the issue fixed in OOo 3.1.1 or at least 3.2?
Not a real chance. Extending the number of entries in the list is not that complicated. But the user interface is not even capable of handling the currently possible number of entries. Opening the dialog with 64 K entries takes minutes to fill the ListBox.
Thanks for the feedback... however i respectfully disagree with some of the things you said:

a- issue priority
maybe it is not the highest, but it's still a P2 issue, which is higher than many P3 and P4 issues i see round here.

b- the way i use autocorrect
well, i think my use of autocorrect is not that strange. The whole database is a list of the many typing errors i made in my daily work. I write medical reports all day and i have to write them fast... that's why i have collected so many mistakes and entered them in the replacement table to avoid future errors. With my massive everyday use of Writer i saturated the OOo acor.dat file with 65534 entries in 2 years. You should expect that other users will reach that limit one day... maybe they will take longer than me, but day after day they will get to that point.

I think the problems are:
1- the number of autocorrect entries is limited
entries are not unlimited, unlike in MS Word (as far as i know - read previous posts), and this is a weak point of OOo.
2- reaching the limit causes data loss
as i reported, once you add entry number 65535 the acor.dat file crashes and the whole database is permanently erased... this is really bad... i was lucky i had backed it up before, so i did not lose my precious autocorrect database... i would have been very pissed off if that had happened.

POSSIBLE SOLUTIONS
i think you should find a way to address the data loss issue first and, if possible, also the entry limit issue.

Solution A: lock the acor.dat file once entry 65534 is added.
you should prevent users from adding entry 65535, which causes the crash and data loss. I remember that OOo user custom dictionaries had a 2000 word limit (which has now been raised to 30000); once you reached that limit and tried to add a new word, you were alerted with an error message telling you "dictionary is full".

Solution B: let the acor.dat file store more than 65K entries.
I know that would make the OOo replacement table open very slowly (it takes even 15 minutes with the current 65K limit).

Solution C (which would be my favourite one and which i suggested in earlier posts): allow creation of additional acor.dat files.
once an acor.dat file is full, the user should be allowed to start a new replacement table database, just like with dictionaries. Old autocorrections would still be active and new ones would be added to the brand new acor.dat file, which would also load faster since it is almost empty.

Issue 101224, which i opened a few days ago, could be a kind of workaround for this.
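Solution A (refuse new entries once the table is full, instead of corrupting it) could look roughly like the sketch below. It is an illustration only: the names `add_entry`, `ReplacementTableFull` and `MAX_ENTRIES` are hypothetical, the replacement table is modelled as a plain dictionary, and the limit value is simply the 16-bit figure discussed in this issue rather than anything read from the OOo sources.

```python
MAX_ENTRIES = 65535  # assumed capacity, from the 16-bit limit discussed here


class ReplacementTableFull(Exception):
    """Raised instead of silently corrupting a full replacement table."""


def add_entry(table, wrong, right, limit=MAX_ENTRIES):
    """Add or update a correction, refusing to grow past the limit."""
    is_new = wrong not in table
    if is_new and len(table) >= limit:
        raise ReplacementTableFull(
            f"replacement table is full ({limit} entries)")
    table[wrong] = right  # updating an existing entry never grows the table


# Demo with a tiny limit so the guard is easy to see.
table = {}
add_entry(table, "aople", "apple", limit=2)
add_entry(table, "juce", "juice", limit=2)
try:
    add_entry(table, "eplehant", "elephant", limit=2)
except ReplacementTableFull as err:
    print(err)  # prints: replacement table is full (2 entries)
print(len(table))  # prints 2
```

The point of the guard is that the existing entries survive the failed insertion, which is the behaviour the "dictionary is full" message gives for custom dictionaries.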
hi guys. this morning while i was shaving i came up with this idea: what about splitting each language's acor.dat file into subfiles according to first letter? let me try to explain:
- currently the acor_en-US.dat contains a list of all spelling errors from A to Z and has an upper limit of 65535 entries
- would it be possible to do something like:
acor_en-US_ABCD.dat --> aople --> apple
acor_en-US_EFGH.dat --> eplehant --> elephant
acor_en-US_IJKL.dat --> juce --> juice
.....
.....
acor_en-US_WXYZ.dat

that would split the database into smaller ones according to the first letter of the spelling error. each one of these smaller databases would still have the 65535 entry limit, but this would dramatically increase the number of available autocorrect entries per language: the user could store 65K entries in the acor_en-US_ABCD.dat, another 65K entries in the acor_en-US_EFGH.dat etc. etc.

moreover the loading of the replacement table would be easier and faster if OOo doesn't have to load all the entries from A to Z but only a limited set (A to D or E to H etc. etc.)

do you think that this workaround would be a feasible solution?
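The first-letter split proposed above can be sketched as a lookup from a misspelled word to a file name. Everything here is hypothetical: the function name `bucket_file`, the "other" fallback for words that do not start with A to Z, and in particular the middle letter groups (MNOP, QRSTUV), which the comment leaves unspecified and which are filled in only so the example runs.

```python
# Letter groups: ABCD, EFGH, IJKL and WXYZ come from the proposal above;
# MNOP and QRSTUV are placeholders assumed for this sketch.
GROUPS = ["ABCD", "EFGH", "IJKL", "MNOP", "QRSTUV", "WXYZ"]


def bucket_file(wrong_word, locale="en-US"):
    """Return the name of the sub-file that would hold this misspelling."""
    first = wrong_word[:1].upper()
    for group in GROUPS:
        # Guard against the empty string, which is "in" every string.
        if first and first in group:
            return f"acor_{locale}_{group}.dat"
    return f"acor_{locale}_other.dat"  # hypothetical fallback bucket


print(bucket_file("aople"))     # prints acor_en-US_ABCD.dat
print(bucket_file("eplehant"))  # prints acor_en-US_EFGH.dat
print(bucket_file("juce"))      # prints acor_en-US_IJKL.dat
```

Because the bucket depends only on the first letter of the misspelling, a lookup or an insertion would need to open just one of the smaller files.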
@os hi, I've heard that OOo 3.3 will enhance Calc, allowing it to handle up to 2^20 rows instead of the current upper limit of 2^16 (65536 rows). 2^16 is actually the same upper limit as for autocorrect database entries. do you think that the Writer autocorrect database could be upgraded from 2^16 to 2^20 as well, like Calc?
Hello, is there any chance this issue will be considered for fixing in OOo 3.4? http://wiki.services.openoffice.org/wiki/OOoRelease34
this issue has P2 priority and causes data loss.
the problem here is that the autocorrect entries loaded from the .xml are still held in a 16-bit based data structure. isn't that a little anachronistic in the era of 64-bit computers?
this issue has been fixed in LibreOffice and will be available in LibO 4.0 (february 2013)
autocorrect capacity has been expanded to 4 million entries
see: https://bugs.freedesktop.org/show_bug.cgi?id=48729
(In reply to comment #29)
> this issue has been fixed in LibreOffice and will be available in LibO 4.0
> (february 2013)

nice way of promoting another project's release in a bug tracker
(In reply to comment #30)
> (In reply to comment #29)
> > this issue has been fixed in LibreOffice and will be available in LibO 4.0
> > (february 2013)
>
> nice way of promoting another project's release in a bug tracker

just facing the fact that the bug report I opened in 2008 in the OOo bug tracker has been fixed by LibO. I hope AOO will fix it as well.

by the way, let me correct the info I gave in Comment 29... the new limit is not 4 million but 4 billion (2^32 = 4294967296)
Reset assignee to the default "issues@openoffice.apache.org".