Apache OpenOffice (AOO) Bugzilla – Issue 74245
ODFF: LOOKUP in columns if range is wider than tall
Last modified: 2013-08-07 15:15:24 UTC
In Excel (V.X), using the LOOKUP command, after the search criteria are selected, one is allowed to select either an array $F$3:$J$4, or one may enter as search vector $F$3:$J$3 plus a result vector $F$4:$J$4. Both work equally well. However, it appears that Open Office does not recognize the array method of syntax, nor does it translate it to the search vector + result vector syntax upon import from Excel. It seems to me for full compatibility, Open Office needs to do one or the either. Thank you. Karl
Hi, not a bug but an enhancement. Re-flagged and re-assigned. Frank
Generally both forms work well. Could you please elaborate what exactly doesn't work? Best if you also attach a test case document to this issue to illustrate. Thanks Eike
Ah, I was blind. This is the "more columns than rows" case, where Excel assumes that values are to be looked up in columns instead of rows. Nasty syntax and easily breaks documents in case you insert/delete rows in such ranges. Which most certainly is the reason that LOOKUP is deprecated, better to use HLOOKUP or VLOOKUP instead. Accepting issue. This is not an import problem, the column semantics aren't implemented in the interpreter engine.
change target from 2.x to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
ODFF relevant.
I'm working on this currently. There are more undocumented behaviors of Excel than just the column vs row direction. I'm trying to uncover all that. Since this involves a matrix handling, it will be a bit tricky to get it all covered.
Created attachment 51484 [details] test document
Created attachment 51485 [details] patch
This patch is against CWS odff (base m244). Hopefully I've covered all the Excel's behaviors. That test document covers a wide range of scenarios, and with this patch, Calc's LOOKUP behave identically to Excel's under the scenarios tested in the document.
Thank you, Kohei! You mentioned cases that are probably not covered by the ODFF spec. Could you please give details, so I can add those to the specification? Thanks Eike
Ok, here is my note. Let me just copy-n-paste. --------------------------------------------------------------------------- * 2-parameter case ([lookup value], [lookup-result matrix]) * The search direction, either horizontal-search or vertical-search, is determined from the dimensions of the matrix; if the data matrix is wider than taller, the search direction is horizontal, and if it is taller than wider, the direction is vertical. If the data matrix is a square, the search direction is vertical. * In vertical search mode, only the first column gets searched, and the last column is used as the result array. Those columns that are not either the first nor the last are simply not used. In case of a one-column matrix, the search column and the result column are identical. These rules also apply in horizontal search mode but to rows instead of columns. * 3-parameter case ([lookup value], [lookup matrix], [result matrix(vector)]) * The result matrix must be a vector, or it raises an error. * The lookup matrix does not need to be a vector. If it's not a vector only the first column or row gets searched (= search vector). [NOTE: the ODFF spec requires this to be a vector, but Excel does not.] * The search direction is determined from the dimensions of the lookup matrix. The same rules as the 2-parameter case apply. * The directions of the lookup matrix and the result matrix can differ. [NOTE: the ODFF spec does not mention this.] * The lengths of the search vector and the result vector do not need to be identical; however, when the match position falls outside the length of the result vector, an error is raised if the result vector is given as an array object. If it's a cell range, it gets automatically extended to the length of the searched vector. If the cell range cannot be extended due to size limit, then '0' is displayed. [NOTE: the ODFF spec requires the search vector and the result vector to be of the same length.] --------------------------------------------------------------------------- The spec covers the 2-parameter case pretty well, and my note reveals nothing new here. But for the 3-parameter case there are some behaviors of Excel that are not mentioned or not accurately depicted in the ODFF spec. Actually I have a new patch to cover the last item of my note. I'll attach it here shortly.
Created attachment 51498 [details] revised patch
I wish to thank all of you for efforts in resolving this. We are hoping to migrate to the NeoOffice flavor of OpenOffice within the next year. It is efforts like these that make this more possible. Karl
Hi Kohei, I have some doubt about a detail of the latest patch. In lcl_CompareMatrix2Query() this if (rStr1 == rStr2) return 0; return rStr1 < rStr2 ? -1 : 1; changed to return rStr1.CompareIgnoreCaseToAscii(rStr2); // case-insensitive As the method name says that works only for ASCII strings. To make that work with Unicode, either collation or transliteration have to be used. As the comparison is done in a binary search on data that is assumed to be sorted, using ScGlobal::pCollator->compareString() should be appropriate. lcl_CompareMatrix2Query() is also used in ScInterpreter::ScMatch(), I guess there shouldn't be any negative impact by these changes, correct? Thanks Eike
>I guess there shouldn't be any negative impact by these changes, correct? I think the change you're suggesting is the right one. Please go ahead with the change. :-) Kohei
In cws odff03: sc/source/core/tool/interpr1.cxx 1.56.20.6 Several changes were necessary: Loading the attached test case document in a non-product gave lots of Error: ScMatrix::GetDouble: dimension error From File .../odff03/DEV300/src.m3/sc/source/core/tool/scmatrix.cxx at Line 430 because nDelta==-1 if no value was found (already the first item greater than query) and nFirst==0 if (nLen == 1) // first and last items are next to each other. { nDelta = nCmp < 0 ? nLast - 1 : nFirst - 1; These cases then showed up as not being identical to Excel results. In case the result array was shorter than the lookup array an off-bounds memory access happened with pResMat->Get...(nDelta), returning an arbitrary value if lucky ;-) and pResMat->IsValue(nDelta) returned TRUE before. Also showed up as not being identical to Excel results in cells B70:B71 (0 displayed instead of #N/A). Introduced handling of mixed numeric/string of both query and matrix element and empty matrix element in lcl_CompareMatrix2Query() Introduced handling of mixed numeric/string/empty matrix elements in lcl_GetLastMatch(), also there the result was arbitrary for a rMat.GetDouble() if the element was a string instead, usually not identical though to the value to be matched, so it wasn't obvious.
Reassigning to QA for verification.
verified in internal build cws_odff3
closed because fix available in builds OOO300_m9 and DEV300_m33