Issue 85381 - Heap corruption bug in dmake
Summary: Heap corruption bug in dmake
Status: CLOSED FIXED
Alias: None
Product: Build Tools
Classification: Code
Component: dmake (show other issues)
Version: OOH680m2
Hardware: PC Windows, all
: P1 (highest) Trivial (vote)
Target Milestone: ---
Assignee: quetschke
QA Contact: issues@tools
URL:
Keywords:
Depends on:
Blocks: 84957
  Show dependency tree
 
Reported: 2008-01-18 21:47 UTC by tml
Modified: 2013-08-07 15:34 UTC (History)
6 users (show)

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Suggested patch (1.25 KB, patch)
2008-01-18 21:53 UTC, tml
no flags Details | Diff
Patch for make.c (1.20 KB, patch)
2008-01-21 08:18 UTC, quetschke
no flags Details | Diff
Patch for make.c (1.21 KB, patch)
2008-01-22 02:16 UTC, quetschke
no flags Details | Diff
Test program (9.85 KB, text/plain)
2008-01-23 11:26 UTC, tml
no flags Details
Patch for make.c (1.77 KB, patch)
2008-01-29 17:38 UTC, quetschke
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this issue.
Description tml 2008-01-18 21:47:31 UTC
There is a heap corruption bug in dmake that occurs only on Cygwin. Look at the
function gen_path_list_string(). This function takes a list of pathnames in
Cygwin (Unix) form and turns them into a single string of pathnames in Windows
form separated by spaces. Look at the code where Cygwin paths are turned into
Windows paths:

      tpath = DO_WINPATH(cell->datum);
      if( tpath == cell->datum ) {
         len=cell->len;
      }
      else {
         /* ... but only if DO_WINPATH() did something. */
         len = strlen(tpath);
      
         if( len > slen_rest ) {
            /* We need more memory. As DOS paths are usually shorter than the
             * original cygwin POSIX paths (exception mounted paths) this should
             * rarely happen. */

The condition (len > slen_rest) is bogus. slen_rest is irrelevant here. What len
should be tested against is the length of the original Cygwin pathname,
cell->len. After that change, and correspondingly for the same reason changing
the subexpression len-slen_rest to len->cell-len a couple of lines below, we
then also notice that there is no use for the slen_rest variable at all. Also,
looking at the following code where the result buffer is reallocated:

            /* Get more than needed. */
            slen += slen + len-cell->len + 128;
            if((result = realloc( result, (unsigned)(slen*sizeof(char)) ) ) == NULL)
               No_ram();

we see some odd things. Firstly, slen += slen + len-cell->len + 128. Wtf?
Clearly what is meant is slen += len-cell->len + 128. However, there is no real
reason for that "extra" 128. It seems that the persons behind this code were
aware that there is something wrong with it, but didn't quite know what, so they
throw in an "extra" 127 bytes and hope for the best. (Alas, this doesn't help
for the crash I observed as the problem was that this "We need more memory"
branch isn't taken in the first place even if needed.) There is no need for that
"more than needed", just doing slen += (len-cell->len) + 1 works fine.
Comment 1 tml 2008-01-18 21:52:49 UTC
Also, the sizeof(char) is totally pointless. sizeof(char) is 1 by definition.
Pretending to "code carefully" by using sizeof(char) is silly.
Comment 2 tml 2008-01-18 21:53:12 UTC
Created attachment 50986 [details]
Suggested patch
Comment 3 quetschke 2008-01-21 06:51:09 UTC
My bug, my issue ;) Thanks for the extensive analysis.
Comment 4 quetschke 2008-01-21 07:25:03 UTC
Oh, and did you get enough coffee lately?

> ... Firstly, slen += slen + len-cell->len + 128. Wtf?

That wouldn't compile if it really were in the code.

Ah yes, you clearly understood what "the persons behind this code" were
planning.

> The condition (len > slen_rest) is bogus. slen_rest is irrelevant here.
..
> However, there is no real reason for that "extra" 128. It seems that the
> persons behind this code were aware that there is something wrong with it,
> but didn't quite know what, so they throw in an "extra" 127 bytes and hope
> for the best.

There is no hidden agenda, only a bug. If the code would have worked as intended
it would have gotten more memory as needed to accommodate for future memory
needs of *result and hereby reduced the number of needed realloc() calls at the
expense of a little memory overhead - Yes a little questionable but IMHO valid.

I will fix this bug and you might sometime have a look into issue 65083.
Comment 5 quetschke 2008-01-21 08:18:09 UTC
Created attachment 51040 [details]
Patch for make.c
Comment 6 tml 2008-01-21 09:00:56 UTC
Unfortunately, your patch fails to correct the root cause of the problem, the
check for (len > slen_rest).

BTW, I *think* that the only situation where the conversion from Cygwin pathname
to Windows pathname can cause the pathname length to grow is if one against
recommendations haven't used the /cygdrive/... form of pathnames consistently
and everywhere when building OOo. I guess that is why this bug usually doesn't
bite. I had by accident used a Cygwin path starting with a Cygwin mount point
/ooo in the build tree where I noticed the bug.

Or is there some situation where even a Cygwin pathname starting with /cygdrive/
can correspond to a longer Windows pathname?
Comment 7 quetschke 2008-01-22 02:16:01 UTC
Created attachment 51066 [details]
Patch for make.c
Comment 8 quetschke 2008-01-22 02:22:10 UTC
Let me answer the second path first, if for example /tmp or /home/..  or another
of the "standard" unix directories is used it is well possible that they get
longer when they are translated to DOS, e.g. /tmp -> c:/cygwin/tmp. But I think
it is a bug if using mounted paths, like your /ooo is failing.

Now, I didn't check if I can force the error, but I miss the reason why the
 if(len > slen_rest)
check is failing to find that more memory is needed - except that my patch
put another bug in. :( Its fixed with the previous patch (iz85381_2.diff).

slen_rest is initialized to the same number as the originally needed amount
of memory without the POSIX->DOS conversion. Inside the loop, after copying
the buffer, slen_rest is set to the number of remaining bytes in that buffer.
That should work, if len is smaller than slen_rest, increase the buffer size
plus 128 bytes headroom to reduce possible malloc calls if more elements are
longer in DOS than the POSIX version.

@mh: Once we get this sorted out, the patch should go into OOH if it is not to
late.
Comment 9 tml 2008-01-22 07:56:31 UTC
The point is that the input to the function is a *list* of pathnames in Cygwin
form. slist_rest is initialised to the length required by all the pathnames
concatenated, separated by spaces.

In the loop each pathname from the list is converted to Windows form. len is set
to the length of the converted pathname. To check if the length of the current
pathname was increased by the conversion, it is pointless to compare len against
slen_rest, which is the length of the concatenation of the original pathnames.
One needs to compare to the original length of the current pathname, i.e. cell->len.
Comment 10 tml 2008-01-22 16:07:46 UTC
Or actually, I guess your way to fix the problem might work, too. I will have to
check with dmalloc.
Comment 11 tml 2008-01-23 11:24:51 UTC
Sorry, thinking about this in the shower, I came to the conclusion your second
patch is still buggy. But I had to write a test program to be sure.

Try the attached test program. It contains three implementations of
gen_path_list_string(): The original one from ooh680-m3, with my patch, and with
your second patch. I just added a couple of debugging printouts to them.
Additionally, so that the test program won't actually corrupt the heap itself, I
allocate a bit more than the program thinks (look for TEST_PROGRAM_SAFETY). 

Run the program and pay attention to the output lines saying "Allocated n
bytes", "Reallocated to n bytes" and "Wrote to result n bytes".
Comment 12 tml 2008-01-23 11:26:15 UTC
Created attachment 51102 [details]
Test program
Comment 13 quetschke 2008-01-29 17:38:52 UTC
Created attachment 51232 [details]
Patch for make.c
Comment 14 quetschke 2008-01-29 17:47:07 UTC
Ouch! Thanks for your excellent test program tml. No wonder that was corrupting
the heap. I was only checking for the remaining memory if the DO_WINPATH
function actually did something. The previous patch fixes the problem in your
testcase.

@mh: If tml is OK with it this patch should go in OOH, preferably as a masterfix.
Comment 15 Martin Hollmichel 2008-01-29 18:38:21 UTC
I'm fine with doing this as masterfix, cc'ing obo for that.
Comment 16 oliver.bolte 2008-01-30 09:14:34 UTC
Fixed make.c commited on MWS for OOH680_m6.
Comment 17 tml 2008-01-30 09:40:26 UTC
Yup, that last patch looks fine.
Comment 18 quetschke 2008-02-04 21:12:24 UTC
Patch also committed to CWS dmake412.
Comment 19 hjs 2008-02-25 16:45:07 UTC
last comment from tml counts as "verified"
Comment 20 hjs 2009-02-24 11:53:04 UTC
.