core dump from within asctime_r()

Robert Elz kre at munnari.OZ.AU
Sat Feb 13 23:22:25 UTC 2010


If you all recall ...

  | locatime() can return NULL these days, and given a nonsense time
  | value (which is quite likely under these circumstances), probably did,

it turns out that wasn't the cause, the time_t value being
converted was perfectly sane -- there's another bug.

The fix that's currently proposed is fine, and needed, so this doesn't
impact upon that, nor does it need to delay the release of the code
with the fix on Tuesday (US time) - but it wouldn't hurt if this bug
was fixed at the same time.

Recall from the earlier mail that the core dump happened in single user
mode while attempting to fix a filesystem with fsck_ffs (one that had
corrupted inodes that needed fixing).   It was from that I assumed that
the underlying cause was probably a bogus time value that was part of
garbage in an inode.

But, of course, there's another difference here - in single user mode
(at least on NetBSD) the timezone files are not available, or might not
be, and in this case weren't (when /usr was mounted, the core dump didn't
happen).  From that I assumed (and yes, this was a stretch) that the bogus
time value must have been one which was valid in the local timezone (that's
UTC-0500 I think for the guy who had the problem), but invalid in GMT
(perhaps subtracting 5 hours was just enough to move the time back enough
so it would fit in a struct tm.).   Unlikely, of course, but possible...
And nonsense of course (well, it could have happened - it just didn't).

NetBSD compiles localtime with ALL_STATE defined (this is important)
(also USG_COMPAT, that one is irrelevant I believe.)

The actual cause of the bug is simpler, someone was silly enough to
write a line of code ...

                                        return NULL;    /* "cannot happen" */

which of course guaranteed that it would, and it did...

I'm tempted to say that the fix is to remove the comment, after which it
will never happen again!   But no.

The execution path (relevant functions that are called, in order) is
	ctime -> localtime -> tzset -> tzsetwall -> tzload -> gmtload ->
		tzload -> tzparse -> tzload -> localsub -> BANG

(To be strictly correct, NetBSD's code is slightly modified, using
different names for the some of the internal functions, in order to
deal with thread locking issues - but none of that is relevant to
what's going on, so I'll just use the names from tzcode2009t).

The immediate cause of the bug was the test ...

        if ((sp->goback && t < sp->ats[0]) ||
                (sp->goahead && t > sp->ats[sp->timecnt - 1])) {

At this point, sp->timecount was 0 (no time data was loaded), but
sp->goback was true (not sure about sp->goahead, didn't need to
investigate that one) and sp->ats[0] was a very large value (much
much bigger than t).

The reason for that is found in a combination of tzsetwall() and
tzload().

tzsetwall(void)                         
{                               
        if (lcl_is_set < 0)
                return; 
        lcl_is_set = -1;
                                
#ifdef ALL_STATE                
        if (lclptr == NULL) {           
                lclptr = (struct state *) malloc(sizeof *lclptr);
                if (lclptr == NULL) {
                        settzname();    /* all we can do */
                        return;
                }
        }
#endif /* defined ALL_STATE */
        if (tzload((char *) NULL, lclptr, TRUE) != 0)
                gmtload(lclptr);
        settzname();
}

and then tzload() (with comments deleted to save space here).

tzload(name, sp, doextend)      
register const char *           name;
register struct state * const   sp;
register const int              doextend;
{                               
        register const char *           p;
        register int                    i;
        register int                    fid;
        register int                    stored;
        register int                    nread; 
        union {         
                struct tzhead   tzhead;
                char            buf[2 * sizeof(struct tzhead) +
                                        2 * sizeof *sp +
                                        4 * TZ_MAX_TIMES];
        } u;    

        if (name == NULL && (name = TZDEFAULT) == NULL)
                return -1;
        {
                register int    doaccess;
                char            fullname[FILENAME_MAX + 1];
  
                if (name[0] == ':') 
                        ++name;
                doaccess = name[0] == '/';
                if (!doaccess) {
                        if ((p = TZDIR) == NULL)
                                return -1;
                        if ((strlen(p) + strlen(name) + 1) >= sizeof fullname)
                                return -1;
                        (void) strcpy(fullname, p);
                        (void) strcat(fullname, "/");
                        (void) strcat(fullname, name);
                        if (strchr(name, '.') != NULL)
                                doaccess = TRUE;
                        name = fullname;
                }       
                if (doaccess && access(name, R_OK) != 0)
                        return -1;
                if ((fid = open(name, OPEN_MODE)) == -1)
                        return -1;      
        } 

the rest of the function isn't relevant, as in the circumstances that
apply, the open() always fails, as there are no zone files anywhere
to open, so that "return -1" always happens.

Lower down in tzload() we find ...

        sp->goback = sp->goahead = FALSE;

but that never gets executed - in fact, almost none of the state is
ever initialised, just a little in tzparse() ...

                dstlen = 0;     
                sp->typecnt = 1;                /* only standard time */ 
                sp->timecnt = 0;        
                sp->ttis[0].tt_gmtoff = -stdoffset;
                sp->ttis[0].tt_isdst = 0; 
                sp->ttis[0].tt_abbrind = 0;

No mention there if goback, goahead, or ats.

Often, we're lucky, including in all the tests I was able to make
that were small enough to actually debug, and the space returned
by malloc() is all zero, and we don't see a problem.   So to force
this I explicitly set

	lclptr->ats[0] = 0xFFFFFFFFFFLL;
	lclptr->goback = 1;

immediately after the malloc() (after lclptr is tested for non NULL of course)
and sure enough, BANG...

Moving the

        sp->goback = sp->goahead = FALSE;

from way down near the end of tzload() to way up the front, so it always
gets done is enough to fix this for me.   I put it before even the test
of name, just in case (though that one can only return if TZDEFAULT is
NULL, and as that's a constant, that's unlikely!)   Whether there's
anything else that also needs initialising I'm not sure at the minute.

Another fix (I think, I didn't try this one) would be to test
sp->timecount in the
        if ((sp->goback && t < sp->ats[0]) ||
                (sp->goahead && t > sp->ats[sp->timecnt - 1])) {
test, it is a little rude to be enen contemplating accessing
sp->ats[sp->timecnt - 1] if sp->timecount == 0 after all.
That is, something like
        if (sp->timecount > 0 && ((sp->goback && t < sp->ats[0]) ||
                (sp->goahead && t > sp->ats[sp->timecnt - 1]))) {


Alternatives would be to init these in tzparse() with the others, or to
simply memset() the state to 0's after the malloc (or use calloc()) so
we always have a nice clean state array.

I'll leave it for someone else to decide which way of fixing this you
prefer, but this is definitely a bug that needs fixing.

kre




More information about the tz mailing list