FW: changes to zdump.c and zic.c invocations of is.* macros

Paul Eggert eggert at CS.UCLA.EDU
Thu Dec 8 21:23:21 UTC 2005


> From: Robert Elz [mailto:kre at munnari.OZ.AU] 
> Sent: Tuesday, December 06, 2005 10:20 PM
>   | ! 		while (isascii((unsigned char) *cp) &&
>
> You sometimes do better if you write that as
>
> 		while (isascii(*(unsigned char *)cp) &&
>
> It can also be a little clearer what you're intending - there's no
> intention here to fetch the char, then convert it to unsigned, all we
> want is the 0..255 value that cp points at.

If memory serves, the latter form (*(unsigned char *)cp) is not
portable to all C89 hosts, whereas the former form ((unsigned char)
*cp) is.  The idea is that some C89 hosts might have padding bits in
their unsigned char representation, and it's incorrect to access a
char as if it were unsigned char.

I believe this issue got cleared up in C99, so the code is portable to
C99 compilers.  But the zic stuff attempts to be portable to C89 (as
well as earlier) compilers.
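
For concreteness, here is a rough sketch of the two spellings side by side. The function names are made up for illustration and are not in zic.c or zdump.c. On an ordinary two's complement host with 8-bit chars both return the same 0..255 value; the difference would show up only on more exotic hosts.

```c
/* Hypothetical helpers, named here only for illustration. */

/* Former form: fetch the char, then convert its value to unsigned char. */
static int byte_by_value(const char *cp)
{
    return (unsigned char) *cp;
}

/* Latter form: view the storage as unsigned char and fetch that. */
static int byte_by_pointer(const char *cp)
{
    return *(const unsigned char *) cp;
}
```

Either result is in the range that the is.* macros accept, which is the point of both casts.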

> it is certainly true that it's possible to test for digits by using
> >= '0' && <= '9' tests - but if that's the best way to write it,
> then that's what isdigit() ought to be doing.

Alas, that's not true in practice.  isdigit is typically slower, and
it can be quite a bit slower.  For example, on my host (Debian
GNU/Linux stable, GCC 4.0.2, gcc -O4), with the following code:

#include <ctype.h>

int F (char *p) { return isdigit ((unsigned char) *p) != 0; }
int G (char *p) { return '0' <= *p && *p <= '9'; }

F compiles into 12 instructions that contain a subroutine call (for a
total of 31 instructions executed), whereas G compiles into 10
instructions of straight-line code.

I think part of the problem is that isdigit might be sensitive to the
locale.  So there's a correctness issue here as well; isdigit might
actually return the wrong value, since it might think that some other
byte code is a digit.  (This is just a theoretical issue, as far as I
know, though.)
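
One way to check that theoretical concern on a particular host is to compare the two tests over every byte value. This is only a sketch: is_ascii_digit and count_disagreements are names invented here, and the answer depends on the host's C library and on whatever locale is in effect when it runs.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: the plain range test, independent of locale. */
static int is_ascii_digit(int c)
{
    return '0' <= c && c <= '9';
}

/* Count the byte values on which isdigit and the range test disagree
   in the current locale. */
static int count_disagreements(void)
{
    int b;
    int n = 0;

    for (b = 0; b <= UCHAR_MAX; b++)
        if ((isdigit(b) != 0) != is_ascii_digit(b))
            n++;
    return n;
}
```

A nonzero count would mean that isdigit accepts some byte other than '0' through '9' on that host.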

> Paul's version may be textually shorter, but with that cp++ side effect
> buried in the middle of the && sequence, it is not nearly as easy to
> read.

True, but in my defense that buried cp++ was in the original code.

How about this instead?  It might be a bit clearer.

    char c = *cp;
    if ('0' <= c && c <= '9') {
        cp++;
        if (c == '1' && '0' <= *cp && *cp <= '4')
            cp++;
    }
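
To try the fragment in isolation, it can be wrapped in a small function. This is just a sketch; the name skip_digits and the return convention are assumptions of mine, not anything in the existing code.

```c
/* Hypothetical wrapper around the fragment above.  Consumes one digit,
   or two when the first is '1' and the second is '0' through '4'.
   Returns a pointer just past whatever was consumed. */
static const char *skip_digits(const char *cp)
{
    char c = *cp;

    if ('0' <= c && c <= '9') {
        cp++;
        if (c == '1' && '0' <= *cp && *cp <= '4')
            cp++;
    }
    return cp;
}
```

Given "12" this advances past both characters; given "15" or "7" it advances past one; given a non-digit it does not advance at all.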


