FW: FW: changes to zdump.c and zic.c invocations of is.* macros

Mon Dec 12 14:47:30 UTC 2005

Another that seems to have gone to lecserver rather than elsie.

				--ado

-----Original Message-----
From: Paul Eggert [mailto:eggert at cs.ucla.edu] 
Sent: Thursday, December 08, 2005 4:23 PM
To: tz at lecserver.nci.nih.gov
Subject: Re: FW: changes to zdump.c and zic.c invocations of is.* macros

> From: Robert Elz [mailto:kre at munnari.OZ.AU]
> Sent: Tuesday, December 06, 2005 10:20 PM
>   | ! 		while (isascii((unsigned char) *cp) &&
>
> You sometimes do better if you write that as
>
> 		while (isascii(*(unsigned char *)cp) &&
>
> It can also be a little clearer what you're intending - there's no 
> intention here to fetch the char, then convert it to unsigned, all we 
> want is the 0..255 value that cp points at.

If memory serves, the latter form (*(unsigned char *)cp) is not portable
to all C89 hosts, whereas the former form ((unsigned char) *cp) is.  The
idea is that some C89 hosts might have padding bits in their unsigned
char representation, and it's incorrect to access a char as if it were
unsigned char.

I believe this issue got cleared up in C99, so the code is portable to
C99 compilers.  But the zic stuff attempts to be portable to C89 (as
well as earlier) compilers.
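
As a minimal sketch of the two forms under discussion: the function
names below are made up for illustration and do not appear in zic.c or
zdump.c.  On ordinary two's-complement hosts with 8-bit bytes both forms
are expected to yield the same 0..255 value; the portability concern
above is about hosts where that assumption might not hold.

```c
/* Form 1: fetch the char, then convert its value to unsigned char. */
int byte_by_value_cast(const char *cp)
{
    return (unsigned char) *cp;
}

/* Form 2: reinterpret the pointer and read the byte as unsigned char. */
int byte_by_pointer_cast(const char *cp)
{
    return *(const unsigned char *) cp;
}
```

Either result can be passed to isascii or the other is.* macros, since
it is never negative.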

> it is certainly true that it's possible to test for digits by using
> >= '0' && <= '9' tests - but if that's the best way to write it,
> then that's what isdigit() ought to be doing.

Alas, that's not true in practice.  isdigit is typically slower, and it
can be quite a bit slower.  For example, on my host (Debian GNU/Linux
stable, GCC 4.0.2, gcc -O4), with the following code:

int F (char *p) { return isdigit ((unsigned char) *p) != 0; }
int G (char *p) { return '0' <= *p && *p <= '9'; }

F compiles into 12 instructions that contain a subroutine call (for a
total of 31 instructions executed), whereas G compiles into 10
instructions of straight-line code.

I think part of the problem is that isdigit might be sensitive to the
locale.  So there's a correctness issue here as well; isdigit might
actually return the wrong value, since it might think that some other
byte code is a digit.  (This is just a theoretical issue, as far as I
know, though.)
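
One way to probe the correctness question on a given host is to compare
the two tests over every byte value.  This is only a sketch; the
function name is hypothetical, and it reports on whatever locale is in
effect when it is called (the "C" locale unless the program has called
setlocale).

```c
#include <ctype.h>
#include <limits.h>

/* Count the byte values on which isdigit() and the plain range test
   disagree in the current locale. */
int count_digit_mismatches(void)
{
    int c, mismatches = 0;

    for (c = 0; c <= UCHAR_MAX; c++) {
        int by_ctype = isdigit(c) != 0;
        int by_range = '0' <= c && c <= '9';

        if (by_ctype != by_range)
            mismatches++;
    }
    return mismatches;
}
```

A nonzero result would mean that isdigit accepts some byte other than
'0' through '9' (or rejects one of them) in that locale.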

> Paul's version may be textually shorter, but with that cp++ side 
> effect buried in the middle of the && sequence, it is not nearly as 
> easy to read.

True, but in my defense that buried cp++ was in the original code.

How about this instead?  It might be a bit clearer.

    char c = *cp;
    if ('0' <= c && c <= '9') {
       cp++;
       if (c == '1' && '0' <= *cp && *cp <= '4')
         cp++;
    }
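
For checking the logic in isolation, the fragment above can be wrapped
in a small function.  The name and the return convention here are
assumptions made for the sketch, not part of the proposed change: it
returns how many characters the fragment would advance cp by, which is
one for a single digit and two when the text starts with 10 through 14.

```c
/* Return the number of characters the digit-skipping fragment would
   consume at cp: 0 if there is no digit, 2 for "10".."14", else 1. */
int small_number_length(const char *cp)
{
    const char *start = cp;
    char c = *cp;

    if ('0' <= c && c <= '9') {
        cp++;
        if (c == '1' && '0' <= *cp && *cp <= '4')
            cp++;
    }
    return (int) (cp - start);
}
```

For example, "14x" consumes two characters, whereas "15" and "20" each
consume only one.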