[tz] Minor (unimportant really) technical UB bug in strftime() ?

Clive D.W. Feather clive at davros.org
Wed Nov 9 07:02:18 UTC 2022

Paul Eggert said:
> > You could replace the assignment by a memcpy. Assignment via unsigned
> > chars (which is what memcpy does) are exempt from the undefined behaviour.
> As I understand it, that exemption is for using memcpy to copy a trap 
> representation. There's no similar exemption for using memcpy to copy 
> uninitialized data; for example, the following function has undefined 
> behavior:
>     int y;
>     void f(void) { int x; memcpy(&y, &x, sizeof y); }
> If I'm right, we can't get by simply by replacing the assignment with 
> memcpy.

You're wrong, pure and simple. There's no such thing as uninitialized data
in that sense.

The following quotes are from a late draft because that's all I have to
hand right this second, but the final C99 wording was either the same or
effectively so.

       Values stored in unsigned bit-fields and objects of
       type unsigned char shall be represented using a pure binary
       notation.

Elsewhere we say that unsigned char doesn't have any padding bits, so it
holds values in the range 0 to (1<<CHAR_BIT) - 1 inclusive.

       Values stored in non-bit-field objects of any other
       object type consist of n x CHAR_BIT bits, where n is the size
       of an object of that type, in bytes. The value may be
       copied into an object of type unsigned char [n] (e.g., by
       memcpy); the resulting set of bytes is called the object
       representation of the value.

I've omitted the bit about bit-fields.

       Certain object representations need not represent a
       value of the object type. If the stored value of an object
       has such a representation and is read by an lvalue
       expression that does not have character type, the behavior
       is undefined. If such a representation is produced by a
       side effect that modifies all or any part of the object by
       an lvalue expression that does not have character type, the
       behavior is undefined. Such a representation is called a
       trap representation.

So, for any type T other than character types, some byte sequences can be
trap representations. Reading or writing a trap representation using type T
is undefined behaviour. But reading or writing it using a character type
isn't, though in the case of writing the result could be a trap
representation. Since memcpy copies byte by byte as characters, it always
has defined behaviour, whatever bytes it copies.

       indeterminate value
       either an unspecified value or a trap representation

       If an object that has automatic storage duration is
       not initialized explicitly, its value is indeterminate.

So, given,

    int x;
    int y = x;

within a block, x holds either some unspecified (valid) value or a trap
representation. If it's an unspecified value, y is set to the same value.
If it's a trap representation, you hit undefined behaviour.

In particular, if int occupies N bits (N = CHAR_BIT * sizeof(int)) and can
hold 1<<N different values, then all possible object representations are
valid and therefore x can't hold a trap representation, so the assignment
is safe.

We very explicitly wanted memcpy to be safe with uninitialized values.
That's why it's worded this way.

Clive D.W. Feather          | If you lie to the compiler,
Email: clive at davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646
