[tz] [PROPOSED 1/7] Avoid undefined behavior if no Link lines

Guy Harris gharris at sonic.net
Wed Oct 26 05:29:55 UTC 2022


On Oct 25, 2022, at 9:34 PM, Jonathan Leffler via tz <tz at iana.org> wrote:

> I see nothing in the C standard specification of qsort() that says sorting an array with zero members leads to undefined behaviour.
> 
> http://port70.net/~nsz/c/c11/n1570.html#7.22.5.2

I see nothing in C90 through C18 that explicitly say anything about passing NULL as base and 0 as nmemb.

What I do see is

	The qsort function sorts an array of nmemb objects, the initial element of which is pointed to by base. The size of each object is specified by size.

(from C18, but they all  pretty much say the same thing).

However, C18, at least, says

	An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called "array of T". The construction of an array type from an element type is called "array type derivation".

emphasis on the "nonempty"  There are cases where the array size can be omitted, such as

	extern int foo[];

or

	struct bar {
		int	nelem;
		int	elements[];	// flexible array member
	};

but, in the first case, I'm not sure "foo" could be zero-length (as it would have to be defined elsewhere, and you couldn't just have "int foo[0]"), and in the latter case, you could pass "elements" to qsort(), along with "nelem", but C18, at least, says of this case:

	As a special case, the last member of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or-> ) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

so, in that case, a pointer is generated, but the results of trying to use that pointer are undefined.  The specification for qsort() says nothing about whether, if "nmemb" is 0, any attempt is made to reference anything through "base".

> The POSIX specification for qsort() is quite clear that sorting zero elements is safe.
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/qsort.html

	If the nel argument has the value zero, the comparison function pointed to by compar shall not be called and no rearrangement shall take place.

is a POSIX addition, not present in the C standard.  (It probably *should* be in the C standard, but it isn't in anything between C90 and C18.)

So, whilst I'd trust any POSIX-compliant system to handle qsort(NULL, 0, ...) sanely, and I'd trust the MSVC qsort() to handle it sanely based on

	https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort?view=msvc-170

	"This function validates its parameters. If compare or number is NULL, or if base is NULL and number is nonzero, or if width is less than zero, the invalid parameter handler is invoked, as described in Parameter validation."

which suggests that they consider "base" being NULL and "number" being 0 to be valid and, presumably to be handled sanely, and given that the qsort() implementation is not guaranteed that any attempt to refer to anything pointed to by "base + i", for i >= nelem, will do anything nasty, thus meaning that if nelem == 0, that's "no guarantees for anything pointed to by base", I would *expect* sane behavior from all qsort() implementations.

However, the C standard doesn't, as far as I can tell, *explicitly* require sanity in the nelem == 0 case, so maybe there's some weird implementation out there that looks at *base even if nelem == 0.


More information about the tz mailing list