[tz] Tzdb and the Sunshine Protection Act

Paul Eggert eggert at cs.ucla.edu
Sat Mar 4 00:38:48 UTC 2023

On 2023-03-03 14:47, Robert Elz wrote:

>    | That could lead to problems, as Internet RFC 8536 relies on POSIX TZ
>    | format,
> If it relies upon it by reference, then it should probably start being
> updated to specify whatever it needs itself.   Just in case.

That's not strictly necessary, as the RFC specifies the POSIX version, 
so even if POSIX comes out with a new version the RFC is still valid.

When writing RFC 8536 I didn't want to duplicate the POSIX spec. I 
wanted to refer to an existing standard; that way, we could avoid errors 
that inevitably arise when duplicating, and readers could easily see 
that they can reuse their POSIX code to implement the spec. This sort of 
thing is common practice.

>    | and by lots of other downstream code.
> What kind?   I doubt that anything other than tzset() and related
> stuff ever parses a TZ string contents, though I guess someone might
> have written a TZ string -> what it means converter, to help users
> get that right.

There's a partial list at 
<https://data.iana.org/time-zones/tz-link.html#TZif>. There's plenty of 
other code like that, both to parse TZif files and to deal with other 
uses of POSIX TZ strings. A quick search reports 
for example; this is part of a Nintendo Switch emulator written in C#.

> As long as it remains in POSIX, users can keep insisting upon their
> right to use those strings, and implementations (even ones not based
> upon tzcode, which have no use for that nonsense at all) have to keep
> supporting it.

I see uses for these POSIX TZ strings, even with tzcode and tzdata. 
Here's a scenario: your government abruptly changed the DST rules and 
you don't have access to the network (or perhaps your distributor hasn't 
updated its copy of tzdata yet) and so you can't get the latest tzdata 
easily. You can work around the problem with one of these POSIX TZ 
strings. Even if your platform is built out of a bunch of different 
modules, all the code should still work because the modules all conform 
to this longstanding POSIX standard.
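
As a rough sketch of that workaround, the Python below assembles a fixed-offset POSIX TZ string from an abbreviation and a UTC offset. The helper names (quote_abbr, posix_offset, fixed_tz) are made up for illustration and are not part of tzcode or any library. The sketch covers only the form without DST, and it relies on the POSIX convention that the offset sign is inverted (positive means west of Greenwich).

```python
import re


def quote_abbr(abbr: str) -> str:
    """Return abbr in TZ-string form, adding <...> quoting when needed.

    Hypothetical helper: purely alphabetic names are used as-is; names
    containing digits, '+' or '-' need the quoted form.
    """
    if len(abbr) < 3:
        raise ValueError("POSIX requires at least 3 characters")
    if re.fullmatch(r"[A-Za-z]+", abbr):
        return abbr
    if re.fullmatch(r"[A-Za-z0-9+-]+", abbr):
        return f"<{abbr}>"
    raise ValueError(f"unsupported characters in {abbr!r}")


def posix_offset(utc_offset_minutes: int) -> str:
    """Convert minutes east of UTC to a POSIX offset (positive = west)."""
    west = -utc_offset_minutes
    sign = "-" if west < 0 else ""
    hours, minutes = divmod(abs(west), 60)
    return f"{sign}{hours}" + (f":{minutes:02d}" if minutes else "")


def fixed_tz(abbr: str, utc_offset_minutes: int) -> str:
    """Build a TZ string for a zone with one fixed offset and no DST."""
    return quote_abbr(abbr) + posix_offset(utc_offset_minutes)


if __name__ == "__main__":
    print(fixed_tz("EST", -300))    # five hours west of Greenwich
    print(fixed_tz("+1245", 765))   # 12:45 east of Greenwich
```

For instance, if a region hypothetically switched to year-round UTC-4 under the abbreviation "EDT", fixed_tz("EDT", -240) would give EDT4, which could serve as a stopgap TZ setting until updated tzdata arrives.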

I don't much like POSIX TZ strings either. However, now that we have 
them, they're useful on occasions like these, and removing them from 
POSIX would be a small benefit to implementers and a significant hassle 
for some use cases.

>    | > There is no problem with
>    | >
>    | > 	TZ='<Z>0'
>    |
>    | No there is a real problem, in current POSIX anyway, since POSIX says
>    | for this case "the std and dst fields in this case shall not include the
>    | quoting characters" ('<' and '>') and it also says that std must be at
>    | least three characters.
> Yes, but you are misinterpreting what "std" is.   That is not the abbreviation
> (or tzname, or whatever one wants to call it), it is the field of the TZ
> string in which that name is specified.

No, because POSIX says that for TZ strings "The std and dst fields in 
this case shall not include the quoting characters." In the TZ setting 
TZ='<+1245>-12:45<+1345>,M9.5.0/2:45,M4.1.0/3:45' (isn't that a *beauty* 
:-) the std field is simply "+1245", without the angle brackets.
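
To make that reading concrete, here is a minimal Python sketch that splits a POSIX TZ string into its fields and drops the quoting characters from std and dst. The regular expression is a simplification assumed for illustration (it does not validate the rule part or the offset ranges), and parse_posix_tz is a hypothetical name, not an existing API.

```python
import re

# A name is either <quoted> (alphanumerics, '+', '-') or plain letters.
_NAME = r"(?:<(?P<{0}q>[A-Za-z0-9+-]+)>|(?P<{0}u>[A-Za-z]+))"
# An offset is [+-]hh[:mm[:ss]]; ranges are not checked in this sketch.
_OFFSET = r"[+-]?\d{1,2}(?::\d{1,2}(?::\d{1,2})?)?"
_TZ = re.compile(
    "^" + _NAME.format("std") + f"(?P<stdoff>{_OFFSET})"
    + "(?:" + _NAME.format("dst") + f"(?P<dstoff>{_OFFSET})?"
    + "(?:,(?P<rule>.+))?)?$"
)


def parse_posix_tz(tz: str) -> dict:
    """Split a POSIX TZ string into fields, without quoting characters."""
    m = _TZ.match(tz)
    if m is None:
        raise ValueError(f"not a recognizable POSIX TZ string: {tz!r}")
    return {
        "std": m.group("stdq") or m.group("stdu"),
        "std_offset": m.group("stdoff"),
        "dst": m.group("dstq") or m.group("dstu"),
        "dst_offset": m.group("dstoff"),
        "rule": m.group("rule"),
    }


if __name__ == "__main__":
    fields = parse_posix_tz("<+1245>-12:45<+1345>,M9.5.0/2:45,M4.1.0/3:45")
    print(fields["std"], len(fields["std"]))
    short = parse_posix_tz("<Z>0")
    print(short["std"], len(short["std"]))
```

Under this reading the std field of '<Z>0' is the single character "Z", which falls short of the three-character minimum.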

I realize your interpretation of that wording differs. However, my 
interpretation is more plausible and better reflects existing practice.

> I consider glibc broken in that case.

macOS behaves like glibc. That's an independent code base, but evidently 
both sets of developers read POSIX the way that I'm reading it, and it'd 
be a stretch to say we're all wrong.

AIX and Solaris behave in yet a third way: they treat TZ='<Z>0' as if it 
were TZ='<Z  >0' (i.e., two spaces after the "Z").

All these behaviors conform to POSIX because POSIX doesn't specify the 
behavior when std has fewer than 3 characters.

> What does a pure (as distributed) tzcode version do in this case?

It behaves like NetBSD, which isn't surprising as NetBSD is derived from 
tzcode.

> I'd expect even more problems if the name doesn't appear at all.
> But Ubuntu seems to be surviving that

This is backwards. People don't use TZ='<Z>0' or TZ='<ET>4' because 
those usages are nonconforming and don't work in general.

If TZ='America/New_York' started saying just 'ET', that would be more 
like what the situation was when TZDB put spaces in time zone 
abbreviations. But I'd be loath to do that.

> What happens using glibc with TZ='<A>1' ?

   $ TZ='<A>1' date; TZ='<Z>0' date; date -u
   Sat Mar  4 00:35:26  2023
   Sat Mar  4 00:35:26  2023
   Sat Mar  4 00:35:26 UTC 2023

That is, both TZ settings are invalid, and in that case glibc uses 
UTC without any abbreviation (POSIX says %Z is empty when unknown). When 
NetBSD sees an invalid TZ setting it does something similar, except it 
uses the abbreviation "GMT" instead of "", and it extends POSIX in a 
different way so it has a different opinion about what is invalid. These 
behaviors all conform to POSIX since the TZ settings don't conform to POSIX.

Here's a more-outlandish example, run on NetBSD:

   $ TZ="$(awk 'BEGIN {for (i=0; i<512; i++) printf "A"; print "4"}')" date; date -u
   Sat Mar  4 00:28:09 GMT 2023
   Sat Mar  4 00:28:09 UTC 2023

Here NetBSD treats the TZ setting as invalid (a time zone abbreviation 
of 512 "A"s!) and silently substitutes GMT. Glibc treats this same 
example as specifying a 512-byte abbreviation for a time zone 4 hours 
west of Greenwich. Both behaviors conform to POSIX since the TZ string 
exceeds POSIX length limits.
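
The length limits in question can be checked mechanically. The sketch below classifies an abbreviation against the lower bound of three characters and an upper bound that defaults to 6, assumed here to be the smallest TZNAME_MAX that POSIX guarantees; real systems may allow more, and abbr_status is a hypothetical helper.

```python
# Assumption: 6 is the minimum TZNAME_MAX a POSIX system must support.
POSIX_TZNAME_MAX = 6


def abbr_status(abbr: str, tzname_max: int = POSIX_TZNAME_MAX) -> str:
    """Classify an abbreviation's length against the assumed limits."""
    if len(abbr) < 3:
        return "too short"
    if len(abbr) > tzname_max:
        return "too long"
    return "ok"


if __name__ == "__main__":
    for name in ("Z", "EST", "A" * 512):
        # Show only the first few characters of very long names.
        print(name[:8], len(name), abbr_status(name))
```

An abbreviation outside these bounds leaves the behavior up to the implementation, which is why the differing NetBSD and glibc results can both conform.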
