[tz] Tzdb and the Sunshine Protection Act
Paul Eggert
eggert at cs.ucla.edu
Sat Mar 4 00:38:48 UTC 2023
On 2023-03-03 14:47, Robert Elz wrote:
> | That could lead to problems, as Internet RFC 8536 relies on POSIX TZ
> | format,
>
> If it relies upon it by reference, then it should probably start being
> updated to specify whatever it needs itself. Just in case.
That's not strictly necessary, as the RFC specifies the POSIX version,
so even if POSIX comes out with a new version the RFC is still valid.
When writing RFC 8536 I didn't want to duplicate the POSIX spec. I
wanted to refer to an existing standard; that way, we could avoid errors
that inevitably arise when duplicating, and readers could easily see
that they can reuse their POSIX code to implement the spec. This sort of
thing is common practice.
> | and by lots of other downstream code.
>
> What kind? I doubt that anything other than tzset() and related
> stuff ever parses a TZ string contents, though I guess someone might
> have written a TZ string -> what it means converter, to help users
> get that right.
There's a partial list at
<https://data.iana.org/time-zones/tz-link.html#TZif>. There's plenty of
other code like that, both to parse TZif files and to deal with other
uses of POSIX TZ strings. A quick search reports
<https://github.com/Ryujinx/Ryujinx/blob/master/Ryujinx.HLE/HOS/Services/Time/TimeZone/TimeZone.cs>
for example; this is part of a Nintendo Switch emulator written in C#.
> As long as it remains in POSIX, users can keep insisting upon their
> right to use those strings, and implementations (even ones not based
> upon tzcode, which have no use for that nonsense at all) have to keep
> supporting it.
I see uses for these POSIX TZ strings, even with tzcode and tzdata.
Here's a scenario: your government abruptly changed the DST rules and
you don't have access to the network (or perhaps your distributor hasn't
updated its copy of tzdata yet) and so you can't get the latest tzdata
easily. You can work around the problem with one of these POSIX TZ
strings. Even if your platform is built out of a bunch of different
modules, all the code should still work because they all conform to this
longstanding POSIX standard.
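To make that scenario concrete, here is a minimal Python sketch, not anything taken from tzcode. It assumes a Unix-like platform where Python provides time.tzset(). The helper name local_view and the rule string are hypothetical; the rule string is just an example of the POSIX TZ syntax, and no tzdata lookup is involved.

```python
import calendar
import os
import time


def local_view(tz_string, utc_tuple):
    """Return (hour, is_dst) for a UTC instant as seen under a POSIX TZ string.

    Hypothetical helper for illustration; relies on time.tzset(), which
    Python only provides on Unix-like platforms.
    """
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz_string
    time.tzset()  # ask the C library to re-read TZ
    try:
        stamp = calendar.timegm(utc_tuple)  # seconds since the epoch, UTC
        local = time.localtime(stamp)
        return local.tm_hour, local.tm_isdst
    finally:
        # Restore the previous setting so the rest of the process is unaffected.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()


# Assumed rule set: UTC-5 standard time, DST from the second Sunday in March
# to the first Sunday in November.
RULES = "EST5EDT,M3.2.0,M11.1.0"
print(local_view(RULES, (2023, 7, 1, 12, 0, 0)))   # a summer instant
print(local_view(RULES, (2023, 1, 15, 12, 0, 0)))  # a winter instant
```

If the rules changed abruptly, only the RULES string would need editing; everything that honors TZ in the POSIX format should pick up the new rules.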
I don't much like POSIX TZ strings either. However, now that we have
them, they're useful on occasions like these, and removing them from
POSIX would be a small benefit to implementers and a significant hassle
for some use cases.
> | > There is no problem with
> | >
> | > TZ='<Z>0'
> |
> | No there is a real problem, in current POSIX anyway, since POSIX says
> | for this case "the std and dst fields in this case shall not include the
> | quoting characters" ('<' and '>') and it also says that std must be at
> | least three characters.
>
> Yes, but you are misinterpreting what "std" is. That is not the abbreviation
> (or tzname, or whatever one wants to call it), it is the field of the TZ
> string in which that name is specified.
No, because POSIX says that for TZ strings "The std and dst fields in
this case shall not include the quoting characters." In the TZ setting
TZ='<+1245>-12:45<+1345>,M9.5.0/2:45,M4.1.0/3:45' (isn't that a *beauty*
:-) the std field is simply "+1245", without the angle brackets.
I realize your interpretation of that wording differs. However, my
interpretation is more plausible and better reflects existing practice.
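As a rough illustration of the reading I'm describing, here is a small Python sketch that extracts the std field with the quoting characters stripped and then applies the three-character minimum to that field. The function names are hypothetical, the sketch ignores implementation-defined TZ values that begin with a colon, and it is not how glibc or tzcode actually parse the string.

```python
def std_field(tz):
    """Return the std field of a POSIX-style TZ string, without any quoting."""
    if tz.startswith("<"):
        end = tz.find(">")
        if end == -1:
            raise ValueError("unterminated '<' in TZ string")
        # Quoted form: the field is what lies between '<' and '>'.
        return tz[1:end]
    # Unquoted form: the field is the leading run of alphabetic characters.
    i = 0
    while i < len(tz) and tz[i].isalpha():
        i += 1
    return tz[:i]


def meets_minimum_length(field, minimum=3):
    """Check the 'at least three characters' requirement on the bare field."""
    return len(field) >= minimum


for setting in ("<+1245>-12:45<+1345>,M9.5.0/2:45,M4.1.0/3:45", "<Z>0"):
    field = std_field(setting)
    print(repr(field), meets_minimum_length(field))
```

Under this reading the first setting has a five-character std field and the second has a one-character std field, which is why '<Z>0' falls outside what POSIX requires implementations to accept.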
> I consider glibc broken in that case.
macOS behaves like glibc. That's an independent code base, but evidently
both sets of developers read POSIX the way that I'm reading it, and it'd
be a stretch to say we're all wrong.
AIX and Solaris behave in yet a third way: they treat TZ='<Z>0' as if it
were TZ='<Z  >0' (i.e., two spaces after the "Z").
All these behaviors conform to POSIX because POSIX doesn't specify the
behavior when std has fewer than 3 characters.
> What does a pure (as distributed) tzcode version do in this case?
It behaves like NetBSD, which isn't surprising as NetBSD is derived from
tzcode.
> I'd expect even more problems if the name doesn't appear at all.
> But Ubuntu seems to be surviving that
This is backwards. People don't use TZ='<Z>0' or TZ='<ET>4' because
those usages are nonconforming and don't work in general.
If TZ='America/New_York' started saying just 'ET', that would be more
like what the situation was when TZDB put spaces in time zone
abbreviations. But I'd be loath to do that.
> What happens using glibc with TZ='<A>1' ?
$ TZ='<A>1' date; TZ='<Z>0' date; date -u
Sat Mar 4 00:35:26 2023
Sat Mar 4 00:35:26 2023
Sat Mar 4 00:35:26 UTC 2023
That is, both TZ settings are invalid, and in that case glibc uses
UTC without any abbreviation (POSIX says %Z is empty when unknown). When
NetBSD sees an invalid TZ setting it does something similar, except it
uses the abbreviation "GMT" instead of "", and it extends POSIX in a
different way so it has a different opinion about what is invalid. These
behaviors all conform to POSIX since the TZ settings don't conform to POSIX.
Here's a more-outlandish example, run on NetBSD:
$ TZ="$(awk 'BEGIN {for (i=0; i<512; i++) printf "A"; print "4"}')" \
    date; date -u
Sat Mar 4 00:28:09 GMT 2023
Sat Mar 4 00:28:09 UTC 2023
Here NetBSD treats the TZ setting as invalid (a time zone abbreviation
of 512 "A"s!) and silently substitutes GMT. Glibc treats this same
example as specifying a 512-byte abbreviation for a time zone 4 hours
west of Greenwich. Both behaviors conform to POSIX since the TZ string
exceeds POSIX length limits.
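For anyone who wants to check a TZ string against the portable limits before relying on it, here is a short Python sketch. The constants are assumptions on my part: 3 is the minimum abbreviation length discussed above, and 6 is the value of _POSIX_TZNAME_MAX, the smallest TZNAME_MAX an implementation may have (a particular system may allow more). The function names are hypothetical and only the unquoted form is handled.

```python
import re

PORTABLE_MIN = 3  # minimum abbreviation length discussed above
PORTABLE_MAX = 6  # assumed: _POSIX_TZNAME_MAX; a system's TZNAME_MAX may be larger


def unquoted_abbreviation(tz):
    """Return the leading alphabetic run of an unquoted TZ string."""
    match = re.match(r"[A-Za-z]+", tz)
    return match.group(0) if match else ""


def within_portable_limits(abbr):
    """True if the abbreviation length is one every conforming system must accept."""
    return PORTABLE_MIN <= len(abbr) <= PORTABLE_MAX


# Same construction as the awk command above: 512 "A"s followed by "4".
tz = "A" * 512 + "4"
abbr = unquoted_abbreviation(tz)
print(len(abbr), within_portable_limits(abbr))
```

A string that fails this check is not necessarily rejected by any given platform; as the NetBSD and glibc results show, each implementation is free to do something different with it.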