[tz] Tzdb and the Sunshine Protection Act
Paul Eggert
eggert at cs.ucla.edu
Fri Mar 3 18:11:12 UTC 2023
On 2023-03-03 00:45, Robert Elz wrote:
> The latest draft (currently available) of the forthcoming
> standard is clearer, it adds:
>
> If the dst field is specified and the rule field is not, it is
> implementation-defined when the changes to and from Daylight
> Saving Time occur.
Thanks, I didn't know that. In other words, in the current POSIX
standard TZ='EST5EDT' has unspecified behavior, whereas in the draft
next POSIX standard TZ='EST5EDT' has partly specified behavior: the
implementation must shuttle back and forth between standard time and
DST, but on a schedule of its own choosing.
If I understand things correctly, the draft allows for more than two
transitions per year, e.g., one for Ramadan and another for summer, as
Morocco used to do. (Or is this really required? Could an implementation
use permanent standard time, or permanent DST? It's not clear from the
text you quoted.)
> Also note, that sometime in the future this "POSIX TZ" format will almost
> certainly be deprecated, and then removed.
That could lead to problems, as Internet RFC 8536 relies on POSIX TZ
format, and the format is embedded in the TZif files interpreted by
tzcode and by lots of other downstream code. For example, on the Ubuntu
workstation I'm typing this message on, /usr/share/zoneinfo/Europe/Paris
contains the string 'CET-1CEST,M3.5.0,M10.5.0/3' and glibc uses this
string to process future timestamps.
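
To make that dependency concrete, here is a minimal Python sketch of how a
reader might pull the string out of TZif data. It assumes the RFC 8536
layout for version 2 and later, where the footer is the TZ string enclosed
between two newlines at the very end of the file. The function name
tzif_footer is made up for illustration and is not part of tzcode or glibc.

```python
def tzif_footer(data: bytes) -> str:
    """Return the POSIX-style TZ string from the footer of TZif data."""
    if data[:4] != b"TZif":
        raise ValueError("missing TZif magic")
    if data[4:5] == b"\0":
        # Version 1 files stop after the data block and carry no footer.
        raise ValueError("version 1 TZif data has no footer")
    if not data.endswith(b"\n"):
        raise ValueError("footer is not newline-terminated")
    # The footer is NL, TZ string, NL; find the NL that opens it.
    start = data.rfind(b"\n", 0, len(data) - 1)
    if start < 0:
        raise ValueError("footer has no opening newline")
    return data[start + 1 : -1].decode("ascii")
```

On a system with tzdata installed, passing the bytes of
/usr/share/zoneinfo/Europe/Paris should yield the string quoted above,
assuming the file is version 2 or later.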
I suppose if POSIX stops specifying strings like this, we could move the
spec to the successor of RFC 8536. But what would be the point? Every
tzcode-like implementation would still need to parse such strings, and
there seems little point to deprecating the exposure of that parser to
the user.
> I believe there is now general
> acceptance that it is simply inadequate for real world timezones
Yes, it's certainly inadequate if the goal is to represent all
timestamps since 1970. However, it's useful for specific use cases,
e.g., if you care only about timestamps now and in the future (this is
how TZif files use it). So it would make sense to keep it in POSIX, to
support those use cases.
> The current draft also contains this:
>
> Daylight Saving Time is in effect all year if it starts
> January 1 at 00:00 and ends December 31 at 24:00 plus the
> difference between Daylight Saving Time and standard time,
> leaving no room for standard time in the calendar. For
> example, TZ='EST5EDT,0/0,J365/25' represents a time zone that
> observes Daylight Saving Time all year, being 4 hours west of UTC
> with abbreviation "EDT".
Yes, as I recall this was put in at my suggestion, before Michael
Deckers pointed out on this list that this draft change to POSIX is not
necessary. Instead of TZ='EST5EDT,0/0,J365/25' you can use
TZ='XXX3EDT4,0/0,J365/23' which conforms to current POSIX, so there's no
need for the draft POSIX change (though it doesn't hurt, I suppose, and
Internet RFC 8536 refers to it...).
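
The arithmetic behind both strings can be checked with a short Python
sketch. It assumes the usual POSIX reading of the rule field: the start
time is measured in standard time, the end time in DST, day 0 is January 1
and J365 is December 31. The helper names dst_window_utc and
standard_time_gap are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def dst_window_utc(year, std_west, dst_west, end_hour):
    """UTC instants where DST starts and ends under ',0/0,J365/<end_hour>'.

    std_west and dst_west are POSIX-style offsets: hours to add to local
    time to get UTC (positive west of Greenwich).
    """
    jan1 = datetime(year, 1, 1, tzinfo=timezone.utc)
    dec31 = datetime(year, 12, 31, tzinfo=timezone.utc)
    # Start rule '0/0': day 0 (January 1) at 00:00, measured in standard time.
    start = jan1 + timedelta(hours=std_west)
    # End rule 'J365/<end_hour>': December 31, measured in DST.
    end = dec31 + timedelta(hours=end_hour + dst_west)
    return start, end

def standard_time_gap(year, std_west, dst_west, end_hour):
    """Time on standard time between this year's DST end and next year's start."""
    _, end = dst_window_utc(year, std_west, dst_west, end_hour)
    next_start, _ = dst_window_utc(year + 1, std_west, dst_west, end_hour)
    return next_start - end

print(standard_time_gap(2023, 5, 4, 25))  # EST5EDT,0/0,J365/25
print(standard_time_gap(2023, 3, 4, 23))  # XXX3EDT4,0/0,J365/23
```

Under these assumptions both gaps come out to zero: each year's DST ends
at the same UTC instant at which the next year's DST begins, so standard
time never gets a turn.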
> There is no problem with
>
> TZ='<Z>0'
No, there is a real problem, in current POSIX anyway, since POSIX says
for this case "the std and dst fields in this case shall not include the
quoting characters" ('<' and '>') and it also says that std must be at
least three characters.
This is not just a standard-lawyer quibble. Real-world software breaks
if you set TZ='<Z>0'. For example, on current Ubuntu:
$ TZ='<Z>0' date
Fri Mar 3 17:49:16 2023
with no Z in the output anywhere. This Ubuntu behavior conforms to POSIX
since POSIX doesn't say what to do with nonconforming strings like '<Z>0'.
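
The length rule is easy to test mechanically. The following Python sketch
checks only the std field at the start of a POSIX-form TZ string, under
my reading of current POSIX: the unquoted form is letters only, the quoted
form allows letters, digits, '+' and '-' between '<' and '>', either form
needs at least three characters, and an offset must follow. The name
posix_std_abbreviation is made up for illustration, and the sketch does
not validate the rest of the string or enforce TZNAME_MAX.

```python
import re

# std field (quoted or unquoted, three or more characters), which must be
# followed by the start of an offset: an optional sign and a digit.
_STD = re.compile(r"(?:<([A-Za-z0-9+-]{3,})>|([A-Za-z]{3,}))(?=[+-]?\d)")

def posix_std_abbreviation(tz: str):
    """Return the std abbreviation, or None if the std field does not conform."""
    m = _STD.match(tz)
    if m is None:
        return None
    # Group 1 is the quoted form without '<' and '>'; group 2 is unquoted.
    return m.group(1) or m.group(2)

for tz in ("EST5EDT", "<-03>3", "<Z>0", "<ET>4"):
    print(repr(tz), "->", posix_std_abbreviation(tz))
```

Here 'EST5EDT' and '<-03>3' pass, while '<Z>0' and '<ET>4' are rejected
because the quoted abbreviation is shorter than three characters.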
> This 3 char rule also applies only to POSIX form TZ strings, the zone names
> specified by tzdata format TZ specifications (or whatever other provider of
> timezone data an implementation chooses to use) have no such restriction.
I suppose you're right about that, if it's merely an issue of conforming
to POSIX. That is, in theory TZ='Europe/Paris' can use whatever time
zone abbreviation we like (including the empty string, or a string
containing newlines :-).
Still, I hesitate to depart from the POSIX form, as too much software
expects it. TZDB used to depart from the POSIX form, in that 'date' and
'strftime' %Z would sometimes expand to strings containing spaces.
However, this led to downstream trouble, in that parsers of the output
of 'date' and 'strftime' got confused. I would not be surprised if we
encountered similar problems with time zone abbreviations containing
fewer than 3 characters, for reasons similar to why Ubuntu 'date' does
not do what you want with TZ='<Z>0' or with TZ='<ET>4'.