[tz] Tzdb and the Sunshine Protection Act

Fri Mar 3 18:11:12 UTC 2023

On 2023-03-03 00:45, Robert Elz wrote:

> The latest draft (currently available) of the forthcoming
> standard is clearer, it adds:
> 
> 	If the dst field is specified and the rule field is not, it is
> 	implementation-defined when the changes to and from Daylight
> 	Saving Time occur.

Thanks, I didn't know that. In other words, in the current POSIX 
standard TZ='EST5EDT' has unspecified behavior, whereas in the draft 
next POSIX standard TZ='EST5EDT' has partly specified behavior: the 
implementation need only shuttle back and forth between standard 
time and DST on some schedule of its choosing.

If I understand things correctly, the draft allows for more than two 
transitions per year, e.g., one for Ramadan and another for summer as 
Morocco used to do. (Or is shuttling really required? Could an 
implementation use permanent standard time, or permanent DST? It's not 
clear from the text you quoted.)

> Also note, that sometime in the future this "POSIX TZ" format will almost
> certainly be deprecated, and then removed.

That could lead to problems, as Internet RFC 8536 relies on POSIX TZ 
format, and the format is embedded in the TZif files interpreted by 
tzcode and by lots of other downstream code. For example, on the Ubuntu 
workstation I'm typing this message on, /usr/share/zoneinfo/Europe/Paris 
contains the string 'CET-1CEST,M3.5.0,M10.5.0/3' and glibc uses this 
string to process future timestamps.
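
To make the TZif dependency concrete, here is a minimal sketch of how a 
reader could pull that string out of a version 2 or later TZif file. It 
assumes the RFC 8536 layout, where the footer is a newline, the TZ 
string, and a final newline at the very end of the file. The function 
name tzif_footer is made up for illustration; it is not part of tzcode 
or glibc.

```python
def tzif_footer(data: bytes) -> str:
    """Return the TZ string stored in the footer of version 2+ TZif data."""
    if data[:4] != b"TZif":
        raise ValueError("not TZif data")
    if data[4:5] == b"\x00":
        # Version 1 files carry no footer at all.
        raise ValueError("version 1 TZif data has no footer")
    if not data.endswith(b"\n"):
        raise ValueError("missing footer terminator")
    # The footer is NL, TZ string, NL. The TZ string contains no newline,
    # so the last newline before the final one marks where it starts.
    start = data.rfind(b"\n", 0, len(data) - 1)
    if start < 0:
        raise ValueError("missing footer start")
    return data[start + 1 : -1].decode("ascii")
```

Feeding it the bytes of /usr/share/zoneinfo/Europe/Paris on a system 
like the one above should yield the string quoted in this message, 
though the exact contents depend on the installed tzdata.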

I suppose if POSIX stops specifying strings like this, we could move the 
spec to the successor of RFC 8536. But what would be the point? Every 
tzcode-like implementation would still need to parse such strings, and 
there seems little point to deprecating the exposure of that parser to 
the user.

> I believe there is now general
> acceptance that it is simply inadequate for real world timezones

Yes, it's certainly inadequate if the goal is to represent all 
timestamps since 1970. However, it's useful for specific use cases, 
e.g., if you care only about timestamps now and in the future (this is 
how TZif files use it). So it would make sense to keep it in POSIX, to 
support those use cases.

> The current draft also contains this:
> 
> 	Daylight Saving Time is in effect all year if it starts
> 	January 1 at 00:00 and ends December 31 at 24:00 plus the
> 	difference between Daylight Saving Time and standard time,
> 	leaving no room for standard time in the calendar. For
> 	example, TZ='EST5EDT,0/0,J365/25' represents a time zone that
> 	observes Daylight Saving Time all year, being 4 hours west of UTC
> 	with abbreviation "EDT".

Yes, as I recall this was put in at my suggestion, before Michael 
Deckers pointed out on this list that this draft change to POSIX is not 
necessary. Instead of TZ='EST5EDT,0/0,J365/25' you can use 
TZ='XXX3EDT4,0/0,J365/23' which conforms to current POSIX, so there's no 
need for the draft POSIX change (though it doesn't hurt, I suppose, and 
Internet RFC 8536 refers to it...).
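
The arithmetic behind the two strings can be checked with a short 
sketch. It is not a TZ parser: the offsets and transition hours are 
passed in by hand, and the name dst_span_utc is invented for this 
example. It relies on the POSIX rule that a transition time is given in 
the local time in effect just before the transition, so the start is in 
standard time and the end is in DST.

```python
from datetime import datetime, timedelta, timezone

def dst_span_utc(year, std_west_h, dst_west_h, end_hour):
    """UTC instants at which DST starts and ends in `year` for a rule of
    the form ',0/0,J365/<end_hour>' (start: zero-based day 0 at 00:00)."""
    jan1 = datetime(year, 1, 1, tzinfo=timezone.utc)
    dec31 = datetime(year, 12, 31, tzinfo=timezone.utc)  # J365 is always Dec 31
    # Local time plus hours west of UTC gives UTC.
    start = jan1 + timedelta(hours=std_west_h)             # start is in standard time
    end = dec31 + timedelta(hours=end_hour + dst_west_h)   # end is in DST
    return start, end

for name, std, dst, end_hour in [
    ("EST5EDT,0/0,J365/25", 5, 4, 25),
    ("XXX3EDT4,0/0,J365/23", 3, 4, 23),
]:
    for year in (2023, 2024):
        _, end = dst_span_utc(year, std, dst, end_hour)
        next_start, _ = dst_span_utc(year + 1, std, dst, end_hour)
        # DST ends exactly when next year's DST begins: no standard time.
        assert end == next_start, name
```

In both cases DST is 4 hours west of UTC with abbreviation "EDT"; the 
second string gets there by declaring a never-used standard time "XXX" 
one hour east of EDT, which lets the ending hour stay at 23.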

> There is no problem with
> 
> 	TZ='<Z>0'

No, there is a real problem, in current POSIX anyway, since POSIX says 
for this case "the std and dst fields in this case shall not include the 
quoting characters" ('<' and '>') and it also says that std must be at 
least three characters.

This is not just a standard-lawyer quibble. Real-world software breaks 
if you set TZ='<Z>0'. For example, on current Ubuntu:

   $ TZ='<Z>0' date
   Fri Mar  3 17:49:16  2023

with no Z in the output anywhere. This Ubuntu behavior conforms to POSIX 
since POSIX doesn't say what to do with nonconforming strings like '<Z>0'.
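
As a rough illustration of the rule, here is a sketch of a check for the 
std or dst field of a POSIX TZ string. It assumes the field has already 
been split off from the offset, it ignores the TZNAME_MAX upper bound, 
and the name posix_abbrev_ok is hypothetical.

```python
import re

# Unquoted form: letters only. Quoted form: alphanumerics, '+' and '-'
# between '<' and '>'. Either way the name itself needs at least three
# characters, and the quoting characters do not count toward that.
_UNQUOTED = re.compile(r"[A-Za-z]{3,}")
_QUOTED = re.compile(r"<([A-Za-z0-9+-]{3,})>")

def posix_abbrev_ok(field: str) -> bool:
    """Return True if `field` looks like a conforming std or dst field."""
    return bool(_UNQUOTED.fullmatch(field) or _QUOTED.fullmatch(field))

assert posix_abbrev_ok("EST")
assert posix_abbrev_ok("<+03>")
assert not posix_abbrev_ok("<Z>")   # only one character inside the quotes
```

A check like this says only whether the string conforms; what an 
implementation does with a nonconforming string is up to it.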

> This 3 char rule also applies only to POSIX form TZ strings, the zone names
> specified by tzdata format TZ specifications (or whatever other provider of
> timezone data an implementation chooses to use) have no such restriction.

I suppose you're right about that, if it's merely an issue of conforming 
to POSIX. That is, in theory TZ='Europe/Paris' can use whatever time 
zone abbreviation we like (including the empty string, or a string 
containing newlines :-).

Still, I hesitate to depart from the POSIX form, as too much software 
expects it. TZDB used to depart from the POSIX form, in that 'date' and 
'strftime' %Z would sometimes expand to strings containing spaces. 
However, this led to downstream trouble, in that parsers of the output 
of 'date' and 'strftime' got confused. I would not be surprised if we 
encountered similar problems with time zone abbreviations containing 
fewer than 3 characters, for reasons similar to why Ubuntu 'date' does 
not do what you want with TZ='<Z>0' or with TZ='<ET>4'.
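
To show the kind of downstream confusion I mean, here is a sketch of a 
hypothetical parser (parse_date_output is an invented name, not real 
software) that assumes the default 'date' output has six 
whitespace-separated fields. An empty abbreviation, or one containing a 
space, changes the field count and the parser gives up.

```python
def parse_date_output(line: str) -> dict:
    """Hypothetical downstream parser: expects exactly six fields, e.g.
    'Fri Mar  3 17:49:16 UTC 2023'."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    weekday, month, day, clock, abbrev, year = fields
    return {"abbrev": abbrev, "year": int(year)}

# Works for a conventional abbreviation.
assert parse_date_output("Fri Mar  3 17:49:16 UTC 2023")["abbrev"] == "UTC"

# Fails on the TZ='<Z>0' output shown earlier, where the abbreviation
# came out empty and only five fields remain.
try:
    parse_date_output("Fri Mar  3 17:49:16  2023")
except ValueError as err:
    print(err)
```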