[tz] Tzdb and the Sunshine Protection Act

Robert Elz kre at munnari.OZ.AU
Fri Mar 3 08:45:49 UTC 2023


    Date:        Thu, 2 Mar 2023 15:49:09 -0800
    From:        Paul Eggert via tz <tz at iana.org>
    Message-ID:  <555133d4-2fcc-0a25-e4a0-1ad0a569e661 at cs.ucla.edu>

  | POSIX does not specify TZ strings like TZ='EST5EDT'; they are TZDB 
  | extensions. If you want a TZ string whose meaning is specified, you need 
  | something like TZ='EST5EDT,M3.2.0,M11.1.0'.

That's not correct; otherwise TZ=UTC0 would not be POSIX, and it
certainly is (there's obviously no way to specify summer time, or the
times when it begins and ends, for UTC).

  | You can see this by looking a few lines before the lines you quoted, 
  | which say that a TZ string contents are "std offset dst offset, rule". 

That's just a generic hint; the actual specification comes later (and is
unchanged in the most recent drafts, other than the reference to "for all
TZs whose..." which has been altered to allow tzdata type names as well).
It includes:

	The expanded format (for all TZs whose value does not have a
	<colon> as the first character) is as follows:

	stdoffset[dst[offset][,start[/time],end[/time]]]

	Where:

	std and dst  Indicate no less than three, nor more than {TZNAME_MAX},
	             bytes that are the designation for the standard (std) or
		     the alternative (dst--such as Daylight Savings Time)
		     timezone. Only std is required; if dst is missing, then
		     the alternative time does not apply in this locale.

That makes it quite clear that XXXn is all that is needed for a "POSIX" TZ
specification, though anything more that is included must meet the required
syntax.
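
As a rough illustration of that grammar, here is a minimal Python sketch
that splits a TZ value into the fields named above.   The names
split_posix_tz, NAME, OFFSET and TZ_RE are made up for this example.   It
is deliberately simplified: only unquoted alphabetic designations are
accepted, {TZNAME_MAX} is not enforced, and the rule is not checked
beyond "two comma separated parts".

```python
import re

# Unquoted designation: three or more alphabetic characters.
# (Simplification: no quoted form, no {TZNAME_MAX} upper limit.)
NAME = r"[A-Za-z]{3,}"
# Offset: optional sign, then hh[:mm[:ss]].
OFFSET = r"[+-]?\d{1,2}(?::\d{2}(?::\d{2})?)?"

# stdoffset[dst[offset][,start[/time],end[/time]]]
TZ_RE = re.compile(
    rf"(?P<std>{NAME})(?P<offset>{OFFSET})"
    rf"(?:(?P<dst>{NAME})(?P<dstoffset>{OFFSET})?"
    rf"(?:,(?P<rule>[^,]+,[^,]+))?)?"
)


def split_posix_tz(value):
    """Return a dict of the fields, or None if value does not fit the shape."""
    if value.startswith(":"):
        return None  # a leading <colon> means it is not the expanded format
    m = TZ_RE.fullmatch(value)
    return m.groupdict() if m else None


print(split_posix_tz("UTC0"))
print(split_posix_tz("EST5EDT"))
print(split_posix_tz("EST5EDT,M3.2.0,M11.1.0"))
```

With this sketch "UTC0" and "EST5EDT" both fit the shape (the latter with
no rule field), while something like "America/New_York" does not and
gives None.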

The current standard doesn't say what to do when "dst" is given but the
rule (everything after the first comma) is not - making it "implicitly"
unspecified.   The latest draft (currently available) of the forthcoming
standard is clearer; it adds:

	If the dst field is specified and the rule field is not, it is
	implementation-defined when the changes to and from Daylight
	Saving Time occur.

It shouldn't say "Daylight Saving Time" there: while the symbol in the
grammar is "dst", it is otherwise described as "alternative timezone" and
should be here as well.   That may have already been fixed; if not, it
will be before the new version of the standard gets published.

Also note that sometime in the future this "POSIX TZ" format will almost
certainly be deprecated, and then removed.   I believe there is now general
acceptance that it is simply inadequate for real world timezones, other than
the simplest ones - and is never able to describe anything other than a
single shift to & from a single alternative timezone in one year.   The
"single shift" limit could be fixed by allowing more than one set of start,end
pairs - similarly, more XXXn fields could be added to specify more
different zone offsets than just the two currently possible - but I very much
doubt that anyone is going to work out how to specify that, particularly
not if no-one is stupid enough to try to implement some version of this.

The current draft also contains this:

	Daylight Saving Time is in effect all year if it starts
	January 1 at 00:00 and ends December 31 at 24:00 plus the
	difference between Daylight Saving Time and standard time,
	leaving no room for standard time in the calendar. For
	example, TZ='EST5EDT,0/0,J365/25' represents a time zone that
	observes Daylight Saving Time all year, being 4 hours west of UTC
	with abbreviation "EDT".

which suggests an intent to be able to support "permanent summer time",
though that complicated mess achieves little more than

	TZ=EDT4

except for the value of tm_isdst, which as you mentioned in an earlier
message really has no effect on anything - it was more or less an index
into the tzname[] array, which it doesn't do well at, as while that array
has no defined upper bound on its index, tm_isdst is only permitted to be
0 or 1.   That's all now largely obsoleted (though not yet in the standard)
by tm_zone (and tm_gmtoff) (which will be in the standard).
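
For anyone who wants to look at those fields, here is a small Python
sketch (Unix only, since it relies on time.tzset()).   The helper name
describe is invented for this example, and what comes back depends
entirely on the C library underneath.

```python
import os
import time


def describe(tz_value, when=0):
    """Set TZ for this process and return (tm_zone, tm_gmtoff, tm_isdst)."""
    os.environ["TZ"] = tz_value
    time.tzset()                 # re-read TZ; not available on Windows
    tm = time.localtime(when)    # 'when' is seconds since the epoch
    return tm.tm_zone, tm.tm_gmtoff, tm.tm_isdst


# POSIX offsets are positive west of UTC, so "4" means tm_gmtoff is -4 hours.
print(describe("EDT4"))
```

On a typical system TZ=EDT4 should report the abbreviation "EDT", an
offset of -14400 seconds, and tm_isdst 0.   Passing the all year form
from the draft instead allows a comparison, though whether a given
library accepts a time of 25 hours in the rule is not something this
sketch assumes.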

In this regard note that the standard already says:

	Implementations are encouraged to use the time zone database
	maintained by IANA to determine when Daylight Saving Time changes
	occur and to handle TZ values that start with a <colon>. See RFC 6557.

That is, it has already been noted that POSIX TZ isn't really good enough.
In the next draft, that will be altered to say

	Implementations are encouraged to incorporate the IANA timezone
	database into the timezone database used for TZ values specifying
	geographical and special timezones, and to provide a way to update
	it in accordance with RFC 6557.

POSIX TZ strings are on their way to oblivion, fortunately.   However, while
they remain (which will be at least until the next major version of the
standard after the coming one - ie: at least another decade) the specification
is that if the TZ value can be interpreted as a valid POSIX TZ string, then
that is what it is.   If that fails - which it will for anything which does
not start xxxxN (at least 3 chars in the xxxx field), but can for many other
reasons as well - then it is to be interpreted (if possible) as a geographic/
special TZ string (eg: as a tzdata zone name).


And while I'm here, in an earlier message you said:

	| If common practice becomes "ET" we couldn't use that,
	| unfortunately, as POSIX requires at least three characters.

That's also incorrect.   It is true that to use a POSIX TZ string, in
the form normally seen in the wild, like

	TZ=UTC0

(as above) the "std" (and "dst" field if given) must be at least 3 chars.

But that field is allowed to be in what POSIX calls "quoted" format, where
the first char is '<' and the last is '>'.   Those two count in the
required 3 chars, but are not part of the name created (the minimum is three
so that in quoted form, there is at least one meaningful character remaining;
TZ='<>0' isn't valid).

There is no problem with

	TZ='<Z>0'

if you want to set "zulu" time.   That has 3 chars of "std", but the quoting
chars aren't part of the tzname defined, leaving just "Z".
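
The counting rule as described here can be sketched in a few lines of
Python.   The function name designation is made up for this example, and
it only models the reading given above (the quoting chars count toward
the minimum of 3); it does not check which characters are allowed in the
name, and actual C libraries may apply the minimum differently.

```python
def designation(field):
    """Return the tzname that a std or dst field defines, or None if too short."""
    if len(field) < 3:
        return None                  # fewer than 3 chars, quoted or not
    if field.startswith("<") and field.endswith(">"):
        return field[1:-1]           # the quoting chars are not part of the name
    return field


for field in ("UTC", "<Z>", "<ET>", "ET", "<>"):
    print(repr(field), "->", repr(designation(field)))
```

Here "<Z>" gives the name "Z" and "<ET>" gives "ET", while the unquoted
"ET" and the empty "<>" are both rejected.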

This 3 char rule also applies only to POSIX form TZ strings; the zone names
specified by tzdata format TZ specifications (or whatever other provider of
timezone data an implementation chooses to use) have no such restriction.
There's no reason at all tzdata could not use "ET" if it wanted to.   Even now
it really makes more sense to call what is currently EST and EDT just "ET";
all anyone really cares about is that it is eastern (US) time.   (USET would be
better, as other places have an "east" too, and some of them have timezones
that apply in their eastern areas - and that is > 3 chars...)

Even a POSIX TZ string can handle that

	TZ='<ET>5<ET>4,whatever'

should work on any conforming implementation, right now (with a suitable
value filled in for "whatever" of course, or with it and its preceding
comma omitted - in which case the implementation is expected to supply
the rule for when the switch occurs, but nothing, anywhere, requires that
rule to be in any way consistent with any actual timezone on the planet,
or to supply any actual switch times at all).

kre


