[tz] data not represented in tzfiles

Fri Sep 6 15:38:19 UTC 2013

Paul Eggert wrote:
>         It increases the window from 400 to 402
>years.  Is that part of the change needed?

Yes, that's an essential part of the robustification.

>                                            As I understand
>it, it's to avoid coalescing (say) a 399-year run
>of a rule to an adjacent one-off that *happens*
>to look like the extension of the rule.

You misunderstand it.  The situation of concern is where the 400 year
period governed by a rule is immediately preceded by something that is
*different from* what the rule would have produced.  If the preceding DST behaviour
has transitions later in the year than the transitions produced by the
rule, the last preceding transitions could occur less than 400 years
before the last rule-generated transition that zic puts in the output.

For example, suppose you have a rule (not expressible in POSIX form)
applying from 2012 onwards that has transitions in March and September,
plus some one-off transitions in November 2013.  zic sets max_year=2413,
with the intent that the 400 years 2014 to 2413 inclusive will be
repeated.  The last transition listed in the tzfile is in 2413-09,
less than 400 years after the last one-off transition in 2013-11.
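
As a quick check of the arithmetic in this example, here is a minimal
Python sketch.  It models transitions only at (year, month) granularity,
which is an assumption made for illustration; a real tzfile stores
second-resolution timestamps.  The helper name months() is hypothetical.

```python
def months(year, month):
    """Month index, so that differences are simple subtraction."""
    return year * 12 + (month - 1)

last_one_off = months(2013, 11)   # one-off transitions in November 2013
last_rule = months(2413, 9)       # last rule transition written (max_year=2413)

gap = last_rule - last_one_off
full_cycle = 400 * 12

print(gap, "months between the last one-off and the last transition")
print(full_cycle, "months in a full 400-year cycle")
print("gap shorter than a cycle:", gap < full_cycle)
```

The gap comes out two months short of a full 400-year cycle.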

The problem is how a tzfile reader is to determine which 400-year
period to repeat.  When creating the tzfile, zic internally has a
clear idea of what period it intends to be repeated, but it doesn't
write anything into the tzfile to make that period explicit.  Indeed,
it doesn't even explicitly indicate that it intends any such repetition.
(Adding a flag for this, in the 15 reserved octets, might be a good idea.)
The nearest thing the tzfile has to an explicit statement of the repeat
period is the last transition time in the file; an obvious approach is
to repeat the 400 years immediately preceding the last transition time.
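
The "obvious approach" can be sketched as follows, applied to the
example above.  This is an illustration under stated assumptions, not
code from any actual tzfile reader: transitions are reduced to month
indices, and the function names are hypothetical.

```python
CYCLE_MONTHS = 400 * 12

def months(year, month):
    """Month index, so that differences are simple subtraction."""
    return year * 12 + (month - 1)

def example_transitions(max_year=2413):
    """Transitions from the example: March and September of every year
    from 2012 to max_year, plus a one-off in November 2013."""
    ts = [months(y, m) for y in range(2012, max_year + 1) for m in (3, 9)]
    ts.append(months(2013, 11))
    return sorted(ts)

def naive_repeat_window(transitions):
    """Transitions within the 400 years ending at the last transition,
    which a naive reader would take as the cycle to repeat."""
    last = transitions[-1]
    return [t for t in transitions if last - CYCLE_MONTHS < t <= last]

window = naive_repeat_window(example_transitions())
print("one-off inside repeated window:", months(2013, 11) in window)
print("transitions in window:", len(window), "(a clean cycle has", 2 * 400, ")")
```

Under these assumptions the November 2013 one-off lands inside the
window, so a reader using this approach would repeat it every 400 years.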

A tzfile reader could potentially do better by rounding the last
transition time up to the end of the containing calendar year.  That would
fix the example that I gave above.  But that relies on knowledge of zic
implementation details.  Interpretation of the tzfile should be more
objective than that.  There are also edge cases around using extended
times of day, whereby a transition notionally associated with one year
could actually be located in a different year.  (And do you use UT or
local time for the year boundaries?)

Adding on some margin to the period of future explicit transitions
ensures that any reasonable implementation of the repeat on the reading
side will work.  Two years might be overkill, but with the kind of edge
cases available I'm not convinced that one year would suffice, and an
extra two or so transitions in a tzfile is very cheap overkill.
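
The effect of the margin on the earlier example can be sketched like
this.  Again this is only an illustration: it uses month granularity,
assumes the reader repeats the 400 years preceding the last transition,
and the function names are hypothetical.

```python
CYCLE_MONTHS = 400 * 12

def months(year, month):
    """Month index, so that differences are simple subtraction."""
    return year * 12 + (month - 1)

def one_off_in_window(margin_years):
    """True if the 2013-11 one-off lies within the 400 years preceding
    the last rule transition, when the rule is written out for
    400 + margin_years years starting in 2014."""
    last_rule = months(2013 + 400 + margin_years, 9)
    return last_rule - months(2013, 11) < CYCLE_MONTHS

results = {margin: one_off_in_window(margin) for margin in (0, 1, 2)}
for margin, leaked in results.items():
    print("margin", margin, "years: one-off in repeated window =", leaked)
```

In this particular example a one-year margin already keeps the one-off
out of the window; the sketch does not model the extended time-of-day
and year-boundary edge cases that motivate the second year.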

-zefram