[tz] Some thoughts about the way forward

Tom Lane tgl at sss.pgh.pa.us
Fri Sep 24 14:29:01 UTC 2021


Paul Eggert <eggert at cs.ucla.edu> writes:
> If you want to maximize data stability under the constraint of being 
> fair, then the current development repository beats all other proposals 
> I've seen so far.

Several other people have already made this point in varying words,
but: why are you so insistent that the only way to improve fairness
is to make the default contents of tzdb strictly worse?  Why not
strive to make it strictly better, instead?

I will agree that there's some room for debate as to whether enabling
all of backzone by default is "strictly better".  Some of the data in
it is probably wrong.  But a lot of it is probably right, too --- in
particular, a whole lot of what you shoved in there since 2021a is
very well attested.  In any case, the data that we are currently
substituting by default is *certainly* wrong.  Moreover, getting that
data back into mainstream circulation would improve our chances of
finding and fixing remaining errors.

I just finished looking through git-tip backzone to get a better idea
of exactly what's in there.  I count 113 non-commented Zones (up from
82 in 2021a, so this has been a rather large expansion of that category).
Of those, only 13 have comments questioning their veracity:

Africa/Douala
Africa/Malabo
Africa/Porto-Novo
America/Creston
America/Montreal
Antarctica/Vostok
Asia/Hanoi
Asia/Vientiane
Europe/Luxembourg
Indian/Cocos
Pacific/Chuuk
Pacific/Enderbury
Pacific/Midway

There's also America/Rosario, which seems more "superseded by
other entries" than "wrong", though I've not traced it closely.
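
For anyone who wants to repeat the count, here is a minimal Python sketch.
It assumes that an active entry is a line whose first whitespace-separated
field is exactly "Zone", and that commented-out entries begin with "#".
The function name and the sample text are hypothetical placeholders, not
real tzdb data.

```python
def active_zone_names(text):
    """Return names of Zone entries that are not commented out.

    Assumption: an active entry is a line whose first field is
    exactly "Zone"; a commented-out entry starts with "#".
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Zone":
            names.append(fields[1])
    return names


# Placeholder input in the general shape of a zone file (not real data).
SAMPLE = """\
# A comment line
Zone Example/Alpha 1:00 - XXT
#Zone Example/Commented 2:00 - YYT
Zone Example/Beta 3:00 - ZZT
"""

if __name__ == "__main__":
    names = active_zone_names(SAMPLE)
    print(len(names), names)
```

To run it against a real checkout, read the backzone file into a string
and pass that string to the function in place of SAMPLE.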

So it's pretty clear that a lot of what is in backzone is not there
because we have any reason to doubt it.  (I am here rejecting the
proposition that "if the only source is Shanks then it's probably
wrong".  You need evidence to call an entry probably wrong.)

Perhaps we ought to subdivide backzone more finely.  I'm now thinking
about a three-tier classification of zones:

Class A: in-scope per the 1970 cutoff rule.  Included in all builds
of tzdb.

Class B: out-of-scope per the cutoff rule, but we have no reason
to doubt correctness.  Included in the default build, but perhaps
we could offer an easy way to exclude these.

Class C: out-of-scope and there is evidence that it might be wrong.
Not included by default, needs a build choice to include.

Class C would initially be the zones I listed above, but new evidence
could cause zones to move to another class.
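
To make the proposal concrete, here is a rough Python sketch of how a
build might select zones under this scheme.  Everything in it is an
assumption for illustration: the ZoneClass enum, the zones_for_build
function, and its include_b/include_c options do not exist in tzdb, and
the "Example/..." names are placeholders.  Only Africa/Douala is taken
from the list above.

```python
from enum import Enum


class ZoneClass(Enum):
    A = "in scope per the 1970 cutoff rule"
    B = "out of scope, no reason to doubt correctness"
    C = "out of scope, evidence that it might be wrong"


def zones_for_build(classified, include_b=True, include_c=False):
    """Return the sorted zone names that a build would define.

    Class A is always included.  Class B is included unless the
    builder opts out.  Class C is included only if the builder opts in.
    """
    wanted = {ZoneClass.A}
    if include_b:
        wanted.add(ZoneClass.B)
    if include_c:
        wanted.add(ZoneClass.C)
    return sorted(name for name, cls in classified.items() if cls in wanted)


# Hypothetical classification table; the first two names are placeholders.
EXAMPLE = {
    "Example/InScope": ZoneClass.A,
    "Example/Unquestioned": ZoneClass.B,
    "Africa/Douala": ZoneClass.C,
}

if __name__ == "__main__":
    print(zones_for_build(EXAMPLE))                   # default build
    print(zones_for_build(EXAMPLE, include_b=False))  # Class A only
    print(zones_for_build(EXAMPLE, include_c=True))   # everything
```

Moving a zone to another class when new evidence turns up would then be
a one-line change to the table.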

In any case, I am firmly of the opinion that link-merging is a horrid
idea and we should get rid of it, not do more of it.  If a given build
does not contain the best data we have for a zone, it should not define
that zone at all, rather than substitute false data.  The path you are
currently on is inevitably going to lead to significant populations of
systems offering different definitions of these zones than other
systems do, and that is going to be a mess.

			regards, tom lane


