[tz] Issues with pre-1970 information in TZDB
tgl at sss.pgh.pa.us
Thu Sep 23 16:35:12 UTC 2021
Robert Elz via tz <tz at iana.org> writes:
> If you want to claim some data is incorrect, provide evidence to support
> that, otherwise you're simply maligning whoever supplied the data in the
> first place (nb: not necessarily the person who edited it into tzdb, but
> the source of their information.)
Yeah, that is a fair point. It looks like a lot of the stuff that
initially got put into backzone was put there because the only source
for it was Shanks, and we've found enough errors in Shanks to have
a healthy distrust of it. Still, in the absence of other evidence,
that remains the best available data.
> ... That way the data
> might be corrected. But not if it is buried in backzone.
After thinking about this for a while, I think that a lot of our
problems can be summarized as "backzone is misdesigned". Having
links in the base dataset that are overwritten by more-extensive
information if you enable backzone is just an awful design, because
it's got next-door-to-zero discoverability. End users cannot tell
whether they have the best available info or a lie unless they understand
enough about TZif to look into that file tree; and unless the tree is
set up using symlinks, which seems to be distinctly a minority
practice, even that won't make it very clear.
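To make the discoverability problem concrete, here is a toy model (Python, not part of tzdb, and not how zic is actually implemented) of how one name can resolve to two different definitions depending on whether backzone was compiled in. The rule strings are placeholders rather than real tzdb data, and the link target Africa/Abidjan is assumed purely for illustration.

```python
from typing import Optional

# Toy model of zone-name resolution in a compiled tzdb.
# Rule strings are placeholders, not real tzdb data; the link target
# (Africa/Abidjan) is an assumption made for illustration only.
BASE_ZONES = {"Africa/Abidjan": "Abidjan rules (placeholder)"}

# In the base dataset the name exists only as a link to another zone.
BASE_LINKS = {"Africa/Timbuktu": "Africa/Abidjan"}

# backzone supplies a full Zone entry under the very same name.
BACKZONE_ZONES = {"Africa/Timbuktu": "Timbuktu pre-1970 rules (placeholder)"}


def resolve(name: str, with_backzone: bool) -> Optional[str]:
    """Return the rules a zone name ends up with in the compiled output."""
    zones = dict(BASE_ZONES)
    links = dict(BASE_LINKS)
    if with_backzone:
        zones.update(BACKZONE_ZONES)
        # A full Zone entry displaces the base link of the same name.
        for zone_name in BACKZONE_ZONES:
            links.pop(zone_name, None)
    if name in zones:
        return zones[name]
    if name in links:
        return zones.get(links[name])
    return None


# Same identifier, two different answers, and nothing in the name says which.
print(resolve("Africa/Timbuktu", with_backzone=False))  # Abidjan rules (placeholder)
print(resolve("Africa/Timbuktu", with_backzone=True))   # Timbuktu pre-1970 rules (placeholder)
```

An application that only ever sees the identifier "Africa/Timbuktu" has no way to tell which of the two answers its installed data gives.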
As a modest proposal, therefore, I suggest that we should consider
just dropping all the overwritable links. That way, if you have
a base dataset, it will be obvious that Africa/Timbuktu is not
good data because it won't be there at all. If you enable backzone
(which perhaps needs a better name), then you get Africa/Timbuktu
along with a ton of other data of perhaps dubious reliability.
But you know what you have. The current design, where the same
zone identifier can refer to two different datasets, is bad by
any rational standard, and we've only gotten away with it because
use of backzone in the field is negligible. But if we keep moving
stuff to backzone, that's going to change.
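A minimal sketch of the proposed layout, under the same toy assumptions (placeholder rule strings, hypothetical dictionary and function names): names that the extended collection defines are simply absent from the base collection, so a missing name is visibly missing, and adding the extended collection never changes the meaning of a name the base already defines.

```python
from typing import Optional, Set

# Hypothetical sketch of the proposal: names that the extended collection
# defines are absent from the base collection instead of being links.
# Rule strings are placeholders, not real tzdb data.
BASE_ZONES = {"Africa/Abidjan": "Abidjan rules (placeholder)"}
EXTENDED_ZONES = {"Africa/Timbuktu": "Timbuktu pre-1970 rules (placeholder)"}


def resolve(name: str, with_extended: bool) -> Optional[str]:
    """Look a name up; None means it is not in the installed collection."""
    zones = dict(BASE_ZONES)
    if with_extended:
        zones.update(EXTENDED_ZONES)
    return zones.get(name)


def redefined_names() -> Set[str]:
    """Base names whose meaning would change when the extended set is added."""
    return {
        name
        for name in BASE_ZONES
        if resolve(name, False) != resolve(name, True)
    }


print(resolve("Africa/Timbuktu", with_extended=False))  # None: obviously missing
print(resolve("Africa/Timbuktu", with_extended=True))   # Timbuktu pre-1970 rules (placeholder)
print(redefined_names())                                 # set(): no name changes meaning
```

The invariant checked by `redefined_names()` is the property argued for above: each identifier has at most one definition, whichever collection is installed.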
There's a separate question of what the rule should be for putting a
given zone into the "base" or "extended" collections. But maybe that
becomes less of a hill that people are ready to die on. If we do it
this way, I foresee a lot of distros starting to ship the "extended"
collection -- but they won't be shipping different definitions of the
same zone name.
regards, tom lane