[tz] Preparing to fork tzdb

Steffen Nurpmeso steffen at sdaoden.eu
Thu Sep 23 12:41:05 UTC 2021


Guy Harris wrote in
 <8C6BA6DA-AC7D-4FCE-97F0-1B277A981B55 at sonic.net>:
 ...
 |So one difference here appears to be that the pre-1970 data for
 |Europe/Oslo may be accurate while the pre-1970 data for
 |America/Montreal may be inaccurate.

But hasn't the drive towards backzone been going on for longer?
Wasn't it introduced for exactly this purpose in 2014?

  ac5bf48519
    New data file 'backzone' for out-of-scope and/or poorly-sourced data.

where NEWS says

    A new file 'backzone' contains data which may appeal to
    connoisseurs of old time stamps, although it is out of scope for
    the tz database and is often poorly sourced.  The new file is not
    recommended for ordinary use and its entries are not installed by
    default.  (Thanks to Lester Caine for the Guernsey, Jersey, and
    Isle of Man entries in 'backzone'.)
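
For completeness: the Makefile carries a knob for exactly this, so
a packager who wants the backzone entries can compile them in,
roughly like this (a build-recipe fragment, shown here untested):

```sh
# Build-recipe fragment: install tzdb with the out-of-scope
# backzone entries compiled in, via the Makefile's PACKRATDATA
# knob that accompanies the backzone file.
make PACKRATDATA=backzone
make PACKRATDATA=backzone install
```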

Ever since I have been on this ML (at least a decade, I would say)
it has been said over and over again, to everybody, that the IDs
are just strings and the data is the thing, that selection is
a downstream concern, and so on.
I surely hated it, but that is seven years!

 |Another difference is that Toronto and Montreal are in the same
 |country, while Oslo and Berlin aren't.

The thing is that _I_ never treated links as links; even my dumb
brain managed to separate data from names.  (I posted some code
snippets many years ago.  Yes, I failed on negative leap seconds.
It was just a single binary DB file with the zone data and, at the
end, a TOC of names pointing to the offsets of their data.  I was
proud back then; I was stupid.  Whatever.)

Ever since I have been on this ML it has often been discussed
whether the identifiers could be replaced with anonymized strings;
I wonder how anyone could create an interface where one name maps
to another name.  And what for?  This is not just "super correct"
and "I give my users the full knowledge of the real thing, of
anything"; it is just broken.  It is a programmer-only thing
anyway, maybe also a UNIX command-line power-user thing -- but to
get it _really_ right for _users_, you would have to use ICU data
and its approach to time zones, which then also includes
translation, does it not?

You know, as a German-speaking human being I would _never_ assume
Europe/Vienna and really run "$ TZ=Europe/Vienna date"; that is
a nerd assumption.  Maybe "$ date --country=AUstrIA" or
"$ date --city Würzburg" or "$ date --state Baden-Württemberg",
and I only write it like that for the case-messing and the
out-of-ASCII.
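
A user front end along those lines could be a thin lookup layer
over the real IDs.  A minimal sketch, assuming nothing from tzdb:
the function name tz_for_place and its tiny table are inventions
for illustration only:

```sh
# Hypothetical sketch: map a human place name to a tzdb ID,
# case-insensitively, so users never type the IDs themselves.
tz_for_place() {
	# normalize ASCII case so "AUstrIA" works like "austria"
	place=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
	case $place in
	austria) echo Europe/Vienna ;;
	germany) echo Europe/Berlin ;;
	*) echo "unknown place: $1" >&2; return 1 ;;
	esac
}

# usage: behaves like "$ TZ=Europe/Vienna date"
TZ=$(tz_for_place AUstrIA) date
```

A real front end would of course need a proper gazetteer (ICU's
data, say) instead of a hand-written table.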

Sorry for getting bold, but for one I really did not make that
error (!), and I also find it absurd to claim to offer a time zone
interface that covers all the date and time data, and then to
exclude backzone.  That is self-contradictory: of course
bad-quality data that has been collected from an uncounted number
of sources (and Mr. Eggert seems to be very interested in the
topic, and actively sleuths in libraries and other sources to
improve his knowledge of it, which I cannot claim for myself) is
much better than simply excluding the few traces that exist and
asserting "this is the best I can offer".  I do not think Joda
would do that.  (IIRC.)

On the other hand, the announced release strategy felt offensive
to me, even though many hours of talking over several months were
spent in which code fixes etc. would possibly have been due.
Also, for Java in particular, I seem to remember a discussion
years ago that it would really be better to use custom Java
tarballs for downloading, but it seems nothing changed.  If it
had, then a simple sed(1) that postprocesses any Link could have
been put in place, maybe.  I also remember that in the last
discussion at least one of my mails seemed to be forwarded behind
the scenes before the administrator (whoever that is) waved it
through.  I really do not know what for.
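
The sed(1) postprocessing I have in mind would be nothing but
a filter over the source files.  A minimal sketch (the sample
input is made up here; real files use the same
"Link TARGET LINK-NAME" shape): drop every Link line before
handing the file to a consumer that wants plain Zone/Rule data:

```sh
# Sketch: strip all Link lines from a tzdb-style source file.
cat >sample.zi <<'EOF'
Zone	Europe/Oslo	0:43:00	-	LMT	1895 Jan  1
Link	Europe/Oslo	Arctic/Longyearbyen
Rule	Norway	1916	only	-	May	22	1:00	1:00	S
EOF

sed '/^Link[[:space:]]/d' sample.zi >sample-nolinks.zi
cat sample-nolinks.zi
```

A consumer that wants Links resolved instead of dropped would need
a bit more than sed, but the filtering step stays this small.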

One approach could be, for example, to _require_ a new build
flag, so that downstream consumers, aka packagers, who do not read
the list but only have robots sitting around that watch for
releases, will get build failures and thus have to adjust their
packaging recipes.  The release announcement could then also
prominently reassert the IANA TZ direction of going post-1970 by
default, while not actively cutting zone data at 1970, i.e., state
that backzone is needed for the full picture if you want to get it
right.  This does not satisfy data-only downstream consumers, but
then again those should really know what they are doing.
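
A minimal sketch of such a required acknowledgement, as an early
check in a build script; the name TZDB_POST1970_OK is invented
here, tzdb has no such flag today:

```sh
# Hypothetical sketch: fail the build unless the packager has
# explicitly acknowledged the post-1970 default.  The variable
# name TZDB_POST1970_OK is an invention; no such knob exists.
require_ack() {
	if [ -z "${TZDB_POST1970_OK:-}" ]; then
		echo 'error: set TZDB_POST1970_OK=1 (see release notes)' >&2
		return 1
	fi
}

# a packaging robot that never read the announcement stops here:
require_ack || echo 'build aborted'
```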

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

