[tz] Pre-1970 data

Stijn “Adhemar” Vandamme adhemar at adhemar.eu
Wed Oct 20 21:30:56 UTC 2021


The “should timezones be merged when they contain identical offsets from
1970 onwards” discussion has been going on for quite a while on the tz
mailing list.
Even so, contributions that I find insightful are still being made.

Stephen Colebourne recently made this observation:

> The key observation is that the segregation between these two groups
> of IDs *did not exist* until around 2014. It was only in 2018 that the
> ISO country rule was removed.

This change was reflected in the documentation <theory.html> as recently as
20/21 February 2019:
<https://github.com/eggert/tz/commit/5f48dee82bc111b21f3610893e3b788e37396bce>
<https://github.com/eggert/tz/commit/6176aefe79e83ddb8f255849b85c149f34d46aba>

And Stephen made this spot-on point of analysis:

> FWIW, it is clear to me that there is an aspect of imposing a
> US-centric timezone system on other parts of the world. The recent
> tzdb approach of focussing entirely on timezone regions makes perfect
> sense for the US, where region boundaries do not follow state lines,
> and ordinary members of the public need to be aware of whether they
> are in US/Mountain or US/Central. This simply isn't the timezone model
> in many other parts of the world. In places like Europe and Asia, the
> timezone is driven primarily by the country you live in - an ordinary
> member of the public in Iceland is never going to associate with some
> abstract timezone region stretching down the Atlantic that is not
> named, not legally defined and is little more than a random outcome
> based on tzdb's choice of 1970. Even in somewhere like Norway, an
> ordinary member of the public will understand that although they
> follow CET, their timezone is actually driven by their Government in
> Oslo. The brilliance of the original rule - "ID as needed for
> post-1970 data, with at least one per ISO country" - was that it
> seamlessly handled *both* models of timezone in one unified set of
> IDs. Removal of the ISO country part has completely destabilised that
> balance.


If I may throw in my two cents, they pertain to the documentation of
this policy.

Presently, <theory.html> does mention

   - that timezones are zones whose clocks all agree about timestamps that
   occur after the 1970 POSIX Epoch;
   - that 1970 is indeed a somewhat-arbitrary cutoff, but that moving it
   earlier even by a decade or two would impose significant challenges, due to
   the wide variety of local practices before computer timekeeping became
   prevalent;
   - that pre-1970 clock transitions are recorded for location-based
   timezones, but that the database is not designed for and does not suffice
   for applications requiring accurate handling of all past times everywhere;
   - that the previous
   “at-least-one-timezone-per-country-with-officially-designated-ISO-3166-1-two-letter-country-code”
   rule was dropped because it imposed too high a maintenance burden; and
   - that some information outside the scope of the database (implied:
   pre-1970) is collected in a file backzone that is distributed along with
   the database proper, but that this file is less reliable and does not
   necessarily follow database guidelines.
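
The first bullet can be checked mechanically. Below is a minimal Python
sketch of such a check, not anything taken from tz itself: the helper names
differing_instants and daily_samples are hypothetical, and sampling once a
day at noon UTC is an assumption that can miss short-lived differences.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Iterable, Iterator, List


def differing_instants(zone_a: tzinfo, zone_b: tzinfo,
                       instants: Iterable[datetime]) -> List[datetime]:
    """Return the aware instants at which the two zones' UTC offsets differ."""
    return [
        t for t in instants
        if t.astimezone(zone_a).utcoffset() != t.astimezone(zone_b).utcoffset()
    ]


def daily_samples(start_year: int, end_year: int) -> Iterator[datetime]:
    """Yield noon UTC for each day from 1 Jan start_year up to,
    but not including, 1 Jan end_year."""
    t = datetime(start_year, 1, 1, 12, tzinfo=timezone.utc)
    end = datetime(end_year, 1, 1, tzinfo=timezone.utc)
    while t < end:
        yield t
        t += timedelta(days=1)
```

On Python 3.9 or later, one could pass two zoneinfo.ZoneInfo objects (for
instance ZoneInfo("Europe/Oslo") and ZoneInfo("Europe/Berlin"), names chosen
purely as an illustration) together with daily_samples(1900, 1970). What
comes back depends entirely on the tz release installed on the system and on
whether it was built with backzone, which is exactly the concern here.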


What astonishes me in this issue is that high-quality information is being
moved out of the database proper into backzone, without this being
mentioned in the documentation <theory.html>. Granted, <theory.html> does
not explicitly state an
“at-least-attempt-to-get-monotonically-closer-to-the-truth-with-increasing-versions”
policy, but one would think that’s implied in maintaining a (versioned)
database, unless explicitly stated otherwise.

The other way around (moving information from backzone to the database
proper, e.g. when historical information is double-checked and confirmed to
be in accordance with database guidelines) is totally acceptable without
being mentioned in the documentation <theory.html>.
Indeed, even though I know that correcting pre-1970 data is not the
priority of this project, I’ve seen some great sleuthing of historical time
zone offsets presented on the tz mailing list in the past, and I kinda
assumed that the results of such research found their way to the database
proper (or backzone when doubts remained).

In my opinion, removing high-quality information, thereby having correct
offsets changed into incorrect offsets in information systems that rely on
the tz database and keep it up-to-date on the default settings (which
disregard backzone), should cease, and should preferably be undone where
such damage has already occurred.

Seeing how many problems pursuing “fairness” under the new rules causes,
perhaps it’s worth reinstating the previous
“at-least-one-timezone-per-country-with-officially-designated-ISO-3166-1-two-letter-country-code”
rule, either officially or unofficially. (And for what it’s worth, this
comes from someone who generally tries to avoid relying on country lists
such as ISO 3166-1, due to strong political belief in the desirability of
independence or at least increased self-governance for several
countries/regions currently lacking an officially-designated ISO 3166-1
two-letter country code, including Flanders, Wallonia, Scotland, Catalonia,
Basque Country, Canary Islands, Tibet, and others.)


But alternatively, if this project persists in its
“removing-high-quality-information and merging-for-merging’s-sake” policy,
then at the very least this should be stated clearly and unambiguously, and
conspicuously highlighted (bold, bigger font size, bright but fully
saturated background colour, following an animated warning icon, the lot)
in the documentation <theory.html>:

> PREVIOUSLY OR CURRENTLY CORRECT PRE-1970 OFFSET INFORMATION IS
> INTENTIONALLY BEING CHANGED AND/OR MAY INTENTIONALLY BE CHANGED IN LATER
> VERSIONS OF THE tz DATABASE INTO INCORRECT OFFSET INFORMATION for the
> braindead reason of merging timezones for merging’s sake.
> USERS MUST NOT RELY ON UNDISPUTEDLY AND UNCONTESTEDLY CORRECT AND VERIFIED
> OFFSET INFORMATION REMAINING CORRECT in the database proper, except for
> timezones that have post-1970 particularities which prevent them from being
> merged with foreign timezones. (And even then — who knows what the
> coordinator-maintainer will come up with next…)
> Information systems that rely on tz for offset information, including for
> pre-1970 datetimes-with-timezone-identifier, ought to pass this warning on
> to their users.


— Adhemar