[tz] attic data for the tz database

Thu Aug 29 17:44:26 UTC 2013

On 29 August 2013 17:28, Paul Eggert <eggert at cs.ucla.edu> wrote:
> the tz database evolves
+1

> we hope for the better.
So why are you making it worse?

That's what I cannot fathom. The data that is being changed/deleted
results in nonsense pre-1970. Until you have a comprehensive solution
to the pre-1970 issue, you should revert the commit that has resulted
in the nonsense.

As a reminder, "America/Atikokan" used to be this:
LMT -06:06:28
Transition[Gap at 1895-01-01T00:00-06:06:28 to -06:00]
Transition[Gap at 1918-04-14T02:00-06:00 to -05:00]
Transition[Overlap at 1918-10-27T02:00-05:00 to -06:00]
Transition[Gap at 1940-09-29T00:00-06:00 to -05:00]

but is now the same as "America/Panama":
LMT -05:18:08
Transition[Overlap at 1890-01-01T00:00-05:18:08 to -05:19:36]
Transition[Gap at 1908-04-22T00:00-05:19:36 to -05:00]

From this we can say (as fact, not opinion):
- the LMT value has changed (Panama is nowhere near Atikokan)
- the history of data before 1940 in Atikokan has changed
- the history previously showed Atikokan started defining zones in 1895
- the history now shows Atikokan started defining zones in 1908
- the history now shows Atikokan as never having had a -06:00 offset
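
As a minimal sketch of how the comparison above can be reproduced in Java,
the class below rebuilds both sets of rules with java.time from the values
quoted in this email only. The class and method names are hypothetical. As a
simplification, the same transition list is passed for both the standard and
the wall offset history, so only getOffset and getTransitions are meaningful.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AtikokanComparison {

    // Build rules from an initial (LMT) offset and rows of
    // {local date-time of the change, offset after the change}.
    static ZoneRules rules(String lmt, String[][] rows) {
        ZoneOffset base = ZoneOffset.of(lmt);
        ZoneOffset before = base;
        List<ZoneOffsetTransition> transitions = new ArrayList<>();
        for (String[] row : rows) {
            ZoneOffset after = ZoneOffset.of(row[1]);
            transitions.add(ZoneOffsetTransition.of(LocalDateTime.parse(row[0]), before, after));
            before = after;
        }
        // Simplification (assumption): one list serves as both the standard and
        // the wall offset history, and there are no recurring rules.
        return ZoneRules.of(base, base, transitions, transitions, Collections.emptyList());
    }

    // "America/Atikokan" as it used to be, per the listing above.
    static ZoneRules oldAtikokan() {
        return rules("-06:06:28", new String[][] {
            {"1895-01-01T00:00", "-06:00"},
            {"1918-04-14T02:00", "-05:00"},
            {"1918-10-27T02:00", "-06:00"},
            {"1940-09-29T00:00", "-05:00"},
        });
    }

    // "America/Atikokan" as it is now, identical to "America/Panama".
    static ZoneRules newAtikokan() {
        return rules("-05:18:08", new String[][] {
            {"1890-01-01T00:00", "-05:19:36"},
            {"1908-04-22T00:00", "-05:00"},
        });
    }

    // The LMT is the offset in force before the first transition.
    static ZoneOffset lmt(ZoneRules r) {
        return r.getTransitions().get(0).getOffsetBefore();
    }

    // True if any transition in the history switches to the given offset.
    static boolean everSwitchedTo(ZoneRules r, ZoneOffset offset) {
        return r.getTransitions().stream().anyMatch(t -> t.getOffsetAfter().equals(offset));
    }

    public static void main(String[] args) {
        ZoneRules before = oldAtikokan();
        ZoneRules after = newAtikokan();
        before.getTransitions().forEach(System.out::println);
        after.getTransitions().forEach(System.out::println);

        Instant t1900 = Instant.parse("1900-01-01T00:00:00Z");
        System.out.println("LMT:  " + lmt(before) + " vs " + lmt(after));
        System.out.println("1900: " + before.getOffset(t1900) + " vs " + after.getOffset(t1900));
        System.out.println("-06:00 ever used: "
            + everSwitchedTo(before, ZoneOffset.ofHours(-6)) + " vs "
            + everSwitchedTo(after, ZoneOffset.ofHours(-6)));
    }
}
```

Any code that asks for the offset at a date between 1895 and 1940 gets a
different answer from the two sets of rules, which is the visible effect of
the change.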

The other IDs being altered have similar issues, but it's easier to focus on one.

You might argue that it is just pre-1970 data which is inaccurate and
should not be relied on. I simply argue that you've taken the data
from unknown quality to definitely inaccurate - clearly worse. (And
that pre-1970 data is very visible in the work I do)

> I've compiled some "attic" data (appended to this
> email) which makes it clear that we have regularly replaced
> zones by links during tz maintenance.  This practice hasn't
> caused hardships for users.

Links for spelling mistakes obviously cause no issues. Beyond that,
they are clearly going to lose data unless the entire history of the
two zones is exactly identical. By entire history, I mean LMT,
pre-1970 and post-1970.

Bear in mind that the entire source data is visible in Java (LMT,
pre-1970 and post-1970) and that we parse the source files directly.
zic is a distraction.

Looking at the attic data, it's clear that the LMT has been treated as
irrelevant in the past. Every time a zone with a unique ID is
converted to a Link, then its LMT is lost.

> Both filters could be implemented, and they could be applied
> in series.

This is all very well, but ignores the fact that other applications
parse the source files, including Java. Those applications would need
additional complex logic to fix up the data. There is too much focus on
the C code developed here and Unix, and not on other consumers of the
data.

FWIW, I also think that filtering like this isn't really a good idea
in practical terms for users. For example, if you filter most of
central Europe after, say, 2010, you'll only get one zone, as everywhere
uses the same time. Now, lots of people set up their machines to that
one central zone. Then imagine the case where Greece leaves the EU and
starts to set its own time-zone. Everyone in Greece will now need to
reset their zone ID. Whereas, if everyone had just selected
Europe/Athens up front there would have been no problem. i.e. zones
split as well as merge, and they will often do so on the historic
boundaries that are already captured in the tzdb.
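
To illustrate how such a filter would decide that two zones can be merged,
here is a minimal sketch that checks whether two sets of rules agree over a
given interval. The class name, the method name and the sample IDs
Europe/Paris and Europe/Berlin are my own choices for illustration; the
result of the demo depends on the tz data bundled with the Java runtime.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

class ZoneMergeCheck {

    // True if both rule sets have the same offset at 'from' and the same
    // transitions in the interval [from, to).
    static boolean agreeBetween(ZoneRules a, ZoneRules b, Instant from, Instant to) {
        if (!a.getOffset(from).equals(b.getOffset(from))) {
            return false;
        }
        Instant t = from;
        while (true) {
            ZoneOffsetTransition na = a.nextTransition(t);
            ZoneOffsetTransition nb = b.nextTransition(t);
            boolean aIn = na != null && na.getInstant().isBefore(to);
            boolean bIn = nb != null && nb.getInstant().isBefore(to);
            if (!aIn && !bIn) {
                return true;   // no further transitions inside the interval
            }
            if (aIn != bIn || !na.equals(nb)) {
                return false;  // one side changes and the other does not, or they differ
            }
            t = na.getInstant(); // step past the shared transition
        }
    }

    public static void main(String[] args) {
        ZoneRules paris = ZoneId.of("Europe/Paris").getRules();
        ZoneRules berlin = ZoneId.of("Europe/Berlin").getRules();
        Instant horizon = Instant.parse("2030-01-01T00:00:00Z");

        System.out.println("agree since 2010: "
            + agreeBetween(paris, berlin, Instant.parse("2010-01-01T00:00:00Z"), horizon));
        System.out.println("agree since 1900: "
            + agreeBetween(paris, berlin, Instant.parse("1900-01-01T00:00:00Z"), horizon));
    }
}
```

Two zones that agree from the cutoff onwards would collapse into one ID under
the filter, even though their earlier histories differ and they remain free
to diverge again later.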

As I said above, you should start by reverting the controversial
changes (see my other email). That takes the heat out of the immediate
issue.

Then, only make changes once you have a fully agreed strategy for
handling pre-1970 data that is not destructive, and that gives enough
notice to others to be able to adapt.

Stephen