[tz] Issues with pre-1970 information in TZDB

Stephen Colebourne scolebourne at joda.org
Wed Sep 22 10:52:09 UTC 2021


(David Braverman recently asked for a summary of the issue. This is my
attempt to summarize in a relatively even-handed way.)

The TZDB theory file starts with the following:

"The tz database attempts to record the history and predicted future
of civil time scales. It organizes time zone and daylight saving time
data by partitioning the world into timezones whose clocks all agree
about timestamps that occur after the POSIX Epoch (1970-01-01 00:00:00
UTC). Although 1970 is a somewhat-arbitrary cutoff, there are
significant challenges to moving the cutoff earlier even by a decade
or two, due to the wide variety of local practices before computer
timekeeping became prevalent."

I have always thought this is a wise choice for the management of time
zone data. I suspect that most people on this list would agree.

The rules for creating new IDs where post-1970 data differs are well
understood and, I believe, agreed upon. Apart from debates over
negative daylight saving, I do not believe there are any significant
issues with the data post-1970.


Despite the theory file introduction above, TZDB does in fact contain
data for some locations before 1970. The issue at hand is which
timezone IDs are allowed to have this data and which are not. And more
broadly, the degree to which it is acceptable to change the status quo
on the pre-1970 data.

Over many years, pre-1970 data was added to many different IDs.
However, over recent years, pre-1970 data has been removed.
(Technically, it has been moved to another file, but for the purposes
of those consuming the main set of tzdb files it has effectively been
removed.)

The net result of recent changes is a set of situations which might be
described as "cleaner", "more equitable", "nonsensical",
"unacceptable" or "less equitable" depending on your viewpoint. For
example, Europe/Berlin has pre-1970 data, but Europe/Oslo and
Europe/Stockholm do not. In technical terms, Europe/Oslo and
Europe/Stockholm are now aliases for Europe/Berlin (known as Links in
tzdb).

The problem is that a user who queries Europe/Oslo for the
timezone offset in 1950 used to get the data for Oslo but will now get
the data for Berlin. Depending on your viewpoint this is "irrelevant",
"unfortunate" or "offensive".

Why has the data changed? Because Oslo, Stockholm and Berlin all have
the same timezone data post-1970, and Berlin is the largest city. The
argument is that if only post-1970 data matters (as per the theory
file), then there is no justification for three separate data sets
when only one will do, and the one chosen is the one with the largest
city. The counter argument is that merging data sets across country
boundaries is unacceptable and politically naive, particularly when
there were no complaints about the previous status quo.
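
As a rough illustration of the kind of query discussed above, here is a
minimal sketch that asks the installed tz data for the offset each ID
reports at a given local time. It assumes Python 3.9+ and its standard
zoneinfo module, neither of which is part of tzdb itself; the helper
names offset_at and offsets_for are hypothetical. What it returns
depends entirely on which tzdb release is installed and how that data
was packaged, so no expected output is shown.

```python
from datetime import datetime, timedelta, tzinfo
from zoneinfo import ZoneInfo  # standard library in Python 3.9+ (assumed here)


def offset_at(tz: tzinfo, local: datetime) -> timedelta:
    """Return the UTC offset that `tz` reports for a naive local wall-clock time."""
    return local.replace(tzinfo=tz).utcoffset()


def offsets_for(zone_ids, local: datetime) -> dict:
    """Map each zone ID to the offset the installed tz data reports at `local`.

    Hypothetical helper for illustration; raises ZoneInfoNotFoundError
    if an ID is missing from the installed data.
    """
    return {zone_id: offset_at(ZoneInfo(zone_id), local) for zone_id in zone_ids}


# Example calls (results depend on the installed tz data):
# ids = ["Europe/Oslo", "Europe/Stockholm", "Europe/Berlin"]
# offsets_for(ids, datetime(1950, 7, 1, 12, 0))   # a pre-1970 query
# offsets_for(ids, datetime(2000, 7, 1, 12, 0))   # a post-1970 query
```

Running the two example calls against data built before and after the
merge is one way to see whether a pre-1970 answer for Europe/Oslo has
changed while the post-1970 answers still agree across the three IDs.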


More broadly, different individuals, some representing organizations,
have expressed different opinions on what they do or do not want from
the tzdb data set. Some would like a full historical record of time
zone data, others want stability, and many, I suspect, have absolutely
no interest whatsoever in pre-1970 data.

My personal concerns are data stability (agreed managed changes are
OK), and the politically-sensitive inaccuracy that results from
merging across country boundaries.


Can we keep responses limited on this thread? Perhaps only respond if
you think I've mischaracterized the issues at stake here? Or missed
something obvious?

Stephen

