[tz] Issues with pre-1970 information in TZDB

Wed Sep 22 18:31:17 UTC 2021

I've been following this discussion for months, but I now find myself a 
little confused about what the issues are and where the controversy 
comes from.

As I understand it, there seem to have been three issues that 
motivated Paul to implement the 'merger':

1) Reducing the number of time zones to reduce the size of the distribution.
2) Merging time zones that have identical rule sets (at least since 1970).
3) Separating pre-1970 rule sets from post-1970 rule sets.

Stephen's summary is helpful but seems to concentrate on the 
ramifications of the 'merger' on time zone names (tags) and country 
codes, which I agree is an important consideration.

I'm inclined to agree with those who feel a reversion to 2021a would be 
the best approach. Indeed, the changes would have significant 
consequences for my current (in development) tzdb parser, which reads the 
source files directly, with no modifications, to accumulate all time zones 
that have existed. Filtering this comprehensive list of time zones to 
some subset for specific purposes, such as CLDR for Windows, is then my 
client application's responsibility. I'd really rather not have to make 
significant changes to accommodate the reorganization of the merged 
tzdb. I gather I am not alone.
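
As a rough illustration of that approach (a hypothetical sketch, not the 
parser described above), the following Python collects zone names and 
Link aliases from tzdb-style source text. It assumes the unabbreviated 
"Zone NAME ..." and "Link TARGET LINK-NAME" line forms used in the source 
files, and it ignores quoted fields. The function name and the sample 
data are invented for the example.

```python
def collect_zone_names(source_text):
    """Return (zones, links) found in tzdb-style source text.

    zones: set of names declared on "Zone" lines.
    links: dict mapping each alias to its target, from "Link" lines.
    Simplification: anything after '#' is treated as a comment, and
    quoted fields are not handled.
    """
    zones = set()
    links = {}
    for raw in source_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue  # blank or comment-only line
        if fields[0] == "Zone" and len(fields) >= 2:
            zones.add(fields[1])
        elif fields[0] == "Link" and len(fields) >= 3:
            # Link TARGET LINK-NAME: LINK-NAME is an alias for TARGET.
            links[fields[2]] = fields[1]
        # Rule lines and Zone continuation lines are skipped here.
    return zones, links


# Illustrative fragment only; the names and offsets are made up.
SAMPLE = """\
# comment line
Zone Example/Alpha 1:00 - LMT 1900
\t\t\t1:00 - XST
Link Example/Alpha Example/Beta
"""

zones, links = collect_zone_names(SAMPLE)
```

A client application could then filter `zones` and `links` down to 
whatever subset it needs. When a zone is turned into a Link, its name 
moves from the first collection to the second.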

I am most thankful for and impressed by Paul's contributions over the 
years. Maybe I could ask him to summarize his initial reasons for the 
merger and why he feels reverting to 2021a is not a good interim solution?

Thanks,

-Brooks Harris

On 2021-09-22 10:34 AM, David Braverman via tz wrote:
> Thank you, I appreciate this.
>
> David Braverman
>
> -----Original Message-----
> From: tz<tz-bounces at iana.org>  On Behalf Of Stephen Colebourne via tz
> Sent: Wednesday 22 September 2021 05:52
> To: Time Zone Mailing List<tz at iana.org>
> Subject: [tz] Issues with pre-1970 information in TZDB
>
> (David Braverman recently asked for a summary of the issue. This is my
> attempt to summarize in a relatively even-handed way)
>
> The TZDB theory file starts with the following:
>
> "The tz database attempts to record the history and predicted future
> of civil time scales. It organizes time zone and daylight saving time
> data by partitioning the world into timezones whose clocks all agree
> about timestamps that occur after the POSIX Epoch (1970-01-01 00:00:00
> UTC). Although 1970 is a somewhat-arbitrary cutoff, there are
> significant challenges to moving the cutoff earlier even by a decade
> or two, due to the wide variety of local practices before computer
> timekeeping became prevalent."
>
> I have always thought this is a wise choice for the management of time
> zone data. I suspect that most people on this list would agree.
>
> The rules for creating new IDs where post-1970 data differs are well
> understood and, I believe, agreed upon. Apart from debates over
> negative daylight saving, I do not believe there are any significant
> issues with the data post-1970.
>
>
> Despite the theory file introduction above, TZDB does in fact contain
> data for some locations before 1970. The issue at hand is which
> timezone IDs are allowed to have this data and which are not. And more
> broadly, the degree to which it is acceptable to change the status quo
> on the pre-1970 data.
>
> Over many years, pre-1970 data was added to many different IDs.
> However, over recent years, pre-1970 data has been removed.
> (Technically, it has been moved to another file, but for the purposes
> of those consuming the main set of tzdb files it has effectively been
> removed.)
>
> The net result of recent changes is various situations which might be
> described as "cleaner", "more equitable", "nonsensical",
> "unacceptable" or "less equitable" depending on your viewpoint. For
> example, Europe/Berlin has pre-1970 data, but Europe/Oslo and
> Europe/Stockholm do not. In technical terms, Europe/Oslo and
> Europe/Stockholm are now aliases for Europe/Berlin (known as Links in
> tzdb).
>
> Where the problem lies is that a user who queries Europe/Oslo for the
> timezone offset in 1950 used to get the data for Oslo but will now get
> the data for Berlin. Depending on your viewpoint this is "irrelevant",
> "unfortunate" or "offensive".
>
> Why has the data changed? Because Oslo, Stockholm and Berlin all have
> the same timezone data post-1970, and Berlin is the largest city. The
> argument is that if only post-1970 data matters (as per the theory
> file), then there is no justification for three separate data sets
> when only one will do, and the one chosen is the one with the largest
> city. The counter argument is that merging data sets across country
> boundaries is unacceptable and politically naive, particularly when
> there were no complaints about the previous status quo.
>
>
> More broadly, different individuals, some representing organizations,
> have expressed different opinions on what they do or do not want from
> the tzdb data set. Some would like a full historical record of time
> zone data, others want stability, and many, I suspect, have absolutely
> no interest whatsoever in pre-1970 data.
>
> My personal concerns are data stability (agreed managed changes are
> OK), and the politically-sensitive inaccuracy that results from
> merging across country boundaries.
>
>
> Can we keep responses limited on this thread? Perhaps only respond if
> you think I've mischaracterized the issues at stake here? Or missed
> something obvious?
>
> Stephen