[tz] Pre-1970 data

Brian Park brian at xparks.net
Fri Nov 5 04:17:34 UTC 2021


I get the impression that this debate is caused by the existence of 2
different schools of thought:

* Descriptive: Paul wants to describe the timezones of the world without
regard to how those time zones were created, and merge them into the
smallest set that can generate the timekeeping rules. I can see that in
this view, merging timezones from different countries into the same
equivalence class is reasonable.

* Prescriptive: I think Stephen and others start with the fact that time
zones are the creations of political organizations which write the
regulations that define the timezones. Those governing bodies are
predominantly organized by country in a hierarchical structure. In this
view, it does *not* make sense to merge timezones from different countries.
This view also implies that the TZ identifiers should reflect the political
organizational structure of the world.

I want to suggest that it may be possible for these 2 views to coexist. We
could create a new file, e.g. call it 'countryzone', which contains a set
of Links organized in a hierarchical tree by country, pointing to the Core
zones. Paul can maintain the Core files as before, and 'countryzone' would
be maintained by a different set of people. Assuming the Core timezones are
a complete set that covers all unique timezones in the world, all
other ISO-country based timezones can be mapped to one of the Core
timezones.
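
As a rough sketch of how such a file could be consumed (Python, purely
illustrative): the entries below are invented, the hierarchical names such as
"Italy/Rome" are hypothetical, and `parse_links` and `resolve` are helper
names made up for this example. Only the `Link TARGET LINK-NAME` record
layout is taken from the existing TZDB files.

```python
# Hypothetical 'countryzone' content: the file name comes from the proposal,
# but every entry below is invented for illustration.
COUNTRYZONE = """\
# Link  TARGET            LINK-NAME
Link    America/Toronto   Canada/Toronto
Link    Europe/Rome       Italy/Rome
"""


def parse_links(text):
    """Return {link_name: target} for each 'Link TARGET LINK-NAME' line."""
    links = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 3 and fields[0] == "Link":
            _, target, name = fields
            links[name] = target
    return links


def resolve(name, links):
    """Follow links until reaching a name that is not itself a link."""
    seen = set()
    while name in links:
        if name in seen:
            raise ValueError("link cycle at " + name)
        seen.add(name)
        name = links[name]
    return name


links = parse_links(COUNTRYZONE)
print(resolve("Canada/Toronto", links))  # America/Toronto
```

Names that are not Links resolve to themselves, so Core zone IDs keep working
unchanged.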

For this to work, I think we need to clarify the semantics of the 'Link'
records in the TZ database. As far as I can tell, there are at least 3
different meanings of the Link record:

1) Link Canonical Deprecated
  * Deprecated is an old zone which should no longer be used
2) Link Canonical Alternate
  * alternate spelling or alias, but not deprecated
3) Link Canonical Merged
  * zones which were merged because they have the same rules by chance, but
there is no semantic relationship to each other

I propose that we replace the 'Link' keyword with 3 new keywords that
identify the precise meaning: LinkOld, LinkAlt, and LinkMerged. (My hope is
that keeping the 'Link' prefix will make it easy to update existing TZDB
parsers to preserve their previous behavior.) Slight aside: I learned that
some 3rd party timezone libraries do not preserve the zone ID of a Link on a
round trip. In other words, (pseudo-code) `TimeZone(linkName).getName() !=
linkName`. I wonder if it is worth defining the expected behavior of each
type of Link for downstream libraries.
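
A minimal sketch of both ideas (Python, not any existing library's API):
`LinkRecord`, `parse_link_line`, and the toy `TimeZone` class are hypothetical
names, and the three new keywords exist only in this proposal. Treating
Reykjavik as a LinkMerged of Abidjan is an assumption made for the example.

```python
from dataclasses import dataclass

# Proposed keywords (hypothetical) plus the existing 'Link' keyword.
LINK_KINDS = {
    "Link": "unspecified",
    "LinkOld": "deprecated",
    "LinkAlt": "alternate",
    "LinkMerged": "merged",
}


@dataclass(frozen=True)
class LinkRecord:
    kind: str    # one of the LINK_KINDS values
    target: str  # canonical zone the link points to
    name: str    # the link's own ID


def parse_link_line(line):
    """Parse 'KEYWORD TARGET LINK-NAME'; return None for any other line."""
    fields = line.split("#", 1)[0].split()
    if len(fields) != 3 or fields[0] not in LINK_KINDS:
        return None
    keyword, target, name = fields
    return LinkRecord(LINK_KINDS[keyword], target, name)


class TimeZone:
    """Toy zone handle that remembers the ID it was requested with."""

    def __init__(self, requested_id, links):
        self._requested_id = requested_id
        record = links.get(requested_id)
        self.canonical_id = record.target if record else requested_id

    def get_name(self):
        # Round trip: report the requested ID, not the link target.
        return self._requested_id


# Assumed classification, for illustration only.
record = parse_link_line("LinkMerged Africa/Abidjan Atlantic/Reykjavik")
zone = TimeZone("Atlantic/Reykjavik", {record.name: record})
print(zone.get_name(), zone.canonical_id, record.kind)
```

A parser that only checks whether the keyword starts with 'Link' would accept
all four keywords and behave as it does today.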

For the pre-1970 data, it is my understanding that the 'backzone' file
contains Zone records which should replace ONLY the LinkMerged records
found in the other files. I propose that all LinkMerged records be
extracted into a separate file (let's call it 'mergedzone') so that there
is a clear symmetry between 'backzone' and 'mergedzone', which allows them
to be substituted for each other. The dependency diagram looks something
like this:

countryzone
   |
   v
  Core (africa, asia, etc...)
    +-- backzone
    +-- mergedzone

Downstream libraries which want only post-1970 can use: countryzone, Core,
mergedzone

Downstream libraries which want to include pre-1970 can use: countryzone,
Core, backzone
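
The two selections differ only in the last file, which a build script could
express roughly as follows (Python sketch; `input_files` is a hypothetical
helper, and 'countryzone' and 'mergedzone' are the proposed files, not
existing ones):

```python
# Only the two Core file names given above are listed; the rest are elided.
CORE_FILES = ["africa", "asia"]


def input_files(include_pre_1970):
    """Pick 'backzone' or 'mergedzone'; the two are meant to be interchangeable."""
    history = "backzone" if include_pre_1970 else "mergedzone"
    return ["countryzone"] + CORE_FILES + [history]


print(input_files(False))  # ['countryzone', 'africa', 'asia', 'mergedzone']
print(input_files(True))   # ['countryzone', 'africa', 'asia', 'backzone']
```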

@Stephen: We may be at a point where further debate is not productive.
Perhaps we should create an exploratory fork of the TZDB to evaluate these
ideas explicitly. It is easier to get feedback from a concrete
implementation than to continue discussing ideas and options in a vacuum. I
propose a GitHub project with an initial seed of the 10 raw TZDB files. And
let's use the usual GitHub PR, Issues, and Discussions workflow, so that
proposals can be reviewed and discussed before being committed into the
repo.

If there is any chance that this will result in being able to type
"Canada/Toronto" instead of "America/Toronto", that would resolve an
annoyance that has lasted some 30-35 years.

Brian

On Thu, Nov 4, 2021 at 4:04 PM Stephen Colebourne via tz <tz at iana.org>
wrote:

> On Wed, 3 Nov 2021 at 22:40, Paul Eggert <eggert at cs.ucla.edu> wrote:
> > On 10/18/21 06:07, Stephen Colebourne via tz wrote:
> > > What tzdb previously offered was a set of IDs,
> > > based on a simple rule - "ID as needed for post-1970 data, with at
> > > least one per ISO country". Full history was available for each of
> > > these (whether accurate or not).
> >
> > That wasn't ever the case. For example, there was never full history
> > (accurate or not) for San Marino. We shouldn't base our analysis on the
> > idea that we formerly had at least one Zone per ISO country, as we never
> > had an ironclad rule like that and we did just fine without any such
> > rule.
>
> Let's unpack this for a minute.
>
> Looking at the state of tzdb in mid 2012:
> - Europe/San_Marino existed as an ID
> - it was an alias for Europe/Rome
>
> https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/europe#L1452-L1461
>
> In practical terms as a user:
> - you could query it for full history
> - the data you got back was accurate post-1970
> - the data you got back pre-1970 was of unknown accuracy (except LMT
> which was definitely inaccurate)
> - the data was the best researched data for San Marino available
>
> As such, I don't think it is correct to say that "there was never full
> history" for San Marino. The ID existed and history could be queried.
> The data that was available was good enough because San Marino shares
> enough geopolitical history with Rome that users can overlook the
> distinction. And no-one has ever been motivated to do better. This is
> a hugely different scenario to Reykjavik returning data from Abidjan
> where you are intending to knowingly make the data worse for
> end-users.
>
> The ironclad rule (AFAICT) is that there was always an *ID* for each
> ISO country, and that the data it returned was acceptably accurate,
> not outrageously wrong.
>
>
> > There's no *timekeeping* reason to require a Zone for every ISO country.
> > Adding such a requirement would complicate maintenance.
>
> I think someone born in Iceland before 1970 might well disagree that
> there is no timekeeping reason at work here.
>
> I think the real problem here is that you are trying to fundamentally
> change what tzdb offers. I'm here communicating as clearly as I can
> that end-users expect one zone per country as a minimum because that
> is what they have had for 15 or 20 years. Retaining backwards
> compatibility for IDs is great, but meaningless if those IDs return
> backwards incompatible data.
>
> Ultimately, you haven't addressed my key point that a perfectly
> rational unified set of IDs has been bifurcated into ones that are
> deemed important and ones that are not. That is quite specifically
> something *new*, a change from what the project previously provided.
> And I think most would objectively judge it as being a degradation of
> what is offered by tzdb.
>
> > These downsides of a one-Zone-per-country rule may not appear to be all
> > that serious to people who are not actively maintaining the database,
> > but as the primary maintainer of a database that I would like to be as
> > accurate as possible, I would object to adding distracting and
> > error-prone makework like that to my volunteer workload.
>
> To be clear, I think this is exactly why tzdb should move beyond being
> a volunteer-led project. In practical terms, the only realistic
> financially supported option I'm aware of is CLDR. But it is up to
> those funding CLDR to decide if they are willing to pay to expand its
> mandate.
>
> In reality I don't think there actually is any extra work, as you have
> already separately committed to including any historical data people
> provide, and new ISO codes are an extremely rare occurrence. The real
> work in recent years has been the fallout from your choice to degrade
> what tzdb offers.
>
> If you genuinely do want to reduce your volunteer work to only be the
> abstract post-1970 regions and not to maintain any data pre-1970, then
> you really should be clear about that. You could then look for an
> alternate maintainer of tzdb itself as you would be maintaining what
> amounts to a new database, which would best sit in a different git
> repo. That data could then be an input to tzdb itself.
>
> Stephen
>