[tz] Pre-1970 data

Thu Nov 4 23:03:33 UTC 2021

On Wed, 3 Nov 2021 at 22:40, Paul Eggert <eggert at cs.ucla.edu> wrote:
> On 10/18/21 06:07, Stephen Colebourne via tz wrote:
> > What tzdb previously offered was a set of IDs,
> > based on a simple rule - "ID as needed for post-1970 data, with at
> > least one per ISO country". Full history was available for each of
> > these (whether accurate or not).
>
> That wasn't ever the case. For example, there was never full history
> (accurate or not) for San Marino. We shouldn't base our analysis on the
> idea that we formerly had at least one Zone per ISO country, as we never
> had an ironclad rule like that and we did just fine without any such rule.

Let's unpack this for a minute.

Looking at the state of tzdb in mid 2012:
- Europe/San_Marino existed as an ID
- it was an alias for Europe/Rome
https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/europe#L1452-L1461

In practical terms as a user:
- you could query it for full history
- the data you got back was accurate post-1970
- the data you got back pre-1970 was of unknown accuracy (except LMT
which was definitely inaccurate)
- the data was the best researched data for San Marino available

As such, I don't think it is correct to say that "there was never full
history" for San Marino. The ID existed and history could be queried.
The data that was available was good enough because San Marino shares
enough geopolitical history with Rome that users can overlook the
distinction. And no-one has ever been motivated to do better. This is
a hugely different scenario from Reykjavik returning data from Abidjan,
where you intend to knowingly make the data worse for end-users.
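
As a rough illustration of the alias behaviour described above (a
minimal sketch, not tzdb's actual tooling): in tzdb source files an
alias is declared with a "Link TARGET LINK-NAME" line, and a consumer
resolves the alias to its target Zone before looking up any history.
The helper names parse_links and resolve, and the sample fragment, are
assumptions made for illustration; the fragment is not a verbatim copy
of the 2012 europe file.

```python
# Illustrative fragment in tzdb source syntax (assumed, not copied verbatim).
SAMPLE = """\
# Link  TARGET       LINK-NAME
Link    Europe/Rome  Europe/San_Marino
"""


def parse_links(text):
    """Return a dict mapping each alias ID to the ID it points at."""
    links = {}
    for raw in text.splitlines():
        # Drop comments, then split the remaining fields on whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "Link":
            target, alias = fields[1], fields[2]
            links[alias] = target
    return links


def resolve(tz_id, links):
    """Follow Link entries until an ID that is not an alias is reached."""
    seen = set()
    while tz_id in links:
        if tz_id in seen:
            raise ValueError("link cycle at " + tz_id)
        seen.add(tz_id)
        tz_id = links[tz_id]
    return tz_id


links = parse_links(SAMPLE)
canonical = resolve("Europe/San_Marino", links)
print(canonical)
```

Under this model a query against the alias is answered with whatever
history the target Zone carries, which is why the quality of the
target's pre-1970 data matters to users of the alias.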

The ironclad rule (AFAICT) is that there was always an *ID* for each
ISO country, and that the data it returned was acceptably accurate,
not outrageously wrong.

> There's no *timekeeping* reason to require a Zone for every ISO country.
> Adding such a requirement would complicate maintenance.

I think someone born in Iceland before 1970 might well disagree that
there is no timekeeping reason at work here.

I think the real problem here is that you are trying to fundamentally
change what tzdb offers. I'm here communicating as clearly as I can
that end-users expect one zone per country as a minimum because that
is what they have had for 15 or 20 years. Retaining backwards
compatibility for IDs is great, but meaningless if those IDs return
backwards incompatible data.

Ultimately, you haven't addressed my key point that a perfectly
rational unified set of IDs has been bifurcated into ones that are
deemed important and ones that are not. That is quite specifically
something *new*, a change from what the project previously provided.
And I think most would objectively judge it as being a degradation of
what is offered by tzdb.

> These downsides of a one-Zone-per-country rule may not appear to be all
> that serious to people who are not actively maintaining the database,
> but as the primary maintainer of a database that I would like to be as
> accurate as possible, I would object to adding distracting and
> error-prone makework like that to my volunteer workload.

To be clear, I think this is exactly why tzdb should move beyond being
a volunteer-led project. In practical terms, the only realistic
financially supported option I'm aware of is CLDR. But it is up to
those funding CLDR to decide if they are willing to pay to expand its
mandate.

In reality I don't think there actually is any extra work, as you have
already separately committed to including any historical data people
provide, and new ISO codes are an extremely rare occurrence. The real
work in recent years has been the fallout from your choice to degrade
what tzdb offers.

If you genuinely do want to reduce your volunteer work to only be the
abstract post-1970 regions and not to maintain any data pre-1970, then
you really should be clear about that. You could then look for an
alternate maintainer of tzdb itself as you would be maintaining what
amounts to a new database, which would best sit in a different git
repo. That data could then be an input to tzdb itself.

Stephen