<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I get the impression that this debate stems from two different schools of thought:<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Descriptive: Paul wants to describe the timezones of the world without regard to how those time zones were created, and to merge them into the smallest set that can generate the timekeeping rules. I can see that in this view, merging timezones from different countries into the same equivalence class is reasonable.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Prescriptive: I think Stephen and others start from the fact that time zones are created by political organizations, which write the regulations that define them. Those governing bodies are predominantly organized by country, in a hierarchical structure. In this view, it does *not* make sense to merge timezones from different countries. This view also implies that the TZ identifiers should reflect the political organizational structure of the world.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I want to suggest that these two views may be able to coexist. We could create a new file, called 'countryzone' for example, containing a set of Links organized in a hierarchical tree by country and pointing to the Core zones. Paul can maintain the Core files as before, and 'countryzone' would be maintained by a different set of people. 
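</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">As a rough illustration of how a 'countryzone' layer could sit on top of Core, here is a minimal Python sketch. It assumes the zic-style 'Link TARGET LINK-NAME' line format; the three-zone CORE_ZONES subset, the one-line COUNTRYZONE_TEXT sample, and the helper names parse_links and unresolved_links are hypothetical, not part of any existing tool. The check only confirms that every country-based name points at a zone defined in Core.</div>
<pre>
```python
# Minimal sketch (assumptions: zic-style "Link TARGET LINK-NAME" lines,
# and a tiny hypothetical subset of Core zone names; not real tzdb data).

CORE_ZONES = {"America/Toronto", "Europe/Rome", "Africa/Abidjan"}

# Hypothetical contents of a 'countryzone' file.
COUNTRYZONE_TEXT = """
# Link  TARGET           LINK-NAME
Link    America/Toronto  Canada/Toronto
"""


def parse_links(text):
    """Map each link name to its target, from lines 'Link TARGET LINK-NAME'.

    Simplification: only the full 'Link' keyword is recognized; real zic
    input also accepts abbreviated keywords.
    """
    links = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, then whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == "Link" and len(fields) == 3:
            links[fields[2]] = fields[1]
    return links


def unresolved_links(links, core_zones):
    """Return the link names (sorted) whose target is not a Core zone."""
    return sorted(name for name, target in links.items() if target not in core_zones)


links = parse_links(COUNTRYZONE_TEXT)
print(links)                                # {'Canada/Toronto': 'America/Toronto'}
print(unresolved_links(links, CORE_ZONES))  # []
```
</pre>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">A name whose target is missing from Core would show up in the second list, which is one way a separately maintained 'countryzone' file could be checked against each Core release.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">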
Assuming the Core zones form a complete set that covers all unique timezones in the world, every other ISO-country-based timezone can be mapped to one of the Core zones.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">For this to work, I think we need to clarify the semantics of the 'Link' records in the TZ database. As far as I can tell, the Link record has at least three different meanings:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">1) Link Canonical Deprecated</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">  * Deprecated is an old zone name that should no longer be used<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">2) Link Canonical Alternate</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">  * Alternate is an alternate spelling or alias, and is not deprecated<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">3) Link Canonical Merged</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">  * Merged is a zone that was merged into Canonical because the two happen to have the same rules, even though they have no semantic relationship to each other </div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I propose that we replace the 'Link' keyword with three new keywords that identify the precise meaning: LinkOld, LinkAlt, and LinkMerged. (My hope is that keeping the 'Link' prefix will make it easy to update existing TZDB parsers to preserve their previous behavior.) Slight aside: I learned that some third-party timezone libraries do not preserve the zone ID of a Link on a round trip. 
In other words (in pseudo-code), `TimeZone(linkName).getName() != linkName`. I wonder if it is worth defining the expected behavior of each type of Link for downstream libraries.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">For the pre-1970 data, it is my understanding that the 'backzone' file contains Zone records which should replace ONLY the LinkMerged records found in the other files. I propose that all LinkMerged records be extracted into a separate file (let's call it 'mergedzone') so that there is a clear symmetry between 'backzone' and 'mergedzone', which allows them to be substituted for each other. The dependency diagram looks something like this:<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default"><span style="font-family:monospace">countryzone</span></div><div class="gmail_default"><span style="font-family:monospace">   |</span></div><div class="gmail_default"><span style="font-family:monospace">   v<br></span></div><div class="gmail_default"><span style="font-family:monospace">  Core (africa, asia, etc...)<br></span></div><div class="gmail_default"><span style="font-family:monospace">    +-- backzone</span></div><span style="font-family:monospace"><span class="gmail_default"></span>    +-- mergedzone<br></span></div><div dir="ltr"><br></div><div dir="ltr"><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">Downstream libraries that want only post-1970 data can use: countryzone, Core, mergedzone</div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default"><br></div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">Downstream libraries that want to include pre-1970 data can use: countryzone, Core, backzone</div><br><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">@Stephen: We may be at a 
point where further debate is not productive. Perhaps we should create an exploratory fork of the TZDB to evaluate these ideas explicitly. It is easier to get feedback from a concrete implementation than to continue discussing ideas and options in a vacuum. I propose a GitHub project with an initial seed of the 10 raw TZDB files. And let's use the usual GitHub PR, Issues, and Discussions workflow, so that proposals can be reviewed and discussed before being committed into the repo.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">If there is any chance that this will result in being able to type "Canada/Toronto" instead of "America/Toronto", that would resolve an annoyance that has lasted some 30-35 years.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Brian<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Nov 4, 2021 at 4:04 PM Stephen Colebourne via tz <<a href="mailto:tz@iana.org">tz@iana.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 3 Nov 2021 at 22:40, Paul Eggert <<a href="mailto:eggert@cs.ucla.edu" target="_blank">eggert@cs.ucla.edu</a>> wrote:<br>

> On 10/18/21 06:07, Stephen Colebourne via tz wrote:<br>

> > What tzdb previously offered was a set of IDs,<br>

> > based on a simple rule - "ID as needed for post-1970 data, with at<br>

> > least one per ISO country". Full history was available for each of<br>

> > these (whether accurate or not).<br>

><br>

> That wasn't ever the case. For example, there was never full history<br>

> (accurate or not) for San Marino. We shouldn't base our analysis on the<br>

> idea that we formerly had at least one Zone per ISO country, as we never<br>

> had an ironclad rule like that and we did just fine without any such rule.<br>

<br>

Let's unpack this for a minute.<br>

<br>

Looking at the state of tzdb in mid 2012:<br>

- Europe/San_Marino existed as an ID<br>

- it was an alias for Europe/Rome<br>

<a href="https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/europe#L1452-L1461" rel="noreferrer" target="_blank">https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/europe#L1452-L1461</a><br>

<br>

In practical terms as a user:<br>

- you could query it for full history<br>

- the data you got back was accurate post-1970<br>

- the data you got back pre-1970 was of unknown accuracy (except LMT<br>

which was definitely inaccurate)<br>

- the data was the best researched data for San Marino available<br>

<br>

As such, I don't think it is correct to say that "there was never full<br>

history" for San Marino. The ID existed and history could be queried.<br>

The data that was available was good enough because San Marino shares<br>

enough geopolitical history with Rome that users can overlook the<br>

distinction. And no-one has ever been motivated to do better. This is<br>

a hugely different scenario to Reykjavik returning data from Abidjan<br>

where you are intending to knowingly make the data worse for<br>

end-users.<br>

<br>

The ironclad rule (AFAICT) is that there was always an *ID* for each<br>

ISO country, and that the data it returned was acceptably accurate,<br>

not outrageously wrong.<br>

<br>

<br>

> There's no *timekeeping* reason to require a Zone for every ISO country.<br>

> Adding such a requirement would complicate maintenance.<br>

<br>

I think someone born in Iceland before 1970 might well disagree that<br>

there is no timekeeping reason at work here.<br>

<br>

I think the real problem here is that you are trying to fundamentally<br>

change what tzdb offers. I'm here communicating as clearly as I can<br>

that end-users expect one zone per country as a minimum because that<br>

is what they have had for 15 or 20 years. Retaining backwards<br>

compatibility for IDs is great, but meaningless if those IDs return<br>

backwards incompatible data.<br>

<br>

Ultimately, you haven't addressed my key point that a perfectly<br>

rational unified set of IDs has been bifurcated into ones that are<br>

deemed important and ones that are not. That is quite specifically<br>

something *new*, a change from what the project previously provided.<br>

And I think most would objectively judge it as being a degradation of<br>

what is offered by tzdb.<br>

<br>

> These downsides of a one-Zone-per-country rule may not appear to be all<br>

> that serious to people who are not actively maintaining the database,<br>

> but as the primary maintainer of a database that I would like to be as<br>

> accurate as possible, I would object to adding distracting and<br>

> error-prone makework like that to my volunteer workload.<br>

<br>

To be clear, I think this is exactly why tzdb should move beyond being<br>

a volunteer-led project. In practical terms, the only realistic<br>

financially supported option I'm aware of is CLDR. But it is up to<br>

those funding CLDR to decide if they are willing to pay to expand its<br>

mandate.<br>

<br>

In reality I don't think there actually is any extra work, as you have<br>

already separately committed to including any historical data people<br>

provide, and new ISO codes are an extremely rare occurrence. The real<br>

work in recent years has been the fallout from your choice to degrade<br>

what tzdb offers.<br>

<br>

If you genuinely do want to reduce your volunteer work to only be the<br>

abstract post-1970 regions and not to maintain any data pre-1970, then<br>

you really should be clear about that. You could then look for an<br>

alternate maintainer of tzdb itself as you would be maintaining what<br>

amounts to a new database, which would best sit in a different git<br>

repo. That data could then be an input to tzdb itself.<br>

<br>

Stephen<br>

</blockquote></div></div>