[tz] Unicode ICU and Zone-to-Link changes

Sat Sep 25 07:23:27 UTC 2021

[retitling from "Re: [tz] Some thoughts about the way forward"]

On 9/24/21 6:49 AM, Mark Davis ☕ wrote:
> The Unicode ICU team discussed the proposed changes in the TZDB in their
> meeting earlier this week and we are reporting the consensus here.

Thanks for taking the time to write about this. I have not had the
time to follow how ICU deals with timezones; I am fuzzy, for example, on
the relationship between CLDR and ICU when it comes to timezone data.

> Members are very concerned about the downstream impact, and the inevitable
> compatibility mismatches between different implementations.

Yes, and similar concerns were expressed by others. We eventually 
muddled through by generating just one new tzdb version, 2021b, which 
nobody really likes but which I hope avoids a tzdb fork for now.

> If the change is made, here are the probable steps that would happen in
> ICU, based on the two areas that would be affected.
> 
> *1. Dropping zone IDs from the zone.tab.*

Which zone.tab is this? I didn't see a zone.tab in the ICU4C 69.1 source 
or data tarballs.

The ICU4C data tarball has a file data/misc/zoneinfo64.txt that contains
zoneinfo64:table(nofallback) defining Names as an array of tzdb Zone and
Link names and some other strings; is that what is meant by zone.tab?

> implementations rely on the mapping of zone IDs to ISO country codes.
> ICU already has an internal exception table that contains certain
> (zone IDs, ISO code) mappings that retains information that used to be in
> zone.tab. We would extend that table to add all of the zones dropped by the
> proposed change.

I'm not quite following, since no names in 2021a were dropped in 2021b. 
All that happened is that some names were changed from Zones to Links.

Although I'm probably barking up the wrong tree, I don't see why the
above-mentioned Names array would need to worry about Zone-to-Link
changes. For example, America/Creston was changed from a Zone in 2021a
to a Link in 2021b, but the Names array contains both Zones and Links, so
its "America/Creston" entry should not need to change.
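
To make the point concrete, here is a minimal Python sketch, not ICU's actual code. The helper collect_names and the two source snippets are hypothetical, and the Zone lines are abbreviated for illustration. It gathers every Zone and Link name from tzdb-format text and shows that the resulting set is the same whether America/Creston is a Zone or a Link.

```python
def collect_names(tzdb_source: str) -> set[str]:
    """Return every Zone and Link name defined in tzdb-format source text."""
    names = set()
    for line in tzdb_source.splitlines():
        # Drop any trailing comment, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "Zone":
            names.add(fields[1])  # Zone NAME STDOFF RULES FORMAT [UNTIL]
        elif len(fields) >= 3 and fields[0] == "Link":
            names.add(fields[2])  # Link TARGET LINK-NAME
    return names


# Abbreviated, illustrative excerpts; real Zone entries have more lines.
SOURCE_2021A = """\
Zone America/Phoenix -7:28:18 - LMT 1883 Nov 18 11:31:42
Zone America/Creston -7:46:04 - LMT 1884
"""

SOURCE_2021B = """\
Zone America/Phoenix -7:28:18 - LMT 1883 Nov 18 11:31:42
Link America/Phoenix America/Creston
"""

names_2021a = collect_names(SOURCE_2021A)
names_2021b = collect_names(SOURCE_2021B)
print(names_2021a == names_2021b)
```

A table built this way records only which names exist, so a Zone-to-Link change leaves it untouched.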

> We would probably also move the data and the rest of
> zone.tab to CLDR, so that we have a public, structured set of data in XML
> and JSON. This would effectively clone the zone.tab data.

Sorry, I'm a bit lost here too. Isn't ICU data mostly sourced from CLDR? 
That's what 
<https://unicode-org.github.io/icu/userguide/icu_data/#icu-and-cldr-data> implies.

> That way, implementations could use the zone.tab information to maintain
> the difference between Europe/Oslo and Europe/Berlin. That is, while the
> internal software might map Europe/Oslo to Europe/Berlin via a Link to get
> rules for evaluation, the library would still treat Europe/Oslo as a
> separate ID from Europe/Berlin.

That sounds reasonable.
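
As I understand the description, the arrangement could be sketched in Python like this. The tables LINKS and ID_TO_REGION and the functions rules_id and region_of are hypothetical names of mine, not ICU or CLDR APIs, and the Oslo-to-Berlin link is just the example from the quoted text.

```python
# Hypothetical link table: link name -> target used for rule evaluation.
LINKS = {"Europe/Oslo": "Europe/Berlin"}

# Hypothetical zone.tab-style table, keyed by the ID the user sees.
ID_TO_REGION = {"Europe/Oslo": "NO", "Europe/Berlin": "DE"}


def rules_id(zone_id: str) -> str:
    """Follow links to find the ID whose rules should be evaluated."""
    seen = set()
    while zone_id in LINKS and zone_id not in seen:
        seen.add(zone_id)  # guard against an accidental link cycle
        zone_id = LINKS[zone_id]
    return zone_id


def region_of(zone_id: str):
    """Look up the region by the original ID, without resolving links."""
    return ID_TO_REGION.get(zone_id)


print(rules_id("Europe/Oslo"), region_of("Europe/Oslo"))
```

The point of the design is that link resolution is used only to fetch rules, while everything keyed on identity (region, display name) uses the unresolved ID.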

What did ICU and/or CLDR do when tzdb made similar Zone-to-Link changes 
in previous tzdb releases? For example, Australia/Currie was changed 
from a Zone in 2020d to a Link in tzdb 2020e.

Is there any reason for ICU to treat 2021b's Zone-to-Link changes 
differently than it treated 2020e's similar change?