[tz] Classifying IDs

Clive D.W. Feather clive at davros.org
Thu Oct 7 15:23:06 UTC 2021

Stephen Colebourne via tz said:
> Following on from the previous thread [1] I wanted to try and classify
> the IDs we have, which may or may not identify missing IDs.
> Again, please avoid talking about pre-1970 data at this point.
> Obsolete
> ------------
> IDs that are obsolete and should never be used. They date from many
> years ago when tzdb was just starting. Yet these still do appear in
> downstream UIs even today (of course UIs should not use the tzdb ID
> list, but in reality lots do).
> Examples: Portugal, NZ-CHAT, Navajo, Libya.
> Proposal: Provide 3-6 months' notice, then move obsolete IDs to a new
> file "obsolete" which downstream projects are strongly encouraged not
> to include. (I would argue that the time has come to properly remove
> these IDs, which are very inconsistent in terms of which are provided
> and which not, eg Portugal, but not Spain.)

Counter-proposal. These should be treated as renamings. So Portugal ->
Europe/Lisbon and treated like your next category. I presume that these are
the same as some canonical zone since 1970. Pre-1970 data in these should
be treated however we decide to treat pre-1970 data.

> Deprecated, same location
> ------------------------------------
> IDs that have been deprecated with a single clear alternative ID being
> provided. Both IDs represent the same physical location/city.
> Spelling changes: Asia/Katmandu (replaced by Asia/Kathmandu),
> Asia/Rangoon (replaced by Asia/Yangon)
> ID structure changes: America/Louisville (replaced by
> America/Kentucky/Louisville)
> Proposal: Ensure all of these are in `backward`
> Consider: Is there any way to move these IDs to the obsolete file?
> Maybe after 5 years? Or do we just accept backwards compatibility
> restrictions on these?

Or make the information available (and possibly tools) to allow downstreams
to decide their policy on these.

For example, a file that said:

Asia/Rangoon Asia/Yangon rename 2005-11-26

(or whatever the actual rename date was).

The explicit "rename" there allows this file to show other things, such as
merges of zones that only differ pre-1970:

Europe/Oslo Europe/Berlin merge 2020-12-31

(or "merge-pre-1970").
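A minimal sketch, in Python, of how a downstream might consume such a file. The four-column layout (old ID, new ID, kind, date) follows the example lines above. The kind names, the function names and the age-based drop policy are hypothetical, and the dates are the placeholder values from the examples, not verified rename or merge dates.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical file layout: "OLD-ID NEW-ID KIND YYYY-MM-DD", whitespace-separated.
# KIND is assumed to be "rename", "merge" or "merge-pre-1970".


@dataclass(frozen=True)
class Replacement:
    old_id: str
    new_id: str
    kind: str
    since: date


def parse_replacements(text):
    """Parse the file, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        old_id, new_id, kind, since = line.split()
        entries.append(Replacement(old_id, new_id, kind, date.fromisoformat(since)))
    return entries


def ids_to_drop(entries, kinds, today, min_age_days):
    """Old IDs a downstream could drop under its own (hypothetical) policy:
    only entries of the listed kinds, and only once they are min_age_days old."""
    return {
        e.old_id
        for e in entries
        if e.kind in kinds and (today - e.since).days >= min_age_days
    }


# Placeholder dates copied from the example lines, not the real change dates.
SAMPLE = """\
Asia/Rangoon  Asia/Yangon    rename  2005-11-26
Europe/Oslo   Europe/Berlin  merge   2020-12-31
"""

entries = parse_replacements(SAMPLE)
# Policy: drop renames and merges once they are about five years old,
# evaluated on 2021-10-07.
print(sorted(ids_to_drop(entries, {"rename", "merge"}, date(2021, 10, 7), 5 * 365)))
# prints ['Asia/Rangoon']
```

The file only records what happened; which kinds to honour, and after how long, stays a decision for each downstream.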

> Legally described mega-zones
> -----------------------------------------
> IDs for locations where a federal or supra-national body defines
> rules, eg the EU or US DOT.
> Examples: US/Mountain, CET, WET
> Consider: Can we write down a rule to identify when something like
> this should be included? Then move the matching IDs to the main files
> (eg. are the EU and US DOT the only two examples here?)

The EU doesn't define "CET" or "WET", or even specify the names. The EU
specifies constraints on the rule for the zones that cover the places that
follow EU rules. So "CET" is not a zone; it's a collection of zones that
have been in step since early 1983 or whatever later date they joined the
collection. "WET", incidentally, starts from 1998-03-29. Nothing in EU law
stops a country moving from "CET" to "WET" or "EET". And these names do not
appear anywhere I can find in the legislation (it was "Member States
belonging to the zero time zone and the other Member States" and later
became "the Member States apart from Ireland and the United Kingdom, on the
one hand, and Ireland and the United Kingdom, on the other", which is not
the same division; this was probably when Portugal joined).

So, since these don't describe "places keeping the same time since 1970",
what exactly are they and why do we have them?

(I suspect that US/Mountain has a similar problem in that not everywhere in
Mountain time observes DST.)
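To make "a collection of zones that have been in step since some date" concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which each zone's history is a sorted list of (start date, UTC offset in hours) pairs; a real comparison of tzdb zones would also look at abbreviations and the DST flag. The zone names and dates are hypothetical, not real tzdb data.

```python
# Simplified model (an assumption, not tzdb's format): each zone's history is a
# list of (start_date, utc_offset_hours) pairs sorted by start date, using ISO
# date strings so that string comparison matches chronological order.


def offset_at(history, when):
    """Return the offset in effect at `when`, or None before the first entry."""
    current = None
    for start, offset in history:
        if start > when:
            break
        current = offset
    return current


def in_step_since(histories):
    """Earliest date from which every zone shows the same offset ever after,
    or None if they still disagree after the latest change."""
    change_dates = sorted({start for h in histories.values() for start, _ in h})
    since = None
    for when in change_dates:
        offsets = {offset_at(h, when) for h in histories.values()}
        if len(offsets) == 1:
            if since is None:
                since = when
        else:
            since = None
    return since


# Hypothetical zones that disagree on the 1982 DST start but agree thereafter.
EXAMPLE = {
    "Example/North": [("1970-01-01", 1), ("1982-03-28", 2), ("1982-09-26", 1),
                      ("1983-03-27", 2), ("1983-09-25", 1)],
    "Example/South": [("1970-01-01", 1), ("1982-04-04", 2), ("1982-09-26", 1),
                      ("1983-03-27", 2), ("1983-09-25", 1)],
}

print(in_step_since(EXAMPLE))  # prints 1982-04-04
```

Adding a zone that later moves to a different offset and stays there makes the result None, since the collection is no longer in step.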

> Regions
> -----------
> IDs for abstract regions that have had the same wall clock since 1970.
> Examples: Europe/Berlin, America/New_York, Africa/Abidjan
> Proposal: Ensure all of these are in the main files.
> Consider: Should there be new IDs for each of these abstract regions
> to indicate they are a separate and distinct concept? eg.
> "Region/Berlin". (Maybe something to consider in future threads as it
> isn't clear what the benefit of doing so is without considering
> pre-1970 which I'm still trying to avoid)
> Non-region locations
> ---------------------------
> IDs for locations that are not region IDs. Each ID will have the same
> wall clock since 1970 as one of the region IDs.
> Examples: Europe/Oslo, Europe/Amsterdam, Atlantic/Reykjavik
> Consider: Can we write down a rule that covers which IDs are included
> here?

If I'm understanding correctly what Paul's been doing, these are "IDs that
refer to regions that have the same time history since 1970 as another
region but a different time history before that and are not the region that
uses the ID that would be chosen using our standard conventions (basically
'largest town')".

Or, put another way, partition the set of all zones into subsets, each of
which has the same history since 1970. In each subset, one is what you've
called an abstract region and the rest are non-region locations. The choice
of which one is the region is made based on our normal naming rules.
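The partition described above can be sketched in Python. Assumptions: each zone's post-1970 history has already been reduced to a hashable value (here a tuple of (start date, offset) pairs), and "largest town" is approximated by a population figure. The zone IDs, histories and populations are invented for illustration.

```python
from collections import defaultdict


def partition_by_history(histories):
    """Group zone IDs whose (post-1970) histories are identical."""
    groups = defaultdict(list)
    for zone_id, history in histories.items():
        groups[tuple(history)].append(zone_id)
    return list(groups.values())


def choose_regions(groups, population):
    """In each group, pick the ID of the most populous town as the 'region'
    (ties broken alphabetically) and map every other ID to it as an alias."""
    regions, aliases = set(), {}
    for group in groups:
        region = min(group, key=lambda z: (-population[z], z))
        regions.add(region)
        for zone_id in group:
            if zone_id != region:
                aliases[zone_id] = region
    return regions, aliases


# Hypothetical data: two zones share a history, a third differs.
HISTORIES = {
    "Example/Bigtown": (("1970-01-01", 1), ("1980-04-06", 2), ("1980-09-28", 1)),
    "Example/Smalltown": (("1970-01-01", 1), ("1980-04-06", 2), ("1980-09-28", 1)),
    "Example/Islandtown": (("1970-01-01", 0),),
}
POPULATION = {
    "Example/Bigtown": 3_000_000,
    "Example/Smalltown": 600_000,
    "Example/Islandtown": 120_000,
}

regions, aliases = choose_regions(partition_by_history(HISTORIES), POPULATION)
print(sorted(regions))  # prints ['Example/Bigtown', 'Example/Islandtown']
print(aliases)          # prints {'Example/Smalltown': 'Example/Bigtown'}
```

For a build that only carries 1970-onwards data, the resulting aliases would play the role of links, which is the sense in which these IDs become equivalent to renames.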

> And therefore, when can a new ID be added to this set? If we can
> define a rule, then these can be split so rule-following IDs are in
> the main files and rule-breaking ones are in `backward` (although
> ideally they should be separate from the spelling changes).

Hang on, why should we ever add a new ID at all? My view is that we should
*not* be adding new IDs. So long as we're talking about a post-1970
database, that is. In other words, the rule is "they stay for backwards
compatibility reasons and no other".

For someone only building with 1970-onwards data, these would be equivalent
to aliases, so are treated as equivalent to renames - see above.

> Obviously,
> we can say these IDs only exist for backwards compatibility, but that
> seems like a weak justification,


If we were starting a new TZDB from scratch, we could ignore it because
there wouldn't be any backwards to be compatible with. But there is, so we
need to think about it.

> and doesn't tackle the issue of when
> a new ID would be added to the list (which has been a point of
> tension).

Why not "never"? Well, apart from following a bug fix.

> As is well known, I think the obvious rule is that the IDs follow the
> ISO-3166-1 standard (rule: one ID per ISO code, additional IDs may be
> added where clocks have diverged since 1970). Using ISO-3166 can be
> justified by IANA domain policy [2]:

That's not a justification, since IANA were handing rights over these names
to those ISO bodies. And IANA have long given up on that policy, which is
why there are .gg and .scot.
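For comparison, the quoted ISO 3166-1 rule ("one ID per ISO code, additional IDs may be added where clocks have diverged since 1970") can be sketched in Python. The histories are opaque placeholder labels, the country codes are the ISO 3166-1 user-assigned codes XA and XB, and the zone IDs and the "first listed wins" tie-break are hypothetical.

```python
def iso_rule_ids(zones):
    """Sketch of the proposed rule: one ID per ISO 3166-1 code, plus one more
    for each further post-1970 history within that code. `zones` is an
    iterable of (zone_id, iso_code, history) in priority order; the first ID
    seen for each (code, history) pair is kept."""
    kept = {}
    for zone_id, iso_code, history in zones:
        kept.setdefault((iso_code, history), zone_id)
    return sorted(kept.values())


ZONES = [
    ("Example/Bigtown", "XA", "history-1"),
    ("Example/Smalltown", "XB", "history-1"),
    ("Example/Othertown", "XB", "history-1"),
    ("Example/Farwest", "XB", "history-2"),
]

print(iso_rule_ids(ZONES))
# prints ['Example/Bigtown', 'Example/Farwest', 'Example/Smalltown']
```

Under this rule Example/Smalltown is kept even though its history matches Example/Bigtown, purely because it is in a different country; Example/Othertown is not kept.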

> As per the previous thread, these non-region location IDs are actively
> used in downstream business applications, and it is not OK that this only
> works because tzdb happens to have IDs for backwards compatibility.
> There needs to be a better justification than that.

Sorry, but that is *exactly* the definition of "backwards compatibility".
If someone starts a new application that only uses post-Paul-merges names
because that's all they see, they will *not* be using these names nor care
in the slightest about them.

It's *ALL* about backwards compatibility.

> Fixed/etc type rules
> --------------------------
> IDs with a fixed offset
> Examples: GMT, UTC, Etc/GMT-9
> Proposal: No change, retain in the main files unless a particular ID
> is considered obsolete or deprecated

The easiest way to treat these is to deem that there are certain virtual
places with their own time history (e.g. "international waters near the 30
degrees east meridian") which deserve their own zone on that basis (this
one being Etc/GMT-1).
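The "virtual place" reading of the Etc zones can be sketched in Python, assuming nautical time zones: 15 degrees of longitude per hour, each zone centred on a multiple of 15 degrees. Note the POSIX-style inverted sign in the names: a meridian at 30 degrees east maps to Etc/GMT-2 (UTC+2) and one at 15 degrees east to Etc/GMT-1. The function names are hypothetical, and a point exactly on a 7.5-degree boundary is resolved arbitrarily (rounding half up).

```python
import math


def nautical_offset_hours(longitude):
    """Nominal nautical offset: 15 degrees per hour, boundaries at odd
    multiples of 7.5 degrees, rounding half up on a boundary."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    return math.floor(longitude / 15.0 + 0.5)


def etc_zone_name(longitude):
    """tzdb-style Etc name for the nautical zone containing `longitude`."""
    hours = nautical_offset_hours(longitude)
    if hours == 0:
        return "Etc/GMT"
    # Inverted (POSIX-style) sign: Etc/GMT-2 is two hours *ahead* of UTC.
    return f"Etc/GMT{-hours:+d}"


print(etc_zone_name(30.0))   # prints Etc/GMT-2
print(etc_zone_name(-75.0))  # prints Etc/GMT+5
```

The 180-degree meridian maps to Etc/GMT-12 on the east side and Etc/GMT+12 on the west; tzdb's Etc/GMT-13 and Etc/GMT-14 have no nautical counterpart and are never produced here.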

But you've left out the mammoth in the room, which is pre-1970 data.

Clive D.W. Feather          | If you lie to the compiler,
Email: clive at davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646
