[tz] Reason for removal of several TZ abbreviations
guy at alum.mit.edu
Tue Dec 5 05:22:03 UTC 2017
On Dec 4, 2017, at 8:02 PM, Tim Parenti <tim at timtimeonline.com> wrote:
> On 4 December 2017 at 19:21, Michael Douglass <mikeadouglass at gmail.com> wrote:
>> Then we can stop having these long arguments about why one name or another isn't in the tz data. Everybody is free to generate their own list if they so wish.
> On 4 December 2017 at 19:38, David Patte ₯ <dpatte at relativedata.com> wrote:
>> Using such a scheme, a database such as geonames would then map a location to the tz id (as it does now), deferring political arguments to geonames, and removing them from this list. It seems appropriate.
> Except it likely doesn't make anything better for maintenance. With strictly opaque IDs, this project would still need to track the approximate geographical scope of each identifier in much the same manner as we already do, so as to aid in identifying when zone splits are necessary, etc. Currently, that task is fairly clear with human-readable IDs identifying a city and commentary filling in the details where necessary. This is, of course, only done to roughly record the general contours of zones so we know how to update them, not any attempt at recording precise borders. Indeed, it is (or should be) well-known to this list that our IDs can be considered to overlap geographically in several cases, and that these are often the most geopolitically divisive cases.
> Replacing IDs with opaque strings would complicate this maintenance somewhat,
Because, if, for example, some US state currently in the Eastern time zone near its western edge, and having obeyed the standard America/New_York daylight saving time transition rules since 1970, were to decide to move to the Central time zone - or to move the western half to the Central time zone - it'd be easier to remember that this means "the borders of tzdb region America/New_York change" than to remember that this means "the borders of tzdb region Q7F9 change"?
> although that complication could be mitigated by further standardization of our geographical commentary to assist in maintenance. However, at that point, the questions don't become "why doesn't Important City have its own ID?"
Do you mean people are asking "why doesn't this tzdb region have multiple IDs for all its Important Cities?" or do you mean they're asking "why don't all these Important Cities have their own tzdb regions and therefore have their own IDs"?
> but rather "why isn't Important City in the commentary for this ID?"
For every tzdb region, is there some small number of Important Cities such that 1) it's small enough that we could list *all* of them in the commentary for the region and 2) there aren't any cities for which we'll still get "why isn't Important City in the commentary for this ID?" and can't somewhat reasonably reply "that's not really an Important Enough City, sorry"?
> or "why is Important City listed in the commentary for the 'wrong' ID?"
So you're thinking of a case here where the tzdb region to which a city would belong is disputed (probably due to the political status of the city being disputed)?
>> But what about the tz designations (such as EDT), do we have a solution to that conundrum?
> In the past, there was a proposal to simply eliminate them all, and use some static string like "zzz", "-", "local", or "". But this has similar problems as above, since it is impossible to identify the source zone or corresponding UTC timepoint. The numeric %z format that many zones have recently moved to is marginally better in the one regard, but we've seen that that has had (the expected) knock-on effects caused by (mis)use of the "abbreviation" field when heuristics fail to identify the correct source zone from the incomplete information.
> Honestly, if it weren't for character limitations, the "abbreviation" field should probably just return something akin to "<-05>America/New_York" for all zones at all times, leaving strings like "EST" (in English) or "HNE" (in French) to localization projects like CLDR. This would have the advantage of providing everything needed to reconstruct the UTC timepoint and identify the source zone (given a specific version of this project, at least), but the distinct disadvantage of being a bit unwieldy.
By "character limitations" I assume you're referring to the limitations imposed by
section 8.3 "Other Environment Variables", combined with the fact that a lot of time zone code on POSIX-compliant systems, and trying-at-least-to-be-reasonably-compatible-with-POSIX systems, uses the information in the tzdb to provide time zone abbreviations, and that rather a lot of software would be Rudely Surprised if we provided an offset from UTC and a tzdb region ID rather than an abbreviation of the form that people expect and that UN*Xes have provided for a few decades.
(I think the "combined with" part is actually the relevant part here; the POSIX specification just codifies what UN*X developers, and users of UN*Xes, have come to expect.)
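To make the "<-05>America/New_York" idea concrete, here is a minimal Python sketch of what such a designation would carry. The function names (numeric_abbr, verbose_designation, to_utc) are made up for illustration and are not part of tzcode or any existing library; the offset formatting is only assumed to follow the shortest-form style of the numeric %z abbreviations.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def numeric_abbr(offset: timedelta) -> str:
    """Format a UTC offset as +hh, +hhmm or +hhmmss (shortest form that fits)."""
    total = int(offset.total_seconds())
    sign = "-" if total < 0 else "+"
    hh, rem = divmod(abs(total), 3600)
    mm, ss = divmod(rem, 60)
    text = f"{sign}{hh:02d}"
    if mm or ss:
        text += f"{mm:02d}"
    if ss:
        text += f"{ss:02d}"
    return text


def verbose_designation(offset: timedelta, tzid: str) -> str:
    """Hypothetical 'abbreviation' carrying both the offset and the tzdb region ID."""
    return f"<{numeric_abbr(offset)}>{tzid}"


def to_utc(local_naive: datetime, designation: str) -> Tuple[datetime, str]:
    """Recover the UTC timepoint and the source zone from a local time plus designation."""
    num, tzid = designation[1:].split(">", 1)
    sign = -1 if num[0] == "-" else 1
    digits = num[1:]
    offset = sign * timedelta(
        hours=int(digits[0:2]),
        minutes=int(digits[2:4] or 0),
        seconds=int(digits[4:6] or 0),
    )
    # Local time = UTC + offset, so UTC = local time - offset.
    return (local_naive - offset).replace(tzinfo=timezone.utc), tzid


designation = verbose_designation(timedelta(hours=-5), "America/New_York")
utc_time, zone = to_utc(datetime(2017, 12, 5, 0, 22, 3), designation)
```

The string is enough to get back to UTC and to name the source zone, which a bare "EST" or "-05" is not; it is also far longer than the short alphabetic abbreviations that existing software expects.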
Perhaps what we *should* do is provide, in addition to UN*X-style abbreviations, "metazone" names, as per
so that software can, given a tzdb region ID and a date/time, find out the metazone for that region at that date and time from the tzdb data, and then look up that metazone in the CLDR and find the time zone long name and abbreviation that should be used, in a given locale, for that date and time. If there *is* no metazone, as would be the case in, for example, the US from February 9, 1942 to September 30, 1945 (there's no metazone I can see for "war time"), we wouldn't supply one, and that software would have to fall back on the abbreviation we supply.
We'd still have to supply abbreviations for the benefit of software that *doesn't* use the CLDR as well as for "time zones" with no metazone (and, at least for some period of time, the CLDR would still have to supply supplemental/metaZones.xml, duplicating information from the tzdb, as it might be used on systems that *don't* supply the metazone).
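A rough Python sketch of the lookup described above, under loudly stated assumptions: the METAZONES and ABBREVIATIONS tables are hypothetical stand-ins for data that the tzdb and the CLDR would supply, the interval boundaries are simplified to whole dates, and the helper names are invented for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical per-region metazone intervals (start inclusive, end exclusive).
# The gap stands for the US "war time" period, which has no metazone here.
METAZONES = {
    "America/New_York": [
        (datetime.min, datetime(1942, 2, 9), "America_Eastern"),
        (datetime(1945, 9, 30), datetime.max, "America_Eastern"),
    ],
}

# Hypothetical excerpt of locale data, keyed by (locale, metazone, is_dst).
ABBREVIATIONS = {
    ("en", "America_Eastern", False): "EST",
    ("en", "America_Eastern", True): "EDT",
    ("fr", "America_Eastern", False): "HNE",
}


def metazone_for(tzid: str, when: datetime) -> Optional[str]:
    """Return the metazone in effect for tzid at 'when', or None if there is none."""
    for start, end, metazone in METAZONES.get(tzid, []):
        if start <= when < end:
            return metazone
    return None


def display_abbr(tzid: str, when: datetime, is_dst: bool,
                 locale: str, fallback_abbr: str) -> str:
    """Prefer the localized abbreviation; fall back on the one the tzdb supplies."""
    metazone = metazone_for(tzid, when)
    if metazone is None:
        return fallback_abbr
    return ABBREVIATIONS.get((locale, metazone, is_dst), fallback_abbr)


# A French-locale lookup in 2017 resolves through the metazone;
# a 1943 lookup has no metazone and falls back on the supplied abbreviation.
modern = display_abbr("America/New_York", datetime(2017, 12, 5), False, "fr", "EST")
wartime = display_abbr("America/New_York", datetime(1943, 6, 1), True, "en", "EWT")
```

The fallback argument is the reason the tzdb would still have to carry its own abbreviations: it covers both periods with no metazone and locales for which the locale data has no entry.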