[tz] An alternate framing of timezone maintenance

Stephen Colebourne scolebourne at joda.org
Wed Sep 22 22:58:59 UTC 2021

I wanted to thank you for this detailed write up.

In response, I'll say that I don't believe that naming is the root of
the problem. There has been a lot of distracting discussion around ISO
countries, city names and more, but the fundamental issue is actually
with the timekeeping data set, not the naming. Whatever a region is
called, it still represents *somewhere*. Changing the history of that
somewhere is a big deal, whatever name you give the region.

Don't get me wrong, I do understand the attraction of arbitrary names.
But the existing names will never disappear, as they are so widely
used and so useful. To me, adding new names is a net negative at this
stage of tzdb.
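The difference between renaming a region and changing its history can be shown with a minimal Python sketch. It assumes a deliberately simplified model, not tz's actual data format. Each ruleset is a sorted list of (POSIX transition time, UTC offset in seconds) pairs, and a naming layer is a plain dict from names to ruleset keys. The names `Region/Name`, `ruleset-a`, and all numeric values are hypothetical.

```python
import bisect

# Hypothetical rulesets: sorted (transition_time, utc_offset_seconds) pairs.
# Times are POSIX seconds; every value here is invented for illustration.
RULESETS = {
    "ruleset-a": [(-2_000_000_000, 3000), (-1_000_000_000, 3600)],
    "ruleset-b": [(-2_000_000_000, 3600)],
}


def offset_at(ruleset_id, t):
    """Return the UTC offset in effect at POSIX time t for one ruleset."""
    transitions = RULESETS[ruleset_id]
    starts = [start for start, _ in transitions]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError("time precedes the first transition")
    return transitions[i][1]


def offset_for_name(names, name, t):
    """Resolve a name through a naming layer, then look up its offset."""
    return offset_at(names[name], t)


old_names = {"Region/Name": "ruleset-a"}
renamed = {"Region/NewName": "ruleset-a"}  # new name, same ruleset
repointed = {"Region/Name": "ruleset-b"}   # same name, different ruleset

t = -1_500_000_000  # a historical instant between the two transitions
print(offset_for_name(old_names, "Region/Name", t))   # 3000
print(offset_for_name(renamed, "Region/NewName", t))  # 3000: history kept
print(offset_for_name(repointed, "Region/Name", t))   # 3600: history changed
```

Renaming leaves every historical offset intact, while pointing an existing name at another ruleset changes what that name reports for past instants, whatever the name happens to be.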


On Wed, 22 Sept 2021 at 11:52, Russ Allbery via tz <tz at iana.org> wrote:
> Over the past few days, I've felt like the framing of the discussion
> hasn't taken into account Paul's clearly expressed desire for the part of
> maintenance he wants to focus on, and has not attempted to incorporate
> that into a design that would preserve other properties that other mailing
> list participants are interested in.  I've also wondered if all parties
> are making unnecessarily strong assumptions about the nature of tz
> maintenance that exclude potentially useful designs.
>
> In the hope of applying the maxim that all problems in computer science
> can be solved by adding a level of indirection, here's a wild proposal
> that, even if not workable as-is, might help in looking at the discussion
> from a different angle.
>
> One can think of the tz database as two layers.  The first is a collection
> of rulesets that represent rules for clock changes in particular regions.
> Call that the timekeeping data set.  The second is a many-to-one
> assignment of names to those rulesets.  Call that the naming layer.
>
> The scheme used for the naming layer attempted to avoid politicization of
> that layer by using the continent and largest city approach.  This was
> largely successful, particularly by the standards of attempts of this
> sort, but not entirely so.
>
> For years now, the tz project has in essence asked people to treat the
> zone names as opaque identifiers and not imbue them with political
> meaning.  Unfortunately, because those identifiers embed real-world names
> with other meanings in other contexts, I believe this effort is doomed to
> never fully succeed.  The names and spellings of cities are political.
> The choice of continent to which to assign a city can be political.
> Population counts are political.  Readers of the mailing list can fill in
> more examples.
>
> However, the timekeeping data set, divorced from the naming layer, is as
> close to apolitical as anything involving laws and human practice could
> be.  Putting aside timezone abbreviations, nearly all of the political
> conflict is over the naming layer, not the timekeeping data set.
>
> I believe Paul has clearly indicated that the part of the work that he
> wants to focus on is maintenance of the timekeeping data set.  I would
> characterize his recent proposed changes as attempts to make the naming
> layer less political to reduce political arguments and thus allow more
> time and attention to be spent on the timekeeping data set, which is where
> the primary value of the project lies.  The stability concerns that have
> prompted most of the recent discussion are almost entirely about the
> naming layer.
>
> Suppose we resurrect the idea of opaque timezone identifiers.
> Specifically, suppose that we *add* a new, random identifier, something
> like TZ0045 with random digits, to all existing rulesets in either the
> main database or backzone.  These identifiers would be unique identifiers
> for the dataset itself, independent of any other names.  These identifiers
> would immediately have some useful properties:
>
> 1. Historic times for a given identifier would change only if we
>    discovered that the previous times were clearly erroneous.  Apart from
>    fixing discovered errors, historic times would be stable for any given
>    identifier.
> 2. Looking forward, new identifiers may be added if portions of an
>    existing region diverge in their timekeeping practices or if someone
>    gathers new historical information that would prompt the creation of a
>    new backzone ruleset, but that's the only possible change.  Identifiers
>    will never change or be retired.
> 3. These identifiers carry absolutely no additional political content on
>    top of the rules themselves.  In other words, they add no new political
>    problems not inherent and unavoidable in the data itself.
>
> Adding these identifiers would nearly double the number of names in the
> current tz database, which is unfortunate, but certainly far less
> disruptive than the sorts of changes that have recently been considered.
>
> Once these identifiers exist, the combination of those identifiers and the
> timekeeping data set forms a nearly apolitical collection of data to which
> a naming layer can be cleanly applied.  One can, for example, define a
> naming layer that exactly corresponds to the naming in use in the previous
> release of the tz database.  With the exception of the implementation
> detail that the previous names become links to a new canonical identifier,
> the combination of that naming layer and that conception of the
> timekeeping data set is functionally identical to the previous tz release
> (except for the normal sorts of modifications for on-the-ground
> timekeeping changes).
>
> This may sound like a lot of work just to get back to where we already
> are, but with a pile of new, ugly names.  But the point of such a change
> is that it now permits a separation of concerns and even potentially a
> separation of maintenance.
>
> The timekeeping data set is now a separate artifact that those whose
> primary interest is in timekeeping data can focus on without having to get
> involved in political naming discussions.  It achieves the goal that Paul
> has been working towards (but which is impossible to fully achieve with
> the current naming) of separating the data from political and historical
> decisions about who got a timezone name and who didn't.  And (very slowly,
> of course) there is now the possibility for consumers of the tz database
> to opt out of the naming conventions.  One could, for instance, choose a
> timezone based on selection from a map and have that correspond to the
> unique, permanent timezone identifier.
>
> Meanwhile, clearly there is a strong interest in the naming layer and a
> strong desire to continue to maintain it along lines that Paul is not
> entirely comfortable with.  Recently, that discussion has focused on
> naming stability, but other parties have expressed other interests in the
> past (adding new spellings of cities, ensuring a name exists for every
> ISO-recognized country, ensuring a name exists for regional capitals that
> are commonly referenced locally as the name for a timezone, etc.).
> Nothing is going to make those discussions go away, as the past many years
> of discussions here have shown, but now they are separable from the
> timekeeping data set and participants can decide which part of the
> maintenance they're interested in.
>
> If Paul (or any other contributor) wished, he could choose to focus on the
> part of the project that he finds the most interesting and leave
> maintenance of the naming layer largely to other parties.  Given recent
> mailing list traffic, there is obviously substantial interest in that
> naming layer and thus I'm sure there will be no shortage of volunteers to
> help maintain it.  And those who make decisions about the naming layer can
> then also absorb the consequences of those decisions, such as handling
> arguments over the spelling of cities.  It would even be possible
> (although not necessary) to move discussion of the naming layer to a
> separate mailing list to more clearly separate political discussion from
> ruleset maintenance and technical work on the associated code libraries.
> The naming layer, which is now nearly devoid of technical decisions, could
> even be delegated to a more political body that deals with these sorts of
> conflicts constantly and is thus better equipped to handle them than the
> tz mailing list.  Numerous options like that become possible.
>
> Even if a maintenance split doesn't happen, I think everyone may benefit
> from cleanly separating the spectacularly high-quality resource of
> rulesets and their accompanying exhaustive references, discussion, and
> human-readable descriptions of applicable regions from the politically
> fraught but technically quite small and simple naming layer.
>
> This idea may not be workable for reasons that aren't obvious to me at
> nearly 4:00am, but hopefully it will at least provide a different angle
> from which to look at the current arguments and possibly achieve some
> clarity about which portions of the overall tz project people are
> interested in working on and where the exact controversy lies.
>
> --
> Russ Allbery (eagle at eyrie.org)             <https://www.eyrie.org/~eagle/>
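Russ's two-layer proposal can be sketched as a small Python model. It assumes an append-only registry that assigns random four-digit identifiers of the TZ0045 form, plus a naming layer kept as a dict of links from names to those identifiers. The class names, the `correct` method, and the placeholder rulesets and region names are hypothetical; none of this is tz code or an existing API.

```python
import random


class RulesetRegistry:
    """Append-only map from opaque identifiers (e.g. 'TZ0045') to rulesets.

    Identifiers are random, carry no meaning, and are never removed or
    reassigned; only the ruleset data behind them may be corrected.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._rulesets = {}

    def add(self, ruleset):
        """Store a new ruleset under a fresh random identifier and return it."""
        if len(self._rulesets) >= 10_000:
            raise RuntimeError("four-digit identifier space exhausted")
        while True:
            ident = f"TZ{self._rng.randrange(10_000):04d}"
            if ident not in self._rulesets:
                self._rulesets[ident] = ruleset
                return ident

    def correct(self, ident, ruleset):
        """Replace the data for an existing identifier, e.g. to fix an error."""
        if ident not in self._rulesets:
            raise KeyError(ident)
        self._rulesets[ident] = ruleset

    def get(self, ident):
        return self._rulesets[ident]


class NamingLayer:
    """Many-to-one map from human-readable names to registry identifiers."""

    def __init__(self, registry):
        self._registry = registry
        self._links = {}

    def link(self, name, ident):
        self._registry.get(ident)  # raises KeyError for an unknown identifier
        self._links[name] = ident

    def resolve(self, name):
        return self._links[name]


# Hypothetical rulesets and names, for illustration only.
registry = RulesetRegistry(seed=45)
ident_a = registry.add("ruleset A (placeholder)")
ident_b = registry.add("ruleset B (placeholder)")

names = NamingLayer(registry)
names.link("Region/CityA", ident_a)
names.link("Region/CityB", ident_a)  # two names, one ruleset
names.link("Region/CityC", ident_b)

# A consumer that stored the opaque identifier (say, from a map picker)
# can look up the ruleset without going through any name at all.
stored = names.resolve("Region/CityB")
print(stored == ident_a, registry.get(stored))  # True ruleset A (placeholder)
```

Offering `correct` but no removal method mirrors the first two properties in the proposal: historic data may be fixed when clearly erroneous, but identifiers themselves never change or retire. A naming layer matching a previous release would simply be one `NamingLayer` populated with the old names.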
