[tz] An alternate framing of timezone maintenance
Russ Allbery
eagle at eyrie.org
Wed Sep 22 10:52:24 UTC 2021
Over the past few days, I've felt like the framing of the discussion
hasn't taken into account Paul's clearly expressed preference for which
part of maintenance he wants to focus on, and hasn't attempted to
incorporate that preference into a design that would preserve the
properties that other mailing list participants care about. I've also
wondered if all parties
are making unnecessarily strong assumptions about the nature of tz
maintenance that exclude potentially useful designs.
In the hope of applying the maxim that all problems in computer science
can be solved by adding a level of indirection, here's a wild proposal
that, even if not workable as-is, might help in looking at the discussion
from a different angle.
One can think of the tz database as two layers. The first is a collection
of rulesets that represent rules for clock changes in particular regions.
Call that the timekeeping data set. The second is a many-to-one
assignment of names to those rulesets. Call that the naming layer.
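As a rough illustration of the two layers, here is a minimal Python
sketch. The dictionary structure, the ruleset keys, and the placeholder
strings are assumptions made for illustration and are not the tz
database's actual file format. The only detail taken from tz itself is
that Asia/Calcutta and Asia/Kolkata name the same rules.

```python
# Hypothetical model of the two layers; not the real tz file format.

# Timekeeping data set: one entry per ruleset. The keys are arbitrary
# internal labels and the values stand in for the actual transition rules.
TIMEKEEPING = {
    "ruleset_a": "rules for clock changes in region A (placeholder)",
    "ruleset_b": "rules for clock changes in region B (placeholder)",
}

# Naming layer: a many-to-one assignment of names to rulesets.
NAMING = {
    "Asia/Kolkata": "ruleset_a",
    "Asia/Calcutta": "ruleset_a",   # older spelling, same ruleset
    "Europe/Zurich": "ruleset_b",
}


def rules_for(name):
    """Resolve a zone name to its ruleset via the naming layer."""
    return TIMEKEEPING[NAMING[name]]


def names_for(ruleset):
    """List every name currently assigned to a ruleset."""
    return sorted(n for n, r in NAMING.items() if r == ruleset)
```

Note that nothing in TIMEKEEPING mentions a city or continent; all of
the real-world names live in NAMING.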
The scheme used for the naming layer attempted to avoid politicization of
that layer by using the continent and largest city approach. This was
largely successful, particularly by the standards of attempts of this
sort, but not entirely so.
For years now, the tz project has in essence asked people to treat the
zone names as opaque identifiers and not imbue them with political
meaning. Unfortunately, because those identifiers embed real-world names
with other meanings in other contexts, I believe this effort is doomed to
never fully succeed. The names and spellings of cities are political.
The choice of continent to which to assign a city can be political.
Population counts are political. Readers of the mailing list can fill in
more examples.
However, the timekeeping data set, divorced from the naming layer, is as
close to apolitical as anything involving laws and human practice could
be. Putting aside timezone abbreviations, nearly all of the political
conflict is over the naming layer, not the timekeeping data set.
I believe Paul has clearly indicated that the part of the work that he
wants to focus on is maintenance of the timekeeping data set. I would
characterize his recent proposed changes as attempts to make the naming
layer less political to reduce political arguments and thus allow more
time and attention to be spent on the timekeeping data set, which is where
the primary value of the project lies. The stability concerns that have
prompted most of the recent discussion are almost entirely about the
naming layer.
Suppose we resurrect the idea of opaque timezone identifiers.
Specifically, suppose that we *add* a new, random identifier, something
like TZ0045 with random digits, to all existing rulesets in either the
main database or backzone. These identifiers would be unique identifiers
for the dataset itself, independent of any other names. These identifiers
would immediately have some useful properties:
1. Historic times for a given identifier would change only if we
discovered that the previous times were clearly erroneous. Apart from
fixing discovered errors, historic times would be stable for any given
identifier.
2. Looking forward, new identifiers may be added if portions of an
existing region diverge in their timekeeping practices or if someone
gathers new historical information that would prompt the creation of a
new backzone ruleset, but those are the only possible changes.
Identifiers will never change or be retired.
3. These identifiers carry absolutely no additional political content on
top of the rules themselves. In other words, they add no new political
problems not inherent and unavoidable in the data itself.
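The three properties above could be sketched as an append-only
registry. This is only one possible reading of the proposal: the class
name, its methods, and the four-digit format (modeled on the TZ0045
example) are all hypothetical, and the ruleset values are placeholders.

```python
import random


class IdentifierRegistry:
    """Hypothetical append-only registry of opaque ruleset identifiers."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._rules = {}  # identifier -> ruleset (placeholder data)

    def add(self, ruleset):
        """Assign a fresh random identifier such as 'TZ0045' to a ruleset."""
        if len(self._rules) >= 10000:
            raise RuntimeError("four-digit identifier space exhausted")
        while True:
            ident = "TZ%04d" % self._rng.randrange(10000)
            if ident not in self._rules:
                break
        self._rules[ident] = ruleset
        return ident

    def correct(self, ident, ruleset):
        """Replace the data for an existing identifier (error fixes only)."""
        if ident not in self._rules:
            raise KeyError(ident)
        self._rules[ident] = ruleset

    def lookup(self, ident):
        """Return the ruleset stored under an identifier."""
        return self._rules[ident]

    def identifiers(self):
        """Return all identifiers assigned so far, sorted."""
        return sorted(self._rules)

    # Deliberately no remove() or rename(): identifiers are never retired.
```

The digits are random rather than sequential so that the identifier
itself carries no ordering or other meaning beyond naming a ruleset.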
Adding these identifiers would nearly double the number of names in the
current tz database, which is unfortunate, but certainly far less
disruptive than the sorts of changes that have recently been considered.
Once these identifiers exist, the combination of those identifiers and the
timekeeping data set forms a nearly apolitical collection of data to which
a naming layer can be cleanly applied. One can, for example, define a
naming layer that exactly corresponds to the naming in use in the previous
release of the tz database. With the exception of the implementation
detail that the previous names become links to a new canonical identifier,
the combination of that naming layer and that conception of the
timekeeping data set is functionally identical to the previous tz release
(except for the normal sorts of modifications for on-the-ground
timekeeping changes).
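To make the "previous names become links" idea concrete, here is a
small Python sketch. The identifiers, the placeholder rules, and the
function names are assumptions for illustration only; no such
identifiers exist in tz today.

```python
# Hypothetical data: the identifiers and rules below are examples only.
RULESETS = {
    "TZ0045": "rules for region A (placeholder)",
    "TZ7310": "rules for region B (placeholder)",
}

# A naming layer reproducing the names of a previous release: every old
# name becomes a link to a canonical opaque identifier.
LEGACY_NAMES = {
    "Asia/Kolkata": "TZ0045",
    "Asia/Calcutta": "TZ0045",
    "Europe/Zurich": "TZ7310",
}


def resolve(name, naming_layer=LEGACY_NAMES):
    """Return (identifier, rules) for an opaque identifier or a linked name."""
    ident = name if name in RULESETS else naming_layer[name]
    return ident, RULESETS[ident]


def dangling_links(naming_layer):
    """Names whose target identifier does not exist in the data set."""
    return sorted(n for n, i in naming_layer.items() if i not in RULESETS)
```

Because resolve() takes the naming layer as a parameter, a different
naming layer could be swapped in without touching RULESETS, and
dangling_links() is the kind of check a naming-layer maintainer might
run against each release of the timekeeping data set.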
This may sound like a lot of work just to get back to where we already
are, but with a pile of new, ugly names. But the point of such a change
is that it now permits a separation of concerns and even potentially a
separation of maintenance.
The timekeeping data set is now a separate artifact that those whose
primary interest is in timekeeping data can focus on without having to get
involved in political naming discussions. It achieves the goal that Paul
has been working towards (but which is impossible to fully achieve with
the current naming) of separating the data from political and historical
decisions about who got a timezone name and who didn't. And (very slowly,
of course) there is now the possibility for consumers of the tz database
to opt out of the naming conventions. One could, for instance, choose a
timezone based on selection from a map and have that correspond to the
unique, permanent timezone identifier.
Meanwhile, clearly there is a strong interest in the naming layer and a
strong desire to continue to maintain it along lines that Paul is not
entirely comfortable with. Recently, that discussion has focused on
naming stability, but other parties have expressed other interests in the
past (adding new spellings of cities, ensuring a name exists for every
ISO-recognized country, ensuring a name exists for regional capitals that
are commonly referenced locally as the name for a timezone, etc.).
Nothing is going to make those discussions go away, as the past many years
of discussions here have shown, but now they are separable from the
timekeeping data set and participants can decide which part of the
maintenance they're interested in.
If Paul (or any other contributor) wished, he could choose to focus on the
part of the project that he finds the most interesting and leave
maintenance of the naming layer largely to other parties. Given recent
mailing list traffic, there is obviously substantial interest in that
naming layer and thus I'm sure there will be no shortage of volunteers to
help maintain it. And those who make decisions about the naming layer can
then also absorb the consequences of those decisions, such as handling
arguments over the spelling of cities. It would even be possible
(although not necessary) to move discussion of the naming layer to a
separate mailing list to more clearly separate political discussion from
ruleset maintenance and technical work on the associated code libraries.
The naming layer, which is now nearly devoid of technical decisions, could
even be delegated to a more political body that deals with these sorts of
conflicts constantly and is thus better equipped to handle them than the
tz mailing list. Numerous options like that become possible.
Even if a maintenance split doesn't happen, I think everyone may benefit
from cleanly separating the spectacularly high-quality resource of
rulesets and their accompanying exhaustive references, discussion, and
human-readable descriptions of applicable regions from the politically
fraught but technically quite small and simple naming layer.
This idea may not be workable for reasons that aren't obvious to me at
nearly 4:00am, but hopefully it will at least provide a different angle
from which to look at the current arguments and possibly achieve some
clarity about which portions of the overall tz project people are
interested in working on and where the exact controversy lies.
--
Russ Allbery (eagle at eyrie.org) <https://www.eyrie.org/~eagle/>