Time zone: the next generation

Mon Mar 7 18:25:17 UTC 2005

On Mon, Mar 07, 2005 at 11:36:19AM +0000, Clive D.W. Feather wrote:
> > Thinking about the broader problem a little more, perhaps it would
> > make sense to use XML for the run-time format?
> 
> Very definitely. Looking at your (elided) example I can see several places
> where a nested structure would be preferable, and once you've gone that way
> you might as well do XML.
> 
> > The bad thing is that it would either add an
> > external dependency to the code, or require that we bundle a parser.
> 
> If you assume that the incoming files are lexically correct, a parser is
> actually pretty simple.

On further reflection I'm less convinced that XML is directly
useful (though I'm not opposed to using it for secondary reasons of
interchange with other applications, if someone wants to argue that
case).  I belatedly recall exactly what zic is currently doing, and
realize that most of the complexity I was contemplating just doesn't
need to be there: for all dates in the past zic knows the precise
timestamp to use for each transition (to the best of the knowledge
encoded in the tzdata files).  It is only for specifying the last
pair (or larger set?) of "until max" rules that an algorithmic
representation in the run-time data makes any potential sense.  And
as Robert Elz is pointing out, the case can be made that precomputing
estimated transitions for N years beyond the date of a given zic run
is probably good enough.
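
To make the precomputation idea concrete, here is a minimal sketch in
Python (chosen for brevity; zic itself is C).  The rule tuples, the
function names, and the restriction to "last <weekday> of the month"
rules are all assumptions made for illustration; they are not zic's
actual data structures or algorithm.

```python
import calendar
from datetime import datetime, timezone

SUNDAY = 6  # calendar module convention: Monday=0 ... Sunday=6


def last_weekday(year, month, weekday):
    """Return the day of month of the last `weekday` in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    last_dow = calendar.weekday(year, month, last_day)
    return last_day - (last_dow - weekday) % 7


def precompute(rules, first_year, n_years):
    """Expand open-ended rules into explicit (timestamp, utc_offset) pairs.

    Each rule is a hypothetical tuple:
    (month, weekday, hour in UTC, UTC offset in seconds after the transition).
    """
    table = []
    for year in range(first_year, first_year + n_years):
        for month, weekday, hour_utc, utc_offset in rules:
            day = last_weekday(year, month, weekday)
            when = datetime(year, month, day, hour_utc, tzinfo=timezone.utc)
            table.append((int(when.timestamp()), utc_offset))
    table.sort()
    return table


# Hypothetical "until max" rule pair, loosely modelled on central European
# practice: forward on the last Sunday of March, back on the last Sunday
# of October, both at 01:00 UTC.
EXAMPLE_RULES = [(3, SUNDAY, 1, 7200), (10, SUNDAY, 1, 3600)]

if __name__ == "__main__":
    for stamp, offset in precompute(EXAMPLE_RULES, 2005, 3):
        print(stamp, offset)
```

With a table like this written into the file, the run-time code only
needs a search over explicit timestamps; the cost is that the table
runs out N years after the zic run.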

So, based on the discussion so far and further reflection, I see the
following points for "TZ-ng":

  * The tzfile format is basically sound.  Suggested extensions:
    . widen timestamps to 64 bits, of course;
    . add one (or a few?) versioning field(s) --- while the tzh_magic
      field with a different TZ_MAGIC should be adequate for
      "version of tzfile", it'd be nice to record something
      of the character "compiled by tzcode-2004a/zic from
      tzdata-2005d/africa";
    . add a "time reference" field --- have the file document whether
      the transitions are on a TAI ("right") or a UTC ("posix") clock,
      for example (see my "wish-list" item below for another potential
      class of values);
    . add support for additional "optional" extension data --- with
      the code written so that it ignores unknown extensions.
      One idea for such a future extension is to include polygon
      data describing the geographic region covered by the zone.
      (I'm not sure that such data really belongs in tzfiles,
      but I'm also not completely convinced that it doesn't.
      The issue is that the name of the zone is mostly arbitrary;
      it is the spatial and temporal boundaries that really identify
      a zone.)

  * The complexity of interpreting rules on different calendars is
    all pushed into the preprocessing done by zic; the run-time code
    need not know anything about them.  (Current needs include Gregorian,
    Hebrew, and Persian.  Future needs might include Islamic,
    Eastern Orthodox (like Gregorian, but with different "multiples of 100"
    rules), Chinese, and Japanese, but we should wait until such a need
    actually arises before worrying about them.)  [Did any country which
    used the Julian calendar in the last 100 years or so (e.g., Tsarist
    Russia) ever observe daylight saving transitions based on that
    system of dates?]

    Such support can be added at any convenient time, before or
    after the switch to 64-bit timestamps in tzfile; in the interim
    we'll just continue to use the work-around currently employed for
    Iran and Israel: embed a bunch of special-case entries in the
    tzdata source, based on external conversion to Gregorian dates.

  * The run-time APIs in this implementation should continue to be
    limited to the (proleptic) Gregorian calendar, the one mandated
    by the C and POSIX APIs (no externally visible change).

    Though I still slightly favor the ability to expose a Julian day
    ("modified" or not), in light of the above I am also willing to say
    that applications which wish to work with dates in non-Gregorian
    calendars can just base their interconversions on the
    (tm_year,tm_yday) pair instead.  Applications that can handle
    things like Sweden's multiple transitions to the Gregorian calendar,
    the calendrical chaos in Rome around the time of Julius Caesar's
    reign, the Mayan calendar, the World calendar, or any of the other
    ways that days have been marked (actual or proposed) in different
    places and times are quite welcome, but outside the scope of this
    project.
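
As a rough illustration of the tzfile extensions suggested in the first
point above (64-bit timestamps, a provenance string, a "time reference"
field, and optional extension data that readers may skip), here is a
small Python sketch.  The magic value, field order, field widths, and
tag numbers are all invented for this example; none of them describe
the existing tzfile layout or any agreed successor.

```python
import struct

MAGIC = b"TZng"            # hypothetical; not the real TZ_MAGIC
REF_UTC, REF_TAI = 0, 1    # hypothetical codes for the "time reference" field
EXT_POLYGON = 1            # hypothetical tag for zone-boundary polygon data

HEADER = ">BBHI"           # version, time reference, provenance length, count
EXT_HEADER = ">HI"         # extension tag, payload length


def pack(provenance, time_ref, transitions, extensions=()):
    """Serialize a header, 64-bit transition times, and tagged extensions."""
    prov = provenance.encode("ascii")
    parts = [MAGIC,
             struct.pack(HEADER, 1, time_ref, len(prov), len(transitions)),
             prov]
    parts += [struct.pack(">q", t) for t in transitions]  # signed 64-bit
    for tag, payload in extensions:
        parts.append(struct.pack(EXT_HEADER, tag, len(payload)) + payload)
    return b"".join(parts)


def unpack(blob, known_tags=()):
    """Parse a blob from pack(), silently skipping unknown extension tags."""
    if blob[:4] != MAGIC:
        raise ValueError("bad magic")
    version, time_ref, prov_len, count = struct.unpack_from(HEADER, blob, 4)
    pos = 4 + struct.calcsize(HEADER)
    provenance = blob[pos:pos + prov_len].decode("ascii")
    pos += prov_len
    transitions = list(struct.unpack_from(">%dq" % count, blob, pos))
    pos += 8 * count
    extensions = {}
    while pos < len(blob):
        tag, length = struct.unpack_from(EXT_HEADER, blob, pos)
        pos += struct.calcsize(EXT_HEADER)
        if tag in known_tags:          # anything else is stepped over
            extensions[tag] = blob[pos:pos + length]
        pos += length
    return {"version": version, "time_ref": time_ref,
            "provenance": provenance, "transitions": transitions,
            "extensions": extensions}
```

The length-prefixed extension records are what let an older reader step
over data it does not understand, which is the property asked for above.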

An item that is still on my personal wish-list (but I'm now questioning
whether the complexity is justified) is support for "zoneless" times
based on local sun (real/apparent, and/or mean).  I mentioned Saudi
Arabia in my earlier posting, but really my interest is for times in the
pre-standard-time past, and perhaps as a sane "best guess" for dates
between the "N years into the future" cut-off and such time as our
projections of earth rotation become notably inaccurate (by which I
mean, the usefulness of the guess goes down as the error bars on the
projection expand; the code can probably be left to blithely calculate
"local time" beyond the heat-death/big-crunch/whatever of the universe).
Like the addition of support for non-Gregorian calendars in zic, this
can mostly be deferred as something independent of the redefinition of
the tzfile format.  The only support that might be helpful is a means to
annotate "use sun angle at meridian N" (and whether that is real or mean
sun) as an alternative to "UTC" or "TAI".  (Or in addition to:
have the code fall back to sun time when the date is outside of
the range of years covered by zone information?)
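
A minimal Python sketch of the "fall back to sun time" idea, for mean
sun only: local mean time differs from UTC by four minutes per degree
of longitude.  The function names, the `valid_until` cut-off parameter,
and the (timestamp, offset) table shape are assumptions made for this
example.  Apparent sun time would additionally need the equation of
time, and long-term changes in the earth's rotation are ignored here.

```python
import bisect

SECONDS_PER_DEGREE = 240  # 24 hours / 360 degrees = 4 minutes per degree


def mean_solar_offset(longitude_deg):
    """Offset of local mean solar time from UTC in seconds (east positive)."""
    return round(longitude_deg * SECONDS_PER_DEGREE)


def utc_offset(timestamp, transitions, valid_until, longitude_deg):
    """Return the UTC offset in effect at `timestamp`.

    `transitions` is a sorted list of (timestamp, offset) pairs, taken to be
    trustworthy from its first entry up to (but excluding) `valid_until`.
    Outside that range the mean-sun estimate is used instead.
    """
    if (not transitions
            or timestamp < transitions[0][0]
            or timestamp >= valid_until):
        return mean_solar_offset(longitude_deg)
    starts = [start for start, _ in transitions]
    index = bisect.bisect_right(starts, timestamp) - 1
    return transitions[index][1]
```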

An "it might be nice" item that is neither strongly required,
nor particularly hard to provide, is a tzdata-to-XML translator.
This probably should have options to either output what is essentially
tzfile data in XML format, or to re-interpret the tzdata files in
XML form.  The main justification for this is that it would make it
easier for other applications to import our hard-won data without
having to build custom parsers or tzfile readers.  I'm also curious
as to whether an XML-based variant of the tzdata file would be any
easier to use/edit/maintain, if someone else is motivated to do the
experiment (my guess is that it would not be, which is why I'm not
making the effort myself).
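
For anyone tempted by the experiment, the "re-interpret the tzdata
files" option could start out as small as the following Python sketch,
which turns a single Rule line into an XML element.  The element name
and attribute names are invented here (there is no agreed schema), and
Zone, Link, and continuation lines are not handled at all.

```python
import xml.etree.ElementTree as ET

# Column names for a tzdata "Rule" line, in file order (after the keyword).
RULE_FIELDS = ("name", "from", "to", "type", "in", "on", "at", "save", "letter")


def rule_to_xml(line):
    """Convert one tzdata Rule line into a hypothetical <rule> element."""
    fields = line.split("#", 1)[0].split()   # drop any trailing comment
    if len(fields) != len(RULE_FIELDS) + 1 or fields[0] != "Rule":
        raise ValueError("not a Rule line: %r" % line)
    return ET.Element("rule", dict(zip(RULE_FIELDS, fields[1:])))


if __name__ == "__main__":
    sample = "Rule\tEU\t1981\tmax\t-\tMar\tlastSun\t 1:00u\t1:00\tS"
    print(ET.tostring(rule_to_xml(sample), encoding="unicode"))
```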

Cheers,
		--Ken Pizzini