[tz] data uncertainty
Zefram
zefram at fysh.org
Tue Sep 10 11:37:43 UTC 2013
Paul Eggert wrote:
>The same goes for the transition from LMT to standard time,
>which is often not reliably known even to the nearest decade
>(even though the data format requires that it be specified
>to one-second precision!). But how would that be modeled
>so that the caller of localtime could find this out?
Theoretically... we have two types of uncertainty: quantitative ("LMT in
Zurich is within 17 seconds of UT+00:34:08") and qualitative ("in 1860
Zurich was probably using Bern Mean Time but we don't have any actual
record of that"). (My +/- 17 seconds on LMT is based on an estimate
of Zurich's longitudinal extent.) Instead of passing bare numbers back
and forth across the API, we'd have an extended version of the API that
accepts and returns structured pseudo-number objects that represent
the uncertainty. The historical localtime() API itself can't represent
the uncertainty, and would likely be reimplemented as a wrapper for the
extended API that strips off all the uncertainty tags.
Quantitative uncertainty is readily modelled by using interval arithmetic.
Instead of any single number you have a structure giving lower and upper
bounds. An interval structure represents a quantity that could have any
value within the interval. An exact value corresponds to an interval
in which the lower and upper bounds are equal. Arithmetic operations
on intervals produce interval results; for example, [2, 4] * [3, 5] =
[6, 20]. Note that interval arithmetic doesn't obey all the identities of
ordinary numerical arithmetic; for example, [2, 4] - [2, 4] = [-2, 2],
not [0, 0].
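
To make the interval idea concrete, here is a minimal Python sketch. The
class name Interval and its methods are illustrative assumptions, not part
of any existing tz code or of the API described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A quantity known only to lie somewhere in [lo, hi] (hypothetical type)."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    @classmethod
    def exact(cls, x):
        # An exact value is an interval whose bounds coincide.
        return cls(x, x)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Worst cases pair our low with their high, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # With mixed signs the extremes can come from any pair of bounds.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


x = Interval(2, 4)
print(x * Interval(3, 5))  # Interval(lo=6, hi=20)
print(x - x)               # Interval(lo=-2, hi=2)
```

The second line of output shows a failed identity: x - x is not zero,
because the two operands are treated as independent unknowns.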
So if you ask to convert 1812-06-18T12:00:00Z to Zurich time,
the answer should be an interval structure, [1812-06-18T12:33:51,
1812-06-18T12:34:25], encompassing the local times that Zurich could
have for that UT time.
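
The conversion above can be reproduced with a short Python sketch. The
offset and the 17-second margin are the figures given earlier in this
message; the function name utc_to_zurich_lmt is a made-up name for
illustration, and the result is returned as a plain (earliest, latest)
pair rather than any real API type.

```python
from datetime import datetime, timedelta

# Figures from the text: Zurich LMT is UT+00:34:08, give or take 17 seconds.
LMT_OFFSET = timedelta(minutes=34, seconds=8)
LMT_ERROR = timedelta(seconds=17)


def utc_to_zurich_lmt(utc):
    """Return (earliest, latest) naive local times for a naive UT datetime."""
    return (utc + LMT_OFFSET - LMT_ERROR, utc + LMT_OFFSET + LMT_ERROR)


earliest, latest = utc_to_zurich_lmt(datetime(1812, 6, 18, 12, 0, 0))
print(earliest.isoformat(), latest.isoformat())
# 1812-06-18T12:33:51 1812-06-18T12:34:25
```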
Qualitative uncertainty requires a different kind of structure, mainly
containing a best-guess value (which might itself be an interval structure
representing quantitative uncertainty) and some indicator of the quality
of the guess. Arithmetic operations on guesses produce results with the
lowest quality of the inputs; for example, {2, probable} * {3, wild-ass
guess} = {6, wild-ass guess}. This too doesn't obey numerical arithmetic
identities. This kind of structure could be extended to also include
secondary possible values ("if it's not BMT then it's probably LMT").
So if you ask to convert 1862-06-18T12:00:00Z to Zurich time, the answer
should be a qualitative-uncertainty structure, {1862-06-18T12:29:44,
probable}, indicating the level of our confidence that Zurich was using
BMT by that date.
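
A rough Python sketch of the "lowest quality wins" rule follows. The
names Guess and Quality are hypothetical, and the three-level scale is
an assumption made for the example: only "probable" and "wild-ass guess"
appear in the text above, and CERTAIN is added just to have a top level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Quality(IntEnum):
    """Ordered confidence levels (assumed scale; higher is better)."""
    WILD_GUESS = 0
    PROBABLE = 1
    CERTAIN = 2


@dataclass(frozen=True)
class Guess:
    """A best-guess value tagged with how much we trust it (hypothetical type)."""
    value: float
    quality: Quality

    def __mul__(self, other):
        # The result is only as trustworthy as its least trustworthy input.
        return Guess(self.value * other.value,
                     min(self.quality, other.quality))


result = Guess(2, Quality.PROBABLE) * Guess(3, Quality.WILD_GUESS)
print(result.value, result.quality.name)  # 6 WILD_GUESS
```

The value field could equally hold an interval structure, so that both
kinds of uncertainty travel together.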
The case you raise, of Zurich's adoption of BMT, is slightly more
complicated than the above examples. You've expressed it in the form
of a quantitative uncertainty in a threshold date, but localtime() (even
in the extended form that I imagine) doesn't let you directly ask about
that threshold. For the questions you can ask through the API, that
quantitative uncertainty in the threshold translates into a qualitative
uncertainty about which of two offsets applies during the interval of
possible thresholds. Perhaps the confidence level with which we adopt one
offset or the other should depend on how far through the threshold interval
we are, with that confidence expressed as a numerical probability. But as the
process of adopting BMT probably didn't take the form of a flag day,
maybe a different arrangement of confidence levels is more appropriate.
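
One possible reading of "how far through the threshold interval we are"
is a linear ramp, sketched below in Python. This is only an illustration
of that one option: the function name is invented, and the 1850 and 1860
bounds are placeholder values chosen for the example, not historical
data about Zurich. As noted above, a gradual adoption might call for a
different shape entirely.

```python
from datetime import datetime


def probability_new_offset(t, earliest, latest):
    """Probability that the later offset applies at time t.

    Assumes the changeover instant is equally likely to fall anywhere
    between `earliest` and `latest` (a modelling assumption).
    """
    if t <= earliest:
        return 0.0
    if t >= latest:
        return 1.0
    # Dividing one timedelta by another yields a plain float.
    return (t - earliest) / (latest - earliest)


# Placeholder bounds for the example only; not real threshold data.
EARLIEST = datetime(1850, 1, 1)
LATEST = datetime(1860, 1, 1)

print(probability_new_offset(datetime(1845, 1, 1), EARLIEST, LATEST))  # 0.0
print(probability_new_offset(datetime(1855, 1, 1), EARLIEST, LATEST))  # 0.5
print(probability_new_offset(datetime(1870, 1, 1), EARLIEST, LATEST))  # 1.0
```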
The extended API would have to be supported by extended tzfiles that
represent the uncertainty in the data. The simple way to do this is to
directly replace the tzfile's numerical quantities with these interval
and uncertainty structures. Then ordinary-looking arithmetic inside the
extended localtime() operates on these structures to produce the right
uncertain results.
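
As a sketch of how this might fit together, the Python below stores an
uncertain offset in place of a bare number, has an extended lookup
return the uncertainty, and wraps it in a legacy-style function that
strips the tags. Everything here is hypothetical: the record layout, the
function names, and the choice of the interval midpoint as the stripped
value are assumptions, not the real tzfile format or the API from the
paper. The BMT figures are the ones quoted earlier in this message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UncertainOffset:
    """Hypothetical stand-in for a tzfile offset field."""
    lo: timedelta   # lower bound of the UT offset
    hi: timedelta   # upper bound of the UT offset
    quality: str    # qualitative confidence tag, e.g. "probable"


# BMT as described in the text: exact offset, but only "probable" that it applied.
BMT = UncertainOffset(timedelta(minutes=29, seconds=44),
                      timedelta(minutes=29, seconds=44),
                      "probable")


def extended_localtime(utc, offset):
    """Return (earliest, latest, quality) for a naive UT datetime."""
    return (utc + offset.lo, utc + offset.hi, offset.quality)


def legacy_localtime(utc, offset):
    """Legacy-style wrapper: discard the uncertainty, keep one bare value."""
    earliest, latest, _quality = extended_localtime(utc, offset)
    # Taking the midpoint is an arbitrary choice made for this sketch.
    return earliest + (latest - earliest) / 2


when = datetime(1862, 6, 18, 12, 0, 0)
print(extended_localtime(when, BMT)[2])         # probable
print(legacy_localtime(when, BMT).isoformat())  # 1862-06-18T12:29:44
```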
I wrote about these issues in my paper for the 2013 Future of
UTC conference, the bulk of which sketches out a time-handling
API to cleanly handle all the awkward cases. Preprint at
<http://www.fysh.org/~zefram/time/prog_on_time_scales.pdf>; uncertainty
is discussed from the bottom of page 9 to the top of page 12.
I must stress, this is at present totally theoretical. That API sketch
is a long-term goal, not a near-term programming project. In fact,
I've found existing programming environments somewhat inadequate for
the job; I think I need to develop a new programming *language* before
I can properly tackle it.
-zefram