[tz] Proposed reversions, for moving forward

Robert Elz kre at munnari.OZ.AU
Sun Aug 3 23:32:13 UTC 2014


    Date:        Sat, 02 Aug 2014 18:35:58 -0700
    From:        Paul Eggert <eggert at cs.ucla.edu>
    Message-ID:  <53DD91FE.90405 at cs.ucla.edu>

  | Of course it's not 100% right.  We don't have reliable information about 
  | old timestamps in Ghana.  It would be better if we didn't have to put 
  | this low-quality data into the tz database at all.  The only reason it's 
  | there is that the format requires it.

Not really: since time_t values are (generally) signed, and these days 64 bits
wide, localtime() can be asked to convert times going back to (way) before the
big bang, and is expected to produce some kind of meaningful result.

Of course, our data (currently anyway) assumes that the Gregorian calendar
has existed since (apparently way before) the big bang, and converts
dates based upon that assumption - which means we know it is producing
utter nonsense for anything earlier than the 15th century (or whatever) and,
for some parts of the world, up to the early 20th century.

But we have to produce something - nonsense or not, what matters is that
(at least for reasonable dates, say back to a few thousand years BC or so)
we get some kind of reasonably stable (and comparable) results, not just
whatever random value happens to seem convenient today.

  | Much of the pre-1970 data falls into this category, unfortunately.

Yes, it does, and if you really wanted to get rid of all unverified data,
you'd remove all of it (from all zones) - the format requires that something
be there, not any particular transitions.   Just removing isolated segments
of that unverified data looks wrong.

  | When the quality is this bad, there's nothing wrong with improving the 
  | quality even if the result is not perfect, or with removing bad data if 
  | this can be done without significantly affecting end-user applications.

No, there would be nothing wrong with that if this were known bad data.  But
it isn't - no-one has a problem with correcting data that is known to be
incorrect.  This isn't bad data, it is just data that we do not know is
correct, and that we guess might not be perfect.  That guess might be right -
or it might not be; that's the point.  It is possible that (just by chance)
you're removing some good data and replacing it with bad data.   You don't
know, I don't know.

I'd suggest just putting everything back, keeping the results stable (if
they're wrong, at least they're the same wrong today as yesterday) and
replacing data only when it is known to be incorrect.   If you're not going
to do that, then at least do it properly, and delete all of the unverified
data - ALL of it.

kre


