[tz] Dealing with Pre-1970 Data

Guy Harris guy at alum.mit.edu
Sun Sep 1 20:51:23 UTC 2013


On Sep 1, 2013, at 12:27 PM, Lester Caine <lester at lsces.co.uk> wrote:

> Guy Harris wrote:
>> 
>> As far as *I'm* concerned, anything having to do with non-standardized time, such as LMT and local apparent time, is, and should always be, out of scope.  People who need LMT, or local apparent time, can, and must, calculate it themselves.
> 
> My problem with that statement is ensuring that the 'calculations' match with those used to create the values IN the database.

I assume "the values in the database" are the values in Zone lines such as

	Zone America/New_York -4:56:02 - LMT 1883 Nov 18 12:03:58

as those are the *only* values that can be calculated with a simple formula.  All the other values come from records of what governing bodies have decided to specify as standardized time; there's no formula to calculate or predict *that*. :-)
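As a worked illustration of how that Zone line reads (an editorial sketch using only Python's standard library; the constant name LMT_NEW_YORK is hypothetical): the UNTIL field is interpreted as wall-clock time in the offset of the line it ends, so 12:03:58 LMT at -4:56:02 is 17:00:00 UT, which is noon at -5:00. Because that instant precedes the 1970 Epoch, it is a negative POSIX time stamp.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the Zone line: -4:56:02 (LMT).
LMT_NEW_YORK = timezone(-timedelta(hours=4, minutes=56, seconds=2))

# UNTIL "1883 Nov 18 12:03:58" is wall-clock time in the LMT offset
# of the line being ended.
until_local = datetime(1883, 11, 18, 12, 3, 58, tzinfo=LMT_NEW_YORK)
until_utc = until_local.astimezone(timezone.utc)

# Seconds since 1970-01-01 00:00:00 UTC; negative for pre-1970 instants.
posix_ts = int(until_utc.timestamp())

print(until_utc.isoformat())  # 1883-11-18T17:00:00+00:00
print(posix_ts < 0)           # True
```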

If we're going to continue to even *have* lines giving local mean solar time in the database, and want to document where the values used come from, what we would do is

	put in Theory the formula used to calculate LMT, which is almost certainly going to be the one given by David Patte:

		LMT at any longitude is a simple offset calculation from GMT. LMT = GMT + Longitude (as HMS)

	   which, of course, does not at all involve the equation of time, as it's *mean* solar time, not *apparent* solar time;

and possibly

	put in the comments for each Zone entry with an LMT line an indication of the longitude used to calculate the LMT offset for that Zone entry, although *that* can easily be calculated from the offset from GMT.
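A minimal sketch of David Patte's formula in Python (an editorial addition, not anything in the tzdb or its tools; the function names are hypothetical, and rounding to the nearest whole second is an assumption based on Zone-line offsets being written to whole seconds). Since 15 degrees of longitude correspond to 1 hour, 1 arcsecond of longitude corresponds to 1/15 second of clock time.

```python
def lmt_offset_seconds(deg: int, minutes: int = 0, seconds: int = 0,
                       east: bool = True) -> int:
    """Hypothetical helper: LMT offset from GMT, in seconds, for a longitude in D/M/S.

    LMT = GMT + longitude (as HMS): 15 degrees = 1 hour, so divide
    arcseconds of longitude by 15 to get seconds of time. East is positive.
    """
    arcsec = deg * 3600 + minutes * 60 + seconds
    offset = round(arcsec / 15)  # assumption: round to whole seconds
    return offset if east else -offset


def format_offset(total: int) -> str:
    """Hypothetical helper: format an offset in seconds as [-]H:MM:SS."""
    sign = "-" if total < 0 else ""
    h, rem = divmod(abs(total), 3600)
    m, s = divmod(rem, 60)
    return f"{sign}{h}:{m:02d}:{s:02d}"


# An illustrative longitude of 74°00'30" W reproduces the -4:56:02
# offset in the America/New_York LMT line.
print(format_offset(lmt_offset_seconds(74, 0, 30, east=False)))  # -4:56:02
```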

> As a minimum where a value has come from needs to be documented, and essentially that uses the equation I'm looking for,

Look in David Patte's e-mail.

> and a defined location.

Convert the offset in clock hours/minutes/seconds to an offset in degrees/minutes/seconds (15 degrees of longitude = 1 hour).
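Going the other way is the same arithmetic in reverse: 1 second of clock time corresponds to 15 arcseconds of longitude. A minimal Python sketch (an editorial addition; the function name is hypothetical):

```python
def offset_to_longitude(total_seconds: int) -> tuple[str, int, int, int]:
    """Hypothetical helper: convert a GMT offset in seconds to (hemisphere, D, M, S).

    1 second of time = 15 arcseconds of longitude; negative offsets are west.
    """
    hemisphere = "W" if total_seconds < 0 else "E"
    arcsec = abs(total_seconds) * 15
    deg, rem = divmod(arcsec, 3600)
    minutes, seconds = divmod(rem, 60)
    return hemisphere, deg, minutes, seconds


# Offset from the America/New_York LMT line: -4:56:02
offset = -(4 * 3600 + 56 * 60 + 2)
print(offset_to_longitude(offset))  # ('W', 74, 0, 30)
```

Because Zone-line offsets are whole seconds, a longitude recovered this way is only good to within about 7.5 arcseconds (half of 15 arcseconds) of whatever longitude was actually used.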

> Following on from that, indicating that some amendment from this base is in place and can be trusted would be useful.

To what sort of amendments are you referring?  LMT is LMT, i.e. "1 hour for every 15 degrees of longitude from the prime meridian, and similar treatment of minutes and seconds of longitude".  There's nothing to amend there.

> I'm more than happy that the data returned for the UK is all correct, and I can adjust the Isle of Man record if I need to, but how do I assess the accuracy of other historic data?

LMT isn't historic data, it's calculated data.  (No, we're not going to take into account pre-1970 changes of day length, etc.)

To assess the accuracy of the tzdb's historic data about *standardized* times, you're going to have to, err, umm, dig through historical records and see whether they record what time zones were established, what offsets from GMT/UTC were established for them, what changes (if any) were made to those offsets over time, what daylight saving time rules were in effect at what points in time, and whether that agrees with what's in the tzdb.

> The only thing I am told is correct is post-1970 information.

We don't provide an absolutely certain guarantee of *that* - we might get informed at some point that East Erewhon briefly introduced Daylight Saving Time in 1974, and have to update the Zone entry for its zone to include lines for that - but we're more confident in the post-1970 information.

What the top-of-trunk Theory file says, in the "scope of the tz database" section, right now is:

	The tz database attempts to record the history and predicted future of
	all computer-based clocks that track civil time.  To represent this
	data, the world is partitioned into regions whose clocks all agree
	about time stamps that occur after the somewhat-arbitrary cutoff point
	of the POSIX Epoch (1970-01-01 00:00:00 UTC).  For each such region,
	the database records all known clock transitions, and labels the region
	with a notable location.

	Clock transitions before 1970 are recorded for each such location,
	because most POSIX-compatible systems support negative time stamps and
	could misbehave if data were omitted for pre-1970 transitions.
	However, the database is not designed for and does not suffice for
	applications requiring accurate handling of all past times everywhere,
	as it would take far too much effort and guesswork to record all
	details of pre-1970 civil timekeeping.  The pre-1970 data in this
	database covers only a tiny sliver of how clocks actually behaved;
	the vast majority of the necessary information was lost or never
	recorded, and much of what little remains is fabricated.
	Although 1970 is a somewhat-arbitrary cutoff, there are significant
	challenges to moving the cutoff back even by a decade or two, due to
	the wide variety of local practices before computer timekeeping
	became prevalent.

	Local mean time (LMT) offsets are recorded in the database only
	because the format requires an offset.  They should not be considered
	meaningful, and should not prompt creation of zones merely because two
	locations differ in LMT.  Historically, not only did different
	locations in the same zone typically use different LMT offsets, often
	different people in the same location maintained mean-time clocks that
	differed significantly, many people used solar or some other time
	instead of mean time, and standard time often replaced LMT only gradually
	at each location.  As for leap seconds, we don't know the history
	of earth's rotation accurately enough to map SI seconds to historical
	solar time to more than about one-hour accuracy; see Stephenson FR
	(2003), Historical eclipses and Earth's rotation, A&G 44: 2.22-2.27
	<http://dx.doi.org/10.1046/j.1468-4004.2003.44222.x>.

	As noted in the README file, the tz database is not authoritative
	(particularly not for pre-1970 time stamps), and it surely has errors.
	Corrections are welcome and encouraged.  Users requiring authoritative
	data should consult national standards bodies and the references cited
	in the database's comments.

What it said as of about three years ago was:

	The tz database attempts to record the history and predicted future of 
	all computer-based clocks that track civil time.  To represent this 
	data, the world is partitioned into regions whose clocks all agree 
	about time stamps that occur after the somewhat-arbitrary cutoff point 
	of the POSIX Epoch (1970-01-01 00:00:00 UTC).  For each such region, 
	the database records all known clock transitions, and labels the region 
	with a notable location.

	Clock transitions before 1970 are recorded for each such location, 
	because most POSIX-compatible systems support negative time stamps and 
	could misbehave if data were omitted for pre-1970 transitions.
	However, the database is not designed for and does not suffice for 
	applications requiring accurate handling of all past times everywhere, 
	as it would take far too much effort and guesswork to record all 
	details of pre-1970 civil timekeeping.

	As noted in the README file, the tz database is not authoritative 
	(particularly not for pre-1970 time stamps), and it surely has errors.
	Corrections are welcome and encouraged.  Users requiring authoritative 
	data should consult national standards bodies and the references cited 
	in the database's comments.

The key thing to note in both versions, when it comes to 1970, is

	Clock transitions before 1970 are recorded for each such location, 
	because most POSIX-compatible systems support negative time stamps and 
	could misbehave if data were omitted for pre-1970 transitions.
	However, the database is not designed for and does not suffice for 
	applications requiring accurate handling of all past times everywhere, 
	as it would take far too much effort and guesswork to record all 
	details of pre-1970 civil timekeeping.

The second sentence in that paragraph describes the difference between "pre-1970" and "post-1970".

The last paragraph:

	As noted in the README file, the tz database is not authoritative
	(particularly not for pre-1970 time stamps), and it surely has errors.
	Corrections are welcome and encouraged.  Users requiring authoritative
	data should consult national standards bodies and the references cited
	in the database's comments.

is also important.  Note that it says "*particularly not* for pre-1970 time stamps", not "*only* for pre-1970 time stamps"; the difference between pre-1970 and post-1970 isn't "unreliable and incomplete vs. reliable and complete", it's "less reliable and complete vs. more reliable and complete".  (It's also not, as the "Clock transitions before 1970..." paragraph indicates, "completely absent vs. partially or completely present".)


More information about the tz mailing list