[tz] Rules for TZ+ database

Thu Sep 5 08:20:55 UTC 2013

First I'd like to thank Kevin Lyda for taking the time to construct the original 
tz repository. It is a quite remarkable piece of work. I'm currently looking at 
a beautifully documented history of the Theory file from the moment Arthur
created it on 16/2/1987 ... 1987-02-16 in modern money ...

The first rule is
+*	In SVR2, time display in a process is controlled by the environment
+	variable TZ, which "must be a three-letter time zone name, followed
+	by a number representing the difference between local time and
+	Greenwich Mean Time in hours, followed by an optional three-letter
+	name for a daylight time zone;" when the optional daylight time zone is
+	present, "standard U.S.A. Daylight Savings Time conversion is applied."
+	This means that SVR2 can't deal with other (for example, Australian)
+	daylight savings time rules, or situations where more than two
+	time zone abbreviations are used in an area.

So it was very much conceived as an 'American' system. Handling other daylight
saving offsets or more complex rules was too difficult at that time, so it was
ignored.
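
To make the limitation concrete, the SVR2 rule quoted above can be sketched as
a tiny parser. This is only an illustration in Python, not code from SVR2 or
from the tz distribution: the names Svr2Tz and parse_svr2_tz are hypothetical,
and the optional sign and the one-or-two digit limit on the number are
assumptions that the quoted text does not spell out.

```python
import re
from typing import NamedTuple, Optional


class Svr2Tz(NamedTuple):
    """Fields of an SVR2-style TZ value (hypothetical container)."""
    std_name: str            # three-letter standard time zone name
    hours: int               # difference from GMT in hours, kept as written
    dst_name: Optional[str]  # optional three-letter daylight zone name


# Assumption: optional sign and 1-2 digits; the quoted rule only says "a number".
_TZ_RE = re.compile(r"([A-Za-z]{3})([+-]?\d{1,2})([A-Za-z]{3})?")


def parse_svr2_tz(value: str) -> Svr2Tz:
    """Split a value such as 'EST5EDT' into its three parts."""
    match = _TZ_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an SVR2-style TZ value: {value!r}")
    std_name, hours, dst_name = match.groups()
    return Svr2Tz(std_name, int(hours), dst_name)
```

A value such as "EST5EDT" yields three fields and nothing else. There is no
slot for when daylight time starts or ends, so only the one built-in U.S.A.
conversion can ever be applied, and a name like "Europe/London" is rejected
outright.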

I'll fast forward over the changes which introduced long names to replace the
three-letter codes, and set up rules about size of population being used to
select them. Politics is nothing new, and the entire database is an archive of
POLITICAL decisions. That brings us to 2001, when a commit was added by a
certain Paul Eggert ...

+This naming convention is not intended for use by inexperienced users
+to select TZ values by themselves (though they can of course examine
+and reuse existing settings).  Distributors should provide
+documentation and/or a simple selection interface that explains the
+names; see the 'tzselect' program supplied with this distribution for
+one example.

And the new rule of
+ * Uniquely identify every national region where clocks have all
+   agreed since 1970.  This is essential for the intended use: static
+   clocks keeping local civil time.
does not SPECIFICALLY rule out the gathering of pre-1970 data. It just sets the
ground rules for ONE limited view of the data. This was about making the data
more accessible, and was probably the wrong approach even back then.

I'll skip the 'frivolity' of adding Mars time in 2004, but it does have 
relevance to the next step forward. All documented data is valid!
http://lsces.org.uk/hg/tz/rev/8a52c38a5f56 is the change that pulls support
specifically for 32-bit systems. That note had only JUST been added, but it is
the first reference to scope that I've found. Then finally there is a formal
definition of the SCOPE of the database in
http://lsces.org.uk/hg/tz/rev/7c18a2e16943 so up until that time data HAS been
gathered for the whole of time, not just post-1970.

I'm more than happy to accept this as the rules for 'TZ' but things have moved 
on considerably since 1987 and substantially more material is now available and 
easily accessed to provide insight into all of the documented changes to time 
standards. So if the 'TZ' distribution is going to ignore that material then we 
need to set up a 'TZ+' distribution that actively includes all available 
material. It's not a question, it's a statement.

Reviewing the Theory file, of course we fall at the first hurdle, since the
POSIX time format has difficulty coping with some of the intricacies of time
management. While encoding a daylight saving change in the time is to be
commended, it is only right for a specific period of time. The more accurate
approach is purely to identify a 'set of rules' that can be used. This also then
automatically covers the provision of leap seconds. What I am saying here is
that POSIX is just an implementation detail, and while mapping the data to it
does require defining, it's not fundamental to the real data. So let's use a
standard that is better at handling the modern systems.

Obviously I fundamentally disagree with the current scope and that should 
probably have been disputed in 2011. If one accepts that POSIX is incapable of 
handling the material, then applying one of its limitations is also wrong?

Since the Calendrical issues section already includes the whole of history, we 
do have a good base to build on, and are already discussing what 'Universal 
Time' was prior to 1970.

Can we agree that, while the validity of some legacy data may be questionable,
it is a fact of life that many of these events happened, and we just have some
uncertainty as to when they happened? I think we can also agree that there is
substantial new material which is currently only recorded in notes and which can
only enhance the historic record. I'm under no illusion: the problems of
manipulating this data into a more usable format are substantial, but the job
is not impossible.

NAMING 'timezones' is the fundamental problem here, and the statement "This
naming convention is not intended for use by inexperienced users" is central to
it. Mapping rules to usable names is all that is required, so internally in the
data we have something suitable for managing the complexity of the problem,
while outside a naive user punches in an ISO 3166 code and gets a timezone
suitable for the time frame he is working in. There is a substantial amount of
additional history here that could potentially be exposed, such as ISO 3166
codes which have been superseded, or even 'old' TZ names which are now recorded
for posterity in archives! The 'backward' file is missing key information on
when a particular 'match' was changed. While the dates themselves are not
particularly critical, the change history may provide fundamental data when one
of the legacy names is being reviewed and the CURRENT rule set no longer matches
what was being supplied at that time. With hg (or git) I can call up an historic
view of backward, and then select a view of the data at that time. Something
that at one time would have been a substantial amount of work happens in the
click of a mouse.
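
As a rough illustration of what recording 'when a match was changed' could
look like, here is a minimal Python sketch of a dated link table. It is not the
format of the real 'backward' file, and every name and date in it is an
invented placeholder: DATED_LINKS, resolve_link and the 'Example/...' entries
are hypothetical and carry no tz data.

```python
from bisect import bisect_right
from datetime import date

# Legacy name -> chronologically sorted (effective_from, target) pairs.
# PLACEHOLDER DATA ONLY: these names and dates are invented for illustration.
DATED_LINKS = {
    "Example/Old_Name": [
        (date(1995, 1, 1), "Example/First_Target"),
        (date(2006, 6, 1), "Example/Second_Target"),
    ],
}


def resolve_link(name: str, as_of: date) -> str:
    """Return the target that a legacy name pointed at on the given date."""
    history = DATED_LINKS.get(name)
    if not history:
        # Assumption: a name with no recorded links is already canonical.
        return name
    starts = [start for start, _ in history]
    index = bisect_right(starts, as_of) - 1  # last entry starting on/before as_of
    if index < 0:
        raise LookupError(f"{name!r} has no recorded link as of {as_of}")
    return history[index][1]
```

With a table like this, reviewing an archived record that used a legacy name
becomes a lookup against the date of the record rather than against whatever
the name happens to point at today.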

DVCS provides a view of the information in a format that relates to how it was
entered. That data can be viewed in many other ways if we provide the right
tools; that is the key here. 'Winnowing' is just one way of manipulating the
data, and another would be 'time slicing': providing a snapshot of all the rules
active at a particular date. Currently there is a much smaller number of rules
active than there were pre-1970, and before standard time there were a
practically infinite number, due to all the different methods of tracking time.
During the transition from then until now we have some complex information to
map, such as French stations being 5 minutes adrift, and UK trains using a
different time to the legal system. If we are looking at times when those rules
applied, then we need to know that there were differences. There was recorded
dissent over many summertime changes, and this happened at a time when those
facts were recorded, so that material needs to be made available once it is
established. This causes no end of problems in identifying each area affected,
but that's why I brought up the OHM historic mapping. By linking up with that,
simple things like historic changes of country names can lead us to the
appropriate data. This may well require the provision of more than one rule, so
that is something that needs handling - with notes if necessary.
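
The 'time slicing' idea can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not a proposal for the actual
data format: RuleSpan, time_slice and EXAMPLE_SPANS are hypothetical names, the
regions, rule labels and dates are invented placeholders, and each span is
assumed to have an inclusive start and an exclusive end.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class RuleSpan:
    """One rule and the period over which it applied (hypothetical record)."""
    region: str
    rule: str
    start: date           # first day the rule applies (inclusive)
    end: Optional[date]   # first day it no longer applies; None = still active
    note: str = ""        # free-text note, e.g. a source or a dispute


def time_slice(spans: List[RuleSpan], when: date) -> List[RuleSpan]:
    """Return every span active on the given date, in input order."""
    return [
        span for span in spans
        if span.start <= when and (span.end is None or when < span.end)
    ]


# PLACEHOLDER DATA ONLY: regions, rules and dates are invented.
EXAMPLE_SPANS = [
    RuleSpan("Region A", "rule-1", date(1900, 1, 1), date(1950, 1, 1)),
    RuleSpan("Region A", "rule-2", date(1940, 1, 1), None, note="overlaps rule-1"),
    RuleSpan("Region B", "rule-3", date(1960, 1, 1), None),
]
```

Returning a list rather than a single answer is deliberate: a slice through a
period where two authorities kept different time gives back both rules for the
same region, each with its note, instead of forcing a choice.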

I've not got as far as I would like and I need to get on, but there is more than 
enough here already.

OK, 'TZ' only provides data from 1970, but 'TZ+' would provide all the historic
data, including the 'TZ' filtered set and additional data within the TZ
timeframe.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk