[tz] Rules for TZ+ database
Lester Caine
lester at lsces.co.uk
Thu Sep 5 08:20:55 UTC 2013
First I'd like to thank Kevin Lyda for taking the time to construct the original
tz repository. It is a quite remarkable piece of work. I'm currently looking at
a beautifully documented history of the Theory file from the moment Arthur
created it on 16/2/1987 ... 1987-02-16 in modern money ...
The first rule is
+* In SVR2, time display in a process is controlled by the environment
+ variable TZ, which "must be a three-letter time zone name, followed
+ by a number representing the difference between local time and
+ Greenwich Mean Time in hours, followed by an optional three-letter
+ name for a daylight time zone;" when the optional daylight time zone is
+ present, "standard U.S.A. Daylight Savings Time conversion is applied."
+ This means that SVR2 can't deal with other (for example, Australian)
+ daylight savings time rules, or situations where more than two
+ time zone abbreviations are used in an area.
So it was very much conceived as an 'American' system. Handling other daylight
saving offsets or more complex rules was too difficult at that time, so it was
ignored.
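To make the limitation concrete, here is a minimal Python sketch of a parser for the SVR2-style TZ value as the quoted text describes it: a three-letter name, a number of hours, and an optional three-letter daylight name. The names `Svr2Tz` and `parse_svr2_tz` are invented for illustration, and the sign convention of the offset is an assumption left uninterpreted, since the quote only says "difference between local time and Greenwich Mean Time in hours".

```python
import re
from typing import NamedTuple, Optional


class Svr2Tz(NamedTuple):
    """Hypothetical container for the three fields described in the quote."""
    std_name: str            # three-letter standard time name
    offset_hours: int        # whole hours; sign convention not interpreted here
    dst_name: Optional[str]  # three-letter daylight name, if present


# Three letters, an optionally signed one- or two-digit hour count,
# then an optional three-letter daylight name.
_SVR2_TZ = re.compile(r"([A-Za-z]{3})([+-]?\d{1,2})([A-Za-z]{3})?")


def parse_svr2_tz(value: str) -> Svr2Tz:
    """Parse a value such as 'EST5EDT'; raise ValueError for anything else."""
    match = _SVR2_TZ.fullmatch(value)
    if match is None:
        raise ValueError(f"not an SVR2-style TZ value: {value!r}")
    std_name, hours, dst_name = match.groups()
    return Svr2Tz(std_name, int(hours), dst_name)
```

With this sketch, `parse_svr2_tz("EST5EDT")` yields the name `EST`, the offset 5 and the daylight name `EDT`, while a modern long name such as `Australia/Sydney` is rejected. Nothing in the format has room for when daylight saving starts or ends, which is why only one fixed set of rules could be applied.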
I'll fast forward over the changes which introduced long names to replace the
three letter codes, and set up rules about size of population being used to
select them. Politics is nothing new and the entire database is an archive of
POLITICAL decisions. Until 2001 when a commit was added from a certain Paul
Eggert ...
+This naming convention is not intended for use by inexperienced users
+to select TZ values by themselves (though they can of course examine
+and reuse existing settings). Distributors should provide
+documentation and/or a simple selection interface that explains the
+names; see the 'tzselect' program supplied with this distribution for
+one example.
And the new rule of
+ * Uniquely identify every national region where clocks have all
+ agreed since 1970. This is essential for the intended use: static
+ clocks keeping local civil time.
does not SPECIFICALLY rule out the gathering of pre-1970 data. It just sets the
ground rules for limiting ONE view of the data. This is about making the
data more accessible and was probably the wrong approach even back then.
I'll skip the 'frivolity' of adding Mars time in 2004, but it does have
relevance to the next step forward. All documented data is valid!
http://lsces.org.uk/hg/tz/rev/8a52c38a5f56 is the change that pulls support for
specifically 32-bit systems. That note had only JUST been added, but it is the
first reference to scope that I've found. Then finally there is a formal
definition of the SCOPE of the database in
http://lsces.org.uk/hg/tz/rev/7c18a2e16943 so up until that time data HAS been
gathered for the whole of time, not just post-1970.
I'm more than happy to accept this as the rules for 'TZ' but things have moved
on considerably since 1987 and substantially more material is now available and
easily accessed to provide insight into all of the documented changes to time
standards. So if the 'TZ' distribution is going to ignore that material then we
need to set up a 'TZ+' distribution that actively includes all available
material. It's not a question it's a statement.
Reviewing the Theory file, of course, we fall at the first hurdle since the POSIX
time format has difficulty coping with some of the intricacies of time
management. While encoding a daylight saving change in the time is to be
commended, it is only right for a specific period of time. The more accurate
approach is purely to identify a 'set of rules' that can be used. This also then
automatically covers the provision of leap seconds. What I am saying here is
that POSIX is just an implementation detail and while mapping the data to it
does require defining, it's not fundamental to the real data. So let's use a
standard that is better at handling the modern systems.
Obviously I fundamentally disagree with the current scope and that should
probably have been disputed in 2011. If one accepts that POSIX is incapable of
handling the material, then applying one of its limitations is also wrong?
Since the Calendrical issues section already includes the whole of history, we
do have a good base to build on, and are already discussing what 'Universal
Time' was prior to 1970.
If we can agree that while the validity of some legacy data may be questionable,
it is a fact of life that many of these events happened, we just have some
uncertainty as to when they happened? I think we can also agree that there is
substantial new material which is currently only recorded in notes and which can
only enhance the historic record? I'm under no illusion that the problem of
manipulating this data into a more usable format is substantial, but it is not
impossible.
NAMING 'timezones' is the fundamental problem here, and the "This naming
convention is not intended for use by inexperienced users" is of fundamental
importance here. Mapping rules to usable names is all that is required, so
internally in the data we have something suitable for managing the complexity of
the problem, while outside a naive user punches in an ISO 3166 code and gets a
timezone suitable for the time frame he is working in. There is a substantial
amount of additional history here that could potentially be exposed, such as,
for example, ISO 3166 codes which have been superseded, or even 'old' TZ names
which are now recorded for posterity in archives! The 'backward' file is missing
key information on when a particular 'match' was changed, and while dates are
not particularly critical, there may be fundamental data provided if one of the
legacy names is being reviewed and the CURRENT rule set no longer matches what
was being supplied at that time. With hg (or git) I can call up an historic view
of backward, and then select a view of the data at that time. Something that at
one time would have been a substantial amount of work happens at the click of a
mouse.
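As a rough illustration of mapping a country code plus a time frame to a usable name, the Python sketch below attaches a validity period to each name mapping, which is the information described above as missing from 'backward'. Everything in it is hypothetical: `NameMapping`, `MAPPINGS` and `zone_for` are invented names, "XX" is a placeholder country code, and the zone names and dates are made up and are not tz data.

```python
from datetime import date
from typing import NamedTuple, Optional


class NameMapping(NamedTuple):
    """Hypothetical record: which zone name a country code mapped to, and when."""
    country: str               # ISO 3166 alpha-2 style code
    valid_from: date           # first day the mapping applies
    valid_to: Optional[date]   # first day it no longer applies; None = current
    zone: str


# Invented example data only; "XX" and both zone names are placeholders.
MAPPINGS = [
    NameMapping("XX", date(1900, 1, 1), date(1950, 1, 1), "Example/Old_Name"),
    NameMapping("XX", date(1950, 1, 1), None, "Example/New_Name"),
]


def zone_for(country: str, on: date) -> Optional[str]:
    """Return the zone name mapped to `country` on the given date, if any."""
    for m in MAPPINGS:
        starts_in_time = m.valid_from <= on
        not_yet_ended = m.valid_to is None or on < m.valid_to
        if m.country == country and starts_in_time and not_yet_ended:
            return m.zone
    return None
```

Here `zone_for("XX", date(1949, 12, 31))` returns the old name and `zone_for("XX", date(1950, 1, 1))` the new one, so a legacy name can be resolved against the mapping that was in force at the time rather than the CURRENT one.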
DVCS provides a view of the information in a format that relates to how it was
entered. That data can be viewed in many other ways if we provide the right
tools; that is the key here. 'Winnowing' is just one way of manipulating the
data, and another would be 'time slicing', providing a snapshot of all the rules
active at a particular date. Currently there are a much smaller number of rules
active than there were pre-1970, and pre standard time there were an infinite
number due to all the different methods of tracking time. During the transition
from then until now we have some complex information to map, such as French
stations running 5 minutes adrift, and UK trains using a different time to the
legal system. But if we are looking at times when those rules apply then we need
to know that there were differences. There was recorded dissension over many
summertime changes and this happened in a time where those facts were recorded
so that material needs to be made available once it is established. This causes
no end of problems in identifying each area affected, but that's why I brought
up the OHM historic mapping. Linking up with that, simple things like historic
changes of country names can link us to the appropriate data. This may well
require the provision of more than one rule, so that is something that needs
handling - with notes if necessary.
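The 'time slicing' idea can be sketched in a few lines of Python, assuming each rule carries a start date, an optional end date and a free-text note. The `Rule` type, the `time_slice` function and all of the sample data are invented for illustration; "Region A" and "Region B" and their offsets are not real records. More than one rule per region is allowed, to cover cases like railway time differing from legal time.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical rule record with a validity period and a note."""
    region: str
    label: str
    utc_offset: timedelta
    start: date             # first day the rule applies
    end: Optional[date]     # first day it no longer applies; None = open-ended
    note: str = ""


def time_slice(rules: Iterable[Rule], on: date) -> Dict[str, List[Rule]]:
    """Return every rule active on `on`, grouped by region."""
    active: Dict[str, List[Rule]] = defaultdict(list)
    for rule in rules:
        if rule.start <= on and (rule.end is None or on < rule.end):
            active[rule.region].append(rule)
    return dict(active)


# Invented sample data: two concurrent rules for one region, one for another.
RULES = [
    Rule("Region A", "legal time", timedelta(0), date(1850, 1, 1), None),
    Rule("Region A", "railway time", timedelta(minutes=-5),
         date(1850, 1, 1), date(1900, 1, 1), note="stations ran adrift"),
    Rule("Region B", "standard time", timedelta(hours=1),
         date(1900, 1, 1), None),
]
```

Calling `time_slice(RULES, date(1880, 6, 1))` returns both rules for "Region A" and nothing for "Region B", while a slice at `date(1950, 1, 1)` returns one rule for each region. The same record shape could feed a 'winnowed' post-1970 view simply by choosing the slice dates.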
I've not got as far as I would like and I need to get on, but there is more than
enough here already.
OK 'TZ' only provides data from 1970 but 'TZ+' would provide all the historic
data including the 'TZ' filtered set and additional data within the TZ timeframe.
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk