Question on abbreviations

Paul Schauble Paul.Schauble at ticketmaster.com
Wed Sep 27 22:46:32 UTC 2006


So in this case:
Rule    US    1942    only    -    Feb    9    2:00      1:00    W # War
Rule    US    1945    only    -    Aug    14   23:00u    1:00    P # Peace

Why is %s undefined in 1943? This was the question that started the
thread. If the time setting carries forward, surely the letter should
also.

    ++PLS


-----Original Message-----
From: tz-request at elsie.nci.nih.gov [mailto:tz-request at elsie.nci.nih.gov]
On Behalf Of Ken Pizzini
Sent: Wednesday, September 27, 2006 3:14 PM
To: tz at lecserver.nci.nih.gov
Subject: Re: Question on abbreviations

On Wed, Sep 27, 2006 at 02:37:58PM -0700, Mark Davis wrote:
> I share your confusion. If Paul (Eggert's) description is right, then I have
> to ignore the TO field in some circumstances which are entirely unclear to
> me. I would much rather see the TO field corrected. That is, if TO=1942 is
> ignored, and 1945 is the real date, then the line should be corrected to
> TO=1945.

The key to understanding is that the rules describe a list of *transitions*.

After a transition, the resulting zone offset and abbreviation *remain*
in effect until the next transition.  The "TO" part of a rule is
used to enable a shorthand for a _recurring_ transition, such as "first
Tuesday of February", for all years within the range.  If "TO" is
"only", then the *transition* being documented is a singleton, but
the transitioned-into offset/abbreviation remains in effect until the
_next_ transition, no matter how far in the future.
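
To make the carry-forward concrete, here is a minimal sketch in Python
using only the two US rule lines quoted in this thread.  The function
name rule_in_effect and the tuple layout are my own, not anything from
tzcode, and the sketch compares the AT times as naive datetimes, ignoring
the difference between wall-clock time and the "u" (UTC) suffix.

```python
from bisect import bisect_right
from datetime import datetime

# The two US rule lines from this thread, reduced to
# (transition time, SAVE, LETTER/S) and sorted by time.
# Assumption: naive datetimes; the wall-clock vs. "u" distinction is ignored.
TRANSITIONS = [
    (datetime(1942, 2, 9, 2, 0), "1:00", "W"),    # War
    (datetime(1945, 8, 14, 23, 0), "1:00", "P"),  # Peace
]


def rule_in_effect(when, transitions=TRANSITIONS):
    """Return the latest transition at or before `when`, or None if none."""
    times = [entry[0] for entry in transitions]
    index = bisect_right(times, when)
    return transitions[index - 1] if index else None
```

Asking for a date in 1943 returns the 1942 entry, so %s expands to "W"
even though no rule line names 1943.  A date before February 1942
returns None, which is the "before the first transition" case.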


> There are other failures in the parsing. My error messages are:
...
> I looked into why this is happening, and found:
> 
> Zone Europe/Amsterdam    0:19:32 -    LMT    1835
>            0:19:32    Neth    %s    1937 Jul  1

> But the first LETTER/S defined by Neth is in 1916, so during the range from
> 1835 to 1916 this is undefined. If the LETTER/S are magically also defined
> *before* the first FROM, that should be described in the specification.

Yes, this is a failure of the documentation.  If a Zone refers to a time
within a Rule that is before the first transition mentioned for that rule,
then the _oldest_standard_time_ "Letter/s" is used.  In this case, AMT.
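
A rough sketch of that fallback, in Python.  It assumes "standard time"
means a rule whose SAVE is zero, and that the rules are already sorted
chronologically.  The function name and the sample list are hypothetical:
only the AMT letter comes from this thread, and the NST entries are
placeholders, not copied from tzdata.

```python
def letter_before_first_transition(rules):
    """Pick the LETTER/S to use before a rule set's first transition.

    `rules` is a chronologically sorted list of
    (year, save_minutes, letter) tuples.  The earliest entry with
    save_minutes == 0 (standard time) supplies the letter.
    """
    for _year, save_minutes, letter in rules:
        if save_minutes == 0:
            return letter
    return None  # no standard-time rule at all


# Illustrative data only: a daylight-saving rule followed by a
# standard-time rule, loosely modelled on the Neth case above.
NETH_LIKE = [
    (1916, 60, "NST"),  # placeholder daylight-saving entry
    (1916, 0, "AMT"),   # standard time; the letter named in this thread
    (1917, 60, "NST"),  # placeholder
]
```

With this data the function returns "AMT" for the 1835 to 1916 range,
even though the first rule in the list is a daylight-saving one.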



> BTW, the documentation was at first a bit confusing to me, since it says that
> fields are delimited by spaces, and lists a single Zone UNTIL field.
> However, if you look carefully at the documentation, there are really 4
> fields:
> 
> UNTIL_YEAR UNTIL_IN UNTIL_ON UNTIL_AT
> 
> which are optional [but only in "truncation" from the end: that is, it
> corresponds to the (Perl) regex (UNTIL_YEAR (UNTIL_IN (UNTIL_ON
> (UNTIL_AT)?)?)?)?].
> 
> I'm not the only one to have initially made this mistake: the proposed XML
> format for the TZ database makes the same mistake.

Confusing: granted.  Whether "Until" is one or multiple fields is a
matter of interpretation.  The _traditional_ understanding is that it
is a *single* "timestamp field" which may happen to have spaces within
it.  BTW the subfields aren't "YEAR IN ON AT", but "YEAR MONTH DAY
TIME".
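
Either reading leads to the same parser.  As a sketch only, here is the
nested-optional regex from the quoted message written out in Python,
with the subfields named YEAR MONTH DAY TIME.  The name parse_until is
hypothetical, and the sketch only splits the text; it does not validate
month names, day forms such as "lastSun", or time suffixes.

```python
import re

# Each subfield may appear only if all the ones before it are present,
# i.e. truncation is allowed only from the end.
UNTIL_RE = re.compile(
    r"^(?:(?P<year>\S+)"
    r"(?:\s+(?P<month>\S+)"
    r"(?:\s+(?P<day>\S+)"
    r"(?:\s+(?P<time>\S+))?)?)?)?$"
)


def parse_until(text):
    """Split an UNTIL timestamp into its subfields; absent ones are None."""
    match = UNTIL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid UNTIL timestamp: {text!r}")
    return match.groupdict()
```

For the Amsterdam line quoted above, parse_until("1937 Jul  1") yields
year "1937", month "Jul", day "1" and no time, while parse_until("1835")
yields just the year.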

In this regard, a recent addition to the tzcode tarball is zoneinfo2tdf.pl,
which translates the more free-with-spaces zone tzdata into a form which
strictly uses a single tab between fields.  This may make life easier
for some by simplifying their parser's requirements.  (Or not.)

		--Ken Pizzini



