Question on abbreviations

Robert Elz kre at munnari.OZ.AU
Thu Sep 28 05:04:52 UTC 2006


    Date:        Wed, 27 Sep 2006 17:38:45 -0700
    From:        Ken Pizzini <tz. at explicate.org>
    Message-ID:  <20060928003845.GA17660 at 866863.msa.explicate.org>

  | I'll make an attempt at making the text clearer...  but then again,
  | since I understood the original text and you found it misleading,
  | perhaps you'd like to take a stab at clarifying it?

I suspect the problem (like a lot of things that lead to ambiguities)
is that it all depends upon your state of mind when you start (what you
already believe is true).

If you have it in mind that the rules are defining periods during which
a particular offset from UTC (and a particular abbreviation) applies,
then you're likely to read the text in an entirely different way than
if you start out believing that what is being defined is a set of points
at which the offset from UTC (and/or the associated abbreviation) alters.

For anyone who thinks carefully about it, the first of those two is
clearly not rational.  For example, consider the following two lines (rules)
from some version or other of the australasia file (this might not be
current; it's just a version I had conveniently lying around)

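(Column headings, from the usual header comment in the source files:)

# Rule  NAME    FROM    TO      TYPE    IN      ON      AT      SAVE    LETTER/S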
Rule    AT      1991    1999    -       Oct     Sun>=1  2:00s   1:00    -
Rule    AT      1991    max     -       Mar     lastSun 2:00s   0       -

If you believe that "specifies a range of times during which an offset
applies" is the correct interpretation, then the first of those rules says
that from some Sunday in early Oct 1991 (the 6th, it happened to be) until
some Sunday in early Oct 1999, the offset from UTC (for Tasmania) should have
been +11:00 (the base offset is +10:00).

The second rule says that from some Sunday in late March 1991 (the 31st that
year), on into the indefinite future, the offset from UTC is +10:00.

What that would have to mean is that all during 1992, 1993, ... there
were two offsets defined to run concurrently.   That would be absurd, so,
by proof by contradiction (reductio ad absurdum, or something like that),
the original hypothesis must be incorrect.

On the other hand, if you treat the rules as simply saying:

Mar 31 1991 (02:00s) change offset to 10:00
Oct 6 1991 (02:00s) change offset to 11:00
Mar 29 1992 (02:00s) change offset to 10:00
Oct 4 1992 (02:00s) change offset to 11:00
Mar 28 1993 (02:00s) change offset to 10:00
Oct 3 1993 (02:00s) change offset to 11:00
(etc)

That is, as a shorthand notation for writing all of that out (which would
also be possible, of course), then it all fits perfectly well: we have the
transitions, the offset (and abbreviation) between any two transitions,
the offset (and abbreviation) after the last transition, and even what
applies before the first transition is trivial to obtain.
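
(Just to illustrate: here is a quick sketch, in Python rather than anything
from the real zic code, of how those two rules expand into that list of
transitions.  The helper functions are made up for this message; they are
not part of any tz tooling.)

import datetime

def first_sunday_on_or_after(year, month, day):
    # "Sun>=1" style: the first Sunday falling on or after the given day.
    d = datetime.date(year, month, day)
    while d.weekday() != 6:                 # Monday is 0, Sunday is 6
        d += datetime.timedelta(days=1)
    return d

def last_sunday(year, month):
    # "lastSun" style: the last Sunday of the given month.
    d = datetime.date(year, month, 28) + datetime.timedelta(days=4)
    d -= datetime.timedelta(days=d.day)     # back up to the month's last day
    while d.weekday() != 6:
        d -= datetime.timedelta(days=1)
    return d

for year in (1991, 1992, 1993):
    print(last_sunday(year, 3), "(02:00s) change offset to 10:00")
    print(first_sunday_on_or_after(year, 10, 1), "(02:00s) change offset to 11:00")

Running that prints the same dates as in the list above, which is all the
Rule lines really are: a compact way of generating those transition points.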

If the zic.8 text needs clarifying, perhaps what is needed is not the kind
of change to the text that has been suggested, but to make it quite clear
that what is being specified is a list of transitions, not a list of
ranges of times (those of us who have "grown up" alongside the development
of the database simply know this, but it is apparently not as clear to
those who have started looking at it more recently).



On spaces separating fields, I suspect the answer is that it all works
the same way as the (unix shell) read command: white space separates fields
until we have as many fields as we need; after that, all the rest of the
input line (including anything which would otherwise be a separator)
just gets included in the value of the final field.

So, to the unix shell

	echo a b c | read var

puts (aside from problems of using "read" from a pipe) "a b c" into var.

	echo a b c | read v1 v2

puts "a" into v1, and "b c" into v2, and

	echo a b c | read v1 v2 v3 v4

puts "a" into v1, "b" into v2 ,"c" into v3 and "" (empty) into v4.

In the database source format, the "until" is the final field (it would be
the last name on the "read" command, if there were one), so if this parsing
method is assumed, then "spaces in the until field are OK" all just works out...

It is also certainly not really harder to parse: the parsing method simply
finds the first N fields (delimited by white space) and leaves whatever is
left over (if anything) as the final field - that's trivial to code.
Explaining it is not really difficult either, though perhaps a few
extra words would help, making it clear that the line has a fixed maximum
number of fields, and that any excess data is all part of the final field
(white space included).

kre
