[tz] zdump new option -i for easier-to-review output

Paul Eggert eggert at cs.ucla.edu
Sun May 29 18:49:46 UTC 2016

Jon Skeet wrote:
> I'd be perfectly happy with zdump gaining more display options, but I think
> there's still huge benefit in deciding on one *canonical* format for
> validation.

I looked into the format you suggested, along with the other comments noted and 
formats I've seen elsewhere (e.g., Shanks), and came up with the attached 
proposal for a "canonical" -i format for zdump, with the design goals being a 
format that is unambiguous, easy to review, and compact. Although this format's 
columns don't always line up, in general aligning columns appears to be 
impractical (in the extreme case, year numbers might exceed 9999!), and I found 
that unaligned columns make it easier to see glitches anyway. The proposed -i 
format does not contain versioning information as that would complicate 
regression testing.

For what it's worth, the -i format is about 10% the size of the -v format, and 
about 53% the size of the format you proposed.

This proposal is incomplete, for several reasons. First, it doesn't address leap 
seconds. Second, it doesn't abbreviate predicted futures into POSIX TZ strings; 
fixing this would make the output significantly shorter. Third, there is no 
infrastructure for verifying a distribution by checksumming its zdump -i output. 
So the proposal is documented as being experimental in the attached patch, and I 
haven't installed it on GitHub yet. Of course zdump -v has all these problems as 
well, so the proposed format wouldn't make them worse.

The first attachment consists of the revised man-page output; the second 
attachment is the change to tzcode.
-------------- next part --------------
ZDUMP(8)                    System Manager's Manual                   ZDUMP(8)

       zdump - time zone dumper

       zdump [ option ... ] [ zonename ... ]

       Zdump prints the current time in each zonename named on the command
       line.

       These options are available:

              Output version information and exit.

       -i     (This option is experimental: its behavior may change in future
              versions.)  Output a description of time intervals.  For each
              zonename on the command line, output an interval-format
              description of the zone.  See "INTERVAL FORMAT" below.

       -v     Output a verbose description of time intervals.  For each
              zonename on the command line, print the time at the lowest
              possible time value, the time one day after the lowest possible
              time value, the times both one second before and exactly at each
              detected time discontinuity, the time at one day less than the
              highest possible time value, and the time at the highest
              possible time value.  Each line is followed by isdst=D where D
              is positive, zero, or negative depending on whether the given
              time is daylight saving time, standard time, or an unknown time
              type, respectively.  Each line is also followed by gmtoff=N if
              the given local time is known to be N seconds east of Greenwich.

       -V     Like -v, except omit the times relative to the extreme time
              values.  This generates output that is easier to compare to that
              of implementations with different time representations.

       -c [loyear,]hiyear
              Cut off interval output at the given year(s).  Cutoff times are
              computed using the proleptic Gregorian calendar with year 0 and
              with Universal Time (UT) ignoring leap seconds.  The lower bound
              is exclusive and the upper is inclusive; for example, a loyear
              of 1970 excludes a transition occurring at 1970-01-01 00:00:00
              UTC but a hiyear of 1970 includes the transition.  The default
              cutoff is -500,2500.

       -t [lotime,]hitime
              Cut off interval output at the given time(s), given in decimal
              seconds since 1970-01-01 00:00:00 Coordinated Universal Time
              (UTC).  The zonename determines whether the count includes leap
              seconds.  As with -c, the cutoff's lower bound is exclusive and
              its upper bound is inclusive.
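
       As a rough illustration of the -c bounds described above, the
       following Python sketch tests whether a transition instant falls
       inside a year cutoff.  The names year_start and in_cutoff are
       hypothetical and are not part of zdump.  Treating each year bound as
       00:00:00 UT on January 1 of that year is an assumption drawn from the
       1970 example; leap seconds are ignored.

```python
def year_start(year):
    """Seconds since 1970-01-01 00:00:00 UT at the start of `year`.

    Uses the proleptic Gregorian calendar with a year 0 and ignores leap
    seconds.  Hypothetical helper, not zdump code.
    """
    y = year - 1                 # January belongs to the previous March-based year
    era = y // 400               # 400-year Gregorian cycle (floor division)
    yoe = y - era * 400          # year within the cycle, 0..399
    doy = 306                    # January 1 counted from the preceding March 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    days = era * 146097 + doe - 719468   # days relative to 1970-01-01
    return days * 86400


def in_cutoff(t, loyear=-500, hiyear=2500):
    """True if instant t lies in the cutoff: lower exclusive, upper inclusive."""
    return year_start(loyear) < t <= year_start(hiyear)
```

       With this sketch, a transition at 1970-01-01 00:00:00 UT (t = 0) is
       rejected by in_cutoff(0, 1970, 2500) and accepted by
       in_cutoff(0, -500, 1970).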

INTERVAL FORMAT
       This format is experimental: it may change in future versions.

       The interval format is a compact text representation that is intended
       to be both human- and machine-readable.  It consists of a first line
       "TZ=string" where string is a double-quoted string giving the zone
       name, a second line "- - interval" describing the time interval before
       the first transition if any, and zero or more following lines "date
       time interval", one line for each transition time and following
       interval.  Fields are separated by single spaces.

       Dates are in yyyy-mm-dd format and times are in 24-hour hh:mm:ss format
       where hh<24.  Times are in local time immediately after the transition.
       A time interval description consists of a UT offset in signed +-hhmmss
       format, a time zone abbreviation, and an isdst flag.  An abbreviation
       that equals the UT offset is omitted; other abbreviations are double-
       quoted strings unless they consist of one or more alphabetic
       characters.  An isdst flag is omitted for standard time, and otherwise
       is a decimal integer that is unsigned and positive (typically 1) for
       daylight saving time and negative for unknown.

       In times and in UT offsets with absolute value less than 100 hours, the
       seconds are omitted if they are zero, and the minutes are also omitted
       if they are also zero.  Positive UT offsets are east of Greenwich.  The
       UT offset -00 denotes a UT placeholder in areas where the actual offset
       is unspecified; by convention, this occurs when the UT offset is zero
       and the time zone abbreviation begins with "-" or is "zzz".
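
       The abbreviated UT offsets described above can be expanded again by a
       reader of the format.  The following Python sketch is one way to do
       it; parse_ut_offset is a hypothetical name, and reading offsets of 100
       hours or more as "all leading digits are hours, then mmss" is an
       assumption.  Note that the placeholder -00 parses to zero seconds, so
       a caller that cares about it must compare the text itself.

```python
def parse_ut_offset(field):
    """Convert a UT offset such as "-103126", "-1030" or "+04" to seconds.

    Positive results are east of Greenwich.  Hypothetical helper.
    """
    if field[:1] not in ("+", "-"):
        raise ValueError("UT offset must be signed: %r" % field)
    sign = -1 if field[0] == "-" else 1
    digits = field[1:]
    if not digits.isdigit():
        raise ValueError("UT offset must be decimal: %r" % field)
    if len(digits) == 2:            # hh, minutes and seconds omitted
        h, m, s = int(digits), 0, 0
    elif len(digits) == 4:          # hhmm, seconds omitted
        h, m, s = int(digits[:2]), int(digits[2:]), 0
    elif len(digits) >= 6:          # hhmmss (assumed: extra leading digits are hours)
        h, m, s = int(digits[:-4]), int(digits[-4:-2]), int(digits[-2:])
    else:
        raise ValueError("unexpected UT offset length: %r" % field)
    return sign * (h * 3600 + m * 60 + s)
```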

       In double-quoted strings, escape sequences represent unusual
       characters.  The escape sequences are \s for space, and \", \\, \f, \n,
       \r, \t, and \v with their usual meaning in the C programming language.
       E.g., the double-quoted string "CET\s\"\\" represents the six-character
       sequence CET "\ (the letters CET, a space, a double quote, and a
       backslash).
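
       A reader of the format might decode such strings roughly as in the
       following Python sketch.  The name unquote and the decision to return
       unquoted fields unchanged are assumptions made for illustration; only
       the escape sequences listed above are accepted.

```python
# Escape letters listed in the format description, mapped to the characters
# they stand for.
ESCAPES = {
    "s": " ", '"': '"', "\\": "\\",
    "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}


def unquote(field):
    """Decode a double-quoted field; return unquoted fields unchanged."""
    if not (len(field) >= 2 and field[0] == '"' and field[-1] == '"'):
        return field
    out = []
    chars = iter(field[1:-1])
    for ch in chars:
        if ch == "\\":
            code = next(chars, None)      # character following the backslash
            if code not in ESCAPES:
                raise ValueError("unknown escape in %r" % field)
            out.append(ESCAPES[code])
        else:
            out.append(ch)
    return "".join(out)
```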

       Here is an example:

         - - -103126 LMT
         1896-01-13 12:01:26 -1030 HST
         1933-04-30 03 -0930 HDT 1
         1933-05-21 11 -1030 HST
         1942-02-09 03 -0930 HDT 1
         1945-09-30 01 -1030 HST
         1947-06-08 02:30 -10 HST

       Here, local time begins 10 hours, 31 minutes and 26 seconds west of UT,
       and is a standard time abbreviated LMT.  Immediately after the first
       transition, the date is 1896-01-13 and the time is 12:01:26, and the
       following time interval is 10.5 hours west of UT, a standard time
       abbreviated HST.  Immediately after the second transition, the date is
       1933-04-30 and the time is 03:00:00 and the following time interval is
       9.5 hours west of UT, is abbreviated HDT, and is daylight saving time.
       Immediately after the last transition the date is 1947-06-08 and the
       time is 02:30:00, and the following time interval is 10 hours west of
       UT, a standard time abbreviated HST.

       Here are excerpts from another example:

         - - +031212 LMT
         1924-04-30 23:47:48 +03
         1930-06-21 01 +04
         1981-04-01 01 +05 1
         1981-09-30 23 +04
         2014-10-26 01 +03
         2016-03-27 03 +04

       This time zone is east of UT, so its UT offsets are positive.  Also,
       many of its time zone abbreviations are omitted since they duplicate
       the text of the UT offset.
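
       The lines in the two examples above can be split into fields along
       the following lines.  This Python sketch is an illustration only: the
       Transition type and parse_line are hypothetical names, the UT offset
       is kept as text, quoted abbreviations are not decoded, and an omitted
       abbreviation is filled in with the offset text as the description
       above implies.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    date: Optional[str]   # None on the initial "- -" line
    time: Optional[str]   # expanded to hh:mm:ss, or None on the initial line
    offset: str           # UT offset text, e.g. "-1030"
    abbr: str             # abbreviation; equals offset text when omitted
    isdst: int            # 0 when the flag is omitted (standard time)


def parse_line(line):
    """Split one line that follows the TZ= line into its fields."""
    fields = line.split(" ")          # fields are separated by single spaces
    if not 3 <= len(fields) <= 5:
        raise ValueError("unexpected field count: %r" % line)
    date, time, offset, *rest = fields
    if date == "-" and time == "-":
        date = time = None
    else:
        # Restore minutes and seconds that were omitted because they are zero.
        time = ":".join((time.split(":") + ["00", "00"])[:3])
    abbr, isdst = offset, 0
    for field in rest:
        if field.lstrip("-").isdigit():   # the isdst flag is a decimal integer
            isdst = int(field)
        else:                             # otherwise it is the abbreviation
            abbr = field
    return Transition(date, time, offset, abbr, isdst)
```

       For instance, parse_line("1981-04-01 01 +05 1") yields an abbreviation
       of "+05" and an isdst value of 1.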

       If multiple zones are present, their representations are separated by
       empty lines.
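
       A consumer of multi-zone output might separate the zones as in the
       following Python sketch.  split_zones is a hypothetical name; the
       sketch assumes every representation starts with its TZ= line and
       keeps the zone name in its double-quoted form.

```python
def split_zones(text):
    """Map each quoted zone name to the list of lines that follow its TZ= line."""
    zones = {}
    for block in text.strip("\n").split("\n\n"):   # zones are separated by empty lines
        lines = block.split("\n")
        header = lines[0]
        if not header.startswith("TZ="):
            raise ValueError("missing TZ= line: %r" % header)
        zones[header[3:]] = lines[1:]
    return zones
```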

       Time discontinuities are found by sampling the results returned by
       localtime at twelve-hour intervals.  This works in all real-world
       cases; one can construct artificial time zones for which this fails.
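
       The sampling approach and its limitation can be sketched in Python as
       follows.  This is not zdump's code: find_transitions and local_info
       are hypothetical names, local_info stands in for whatever localtime
       reports at an instant, and narrowing each change by bisection is an
       assumption about how a sampled change could be pinned to the second.

```python
HALF_DAY = 12 * 60 * 60


def find_transitions(local_info, lo, hi, step=HALF_DAY):
    """Return the instants in (lo, hi] at which local_info first changes.

    local_info(t) returns any comparable description of local time at
    instant t (for example a UT offset in seconds).
    """
    found = []
    t = lo
    while t < hi:
        nxt = min(t + step, hi)
        if local_info(nxt) != local_info(t):
            # The two samples differ, so bisect: a always matches the earlier
            # sample and b always differs from it.
            a, b = t, nxt
            while b - a > 1:
                mid = (a + b) // 2
                if local_info(mid) == local_info(a):
                    a = mid
                else:
                    b = mid
            found.append(b)
        t = nxt
    return found
```

       A change that is undone before the next sample is invisible to this
       sketch, which is the kind of artificial case the text above mentions.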

       In the -v and -V output, "UT" denotes the value returned by gmtime(3),
       which uses UTC for modern time stamps and some other UT flavor for time
       stamps that predate the introduction of UTC.  No attempt is currently
       made to have the output use "UTC" for newer and "UT" for older time
       stamps, partly because the exact date of the introduction of UTC is

       newctime(3), tzfile(5), zic(8)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-New-option-i-for-zdump.patch
Type: text/x-diff
Size: 24005 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/tz/attachments/20160529/931f2c18/0001-New-option-i-for-zdump-0001.patch>
