[tz] [PROPOSED] Add interoperability sections to tzfile.5

Paul Eggert eggert at cs.ucla.edu
Thu Oct 25 07:51:50 UTC 2018

This change consists of text that I contributed to
and that should be useful generally.
* NEWS, tzfile.5: New sections.
 NEWS     |   4 ++
 tzfile.5 | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 943db3e..68f826e 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,10 @@ Unreleased, experimental changes
     This reverts to 2011h, as the abbreviation change in 2011i was
     likely inadvertent.
+  Changes to documentation
+    tzfile.5 has new sections on interoperability issues.
 Release 2018f - 2018-10-18 00:14:18 -0700
diff --git a/tzfile.5 b/tzfile.5
index 79b19bf..bbdccfc 100644
--- a/tzfile.5
+++ b/tzfile.5
@@ -9,6 +9,8 @@ tzfile \- timezone information
 .de q
+.ie \n(.g .ds - \f(CW-\fP
+.el .ds - \-
 The timezone information files used by
 .BR tzset (3)
 are typically found under a directory with a name like
@@ -186,8 +188,196 @@ from 0 through 24.
 Second, DST is in effect all year if it starts
 January 1 at 00:00 and ends December 31 at 24:00 plus the difference
 between daylight saving and standard time.
+.SS Interoperability considerations
 Future changes to the format may append more data.
+Version 1 files are considered a legacy format and
+should be avoided, as they do not support transition
+times after the year 2038.
+Readers that only understand Version 1 must ignore
+any data that extends beyond the calculated end of the version
+1 data block.
+Writers should generate a version 3 file if
+TZ string extensions are necessary to accurately
+model transition times.
+Otherwise, version 2 files should be generated.
+The sequence of time changes defined by the version 1
+header and data block should be a contiguous subsequence
+of the time changes defined by the version 2+ header and data
+block, and by the footer.
+This guideline helps obsolescent version 1 readers
+agree with current readers about timestamps within the
+contiguous subsequence.  It also lets writers not
+supporting obsolescent readers use a
+.I tzh_timecnt
+of zero
+in the version 1 data block to save space.
+Time zone designations should consist of at least three (3)
+and no more than six (6) ASCII characters from the set of
+alphanumerics,
+.q "-",
+and
+.q "+".
+This is for compatibility with POSIX requirements for
+time zone abbreviations.
+When reading a version 2 or 3 file, readers
+should ignore the version 1 header and data block except for
+the purpose of skipping over them.
+Readers should calculate the total lengths of the
+headers and data blocks and check that they all fit within
+the actual file size, as part of a validity check for the file.
+.SS Common interoperability issues
+This section documents common problems in reading or writing TZif files.
+Most of these are problems in generating TZif files for use by
+older readers.
+The goals of this section are:
+.IP * 2
+to help TZif writers output files that avoid common
+pitfalls in older or buggy TZif readers,
+.IP *
+to help TZif readers avoid common pitfalls when reading
+files generated by future TZif writers, and
+.IP *
+to help any future specification authors see what sort of
+problems arise when the TZif format is changed.
+When new versions of the TZif format have been defined, a
+design goal has been that a reader can successfully use a TZif
+file even if the file is of a later TZif version than what the
+reader was designed for.
+When complete compatibility was not achieved, an attempt was
+made to limit glitches to rarely-used timestamps, and to allow
+simple partial workarounds in writers designed to generate
+new-version data useful even for older-version readers.
+This section attempts to document these compatibility issues and
+workarounds, as well as to document other common bugs in
+readers.
+Interoperability problems with TZif include the following:
+.IP * 2
+Some readers examine only version 1 data.
+As a partial workaround, a writer can output as much version 1
+data as possible.
+However, a reader should ignore version 1 data, and should use
+version 2+ data even if the reader's native timestamps have only
+32 bits.
+.IP *
+Some readers designed for version 2 might mishandle
+timestamps after a version 3 file's last transition, because
+they cannot parse extensions to POSIX in the TZ-like string.
+As a partial workaround, a writer can output more transitions
+than necessary, so that only far-future timestamps are
+mishandled by version 2 readers.
+.IP *
+Some readers designed for version 2 do not support
+permanent daylight saving time, e.g., a TZ string
+.q "EST5EDT,0/0,J365/25"
+denoting permanent Eastern Daylight Time (\*-04).
+As a partial workaround, a writer can substitute standard time
+for the next time zone east, e.g.,
+.q "AST4"
+for permanent Atlantic Standard Time (\*-04).
+.IP *
+Some readers ignore the footer, and instead predict future
+timestamps from the time type of the last transition.
+As a partial workaround, a writer can output more transitions
+than necessary.
+.IP *
+Some readers do not use time type 0 for timestamps before
+the first transition, in that they infer a time type using a
+heuristic that does not always select time type 0.
+As a partial workaround, a writer can output a dummy (no-op)
+first transition at an early time.
+.IP *
+Some readers mishandle timestamps before the first
+transition that has a timestamp not less than \*-2**31.
+Readers that support only 32-bit timestamps are likely to be
+more prone to this problem, for example, when they process
+64-bit transitions only some of which are representable in 32
+bits.
+As a partial workaround, a writer can output a dummy
+transition at timestamp \*-2**31.
+.IP *
+Some readers mishandle a transition if its timestamp has
+the minimum possible signed 64-bit value.
+Timestamps less than \*-2**59 are not recommended.
+.IP *
+Some readers mishandle POSIX-style TZ strings that
+contain
+.q "<"
+or
+.q ">".
+As a partial workaround, a writer can avoid using
+.q "<"
+or
+.q ">"
+for time zone abbreviations containing only alphabetic
+characters.
+.IP *
+Many readers mishandle time zone abbreviations that contain
+non-ASCII characters.
+These characters are not recommended.
+.IP *
+Some readers may mishandle time zone abbreviations that
+contain fewer than 3 or more than 6 characters, or that
+contain ASCII characters other than alphanumerics,
+.q "-",
+and
+.q "+".
+These abbreviations are not recommended.
+.IP *
+Some readers mishandle TZif files that specify
+daylight-saving time UT offsets that are less than the UT
+offsets for the corresponding standard time.
+These readers do not support locations like Ireland, which
+uses the equivalent of the POSIX TZ string
+.q "IST\*-1GMT0,M10.5.0,M3.5.0/1",
+observing standard time
+(IST, +01) in summer and daylight saving time (GMT, +00) in winter.
+As a partial workaround, a writer can output data for the
+equivalent of the POSIX TZ string
+.q "GMT0IST,M3.5.0/1,M10.5.0",
+thus swapping standard and daylight saving time.
+Although this workaround misidentifies which part of the year
+uses daylight saving time, it records UT offsets and time zone
+abbreviations correctly.
+Some interoperability problems are reader bugs that
+are listed here mostly as warnings to developers of readers.
+.IP * 2
+Some readers do not support negative timestamps.
+Developers of distributed applications should keep this
+in mind if they need to deal with pre-1970 data.
+.IP *
+Some readers mishandle timestamps before the first
+transition that has a nonnegative timestamp.
+Readers that do not support negative timestamps are likely to
+be more prone to this problem.
+.IP *
+Some readers mishandle time zone abbreviations like
+.q "-08"
+that contain
+.q "+",
+.q "-",
+or digits.
+.IP *
+Some readers mishandle UT offsets that are out of the
+traditional range of \*-12 through +12 hours, and so do not
+support locations like Kiritimati that are outside this
+range.
+.IP *
+Some readers mishandle UT offsets in the range [\*-3599, \*-1]
+seconds from UT, because they integer-divide the offset by
+3600 to get 0 and then display the hour part as
+.q "+00".
+.IP *
+Some readers mishandle UT offsets that are not a multiple
+of one hour, or of 15 minutes, or of 1 minute.
 .BR time (2),
 .BR localtime (3),
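
For illustration only, and not part of the patch: the advice that readers should add up the header and data block lengths and check them against the file size could be sketched roughly as follows. The function names (parse_header, data_block_len, check_tzif_lengths) are hypothetical, the sketch assumes the usual 44-byte TZif header with six big-endian 32-bit counts, and it checks lengths only, not the contents of the blocks or the footer.

```python
import struct

HEADER_LEN = 44  # magic(4) + version(1) + reserved(15) + six 4-byte counts


def parse_header(buf, offset):
    """Return (version_byte, counts) for the TZif header at offset."""
    if len(buf) < offset + HEADER_LEN:
        raise ValueError("file too short for a TZif header")
    if buf[offset:offset + 4] != b"TZif":
        raise ValueError("missing TZif magic")
    version = buf[offset + 4:offset + 5]
    # Counts, in file order: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt.
    counts = struct.unpack(">6L", buf[offset + 20:offset + HEADER_LEN])
    return version, counts


def data_block_len(counts, time_size):
    """Length of a data block; time_size is 4 for version 1, 8 for version 2+."""
    isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt = counts
    return (timecnt * time_size        # transition times
            + timecnt                  # transition type indexes
            + typecnt * 6              # local time type records
            + charcnt                  # abbreviation characters
            + leapcnt * (time_size + 4)  # leap second records
            + isstdcnt + isutcnt)      # standard/wall and UT/local indicators


def check_tzif_lengths(buf):
    """Return the offset just past the last data block, or raise ValueError."""
    version, counts = parse_header(buf, 0)
    end = HEADER_LEN + data_block_len(counts, 4)
    if end > len(buf):
        raise ValueError("version 1 data block extends past end of file")
    if version == b"\0":
        return end  # version 1 file: nothing follows the first block
    # Version 2+: the version 1 block is only skipped over, never interpreted.
    _, counts2 = parse_header(buf, end)
    end += HEADER_LEN + data_block_len(counts2, 8)
    if end > len(buf):
        raise ValueError("version 2+ data block extends past end of file")
    return end
```

In a version 2+ file, whatever lies between the returned offset and the end of the file would be the newline-delimited footer; a reader following the guidance above would parse that next.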
