[tz] [PROPOSED] Support zi parsers that mishandle negative DST offsets

Stephen Colebourne scolebourne at joda.org
Tue Feb 6 15:40:09 UTC 2018


AFAICT, this does not provide a solution to anything, but perhaps I
don't understand it.

Projects like OpenJDK and Joda-Time parse the source files of tzdb.
zic is not used. Make is not run.

Users are encouraged to update the time-zone data themselves:
http://www.joda.org/joda-time/tz_update.html
http://www.threeten.org/threetenbp/update-tzdb.html

Specifically, users are expected to copy across files like "europe",
"northamerica", and "asia".
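
As an illustration of that kind of consumer, here is a minimal sketch of
a reader that takes the lines of a source file such as "europe" directly,
with no zic or make step, and reports active Rule lines whose SAVE field
is negative. It is written in Python purely for brevity; the function
name negative_save_rules is hypothetical and is not taken from OpenJDK
or Joda-Time, and the comment stripping is a simplification that assumes
'#' never appears inside a quoted field.

```python
def negative_save_rules(lines):
    """Return (rule name, SAVE) pairs for active Rule lines with negative SAVE.

    Column layout, as in the tzdb source files:
    Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
    """
    found = []
    for line in lines:
        # Simplification: treat everything from the first '#' as a comment.
        text = line.split("#", 1)[0]
        fields = text.split()
        if len(fields) >= 10 and fields[0] == "Rule":
            save = fields[8]
            if save.startswith("-"):
                found.append((fields[1], save))
    return found


sample = [
    "#Rule\tEire\t1971\tonly\t-\tOct\t31\t2:00u\t-1:00\tGMT",
    "Rule\tEire\t1971\tonly\t-\tOct\t31\t2:00u\t-1:00\tGMT",
    "Rule\tEU\t1981\tmax\t-\tMar\tlastSun\t1:00u\t1:00\tS",
]
hits = negative_save_rules(sample)
```

A reader like this only ever sees whatever is uncommented in the files
it was handed, which is why a generated file that is absent from the
tarball cannot help it.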

There is no `pdstdata.zi` file checked in to the source repository.
Nor is there a vanguard/rearguard file. Since zic/make is not run, how
is a downstream consumer going to use them (assuming it were desirable
to do so, which I don't accept)?

If the file isn't in tzdata2018c.tar.gz then it effectively doesn't exist.

Stephen


On 30 January 2018 at 08:49, Paul Eggert <eggert at cs.ucla.edu> wrote:
> This is intended to provide a way to support both clients that require
> data to have only positive DST offsets, and clients that do not have
> this restriction.
> * Makefile (XDST, SDST): New macros.
> (TZDATA_ZI_DEPS): Add zidst.awk.
> (DSTDATA_ZI_DEPS): New macro.
> (all): Depend on fulldata.zi and pdstdata.zi.
> (fulldata.zi pdstdata.zi): New rule.
> (tzdata.zi): Use $(XDST)data.zi instead of reading original source.
> (check_zishrink): Check zidst.awk, too.
> (clean): Remove all *.zi files, not just tzdata.zi.
> * NEWS, europe: Mention this.
> * zidst.awk: New file.
> ---
>  Makefile  | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
>  NEWS      | 30 ++++++++++++++++++++++++++++++
>  europe    | 39 ++++++++++++++++++++++-----------------
>  zidst.awk | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 28 deletions(-)
>  create mode 100644 zidst.awk
>
> diff --git a/Makefile b/Makefile
> index 8c84cd9..92ddb80 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -10,6 +10,26 @@ VERSION=     unknown
>  # Email address for bug reports.
>  BUGEMAIL=      tz at iana.org
>
> +# To install the full data, which can contain daylight saving time
> +# offsets that are negative (relative to standard time), use
> +#      XDST=   full
> +# To install data containing only positive daylight saving time
> +# offsets, but otherwise as close to the full data as practical, use
> +#      XDST=   pdst
> +XDST=          pdst
> +# Parsers requiring DST offsets to be positive should use the file
> +# pdstdata.zi, which contains almost all the data of 'africa' etc.,
> +# except with positive DST offsets.  This works around a problem that
> +# was discovered in January 2018 with negative DST in tests for ICU
> +# and OpenJDK.  See:
> +# https://mm.icann.org/pipermail/tz/2018-January/025825.html
> +# https://mm.icann.org/pipermail/tz/2018-January/025822.html
> +# Currently the 'africa' etc. files use pdst form if comments are
> +# ignored, to ease transition for parsers that do not support
> +# negative DST offsets.  This is intended to change to full form at
> +# some point, so that full-featured zi parsers that use the 'africa'
> +# files will get the full data without changing anything.
> +
>  # Change the line below for your time zone (after finding the zone you want in
>  # the time zone files, or adding it to a time zone file).
>  # Alternately, if you discover you've got the wrong time zone, you can just
> @@ -463,7 +483,8 @@ TDATA=              $(YDATA) $(NDATA) $(BACKWARD)
>  ZONETABLES=    zone1970.tab zone.tab
>  TABDATA=       iso3166.tab $(TZDATA_TEXT) $(ZONETABLES)
>  LEAP_DEPS=     leapseconds.awk leap-seconds.list
> -TZDATA_ZI_DEPS=        zishrink.awk version $(TDATA) $(PACKRATDATA)
> +TZDATA_ZI_DEPS=        zidst.awk zishrink.awk version $(TDATA) $(PACKRATDATA)
> +DSTDATA_ZI_DEPS= zidst.awk $(TDATA) $(PACKRATDATA)
>  DATA=          $(TDATA_TO_CHECK) backzone iso3166.tab leap-seconds.list \
>                         leapseconds yearistype.sh $(ZONETABLES)
>  AWK_SCRIPTS=   checklinks.awk checktab.awk leapseconds.awk zishrink.awk
> @@ -500,7 +521,8 @@ VERSION_DEPS= \
>
>  SHELL=         /bin/sh
>
> -all:           tzselect yearistype zic zdump libtz.a $(TABDATA)
> +all:           tzselect yearistype zic zdump libtz.a $(TABDATA) \
> +                 fulldata.zi pdstdata.zi
>
>  ALL:           all date $(ENCHILADA)
>
> @@ -535,11 +557,15 @@ version:  $(VERSION_DEPS)
>                 printf '%s\n' "$$V" >$@.out
>                 mv $@.out $@
>
> -# This file can be tailored by setting BACKWARD, PACKRATDATA, etc.
> -tzdata.zi:     $(TZDATA_ZI_DEPS)
> +# These files can be tailored by setting BACKWARD, PACKRATDATA, etc.
> +fulldata.zi pdstdata.zi: $(DSTDATA_ZI_DEPS)
> +               $(AWK) -v outfile='$@' -f zidst.awk $(TDATA) $(PACKRATDATA) \
> +                 >$@.out
> +               mv $@.out $@
> +tzdata.zi:     $(XDST)data.zi version
>                 version=`sed 1q version` && \
>                   LC_ALL=C $(AWK) -v version="$$version" -f zishrink.awk \
> -                   $(TDATA) $(PACKRATDATA) >$@.out
> +                   $(XDST)data.zi >$@.out
>                 mv $@.out $@
>
>  version.h:     version
> @@ -721,17 +747,32 @@ check_tzs:        $(TZS) $(TZS_NEW)
>  check_web:     tz-how-to.html
>                 $(VALIDATE_ENV) $(VALIDATE) $(VALIDATE_FLAGS) tz-how-to.html
>
> -# Check that tzdata.zi generates the same binary data that its sources do.
> -check_zishrink: tzdata.zi zic leapseconds $(PACKRATDATA) $(TDATA)
> +# The format of the source files, either full or pdst.
> +# Currently they are in pdst format, but this is expected to change.
> +SDST = pdst
> +
> +# Check that zishrink.awk does not alter the data, and that zidst.awk
> +# preserves $(SDST) data.
> +check_zishrink: zic leapseconds $(PACKRATDATA) $(TDATA) \
> +  $(XDST)data.zi tzdata.zi
>                 for type in posix right; do \
> -                 mkdir -p time_t.dir/$$type time_t.dir/$$type-shrunk && \
> +                 mkdir -p time_t.dir/$$type time_t.dir/$$type-$(SDST) \
> +                   time_t.dir/$$type-shrunk && \
>                   case $$type in \
>                     right) leap='-L leapseconds';; \
>                     *) leap=;; \
>                   esac && \
> -                 $(ZIC) $$leap -d time_t.dir/$$type $(TDATA) && \
> -                 $(AWK) '/^Rule/' $(TDATA) | \
> +                 $(ZIC) $$leap -d time_t.dir/$$type $(XDST)data.zi && \
> +                 $(AWK) '/^Rule/' $(XDST)data.zi | \
>                     $(ZIC) $$leap -d time_t.dir/$$type - $(PACKRATDATA) && \
> +                 case $(XDST) in \
> +                   $(SDST)) \
> +                     $(ZIC) $$leap -d time_t.dir/$$type-$(SDST) $(TDATA) && \
> +                     $(AWK) '/^Rule/' $(TDATA) | \
> +                       $(ZIC) $$leap -d time_t.dir/$$type-$(SDST) \
> +                         $(XDST)data.zi && \
> +                     diff -r time_t.dir/$$type time_t.dir/$$type-$(SDST);; \
> +                 esac && \
>                   $(ZIC) $$leap -d time_t.dir/$$type-shrunk tzdata.zi && \
>                   diff -r time_t.dir/$$type time_t.dir/$$type-shrunk || exit; \
>                 done
> @@ -741,7 +782,7 @@ clean_misc:
>                 rm -f core *.o *.out \
>                   date tzselect version.h zdump zic yearistype libtz.a
>  clean:         clean_misc
> -               rm -fr *.dir tzdata.zi tzdb-*/ $(TZS_NEW)
> +               rm -fr *.dir *.zi tzdb-*/ $(TZS_NEW)
>
>  maintainer-clean: clean
>                 @echo 'This command is intended for maintainers to use; it'
> diff --git a/NEWS b/NEWS
> index 4f763c0..c455f3c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,6 +2,36 @@ News for the tz database
>
>  Unreleased, experimental changes
>
> +  Briefly:
> +  Support zi parsers that mishandle negative DST offsets
> +
> +  Changes to build procedure
> +
> +    The new XDST macro in the Makefile lets the installer choose
> +    XDST=full, which allows arbitrary DST offsets in the data, or
> +    XDST=pdst, which allows only positive DST offsets.  Choosing
> +    XDST=full is arguably more correct for Ireland, which observes
> +    Irish Standard Time (IST, UTC+01) in summer and GMT (UTC) in
> +    winter.  Choosing XDST=pdst is better for zoneinfo parsers that do
> +    not work well with negative DST offsets, notably OpenJDK+CLDR.
> +    On platforms using tzcode or similar APIs, XDST should not affect
> +    any behavior other than that depending on the tm_isdst flag.
> +
> +    For now this change does not affect client-visible behavior by
> +    default, as the Makefile defaults to XDST=pdst and uncommented
> +    parts of the data source files contain only pdst-format data.
> +    After a bit of time for testing, XDST=full and full-format source
> +    files are planned to become the default, so that parsers that
> +    support negative DST offsets can get full data without changing
> +    their build procedures.  Parsers requiring positive DST offsets
> +    should use the new file pdstdata.zi instead of tzdata.zi or the
> +    source files 'africa' etc.: pdstdata.zi is pdst-compatible, it is
> +    automatically built from the data source files, and it will
> +    continue to be pdst-compatible regardless of XDST.  To get
> +    full-format data now, use the new file fulldata.zi, which will
> +    continue to be full-format regardless of XDST.  To get the format
> +    selected by XDST, use tzdata.zi.
> +
>    Changes to code
>
>      The code is a bit more portable to MS-Windows.  (Thanks to Manuela
> diff --git a/europe b/europe
> index 6c1ccbe..5aeda33 100644
> --- a/europe
> +++ b/europe
> @@ -508,11 +508,27 @@ Link      Europe/London   Europe/Jersey
>  Link   Europe/London   Europe/Guernsey
>  Link   Europe/London   Europe/Isle_of_Man
>
> -# From Paul Eggert (2018-01-19):
> +# From Paul Eggert (2018-01-30):
> +# In January 2018 we discovered that the negative DST offsets in the
> +# Eire rules cause problems with tests for ICU:
> +# https://mm.icann.org/pipermail/tz/2018-January/025825.html
> +# and with tests for OpenJDK:
> +# https://mm.icann.org/pipermail/tz/2018-January/025822.html
> +# To work around this problem, zidst.awk translates the following data
> +# lines into two forms.  First, fulldata.zi contains the full data,
> +# which includes negative DST offsets.  Second, pdstdata.zi uses a
> +# traditional approximation for Irish time stamps after 1971-10-31
> +# 02:00 UTC; although this approximation has tm_isdst flags that are
> +# the reverse of the full data, its UTC offsets are correct and this
> +# suffices for ICU and OpenJDK.  Although this source file currently
> +# has pdstdata.zi lines active and fulldata.zi lines commented out,
> +# this is intended to change in the near future and downstream code
> +# should not rely on it.
> +#
>  # The following is like GB-Eire and EU, except with standard time in
>  # summer and negative daylight saving time in winter.
> -# Although currently commented out, this will need to become uncommented
> -# once the ICU/OpenJDK workaround is removed; see below.
> +# This rule set is active in fulldata.zi and is commented out in
> +# pdstdata.zi.
>  # Rule NAME    FROM    TO      TYPE    IN      ON      AT      SAVE    LETTER/S
>  #Rule  Eire    1971    only    -       Oct     31       2:00u  -1:00   GMT
>  #Rule  Eire    1972    1980    -       Mar     Sun>=16  2:00u  0       IST
> @@ -533,24 +549,13 @@ Zone      Europe/Dublin   -0:25:00 -      LMT     1880 Aug  2
>                          0:00   1:00    IST     1947 Nov  2  2:00s
>                          0:00   -       GMT     1948 Apr 18  2:00s
>                          0:00   GB-Eire GMT/IST 1968 Oct 27
> -# From Paul Eggert (2018-01-18):
> -# The next line should look like this:
> +# The next line is active in fulldata.zi and commented out in pdstdata.zi.
>  #                       1:00   Eire    IST/GMT
> -# However, in January 2018 we discovered that the Eire rules cause
> -# problems with tests for ICU:
> -# https://mm.icann.org/pipermail/tz/2018-January/025825.html
> -# and with tests for OpenJDK:
> -# https://mm.icann.org/pipermail/tz/2018-January/025822.html
> -# To work around this problem, use a traditional approximation for
> -# time stamps after 1971-10-31 02:00 UTC, to give ICU and OpenJDK
> -# developers breathing room to fix bugs.  This approximation has
> -# correct UTC offsets, but results in tm_isdst flags are the reverse
> -# of what they should be.  This workaround is temporary and should be
> -# removed reasonably soon.
> +# These three lines are active in pdstdata.zi and commented out in
> +# fulldata.zi.
>                          1:00   -       IST     1971 Oct 31  2:00u
>                          0:00   GB-Eire GMT/IST 1996
>                          0:00   EU      GMT/IST
> -# End of workaround for ICU and OpenJDK bugs.
>
>
>  ###############################################################################
> diff --git a/zidst.awk b/zidst.awk
> new file mode 100644
> index 0000000..7885e9a
> --- /dev/null
> +++ b/zidst.awk
> @@ -0,0 +1,50 @@
> +# Convert tzdata source into full or positive-DST form
> +
> +# Contributed by Paul Eggert.  This file is in the public domain.
> +
> +# This is not a general-purpose converter; it is designed for current tzdata.
> +#
> +# When converting to full form, the output can use negative DST offsets.
> +#
> +# When converting to positive-DST form, the output uses only positive
> +# DST offsets.  The idea is for the output data to simulate the
> +# behavior of the input data as best it can within the constraints of
> +# positive DST offsets.
> +#
> +# In the input, lines requiring the full format are commented #[full]
> +# and the positive DST near-equivalents are commented #[pdst].
> +
> +BEGIN {
> +  dst_type["full"] = 1
> +  dst_type["pdst"] = 1
> +
> +  # The command line should set OUTFILE to the name of the output file,
> +  # which should start with either "full" or "pdst".
> +  todst = substr(outfile, 1, 4)
> +  if (!dst_type[todst]) exit 1
> +}
> +
> +/^Zone/ { zone = $2 }
> +
> +{
> +  in_comment = /^#/
> +
> +  # Test whether this line should differ between the full and the pdst versions.
> +  Rule_Eire = /^#?Rule[\t ]+Eire[\t ]/
> +  Zone_Dublin_post_1968 \
> +    = (zone == "Europe/Dublin" && /^#?[\t ]+[01]:00[\t ]/ \
> +       && (!$(in_comment + 4) || 1968 < $(in_comment + 4)))
> +
> +  # If so, uncomment the desired version and comment out the undesired one.
> +  if (Rule_Eire || Zone_Dublin_post_1968) {
> +    if ((Rule_Eire \
> +        || (Zone_Dublin_post_1968 && $(in_comment + 3) == "IST/GMT"))  \
> +       == (todst == "full")) {
> +      sub(/^#/, "")
> +    } else if (/^[^#]/) {
> +      sub(/^/, "#")
> +    }
> +  }
> +}
> +
> +{ print }
> --
> 2.14.3
>
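
For readers who do not read awk, the comment toggling in the quoted
zidst.awk can be sketched in Python roughly as follows. This is an
approximation under stated assumptions, not a replacement for the awk
script: the function name convert is hypothetical, and the UNTIL-year
test only accepts all-digit years, whereas awk would fall back to a
string comparison for anything else.

```python
import re

RULE_EIRE = re.compile(r"^#?Rule[\t ]+Eire[\t ]")
DUBLIN_CONT = re.compile(r"^#?[\t ]+[01]:00[\t ]")


def convert(lines, todst):
    """Return lines with Eire/Dublin entries toggled for 'full' or 'pdst'."""
    if todst not in ("full", "pdst"):
        raise ValueError("todst must be 'full' or 'pdst'")
    zone = None
    out = []
    for line in lines:
        fields = line.split()
        if line.startswith("Zone") and len(fields) > 1:
            zone = fields[1]
        # A leading '#' followed by whitespace is its own field, so shift by one.
        off = 1 if line.startswith("#") else 0
        fmt = fields[off + 2] if len(fields) > off + 2 else ""
        until = fields[off + 3] if len(fields) > off + 3 else ""
        rule_eire = bool(RULE_EIRE.match(line))
        dublin_post_1968 = (
            zone == "Europe/Dublin"
            and bool(DUBLIN_CONT.match(line))
            and (until == "" or (until.isdigit() and int(until) > 1968))
        )
        if rule_eire or dublin_post_1968:
            is_full_line = rule_eire or fmt == "IST/GMT"
            if is_full_line == (todst == "full"):
                line = re.sub(r"^#", "", line)  # uncomment the wanted form
            elif line and not line.startswith("#"):
                line = "#" + line  # comment out the unwanted form
        out.append(line)
    return out


source = [
    "#Rule\tEire\t1971\tonly\t-\tOct\t31\t2:00u\t-1:00\tGMT",
    "Zone\tEurope/Dublin\t-0:25:00\t-\tLMT\t1880 Aug 2",
    "\t\t\t0:00\tGB-Eire\tGMT/IST\t1968 Oct 27",
    "#\t\t\t1:00\tEire\tIST/GMT",
    "\t\t\t1:00\t-\tIST\t1971 Oct 31 2:00u",
    "\t\t\t0:00\tEU\tGMT/IST",
]
full = convert(source, "full")
pdst = convert(source, "pdst")
```

With input already in pdst form, as the patch says the source files
currently are, the pdst conversion leaves the lines unchanged and the
full conversion swaps which of the Dublin lines are commented out.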