[tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change

Thu Jan 25 01:28:32 UTC 2018

Guy Harris <guy at alum.mit.edu> wrote on 01/24/2018 05:17:24 PM:

> From: Guy Harris <guy at alum.mit.edu>
> To: Yoshito Umaoka <yoshito_umaoka at us.ibm.com>
> Cc: Stephen Colebourne <scolebourne at joda.org>, Time Zone Mailing 
> List <tz at iana.org>, tz <tz-bounces at iana.org>
> Date: 01/24/2018 05:17 PM
> Subject: Re: [tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change
> 
> On Jan 24, 2018, at 1:17 PM, Yoshito Umaoka <yoshito_umaoka at us.ibm.com> wrote:
> 
> > CLDR XML (or JSON) data is consumed by other projects such as ICU and Java, and these external projects know those offsets.
> > CLDR only specifies that the daylight saving time name used for Europe/Dublin is "Irish Standard Time".
> > ICU/Java imports zoneinfo from the tz database, obtains the offset at a given time, and then decides whether it is in standard time or daylight time.
> 
> The tz binary database has, for all transition times, an indication 
> of whether, after the transition, you are in "DST".  If the tz 
> binary database is what Java time zone code imports, it doesn't need
> to look at offsets to determine whether the times are standard or 
> "DST", it can just use those values.  (I say "DST" because that's 
> used to set tm_isdst.)
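To illustrate the point about the binary database, here is a minimal Python sketch that reads the local time type records from the version 1 data block of a TZif file, following the layout described in tzfile(5). The function and type names are hypothetical, and the sketch assumes a well-formed v1 block while ignoring the 64-bit v2+ data, leap-second records, and the standard/wall and UT/local indicators. Each record carries only a total UT offset, an isdst flag, and an abbreviation index.

```python
import struct
from typing import List, NamedTuple, Tuple


class LocalTimeType(NamedTuple):
    utoff: int   # total offset from UT in seconds (tt_utoff, formerly tt_gmtoff)
    isdst: bool  # the flag zic stores; it is what ends up in tm_isdst
    abbr: str    # abbreviation such as "GMT" or "IST"


def parse_tzif_v1(data: bytes) -> Tuple[List[int], List[int], List[LocalTimeType]]:
    """Read transition times, type indices and local time types from a TZif v1 block.

    Sketch only: ignores the 64-bit v2+ block, leap-second records and the
    standard/wall and UT/local indicators.
    """
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    # Magic (4) + version (1) + reserved (15) = 20 bytes, then six 32-bit counts.
    isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt = struct.unpack(">6l", data[20:44])
    pos = 44
    times = list(struct.unpack(f">{timecnt}l", data[pos:pos + 4 * timecnt]))
    pos += 4 * timecnt
    indices = list(data[pos:pos + timecnt])  # one unsigned byte per transition
    pos += timecnt
    abbr_start = pos + 6 * typecnt
    chars = data[abbr_start:abbr_start + charcnt]
    types = []
    for i in range(typecnt):
        # Each record: signed 32-bit offset, 1-byte isdst, 1-byte abbreviation index.
        utoff, isdst, abbrind = struct.unpack(">lBB", data[pos + 6 * i:pos + 6 * i + 6])
        abbr = chars[abbrind:chars.index(b"\0", abbrind)].decode("ascii")
        types.append(LocalTimeType(utoff, bool(isdst), abbr))
    return times, indices, types
```

The "DST" status of transition k is then simply `types[indices[k]].isdst`; no comparison of offsets is needed.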
> 

I cannot speak for Java.
ICU does not use the tz binaries; ICU generates its own binary resources
from the tzdata source files. The information equivalent to tm_isdst is
stored in the ICU binary format. In addition, ICU stores the raw offset
and the DST saving amount, which are not available in the tz binaries.
ICU preserves this information to support some legacy APIs, such as
getRawOffset.
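A minimal Python sketch of the two extra components described above: a raw (standard) offset and a DST saving amount, from which both a getRawOffset-style answer and the total offset can be derived. The class and method names are hypothetical and do not reflect ICU's actual binary format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IcuStyleRule:
    """Hypothetical record keeping the two components ICU stores separately."""
    raw_offset: int   # standard ("raw") offset from UTC, in seconds
    dst_savings: int  # amount added while DST is in effect; may be negative

    def get_raw_offset(self) -> int:
        # Legacy-style accessor in the spirit of getRawOffset(): standard part only.
        return self.raw_offset

    def get_offset(self) -> int:
        # Total UTC offset actually observed on the wall clock.
        return self.raw_offset + self.dst_savings

    def in_daylight_time(self) -> bool:
        # A nonzero saving is treated as "daylight", mirroring tm_isdst.
        return self.dst_savings != 0
```

Under the negative-DST model discussed in this thread, an Irish winter would be `IcuStyleRule(3600, -3600)`: a raw offset of +1:00, a saving of -1:00, and a wall-clock offset of 0.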

> It does *not* contain any offsets other than, for each transition, 
> what the offset from UTC is.  Thus, it provides no notion of "raw-
> offset" vs. "actual-offset", and you can't determine both a "raw-
> offset" and an "actual-offset" from the tz binary database without 
> either 1) additional data or 2) some possibly-incorrect assumptions 
> being made, such as "the only reason why an entry in the table of 
> transitions has a different tt_gmtoff value is that the transition 
> represents starting or ending DST" (that latter assumption has been 
> false for a very very very very very long time for some tzdb 
> regions, as a given region might switch from one time zone to another).
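The pitfall described here can be shown with a small, hypothetical Python sketch. `guess_dst_from_offsets` stands in for any heuristic that treats every offset change as a DST change, and the region history is invented for illustration. When a region switches its standard offset, the heuristic misclassifies the later standard-time periods, while the stored isdst flags do not.

```python
from typing import List, Tuple

# (utoff_seconds, isdst) for each successive transition, as a TZif file provides.
Transition = Tuple[int, bool]


def guess_dst_from_offsets(transitions: List[Transition]) -> List[bool]:
    """Flawed heuristic: call any offset above the first one seen "DST"."""
    base = transitions[0][0]
    return [utoff > base for utoff, _ in transitions]


def dst_from_flags(transitions: List[Transition]) -> List[bool]:
    """Use the isdst flag that zic already recorded for each transition."""
    return [isdst for _, isdst in transitions]


# Hypothetical region: standard time moves from UTC-5 to UTC-4, then DST at UTC-3.
history = [(-18000, False), (-14400, False), (-10800, True), (-14400, False)]
```

Here `guess_dst_from_offsets(history)` labels the permanent move to UTC-4 as DST, whereas `dst_from_flags(history)` marks only the UTC-3 period.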
> 
> The tzdb *source* files, however, give the "standard" offset from 
> UTC in zone lines and the "amount to save", to be added to the 
> "standard" offset, in rule lines, so code that parses those files 
> independently, rather than relying on the binary files produced by 
> zic parsing the files, can get both the "standard" and "current" 
> offset from UTC.
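Parsing the source files directly yields both values. This Python sketch assumes the STDOFF field of a Zone line and the SAVE field of a Rule line have already been extracted as strings; the function names are hypothetical, and newer zic features such as "s"/"d" suffixes on SAVE are not handled. It also assumes the traditional convention that any nonzero SAVE means DST.

```python
from typing import Tuple


def parse_hms(text: str) -> int:
    """Convert a zic-style offset such as "1:00", "-1:00" or "0" to seconds."""
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("+-").split(":")]
    parts += [0] * (3 - len(parts))  # pad missing minutes/seconds
    hours, minutes, seconds = parts
    return sign * (hours * 3600 + minutes * 60 + seconds)


def wall_offset(stdoff: str, save: str) -> Tuple[int, int, bool]:
    """Return (standard offset, total offset, is_dst) in seconds."""
    std = parse_hms(stdoff)
    delta = parse_hms(save)
    return std, std + delta, delta != 0
```

For example, a STDOFF of 1:00 combined with a winter SAVE of -1:00, as in the negative-DST Irish rule discussed in this thread, gives a standard offset of 3600 seconds and a total offset of 0.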

Correct. As I explained above, ICU's modified zic also stores the raw
(standard) offset and the DST amount.

> 
> Which of those two things does the Java code that "imports zoneinfo
> from tz database" do?  Does it read the binary data, or independently
> read the source data (or read binary data produced by a parser other
> than zic)?

CLDR and ICU are two separate projects, although CLDR was originally
part of the ICU project.

Our biggest issue with the change in 2018a/b was not actually the negative
DST offset. The bigger issue is the swapping of the standard and daylight
saving names. (That said, adopting such a rule is still a problem, because
every ICU version released so far has a bug that treats a negative DST
saving amount as invalid, so we would need to distribute a patch to handle
that case.)

At the moment, the TZ database project does nothing with i18n. Names
used for displaying time zones are pretty much US-centric. But there are
many other external projects that want to use its rules for clock
changes, and CLDR is trying to provide localized time zone names in many
different languages.

CLDR assumes that zone names are very stable. For example, "Pacific
Standard Time" represents the standard time used on the US Pacific coast,
and the name itself does not change from time to time.

Transition rules, however, change much more frequently, which is why
there are so many new tz database releases.

To localize time zone display names, CLDR needs to assign a unique key to
each translatable string, and it uses a combination of the zone ID and
the standard/daylight distinction as that key.
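A hypothetical Python sketch of why this key design matters: names are looked up by zone ID plus the standard/daylight distinction. The table entries are illustrative (the daylight name matches the one quoted earlier in this thread; the standard name is an assumption) and do not reproduce CLDR's actual data layout, which also involves metazones and generic names.

```python
from typing import Dict, Tuple

# Illustrative name table keyed by (zone ID, "standard" | "daylight").
NAMES: Dict[Tuple[str, str], str] = {
    ("Europe/Dublin", "standard"): "Greenwich Mean Time",
    ("Europe/Dublin", "daylight"): "Irish Standard Time",
}


def display_name(zone_id: str, isdst: bool) -> str:
    """Pick a localized name using the standard/daylight distinction as part of the key."""
    return NAMES[(zone_id, "daylight" if isdst else "standard")]
```

With the older data, Irish summer time has isdst set and maps to "Irish Standard Time". If the flags are inverted so that winter becomes the "DST" period, the same lookup labels summer as "Greenwich Mean Time" and winter as "Irish Standard Time", even though the name table itself never changed.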

Because names are assumed to be very stable, CLDR consumers usually do
not provide a mechanism to distribute updated names.

Of course, if CLDR and ICU were one project and the data were consumed
only by ICU, it would be relatively easy to adopt such a change. We would
just need to update the zone name data and the code handling the clock at
the same time.

But they are two separate projects, and CLDR is consumed by a number of
other projects that have no control over clock calculation. So such a
change could easily break downstream consumers that use the TZ database.

I'm not sure what we would do in CLDR if this change were reintroduced
into the TZ database at this point. The CLDR technical committee may
decide not to make a corresponding change; instead, we might just change
the definition of the keys assigned to each zone name string.

Thanks,
Yoshito (ICU/CLDR)
