[tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change

Tue Jan 23 18:42:29 UTC 2018

On 22 January 2018 at 19:18, Paul Eggert <eggert at cs.ucla.edu> wrote:
> On 01/22/2018 10:47 AM, Stephen Colebourne wrote:
>> This will happen in the perfect scenario where everything is updated
>> at once. The problem is that the perfect scenario is the anomaly. In
>> many perfectly reasonable scenarios, one will be updated without the
>> other being updated, causing nonsense output.
>
> Could you be more precise about what the "one" is, and what the "other" is?

Remember, there are 2 different data elements here:
- tzdb data
- CLDR-driven text data

Java time-zone data is updated using the tzupdater tool
http://www.oracle.com/technetwork/java/javase/tzupdater-readme-136440.html.
This updates the tzdb data, but not the CLDR-driven data that supplies
the text. Were the change to proceed, anyone running tzupdater with the
Ireland change would invert the meaning of inDaylightTime() and read the
wrong array element from the CLDR-driven data, which is a bug. And code
changes don't help, as we'll see below.

> If we could fix
> future Java implementations to support negative DST offsets, it appears that
> we could remove some of these longstanding minor discrepancies between Java
> and other implementations.

There is no possible fix to Java, as this is primarily an issue
between CLDR and TZDB. The two have a subtle API linkage which has
perhaps never been clearly spelled out here.

CLDR provides textual names for time-zones as an array [winter,
summer]. CLDR is a much larger project with considerable history, so
the order of that array is not going to change. (I'm using winter and
summer for CLDR in this email to aid clarity; CLDR itself calls them
standard and daylight.)

TZDB provides the offsets, the SAVE values and a short text string.
This text string (GMT/IST or IST/GMT) is not directly linkable to the
data CLDR provides. Although it may seem that you can use the text
from TZDB as a key to look up the correct value in CLDR, I know from
painful experience that this approach fails (the TZDB text varies over
time, is sometimes the same in winter and summer, or isn't even text).
Thus, the only reliable way to pick which piece of CLDR data is needed
is to derive it from the offsets.
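To illustrate picking the CLDR element from the offsets rather than from the abbreviation, here is a minimal Java sketch using java.time. The cldrIndex helper is hypothetical; ZoneRules.getStandardOffset, ZoneRules.getOffset and ZoneRules.isDaylightSavings are real java.time methods. Europe/London is used only because its current rules have a positive SAVE in summer.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

final class OffsetIndex {
    // Hypothetical helper: index into a CLDR-style [winter, summer] array.
    // 0 when the standard (raw) and actual offsets agree, 1 when they differ.
    static int cldrIndex(ZoneRules rules, Instant instant) {
        return rules.getStandardOffset(instant).equals(rules.getOffset(instant)) ? 0 : 1;
    }

    public static void main(String[] args) {
        ZoneRules london = ZoneId.of("Europe/London").getRules();
        Instant winter = Instant.parse("2018-01-15T12:00:00Z");
        Instant summer = Instant.parse("2018-07-15T12:00:00Z");
        System.out.println(cldrIndex(london, winter)); // 0
        System.out.println(cldrIndex(london, summer)); // 1
        // java.time exposes the same comparison as isDaylightSavings()
        System.out.println(london.isDaylightSavings(summer)); // true
    }
}
```

No abbreviation is consulted at any point; only the two offsets are compared.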

For 20 years, this has been done in a simple and straightforward way:
if (raw-offset != actual-offset), the CLDR summer text (array element 1)
is used. This provides the necessary glue to link the two projects:

boolean inSummerTime(Instant instant) {
  // summer time whenever the raw (standard) and actual offsets differ
  return getRawOffset(instant) != getActualOffset(instant);
}
String zoneName = inSummerTime(instant) ? cldrSummerTimeText : cldrWinterTimeText;
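The glue above can be written as a small, self-contained Java sketch. Offsets are plain seconds, and the names inSummerTime, zoneName and LONDON_NAMES are illustrative only, not a JDK or CLDR API; the names array follows CLDR's [winter, summer] order.

```java
final class SummerTimeGlue {
    // Illustrative CLDR-style names: element 0 = winter (standard), 1 = summer (daylight).
    static final String[] LONDON_NAMES = {"Greenwich Mean Time", "British Summer Time"};

    // The glue: summer time whenever the raw and actual offsets differ.
    static boolean inSummerTime(int rawOffsetSeconds, int actualOffsetSeconds) {
        return rawOffsetSeconds != actualOffsetSeconds;
    }

    // Picks the CLDR array element purely from the two offsets.
    static String zoneName(String[] cldrNames, int rawOffsetSeconds, int actualOffsetSeconds) {
        return cldrNames[inSummerTime(rawOffsetSeconds, actualOffsetSeconds) ? 1 : 0];
    }

    public static void main(String[] args) {
        System.out.println(zoneName(LONDON_NAMES, 0, 0));    // Greenwich Mean Time
        System.out.println(zoneName(LONDON_NAMES, 0, 3600)); // British Summer Time
    }
}
```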

TZDB has always had the raw and actual offsets the same in winter and
different in summer, so this has always worked. It has become the API
between the two projects without anyone really noticing.

The Ireland proposal breaks this, with (raw-offset != actual-offset)
meaning winter instead of summer. It is fair for TZDB to complain
that CLDR is inflexible in its definitions, but the reality is that
this was, and is, the only way to connect two separately developed
projects (where API stability is vital).
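A worked example of the inversion, as a sketch assuming Irish clocks at UTC+0 in winter and UTC+1 in summer. Both encodings produce identical actual offsets; only the split between raw offset and SAVE differs. The names array is illustrative, in CLDR's [winter, summer] order, and the zoneName helper is hypothetical.

```java
final class IrelandInversion {
    // Illustrative CLDR-style names for Europe/Dublin: [winter, summer].
    static final String[] DUBLIN_NAMES = {"Greenwich Mean Time", "Irish Standard Time"};

    // Same glue as before: summer text whenever raw and actual offsets differ.
    static String zoneName(int rawOffset, int save) {
        int actualOffset = rawOffset + save;
        return DUBLIN_NAMES[rawOffset != actualOffset ? 1 : 0];
    }

    public static void main(String[] args) {
        // Positive-SAVE encoding: raw offset 0 all year, SAVE +1h in summer.
        System.out.println(zoneName(0, 0));        // winter: Greenwich Mean Time
        System.out.println(zoneName(0, 3600));     // summer: Irish Standard Time
        // Negative-SAVE encoding: raw offset +1h all year, SAVE -1h in winter.
        System.out.println(zoneName(3600, -3600)); // winter: Irish Standard Time (wrong)
        System.out.println(zoneName(3600, 0));     // summer: Greenwich Mean Time (wrong)
    }
}
```

Nothing in the code changes between the two runs; swapping only the tzdb data flips which CLDR element is chosen.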

In order for TZDB and CLDR to co-exist, it is *required* that the raw
offset equals the actual offset in winter, and that they differ in
summer. This fact *requires* positive SAVE values and blocks negative
ones.
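The arithmetic behind this requirement: the same wall-clock offsets can be re-expressed with non-negative SAVE values by lowering the raw offset by the most negative SAVE. This is a sketch; normalize and Encoding are hypothetical names, not part of any tzdb tool or JDK API.

```java
import java.util.Arrays;

final class NonNegativeSave {
    // A raw offset plus the SAVE values a zone uses over the year, in seconds.
    record Encoding(int rawOffset, int[] saves) {}

    // Shift the raw offset down by the most negative SAVE (if any), so every
    // SAVE becomes non-negative. Each actual offset (raw + save) is unchanged.
    static Encoding normalize(int rawOffset, int[] saves) {
        int minSave = Arrays.stream(saves).min().orElse(0);
        int shift = Math.min(minSave, 0); // no shift when all SAVE values are >= 0
        int[] shifted = Arrays.stream(saves).map(s -> s - shift).toArray();
        return new Encoding(rawOffset + shift, shifted);
    }

    public static void main(String[] args) {
        // Ireland with negative SAVE: raw +1h, SAVE -1h in winter, 0 in summer.
        Encoding e = normalize(3600, new int[] {-3600, 0});
        System.out.println(e.rawOffset());              // 0
        System.out.println(Arrays.toString(e.saves())); // [0, 3600]
    }
}
```

For Ireland this yields a raw offset of 0 with SAVE +1h in summer, so raw equals actual in winter, which is the shape the inSummerTime() glue expects.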

This isn't a change that can be delayed for a year. This
interpretation of inSummerTime() relies on positive SAVE values, and
is part of the public API of TZDB just as much as the source code file
format is. In fact, it is the only way that TZDB and CLDR communicate.

In summary, negative SAVE values break the long-standing API with
CLDR, and thus break any project that relies on both, such as Java.
Negative SAVE values simply cannot exist without breaking the much
broader ecosystem of which TZDB is only a very small part. It's time to
close the door on negative SAVE values in TZDB permanently.

Stephen