[tz] Java & Rearguard

Paul Eggert eggert at cs.ucla.edu
Sat Jun 8 19:20:35 UTC 2019

Steve Summit wrote:
> I'm not sure that's an entirely fair challenge, though.
> Given that (as I understand it) Java and ICU/CLDR use tt_isdst
> to decide whether to display their equivalents of "GMT" or "IST",
> I don't think they *can*  get the right answer near 1970

Yes, Ireland in 1970 is an "unfair" challenge. That was its point. It was 
intended to illustrate the inadequacy of the current CLDR/Java model to 
represent real-world aspects of civil timekeeping.

> tzdb changed its mind about the mapping at that point.

I'm not sure what you mean by "mapping", but the 2018a change to Irish data was 
in response to a bug report about Irish time, a bug report that was investigated 
and found to be valid. Since tzdb can represent the Irish data as per Irish law 
and common use, the change was warranted from the tzdb point of view. And since 
Java's TZUpdater program currently rejects the changed data, I developed a 
'rearguard' option to tzdb that lossily converts the main-format tzdata into a 
rearguard format that pacifies TZUpdater.

However, even with the rearguard option (and even if we go back to circa 2017 
code and data before this latest kerfuffle started), CLDR+Java cannot handle 
Irish time correctly for past timestamps due to what appear to be shortcomings 
in its model. This problem is not limited to Irish time; it also occurs for time 
in Los Angeles during World War II (see example below) and in several other 
areas, including Morocco right this minute and quite possibly in North America
and Europe in the near future.

   $ jshell
   |  Welcome to JShell -- Version 12.0.1
   |  For an introduction type: /help intro

   jshell> var jan1943 = java.time.Instant.ofEpochSecond(-852051600)
   jan1943 ==> 1943-01-01T07:00:00Z

   jshell> var zone = java.time.ZoneId.of("America/Los_Angeles")
   zone ==> America/Los_Angeles

   jshell> var dtf = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z z (zzzz)")
   dtf ==> Value(YearOfEra,4,19,EXCEEDS_PAD)'-'Value(MonthOf ... RT)' 

   jshell> jan1943.atZone(zone).format(dtf)
   $4 ==> "1943-01-01 00:00:00 -0700 PDT (Pacific Daylight Time)"

   $ TZ=America/Los_Angeles date -d @-852051600 +"%s %Y-%m-%d %H:%M:%S %z %Z"
   -852051600 1943-01-01 00:00:00 -0700 PWT

Near the end of the example above, Java says "PDT" where tzdb says "PWT", 
because Java can't handle PWT.
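
For anyone who wants to poke at this further, here is a minimal sketch of what the public java.time rules API reports for the same instant. The class name WarTimeProbe and the method describe are made up for illustration, and the sketch makes no claim about how the formatter chooses its strings internally; it only shows that the rules expose an offset, a standard offset and a daylight-saving flag, with nothing that would distinguish "war time" from ordinary daylight saving time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

// Hypothetical helper class; the name is an assumption, not part of any library.
class WarTimeProbe {
    // Summarize what ZoneRules reports about an instant in a given zone.
    static String describe(String zoneId, Instant instant) {
        ZoneRules rules = ZoneId.of(zoneId).getRules();
        return "offset=" + rules.getOffset(instant)
             + " standard=" + rules.getStandardOffset(instant)
             + " dst=" + rules.isDaylightSavings(instant);
    }

    public static void main(String[] args) {
        // Same timestamp as the jshell session above: 1943-01-01T07:00:00Z.
        Instant jan1943 = Instant.ofEpochSecond(-852051600L);
        System.out.println(describe("America/Los_Angeles", jan1943));
    }
}
```

The summary has a slot for "daylight saving: yes or no" but none for the name of the time in effect, which is consistent with the "PDT" output above.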

> Now, it's true, isdst might not be the best key to use for this
> sort of thing any more.  Do we have recommendations for what
> projects like Java and ICU/CLDR ought to be keying off of,
> if not isdst? (I suppose tt_abbrind, or more likely the actual
> string it indexes, might be better.)

I'm afraid they will need to solve this problem largely on their own, as one 
cannot look at tzdata and automatically derive strings like "Pacific War Time" 
or "Central Africa Ramadan Time": those strings are not in the data (not even in 
English), and there are no numeric equivalents either. The only 
partially-relevant info in tzdata consists of abbreviations like "IST" and "PDT" 
and unfortunately these abbreviations are well-documented to be ambiguous and 
historically inaccurate in some cases.

It should be possible for CLDR+Java to develop reasonably-reliable heuristics 
for guessing what string to use in some cases. For example, they could have a 
heuristic that "IST" means "India Standard Time" in Asia/Kolkata, "Israel 
Standard Time" in Asia/Gaza, Asia/Hebron and Asia/Jerusalem, "Irish Summer Time" 
in Ireland before 1968-10-27, and "Irish Standard Time" in Ireland starting 
1968-10-27. Similar heuristics could be used for other abbreviations, and if 
CLDR+Java tune the heuristics enough they'd be accurate. However, they'd have to 
do most of this work on their own. For example tzdb does not have an alphabetic 
abbreviation for the current time in Morocco (+00, a 1-hour negative DST where 
standard time is +01), so CLDR would have to invent an abbreviation there 
(presumably something like "Central Africa Ramadan Time" in English) and base 
its use on a heuristic like "when Africa/Casablanca is at +00 in the year 2019 
or later, its time zone abbreviation is 'Central Africa Ramadan Time'".
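
The heuristics described above could be sketched roughly as follows. This is an illustration only, not anything CLDR or Java actually does: the class AbbreviationHeuristic and its method names are hypothetical, "Ireland" is assumed to mean the Europe/Dublin zone, and the caller is assumed to already know (from tzdb or elsewhere) that the abbreviation in effect is "IST".

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Optional;

// Hypothetical sketch; none of these names exist in CLDR or the JDK.
class AbbreviationHeuristic {
    // Cutover date quoted in the text above.
    private static final LocalDate IRISH_CUTOVER = LocalDate.of(1968, 10, 27);

    // Guess a long name for the abbreviation "IST"; empty if no rule applies.
    static Optional<String> longNameForIst(String zoneId, LocalDate localDate) {
        switch (zoneId) {
            case "Asia/Kolkata":
                return Optional.of("India Standard Time");
            case "Asia/Gaza":
            case "Asia/Hebron":
            case "Asia/Jerusalem":
                return Optional.of("Israel Standard Time");
            case "Europe/Dublin": // assumption: "Ireland" means this zone
                return Optional.of(localDate.isBefore(IRISH_CUTOVER)
                        ? "Irish Summer Time"
                        : "Irish Standard Time");
            default:
                return Optional.empty();
        }
    }

    // Morocco rule from the text: Africa/Casablanca at +00 in 2019 or later.
    static Optional<String> longNameForMorocco(String zoneId, ZoneOffset offset, int year) {
        if (zoneId.equals("Africa/Casablanca")
                && offset.equals(ZoneOffset.UTC)
                && year >= 2019) {
            return Optional.of("Central Africa Ramadan Time");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(longNameForIst("Europe/Dublin", LocalDate.of(1970, 1, 1)));
        System.out.println(longNameForIst("Asia/Kolkata", LocalDate.of(2019, 6, 8)));
        System.out.println(longNameForMorocco("Africa/Casablanca", ZoneOffset.UTC, 2019));
    }
}
```

Returning an empty Optional when no rule matches keeps the guesswork explicit: a caller can fall back to a numeric offset rather than display a string that might be wrong.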
