[tz] Java & Rearguard
Paul Eggert
eggert at cs.ucla.edu
Sat Jun 8 19:20:35 UTC 2019
Steve Summit wrote:
> I'm not sure that's an entirely fair challenge, though.
> Given that (as I understand it) Java and ICU/CLDR use tt_isdst
> to decide whether to display their equivalents of "GMT" or "IST",
> I don't think they *can* get the right answer near 1970
Yes, Ireland in 1970 is an "unfair" challenge. That was its point. It was
intended to illustrate the inadequacy of the current CLDR/Java model to
represent real-world aspects of civil timekeeping.
> tzdb changed its mind about the mapping at that point.
I'm not sure what you mean by "mapping", but the 2018a change to Irish data was
in response to a bug report about Irish time, a bug report that was investigated
and found to be valid. Since tzdb can represent the Irish data as per Irish law
and common use, the change was warranted from the tzdb point of view. And since
Java's TZUpdater program currently rejects the changed data, I developed a
'rearguard' option to tzdb that lossily converts the main-format tzdata into a
rearguard format that pacifies TZUpdater.
However, even with the rearguard option (and even if we go back to circa 2017
code and data before this latest kerfuffle started), CLDR+Java cannot handle
Irish time correctly for past timestamps due to what appear to be shortcomings
in its model. This problem is not limited to Irish time; it also occurs for time
in Los Angeles during World War II (see example below) and in several other
areas, including Morocco right this minute and quite possibly in North America
and Europe in the near future.
$ jshell
| Welcome to JShell -- Version 12.0.1
| For an introduction type: /help intro
jshell> var jan1943 = java.time.Instant.ofEpochSecond(-852051600)
jan1943 ==> 1943-01-01T07:00:00Z
jshell> var zone = java.time.ZoneId.of("America/Los_Angeles")
zone ==> America/Los_Angeles
jshell> var dtf = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z z (zzzz)")
dtf ==> Value(YearOfEra,4,19,EXCEEDS_PAD)'-'Value(MonthOf ... RT)' ''('ZoneText(FULL)')'
jshell> jan1943.atZone(zone).format(dtf)
$4 ==> "1943-01-01 00:00:00 -0700 PDT (Pacific Daylight Time)"
jshell>
$ TZ=America/Los_Angeles date -d @-852051600 +"%s %Y-%m-%d %H:%M:%S %z %Z"
-852051600 1943-01-01 00:00:00 -0700 PWT
Near the end of the example above, Java says "PDT" where tzdb says "PWT",
because Java can't handle PWT.
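One way to see the limitation is to ask java.time directly what it knows about that instant. The following is only a minimal sketch: the class name WarTimeProbe and the method describe are hypothetical, and the idea that the formatter picks its text from the two-state standard/daylight flag is taken from the quoted description above rather than from a reading of the JDK source. The ZoneRules calls themselves are standard java.time API.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Hypothetical probe class, for illustration only.
public class WarTimeProbe {
    // Summarize what ZoneRules exposes for one instant in one zone:
    // a standard offset, a total offset, and a single boolean flag.
    static String describe(String zoneName, long epochSecond) {
        Instant t = Instant.ofEpochSecond(epochSecond);
        ZoneRules rules = ZoneId.of(zoneName).getRules();
        ZoneOffset standard = rules.getStandardOffset(t);
        ZoneOffset total = rules.getOffset(t);
        boolean dst = rules.isDaylightSavings(t);
        return "standard=" + standard + " total=" + total
            + " isDaylightSavings=" + dst;
    }

    public static void main(String[] args) {
        // Same timestamp as in the jshell session above.
        System.out.println(describe("America/Los_Angeles", -852051600L));
    }
}
```

Nothing in this summary distinguishes war time from ordinary daylight saving time; the tzdb abbreviation "PWT" does not appear anywhere in it.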
> Now, it's true, isdst might not be the best key to use for this
> sort of thing any more. Do we have recommendations for what
> projects like Java and ICU/CLDR ought to be keying off of,
> if not isdst? (I suppose tt_abbrind, or more likely the actual
> string it indexes, might be better.)
I'm afraid they will need to solve this problem largely on their own, as one
cannot look at tzdata and automatically derive strings like "Pacific War Time"
or "Central Africa Ramadan Time": those strings are not in the data (not even in
English), and there are no numeric equivalents either. The only
partially-relevant info in tzdata consists of abbreviations like "IST" and "PDT",
and unfortunately these abbreviations are well-documented to be ambiguous and
historically inaccurate in some cases.
It should be possible for CLDR+Java to develop reasonably-reliable heuristics
for guessing what string to use in some cases. For example, they could have a
heuristic that "IST" means "India Standard Time" in Asia/Kolkata, "Israel
Standard Time" in Asia/Gaza, Asia/Hebron and Asia/Jerusalem, "Irish Summer Time"
in Ireland before 1968-10-27, and "Irish Standard Time" in Ireland starting
1968-10-27. Similar heuristics could be used for other abbreviations, and if
CLDR+Java tune the heuristics enough they'd be accurate. However, they'd have to
do most of this work on their own. For example tzdb does not have an alphabetic
abbreviation for the current time in Morocco (+00, a 1-hour negative DST where
standard time is +01), so CLDR would have to invent an abbreviation there
(presumably something like "Central Africa Ramadan Time" in English) and base
its use on a heuristic like "when Africa/Casablanca is at +00 in the year 2019
or later, its time zone abbreviation is 'Central Africa Ramadan Time'".
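The heuristics described above could be sketched roughly as follows. This is an illustration under stated assumptions, not anything CLDR or Java actually does: the class name AbbreviationHeuristic and the method guessLongName are hypothetical, "Ireland" is taken to mean the tzdb zone Europe/Dublin, the 1968-10-27 cutover is placed at 00:00 UTC because only the date is given above, and the Morocco rule is keyed on the tzdb abbreviation "+00".

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Optional;
import java.util.Set;

// Hypothetical lookup, for illustration only.
public class AbbreviationHeuristic {
    // Assumption: only the date 1968-10-27 is known; 00:00 UTC is a guess.
    private static final Instant IRISH_CUTOVER =
        Instant.parse("1968-10-27T00:00:00Z");

    private static final Set<String> ISRAEL_ZONES =
        Set.of("Asia/Gaza", "Asia/Hebron", "Asia/Jerusalem");

    // Guess an English long name from a zone ID, a tzdb abbreviation and
    // an instant. Returns empty when no rule applies.
    static Optional<String> guessLongName(String zoneId, String abbreviation,
                                          Instant when) {
        if (abbreviation.equals("IST")) {
            if (zoneId.equals("Asia/Kolkata")) {
                return Optional.of("India Standard Time");
            }
            if (ISRAEL_ZONES.contains(zoneId)) {
                return Optional.of("Israel Standard Time");
            }
            if (zoneId.equals("Europe/Dublin")) {
                return Optional.of(when.isBefore(IRISH_CUTOVER)
                    ? "Irish Summer Time"
                    : "Irish Standard Time");
            }
        }
        // Morocco at +00 in 2019 or later; the long name is an invented one.
        if (abbreviation.equals("+00")
                && zoneId.equals("Africa/Casablanca")
                && when.atOffset(ZoneOffset.UTC).getYear() >= 2019) {
            return Optional.of("Central Africa Ramadan Time");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Instant summer1960 = Instant.parse("1960-07-01T12:00:00Z");
        Instant summer1970 = Instant.parse("1970-07-01T12:00:00Z");
        System.out.println(guessLongName("Europe/Dublin", "IST", summer1960));
        System.out.println(guessLongName("Europe/Dublin", "IST", summer1970));
        System.out.println(guessLongName("Asia/Kolkata", "IST", summer1970));
    }
}
```

Returning an empty Optional when no rule matches leaves the caller free to fall back to whatever it displays today, so a table like this could be tuned incrementally.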