[tz] Java & Rearguard

Sun Jun 9 15:35:10 UTC 2019

> CLDR+Java cannot handle Irish time correctly for past timestamps

We have not seen any demand for names before 1970, so we haven't designed
for more than two (regular) offsets per year for a given zone. It would not
be hard, however, to add additional offsets, either for historic times or
if for some reason that becomes fashionable in the future.

(Luckily, the tendency seems to be in the other direction, collapsing from
2 offsets into 1.)

Mark

On Sat, Jun 8, 2019 at 9:20 PM Paul Eggert <eggert at cs.ucla.edu> wrote:

> Steve Summit wrote:
> > I'm not sure that's an entirely fair challenge, though.
> > Given that (as I understand it) Java and ICU/CLDR use tt_isdst
> > to decide whether to display their equivalents of "GMT" or "IST",
> > I don't think they *can* get the right answer near 1970
>
> Yes, Ireland in 1970 is an "unfair" challenge. That was its point. It was
> intended to illustrate the inadequacy of the current CLDR/Java model to
> represent real-world aspects of civil timekeeping.
>
> > tzdb changed its mind about the mapping at that point.
>
> I'm not sure what you mean by "mapping", but the 2018a change to Irish data
> was in response to a bug report about Irish time, a bug report that was
> investigated and found to be valid. Since tzdb can represent the Irish data
> as per Irish law and common use, the change was warranted from the tzdb
> point of view. And since Java's TZUpdater program currently rejects the
> changed data, I developed a 'rearguard' option to tzdb that lossily converts
> the main-format tzdata into a rearguard format that pacifies TZUpdater.
>
> However, even with the rearguard option (and even if we go back to circa
> 2017 code and data before this latest kerfuffle started), CLDR+Java cannot
> handle Irish time correctly for past timestamps due to what appear to be
> shortcomings in its model. This problem is not limited to Irish time; it
> also occurs for time in Los Angeles during World War II (see example below)
> and in several other areas, including Morocco right this minute and quite
> possibly in North America and Europe in the near future.
>
>    $ jshell
>    |  Welcome to JShell -- Version 12.0.1
>    |  For an introduction type: /help intro
>
>    jshell> var jan1943 = java.time.Instant.ofEpochSecond(-852051600)
>    jan1943 ==> 1943-01-01T07:00:00Z
>
>    jshell> var zone = java.time.ZoneId.of("America/Los_Angeles")
>    zone ==> America/Los_Angeles
>
>    jshell> var dtf = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z z (zzzz)")
>    dtf ==> Value(YearOfEra,4,19,EXCEEDS_PAD)'-'Value(MonthOf ... RT)' ''('ZoneText(FULL)')'
>
>    jshell> jan1943.atZone(zone).format(dtf)
>    $4 ==> "1943-01-01 00:00:00 -0700 PDT (Pacific Daylight Time)"
>
>    jshell>
>    $ TZ=America/Los_Angeles date -d @-852051600 +"%s %Y-%m-%d %H:%M:%S %z %Z"
>    -852051600 1943-01-01 00:00:00 -0700 PWT
>
> Near the end of the example above, Java says "PDT" where tzdb says "PWT",
> because Java can't handle PWT.
>
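As a rough illustration of the point above, the following sketch asks java.time's zone rules what they know about the same instant. It assumes that the rules expose only a total offset, a standard offset and a daylight-saving flag (the class name WarTimeProbe is made up for this example), which would leave a formatter nothing with which to tell "war time" from ordinary daylight time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Hypothetical helper, not part of any library: inspects what
// java.time's zone rules report for the 1943 timestamp used above.
final class WarTimeProbe {
    // Same instant as in the jshell session: 1943-01-01T07:00:00Z.
    static final Instant JAN_1943 = Instant.ofEpochSecond(-852051600L);

    static ZoneRules rules() {
        return ZoneId.of("America/Los_Angeles").getRules();
    }

    public static void main(String[] args) {
        ZoneRules r = rules();
        ZoneOffset total = r.getOffset(JAN_1943);            // offset in effect
        ZoneOffset standard = r.getStandardOffset(JAN_1943); // offset without DST
        boolean dst = r.isDaylightSavings(JAN_1943);         // total != standard

        System.out.println("total offset:    " + total);
        System.out.println("standard offset: " + standard);
        System.out.println("daylight flag:   " + dst);
        // Nothing here carries an abbreviation such as "PWT"; any display
        // name has to be chosen by the formatter from the flag and the zone ID.
    }
}
```

The flag is simply "total offset differs from standard offset", so a name keyed off it can only pick between one standard and one daylight string per zone.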
> > Now, it's true, isdst might not be the best key to use for this
> > sort of thing any more.  Do we have recommendations for what
> > projects like Java and ICU/CLDR ought to be keying off of,
> > if not isdst? (I suppose tt_abbrind, or more likely the actual
> > string it indexes, might be better.)
>
> I'm afraid they will need to solve this problem largely on their own, as one
> cannot look at tzdata and automatically derive strings like "Pacific War
> Time" or "Central Africa Ramadan Time": those strings are not in the data
> (not even in English), and there are no numeric equivalents either. The only
> partially-relevant info in tzdata consists of abbreviations like "IST" and
> "PDT" and unfortunately these abbreviations are well-documented to be
> ambiguous and historically inaccurate in some cases.
>
> It should be possible for CLDR+Java to develop reasonably-reliable
> heuristics for guessing what string to use in some cases. For example, they
> could have a heuristic that "IST" means "India Standard Time" in
> Asia/Kolkata, "Israel Standard Time" in Asia/Gaza, Asia/Hebron and
> Asia/Jerusalem, "Irish Summer Time" in Ireland before 1968-10-27, and "Irish
> Standard Time" in Ireland starting 1968-10-27. Similar heuristics could be
> used for other abbreviations, and if CLDR+Java tune the heuristics enough
> they'd be accurate. However, they'd have to do most of this work on their
> own. For example, tzdb does not have an alphabetic abbreviation for the
> current time in Morocco (+00, a 1-hour negative DST where standard time is
> +01), so CLDR would have to invent an abbreviation there (presumably
> something like "Central Africa Ramadan Time" in English) and base its use on
> a heuristic like "when Africa/Casablanca is at +00 in the year 2019 or
> later, its time zone abbreviation is 'Central Africa Ramadan Time'".
>
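The "IST" heuristic described in the quoted message could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names are invented here, "Ireland" is taken to mean the Europe/Dublin zone, and the 1968-10-27 boundary is applied to the local calendar date without regard to the time of day of the changeover. It is not how CLDR or the JDK actually resolve names.

```java
import java.time.LocalDate;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of a (zone, abbreviation, date) -> long name heuristic.
final class IstHeuristic {
    // Date given in the message as the start of "Irish Standard Time".
    static final LocalDate IRISH_CUTOVER = LocalDate.of(1968, 10, 27);

    // Zones in which the message reads "IST" as "Israel Standard Time".
    static final Set<String> ISRAEL_ZONES =
            Set.of("Asia/Gaza", "Asia/Hebron", "Asia/Jerusalem");

    // Returns a guessed English long name, or empty if the heuristic
    // has no opinion about this combination.
    static Optional<String> longName(String zoneId, String abbreviation, LocalDate date) {
        if (!"IST".equals(abbreviation)) {
            return Optional.empty();
        }
        if (zoneId.equals("Asia/Kolkata")) {
            return Optional.of("India Standard Time");
        }
        if (ISRAEL_ZONES.contains(zoneId)) {
            return Optional.of("Israel Standard Time");
        }
        if (zoneId.equals("Europe/Dublin")) { // assumption: "Ireland" == Europe/Dublin
            return Optional.of(date.isBefore(IRISH_CUTOVER)
                    ? "Irish Summer Time"
                    : "Irish Standard Time");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(longName("Europe/Dublin", "IST", LocalDate.of(1960, 7, 1)));
        System.out.println(longName("Europe/Dublin", "IST", LocalDate.of(1970, 1, 1)));
        System.out.println(longName("Asia/Kolkata", "IST", LocalDate.of(2019, 6, 9)));
    }
}
```

Returning an empty Optional for unknown combinations leaves room for a caller to fall back to whatever name it would have produced otherwise, which matches the message's framing of this as guesswork that needs tuning rather than a lookup that tzdata can supply.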