[tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change
Stephen Colebourne
scolebourne at joda.org
Wed Jan 24 18:36:42 UTC 2018
> 99.9999% of people (not being zic) should really be ignoring those
> files, and everything they contain (the remaining percentage are the
> people who maintain the data - all 10 or 20 or so of them in the world).
> Everything else should be based upon the zoneinfo output files from
> zic
This hasn't been true for many, many years. The source files are parsed
by every downstream program I know. It's been discussed before why
this is. I'd strongly suggest accepting that the source files are
a primary interface, which is why negative SAVE values matter to
downstream users.
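If the source files are a primary interface, then the sign of the SAVE column in a Rule line is directly visible to every tool that parses them. The following is a minimal sketch of reading that column. It assumes only the plain `[-]h[:mm[:ss]]` forms (zic accepts more), and it is not how any particular downstream project does it. `RuleSave`, `saveSeconds` and `parseHms` are hypothetical names, and the two rule lines are illustrative, loosely modelled on the Eire rules under discussion.

```java
/** Minimal sketch: read the SAVE column of a tzdb "Rule" line as signed seconds. */
public class RuleSave {

    // Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
    // After splitting, with "Rule" at index 0, SAVE is at index 8.
    static int saveSeconds(String ruleLine) {
        String[] f = ruleLine.trim().split("\\s+");
        if (f.length < 10 || !f[0].equals("Rule")) {
            throw new IllegalArgumentException("not a Rule line: " + ruleLine);
        }
        return parseHms(f[8]);
    }

    // Parses [-]h[:mm[:ss]] into signed seconds; other zic notations are not handled.
    static int parseHms(String s) {
        int sign = 1;
        if (s.startsWith("-")) {
            sign = -1;
            s = s.substring(1);
        }
        String[] parts = s.split(":");
        int[] scale = {3600, 60, 1};
        int secs = 0;
        for (int i = 0; i < parts.length && i < 3; i++) {
            secs += Integer.parseInt(parts[i]) * scale[i];
        }
        return sign * secs;
    }

    public static void main(String[] args) {
        // Hypothetical rule lines: a negative SAVE in winter, zero SAVE in summer.
        String winter = "Rule Eire 1996 max - Oct lastSun 1:00u -1:00 -";
        String summer = "Rule Eire 1981 max - Mar lastSun 1:00u 0 -";
        System.out.println(saveSeconds(winter)); // -3600
        System.out.println(saveSeconds(summer)); // 0
    }
}
```

A consumer that treats `save != 0` as "summer" will, given lines like these, tag winter as the daylight period. That is the kind of downstream dependency on the sign of SAVE being described here.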
As for the rest, well I'm not going to reply to each line. With no
acceptance of the concept of backwards compatibility, discussion is
pretty pointless. If there was a simple bug fix that solves all the
problems, I'd gladly do my part. There isn't such a fix - every avenue
other than insisting on positive SAVE values will make things worse.
Want to make things truly better? Agree to move TZDB under the
auspices of CLDR, so it can be managed by a paid team who actually
understand stability and compatibility, and the trade off of those
against some abstract notion of purity. As a combined dataset, there
would be the ability to solve the text problem in a realistic and
pragmatic way.
TZDB is not the centre of the universe. It is a small cog in a much
bigger machine. It's time to accept that.
Stephen
On 24 January 2018 at 16:19, Robert Elz <kre at munnari.oz.au> wrote:
> From: Stephen Colebourne <scolebourne at joda.org>
> Date: Tue, 23 Jan 2018 18:42:29 +0000
> Subject: Re: [tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change
>
> | Java time-zone data is updated using the tzupdater tool
> | [URL omitted here]
> | This will update the tzdb data, but not the CLDR-driven data that
> | drives the text.
>
> That is most probably a mistake - the two should be linked, it
> is entirely possible that a zone might change its names (regardless
> of issues of when transitions occur, or what, if anything, is
> regarded as the "standard" time).
>
> | Were the change to proceed, anyone running tzupdater
> | with the Ireland change would invert the meaning of inDaylightTime()
> | and access the wrong array element in the CLDR-driven data - a bug.
>
> Yes, it would be, and CLDR or java (whichever has the issue, or both)
> should fix it. And fix it soon.
>
> | And code changes don't help, as we'll see below.
>
> Of course code changes help - there's a bug, fixing the bug will fix that.
>
> And also of course, for people who don't update, the bug will continue to
> appear - as for any other bug, or security vulnerability that is found and
> fixed. Nothing that we can do about that. People who won't, or can't,
> update get screwed by all kinds of things.
>
> | There is no possible fix to Java, as this is primarily an issue
> | between CLDR and TZDB. The two have a subtle API linkage which has
> | perhaps never been clearly spelled out here.
>
> Yes, they do, that ought to be obvious - the linkage is not (or should
> not be) subtle - it should be obvious.
>
> | CLDR provides textual names for time-zones, as an array [winter,
> | summer].
>
> That itself is a bug. It assumes there are just two (not counting
> the "generic" name, mentioned in a later message from Yoshito Umaoka, which
> is probably the more useful one of the three anyway) - and there is
> no guarantee that this will remain true (or even that it always has been).
>
> There is nothing to stop some locality (probably one at a high latitude)
> from deciding that they should advance the clocks in early spring, and
> then advance them further in early-mid summer, returning to the intermediate
> (or some other) value in late summer, and then to the original in late
> autumn (or fall if autumn happens to be called that in the relevant
> location). What's more, they could give 4 different names to the 3 (or 4)
> different offsets, perhaps "winter time" "spring time" "summer time" and
> "autumn time" with 4 different abbreviations.
>
> There could even be a mid winter fallback of even more, just as there
> could be a mid summer skip forward of more.
>
> Calling any of those offsets "standard" and the others as something
> different is really nonsense, though the jurisdiction (and people)
> might pick that label - but when they do, we should all remember that
> it is just another name. One offset is not more blessed than any other
> because it happens to be labelled as the "standard" time. It might
> be different if we defined "standard time" to be the nearest "natural"
> offset based on lines of longitude - but with what resolution? And how
> would you apply that to China or India? So we don't do that. No-one does.
>
> CLDR (and its clients) needs to be able to represent all this. Tzdb can.
> CLDR must also handle places which (given the durations of the two periods
> that are common these days) decide that "standard" time be the one that
> applies for longer each year, and so should be the time in summer, and
> in winter the clocks should be set backwards some number of minutes for a
> few months, so it does not remain dark quite so late in the mornings
> ("darkness saving time" - aka DST).
>
> | As a much larger project with considerable history the order
> | of that array is not going to change.
>
> More than that needs to change, the order is not, or should not be,
> material.
>
> Just accept it - the design is broken, and must be fixed.
>
> | (I'm using winter and summer for CLDR for this email to aid clarity,
> | they refer to them as standard and daylight).
>
> Either way exposes the broken assumption that there are just two.
>
> | TZDB provides the offsets, SAVE values and a short text string. This
> | text string - GMT/IST or IST/GMT - is not directly linkable to the
> | data CLDR provides.
>
> It probably should be, probably when accompanied by the offset and
> the relevant time (perhaps the offset is less needed, or useful),
> those should be the key to the translated strings. But not as
> indexes into an array, that's just plain stupid. As database
> keys (for "database" in the general, not implying anything SQL based
> or similar).
>
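As a rough illustration of keying translated strings by data rather than by array position, here is a small Java sketch. A localized name is found from the abbreviation, the UTC offset, and the instant being formatted, each entry carrying its own validity period. All names (`KeyedZoneNames`, `Entry`, `lookup`), the abbreviation `XST`, and the localized strings are hypothetical, and this is not an existing CLDR or ICU API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Sketch: localized zone names keyed by (abbreviation, offset, period), not by array slot. */
public class KeyedZoneNames {

    static final class Entry {
        final String abbreviation;
        final int utOffsetSeconds;
        final Instant from;   // inclusive
        final Instant until;  // exclusive
        final String localizedName;

        Entry(String abbreviation, int utOffsetSeconds, Instant from, Instant until, String localizedName) {
            this.abbreviation = abbreviation;
            this.utOffsetSeconds = utOffsetSeconds;
            this.from = from;
            this.until = until;
            this.localizedName = localizedName;
        }

        boolean matches(String abbr, int offset, Instant when) {
            return abbreviation.equals(abbr)
                && utOffsetSeconds == offset
                && !when.isBefore(from)
                && when.isBefore(until);
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String abbr, int offset, Instant from, Instant until, String name) {
        entries.add(new Entry(abbr, offset, from, until, name));
    }

    // The caller already has the abbreviation and offset from its localtime()-style lookup.
    Optional<String> lookup(String abbr, int offset, Instant when) {
        for (Entry e : entries) {
            if (e.matches(abbr, offset, when)) {
                return Optional.of(e.localizedName);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Instant change = Instant.parse("2000-01-01T00:00:00Z");
        KeyedZoneNames names = new KeyedZoneNames();
        // Hypothetical zone: same abbreviation and offset, but the local name changed in 2000.
        names.add("XST", 3600, Instant.MIN, change, "Example Standard Time");
        names.add("XST", 3600, change, Instant.MAX, "Example Winter Time");
        System.out.println(names.lookup("XST", 3600, Instant.parse("1995-06-01T00:00:00Z")).orElse("?"));
        System.out.println(names.lookup("XST", 3600, Instant.parse("2018-01-24T00:00:00Z")).orElse("?"));
    }
}
```

Because each entry carries its own period, the lookup also covers a point raised later in this message: the same offset (and even the same abbreviation) can map to different names at different times. A linear scan keeps the sketch short; a real table would presumably be indexed.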
> Alternatively, perhaps localized zoneinfo files should be used
> instead, built from a modified zic, which embeds the localized
> names (for some particular locality) with the raw data (probably
> in a similar way to, or perhaps instead of, how the abbreviations are
> handled now).
>
> That would mean one set of zoneinfo files for each locality an
> installation wants to support, but zoneinfo files are not really
> all that big (and adding a few extra strings to them would not
> make much difference) so this should not be seen as too much of
> a drawback - then CLDR users would simply use those files instead
> of the normal ones (if those even continue to exist on the system)
> for all purposes.
>
> This would obviously handle the problem of the two being updated
> independently fairly easily.
>
> It does mean that if the "normal" files continue to exist, as CLDR
> and older applications both exist on the system, then those
> would need to be updated together. This should not be a problem,
> the update of one is simply not made available until both are ready.
>
> | Although it may seem that you can use the text
> | from TZDB as a key to lookup the correct value in CLDR, I know from
> | painful experience that approach fails (as the TZDB text varies over
> | time,
>
> Yes, and when it does the CLDR strings ("translations" into local formats)
> [ translations in quotes as I know that is not exactly what they are ]
> may need to change as well. There are multiple reasons why the TZDB names
> might change, some are, frankly, silly, but others represent real changes
> in what the local users call their times. In some cases the CLDR strings
> may have already matched local expectations, and nothing needs to alter,
> but in others the local's name might have changed (in their language,
> as well as in English) and the CLDR strings need to be updated (augmented).
>
> This is why the CLDR data should really be updated (if required) and
> (always) transmitted whenever the tzdb (zoneinfo) data changes.
>
> | has the same text in winter and summer, or isn't even text).
>
> I have no idea what the latter means - they are all text (we do not
> define zone abbreviations as random binary), unless you mean the +04 types,
> which are text, just text containing digits and +/- signs, rather than
> only letters.
>
> But you're right that the "sometimes the same" (which is actually a very
> sane choice) means that you cannot use the abbreviation alone to map.
> However, the name and the time to which it is being applied are
> enough (and perhaps, to avoid running that time through localtime()
> or its equivalent again just to get the offset, the offset as a parameter
> as well. We know localtime() must have been run already, or the data
> currently used would not be available.)
>
> | Thus, the only reliable way to pick which piece of CLDR data is needed
> | is from the offsets.
>
> Not even that alone, as the same offset can have different names during
> different periods. That (unlike some of my potential scenarios) has
> actually been observed in the past, and CLDR needs to handle that as well.
>
> It is simply untrue, and incorrect, to assume that if (in locality X)
> times at offset N are called ABC and times at offset M are called DEF
> today, then that was true last year. The old and the new names need
> to be available and applied to the appropriate times. This is true just
> as it is true that CLDR data is needed for more than calendaring
> applications - what matters is not just when the next
> meeting is scheduled (with the day and month, and timezone names
> converted to the local correct forms.)
>
> | For 20 years, this has been done in a simple and straightforward way -
> | if (raw-offset != actual-offset) then CLDR uses summer text and array
> | element 1.
>
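The raw-offset rule quoted above can be sketched in a few lines of Java. This is a minimal illustration, not JDK, ICU or CLDR code. `RawOffsetRule`, `nameIndex` and `dublinName` are hypothetical names, offsets are in seconds, and the two Dublin strings merely stand in for a CLDR-style [standard, daylight] array. For comparison, `java.time.zone.ZoneRules.isDaylightSavings(Instant)` is documented as checking whether the standard and actual offsets differ, which is the same test.

```java
/** Sketch of the "raw offset != actual offset means use the daylight name" rule. */
public class RawOffsetRule {

    // Illustrative [standard, daylight] names for Europe/Dublin; hypothetical data,
    // not read from CLDR.
    static final String[] DUBLIN_NAMES = {"Greenwich Mean Time", "Irish Standard Time"};

    // Index into the two-element name array: 1 ("daylight") whenever the offsets differ.
    static int nameIndex(int rawOffsetSeconds, int actualOffsetSeconds) {
        return rawOffsetSeconds != actualOffsetSeconds ? 1 : 0;
    }

    static String dublinName(int rawOffsetSeconds, int actualOffsetSeconds) {
        return DUBLIN_NAMES[nameIndex(rawOffsetSeconds, actualOffsetSeconds)];
    }

    public static void main(String[] args) {
        // Positive SAVE: raw offset is GMT (0); summer adds one hour.
        System.out.println("raw 0, winter:  " + dublinName(0, 0));       // Greenwich Mean Time
        System.out.println("raw 0, summer:  " + dublinName(0, 3600));    // Irish Standard Time
        // Negative SAVE: raw offset is IST (+1h); winter subtracts one hour.
        System.out.println("raw +1, winter: " + dublinName(3600, 0));    // Irish Standard Time (inverted)
        System.out.println("raw +1, summer: " + dublinName(3600, 3600)); // Greenwich Mean Time (inverted)
    }
}
```

With a positive SAVE the rule picks the expected names. Once the raw offset is the summer one, the same comparison selects the other array slot in both seasons. This is the inversion both sides of this thread describe, whether one calls it the Ireland change's fault or the rule's.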
> So, for 20 years there has been a latent bug. If for 20 years there
> has been a latent bug that allows a security breach, are you going to
> simply say "it has been there too long, we can't fix it now" ?
>
> Really?
>
> It makes no difference how old it is, a bug is a bug, and needs to be fixed.
>
> | This provides the necessary glue to link the two projects:
>
> It is the wrong glue.
>
> | TZDB has always had the raw and actual offsets
>
> What on earth is the "raw offset"?
>
> I somehow suspect that you (and perhaps CLDR in general) are reading
> too much into the tzdb source files.
>
> 99.9999% of people (not being zic) should really be ignoring those
> files, and everything they contain (the remaining percentage are the
> people who maintain the data - all 10 or 20 or so of them in the world).
>
> Everything else should be based upon the zoneinfo output files from
> zic - and that has no notion of a "raw" offset at all, all that exists,
> and all that you can ever assume, is that for some period of time
> (or indefinite length, starting at arbitrary and often unpredictable
> instants) a particular timezone will be at some offset from UTC.
> It might also be associated with some name (in reality, many are not,
> as Paul keeps pointing out, many of the abbreviated names that tzdb
> contains were purely invented for tzdb, because the (US centric) UNIX
> API/ABI required them - some of those are the ones being turned into
> numeric offsets represented as text strings - it makes no difference
> in the zone concerned, as there the time is just "the time" it has
> no other name (we really should have no abbreviation at all, and CLDR
> should have no translation of it).
>
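As a rough model of the point above, zoneinfo (TZif) data boils down to a sorted list of transition times, each selecting a local time type: an offset from UTC plus an abbreviation. Real TZif types also carry an isdst flag, which is omitted here. This Java sketch is hypothetical: `ZoneinfoModel`, `LocalTimeType` and `typeAt` are invented names, and the 2018 transitions are illustrative, not read from tzdata. The thing to notice is that nothing in it records a "raw" offset.

```java
import java.time.Instant;
import java.util.Arrays;

/** Sketch of what zoneinfo data amounts to: transition instants, each selecting an offset and abbreviation. */
public class ZoneinfoModel {

    static final class LocalTimeType {
        final int utOffsetSeconds;
        final String abbreviation;

        LocalTimeType(int utOffsetSeconds, String abbreviation) {
            this.utOffsetSeconds = utOffsetSeconds;
            this.abbreviation = abbreviation;
        }
    }

    private final long[] transitions;    // UTC epoch seconds, ascending
    private final LocalTimeType[] after; // type in effect from transitions[i] onward
    private final LocalTimeType initial; // type before the first transition

    ZoneinfoModel(long[] transitions, LocalTimeType[] after, LocalTimeType initial) {
        this.transitions = transitions;
        this.after = after;
        this.initial = initial;
    }

    // No "raw" or "standard" offset anywhere: just the type in effect at an instant.
    LocalTimeType typeAt(Instant when) {
        int i = Arrays.binarySearch(transitions, when.getEpochSecond());
        if (i < 0) {
            i = -i - 2; // last transition strictly before 'when', or -1 if there is none
        }
        return i < 0 ? initial : after[i];
    }

    // Hypothetical data for one year of an Ireland-like zone.
    static ZoneinfoModel example() {
        LocalTimeType gmt = new LocalTimeType(0, "GMT");
        LocalTimeType ist = new LocalTimeType(3600, "IST");
        long[] t = {
            Instant.parse("2018-03-25T01:00:00Z").getEpochSecond(),
            Instant.parse("2018-10-28T01:00:00Z").getEpochSecond()
        };
        return new ZoneinfoModel(t, new LocalTimeType[] {ist, gmt}, gmt);
    }

    public static void main(String[] args) {
        ZoneinfoModel z = example();
        for (String s : new String[] {"2018-01-24T18:36:42Z", "2018-07-01T12:00:00Z", "2018-12-01T12:00:00Z"}) {
            LocalTimeType type = z.typeAt(Instant.parse(s));
            System.out.println(s + " -> " + type.abbreviation + " " + type.utOffsetSeconds);
        }
    }
}
```

Whether the source file expressed the Irish rules with a positive or a negative SAVE, a model like this holds the same two types and the same transitions. Only consumers that reconstruct a "raw" offset from the source files see a difference.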
> | the same in winter and different in summer,
>
> Once upon a time, the world was always flat, everyone knew that,
> the pope even proclaimed it...
>
> | so this has always worked.
>
> The latent bug was not exposed. That is not "worked", it is
> rather "managed to survive".
>
> | The Ireland proposal breaks this, with (raw-offset != actual-offset)
> | meaning winter, instead of summer. It is fair for TZDB to complain
> | that CLDR is inflexible with its definitions, but the reality is that
> | this was and is the only way to connect two separately developed
> | projects (where API stability is vital).
>
> Nonsense. It was just someone's idea of something they thought
> would work, and which seemed to - but it was based upon unfounded
> (and incorrect) assumptions about the nature of civil time, and how
> it can be expected to work.
>
> | In order for TZDB and CLDR to co-exist, it is *required* that the raw
> | offset equals the actual offset in winter,
>
> No, it is *required* for CLDR to be fixed. What is happening now is
> obviously incorrect.
>
> | This isn't a change that can be delayed for a year.
>
> Oh good, so we can make it now?
>
> | This interpretation of inSummerTime() relies on positive SAVE values,
>
> So, fix it. It is broken.
>
> | is part of the public API of TZDB just as much as the source code file
> | format is.
>
> If that's all, then we have no problem, as the source file format
> should not be regarded as part of anything except the method by which
> we happen to represent the data before zic converts it to zoneinfo.
>
> The source format has changed, and will change again - that is guaranteed.
>
> The zoneinfo format (in binary form, or converted to text) is designed
> to be immune to all of the shenanigans that go on, and really is
> what everyone should be using. If anyone believes that they need
> the source files for anything other than feeding to zic (or some
> equivalent program for systems that cannot run it, if there are any)
> then that almost guarantees that they are making some unsustainable
> assumptions, which will, one day, be proven false.
>
> We (of course) attempt to remain backward compatible, but as legislatures
> (and the people under their governance) do weirder and weirder things, we
> are likely to find that the current language is incapable of expressing
> what needs to be expressed, and it will be extended.
>
> I know there are others that read the source files, but they should be
> treated in a similar way to the way that compilers treat programming language
> specifications - when the language is extended (as has happened to every
> language that is not dead) the compilers all need to be updated to deal with it.
> Similarly, when tax legislation is amended (about the only thing that
> changes even more frequently, and for less rational reasons, than
> timezones) the accountants, and the software they use, need to be
> updated to deal with that.
>
> Updates/changes are simply a fact of life, there is nothing that is
> guaranteed (not really even death or taxes) that we can promise will
> never change. Hopefully zoneinfo files will not need much - though the
> format already changed when 64 bit time support was added, and might need
> more, if people dealing with 2038 issues find some innovative way to allow
> 32 bit timestamps to keep working, in some fashion, beyond 2038 in order
> to retain compat with old databases that cannot be updated easily.
>
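The 2038 limit mentioned here comes from counting seconds since 1970-01-01T00:00:00Z in a signed 32-bit integer. That is also the width of transition times in version 1 zoneinfo (TZif) data; version 2 added 64-bit data. A small Java sketch of the arithmetic (the class and method names are made up for illustration) shows the last representable instant and the instant that one more second wraps around to.

```java
import java.time.Instant;

/** Sketch: where a signed 32-bit count of seconds since 1970 runs out. */
public class Y2038 {

    // Largest signed 32-bit value, read as seconds since the epoch.
    static Instant last32BitInstant() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    // One second later no longer fits and wraps to the most negative 32-bit value.
    static Instant wrappedInstant() {
        int wrapped = (int) ((long) Integer.MAX_VALUE + 1L);
        return Instant.ofEpochSecond(wrapped);
    }

    public static void main(String[] args) {
        System.out.println(last32BitInstant()); // 2038-01-19T03:14:07Z
        System.out.println(wrappedInstant());   // 1901-12-13T20:45:52Z
    }
}
```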
> Everyone needs to remain aware of this. Sticking our heads in the
> sand and proclaiming "it always worked in the past, it must be made
> to continue working in the future" is, frankly, absurd.
>
> kre
>
> ps: I am sure apologies will be needed, I have tried to find and
> correct all my typos, but right now, my e-mail environment is
> horribly challenged, and I have no way to rationally do spell or
> grammar checks I normally would (well sometimes) attempt. So,
> consider that for any unfound mistakes, apologies are tendered.