[tz] Proposal: validation text file with releases

Tue Jul 14 20:12:27 UTC 2015

I've expanded this a bit - we now have implementations for:

   - Joda Time
   - Noda Time
   - Java 7 (well, Java pre-8)
   - Java 8
   - ICU4J
   - zdump
   - Ruby's tzinfo gem

I'd really appreciate any input at this point. There are still a few issues
with the data collection - it's not the pristine file diff we'd like to end
up with - but it's enough to highlight some discrepancies, which I'll
probably write up as a blog post and cc here. I think the fact that it *is*
showing up these differences is evidence that this could provide a lot of value
with the support of the rest of the community (and with a better
implementation of my zdump munging - ideally something in zic itself, I
suspect). Who do I need to persuade? (Paul, I guess...)

Jon

On 13 July 2015 at 21:43, Jon Skeet <skeet at pobox.com> wrote:

> Okay, I've created
> https://github.com/nodatime/tzvalidate
>
> It allows you (well, someone who's got everything set up...) to compare
> and contrast:
>
>    - Joda Time
>    - Noda Time
>    - Java 8
>    - zdump
>
> Only Joda Time and Noda Time allow (and in fact require) a data version to
> be specified. Obviously in order to compare data meaningfully, one has to
> be using the same data in all places. That's the next thing to look at...
> but they're all using the same output format, and the results are already
> interesting in terms of some unexpected discrepancies. I haven't had a
> chance to look into them yet.
>
> Jon
>
>
> On 13 July 2015 at 16:06, Jon Skeet <skeet at pobox.com> wrote:
>
>> Given that I've already found discrepancies (see "Discrepancies in time
>> zone data interpretation") I'm going to go ahead and hack on this in purely
>> pragmatic (read: short term) ways. I'll create a github repo just for this
>> purpose and dump code in there - this is explicitly with the aim of
>> encouraging a more permanent solution by proving value.
>>
>> Will post another message here when there's something worth looking at -
>> I'll be initially looking at zdump output, Joda Time, standard Java, and
>> Noda Time. Contributions from others for other languages/platforms will be
>> very welcome.
>>
>> Jon
>>
>>
>> On 13 July 2015 at 14:46, Stephen Colebourne <scolebourne at joda.org> wrote:
>>
>>> FWIW, I think such a format would be very useful. Effectively, it is a
>>> unit test for others to confirm that they interpret the rules the same
>>> way as intended.
>>>
>>> It is similar to what I produced when trying to demonstrate the amount
>>> of change being caused by apparently "minor" changes to the data:
>>> https://github.com/jodastephen/tzdiff/commits/master
>>>
>>> Any output of this type should indeed just consist of a simple text
>>> file with ISO-8601 format timestamps.
>>>
>>> Stephen
>>>
>>>
>>>
>>> On 11 July 2015 at 11:35, Jon Skeet <skeet at pobox.com> wrote:
>>> > Background: I'm the primary developer for Noda Time which consumes the tz
>>> > data. I'm currently refactoring the code to do this... and I've come across
>>> > some code (originally ported from Joda Time) which I now understand in terms
>>> > of what it's doing, but not exactly why.
>>> >
>>> > For a little while now, the Noda Time source repo has included a text dump
>>> > file, containing a text dump of every transition (up to 2100, at the moment)
>>> > for every time zone. It looks like this, picking just one example:
>>> >
>>> > Zone: Africa/Maseru
>>> > LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
>>> > SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
>>> > SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
>>> > SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)
>>> > SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)
>>> > SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)
>>> > SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)
>>> >
>>> > I use this file for confidence when refactoring my time zone handling code -
>>> > if the new code comes up with the same set of transitions as the old code,
>>> > it's probably okay. (This is just one line of defence, of course - there are
>>> > unit tests, though not as many as I'd like.)
>>> >
>>> > It strikes me that having a similar file (I'm not wedded to the format, but
>>> > it should have all the same information, one way or another) released
>>> > alongside the main data files would be really handy for all implementors -
>>> > it would be a good way of validating consistency across multiple platforms,
>>> > with the release data being canonical. For any platforms which didn't want
>>> > to actually consume the rules as rules, but just wanted a list of
>>> > transitions, it could even effectively replace their use of the data.
>>> >
>>> > One other benefit: diffing the dump between two releases would make it clear
>>> > what had changed in effect, rather than just in terms of rules.
>>> >
>>> > One sticking point is size. The current file for Noda Time is about 4MB,
>>> > although it zips down to about 300K. Some thoughts around this:
>>> >
>>> >    - We wouldn't need to distribute it in the same file as the data - just
>>> >      as we have data and code files, there could be a "textdump" file or
>>> >      whatever we'd want to call it. These could be retroactively generated
>>> >      for previous releases, too.
>>> >    - As you can see, there's redundancy in the format above, in that it's a
>>> >      list of "zone intervals" (as I call them in Noda Time) rather than a
>>> >      list of transitions - the end of each interval is always the start of
>>> >      the next interval.
>>> >    - For zones which settle into an infinite daylight saving pattern, I
>>> >      currently generate from the start of time to 2100 (and then a single
>>> >      zone interval for the end of time as Noda Time understands it; we'd
>>> >      need to work out what form that would take, if any). If we decided that
>>> >      "year of release + 30 years" was enough, that would cut down the size
>>> >      considerably.
>>> >
>>> > Any thoughts? If the feeling is broadly positive, the next step would be to
>>> > nail down the text format, then find a willing victim/volunteer to write the
>>> > C code. (You really don't want me writing C...)
>>> >
>>> > Jon
>>> >
>>>
>>
>>
>
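
A minimal sketch, in Python, of how the text dump quoted in the original
message could be consumed. It assumes the file uses exactly the line layout of
the Africa/Maseru example (a "Zone:" header followed by one line per zone
interval). The names parse_dump, to_transitions and changed_zones are
hypothetical and are not part of Noda Time, tzvalidate or zic. It illustrates
two points from the thread: the interval list can be reduced to a list of
transitions because each interval ends where the next begins, and two dumps can
be compared zone by zone to see what changed in effect.

```python
import re

# One zone interval line, e.g.
# "SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)"
# Assumed layout: name, half-open [start, end) range, offset, (savings).
INTERVAL_RE = re.compile(
    r"^(?P<name>[^:]+): \[(?P<start>[^,]+), (?P<end>[^)]+)\) "
    r"(?P<offset>\S+) \((?P<savings>[^)]+)\)$"
)

# The example zone from the original message, used here as test data.
SAMPLE = """\
Zone: Africa/Maseru
LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)
SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)
SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)
SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)
"""


def parse_dump(text):
    """Return {zone id: [interval dict, ...]} for a dump in the assumed layout."""
    zones = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Zone: "):
            current = zones.setdefault(line[len("Zone: "):], [])
            continue
        match = INTERVAL_RE.match(line)
        # Lines that are not interval lines (blank lines, etc.) are skipped.
        if match and current is not None:
            current.append(match.groupdict())
    return zones


def to_transitions(intervals):
    """Drop the redundant end instants.

    Every interval after the first begins at a transition, so the start
    instant plus the new name/offset/savings carries the same information.
    """
    return [
        (iv["start"], iv["name"], iv["offset"], iv["savings"])
        for iv in intervals[1:]
    ]


def changed_zones(old, new):
    """Zone ids whose intervals differ, or which exist in only one dump."""
    return sorted(z for z in old.keys() | new.keys() if old.get(z) != new.get(z))
```

For example, parse_dump(SAMPLE) gives a single zone with seven intervals, and
to_transitions() reduces those to six transitions. Running changed_zones() over
the parsed dumps of two releases would list only the zones whose effective
behaviour changed.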