[tz] Proposal: validation text file with releases

Jon Skeet skeet at pobox.com
Mon Jul 13 15:06:16 UTC 2015


Given that I've already found discrepancies (see "Discrepancies in time
zone data interpretation") I'm going to go ahead and hack on this in purely
pragmatic (read: short-term) ways. I'll create a GitHub repo just for this
purpose and dump code in there - this is explicitly with the aim of
encouraging a more permanent solution by proving value.

Will post another message here when there's something worth looking at -
I'll be initially looking at zdump output, Joda Time, standard Java, and
Noda Time. Contributions from others for other languages/platforms will be
very welcome.

Jon


On 13 July 2015 at 14:46, Stephen Colebourne <scolebourne at joda.org> wrote:

> FWIW, I think such a format would be very useful. Effectively, it is a
> unit test for others to confirm that they interpret the rules the same
> way as intended.
>
> It is similar to what I produced when trying to demonstrate the amount
> of change being caused by apparently "minor" changes to the data:
> https://github.com/jodastephen/tzdiff/commits/master
>
> Any output of this type should indeed just consist of a simple text
> file with ISO-8601 format timestamps.
>
> Stephen
>
>
>
> On 11 July 2015 at 11:35, Jon Skeet <skeet at pobox.com> wrote:
> > Background: I'm the primary developer for Noda Time which consumes the tz
> > data. I'm currently refactoring the code to do this... and I've come across
> > some code (originally ported from Joda Time) which I now understand in terms
> > of what it's doing, but not exactly why.
> >
> > For a little while now, the Noda Time source repo has included a text dump
> > file, containing a text dump of every transition (up to 2100, at the moment)
> > for every time zone. It looks like this, picking just one example:
> >
> > Zone: Africa/Maseru
> > LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
> > SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
> > SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
> > SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)
> > SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)
> > SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)
> > SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)
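
As a rough illustration of how a consumer might read a dump in the layout
shown above, here is a minimal Python sketch. The line grammar is inferred
only from the Africa/Maseru sample; the names `LINE_RE`, `ZoneInterval`,
`parse_zone` and `transitions` are hypothetical, and treating the two
trailing fields as "offset" and "savings" is an assumption rather than
anything the format defines.

```python
import re
from typing import NamedTuple

# Assumed line layout, inferred from the sample only:
#   NAME: [START, END) OFFSET (SAVINGS)
LINE_RE = re.compile(
    r"^(?P<name>\S+): \[(?P<start>[^,]+), (?P<end>[^)]+)\) "
    r"(?P<offset>\S+) \((?P<savings>[^)]+)\)$"
)


class ZoneInterval(NamedTuple):
    name: str
    start: str    # ISO-8601 instant, or the literal "StartOfTime"
    end: str      # ISO-8601 instant, or the literal "EndOfTime"
    offset: str   # assumed meaning: offset from UTC during the interval
    savings: str  # assumed meaning: daylight saving component


def parse_zone(text):
    """Return (zone_id, [ZoneInterval, ...]) for a single 'Zone:' block."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("Zone: "):
        raise ValueError("expected a 'Zone: <id>' header")
    zone_id = lines[0][len("Zone: "):]
    intervals = []
    for ln in lines[1:]:
        m = LINE_RE.match(ln)
        if m is None:
            raise ValueError(f"unrecognised line: {ln!r}")
        intervals.append(ZoneInterval(**m.groupdict()))
    return zone_id, intervals


def transitions(intervals):
    """Check that intervals are contiguous, then keep only each start point."""
    for prev, cur in zip(intervals, intervals[1:]):
        if prev.end != cur.start:
            raise ValueError(f"gap or overlap between {prev} and {cur}")
    return [(iv.start, iv.name, iv.offset, iv.savings) for iv in intervals]


# First three intervals of the Africa/Maseru sample quoted above.
SAMPLE = """
Zone: Africa/Maseru
LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
"""

zone_id, intervals = parse_zone(SAMPLE)
for t in transitions(intervals):
    print(zone_id, *t)
```

The contiguity check in `transitions` relies on the end of each interval
being the start of the next, so a transition-only form would carry the same
information in fewer characters.
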
> >
> > I use this file for confidence when refactoring my time zone handling code -
> > if the new code comes up with the same set of transitions as the old code,
> > it's probably okay. (This is just one line of defence, of course - there are
> > unit tests, though not as many as I'd like.)
> >
> > It strikes me that having a similar file (I'm not wedded to the format, but
> > it should have all the same information, one way or another) released
> > alongside the main data files would be really handy for all implementors -
> > it would be a good way of validating consistency across multiple platforms,
> > with the release data being canonical. For any platforms which didn't want
> > to actually consume the rules as rules, but just wanted a list of
> > transitions, it could even effectively replace their use of the data.
> >
> > One other benefit: diffing the dump between two releases would make it clear
> > what had changed in effect, rather than just in terms of rules.
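
To make the diffing idea concrete, here is a small Python sketch that
compares two dumps zone by zone using the standard library's
`difflib.unified_diff`. It assumes the same "Zone:" block layout as the
sample; the helper names `split_zones` and `diff_dumps` are hypothetical, and
the two inputs are invented test data, not real tz releases.

```python
import difflib


def split_zones(dump_text):
    """Map zone id -> list of interval lines, assuming 'Zone: <id>' headers."""
    zones, current = {}, None
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith("Zone: "):
            current = zones.setdefault(line[len("Zone: "):], [])
        elif line and current is not None:
            current.append(line)
    return zones


def diff_dumps(old_text, new_text):
    """Return {zone_id: unified diff lines} for zones whose intervals differ."""
    old, new = split_zones(old_text), split_zones(new_text)
    changed = {}
    for zone in sorted(old.keys() | new.keys()):
        a, b = old.get(zone, []), new.get(zone, [])
        if a != b:
            changed[zone] = list(difflib.unified_diff(
                a, b, fromfile=f"old/{zone}", tofile=f"new/{zone}", lineterm=""))
    return changed


# Invented data for illustration only: zone names and dates are made up.
OLD = """
Zone: Test/Changed
STD: [StartOfTime, 2000-01-01T00:00:00Z) +01 (+00)
STD: [2000-01-01T00:00:00Z, EndOfTime) +02 (+00)
Zone: Test/Unchanged
STD: [StartOfTime, EndOfTime) +00 (+00)
"""
NEW = OLD.replace("2000-01-01", "2001-01-01")

changed = diff_dumps(OLD, NEW)
for zone, lines in changed.items():
    print("\n".join(lines))
```

Only zones whose interval lists differ are reported, so an edit to the rules
that leaves the effective transitions alone produces no output.
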
> >
> > One sticking point is size. The current file for Noda Time is about 4MB,
> > although it zips down to about 300K. Some thoughts around this:
> >
> > * We wouldn't need to distribute it in the same file as the data - just as we
> >   have data and code files, there could be a "textdump" file or whatever we'd
> >   want to call it. These could be retroactively generated for previous
> >   releases, too.
> > * As you can see, there's redundancy in the format above, in that it's a list
> >   of "zone intervals" (as I call them in Noda Time) rather than a list of
> >   transitions - the end of each interval is always the start of the next
> >   interval.
> > * For zones which settle into an infinite daylight saving pattern, I currently
> >   generate from the start of time to 2100 (and then a single zone interval for
> >   the end of time as Noda Time understands it; we'd need to work out what form
> >   that would take, if any). If we decided that "year of release + 30 years"
> >   was enough, that would cut down the size considerably.
> >
> > Any thoughts? If the feeling is broadly positive, the next step would be to
> > nail down the text format, then find a willing victim/volunteer to write the
> > C code. (You really don't want me writing C...)
> >
> > Jon
> >
>