[tz] Proposal: validation text file with releases

Sat Jul 11 10:59:33 UTC 2015

That's certainly a good starting point, but I have three issues with it:

   - The format is verbose and somewhat harder to parse (IMO) than an
   ISO-8601-based numeric format. It's easier to get things wrong when they
   involve cultures :)
   - It indicates the final wall offset and whether or not it's in DST, but
   not how much DST there is. This only matters in a few cases like
   Antarctica/Troll which have a non-one-hour saving, but it would still be
   worth indicating, IMO
   - The "second before the transition" line for each transition feels
   redundant to me... we could potentially put *three* values for each
   transition: UTC instant, local time one second before, local time "at"
   transition
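
To make the last point concrete, here is a rough Python sketch of a line carrying all three values plus the amount of saving. The function names and the output layout are assumptions for illustration only, not a proposed or agreed format; the sample values are taken from the Africa/Maseru dump quoted further down.

```python
from datetime import datetime, timedelta


def format_offset(delta):
    """Render a timedelta as +HH:MM (or +HH:MM:SS when seconds are non-zero)."""
    total = int(delta.total_seconds())
    sign = "-" if total < 0 else "+"
    hours, rem = divmod(abs(total), 3600)
    minutes, seconds = divmod(rem, 60)
    text = f"{sign}{hours:02d}:{minutes:02d}"
    return text + (f":{seconds:02d}" if seconds else "")


def describe_transition(utc_instant, offset_before, offset_after, saving_after):
    """Hypothetical one-line form: UTC instant, local time one second
    before the transition, local time at the transition, and the saving."""
    before = utc_instant + offset_before - timedelta(seconds=1)
    at = utc_instant + offset_after
    return (
        f"{utc_instant.isoformat()}Z "
        f"{before.isoformat()}{format_offset(offset_before)} "
        f"{at.isoformat()}{format_offset(offset_after)} "
        f"dst={format_offset(saving_after)}"
    )


# Africa/Maseru, 1942-09-20T00:00:00Z: wall offset goes from +02 to +03
# with one hour of saving.
line = describe_transition(
    datetime(1942, 9, 20, 0, 0, 0),
    timedelta(hours=2),
    timedelta(hours=3),
    timedelta(hours=1),
)
print(line)
```

Keeping the saving as an explicit offset rather than a yes/no flag is what would let a zone such as Antarctica/Troll be distinguished from the usual one-hour case.
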

Given how widely used zdump is, we can't really change the format by
default - but we could perhaps add a new flag to indicate a new format,
assuming I'm not alone in my objections above?
(Having said that, a dump with the data in a format I'm not ecstatic about
would be better for me than no dump at all. I'm certainly not going to
start chest-thumping about formats.)

Jon

On 11 July 2015 at 11:56, Arthur David Olson <arthurdavidolson at gmail.com>
wrote:

> A possibility here is to store the output of a "zdump -v" command; I've
> used "zdump -v" output for regression testing; I believe Paul Eggert has
> done so as well. The "-c" option of zdump could be used to limit the range
> of the output.
>
>     @dashdashado
>
> On Sat, Jul 11, 2015 at 6:35 AM, Jon Skeet <skeet at pobox.com> wrote:
>
>> Background: I'm the primary developer for Noda Time <http://nodatime.org> which
>> consumes the tz data. I'm currently refactoring the code to do this... and
>> I've come across some code (originally ported from Joda Time) which I now
>> understand in terms of what it's doing, but not exactly why.
>>
>> For a little while now, the Noda Time source repo has included a text
>> dump file
>> <https://github.com/nodatime/nodatime/blob/master/src/NodaTime.Test/TestData/tzdb-dump.txt>,
>> containing a text dump of every transition (up to 2100, at the moment) for
>> every time zone. It looks like this, picking just one example:
>>
>> Zone: Africa/Maseru
>> LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
>> SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
>> SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
>> SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)
>> SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)
>> SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)
>> SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)
>>
>> I use this file for confidence when refactoring my time zone handling
>> code - if the new code comes up with the same set of transitions as the old
>> code, it's probably okay. (This is just one line of defence, of course -
>> there are unit tests, though not as many as I'd like.)
>>
>> It strikes me that having a similar file (I'm not wedded to the format,
>> but it should have all the same information, one way or another) released
>> alongside the main data files would be really handy for *all* implementors
>> - it would be a good way of validating consistency across multiple
>> platforms, with the release data being canonical. For any platforms which
>> didn't want to actually consume the rules as rules, but just wanted a list
>> of transitions, it could even effectively replace their use of the data.
>>
>> One other benefit: diffing the dump between two releases would make it
>> clear what had changed in *effect*, rather than just in terms of rules.
>>
>> One sticking point is size. The current file for Noda Time is about 4MB,
>> although it zips down to about 300K. Some thoughts around this:
>>
>>    - We wouldn't need to distribute it in the same file as the data -
>>    just as we have data and code files, there could be a "textdump" file or
>>    whatever we'd want to call it. These could be retroactively generated for
>>    previous releases, too.
>>    - As you can see, there's redundancy in the format above, in that
>>    it's a list of "zone intervals" (as I call them in Noda Time) rather than a
>>    list of transitions - the end of each interval is always the start of the
>>    next interval.
>>    - For zones which settle into an infinite daylight saving pattern, I
>>    currently generate from the start of time to 2100 (and then a single zone
>>    interval for the end of time as Noda Time understands it; we'd need to work
>>    out what form that would take, if any). If we decided that "year of release
>>    + 30 years" was enough, that would cut down the size considerably.
>>
>> Any thoughts? If the feeling is broadly positive, the next step would be
>> to nail down the text format, then find a willing victim/volunteer to write
>> the C code. (You really don't want me writing C...)
>>
>> Jon
>>
>>
>
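
The redundancy mentioned in the quoted message (each interval's end is the next interval's start) can be illustrated with a short Python sketch that collapses the zone-interval lines into a list of transitions. This is only an illustration: the regular expression is inferred from the single Africa/Maseru sample above, and the function name and tuple layout are assumptions rather than anything Noda Time or tz actually provides.

```python
import re

# Pattern inferred from the sample lines, e.g.
# "SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)"
INTERVAL = re.compile(
    r"^(?P<name>\S+): \[(?P<start>[^,]+), (?P<end>[^)]+)\) "
    r"(?P<wall>\S+) \((?P<saving>[^)]+)\)$"
)


def to_transitions(lines):
    """Return (start, name, wall offset, saving) tuples, one per interval.

    The end of each interval is dropped because it must equal the start
    of the next one; a mismatch is reported as an error.
    """
    transitions = []
    previous_end = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Zone:"):
            previous_end = None  # a new zone starts a fresh chain
            continue
        match = INTERVAL.match(line)
        if match is None:
            continue  # blank or unrecognised line
        start = match.group("start")
        if previous_end is not None and start != previous_end:
            raise ValueError(f"gap or overlap before {start}")
        transitions.append(
            (start, match.group("name"), match.group("wall"), match.group("saving"))
        )
        previous_end = match.group("end")
    return transitions


sample = """\
Zone: Africa/Maseru
LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)
SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)
SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)
SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)
""".splitlines()

transitions = to_transitions(sample)
for entry in transitions:
    print(*entry)
```

The first tuple (starting at StartOfTime) describes the initial state of the zone rather than a real transition; a transition-based format would need some convention for that.
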