<div dir="ltr">Given that I've already found discrepancies (see "Discrepancies in time zone data interpretation"), I'm going to go ahead and hack on this in purely pragmatic (read: short-term) ways. I'll create a GitHub repo just for this purpose and dump code in there; the explicit aim is to encourage a more permanent solution by proving its value.<div><br></div><div>I'll post another message here when there's something worth looking at. Initially I'll be looking at zdump output, Joda Time, standard Java, and Noda Time. Contributions from others covering other languages/platforms will be very welcome.</div><div><br></div><div>Jon</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 13 July 2015 at 14:46, Stephen Colebourne <span dir="ltr"><<a href="mailto:scolebourne@joda.org" target="_blank">scolebourne@joda.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">FWIW, I think such a format would be very useful. Effectively, it is a<br>

unit test for others to confirm that they interpret the rules the same<br>

way as intended.<br>

<br>

It is similar to what I produced when trying to demonstrate the amount<br>

of change being caused by apparently "minor" changes to the data:<br>

<a href="https://github.com/jodastephen/tzdiff/commits/master" rel="noreferrer" target="_blank">https://github.com/jodastephen/tzdiff/commits/master</a><br>

<br>

Any output of this type should indeed just consist of a simple text<br>

file with ISO-8601 format timestamps.<br>

<span class="HOEnZb"><font color="#888888"><br>

Stephen<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

<br>

On 11 July 2015 at 11:35, Jon Skeet <<a href="mailto:skeet@pobox.com">skeet@pobox.com</a>> wrote:<br>

> Background: I'm the primary developer for Noda Time, which consumes the tz<br>
> data. I'm currently refactoring the code that does this... and I've come across<br>
> some code (originally ported from Joda Time) which I now understand in terms<br>
> of what it does, but not exactly why it does it.<br>

><br>

> For a little while now, the Noda Time source repo has included a text file<br>
> containing a dump of every transition (up to 2100, at the moment) for every<br>
> time zone. Here's what it looks like for one zone:<br>

><br>

> Zone: Africa/Maseru<br>

> LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)<br>

> SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)<br>

> SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)<br>

> SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)<br>

> SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)<br>

> SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)<br>

> SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)<br>

><br>

> I use this file as a confidence check when refactoring my time zone handling code:<br>
> if the new code comes up with the same set of transitions as the old code,<br>
> it's probably okay. (This is just one line of defence, of course; there are<br>
> unit tests too, though not as many as I'd like.)<br>

><br>

> It strikes me that having a similar file (I'm not wedded to the format, but<br>
> it should have all the same information, one way or another) released<br>
> alongside the main data files would be really handy for all implementors:<br>
> it would be a good way of validating consistency across multiple platforms,<br>
> with the file in the release being canonical. For any platforms which didn't want<br>
> to consume the rules as rules, but just wanted a list of<br>
> transitions, it could even effectively replace their use of the data.<br>

><br>

> One other benefit: diffing the dump between two releases would make it clear<br>

> what had changed in effect, rather than just in terms of rules.<br>

><br>

> One sticking point is size. The current file for Noda Time is about 4 MB,<br>
> although it zips down to about 300 KB. Some thoughts on this:<br>

><br>

> - We wouldn't need to distribute it in the same file as the data: just as we<br>
> have data and code files, there could be a "textdump" file or whatever we'd<br>
> want to call it. These could be generated retroactively for previous<br>
> releases, too.<br>
> - As you can see, there's redundancy in the format above, in that it's a list<br>
> of "zone intervals" (as I call them in Noda Time) rather than a list of<br>
> transitions: the end of each interval is always the start of the next<br>
> interval.<br>
> - For zones which settle into an infinite daylight saving pattern, I currently<br>
> generate intervals from the start of time up to 2100 (and then a single zone interval for<br>
> the end of time as Noda Time understands it; we'd need to work out what form<br>
> that would take, if any). If we decided that "year of release + 30 years"<br>
> was enough, that would cut down the size considerably.<br>

><br>

> Any thoughts? If the feeling is broadly positive, the next step would be to<br>

> nail down the text format, then find a willing victim/volunteer to write the<br>

> C code. (You really don't want me writing C...)<br>

><br>

> Jon<br>

><br>

</div></div></blockquote></div><br></div>
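<div><br></div><div>A minimal sketch of how a dump in roughly the zone-interval format quoted above could be produced on one of the platforms mentioned. It assumes Java's built-in java.time (which bundles its own copy of the tz data, so the output depends on the JDK's tzdata version) rather than Noda Time or Joda Time; the class ZoneIntervalDump and its methods are hypothetical names. ZoneRules has no API for abbreviations such as LMT or SAST, so they are omitted. Timestamps are ISO-8601 UTC instants, and the endYear parameter stands in for the "up to 2100" or "year of release + 30 years" cutoff discussed in the thread.</div>

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only (hypothetical class): dumps the zone intervals of a java.time ZoneRules
// as "[start, end) offset (savings)" lines with ISO-8601 UTC timestamps.
// Abbreviations such as LMT or SAST are omitted because ZoneRules has no API for them.
public class ZoneIntervalDump {

    // Formats an offset in seconds as +HH, +HH:mm or +HH:mm:ss, dropping trailing zero fields.
    static String formatOffset(int totalSeconds) {
        String sign = totalSeconds < 0 ? "-" : "+";
        int abs = Math.abs(totalSeconds);
        int hours = abs / 3600;
        int minutes = (abs / 60) % 60;
        int seconds = abs % 60;
        StringBuilder sb = new StringBuilder(String.format(Locale.ROOT, "%s%02d", sign, hours));
        if (minutes != 0 || seconds != 0) {
            sb.append(String.format(Locale.ROOT, ":%02d", minutes));
        }
        if (seconds != 0) {
            sb.append(String.format(Locale.ROOT, ":%02d", seconds));
        }
        return sb.toString();
    }

    // Returns one line for every zone interval that starts before 1 January endYear (UTC).
    // The last emitted interval keeps its real end instant; "EndOfTime" appears only
    // when no further transition exists.
    static List<String> dump(ZoneRules rules, int endYear) {
        Instant cutoff = LocalDateTime.of(endYear, 1, 1, 0, 0).toInstant(ZoneOffset.UTC);
        List<String> lines = new ArrayList<>();
        Instant start = null; // null means StartOfTime
        while (start == null || start.isBefore(cutoff)) {
            // Any instant inside the interval works as a probe; use its start (or Instant.MIN).
            Instant probe = (start == null) ? Instant.MIN : start;
            ZoneOffsetTransition next = rules.nextTransition(probe);
            Instant end = (next == null) ? null : next.getInstant();
            int wall = rules.getOffset(probe).getTotalSeconds();
            int savings = wall - rules.getStandardOffset(probe).getTotalSeconds();
            lines.add(String.format(Locale.ROOT, "[%s, %s) %s (%s)",
                    start == null ? "StartOfTime" : start.toString(),
                    end == null ? "EndOfTime" : end.toString(),
                    formatOffset(wall), formatOffset(savings)));
            if (end == null) {
                break;
            }
            start = end;
        }
        return lines;
    }

    public static void main(String[] args) {
        String zone = args.length > 0 ? args[0] : "Africa/Maseru";
        System.out.println("Zone: " + zone);
        dump(ZoneId.of(zone).getRules(), 2100).forEach(System.out::println);
    }
}
```

<div>Running this against two JDKs that bundle different tzdata versions and comparing the outputs with an ordinary text diff would give the "what changed in effect" view described in the thread, though only for java.time's interpretation of the data.</div>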