<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 27 April 2016 at 23:34, Random832 <span dir="ltr">&lt;<a href="mailto:random832@fastmail.com" target="_blank">random832@fastmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On Wed, Apr 27, 2016, at 12:21, Jon Skeet wrote:<br>

&gt; I&#39;d say those cons are pretty significant - I find it very significantly<br>

&gt; harder to read than the format I&#39;ve proposed. I&#39;m also confused by your<br>

&gt; &quot;pro&quot; that it doesn&#39;t depend on any implementation details, but it really<br>

</span>&gt; *exposes* the implementation details in naming (&quot;isdst&quot; and &quot;gmtoff&quot; for<br>

<span class="">&gt; example, along with the mysterious huge numeric values).<br>

<br>

</span>isdst is standard C. gmtoff is a common extension.<br></blockquote><div><br></div><div>Right, so basically the format is specific to &quot;C-based implementations&quot;. I agree that it&#39;s a different sort of implementation detail than normal, but it&#39;s still far from platform-neutral. The aim of the tzvalidate data is to help people validate that any code parsing the source data from tz does so in the same way - and I don&#39;t think the C-centric format helps that. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

My point is that it doesn&#39;t actually parse the internal structures of<br>

the timezone files, it simply calls localtime over and over with<br>

different values, and so can be used even with a radically different<br>

implementation of the C functions, or against POSIX timezone strings,<br>

etc</blockquote><div><br></div><div>Okay, so that&#39;s an argument for changing the implementation - but it&#39;s not an argument for changing the format, IMO. As far as I can see, the only genuine benefit from choosing the zdump format as the output format is that there&#39;s already C code for it. Hopefully it would be entirely possible to write code which calls localtime in the same way, and outputs my proposed format.</div><div><br></div><div>On the other hand, I&#39;m not sure whether that&#39;s actually a benefit anyway: the whole idea isn&#39;t to check whether multiple platforms have the same time zone data, but to check whether they each handle the same input data in the same way... I think it&#39;s reasonable to determine how zic handles its input data by looking directly at its output. To be honest, I think there&#39;d be room for two tools in C here - a &quot;white box&quot; tool dealing with the zic format directly, and a &quot;black box&quot; tool more similar to zdump.</div><div><br></div><div>Another &quot;con&quot; against zdump - the man pages I&#39;ve found don&#39;t specify the format in very much detail. For example:</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">For each zonename on the command line, print the time at the lowest possible time value, the time one day after the lowest possible time value, the times both one second before and exactly at each detected time discontinuity, the time at one day less than the highest possible time value, and the time at the highest possible time value. Each line ends with isdst=1 if the given time is Daylight Saving Time or isdst=0 otherwise.</blockquote><div><br></div><div>So what&#39;s the format for the time? 
I can see what it does on my system, but I wouldn&#39;t be surprised if there were multiple implementations of zdump doing slightly different things - possibly with some of them using the user locale for formatting, for example. For tzvalidate to be useful, the format has to be nailed down, ideally to the exact byte.</div><div><br></div><div>The output of all my tools currently uses \r\n as the line break; for wider adoption it would probably be worth moving to \n. But if we had the output nailed down to the exact byte, then users wouldn&#39;t necessarily need to download the whole output file to check it for correctness - they could compare the SHA-1 hash of <i>their</i> output against the golden SHA-1 hash, and only look for the actual differences if the hashes don&#39;t match. Indeed, the SHA-1 hash from zic output could become part of the distributed tzdata, which I&#39;d personally <i>love</i>. Discussion on whether that&#39;s feasible would be welcome...</div><div><br></div><div>Jon</div><div><br></div></div></div></div>
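<div>To make the hash check above concrete, here is a minimal sketch in Python. The function names are hypothetical, and converting \r\n to \n before hashing is an assumption made for illustration; a strictly byte-exact check would hash the raw bytes instead.</div>

```python
import hashlib


def normalise_line_endings(data: bytes) -> bytes:
    # Assumption: treat \r\n and \n output as equivalent by converting to \n.
    return data.replace(b"\r\n", b"\n")


def sha1_of_output(data: bytes) -> str:
    # Hex SHA-1 digest of the output after line-break normalisation.
    return hashlib.sha1(normalise_line_endings(data)).hexdigest()


def matches_golden(data: bytes, golden_hex: str) -> bool:
    # True when the local output hashes to the published (golden) digest.
    # The golden value is trimmed and lower-cased so that a trailing newline
    # or upper-case hex in the published file does not cause a false mismatch.
    return sha1_of_output(data) == golden_hex.strip().lower()
```

<div>A caller would read its own tool output as bytes, pass it to matches_golden along with the published digest, and only download the full golden file to look for differences when the result is False.</div>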