[tz] Proposal: validation text file with releases

Jon Skeet skeet at pobox.com
Thu Apr 28 06:16:20 UTC 2016


On 27 April 2016 at 23:34, Random832 <random832 at fastmail.com> wrote:

> On Wed, Apr 27, 2016, at 12:21, Jon Skeet wrote:
> > I'd say those cons are pretty significant - I find it very significantly
> > harder to read than the format I've proposed. I'm also confused by your
> > "pro" that it doesn't depend on any implementation details, but it really
> > *exposes* the implementation details in naming ("isdst" and "gmtoff" for
> > example, along with the mysterious huge numeric values).
>
> isdst is standard C. gmtoff is a common extension.
>

Right, so basically the format is specific to "C-based implementations". I
agree that it's a different sort of implementation detail than normal, but
it's still far from platform-neutral. The aim of the tzvalidate data is to
help people validate that any code parsing the source data from tz does so
in the same way - and I don't think the C-centric format helps that.


> My point is that it doesn't actually parse the internal structures of
> the timezone files, it simply calls localtime over and over with
> different values, and so can be used even with a radically different
> implementation of the C functions, or against POSIX timezone strings,
> etc


Okay, so that's an argument for changing the implementation - but it's not
an argument for changing the format, IMO. As far as I can see, the only
genuine benefit from choosing the zdump format as the output format is that
there's already C code for it. Hopefully it would be entirely possible to
write code which calls localtime in the same way but outputs my proposed
format.
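The "black box" approach Random832 describes, which could emit the proposed format just as easily as zdump's, can be sketched in Python. This is a minimal illustration under stated assumptions, not zdump's algorithm. `find_transitions`, `OffsetFn` and `toy_zone` are hypothetical names. `offset_at` stands in for a wrapper around `localtime()` or any other implementation. The search also assumes that no two transitions fall within one sampling step.

```python
from typing import Callable, Iterator, Tuple

# Hypothetical black-box interface: given a UTC instant (seconds since the
# epoch), return (utc_offset_seconds, is_dst), i.e. roughly what a wrapper
# around localtime() would read from tm_gmtoff and tm_isdst.
OffsetFn = Callable[[int], Tuple[int, bool]]


def find_transitions(offset_at: OffsetFn, start: int, end: int,
                     step: int = 86400) -> Iterator[Tuple[int, int, bool]]:
    """Yield (instant, new_offset, new_isdst) for each change between start
    and end, using nothing but calls to offset_at.

    Assumes at most one transition per `step` seconds; each change found is
    pinned to the exact second by binary search.
    """
    t, prev = start, offset_at(start)
    while t < end:
        nxt = min(t + step, end)
        cur = offset_at(nxt)
        if cur != prev:
            # Invariant: offset_at(lo) == prev and offset_at(hi) != prev.
            lo, hi = t, nxt
            while hi - lo > 1:
                mid = (lo + hi) // 2
                if offset_at(mid) == prev:
                    lo = mid
                else:
                    hi = mid
            yield (hi, *offset_at(hi))
            prev = cur
        t = nxt


def toy_zone(t: int) -> Tuple[int, bool]:
    # Made-up zone for the demo: UT+0 standard time, UT+1 DST in one window.
    return (3600, True) if 1_000_000 <= t < 2_000_000 else (0, False)


for when, offset, isdst in find_transitions(toy_zone, 0, 3_000_000):
    print(when, offset, isdst)
# prints:
# 1000000 3600 True
# 2000000 0 False
```

The search only ever calls `offset_at`, so the same code could be pointed at a C library, another language's time zone API, or a POSIX TZ string evaluator. That independence is the point of the black-box approach. The cost is that a sampling step that is too coarse can miss a pair of closely spaced transitions.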

On the other hand, I'm not sure whether that's actually a benefit anyway:
the whole idea isn't to check whether multiple platforms have the same time
zone data, but to check whether they each handle the same input data in the
same way... I think it's reasonable to determine how zic handles its input
data by looking directly at its output. To be honest, I think there'd be
room for two tools in C here: a "white box" one that reads zic's output
format directly, and a "black box" one more similar to zdump.
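The "white box" route could start from something like the following Python sketch. It reads the version-1 (32-bit) data block of zic's TZif output, following the layout documented in RFC 8536 and tzfile(5). The names `read_tzif_v1` and `TTInfo` are hypothetical. The sketch deliberately ignores the 64-bit version-2+ block, the POSIX-TZ footer, leap-second records and the standard/UT indicators, so it is a starting point rather than a validator.

```python
import struct
from typing import List, NamedTuple, Tuple


class TTInfo(NamedTuple):
    utoff: int    # offset from UT in seconds
    isdst: bool
    abbr: str     # time zone designation, e.g. "BST"


def read_tzif_v1(data: bytes) -> List[Tuple[int, TTInfo]]:
    """Return [(transition_time, TTInfo), ...] from the version-1 data block
    of a TZif file. Later blocks and optional records are ignored."""
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    # Header: 4-byte magic, 1-byte version, 15 reserved bytes, then six
    # unsigned 32-bit big-endian counts. Only three matter for this sketch.
    (isutcnt, isstdcnt, leapcnt,
     timecnt, typecnt, charcnt) = struct.unpack(">6L", data[20:44])
    pos = 44
    times = struct.unpack(f">{timecnt}l", data[pos:pos + 4 * timecnt])
    pos += 4 * timecnt
    idxs = data[pos:pos + timecnt]          # one type index per transition
    pos += timecnt
    # Each local time type record is: int32 utoff, uint8 isdst, uint8 desigidx.
    raw_types = [struct.unpack(">lBB", data[pos + 6 * i:pos + 6 * i + 6])
                 for i in range(typecnt)]
    pos += 6 * typecnt
    chars = data[pos:pos + charcnt]         # NUL-terminated designations

    def designation(i: int) -> str:
        return chars[i:chars.index(b"\0", i)].decode("ascii")

    types = [TTInfo(off, bool(dst), designation(ai))
             for off, dst, ai in raw_types]
    return [(t, types[i]) for t, i in zip(times, idxs)]
```

On a typical Linux system one might try `read_tzif_v1(open("/usr/share/zoneinfo/Europe/London", "rb").read())`. However, files built with zic's `-b slim` option may carry little or no data in the version-1 block, so a real tool would read the 64-bit block instead.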

Another "con" against zdump - the man pages I've found don't specify the
format in very much detail. For example:

> For each zonename on the command line, print the time at the lowest
> possible time value, the time one day after the lowest possible time value,
> the times both one second before and exactly at each detected time
> discontinuity, the time at one day less than the highest possible time
> value, and the time at the highest possible time value. Each line ends with
> isdst=1 if the given time is Daylight Saving Time or isdst=0 otherwise.


So what's the format for the time? I can see what it does on my system, but
I wouldn't be surprised if there were multiple implementations of zdump
doing slightly different things - possibly with some of them using the user
locale for formatting, for example. For tzvalidate to be useful, the format
has to be nailed down, ideally to the exact byte.
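To make "nailed down to the exact byte" concrete, here is a Python sketch of a locale-independent formatter for one line of output. The line layout and the name `format_line` are invented for illustration. This is not the tzvalidate format or zdump's. The sketch assumes instants fall in years 1 to 9999, which is the range `datetime.date` supports.

```python
from datetime import date

# Ordinal of 1970-01-01 in the proleptic Gregorian calendar used by date.
_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def format_line(instant: int, utc_offset: int, isdst: bool, abbr: str) -> str:
    """Render one data point in a hypothetical fixed layout such as
    '2016-03-27T01:00:00Z +01:00:00 DST BST'.

    Fields are built with explicit integer formatting rather than strftime,
    so the bytes do not depend on the user's locale or platform quirks.
    """
    days, secs = divmod(instant, 86400)  # floor division handles pre-1970
    d = date.fromordinal(_EPOCH_ORDINAL + days)
    hh, rem = divmod(secs, 3600)
    mm, ss = divmod(rem, 60)
    sign = "-" if utc_offset < 0 else "+"
    oh, orem = divmod(abs(utc_offset), 3600)
    om, osec = divmod(orem, 60)
    return (f"{d.year:04d}-{d.month:02d}-{d.day:02d}T"
            f"{hh:02d}:{mm:02d}:{ss:02d}Z "
            f"{sign}{oh:02d}:{om:02d}:{osec:02d} "
            f"{'DST' if isdst else 'STD'} {abbr}")


print(format_line(1459040400, 3600, True, "BST"))
# prints: 2016-03-27T01:00:00Z +01:00:00 DST BST
```

If each such line is encoded as ASCII and followed by a single LF, the result is a byte stream that different implementations could compare directly.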

The output of all my tools currently uses \r\n as the line break; for wider
adoption it would probably be worth moving to \n. But if the output were
specified to the exact byte, users wouldn't necessarily need to download the
whole output file to check it for correctness - they could compare the SHA-1
hash of *their* output against the golden SHA-1 hash, and only hunt down the
differences if the hashes don't match. Indeed, the SHA-1 hash of the output
generated from zic's data could become part of the distributed tzdata, which
I'd personally *love*.
Discussion on whether that's feasible would be welcome...
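Here is a rough Python illustration of the golden-hash check. It assumes the golden value is published as a hex SHA-1 of the output after CRLF line breaks are normalized to LF. The function names are hypothetical, and this is not an existing tz facility.

```python
import hashlib


def normalized_sha1(output: str) -> str:
    """Hex SHA-1 of validation output, with CRLF line breaks converted to
    LF first so the hash doesn't depend on which line ending a tool uses."""
    normalized = output.replace("\r\n", "\n")
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def matches_golden(my_output: str, golden_sha1: str) -> bool:
    # Only when this returns False would anyone need the full golden file
    # to locate the differing lines.
    return normalized_sha1(my_output) == golden_sha1.strip().lower()


print(normalized_sha1("abc"))
# prints: a9993e364706816aba3e25717850c26c9cd0d89d
```

Line endings are normalized before hashing, so a tool that still writes CRLF would match the same golden hash as one that writes LF.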

Jon