[tz] Proposal: validation text file with releases
skeet at pobox.com
Wed Jul 15 19:10:25 UTC 2015
If the tzdata isn't really intended to be consumed - if we should *really* only
consume the zic output, and anything else is somewhat questionable, then
why distribute the tzdata at all? Why not just distribute the zic output?
As for DST vs STD time not being relevant in software - while Windows
doesn't (mostly) use tzdata, it *does* allow you to specify a time zone and
exclude DST from that. Anyone wanting to mimic that behaviour but using the
tz data can *only* do it if they know the DST component of the overall
offset.
I would really like to proceed pragmatically: I have a personal real-world
need for validation, and given the discrepancies I've found so far, I think
it would be useful to other people as well. I would much rather have an
imperfect but widely-used solution that has scope to be improved later than
a centithread here but with no practical outcome.
Some observations I hope we can all agree on:
- Some platforms *do* consume tzdata directly, and expose the STD/DST
components of the overall offset, and are likely to continue doing so.
- zic and the reference API do not expose the STD/DST components of the
overall offset, and are unlikely to start doing so
- If two implementations agree on overall offsets, it's highly likely
they agree on STD/DST components, whether or not that can be verified
- It is desirable that all implementations agree on as much as possible,
and this is more likely to happen if it's easy to regularly validate
- It is also informative to be able to easily tell the differences
between different data versions (whether released or not)
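To make the total-vs-component distinction concrete, here is a minimal sketch.
The Interval type, its field names and the sample values are all made up for
illustration; none of this is zic or zdump output. It shows two
implementations agreeing on the overall offset while splitting it differently
into STD and DST, and why the Windows-style "exclude DST" behaviour needs the
DST component.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """One period of constant offset (hypothetical representation)."""
    start: str  # UTC instant at which the interval begins
    std: int    # standard offset, in seconds
    dst: int    # DST component, in seconds

    @property
    def total(self) -> int:
        # The overall offset is the only value the reference API exposes.
        return self.std + self.dst


def offset_without_dst(interval: Interval) -> int:
    # Mimicking "ignore DST" is only possible when the components are known.
    return interval.std


def compare(a, b):
    """Return (start, kind) pairs where two interval lists disagree."""
    diffs = []
    for x, y in zip(a, b):
        if x.start != y.start or x.total != y.total:
            diffs.append((x.start, "total"))
        elif (x.std, x.dst) != (y.std, y.dst):
            diffs.append((x.start, "components"))
    return diffs


# Made-up sample data: same overall offset, different STD/DST split.
impl_a = [Interval("2015-03-29T01:00:00Z", 0, 3600)]
impl_b = [Interval("2015-03-29T01:00:00Z", 3600, 0)]
differences = compare(impl_a, impl_b)
```

A validator that only sees total offsets would report no difference here; one
that sees the components would flag the interval.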
Now, a few more debatable thoughts:
- I think it would be reasonably logical - and probably fairly simple - to
modify zdump to output whatever format we want, assuming we don't
intend to include the STD/DST components. I'm not sure how easy it is to
get a complete list of all time zones to make zdump dump them all.
- Modifying zic would give us more information to dump, but wouldn't
make as much sense logically.
- Writing a brand new tool with the logic of zic but just for dumping
would probably involve code duplication, which introduces the possibility
of the codebases diverging accidentally.
- If we were to use a format such as XML, JSON or YAML to represent
the data, it's easier to make it extensible and it's also more easily
processed by modern platforms (Java, .NET etc) without fiddly line-based
parsing. On the other hand, that sometimes requires extra dependencies -
and may well be annoying from C. (I'm aware that zdump and zic are pretty
simple to build - and need to be built on a very wide variety of
- A line-based format is easier to diff with common tools than a more
- Writing a tool to convert between formats would be near-trivial if
we bear it in mind from the beginning.
- If the format allows the fields included to be specified, it will
allow STD/DST-component-aware platforms to compare against each other for
differences, even if they can only compare total offsets against
- Despite the contents of my current github repo, I'm certainly not
proposing actually using Windows line breaks for the "real" format.
- It should be very easy to add a commit hook to github to generate a
new dump file per commit, making it really easy to diff any pair (e.g.
between two releases, or seeing the effect a recent commit had, whether of code
- While I believe it would be beneficial to ship a dump file
alongside code/data releases, with the previous bullet in place, we
wouldn't *need* that to start with. We could view the whole thing as
experimental until we're happy with the format etc.
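As a rough illustration of the format points above, here is a sketch of a
line-based dump whose first line names the fields it contains, with a
near-trivial conversion to JSON and a diff between two dumps. The
"# fields:" header, the field names and the sample lines are assumptions made
up for this example, not a proposed final format.

```python
import difflib
import json


def parse_dump(text):
    """Parse a dump whose first line names its fields (hypothetical format)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    if not header.startswith("# fields:"):
        raise ValueError("missing field header")
    fields = header[len("# fields:"):].split()
    # Each remaining line holds one whitespace-separated value per field.
    return [dict(zip(fields, ln.split())) for ln in lines[1:]]


def to_json(text):
    """Convert the line-based dump to JSON for platforms that prefer it."""
    return json.dumps(parse_dump(text), indent=2)


def diff_dumps(old, new):
    """Return only the added/removed lines between two dumps."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(), "old", "new", lineterm="")
    return [ln for ln in diff
            if ln.startswith(("+", "-"))
            and not ln.startswith(("+++", "---"))]


# Made-up sample dumps standing in for the output at two different commits.
old_dump = ("# fields: zone start total\n"
            "Europe/London 2015-03-29T01:00:00Z 3600\n")
new_dump = ("# fields: zone start total\n"
            "Europe/London 2015-03-29T01:00:00Z 7200\n")
changes = diff_dumps(old_dump, new_dump)
```

Because the header says which fields are present, a dump with extra std and
dst columns could be handled by the same parser, and a platform that only
knows total offsets could simply compare the columns it has.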
- zic documentation:
- All of this has come from me struggling to implement a tzdata
parser which is in line with zic. The man page for zic documents the tz
data format, but not in enough detail for a compliant implementation, in my
view. I volunteer to at least attempt some more detailed documentation if
others feel it's useful. (If no-one else does, I'll probably keep it under
the Noda Time documentation anyway, but with a suitable "this is entirely
So, where do we go from here? Does anyone believe this would actually be a
bad thing to have? (That might come with the position of "only use zic
output".) How are we best to decide the format? If modifying zdump to add
an extra flag is deemed an appropriate course of action, do we have any
volunteers to do so? I'm happy to host a github hook to publish the dump
files at each commit when all the rest of the machinery is in place.
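On the earlier question of getting a complete list of zone IDs to dump, one
possible approach is to read the names straight from the tz source files, where
zones are introduced by "Zone NAME ..." lines and aliases by
"Link TARGET LINK-NAME" lines. The sketch below assumes the unabbreviated
keywords and treats "#" as always starting a comment; the function name and
the sample text are made up for illustration.

```python
def zone_names(source_text):
    """Collect zone and link names from tz source text (simplified sketch)."""
    names = set()
    for raw in source_text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Zone":
            names.add(fields[1])      # Zone NAME STDOFF RULES FORMAT [UNTIL]
        elif len(fields) >= 3 and fields[0] == "Link":
            names.add(fields[2])      # Link TARGET LINK-NAME
    return sorted(names)


# Small sample in the style of the source files; continuation lines start
# with whitespace and are ignored because their first field is not a keyword.
sample = """\
# A comment line
Rule GB-Eire 1916 only - May 21 2:00s 1:00 BST
Zone Europe/London -0:01:15 - LMT 1847 Dec 1 0:00s
            0:00 GB-Eire %s 1968 Oct 27
Link Europe/London Europe/Jersey
"""
ids = zone_names(sample)
```

The resulting list could then be passed to whichever tool ends up producing
the dump.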
On 15 July 2015 at 06:09, Paul Eggert <eggert at cs.ucla.edu> wrote:
> Stephen Colebourne wrote:
>> To say that
>> software should not care and that it is unsupported is .... er ....
>> rather worrying.
> Although it is an issue, the DST-vs-STD offsets are implementation details
> that are neither exposed by the reference API nor exported to zic's output
> files. Any values they internally have were not intended to be visible when
> the tzdata entries were written.
> Of course other implementations are free to process tzdata sources in
> other ways -- to take an extreme example, implementations could export
> tzdata comments to their APIs. However, this sort of thing is not part of
> the reference tz API, and any regression suite based on the reference API
> shouldn't worry about it.
>> I'm not sure this project fully
>> appreciates or understands the downstream impacts of changes on
>> systems other than zic.
> It's helpful to mention those impacts on this list, if only to clarify issues
> like these in the documentation. Proposed patch attached. This patch
> doesn't change zic's behavior; it just documents the way zic has always