[tz] zdump new option -i for easier-to-review output
Paul Eggert
eggert at cs.ucla.edu
Sun Jun 5 23:51:33 UTC 2016
Jon Skeet wrote:
> The use case I'm
> primarily interested in is validation: diffing a "golden" file with one
> generated by another tool
Yes, I should have mentioned that. I commonly compare two zdump output files
using "diff", for example. zdump -i works well for this, too. However, it does
not suffice to merely look at diff output. Sometimes we add new zones, for
example, and diff output won't serve to proofread those.
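As a rough illustration of why diff output alone is not enough, here is a minimal Python sketch using difflib. The zone names and the TZ="..." line layout are made up for the example and are not real zdump output; the point is only that a newly added zone shows up as a block of "+" lines, which still has to be proofread directly.

```python
import difflib

# Hypothetical sample data in the style of zdump -i; not real tzdata output.
golden = [
    'TZ="Example/Old"',
    "1981-04-01 01 +07 1",
    "1981-09-30 23 +06",
]
generated = golden + [
    'TZ="Example/New"',          # a zone that exists only in the new file
    "1983-04-01 01 +07 1",
]

diff = list(difflib.unified_diff(golden, generated,
                                 "golden", "generated", lineterm=""))

# Skip the two "---"/"+++" header lines, then collect pure additions.
added = [line[1:] for line in diff[2:] if line.startswith("+")]

for line in added:
    print(line)
```

Every line of the new zone is reported as added, so the diff says nothing about whether those lines are correct.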
> I wouldn't expect them to be dealing with this format every day
True; even I don't do that. Still, there is no need for zdump -i format to be
self-explanatory. For example, the format need not use strftime %c format merely
because naive users are more likely to understand %c format than ISO 8601
format. As long as the format is reasonably clear without one constantly having
to refer to the documentation, we should be OK, and zdump -i format clears that
relatively low bar.
> - I don't see why we need the quoted form for the time zone ID.
The API allows the TZ environment variable (the time zone ID) to be any finite
sequence of non-null bytes. TZ need not be UTF-8 encoded, and the bytes can
contain newlines, etc., and zdump output should be unambiguous regardless of how
weird TZ's value is.
> Presumably the benefit of the proposed format is that you can copy/paste it
> into a Unix shell to use that time zone.
No, and in general such a cut-and-paste would not work because the quotation
scheme is not designed to be shell-compatible. The main goal is to have an
unambiguous format that supports any TZ value allowed by the API, and also to
provide some room for future extensions to zdump -i format.
> the quotes and TZ= part are an unnecessary distraction IMO.
Some decoration is needed in order to make it easy to distinguish a TZ= line
from an ordinary data line. This is because a TZ string can be almost anything:
it can look like a data line, for example.
Anyway, if this is the worst of zdump -i's problems, we should be OK.
> - Indicating daylight/standard with an arbitrary positive integer: if this is
> going to be a canonical format, we need to be more precise than that.
> Equivalent outputs should be equal. I'd also prefer it not to be an integer
> at all, given that it's indicating a Boolean value.
tm_isdst is defined by ISO C11 and by POSIX to be an int value, so if we want
zdump to work with all standard-conforming implementations without losing
information, it must be able to represent an arbitrary int somehow. The existing
zdump -v format can do it, and it would be odd if zdump -i format were to lose
that ability.
> - I'd *really* like colons in the UT offsets
That is mostly just a style thing. That being said, in my experience most UT
offsets that contain hours and minutes omit colons (this includes several
examples in the RFC-5322-format header in your email :-).
> - I think it's simpler to think about the transition times in UT, indicated
> with a Z in the output.
That's not my experience. Most of our sources do not base transitions on UT, and
I typically think about local time when mulling over transitions and DST rules.
> choosing the local time *after* the transition isn't how most people think
> about transitions in day to day conversation.
True. But it's easy to get used to when looking at zdump -i format. Plus, users
most likely prefer localtime to UT when thinking about transitions.
> Just the fact that there's ambiguity
The format is documented, and if the documentation is understood correctly the
zdump -i output has just one interpretation, so there is no ambiguity. A problem
might arise if someone attempts to look at zdump -i output without reading the
documentation; although such a problem could occur with any format choice, some
formats are less confusing than others, and most likely that is what you're
referring to.
To some extent there is a tradeoff between formats that make typos easy to find,
and formats that are more what users typically expect. Within reason I'd rather
make typos easy to find, as typos are a real problem!
> - Omitting the abbreviation when it happens to be the same as the UT offset
> makes the file harder to parse for very little benefit in my view.
First, it's trivial to parse zdump -i lines even when the abbreviation is
omitted. For example, here's an awk script that outputs only zdump -i lines that
correspond to DST transitions even when abbreviations are omitted:
  /^[0-9]/ && NF > 3 && /[0-9]$/ {print}
Compare this to an awk script to do the same thing with tzvalidate format:
  /^[0-9]/ && $(NF - 1) == "daylight" {print}
which is not significantly simpler.
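For readers who do not use awk, the first filter can be sketched in Python as follows. This is only a transliteration of the awk condition shown above (line starts with a digit, has more than three whitespace-separated fields, and ends with a digit); the function name is_dst_line is made up for the example, and the sample lines are the ones used in this message.

```python
import re

def is_dst_line(line: str) -> bool:
    """Mirror the awk test: /^[0-9]/ && NF > 3 && /[0-9]$/"""
    line = line.rstrip("\n")
    return (
        re.match(r"[0-9]", line) is not None      # /^[0-9]/
        and len(line.split()) > 3                 # NF > 3
        and re.search(r"[0-9]$", line) is not None  # /[0-9]$/
    )

sample = [
    "1981-04-01 01 +07 1",
    "1981-09-30 23 +06",
    "1983-04-01 01 +07 +08 1",
    "1983-09-30 23 +06",
]

dst_lines = [line for line in sample if is_dst_line(line)]
for line in dst_lines:
    print(line)
```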
Second, I realize the improvement is of little benefit to those who do not read
zdump output. But any unambiguous format would do for that case; we could pick
JSON format, or XML format, or whatever. Being somewhat old-fashioned I'd like a
text format that makes it easy for me to read zdump -i format using an ordinary
text editor. And for me, it's quite useful that redundant abbreviations are
omitted. Consider, for example, this output:
  1981-04-01 01 +07 1
  1981-09-30 23 +06
  1982-04-01 01 +07 1
  1982-09-30 23 +06
  1983-04-01 01 +07 +08 1
  1983-09-30 23 +06
  1984-04-01 01 +07 1
  1984-09-30 02 +06
where the (incorrect) 1983-04-01 transition sticks out like a sore thumb. In
contrast, if the abbreviation were always output and columns always lined up,
and the output looked like this:
  1981-04-01 01 +07 +07 1
  1981-09-30 23 +06 +06 0
  1982-04-01 01 +07 +07 1
  1982-09-30 23 +06 +06 0
  1983-04-01 01 +07 +08 1
  1983-09-30 23 +06 +06 0
  1984-04-01 01 +07 +07 1
  1984-09-30 02 +06 +06 0
the same typo is *much* harder to spot.
So it is not "very little benefit". It's a big deal to someone like me who wants
to catch typos and who has to deal with the consequences of typos.
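The relationship between the two listings can be sketched in Python. The compact() function below is hypothetical: it encodes only the omission rule that the examples in this message suggest (drop the abbreviation when it equals the UT offset, drop the isdst field when it is 0) and is not taken from the zdump source, whose actual rules may differ.

```python
def compact(line: str) -> str:
    """Drop redundant fields from a fully spelled-out line (assumed layout:
    date, time, UT offset, abbreviation, isdst)."""
    date, time, offset, abbr, isdst = line.split()
    fields = [date, time, offset]
    if abbr != offset:      # keep the abbreviation only when it adds information
        fields.append(abbr)
    if isdst != "0":        # keep isdst only when it is nonzero
        fields.append(isdst)
    return " ".join(fields)

verbose = [
    "1981-04-01 01 +07 +07 1",
    "1981-09-30 23 +06 +06 0",
    "1982-04-01 01 +07 +07 1",
    "1982-09-30 23 +06 +06 0",
    "1983-04-01 01 +07 +08 1",
    "1983-09-30 23 +06 +06 0",
    "1984-04-01 01 +07 +07 1",
    "1984-09-30 02 +06 +06 0",
]

compacted = [compact(line) for line in verbose]

# After compaction, only the suspicious line still has five fields.
odd = [line for line in compacted if len(line.split()) == 5]
print(odd)
```

In the compacted listing, the one line whose abbreviation disagrees with its offset is also the only line with an extra column, which is what makes it easy to see by eye.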
> for times, I'd favour at least keeping the minutes
I was tempted by that too, on the grounds that it's what readers typically
expect. However, it makes typos harder to catch, which is a significant
disadvantage.
I hope I've explained the significant technical advantages of zdump -i format
for my use case (manually looking at zdump -i output, and looking at diffs of
it). I am not surprised that its style is offputting, which is why I'm thinking
that we may need a way for people to specify output style more flexibly than
zdump -i versus zdump -v versus zdump -V.