[tz] [Patch] Make it slightly easier to parse tzdata

Ed Schouten ed at 80386.nl
Fri Oct 31 21:35:40 UTC 2014

2014-10-31 21:18 GMT+01:00 Paul Eggert <eggert at cs.ucla.edu>:
> On 10/31/2014 07:56 AM, Ed Schouten wrote:
>> there is a very small number of directives that would require quite a
>> lot of additional code to parse properly. For example, "lastSun" makes
>> a lot of sense as a special keyword in "Rule" directives, but it
>> provides no functional gain in "Zone" directives.
>> The same holds for the use of the "Dec 29 24:00" time used in Samoa's
>> timezone. We should be able to use "Dec 30" instead, right?
> We could, but if the original announcement said the equivalent of "Dec 29
> 24:00" it's helpful if the corresponding zone line matches the announcement.
> Similarly, if the Zone transition is supposed to be at the same time and
> date as a Rule transition, it simplifies maintenance a bit to use the same
> string for both.
> As I understand it, in zic the same code is used to parse dates regardless
> of whether they appear in Rule or Zone lines.  I assume the same thing could
> be done in a Python parser....

My idea was to just make it easier for the next person. Adding support
for it to my script is of course not infeasible. In fact, given that
it's only used in a couple of places, it would even be shorter to
have a fixed map for these irregularly shaped entries:

last_sun_map = {('1980', 'Mar'): 30, ...}

You could argue that the dates in Zone and Rule entries are simply not
the same thing. Dates in Zone entries are absolute. They indicate an
end date of a ruleset. "lastSun" would need to be applied to the year
used in the statement itself. "lastSun" in a Rule entry is not
applied to any specific year. It is merely copied into the
compiled timezone. I'd say that requiring the same parsing logic may
be too demanding.
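
To illustrate the difference, here is a rough Python sketch of what
resolving "lastSun" in a Zone entry would involve. The function name
and the tables are made up for illustration; this is not taken from
zic. The point is that the year from the statement itself is needed
to get a concrete day:

```python
import calendar

# Month and weekday abbreviations as they appear in tzdata source files.
MONTHS = {name: i + 1 for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}
WEEKDAYS = {name: i for i, name in enumerate(
    "Mon Tue Wed Thu Fri Sat Sun".split())}  # Monday == 0, as in calendar


def resolve_last_weekday(year, month, spec):
    """Return the day of the month for a spec such as 'lastSun'.

    Hypothetical helper: 'year' and 'month' come from the Zone line's
    end date, e.g. (1980, 'Mar', 'lastSun').
    """
    if not spec.startswith("last"):
        raise ValueError("not a 'last<weekday>' spec: %r" % spec)
    wanted = WEEKDAYS[spec[4:]]
    month_number = MONTHS[month]
    days_in_month = calendar.monthrange(year, month_number)[1]
    # Weekday of the final day of the month, then step back to 'wanted'.
    last_weekday = calendar.weekday(year, month_number, days_in_month)
    return days_in_month - (last_weekday - wanted) % 7
```

For ('1980', 'Mar') this gives 30, the same value as in the fixed map
above. A Rule entry has no such year to work with, so the same
function cannot be called at parse time there.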

Though I agree that it would be nice to have the definitions matching
up with original announcements, in the end they will need to be
processed by machines. If we are afraid that people will confuse
"Dec 29 24:00" with "Dec 30", we can still add a comment to
clarify.
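
For comparison, this is roughly what a parser has to do to accept
"24:00". It is only a sketch: parse_until is a hypothetical name, the
time format is assumed to be plain H:MM, and the year 2011 in the
usage note is my assumption for the Samoa entry:

```python
from datetime import datetime, timedelta

MONTHS = {name: i + 1 for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}


def parse_until(year, month, day, time="0:00"):
    """Turn the end date of a Zone line into a datetime.

    The hour is added as a timedelta rather than passed to datetime(),
    because datetime() rejects hour=24.
    """
    hours, minutes = (int(part) for part in time.split(":"))
    midnight = datetime(year, MONTHS[month], day)
    return midnight + timedelta(hours=hours, minutes=minutes)
```

With this, parse_until(2011, "Dec", 29, "24:00") and
parse_until(2011, "Dec", 30) compare equal, which is why I would
argue the two spellings are interchangeable for a machine.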

If a feature is used so rarely in the datasets that it's easier to
translate it to the proper value with a lookup table than it is to
actually parse it properly, we might be sacrificing reusability.

Best regards,
Ed Schouten <ed at 80386.nl>
