Html-ize the tz database?
csells at sellsbrothers.com
Wed Feb 28 22:23:31 UTC 2001
Syed, when I produce the parser that outputs XML, I'll let you know and you
can let me know if it helps you produce the data you're looking for. The
current data format, while cryptic, seems to be parseable by zic, so I'm
leveraging that code to build my own program to output XML.
> -----Original Message-----
> From: Syed Sajjath [mailto:Syed.Sajjath at wcom.com]
> Sent: Wednesday, February 28, 2001 12:57 PM
> To: 'Gwillim Law'; 'Chris Sells'; tz at elsie.nci.nih.gov
> Subject: RE: Html-ize the tz database?
> Most of the work is parsing the tzdata file in its current form. Once you
> do this, you can present the data in any form you like.
> I had sent this earlier to a limited group working on parsing tzdata, but
> received no feedback. The following mail describes the difficulties I faced
> parsing timezone data. Please go through it and let me know if the
> suggestions are viable.
> I have two scripts, one to extract the rules, the second one to extract a
> list of all timezones and timezones under each country.
> A sample of each file is attached at the end.
> The problems I had were:
> 1) Identifying the country name.
> 2) Identifying the end of a Zone (I assume either the start of the next
> zone, a blank line, or a comment line denotes the end of the current zone).
> 3) Getting the timezone long name. It is easy to get the ones listed at
> the top (or bottom) of the file, but difficult if it is written as a note
> on the zone line itself. All the timezone long names are also not
> 4) In some cases like China the zone line says
> Zone Asia/Shanghai 8:05:52 - LMT 1928
> 8:00 Shang C%sT 1949
> 8:00 PRC C%sT
> but the rules for PRC run only until 1991. My script interpreted this as
> Shanghai observing DST with the corresponding rule missing.
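
A minimal Python sketch of problem 2, grouping each Zone line with its continuation lines. The function name group_zones is hypothetical and is not part of tzcode. The sketch assumes that blank lines and comments can simply be skipped and that a zone ends at the next Zone, Rule, Link, or Leap line; that differs slightly from the assumption stated above, where a blank or comment line also ends the zone.

```python
def group_zones(lines):
    """Group each Zone line with its continuation lines.

    Returns a dict mapping zone name to a list of
    (gmtoff, rules, format, until) tuples; until is None when absent.
    """
    zones = {}
    current = None  # name of the zone being collected, if any
    for raw in lines:
        # Assumption: '#' starts a comment that runs to end of line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line: skip, do not end the zone
        fields = line.split()
        if fields[0] == "Zone":
            current = fields[1]
            zones[current] = []
            fields = fields[2:]  # leave GMTOFF RULES FORMAT [UNTIL]
        elif fields[0] in ("Rule", "Link", "Leap"):
            current = None  # any other record type ends the zone
            continue
        elif current is None:
            continue  # stray line outside any zone
        if len(fields) < 3:
            continue  # malformed line; a real parser would report it
        gmtoff, rules, fmt = fields[:3]
        until = " ".join(fields[3:]) or None
        zones[current].append((gmtoff, rules, fmt, until))
    return zones
```

Fed the three Asia/Shanghai lines quoted above, this yields one zone with three entries, the last of which has no UNTIL value.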
> My suggestions to overcome these are:
> 1) Adding a tag #<ctryname> before the country name (or #<ctry> ...
> #<EndCtry> at the beginning and end).
> 2) A #<Zone> at beginning and #<EndZone> tag at the end of each zone. It
> can be done for Rule and Link also to make it consistent.
> 3) List all timezone abbreviations and names in a separate file (since
> timezones like EET are used across files), and consistently use the same
> names (need to figure out a way to handle duplicate names, possibly by
> using the country name in conjunction, wherever relevant).
> 4) If any zone has a name like C%sT on the last line of its definition,
> implying it observes daylight saving time, then the corresponding rule
> must have two entries, each running until 'max' (one for the start and one
> for the end). If not, split the lines as suggested below:
> 8:00 PRC C%sT 1991
> 8:00 - CST
> None of the changes suggested above affects tzcode, as we are adding
> comment lines only.
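
The condition in suggestion 4 can be checked mechanically. The Python sketch below is one possible check, not anything taken from tzcode: the function name zones_with_expired_rules is hypothetical, and it simplifies the suggestion by requiring only one rule line whose TO field is 'max' rather than a matching start and end pair. It assumes the usual field order, Rule NAME FROM TO ... and Zone NAME GMTOFF RULES FORMAT [UNTIL].

```python
def zones_with_expired_rules(lines):
    """Return names of zones whose last line uses a %s format with a
    named rule that has no line running until 'max'."""
    open_rules = set()  # rule names with at least one TO field of "max"
    last_line = {}      # zone name -> (rules, format) of its final line
    current = None
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        f = line.split()
        if f[0] == "Rule":
            current = None
            if len(f) > 3 and f[3] == "max":
                open_rules.add(f[1])
        elif f[0] == "Zone":
            current = f[1]
            if len(f) >= 5:
                last_line[current] = (f[3], f[4])
        elif f[0] in ("Link", "Leap"):
            current = None
        elif current is not None and len(f) >= 3:
            # Continuation line: GMTOFF RULES FORMAT [UNTIL]
            last_line[current] = (f[1], f[2])
    # Simplification: a RULES field holding a fixed amount such as "1:00"
    # is treated like a rule name here.
    return sorted(
        zone
        for zone, (rules, fmt) in last_line.items()
        if "%s" in fmt and rules != "-" and rules not in open_rules
    )
```

With the Asia/Shanghai lines as they stand and PRC rules that stop in 1991, the zone is reported; after the split suggested above, the last line is "8:00 - CST" and it is not.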
> Sample files generated by parser,
> -----Original Message-----
> From: Gwillim Law [mailto:gwil at mindspring.com]
> Sent: Wednesday, February 28, 2001 9:57 AM
> To: Chris Sells; tz at elsie.nci.nih.gov
> Subject: Re: Html-ize the tz database?
> > Gwillim, I'm curious how you produced these HTML files?
> Manually. I started by taking a copy of tzdata2000h and editing it with a
> text editor. Several reasons for this: I wanted to get a feel for the
> data; to experiment with different ways of organizing them; to capture
> whatever useful information I could find in the comments; and to spot any
> inconsistencies or holes in the data. There are plenty of applications
> where it makes sense to automate the process, but I think it should be
> done manually at least once.
> Yours, Gwillim Law