Html-ize the tz database?

Wed Feb 28 20:56:47 UTC 2001

Gwillim/Chris/All,

Most of the work is parsing the tzdata file in its current form.  Once you
do this, you can present the data in any form you like.

I had sent this earlier to a limited group working on parsing tzdata, but
received no feedback.  The following mail describes the difficulties I faced
parsing the timezone data.  Please go through it and let me know if the
suggestions are viable.

I have two scripts: one to extract the rules, and a second one to extract a
list of all timezones and the timezones under each country.

A sample of each output file is attached at the end.

The problems I had were with:

1) Identifying the country name.
2) Identifying the end of a Zone (I assume that the start of the next zone,
a blank line, or a comment line denotes the end of the current zone).
3) Getting the timezone long name.  It is easy to get the ones listed at
the top (or bottom) of the file, but difficult if the name is written as a
note on the zone line itself.  Also, not all timezone long names are
documented.
4) In some cases, like China, the zone lines say
Zone Asia/Shanghai 8:05:52 - LMT 1928
   8:00 Shang C%sT 1949
   8:00 PRC C%sT
but the rules for PRC run only until 1991.  My script interpreted this as
Shanghai observing DST with the corresponding rule missing.
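
To illustrate point 2, here is a minimal Python sketch of grouping a Zone
line with its continuation lines.  It relies on the assumption stated above
(a zone ends at the next Zone line, a blank line, or a comment line).  The
function name parse_zones and the record layout are made up for illustration
and are not taken from the two scripts mentioned earlier.

```python
def parse_zones(text):
    """Group each Zone line with its continuation lines.

    Assumption (point 2 above): a zone ends at the next Zone line,
    a blank line, or a comment line.
    Returns {zone name: [fields of each line, starting at GMTOFF]}.
    """
    zones = {}
    current = None  # name of the zone being collected, if any
    for raw in text.splitlines():
        # Drop '#' comments; a comment-only line becomes empty.
        line = raw.split("#", 1)[0].strip()
        if not line:
            current = None  # blank or comment line closes the zone
            continue
        fields = line.split()
        if fields[0] == "Zone" and len(fields) >= 2:
            current = fields[1]
            zones[current] = [fields[2:]]
        elif current is not None and fields[0] not in ("Rule", "Link"):
            zones[current].append(fields)  # continuation line
        else:
            current = None
    return zones


sample = """\
Zone Asia/Shanghai 8:05:52 - LMT 1928
   8:00 Shang C%sT 1949
   8:00 PRC C%sT
"""
zones = parse_zones(sample)
last_line = zones["Asia/Shanghai"][-1]  # fields of the final zone line
```

With the China example, last_line holds the fields of the "8:00 PRC C%sT"
line, which is the line that causes problem 4.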

My suggestions to overcome these are
----------------------------------------------------
1) Add a #<ctryname> tag before the country name (or #<ctry> ...
#<EndCtry> tags at the beginning and end).
2) Add a #<Zone> tag at the beginning and an #<EndZone> tag at the end of
each zone.  The same can be done for Rule and Link, for consistency.
3) List all timezone abbreviations and names in a separate file (since
abbreviations like EET are used across files), and use the same names
consistently (we need to figure out a way to handle duplicate names,
possibly by using the country name in conjunction, wherever relevant).
4) If any zone has a name like C%sT on the last line of its definition,
implying that it observes daylight saving time, then the corresponding rule
must have two entries that run until 'max' (one for the start and one for
the end).  If not, split the line as suggested below:
8:00 PRC C%sT 1991
8:00 - CST
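
The check behind suggestion 4 could be sketched roughly as follows.  This is
only an illustration: the helper names are hypothetical, the field positions
assume the usual "Rule NAME FROM TO ..." layout, and the two Rule lines are
sample input chosen to match the statement above that the PRC rules stop in
1991.

```python
def open_ended_counts(rule_lines):
    """Return {rule name: number of Rule lines whose TO field is 'max'}."""
    counts = {}
    for raw in rule_lines:
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 4 or fields[0] != "Rule":
            continue
        name, to_year = fields[1], fields[3]
        counts.setdefault(name, 0)
        if to_year == "max":
            counts[name] += 1
    return counts


def needs_split(last_zone_fields, counts):
    """True if the last zone line has a %s format but its rule does not
    have two entries (start and end) running until 'max'."""
    rule, fmt = last_zone_fields[1], last_zone_fields[2]
    if rule == "-" or "%s" not in fmt:
        return False
    return counts.get(rule, 0) < 2


# Illustrative input only (assumed layout, rules ending in 1991).
rules = [
    "Rule PRC 1986 1991 - Sep Sun>=11 0:00 0    S",
    "Rule PRC 1987 1991 - Apr Sun>=10 0:00 1:00 D",
]
counts = open_ended_counts(rules)
flag_current = needs_split(["8:00", "PRC", "C%sT"], counts)
flag_fixed = needs_split(["8:00", "-", "CST"], counts)
```

Here flag_current is True for the existing last line, and flag_fixed is
False for the proposed "8:00 - CST" line.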

None of the changes suggested above affects tzcode, as we are adding
comment lines only.
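
As a rough sketch of why comment-only tags are harmless, the Python below
strips '#' comments the way a simple reader would and shows that the
remaining data lines are the same with or without the tags, while a
tag-aware reader can pick up the country.  The exact tag layout (country
name on the same line after #<ctry>) and the function names are assumptions
made for this example.

```python
def data_lines(text):
    """Lines left after dropping '#' comments and blank lines."""
    out = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if line.strip():
            out.append(line)
    return out


def zones_by_country(text):
    """Map country name -> zone names, using the proposed tags.

    Assumes the country name follows #<ctry> on the same line.
    """
    result = {}
    country = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#<ctry>"):
            country = line[len("#<ctry>"):].strip()
            result.setdefault(country, [])
        elif line.startswith("#<EndCtry>"):
            country = None
        else:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "Zone" and country is not None:
                result[country].append(fields[1])
    return result


untagged = """\
Zone Asia/Shanghai 8:05:52 - LMT 1928
   8:00 Shang C%sT 1949
   8:00 PRC C%sT
"""
tagged = "#<ctry> China\n#<Zone>\n" + untagged + "#<EndZone>\n#<EndCtry>\n"

same_data = data_lines(tagged) == data_lines(untagged)
by_country = zones_by_country(tagged)
```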

Sample files generated by the parser are attached below.

Thanks
-Syed

-----Original Message-----
From:	Gwillim Law [mailto:gwil at mindspring.com]
Sent:	Wednesday, February 28, 2001 9:57 AM
To:	Chris Sells; tz at elsie.nci.nih.gov
Subject:	Re: Html-ize the tz database?

> Gwillim, I'm curious how you produced these HTML files?

Manually.  I started by taking a copy of tzdata2000h and editing it with a
text editor.  Several reasons for this:  I wanted to get a feel for the
data; to experiment with different ways of organizing them; to capture
whatever useful information I could find in the comments; and to spot any
inconsistencies or holes in the data.  There are plenty of applications
where it makes sense to automate the process, but I think it should be done
manually at least once.

Yours,    Gwillim Law

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CountryZone.xls
Type: application/vnd.ms-excel
Size: 16384 bytes
Desc: not available
Url : http://mm.icann.org/pipermail/tz/attachments/20010228/b890a58c/CountryZone-0001.xls 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ZoneList.xls
Type: application/vnd.ms-excel
Size: 18432 bytes
Desc: not available
Url : http://mm.icann.org/pipermail/tz/attachments/20010228/b890a58c/ZoneList-0001.xls 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rule.xls
Type: application/vnd.ms-excel
Size: 15872 bytes
Desc: not available
Url : http://mm.icann.org/pipermail/tz/attachments/20010228/b890a58c/Rule-0001.xls