[tz] TZ file comments UTF-8? Bastardized HTML? (was Re: Busingen revisited)

Ian Abbott abbotti at mev.co.uk
Fri Jan 11 10:05:36 UTC 2013

On 2013/01/10 05:58 PM, Paul Eggert wrote:
> On 01/10/13 05:04, Ian Abbott wrote:
>> I'd prefer the comments to be in UTF-8 without the HTML entities and
>> HTML tags, but the non-comment parts of the files to be restricted
>> to plain-old ASCII.  The current HTML mark-up tags seem to have been
>> added around December 1997 or earlier, although there have been URLs
>> in the files since 1996 or earlier.  The TZ files pre-date HTML by
>> several years and pre-date UTF-8 by several more years.
> The HTML markup has bothered me, too; I have found it more
> distracting than useful.  URLs themselves should be fine,
> but the <a href='...'> business gets in the way.

The HTML entities are also rather unreadable.  If the HTML markup is
removed, and the HTML entities can't be replaced with UTF-8 sequences,
perhaps the non-ASCII characters could be replaced with TeX markup
sequences, e.g. 'B\"usingen' rather than 'B&uuml;singen'.
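
As a rough illustration of the two options (not anything that exists in the tz distribution), here is a minimal Python sketch. It assumes the comments use standard named HTML entities such as '&uuml;'. The function names and the small accent table are hypothetical, and the table covers only a handful of common accents.

```python
import html
import unicodedata

# Hypothetical, deliberately incomplete table mapping Unicode combining
# marks to the corresponding TeX accent commands.
TEX_ACCENTS = {
    "\u0308": '\\"',  # diaeresis (umlaut)
    "\u0301": "\\'",  # acute
    "\u0300": "\\`",  # grave
    "\u0302": "\\^",  # circumflex
    "\u0303": "\\~",  # tilde
}


def entities_to_utf8(text):
    """Replace HTML entities with the characters they name."""
    return html.unescape(text)


def to_tex(text):
    """Replace accented letters with TeX accent sequences where known."""
    out = []
    for ch in text:
        # NFD splits e.g. u-umlaut into 'u' plus a combining diaeresis.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in TEX_ACCENTS:
            out.append(TEX_ACCENTS[decomposed[1]] + decomposed[0])
        else:
            # Leave anything the table does not cover untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    name = entities_to_utf8("B&uuml;singen")
    print(name)
    print(to_tex(name))
```

Characters outside the table pass through unchanged, so a real conversion would need a fuller mapping and a decision about letters with no simple TeX accent form.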

>> I'm not sure how widespread the adoption of UTF-8 text files is in
>> the big, wide world, but I don't suppose we should care as long as
>> the zic compilers don't break and the systems that zic is run on
>> support 8-bit text files.
> There still is the problem that people who are editing
> the files with their own text editors may be hampered.
> In my normal way of editing text across the network (ssh and
> LC_ALL=C and emacs -nw), non-ASCII characters are rendered as ugly
> hexadecimalish strings that are hard to read.  I can work
> around the problem but it is an annoyance.

Even worse for editors that are not 8-bit clean, such as the original
'vi', I suppose.  Thanks, I hadn't considered that!

