[tz] TZ file comments UTF-8? Bastardized HTML? (was Re: Busingen revisited)

Paul Eggert eggert at cs.ucla.edu
Fri Jan 11 15:46:06 UTC 2013


On 01/11/2013 06:51 AM, Deborah Goldsmith wrote:

> Can anyone name a system in use today that is not capable of dealing with it?

It depends on what one means by "capable".  When I type
the command "emacs southamerica" here's some text
that I see on my remote-shell terminal window:

# A partir de entonces, San Luis establecer\u00E1 el huso horario propio de

That is, Emacs correctly infers that the file uses Latin-1,
but because I prefer the LC_ALL='C' locale it displays all
non-ASCII characters by using hexadecimal escape sequences.
It would do the same if the file used UTF-8.
I could work around the problem by using, say, the
LC_ALL='en_US.utf8' locale, but that has some undesirable
side effects (it mishandles character ranges, and it's
noticeably slower for some other things I do), and I'd rather not.

The main systems I use these days are Ubuntu 12.10, Fedora 17, and
RHEL 6.3; these are all the latest stable versions, and they
all work this way.  I wouldn't be surprised if the latest OS X
release worked this way too.

For this particular case, the fix is simple: translate the text
into English (it's an English-language database, after all).
Names can be a bit trickier, but again, things are simpler
(at least for this maintainer) if the commentary is in ASCII.
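
As a rough sketch of how one might locate the commentary lines that
would need translating, the following Python 3 function lists every line
containing a non-ASCII byte and shows it with backslash escapes, much as
a C-locale terminal would. The function name non_ascii_lines is
hypothetical, and the Latin-1 default is an assumption based on the
encoding Emacs inferred above. Note that Python writes the escape as
\xe1 rather than the form shown in the Emacs output.

```python
def non_ascii_lines(data: bytes, encoding: str = "latin-1"):
    """Yield (line_number, escaped_text) for each line with a non-ASCII byte.

    The default encoding is an assumption; pass "utf-8" for UTF-8 input.
    """
    for number, raw in enumerate(data.splitlines(), start=1):
        # Any byte above 0x7F is outside the ASCII range.
        if any(byte > 0x7F for byte in raw):
            text = raw.decode(encoding)
            # Render non-ASCII characters as backslash escapes.
            escaped = text.encode("ascii", "backslashreplace").decode("ascii")
            yield number, escaped
```

To check a file, read it in binary mode, for example
open("southamerica", "rb").read(), and print each pair the function
yields.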

