Non-ASCII encoding (was: Re: proposed time zone package changes: Brazil; Mauritius; URL fixes)
martin.jerabek at isis-papyrus.com
Tue Jul 1 07:30:13 UTC 2008
On 01.07.2008 01:26, Rodrigo Severo wrote:
> The text by Paul Schulze about Brazilian timezones is missing all
> accented characters. Here is the text with the proper characters:
I would like to take this opportunity to clarify the encoding of
non-ASCII characters in the tzdata files. This is a minor point because
such characters occur only in comments, but I think the encoding should
at least be defined.
In tzdata2008c there seems to be only one non-ASCII character, the
accented e in the name José Miguel Garrido in the file southamerica. It
is evidently encoded in ISO 8859-1 (Latin-1).
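A check like this can be repeated with a few lines of Python. What follows is only a rough sketch: the helper names find_non_ascii and guess_encoding are made up for illustration, and the sample bytes are a hypothetical stand-in for a comment line, not the actual contents of the southamerica file. A line that fails to decode as UTF-8 is only "possibly" ISO 8859-1, since the bytes alone cannot prove which 8-bit encoding was intended.

```python
def find_non_ascii(data: bytes):
    """Return (line_number, line_bytes) for each line containing a byte >= 0x80."""
    hits = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        if any(b >= 0x80 for b in line):
            hits.append((lineno, line))
    return hits


def guess_encoding(line: bytes) -> str:
    """Classify a line as 'ascii', 'utf-8', or 'not utf-8 (possibly iso-8859-1)'."""
    try:
        line.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        line.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Any byte sequence decodes as ISO 8859-1, so this is only a guess.
        return "not utf-8 (possibly iso-8859-1)"


# Hypothetical sample: a comment line with an ISO 8859-1 encoded "é",
# followed by a made-up pure-ASCII data line.
sample = "# From Jos\u00e9 Miguel Garrido\nRule Example 2008 only\n".encode("iso-8859-1")

for lineno, line in find_non_ascii(sample):
    print(lineno, guess_encoding(line), line)
```

To scan a real file, one would read it in binary mode (open(path, "rb").read()) and pass the bytes to find_non_ascii.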
If more non-ASCII characters are going to be included in the tzdata
files, I would like to propose defining UTF-8 as their official
encoding. UTF-8 is widely supported and is a true superset of 7-bit
ASCII, so it does not change the encoding of the actual data. I think it
is only a matter of time until the name of a contributor, a location, or
an official publication cannot be properly represented in any single
8-bit encoding. For example, the letter "r" in my surname should really
be "ř", "Latin Small Letter R With Caron" (U+0159), which is not part of
ISO 8859-1.
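Both claims are easy to verify. The following Python sketch is only an illustration: the helper can_encode and the sample data line are made up for this purpose and are not taken from tzdata. It shows that an ASCII-only line has identical bytes in ASCII, ISO 8859-1, and UTF-8, and that U+0159 can be encoded in UTF-8 but not in ISO 8859-1.

```python
import unicodedata


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A made-up, ASCII-only line standing in for actual tzdata content.
data_line = "Rule Example 2008 only - Jan 1 0:00 0 -"

# Pure ASCII text yields the same bytes in all three encodings,
# so declaring UTF-8 would not change the data lines at all.
same_bytes = (
    data_line.encode("ascii")
    == data_line.encode("iso-8859-1")
    == data_line.encode("utf-8")
)
print(same_bytes)

r_caron = "\u0159"
print(unicodedata.name(r_caron))          # LATIN SMALL LETTER R WITH CARON
print(can_encode(r_caron, "iso-8859-1"))  # False
print(can_encode(r_caron, "utf-8"))       # True
print(r_caron.encode("utf-8"))            # b'\xc5\x99'
```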