Non-ASCII encoding

Julian Cable julian.cable at bbc.co.uk
Tue Jul 1 08:53:49 UTC 2008


-On [20080701 09:30], Martin Jerabek (martin.jerabek at isis-papyrus.com) wrote:
>If more non-ASCII characters are going to be included in the tzdata 
>files, I would like to propose to define UTF-8 as the official encoding 
>of the tzdata files.

 I'm a bit ambivalent on this one.

In principle, I agree. In practice, UTF-8 has at least one little quirk that has caused me problems:

Microsoft operating systems always start UTF-8 encoded files with a Byte Order Mark (BOM) (http://en.wikipedia.org/wiki/Byte_Order_Mark).

*nix-like operating systems never do (at least in my experience), and at least one Perl-based XML parser
running on Linux chokes on the BOM.
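
To make the quirk concrete, here is a minimal Python sketch of what a tolerant reader could do: detect the three-byte UTF-8 BOM (EF BB BF) at the start of the data and drop it before parsing. The function name strip_utf8_bom and the sample bytes are made up for illustration; nothing here is taken from the tz code or from the parser mentioned above.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample input: the same text with and without a BOM.
with_bom = codecs.BOM_UTF8 + b"# Zone data\n"
without_bom = b"# Zone data\n"

# Both inputs end up identical once the BOM is stripped.
cleaned = strip_utf8_bom(with_bom)
unchanged = strip_utf8_bom(without_bom)
```

Python's built-in "utf-8-sig" codec does the same job when decoding to text: it skips a leading BOM if there is one and otherwise behaves like plain "utf-8".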

So I have a practical preference for the 7-bit subset of UTF-8 with no BOM (of course I would never
dream of calling this ASCII ;)

If we go for UTF-8, can we be very firm about whether a BOM is required or prohibited, and please make sure it's
not permitted.
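
If a "no BOM" rule were adopted, it could be checked mechanically. The following is only a sketch under that assumption, in Python; the names check_encoding_policy and is_seven_bit are hypothetical and the sample inputs are invented. It reports a leading BOM and invalid UTF-8 as problems, and separately tells you whether the data stays inside the 7-bit subset.

```python
import codecs
from typing import List


def check_encoding_policy(data: bytes) -> List[str]:
    """Return a list of problems under an assumed 'UTF-8, no BOM' policy."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("leading UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        problems.append(f"invalid UTF-8 at byte {exc.start}")
    return problems


def is_seven_bit(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the 7-bit subset of UTF-8."""
    return all(byte < 0x80 for byte in data)


# Hypothetical sample inputs.
plain = b"Zone Europe/Vienna\n"
accented = "Z\u00fcrich\n".encode("utf-8")
bom_prefixed = codecs.BOM_UTF8 + plain
```

Here plain passes both checks, accented is valid UTF-8 but not 7-bit, and bom_prefixed is flagged for its BOM.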

Julian Cable
BBC World Service

http://www.bbc.co.uk/
					



