Non-ASCII encoding
Andy McDonald
andy_tz at stemhaus.com
Tue Jul 1 16:56:11 UTC 2008
Jonathan Leffler wrote:
>
> You've mis-characterized the problem. UTF-8 doesn't have the quirk --
> MS operating systems have the quirk. See:
> http://unicode.org/faq/utf_bom.html#BOM
>
> We can note one of the parting comments in the FAQ:
>
> A particular protocol (e.g. Microsoft conventions for .txt files) may
> require use of the BOM on certain Unicode data streams, such as files.
> When you need to conform to such a protocol, use a BOM.
>
> We can also note that none of the TZ data files are .txt files (because
> they do not have the extension .txt in the file name) - and therefore do
> not need the BOM. Or a tool can be provided that stuffs a UTF-8 BOM
> (bytes 0xEF 0xBB 0xBF in that sequence) at the start of the file,
> transferring it to the MS format.
I'm in complete agreement that UTF-8 without BOM is the 'correct' solution.
It's worth pointing out that MS Notepad correctly detects and renders
UTF-8/no BOM as UTF-8; there's just no way to stop it from writing a BOM
when a file is saved. Thus the only people likely to be affected by
UTF-8/no BOM are those who download tz files, open and save them in
Notepad, then pass these files to a BOM-unaware parser. That scenario
seems fairly unlikely, and really comes down to the user's choice of
'faulty' tools.
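
For what it's worth, both directions are only a few lines of code. Below is a minimal Python sketch, not anything from the tz distribution: strip_bom and add_bom are hypothetical helper names. The first makes a parser tolerant of a Notepad-saved file; the second is the sort of tool Jonathan describes, which prepends the bytes 0xEF 0xBB 0xBF.

```python
import codecs

# The UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF.
UTF8_BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def add_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(UTF8_BOM):
        return data
    return UTF8_BOM + data
```

A parser would call strip_bom on the raw bytes before decoding them as UTF-8. In Python, decoding with the "utf-8-sig" codec has the same effect, since it skips a leading BOM if one is present.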
Andy
--
FoxClocks