Non-ASCII encoding

Tue Jul 1 16:56:11 UTC 2008

Jonathan Leffler wrote:
> 
> You've mis-characterized the problem.  UTF-8 doesn't have the quirk -- 
> MS operating systems have the quirk.  See: 
> http://unicode.org/faq/utf_bom.html#BOM
> 
> We can note one of the parting comments in the FAQ:
> 
> A particular protocol (e.g. Microsoft conventions for .txt files) may 
> require use of the BOM on certain Unicode data streams, such as files. 
> When you need to conform to such a protocol, use a BOM.
> 
> We can also note that none of the TZ data files are .txt files (because 
> they do not have the extension .txt in the file name) - and therefore do 
> not need the BOM.  Or a tool can be provided that stuffs a UTF-8 BOM 
> (bytes 0xEF 0xBB 0xBF in that sequence) at the start of the file, 
> transferring it to the MS format.

I'm in complete agreement that UTF-8 without BOM is the 'correct' solution.

It's worth pointing out that MS Notepad correctly detects and renders 
UTF-8 without a BOM as UTF-8; there's just no way to stop it from writing 
a BOM when the file is saved. Thus the only people likely to be affected 
by BOM-less UTF-8 are those who download tz files, open and save them in 
Notepad, then pass the saved files to a BOM-unaware parser. That all seems 
fairly unlikely, and it really comes down to the user's choice of 'faulty' 
tools.
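
For what it's worth, both directions are only a few lines of code. Here is 
a minimal sketch in Python, not a proposal for the tz distribution itself: 
the helper names add_bom and strip_bom and the sample line are made up for 
illustration, and the only facts assumed are the three BOM bytes quoted 
above (0xEF 0xBB 0xBF), which Python exposes as codecs.BOM_UTF8.

```python
import codecs

# The UTF-8 BOM: bytes 0xEF 0xBB 0xBF, in that order.
BOM = codecs.BOM_UTF8


def add_bom(data: bytes) -> bytes:
    """Prefix data with a UTF-8 BOM unless it already starts with one."""
    return data if data.startswith(BOM) else BOM + data


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; otherwise return data unchanged."""
    return data[len(BOM):] if data.startswith(BOM) else data


# Hypothetical sample line containing a non-ASCII character.
sample = "# Zone\tEurope/Zürich\n".encode("utf-8")
with_bom = add_bom(sample)      # what a Notepad-style save would produce
restored = strip_bom(with_bom)  # what a BOM-unaware parser would want to see
```

A parser written in Python could also sidestep the problem by decoding with 
the "utf-8-sig" codec, which accepts input with or without a leading BOM.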

Andy

--
FoxClocks