[tz] "Standard byte order"

Fri Nov 4 20:03:58 UTC 2011

On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:

> Perhaps the documentation should say something like:
> 
> For the 16-bit value 0x3210, the bytes are 0x32 followed by 0x10.
> For the 32-bit value 0x76543210, the bytes are in the order
>   0x76 0x54 0x32 0x10    (is that the case?)

Yes, because the data are written in big-endian format, and that's how big-endian format works: the most significant byte comes first.  No need to worry about whether anything is in PDP-endian order, as it's not.

The documentation should use the term "big-endian", for the benefit of those who know what it means, and perhaps give details, for the benefit of those who don't.

> For the 64-bit value 0xFEDCBA9876543210, ...

	0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
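
A minimal sketch of the three byte sequences above, using Python's standard `int.to_bytes`. Python is used purely for illustration (the tz code itself is C), and `be_bytes` is a hypothetical helper name, not anything from the tz distribution:

```python
def be_bytes(value, width):
    """Return `value` as `width` bytes in big-endian order
    (most significant byte first). Illustrative helper only."""
    return value.to_bytes(width, byteorder="big")

b16 = be_bytes(0x3210, 2)              # 16-bit example from the thread
b32 = be_bytes(0x76543210, 4)          # 32-bit example from the thread
b64 = be_bytes(0xFEDCBA9876543210, 8)  # 64-bit example from the thread

# Print the 64-bit value one byte at a time, in file order.
print(" ".join(f"0x{b:02X}" for b in b64))
# prints: 0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
```

Reading is the reverse operation: `int.from_bytes(b64, byteorder="big")` gives back the original integer.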

> For character data, the bytes are written leftmost character 
> first, and sequentially as one would read them left to right in 
> English,

"Leftmost" in what sense?

If all strings in a time zone data file are required to be ASCII, then the "logical order" and display order as per

	http://unicode.org/reports/tr9/

are the same.

If not all strings in a time zone data file are required to be ASCII, then we should perhaps specify that they're UTF-8.  In the case of non-ASCII strings, they could potentially contain, for example, Arabic/Hebrew/Farsi/etc. text, in which case the "leftmost" character on the screen would, as I understand it, *not* necessarily be the first character in the string.

I think the display order is Not Our Problem, especially if we specify that strings are UTF-8 (so that they're Unicode and the Unicode Bidirectional Algorithm applies), so "left" and "right" need not and should not be used.
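
To illustrate why "leftmost" is the wrong word, here is a small sketch, assuming strings were UTF-8 (which is only a proposal in this thread, not something the format currently says). The Hebrew sample word is an arbitrary choice for the example; the bytes in the file follow logical order, while a bidi-aware display would put the first character on the right:

```python
import unicodedata

# Hebrew "shalom" in logical (reading) order, written with escapes so the
# source itself is not reordered by a bidi-aware editor.
s = "\u05e9\u05dc\u05d5\u05dd"

# Bytes as they would appear in a file: logical order, first character first.
encoded = s.encode("utf-8")
print(encoded.hex(" "))  # prints: d7 a9 d7 9c d7 95 d7 9d

# The first character in the string has bidi class "R" (right-to-left),
# so a display following the Unicode Bidirectional Algorithm places it
# at the right-hand end, not the left.
print(unicodedata.bidirectional(s[0]))  # prints: R
```

So the first byte in the file belongs to the character that appears rightmost on screen, which is why a specification in terms of "first" and "next" is safer than one in terms of "left" and "right".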

We *should*, however, specify what encoding is used - ASCII, meaning "any byte with the 8th (high-order) bit set is an error" and "the national-use positions of ISO 646 have the US characters", or "all strings are UTF-8", or something else.
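
As a rough sketch of what the two candidate rules would mean for a reader of the file: the function names below are hypothetical, and neither check is something the tz code is known to perform. The ISO 646 national-use point cannot be tested from the bytes alone; it is a statement about how bytes such as 0x23 or 0x5C are to be interpreted.

```python
def is_strict_ascii(data: bytes) -> bool:
    """Rule 1: any byte with the 8th (0x80) bit set is an error."""
    return all(b < 0x80 for b in data)

def is_valid_utf8(data: bytes) -> bool:
    """Rule 2: all strings are UTF-8; reject malformed sequences."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

abbrev = b"EST"                         # plain ASCII abbreviation
latin1 = b"\xe9"                        # e-acute in ISO 8859-1: not ASCII, not UTF-8
utf8 = "\u00e9".encode("utf-8")         # the same character encoded as UTF-8

for sample in (abbrev, latin1, utf8):
    print(sample, is_strict_ascii(sample), is_valid_utf8(sample))
```

Every strict-ASCII string is also valid UTF-8, so choosing UTF-8 would not invalidate existing ASCII-only data.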

> and terminated with a null byte.