[tz] "Standard byte order"
Guy Harris
guy at alum.mit.edu
Fri Nov 4 20:03:58 UTC 2011
On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:
> Perhaps the documentation should say something like:
>
> For the 16-bit value 0x3210, the bytes are 0x32 followed by 0x10.
> For the 32-bit value 0x76543210, the bytes are in the order
> 0x76 0x54 0x32 0x10 (is that the case?)
Yes. The data are written in big-endian format, which puts the most significant byte first, so that is exactly the order you get. There's no need to worry about PDP-endian (middle-endian) order, as nothing in the file uses it.
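For readers unfamiliar with the term, here is a minimal C sketch of what "big-endian" means for the 16-bit and 32-bit examples above. The helper names put_be16 and put_be32 are hypothetical, not part of any tz code. Because the bytes are produced with shifts rather than by copying memory, the output is the same on any host, whatever that host's native byte order.

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value most-significant byte first. */
static void put_be16(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);   /* high-order byte first */
    out[1] = (unsigned char)(v & 0xFF); /* low-order byte last */
}

/* Hypothetical helper: store a 32-bit value most-significant byte first. */
static void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);  /* bits 31..24 */
    out[1] = (unsigned char)(v >> 16);  /* bits 23..16 */
    out[2] = (unsigned char)(v >> 8);   /* bits 15..8  */
    out[3] = (unsigned char)v;          /* bits 7..0   */
}
```

With these, put_be16(buf, 0x3210) yields 0x32 0x10, and put_be32(buf, 0x76543210) yields 0x76 0x54 0x32 0x10, matching the orders described in the quoted text.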
The documentation should use the term "big-endian", for the benefit of those who know what it means, and perhaps give details, for the benefit of those who don't.
> For the 64-bit value 0xFEDCBA9876543210, ...
0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
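Going the other way, a reader reassembles the value by shifting each byte in from the low end, so the first byte read ends up as the most significant. This is a minimal C sketch; get_be64 is a hypothetical name. It returns the raw unsigned value, and any signed interpretation of the result is left out here.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit value from 8 bytes stored
   most-significant byte first, independent of host byte order. */
static uint64_t get_be64(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];  /* earlier bytes move toward the high end */
    return v;
}
```

Fed the eight bytes 0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10, it returns 0xFEDCBA9876543210.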
> For character data, the bytes are written leftmost character
> first, and sequentially as one would read them left to right in
> English,
"Leftmost" in what sense?
If all strings in a time zone data file are required to be ASCII, then the "logical order" and the display order, as per http://unicode.org/reports/tr9/, are the same.
If not all strings in a time zone data file are required to be ASCII, then we should perhaps specify that they're UTF-8. In the case of non-ASCII strings, they could potentially contain, for example, Arabic/Hebrew/Farsi/etc. text, in which case the "leftmost" character on the screen would, as I understand it, *not* necessarily be the first character in the string.
I think the display order is Not Our Problem, especially if we specify that strings are UTF-8 (so that they're Unicode and the Unicode Bidirectional Algorithm applies), so "left" and "right" need not and should not be used.
We *should*, however, specify what encoding is used: ASCII (meaning "any byte with the 8th bit set is an error" and "the national-use positions of ISO 646 have the US characters"), or "all strings are UTF-8", or something else.
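Under the strict-ASCII option, the "8th bit set is an error" rule is easy for a reader to enforce. This is a minimal C sketch with a hypothetical function name, is_strict_ascii. It checks only the byte-level rule; the ISO 646 national-use rule is about how the remaining bytes are interpreted, so no byte scan can verify it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check for the strict-ASCII option: any byte with the
   8th (high-order) bit set makes the string invalid. */
static bool is_strict_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return false;  /* byte >= 0x80: not ASCII */
    return true;
}
```

A UTF-8-encoded "é" (0xC3 0xA9) fails this check. That is the behavior the ASCII option asks for, and it is also why allowing UTF-8 would require a different validation rule.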
> and terminated with a null byte.
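If strings are null-terminated inside a block of data of known size, a careful reader should confirm that the null byte actually occurs inside that block before treating the bytes as a C string. This is a minimal C sketch, assuming the strings are packed into a byte buffer whose length is known. The function name nul_terminated_len is hypothetical, and the sample strings in the usage note are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the null-terminated string
   starting at buf[off], or -1 if no null byte occurs before buf[len]. */
static long nul_terminated_len(const unsigned char *buf, size_t len, size_t off)
{
    if (off >= len)
        return -1;                         /* offset outside the buffer */
    const void *nul = memchr(buf + off, '\0', len - off);
    if (nul == NULL)
        return -1;                         /* unterminated: malformed data */
    return (long)((const unsigned char *)nul - (buf + off));
}
```

For a buffer holding "EST\0EDT\0", offsets 0 and 4 both give a length of 3. A buffer without a trailing null byte gives -1 rather than letting the reader run past the end.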