[tz] "Standard byte order"
dave at davecantor.us
Fri Nov 4 20:32:24 UTC 2011
On 04-Nov-2011, Guy Harris wrote:
> On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:
> > Perhaps the documentation should say something like:
> > For the 16-bit value 0x3210, the bytes are 0x32 followed by
> > 0x10. For the 32-bit value 0x76543210, the bytes are in the
> > order
> > 0x76 0x54 0x32 0x10 (is that the case?)
> Yes, because the data are written in big-endian format, and that's
> how big-endian format works. No need to worry about whether
> anything is in PDP-endian order, as it's not.
Yes, I was just amplifying on what that format looks like
_sometimes_. I agree that it doesn't come into play here.
> The documentation should use the term "big-endian", for the
> benefit of those who know what it means, and perhaps give details,
> for the benefit of those who don't.
Yes, that's a good idea.
> > For the 64-bit value 0xFEDCBA9876543210, ...
> 0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
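To make the byte sequences above concrete, here is a minimal Python sketch that reproduces them. The helper names `big_endian_bytes` and `hex_list` are made up for illustration and are not part of the tz code; the sketch only assumes that values are written most significant byte first, as stated above.

```python
def big_endian_bytes(value: int, width: int) -> bytes:
    """Return `value` as `width` bytes, most significant byte first."""
    return value.to_bytes(width, byteorder="big")


def hex_list(data: bytes) -> str:
    """Format bytes as space-separated 0xNN tokens, in stored order."""
    return " ".join(f"0x{b:02X}" for b in data)


# The three example values from this thread (16-, 32- and 64-bit).
print(hex_list(big_endian_bytes(0x3210, 2)))
print(hex_list(big_endian_bytes(0x76543210, 4)))
print(hex_list(big_endian_bytes(0xFEDCBA9876543210, 8)))
```

Each line printed lists the bytes in the order they would appear in the file, which matches the sequences quoted above.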
> > For character data, the bytes are written leftmost character
> > first, and sequentially as one would read them left to right in
> > English,
> "Leftmost" in what sense?
Leftmost in the sense of reading them (or writing them) in
English; of course, I agree that if they were Hebrew characters,
you'd expect the first character to be the rightmost and proceed
leftward as they would be written. Do we have character strings
that are not in English? Don't we use the English names for
place names (zone names)?
> If all strings in a time zone data file are required to be ASCII,
> then the "logical order" and display order as per
> are the same.
> If not all strings in a time zone data file are required to be
> ASCII, then we should perhaps specify that they're UTF-8. In the
> case of non-ASCII strings, they could potentially contain, for
> example, Arabic/Hebrew/Farsi/etc. text, in which case the
> "leftmost" character on the screen would, as I understand it,
> *not* necessarily be the first character in the string.
> I think the display order is Not Our Problem, especially if we
> specify that strings are UTF-8 (so that they're Unicode and the
> Unicode Bidirectional Algorithm applies), so "left" and "right"
> need not and should not be used.
> We *should*, however, specify what encoding is used - ASCII,
> meaning "any byte with the 8th bit set is an error" and "the
> national-use positions of ISO 646 have the US characters", or "all
> strings are UTF-8", or something else.
I agree with all that.
> > and terminated with a null byte.
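As a rough illustration of the two encoding choices discussed above (strict ASCII, where any byte with the 8th bit set is an error, versus UTF-8), here is a small Python sketch of reading a null-terminated string. The function name `read_terminated_string` and its `allow_utf8` flag are hypothetical and do not come from the tz code; the sketch also assumes the string is stored in logical order, first character first.

```python
def read_terminated_string(buf: bytes, start: int = 0,
                           allow_utf8: bool = False) -> str:
    """Read a null-terminated string from `buf`, beginning at `start`.

    With allow_utf8=False, any byte with the 8th bit set is rejected.
    With allow_utf8=True, the bytes are decoded as strict UTF-8.
    """
    # bytes.index raises ValueError if there is no terminating null byte.
    end = buf.index(b"\x00", start)
    raw = buf[start:end]
    if allow_utf8:
        # Raises UnicodeDecodeError (a ValueError) on malformed UTF-8.
        return raw.decode("utf-8")
    if any(b & 0x80 for b in raw):
        raise ValueError("byte with the 8th bit set in an ASCII-only string")
    return raw.decode("ascii")


# Two abbreviations stored back to back, each with its own terminator.
data = b"EST\x00EDT\x00"
print(read_terminated_string(data))      # first string, at offset 0
print(read_terminated_string(data, 4))   # second string, at offset 4
```

Note that nothing here refers to "left" or "right": the first byte read is the first character in logical order, and how a renderer lays out non-ASCII text on screen is a separate matter.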