Time Zone Localizations (encodings)
Oscar van Vlijmen
ovv at hetnet.nl
Sat Jun 12 15:24:18 UTC 2004
Paul Eggert wrote:
> * At the HTML level the document specifies charset=windows-1252 and
> the four HTML lines containing non-ASCII characters didn't come out
> right in my browser. Can you please use plain ASCII instead, e.g.,
> use "&laquo;" instead of the Windows-1252 byte that means
> left-pointing double angle quotation mark? Or it may be simpler to
> reformulate the examples to avoid non-ASCII characters.
The page gave no problems in a couple of my browsers, for instance Macintosh
Netscape 4 and Internet Explorer 5.
The "offending" characters were encoded correctly, namely as literals
according to the character set windows-1252.
Another way to encode the "left-pointing double angle quotation mark" is
indeed by "&laquo;".
A serious problem with this type of "named entities" is that there are 2451
named entities defined in the ISO/IEC DTR 9573-13 2nd Ed. standard
(or a more recent version)
but only about one hundred of these are really supported by most browsers
from version 4 onwards (Netscape/Microsoft) on most platforms.
Much better supported are numbered entities like &#xAB; (hexadecimal) or
&#171; (decimal). Decimal entities are preferred due to a somewhat better
compatibility with older browsers and because you can copy them directly
from most DTD entity definition files - e.g. the *.ent files from w3.org.
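As a small illustration, the sketch below checks that the named, hexadecimal
and decimal notations all stand for the same character, U+00AB, which is also
what the windows-1252 byte 0xAB decodes to. Python and its standard html
module are my own choice for the example, and the variable names are only
illustrative; what a given browser actually supports is a separate matter.

```python
import html

# Three entity notations for the left-pointing double angle quotation mark.
named = html.unescape("&laquo;")        # named entity
hexadecimal = html.unescape("&#xAB;")   # numbered entity, hexadecimal
decimal = html.unescape("&#171;")       # numbered entity, decimal

# The same character written as a windows-1252 literal byte.
cp1252_literal = b"\xab".decode("cp1252")

# All four should be the single Unicode character U+00AB.
all_equal = named == hexadecimal == decimal == cp1252_literal == "\u00ab"
print(all_equal)
```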
If you have to use characters outside the defined character set - in this
case windows-1252 - then hexadecimal/decimal entities are obligatory. In
that case a utf-8 character set designation for the html document is
advised, but this means that all windows-1252 literals should be reencoded
to numbered entities according to Unicode positions.
In short: for multilingual html pages one should use a utf-8 character set
designation (per meta or http-header) and encode all special characters with
numbered entities according to Unicode positions.
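A minimal sketch of such a re-encoding, assuming the source text really is
windows-1252 and that Python is available (neither the language nor the
function name to_numbered_entities comes from the page under discussion).
It decodes the bytes to Unicode and then writes every non-ASCII character
as a decimal numbered entity, so the result is plain ASCII and reads the
same under a utf-8 designation.

```python
def to_numbered_entities(raw: bytes) -> bytes:
    """Re-encode windows-1252 bytes as ASCII with decimal numbered entities.

    Hypothetical helper for illustration only. Bytes that windows-1252
    leaves undefined (e.g. 0x81) raise UnicodeDecodeError.
    """
    text = raw.decode("cp1252")  # literal bytes -> Unicode characters
    # 'xmlcharrefreplace' writes each non-ASCII character as &#NNN;
    # where NNN is the decimal Unicode position.
    return text.encode("ascii", "xmlcharrefreplace")


# Angle quotes keep their byte value as Unicode position (0xAB = 171).
quoted = to_numbered_entities(b"\xabtz\xbb")
# The euro sign does not: byte 0x80 becomes Unicode position 8364.
euro = to_numbered_entities(b"\x80")
print(quoted, euro)
```

The euro sign case shows why the entities must follow Unicode positions
rather than the windows-1252 byte values.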
Oscar van Vlijmen
Sorry, but I don't Cc emails to discussion partners personally when an
email to the tz list should be sufficient.