Time Zone Localizations (encodings)

Oscar van Vlijmen ovv at hetnet.nl
Sat Jun 12 15:24:18 UTC 2004


Paul Eggert wrote:

> * At the HTML level the document specifies charset=windows-1252 and
> the four HTML lines containing non-ASCII characters didn't come out
> right in my browser.  Can you please use plain ASCII instead, e.g.,
> use "&laquo;" instead of the Windows-1252 byte that means
> left-pointing double angle quotation mark?  Or it may be simpler to
> reformulate the examples to avoid non-ASCII characters.

The page gave no problems in a couple of my browsers, for instance Macintosh
Netscape 4 and Internet Explorer 5.
The "offending" characters were encoded correctly, namely as literals
according to the character set windows-1252.

Another way to encode the "left-pointing double angle quotation mark" is
indeed by "&laquo;".
A serious problem with "named entities" of this type is that 2451 named
entities are defined in the ISO/IEC DTR 9573-13 2nd Ed. standard
<http://www.w3.org/2003/entities/iso9573-2003doc/9573.html>
(or a more recent version),
but only about one hundred of these are really supported by most browsers
from version 4 onwards (Netscape/Microsoft) on most platforms.

Much better supported are numbered entities like &#xab; (hexadecimal) or
&#171; (decimal). Decimal entities are preferred because of somewhat better
compatibility with older browsers and because you can copy them directly
from most DTD entity definition files, e.g. the *.ent files from w3.org.
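
To make the three notations concrete, here is a minimal sketch in Python.
The language is chosen only for illustration, and the helper name
numbered_entities is hypothetical. It uses nothing beyond the standard
html module and the built-in ord().

```python
import html

def numbered_entities(ch):
    """Return the (decimal, hexadecimal) numbered entities for one character."""
    code = ord(ch)  # Unicode position of the character
    return "&#%d;" % code, "&#x%x;" % code

# Left-pointing double angle quotation mark, U+00AB.
decimal, hexadecimal = numbered_entities("\u00ab")

# The named, decimal and hexadecimal notations decode to the same character.
same = (html.unescape("&laquo;") == html.unescape(decimal)
        == html.unescape(hexadecimal) == "\u00ab")
```

Here decimal holds the form recommended above and hexadecimal the
alternative; whether a given old browser accepts the hexadecimal form is
exactly the compatibility question raised in the text.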

If you have to use characters outside the defined character set - in this
case windows-1252 - then hexadecimal/decimal entities are obligatory. In
that case a utf-8 character set designation for the html document is
advised, but this means that all windows-1252 literals should be reencoded
to numbered entities according to Unicode positions.

In short: for multilingual html pages one should use a utf-8 character set
designation (via a meta element or http header) and encode all special
characters with numbered entities according to Unicode positions.
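
The re-encoding step could be sketched roughly as follows, again in Python
and only as an illustration. The function name cp1252_to_ascii_html and the
sample bytes are made up for this example; the sketch assumes the input
really is windows-1252 and leaves existing markup characters such as "&"
and "<" untouched.

```python
def cp1252_to_ascii_html(raw):
    """Decode windows-1252 bytes and return ASCII bytes with decimal entities."""
    text = raw.decode("cp1252")  # byte values -> Unicode characters
    # 'xmlcharrefreplace' writes every non-ASCII character as &#NNN; (decimal).
    return text.encode("ascii", "xmlcharrefreplace")

# 0xAB and 0xBB are the double angle quotation marks in windows-1252;
# 0x80 is the euro sign, which sits at Unicode position U+20AC (8364).
sample = b"\xabtz\xbb costs \x80 5"
converted = cp1252_to_ascii_html(sample)
```

The euro sign shows why the Unicode position matters: its windows-1252
byte value is 128, but the numbered entity has to be &#8364;. The charset
designation of the page still has to be changed separately.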


Oscar van Vlijmen
2004-06-12

Sorry, but I don't cc emails to discussion partners personally when an
email to the tz list should be sufficient.


