[arabic-vip] normalization text

Andrew Sullivan ajs at anvilwalrusden.com
Tue Oct 4 13:41:23 UTC 2011


Some of these cases are addressed automatically through the IDNA 2008
protocol specifications, which require the characters to be normalized
(in Normalization Form C) and treat them using Unicode's definition of
Canonical Equivalence.  However, it appears that some precomposed
characters do not have Canonical Equivalence defined to a series of
code points that produces the same apparent (abstract) character.
Where such Canonical Equivalence has not been defined by Unicode (for
various reasons), the affected characters would need to be explicitly
managed as variants.  See the status listed in the final column of
Appendix A.2 for some examples of this.
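
To make the distinction concrete, here is a minimal sketch (not part
of the proposed text) of how one might check whether a precomposed
character and a base-plus-mark sequence are canonically equivalent.
It uses Python's standard unicodedata module; the helper names are my
own, and the answers depend on the Unicode version the interpreter
ships with.  U+0681 is included only as a candidate to inspect.

```python
import unicodedata


def canonical_decomposition(ch: str) -> list[int]:
    """Return the canonical decomposition of ch as code points, or [] if none."""
    raw = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<isolated>"; they are
    # not canonical, so they are treated here as "no decomposition".
    if not raw or raw.startswith("<"):
        return []
    return [int(part, 16) for part in raw.split()]


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFC forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+0623 ALEF WITH HAMZA ABOVE against U+0627 ALEF + U+0654 HAMZA ABOVE:
# Unicode defines these as canonically equivalent, so NFC handles them.
print(canonically_equivalent("\u0623", "\u0627\u0654"))

# A candidate to inspect: if this prints an empty list, the precomposed
# letter has no canonical decomposition and would need variant handling.
print(canonical_decomposition("\u0681"))
```

A precomposed character for which canonical_decomposition() returns an
empty list is one that NFC will never unify with a visually identical
base-plus-mark sequence.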


In addition, near the end of section 5, there's this:

    A general rule may be extracted that combining marks are not
    allowed for TLDs.  However, before finalizing such a policy,
    consequences for African and other languages, which use these
    marks (especially U+065A - U+065F), should also be considered.

I'd adjust it to this:

    A general rule may be extracted that combining marks should not be
    allowed for TLDs.  However, before finalizing such a policy,
    consequences for African and other languages, which use these
    marks (especially U+065A - U+065F), should also be considered.  In
    addition, the team did not have time to perform an exhaustive
    analysis of Unicode's Canonical Equivalence to find any effect
    that depends on any of these characters.  If a character in NFC
    ends up depending on one of the combining marks, then it might be
    that the code point needs to be permitted in some circumstances.
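
For what it's worth, the exhaustive analysis mentioned above could be
approximated mechanically.  The following is only a sketch under my
own reading of "depending on": it lists every code point whose full
canonical decomposition (NFD) contains one of the marks.  The function
name and the choice of NFD are assumptions, and the output reflects
whatever Unicode version the Python build carries.

```python
import sys
import unicodedata

# The marks singled out in the text: U+065A through U+065F.
MARKS = set(range(0x065A, 0x0660))


def dependents(marks: set[int]) -> list[int]:
    """Code points whose canonical decomposition contains any of the marks."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        # Unchanged under NFD means there is no canonical decomposition.
        if nfd == ch:
            continue
        if any(ord(c) in marks for c in nfd):
            found.append(cp)
    return found


for cp in dependents(MARKS):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

Passing a different set, for example {0x0654}, runs the same check for
other combining marks.  Note that this only covers single code points;
it says nothing about base-plus-mark sequences that have no
precomposed form at all.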

Does this help?

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com


