[arabic-vip] normalization text
Andrew Sullivan
ajs at anvilwalrusden.com
Tue Oct 4 13:41:23 UTC 2011
Some of these cases are addressed automatically through the IDNA 2008
protocol specifications, which require characters to be normalized
(in Normalization Form C) and treat them using Unicode's definition of
Canonical Equivalence. However, it appears that in some cases
precomposed characters do not have a Canonical Equivalence defined to a
sequence of code points that produces the same apparent (abstract)
character. Where Unicode has not defined such a Canonical Equivalence
(for various reasons), these cases would need to be
explicitly managed as variants. See the status listed in the final
column of Appendix A.2 for some examples of this.
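The distinction can be checked with Python's standard `unicodedata` module. The following is a minimal sketch. The Arabic code points are illustrative picks, not entries from Appendix A.2, and `needs_variant` is a hypothetical helper name. The sketch assumes that "canonically equivalent" means "identical after NFC", which is how Unicode defines it.

```python
import unicodedata

# Illustrative pairs (not from Appendix A.2): a precomposed code point and
# a decomposed sequence that renders as the same abstract character.
PAIRS = {
    "ALEF WITH MADDA ABOVE": ("\u0622", "\u0627\u0653"),
    "ALEF WITH HAMZA ABOVE": ("\u0623", "\u0627\u0654"),
    "HAH WITH HAMZA ABOVE": ("\u0681", "\u062D\u0654"),
}


def nfc_equivalent(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent, i.e. equal after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def needs_variant(precomposed: str, sequence: str) -> bool:
    """Hypothetical helper: the forms look alike, but NFC does not unify them,
    so IDNA 2008 normalization alone will not treat them as the same."""
    return not nfc_equivalent(precomposed, sequence)


if __name__ == "__main__":
    for name, (pre, seq) in PAIRS.items():
        print(f"{name}: needs variant = {needs_variant(pre, seq)}")
```

With these inputs, NFC unifies the first two pairs because U+0622 and U+0623 have canonical decompositions. It does not unify the third because U+0681 has no decomposition at all. Only the HAH WITH HAMZA ABOVE case would therefore have to be handled explicitly as a variant.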
In addition, near the end of section 5, there's this:
A general rule may be extracted that combining marks are not
allowed for TLDs. However, before finalizing such a policy,
consequences for African and other languages, which use these
marks (especially U+065A - U+065F), should also be considered.
I'd adjust it to this:
A general rule may be extracted that combining marks should not be
allowed for TLDs. However, before finalizing such a policy,
consequences for African and other languages, which use these
marks (especially U+065A - U+065F), should also be considered. In
addition, the team did not have time to perform an exhaustive
analysis of Unicode's Canonical Equivalence to determine whether
any equivalence depends on any of these characters. If a character
in NFC form turns out to depend on one of these combining marks,
then the code point might need to be permitted in some circumstances.
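The exhaustive check mentioned above can be sketched in Python. This minimal version assumes that "a character in NFC form depends on a mark" means "the character's full canonical decomposition (NFD) contains that mark". The helper `precomposed_depending_on` is a hypothetical name. The scan only reflects the Unicode version bundled with the running Python, which is reported by `unicodedata.unidata_version`. It is not a substitute for reviewing the Unicode data files directly.

```python
import sys
import unicodedata


def precomposed_depending_on(lo: int, hi: int) -> list[int]:
    """Return code points whose full canonical decomposition (NFD) differs
    from the code point itself and contains a code point in [lo, hi]."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters; skip them
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if nfd != ch and any(lo <= ord(c) <= hi for c in nfd):
            hits.append(cp)
    return hits


if __name__ == "__main__":
    print(f"Unicode data version: {unicodedata.unidata_version}")
    for cp in precomposed_depending_on(0x065A, 0x065F):
        parts = " ".join(
            f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", chr(cp))
        )
        print(f"U+{cp:04X} {unicodedata.name(chr(cp), '?')} -> {parts}")
```

An empty result for U+065A - U+065F would suggest that, at least for that Unicode version, no precomposed character relies on those marks. A non-empty result would list the candidates that might need to be permitted. Running the same function over U+0653 - U+0655 is a useful sanity check, since it should report characters such as U+0622, U+0623, and U+0625.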
Does this help?
A
--
Andrew Sullivan
ajs at anvilwalrusden.com