[arabic-vip] Some (ignorant) questions about particular code points (was: Singapore)

Andrew Sullivan ajs at anvilwalrusden.com
Tue Jun 14 18:06:56 UTC 2011


Dear colleagues,

Speaking as someone who understands IDNA and DNS quite well, but who
does not speak Arabic, I have a couple of questions about the items below.

On Tue, Jun 14, 2011 at 10:18:17PM +0500, Dr. Sarmad Hussain wrote:
> 
> Thanks for your comments.  Yes, the optional combining marks refer to
> diacritics or aerab.

> From: Manal Ismail [mailto:manal at tra.gov.eg] 

> - Technical
>   o Required combining marks - extra Unicode normalization
>   o Optional combining marks -
>   o Joining characters - ZWNJ
> 
> Manal: Thanks for the useful summary. Does 'Optional combining marks' in
> bullet 2 above refer to 'Diacritics'?

IDNA2008 introduced the CONTEXT rules for permissibility in the
protocol.  It also relies on Unicode, and in particular a character is
not allowed if it isn't stable under Unicode Normalization Form KC
(NFKC).  I understand that some combining marks are likely to be
eliminated by the NFKC stability rule, but I don't really have an idea
of how many cases are left after the stability rule is invoked.  It
would be very helpful to me to have such an idea.
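
As a rough way to start counting, the stability test can be sketched in
Python. This is only an approximation of the "Unstable" category in RFC
5892 section 2.2 (NFKC, case fold, NFKC again), it uses whatever Unicode
version ships with the interpreter, and it ignores every other IDNA2008
rule (exceptions, CONTEXT rules, and so on). The names is_unstable and
surviving_marks are made up for illustration.

```python
import unicodedata


def is_unstable(ch):
    """Approximate the RFC 5892 'Unstable' test for one code point:
    the character changes under NFKC(casefold(NFKC(ch)))."""
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


def surviving_marks(start, end):
    """Return combining marks (General_Category Mn or Mc) in the
    inclusive range [start, end] that pass the stability test."""
    marks = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Mn", "Mc") and not is_unstable(ch):
            marks.append(ch)
    return marks


if __name__ == "__main__":
    # List the marks in the basic Arabic block that survive the test.
    for ch in surviving_marks(0x0600, 0x06FF):
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch, "?")))
```

Running the same loop over the Arabic presentation-form blocks would show
which compatibility forms are dropped by this rule alone.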

Similarly, ZWNJ is ruled out except in the case of a CONTEXTJ rule.
What I have not understood are the linguistic effects of the rule (see
Appendix A section 1 of RFC 5892:
http://www.rfc-editor.org/rfc/rfc5892.txt).  If someone could clue me
in, that would be a big help.
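
For reference, here is my reading of that rule as a Python sketch: ZWNJ
is allowed right after a virama, or between a left/dual-joining
character and a right/dual-joining character, with transparent
characters permitted on either side. Python's unicodedata does not
expose Joining_Type, so JOINING_TYPE below is a small hand-picked table
that I am assuming for illustration, and the "T" fallback (any Mn, Me or
Cf character) is an approximation; a real check would read
ArabicShaping.txt from the Unicode Character Database.

```python
import unicodedata

ZWNJ = "\u200c"
ZWJ = "\u200d"
VIRAMA_CCC = 9  # Canonical_Combining_Class value for Virama

# Illustrative subset only; NOT the full Joining_Type data.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062f": "R",  # ARABIC LETTER DAL
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
}


def joining_type(ch):
    """Look up the table; otherwise treat marks and format characters
    as transparent (T) and everything else as non-joining (U)."""
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    if ch not in (ZWNJ, ZWJ) and unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return "T"
    return "U"


def zwnj_allowed(label, i):
    """Apply the two conditions of RFC 5892 Appendix A.1 to label[i]."""
    if label[i] != ZWNJ:
        raise ValueError("label[i] is not ZWNJ")
    # Condition 1: the preceding character is a virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True
    # Condition 2: (L|D) T* ZWNJ T* (R|D)
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")
```

Under this reading, BEH + ZWNJ + ALEF passes (the two letters would
otherwise join), while ALEF + ZWNJ + BEH fails, because ALEF does not
join to the following letter and the ZWNJ would change nothing visible.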

Obviously, I'm not expecting a quick answer; I'm rather hoping to
highlight an issue that I can see for implementation, especially at
the top level.

Thanks and best regards,

Andrew

-- 
Andrew Sullivan
ajs at anvilwalrusden.com


