[arabic-vip] Some (ignorant) questions about particular code points (was: Singapore)

Dr. Sarmad Hussain sarmad at cantab.net
Tue Jun 14 18:48:54 UTC 2011


Dear Andrew,

Optional marks are those that users may choose to write, but the base
strings are considered the same even without them.  The closest English
example I can think of, though not an exact parallel, is the diaeresis on
the 'i' in "naïve" vs. "naive".  English speakers consider the two
equivalent.  This is different from the normalization defined by Unicode,
under which the two spellings remain distinct.
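To make the contrast with Unicode normalization concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the "optional" marks are the Arabic tanween and harakat in the range U+064B..U+0652; the real list would be decided by the Arabic script community. The names OPTIONAL_MARKS, strip_optional_marks and same_base_string are hypothetical helpers, not part of any standard or library.

```python
import unicodedata

# Hypothetical set of "optional" marks, for illustration only: the Arabic
# tanween and harakat, U+064B..U+0652. Not an agreed-upon list.
OPTIONAL_MARKS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))

def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)

def strip_optional_marks(label: str) -> str:
    """Remove the assumed optional marks after NFKC normalization."""
    return "".join(ch for ch in nfkc(label) if ch not in OPTIONAL_MARKS)

def same_base_string(a: str, b: str) -> bool:
    """True if a and b differ only by the assumed optional marks."""
    return strip_optional_marks(a) == strip_optional_marks(b)

voweled = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, fatha on each letter
bare = "\u0643\u062A\u0628"                        # same word without marks

print(nfkc(voweled) == nfkc(bare))      # False: NFKC keeps them apart
print(same_base_string(voweled, bare))  # True: stripping marks equates them
```

The point of the sketch is that the equivalence is an extra, script-specific step layered on top of NFKC; normalization alone never removes these marks.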

ZWNJ is normally allowed between characters that would otherwise change from
a joining to a non-joining shape.  However, for some characters that do have
these two shapes, the shapes look very similar to each other, which causes
visual confusion.
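The first sentence above roughly corresponds to the ZWNJ CONTEXTJ rule in RFC 5892 Appendix A.1: ZWNJ is valid after a virama, or when it sits between a left- or dual-joining character and a right- or dual-joining character, with only transparent characters (such as harakat) in between. The Python sketch below assumes a deliberately tiny, hypothetical JOINING_TYPE table (a real implementation would load Unicode's ArabicShaping.txt) and approximates "transparent" as general category Mn, Me or Cf; zwnj_allowed is a hypothetical helper name.

```python
import unicodedata

ZWNJ = "\u200C"
ZWJ = "\u200D"
VIRAMA_CCC = 9  # Canonical_Combining_Class value for Virama

# Hypothetical, tiny Joining_Type table; real data lives in ArabicShaping.txt.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    "\u0628": "D",  # BEH
    "\u062A": "D",  # TEH
    "\u0643": "D",  # KAF
    "\u0647": "D",  # HEH
    "\u06CC": "D",  # FARSI YEH
}

def joining_type(ch: str) -> str:
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    # Approximation: marks and format characters are Transparent,
    # except ZWNJ and ZWJ themselves.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf") and ch not in (ZWNJ, ZWJ):
        return "T"
    return "U"

def zwnj_allowed(label: str, i: int) -> bool:
    """Sketch of RFC 5892 Appendix A.1 for the ZWNJ at label[i]."""
    if label[i] != ZWNJ:
        raise ValueError("label[i] is not ZWNJ")
    # Condition 1: the preceding character is a virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True
    # Condition 2: (L|D) T* ZWNJ T* (R|D)
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")

print(zwnj_allowed("\u0628\u200C\u0627", 1))        # True: BEH, ZWNJ, ALEF
print(zwnj_allowed("\u0627\u200C\u0628", 1))        # False: ALEF never joins left
print(zwnj_allowed("\u0915\u094D\u200C\u0937", 2))  # True: after Devanagari virama
```

Note that the rule only looks at joining types. It permits ZWNJ wherever a shape change is possible, whether or not the joined and non-joined shapes actually look different, so it cannot by itself address the visual-confusion cases described above.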

I am not sure I have explained these well here.  The documents and
presentation I circulated earlier include some examples in these
contexts. We will discuss them in further detail in Singapore.  

Please feel free to share any follow up thoughts.

Regards,
Sarmad


>  -----Original Message-----
>  From: arabic-vip-bounces at icann.org [mailto:arabic-vip-bounces at icann.org] On Behalf Of Andrew Sullivan
>  Sent: Tuesday, June 14, 2011 11:07 PM
>  To: arabic-vip at icann.org
>  Subject: [arabic-vip] Some (ignorant) questions about particular code points (was: Singapore)
>  
>  Dear colleagues,
>  
>  Speaking as someone who understands IDNA and DNS quite well, but who
>  does not speak Arabic, I have a couple of questions about the items below.
>  
>  On Tue, Jun 14, 2011 at 10:18:17PM +0500, Dr. Sarmad Hussain wrote:
>  >
>  > Thanks for your comments.  Yes, the optional combining marks refer to
>  > diacritics or aerab.
>  
>  > From: Manal Ismail [mailto:manal at tra.gov.eg]
>  
>  > -          Technical
>  >
>  > o   Required  combining marks - extra Unicode normalization
>  >
>  > o   Optional combining marks -
>  >
>  > o   Joining characters - ZWNJ
>  >
>  >
>  >
>  > Manal: Thanks for the useful summary.  Does 'Optional combining marks' in
>  > bullet 2 above refer to 'Diacritics'?
>  
>  IDNA2008 introduced the CONTEXT rules for permissibility in the
>  protocol.  It also relies on Unicode, and in particular a character is
>  not allowed if it isn't stable under Unicode Normalization Form KC
>  (NFKC).  I understand that some combining marks are likely to be
>  eliminated by the NFKC stability rule, but I don't really have an idea
>  of how many cases are left after the stability rule is invoked.  It
>  would be very helpful to me to have such an idea.
>  
>  Similarly, ZWNJ is ruled out except where its CONTEXTJ rule is satisfied.
>  What I have not understood are the linguistic effects of the rule (see
>  Appendix A.1 of RFC 5892:
>  http://www.rfc-editor.org/rfc/rfc5892.txt).  If someone could clue me
>  in, that would be a big help.
>  
>  Obviously, I'm not expecting a quick answer; rather, I'm hoping to
>  highlight an issue that I can foresee for implementation, especially at
>  the top level.
>  
>  Thanks and best regards,
>  
>  Andrew
>  
>  --
>  Andrew Sullivan
>  ajs at anvilwalrusden.com




