[arabic-vip] Some (ignorant) questions about particular code points (was: Singapore)

Dr. Sarmad Hussain sarmad at cantab.net
Wed Jun 15 03:56:01 UTC 2011


My English example is perhaps not accurate in depicting what happens in
Arabic script.  

More realistically, the optional marks indicate vowels or doubled
consonants and are normally not written in practice, i.e. "conveniently"
would normally be written as "cnvnntly" in the Arabic script.  The vowel
marks may or may not be written, and users consider the two spellings to
be the same.  Hence my comment that this is not the typical Unicode
normalization issue.
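
As a rough illustration of the point above (a sketch only, not taken from
the circulated documents), the following Python compares one word written
with and without vowel marks.  The helper name strip_optional_marks, the
choice of U+064B..U+0652 as the "optional" marks, and the sample word are
all assumptions made for this example; the actual set of optional marks
differs by language and would need to be agreed separately.

```python
import unicodedata

# Assumption for illustration: treat FATHATAN..SUKUN (U+064B..U+0652) as the
# "optional" marks. This is not an agreed list; it is only a sample range.
OPTIONAL_MARKS = {chr(cp) for cp in range(0x064B, 0x0652 + 1)}


def strip_optional_marks(text: str) -> str:
    """Return text with the assumed optional marks removed."""
    return "".join(ch for ch in text if ch not in OPTIONAL_MARKS)


# The same three base letters (kaf, teh, beh), first with a FATHA (U+064E)
# after each letter, then with no marks at all.
with_marks = "\u0643\u064E\u062A\u064E\u0628\u064E"
without_marks = "\u0643\u062A\u0628"

# NFKC leaves the marks in place, so normalization alone does not make the
# two spellings compare equal.
nfkc_equal = (
    unicodedata.normalize("NFKC", with_marks)
    == unicodedata.normalize("NFKC", without_marks)
)

# Removing the assumed optional marks makes the two spellings identical.
stripped_equal = (
    strip_optional_marks(with_marks) == strip_optional_marks(without_marks)
)

print("equal under NFKC:", nfkc_equal)
print("equal after stripping marks:", stripped_equal)
```

The two strings stay distinct under NFKC and only match once the marks are
removed, which is why this equivalence would have to be handled outside
Unicode normalization.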

Regards,
Sarmad

>  -----Original Message-----
>  From: Dr. Sarmad Hussain [mailto:sarmad at cantab.net]
>  Sent: Tuesday, June 14, 2011 11:49 PM
>  To: 'Andrew Sullivan'; 'arabic-vip at icann.org'
>  Subject: RE: [arabic-vip] Some (ignorant) questions about particular
>  code points (was: Singapore)
>  
>  Dear Andrew,
>  
>  The optional marks are those which users may choose to write, but the
>  base strings are considered the same even without them.  It is not an
>  exact parallel, but the closest example I can think of in English is
>  the diaeresis on the 'i' in "naïve" vs. "naive".  English speakers
>  consider the two equivalent.  This is different from the normalization
>  defined by Unicode.
>  
>  ZWNJ is normally allowed between characters whose shape would change
>  from joining to non-joining.  However, for some characters the joining
>  and non-joining shapes are very similar to each other, which causes
>  visual confusion.
>  
>  I am not sure if I have been able to explain these well here. The
>  documents/presentation I circulated earlier have some examples in
>  these contexts. We will discuss them in further detail in Singapore.
>  
>  Please feel free to share any follow up thoughts.
>  
>  Regards,
>  Sarmad
>  
>  
>  >  -----Original Message-----
>  >  From: arabic-vip-bounces at icann.org [mailto:arabic-vip-
>  >  bounces at icann.org] On Behalf Of Andrew Sullivan
>  >  Sent: Tuesday, June 14, 2011 11:07 PM
>  >  To: arabic-vip at icann.org
>  >  Subject: [arabic-vip] Some (ignorant) questions about particular
>  >  code points (was: Singapore)
>  >
>  >  Dear colleagues,
>  >
>  >  Speaking as someone who understands IDNA and DNS quite well, but who
>  >  does not speak Arabic, I have a couple of questions about the items
>  >  below.
>  >
>  >  On Tue, Jun 14, 2011 at 10:18:17PM +0500, Dr. Sarmad Hussain wrote:
>  >  >
>  >  > Thanks for your comments.  Yes, the optional combining marks refer
>  >  > to diacritics or aerab.
>  >
>  >  > From: Manal Ismail [mailto:manal at tra.gov.eg]
>  >
>  >  > -          Technical
>  >  >
>  >  > o   Required  combining marks - extra Unicode normalization
>  >  >
>  >  > o   Optional combining marks -
>  >  >
>  >  > o   Joining characters - ZWNJ
>  >  >
>  >  >
>  >  >
>  >  > Manal: Thanks for the useful summary .. Does 'Optional combining
>  >  > marks' in bullet 2 above refer to 'Diacritics'?
>  >
>  >  IDNA2008 introduced the CONTEXT rules for permissibility in the
>  >  protocol.  It also relies on Unicode, and in particular a character
>  >  is not allowed if it isn't stable under Unicode Normalization Form
>  >  KC (NFKC).  I understand that some combining marks are likely to be
>  >  eliminated by the NFKC stability rule, but I don't really have an
>  >  idea of how many cases are left after the stability rule is invoked.
>  >  It would be very helpful to me to have such an idea.
>  >
>  >  Similarly, ZWNJ is ruled out except in the case of a CONTEXTJ rule.
>  >  What I have not understood are the linguistic effects of the rule
>  >  (see Appendix A section 1 of RFC 5892:
>  >  http://www.rfc-editor.org/rfc/rfc5892.txt).  If someone could clue
>  >  me in, that would be a big help.
>  >
>  >  Obviously, I'm not expecting a quick answer; I'm rather hoping to
>  >  highlight an issue that I can see for implementation, especially at
>  >  the top level.
>  >
>  >  Thanks and best regards,
>  >
>  >  Andrew
>  >
>  >  --
>  >  Andrew Sullivan
>  >  ajs at anvilwalrusden.com




