[arabic-vip] Some (ignorant) questions about particular code points (was: Singapore)

Andrew Sullivan ajs at anvilwalrusden.com
Wed Jun 15 16:17:24 UTC 2011


Dear Dr Hussain,

Thanks very much for your reply.  I think this highlights one of the
central issues facing us, so I want to make sure I have this right.

On Tue, Jun 14, 2011 at 11:48:54PM +0500, Dr. Sarmad Hussain wrote:

> The optional marks are those which may be chosen to be written by the users,
> but the base strings are considered same even without them.  Though not the
> same but closest example I can think for English is the additional dot on
> 'i' in "naïve" vs. "naive".  The two are considered equivalent by English
> speakers.  This is different from the normalization defined under the
> Unicode.  

If I understand you correctly, the issue here is _not_ the case of
precomposed versus decomposed characters.  In the naïve example, it's
not the issue of using LATIN SMALL LETTER I WITH DIAERESIS, U+00EF, vs
LATIN SMALL LETTER I, U+0069, combined with COMBINING DIAERESIS,
U+0308.  That's the sort of issue that is handled (even if not to
everyone's satisfaction) with Unicode's NFKC.
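
To make that distinction concrete, here is a minimal sketch using
Python's standard unicodedata module.  The variable names are only
illustrative, and the decomposed form is assumed to use U+0308
COMBINING DIAERESIS.  It shows that normalization folds the two
encodings of "naïve" together, but does nothing to make "naive"
(with no mark at all) match either of them.

```python
import unicodedata

# Two encodings of the same visible string "naïve".
precomposed = "na\u00efve"   # U+00EF LATIN SMALL LETTER I WITH DIAERESIS
decomposed = "nai\u0308ve"   # U+0069 followed by U+0308 COMBINING DIAERESIS
plain = "naive"              # the same word written without the mark


def nfkc(text):
    """Return text in Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# Before normalization the two encodings are different code point sequences.
differ_raw = precomposed != decomposed

# Normalization makes the two encodings identical ...
same_after_nfkc = nfkc(precomposed) == nfkc(decomposed)

# ... but the unmarked spelling remains a different string.
plain_still_differs = nfkc(plain) != nfkc(precomposed)

print(differ_raw, same_after_nfkc, plain_still_differs)
```

NFC gives the same result for this particular pair; the point is only
that no normalization form treats the marked and unmarked spellings as
one string.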

Instead, the issue is of characters or sets of characters that a
competent speaker of the language would understand to be the same.
This is not (for instance) a spelling variation such as "color" vs
"colour" -- although it is perhaps related -- but instead a matter of
marks whose use is left entirely to the user's preference.  Some
style guides in English do indeed treat "naive" and "naïve" as a
simple matter of preference.

Is that right?  If it is, then something that would be very helpful
for me to understand is the scope of this problem (and again, I don't
expect this to be a quick answer in an email, but more likely one of
the key results of the Team's deliberations).  From a DNS point of
view, a U-label with an optional mark and a U-label without the
optional mark are just different labels.  So, (1) are there rules such
that one could generate all the permutations from a given label?  (2)
How many such permutations is it reasonable to expect?  (3) How
critical to a user's experience are the optional marks?  That is, if
they just never worked, would that be a disaster?
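
To illustrate why questions (1) and (2) matter, here is a rough
sketch in Python.  Everything in it is an assumption made for the
sake of illustration: the set of optional marks (three Arabic
short-vowel signs), the rule that any base character may be followed
by at most one of them, and the function names.  The real rules are
exactly what I am asking about, and they would presumably be more
restrictive.

```python
from itertools import product

# Hypothetical set of optional marks, chosen only for illustration.
OPTIONAL_MARKS = ("\u064E", "\u064F", "\u0650")  # FATHA, DAMMA, KASRA


def strip_optional(label, marks=OPTIONAL_MARKS):
    """Remove every optional mark, leaving only the base string."""
    return "".join(ch for ch in label if ch not in marks)


def variants(base, marks=OPTIONAL_MARKS):
    """Yield every label formed by following each base character
    with either no mark or exactly one mark (an assumed rule)."""
    choices = ("",) + tuple(marks)
    for combo in product(choices, repeat=len(base)):
        yield "".join(ch + mark for ch, mark in zip(base, combo))


base = "\u0643\u062A\u0628"  # ARABIC LETTERS KAF, TEH, BEH
all_variants = list(variants(base))

# With m marks and n base characters there are (m + 1) ** n variants.
print(len(all_variants))
```

Under these assumed rules a three-letter base label already has 64
variants, each of which is a distinct label as far as the DNS is
concerned, and the number grows exponentially with the length of the
label.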

It is only in understanding these boundaries for policy that we have
any hope of success in defining what acceptable technical trade-offs
might be.

> I am not sure if I have been able to explain these well here. The
> documents/presentation I circulated earlier have some examples in these
> contexts. We will discuss them in further detail in Singapore.  

Thanks so much for those examples, and for your indulgence of my
questions.  I look forward to continuing the discussion during the
Singapore meetings.

Best regards,

Andrew

-- 
Andrew Sullivan
ajs at anvilwalrusden.com


