[arabic-vip] ZWNJ Possible Risks/Issues

Siavash Shahshahani shahshah at irnic.ir
Mon Sep 26 19:48:38 UTC 2011


Some initial reactions to Andrew's message, which I found very thoughtful
and sensible:
1. Agreed that not all TLD labels need be words in the strict sense (this
is one of the reasons I was pushing against the Guidebook item that asks
for the TLD language), and most of the present ones aren't. But, as Andrew
points out, the tendency to use abbreviations is more prevalent in the
English-speaking world (especially America?), so we may see a larger
proportion of full-word IDN candidates. Given the uncertainties and
disagreements, an interim solution for this first round of gTLD
applications, along one of the lines he suggests, may be in order: no ZWNJ
unless the applicant can demonstrate that it is risk-free. To implement
this solution, ICANN should make the ZWNJ risk assessment the first stage
of string evaluation, with a commitment to return most of the $185,000
fee if the string is rejected on this ground.
2. There seem to be two distinct reservations about ZWNJ, one of which is
perfectly sensible and the other not so, IMHO. Cases like the three
characters noted earlier and the recent ones discovered by Raed are real
threats and should be considered serious risks. On the other hand, I don't
understand why confusing ZWNJ with an empty space should be considered a
threat. No one will be led to a wrong address by typing a space in its
place. The fact that some people may be bewildered by it is no reason to
reject it. I may be bewildered by some Sindhi characters, and Arabic
speakers may be bewildered by the consonants < پ،ژ،چ،گ > used in Persian,
but this is no reason to ban these characters.
3. Back to the risky ones: what is unsettling is the worry that people may
discover a new case, like the one Raed did, every now and then. It seems to
me that an exhaustive search of the full Arabic Script Table is not as
impractical as it may sound, and Alireza has indicated to me a method
that seems feasible.
Even if we have to postpone this full search to a later round, such a
search could be quite feasible for an IDN gTLD that uses a relatively small
subset of the Arabic Script Table, when it formulates its policy
restrictions on ZWNJ at the second level.
4. One final comment about Alireza's suggestion to leave the final
judgment to ICANN: this is OK only if a clear and well-defined policy,
developed by the user community in conjunction with the ICANN technical
team, is agreed on beforehand. I would hate to see us embroiled in the kind
of controversy the Bulgarians and Greeks have had to deal with.
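Point 3 does not describe Alireza's method, so what follows is only a hypothetical illustration of why an exhaustive search need not be impractical. It enumerates every ordered pair of characters from a joining-type table and keeps the pairs between which a ZWNJ would satisfy the basic IDNA2008 joining condition (preceding character of joining type L or D, following character of joining type R or D). Those are the pairs a human reviewer would then need to check for visible ambiguity. The ten-letter table is a hypothetical excerpt, not the full Arabic Script Table, and transparent combining marks are ignored for simplicity.

```python
from itertools import product

# Hypothetical excerpt of Unicode joining types (D = dual-joining,
# R = right-joining, U = non-joining). A real search would load the
# full ArabicShaping.txt data instead of this hand-picked subset.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0633": "D",  # SEEN
    "\u0645": "D",  # MEEM
    "\u0647": "D",  # HEH
    "\u0648": "R",  # WAW
    "\u06CC": "D",  # FARSI YEH
    "\u0621": "U",  # HAMZA
}


def zwnj_candidate_pairs(table):
    """Return every (before, after) pair where a ZWNJ between the two
    characters would pass the joining condition, i.e. the pairs that
    would still need expert review for visual distinguishability."""
    pairs = []
    for before, after in product(table, repeat=2):
        if table[before] in ("L", "D") and table[after] in ("R", "D"):
            pairs.append((before, after))
    return pairs


candidates = zwnj_candidate_pairs(JOINING_TYPE)
print(f"{len(JOINING_TYPE) ** 2} pairs examined, {len(candidates)} need review")
# prints: 100 pairs examined, 45 need review
```

A table of n characters yields n² ordered pairs, so even a few hundred characters give on the order of 10^5 pairs, which is trivial for a machine; presumably the real cost lies in the expert review of the flagged pairs.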
Regards,
Siavash

On Mon, 26 Sep 2011 11:22:33 -0400, Andrew Sullivan
<ajs at anvilwalrusden.com> wrote:
> Dear colleagues,
> 
> I've given quite a lot of thought to ZWNJ since our meeting.  My
> previous remarks in this thread have been an attempt to urge the team
> to a clear statement on this matter, particularly now that the new
> final Guidelines document is published.  I'll review the new proposed
> text after I send this note.  But first, I want to offer some expanded
> remarks that reflect my personal views, developed after our meeting.
> You may take this as advice from me in my capacity as subject matter
> expert.
> 
> This message is a little long, but I'm trying to be clear.  I'm not
> picking on Alireza; but his message exposed perfectly two of the lines
> of thinking I've been pursuing, so I'm going to use his message as a
> "hook" to hang this on.
> 
> On Mon, Sep 26, 2011 at 05:45:03PM +0330, Alireza Saleh wrote:
> 
>> However there are also many other examples that show the necessity
>> of using this character such as those sent earlier by others
> 
> The problem with these examples, as I've noted before, is that they
> start from the assumption, "Here is a word that is common in some
> language; therefore, it needs to be permitted as a TLD label."  That
> premise has not yet, as near as I can see, been supported by much of
> an argument.  Moreover, it is not a premise that I (or, I suspect, any
> other DNS protocol expert) will grant.
> 
> To begin with, a large number of the root labels are not words.  None
> of the country code TLDs are words -- or, at least, when they are
> words, they are not words that mean "that country".  Of the
> non-ccTLDs, the following are also not words:
> 
>     aero
>     arpa
>     biz
>     cat
>     com
>     coop
>     edu
>     gov
>     info
>     int
>     mil
>     mobi
>     org
>     pro
>     tel
>     xxx
> 
> Some of those are common abbreviations, like info and pro.  The rest
> might be read as abbreviations.  They are certainly _meaningful_: they
> are intended to convey some sense of the purpose of the domain.  But
> that is not the same as being words.  And one can argue pretty
> strongly that at least one of them -- coop -- is misspelled, since to
> communicate what that domain is intended to mean, one needs a hyphen
> (co-op).  "Coop" means the place where you keep chickens (a chicken
> coop).
> 
> Now, there is a problem that confronts us in some contexts: whereas in
> English we have a tradition of abbreviation, some languages don't have
> that tradition.  It is therefore awkward to make analogies between
> English and, say, Hindi.  But we are altering the policy for
> registration in the root zone, and to minimize the negative effects of
> such a change one needs to do the best one can with analogies.  For
> practical purposes, this means (I think) three things:
> 
>     1.  Short, potentially meaningful labels are to be preferred.
> 
>     2.  Labels do not actually need to be words.
> 
>     3.  Non-words can be close to meaningful without really being
>     meaningful on their own.
> 
> (1) comes by analogy from the bulk of existing labels (travel and
> museum are outliers); (2) is just entailed by the fact that many
> labels aren't; and (3) comes from the analysis of how (2) plays out in
> fact.
> 
> From this, I think it follows that the test is not merely whether many
> words in some language use ZWNJ; nor even whether some of those words
> are short.  Instead, the test should be, at least at the beginning,
> whether a restriction on ZWNJ is so restrictive as to make it very
> difficult to register useful mnemonics for some language community.
> So far, I have not seen such an argument.
> 
> Note that a restriction on ZWNJ (and ZWJ, for that matter) need not be
> the outright ban currently in the Guidelines.  For instance, one could
> have a restriction that said, "Not allowed, unless you come up with a
> very strong argument for why nothing else will work.  This will be
> subject to review by experts in the language."  I am neither
> advocating nor opposing such a restriction; I'm merely observing that
> one could have different restrictions than are contemplated by the
> currently-published policy.
> 
>> and most are risk-free.
> 
> The other issue that is critical in the root is this matter of risk.
> The problem with a zero-width character is that by its very nature, it
> is not itself visible to the user.  The result is that a user
> attempting to deal with the string has to have a theory about how it
> is represented: unlike with every other character, the user has to know
> that there is this invisible character there, and has to know how it
> interacts with the other characters around it.
> 
> In the root, the risk is not, "Will this work for some set of users?"
> nor, "Will this fail to work in some contexts?"  It's instead, "Will
> this sometimes cause users to be confused such that they end up going
> to the wrong place?"  All of the examples so far have been examples of
> how some users will understand the string and nobody else will be able
> to use it.  In order to be convinced that ZWNJ is risk-free, however,
> we'd need a convincing argument that a string that could somehow be
> confused with the ZWNJ case could not be registered.
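The requirement just stated, that a string which could be confused with the ZWNJ form must not be registrable, can be sketched as a variant-blocking check. This is a minimal illustration under an assumption not made in the thread: that the only variant of concern is the same label with every ZWNJ removed. The names `zwnj_free_variant` and `Registry` are hypothetical.

```python
ZWNJ = "\u200C"


def zwnj_free_variant(label: str) -> str:
    """Hypothetical variant mapping: the label with every ZWNJ removed."""
    return label.replace(ZWNJ, "")


class Registry:
    """Toy registry that refuses a label whose ZWNJ-free variant is taken."""

    def __init__(self) -> None:
        self._variants = {}  # claimed variant -> label that claimed it

    def register(self, label: str) -> bool:
        variant = zwnj_free_variant(label)
        if variant in self._variants:
            # The new label differs from an existing one only in ZWNJ use.
            return False
        self._variants[variant] = label
        return True


reg = Registry()
with_zwnj = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"  # Persian "I want", with ZWNJ
without_zwnj = with_zwnj.replace(ZWNJ, "")
print(reg.register(with_zwnj))     # True
print(reg.register(without_zwnj))  # False: same ZWNJ-free variant
```

This only covers confusion between a label and its ZWNJ-free twin; it says nothing about the font-dependent cases the team has found, which is why broader arguments are still needed.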
> 
> Moreover, the team has already come up with cases where the CONTEXTJ
> rule in IDNA2008 is met, but the ZWNJ can't be seen anyway.  Given that
> the report points to font issues and ways that a font can break the
> user's expectations (or can be incomprehensible to a user if the writing
> style is not what the user is used to), it seems to me there is plenty
> of reason to believe that these examples are not risk free.
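For readers unfamiliar with CONTEXTJ: RFC 5892 (Appendix A.1) permits U+200C only when the preceding character is a virama, or when it sits between a character of joining type L or D and one of joining type R or D, with any number of transparent (T) characters in between. The sketch below approximates that rule. The joining-type table is a small hypothetical excerpt rather than the Unicode data file, and the function name `zwnj_context_ok` is illustrative.

```python
import unicodedata

# Hypothetical, partial joining-type table; real code would use the
# Unicode ArabicShaping.txt / DerivedJoiningType.txt data.
JOINING_TYPE = {
    "\u0628": "D",  # BEH
    "\u0645": "D",  # MEEM
    "\u06CC": "D",  # FARSI YEH
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u064E": "T",  # FATHA (transparent combining mark)
}

ZWNJ = "\u200C"


def joining_type(ch: str) -> str:
    # Characters missing from this sketch table are treated as non-joining.
    return JOINING_TYPE.get(ch, "U")


def zwnj_context_ok(label: str, i: int) -> bool:
    """Approximate the RFC 5892 CONTEXTJ rule for the ZWNJ at label[i]."""
    if label[i] != ZWNJ:
        raise ValueError("position does not hold ZWNJ")
    # Rule 1: preceding character has canonical combining class Virama (9).
    if i > 0 and unicodedata.combining(label[i - 1]) == 9:
        return True
    # Rule 2: (L|D) T* ZWNJ T* (R|D), skipping transparent characters.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return (j >= 0 and joining_type(label[j]) in ("L", "D")
            and k < len(label) and joining_type(label[k]) in ("R", "D"))


print(zwnj_context_ok("\u0645\u06CC\u200C\u062F", 2))  # True: YEH (D), DAL (R)
print(zwnj_context_ok("\u0627\u200C\u0628", 1))        # False: ALEF (R) precedes
```

Neither branch inspects glyphs or fonts; the rule only asks whether the neighbouring characters are able to join. That is consistent with the observation above that a label can satisfy CONTEXTJ while the ZWNJ still cannot be seen.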
> 
> The proposals for ZWNJ-free variants might get us there, but I'd sure
> like much broader arguments (to the effect that the group is sure
> there are no other corner cases) before saying ZWNJ is a good idea.  
> 
> One proposal might be to plan a study of actual ZWNJ use in some gTLD
> aimed at (say) a pan-Arabic-script user community, and see whether there are negative
> effects, as a precondition for beginning use in the root.  I don't
> know how realistic such a plan would be, nor whether we'd get usable
> results (I'm not actually sure how I'd design such a study); but it
> might be better than starting with the root zone, where removing a
> label in an effort to fix a mistake will be all but impossible.
> 
> Best regards,
> 
> A

