[arabic-vip] ZWNJ (earlier: [vip] Overarching principles used in Devanagari team report)

Behnam Esfahbod behnam at esfahbod.info
Fri Sep 23 13:43:09 UTC 2011


All,

1. ZWNJ is a "joining control" character. Although it has no "visual
representation" of its own, its effect is still "visible" to the
naked eye of anyone familiar with the Arabic script.

2. In the Persian language, ZWNJ is not an optional feature of the
script (like FATHA, SHADDA, etc.) but a mandatory one. Dr. Shahshahani
mentioned this a few months ago as well.

3. Although ZWNJ is used mostly in "long" words, some very basic nouns
use it too. Here are a few examples:

3.1. Any Iranian city name that ends with آباد or اباد, and many of
those ending in شهر.

You can find a list of Iranian city names here:
http://fa.wikipedia.org/wiki/%D9%81%D9%87%D8%B1%D8%B3%D8%AA_%D8%B4%D9%87%D8%B1%D9%87%D8%A7%DB%8C_%D8%A7%DB%8C%D8%B1%D8%A7%D9%86

3.2. "wikipedia" in Persian, which is ویکی‌پدیا and not ویکیپدیا.

3.3. Brand names like فارسی‌وب, راه‌ابریشم, بی‌بی and many others.

So the question is: are we in a position to prevent all of these
names from becoming TLDs in the future?
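
To make the Wikipedia example above concrete, here is a minimal Python
sketch showing that the two spellings are different code point
sequences, so software treats them as distinct strings. The helper name
`has_zwnj` is made up for illustration, and the spelling with FARSI YEH
(U+06CC) and KEHEH (U+06A9) is an assumption about how the word is
typed.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# "wikipedia" in Persian, written with escapes so the ZWNJ is explicit.
# Assumed spelling: WAW, FARSI YEH, KEHEH, FARSI YEH, ZWNJ, PEH, DAL,
# FARSI YEH, ALEF.
with_zwnj = "\u0648\u06cc\u06a9\u06cc" + ZWNJ + "\u067e\u062f\u06cc\u0627"
without_zwnj = with_zwnj.replace(ZWNJ, "")


def has_zwnj(label: str) -> bool:
    """Return True if the label contains at least one ZWNJ (hypothetical helper)."""
    return ZWNJ in label


if __name__ == "__main__":
    print(has_zwnj(with_zwnj), has_zwnj(without_zwnj))
    print(len(with_zwnj), len(without_zwnj))
    print(with_zwnj == without_zwnj)
```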

4. The Kurdish language depends on ZWNJ even more than Persian does!

Just open this link and count the number of ZWNJs in each sentence:
http://ckb.wikipedia.org/wiki/%D8%B2%D9%85%D8%A7%D9%86%DB%8C_%DA%A9%D9%88%D8%B1%D8%AF%DB%8C
In the first sentence, there are 10 words and 7 ZWNJs!
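
The counting can also be done mechanically. The following is only a
sketch: `zwnj_stats` is a hypothetical helper, words are split on
whitespace (which is a simplification), and the sample string is a
made-up pair of Persian words rather than the Kurdish sentence from the
page above.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def zwnj_stats(sentence: str) -> tuple[int, int]:
    """Return (number of whitespace-separated words, number of ZWNJs)."""
    # ZWNJ is a format character, not whitespace, so split() keeps it
    # inside the word it belongs to.
    words = sentence.split()
    return len(words), sentence.count(ZWNJ)


if __name__ == "__main__":
    # Made-up sample: two words, each containing one ZWNJ.
    sample = ("\u062e\u0637\u200c\u06a9\u0634 "
              "\u0648\u06cc\u06a9\u06cc\u200c\u067e\u062f\u06cc\u0627")
    print(zwnj_stats(sample))
```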

5. Although it has been proposed "to not allow ZWNJ to appear after
ARABIC LETTER TAH (and its family)", I should note that "خط‌کش"
("ruler" in Persian) and "رباط‌کریم" (a district of Tehran province)
both use ZWNJ after TAH.
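
As a rough illustration of what such a rule would reject, the sketch
below finds every ZWNJ that directly follows a TAH-family letter. The
function name `zwnj_after_tah` is hypothetical, and taking "its family"
to mean TAH (U+0637) and ZAH (U+0638) is my assumption, not part of the
proposal.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumption: "TAH and its family" is read here as TAH and ZAH only.
TAH_FAMILY = {"\u0637", "\u0638"}


def zwnj_after_tah(label: str) -> list[int]:
    """Return the indexes of every ZWNJ that directly follows a TAH-family letter."""
    return [
        i
        for i in range(1, len(label))
        if label[i] == ZWNJ and label[i - 1] in TAH_FAMILY
    ]


if __name__ == "__main__":
    ruler = "\u062e\u0637\u200c\u06a9\u0634"  # KHAH, TAH, ZWNJ, KEHEH, SHEEN
    # REH, BEH, ALEF, TAH, ZWNJ, KEHEH, REH, FARSI YEH, MEEM
    robat_karim = "\u0631\u0628\u0627\u0637\u200c\u06a9\u0631\u06cc\u0645"
    print(zwnj_after_tah(ruler))
    print(zwnj_after_tah(robat_karim))
```

Both legitimate words would be flagged by such a check, which is the
problem with the proposed rule.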

Thus I'm completely against adding any generic rule that bans ZWNJ
beyond what IDNA2008 specifies. We should let people use it when it is
needed, but we should ask/recommend that ICANN be prepared for such
security risks, as with many other difficulties of the Arabic script.
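
For reference, the IDNA2008 contextual rule for ZWNJ (RFC 5892,
Appendix A.1) allows it after a virama, or between a left- or
dual-joining letter and a right- or dual-joining letter, with
transparent characters permitted on either side. Below is a simplified
sketch of that rule. The `JOINING_TYPE` table is a hand-written subset
covering only the letters in the examples of this message, treating all
nonspacing marks as transparent is an approximation, and the function
names are hypothetical; a real check would use the full Unicode
Joining_Type data.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical, hand-written subset of Unicode Joining_Type values.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062e": "D",  # KHAH
    "\u0631": "R",  # REH
    "\u0634": "D",  # SHEEN
    "\u0637": "D",  # TAH
    "\u0645": "D",  # MEEM
    "\u06a9": "D",  # KEHEH
    "\u06cc": "D",  # FARSI YEH
}


def joining_type(ch: str) -> str:
    """Look up the joining type; unknown characters count as non-joining."""
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    # Approximation: treat nonspacing marks as transparent ("T").
    if unicodedata.category(ch) == "Mn":
        return "T"
    return "U"


def zwnj_context_ok(label: str, i: int) -> bool:
    """Sketch of the RFC 5892 A.1 check for the ZWNJ at index i."""
    if label[i] != ZWNJ:
        raise ValueError("index does not point at a ZWNJ")
    # Rule 1: ZWNJ directly after a virama (combining class 9).
    if i > 0 and unicodedata.combining(label[i - 1]) == 9:
        return True
    # Rule 2: (L|D) T* ZWNJ T* (R|D)
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


if __name__ == "__main__":
    ruler = "\u062e\u0637\u200c\u06a9\u0634"  # ZWNJ between TAH and KEHEH
    print(zwnj_context_ok(ruler, 2))
```

Under this rule the ZWNJ after TAH in the examples above is valid,
because TAH is dual-joining and would otherwise connect to the next
letter.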

For example, consider these two strings: "بیا" and "یبا" (BEH, YEH,
ALEF and YEH, BEH, ALEF, respectively). These two strings can be
considered a "security risk" to each other, but would it make any
sense to ban BEH completely because of them? Or maybe ban YEH? It
wouldn't! And yet ICANN should not delegate more than one of them. The
case of ZWNJ is very similar: it's necessary in many use cases, but it
can pose a few security threats. We should give ICANN methods to
understand and find these threats, instead of just ignoring these
difficulties.
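
One possible shape for such a method, shown only as a crude sketch: map
letters that share the same base shape in their initial and medial
forms to a common placeholder, and flag two labels when the results
match. The `TOOTH_LETTERS` table, the `skeleton` and
`may_be_confusable` names, and the whole heuristic are assumptions made
for illustration, not an established algorithm.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical, deliberately tiny table: letters whose initial and medial
# forms share the same "tooth" base shape and differ only in their dots.
TOOTH_LETTERS = {
    "\u0628",  # BEH
    "\u067e",  # PEH
    "\u062a",  # TEH
    "\u062b",  # THEH
    "\u0646",  # NOON
    "\u06cc",  # FARSI YEH
    "\u064a",  # ARABIC YEH
}


def skeleton(label: str) -> tuple[str, ...]:
    """Replace tooth-shaped letters that join forward with a placeholder."""
    out = []
    for i, ch in enumerate(label):
        nxt = label[i + 1] if i + 1 < len(label) else None
        # Simplification: a letter is assumed to take its initial/medial
        # form whenever another character other than ZWNJ follows it.
        joins_forward = nxt is not None and nxt != ZWNJ
        if ch in TOOTH_LETTERS and joins_forward:
            out.append("TOOTH")
        else:
            out.append(ch)
    return tuple(out)


def may_be_confusable(a: str, b: str) -> bool:
    """Flag two different labels whose skeletons are identical."""
    return a != b and skeleton(a) == skeleton(b)


if __name__ == "__main__":
    bia = "\u0628\u06cc\u0627"  # BEH, FARSI YEH, ALEF
    yba = "\u06cc\u0628\u0627"  # FARSI YEH, BEH, ALEF
    print(may_be_confusable(bia, yba))
```

A check like this flags the pair for review without banning BEH, YEH,
or ZWNJ outright.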

Thanks,
-Behnam

-- 
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '      http://behnam.esfahbod.info
  *  ..   http://zwnj.org/
 *  `  *  http://persian-computing.ir
  * o *   3E7F B4B6 6F4C A8AB 9BB9 7520 5701 CA40 259E 0F8B


