[arabic-vip] WHOIS related query

Fahd Batayneh Fahd.Batayneh at NITC.gov.jo
Thu Aug 18 07:06:38 UTC 2011


I think this is where linguistic experts in the local languages and scripts come in handy: they can advise on how frequently each way of writing a given word occurs. For example, we in Jordan registered the .الاردن IDN ccTLD (the second ALEF is written without a Hamza, although linguistically it should have one) because we were aware that almost every Jordanian writes that ALEF without the Hamza above.

In summary, I would give the nod to option 1 [contain the variants by putting an upper limit], while leaving some flexibility to raise the upper limit in cases that may require a more diverse set of variant words.

National Information Technology Center

Fahd A. Batayneh

Team Lead
National Domain Names Division
Data and Network Security Department

P.O.Box: 259  ▪  Amman 11941  ▪  Jordan
Tel: 962.6.5300225
Fax: 962.6.5300277
E-Mail: fahd.batayneh at nitc.gov.jo


-- Follow NITC on Twitter<http://twitter.com/jordannitc>
Register your Arabic Domain Name under .alordun. For more information, please visit our website http://www.idn.jo/ or http://نطاقات-عربية.الاردن/


-----Original Message-----
From: arabic-vip-bounces at icann.org [mailto:arabic-vip-bounces at icann.org] On Behalf Of Manal Ismail
Sent: Thursday, August 18, 2011 2:50 AM
To: Dr.Sarmad Hussain
Cc: arabic-vip at icann.org
Subject: Re: [arabic-vip] WHOIS related query

Thanks for all the clarifications ..
Frankly I was talking practically (which is hard to accurately calculate) not theoretically ..
But I fully agree with Sarmad that we should be catering for the worst case scenario or have criteria that guarantee that we'll never reach that point ..

Having said that, I have to admit that I don't fully understand option 2 below .. if I understand right, containing the language table won't limit the theoretical number of possible variants across the whole script, right? so how would this solve the problem ?

Kind Regards

--Manal

________________________________

From: sarmad.hussain at kics.edu.pk on behalf of Dr.Sarmad Hussain
Sent: Wed 17/08/2011 07:58 PM
To: Manal Ismail
Cc: baher.esmat; Steve Sheng; arabic-vip at icann.org
Subject: Re: [arabic-vip] WHOIS related query


Dear Manal and All,

Theoretically, if we have n letters in a label, and each letter has m variants, then the total number of possibilities is m^n.  So for a 10-letter label with, say, three variants per letter (e.g. kaf), we have 3^10 variants, i.e. about 59,000.  Now add an optional mark on each letter (two possibilities per letter, with or without the mark, assuming these are considered equivalent); for a single sequence of n letters there are 2^n possibilities, i.e. approx. 1,000 for n = 10.  Thus the total with variants and marks would be 59K*1K, which is about 60 million, i.e. on the order of tens of millions (if my mathematics is correct).
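
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name and parameters are illustrative only, and it assumes, as in the estimate, that every letter has the same number of variants and that each letter can independently carry or omit one optional mark.

```python
def variant_count(n_letters: int, variants_per_letter: int,
                  optional_marks: bool = False) -> int:
    """Theoretical upper bound on the number of variant labels.

    Assumes every position has the same number of letter variants and,
    if optional_marks is True, two states per position (mark / no mark).
    """
    per_position = variants_per_letter * (2 if optional_marks else 1)
    return per_position ** n_letters


letters_only = variant_count(10, 3)        # 3**10
with_marks = variant_count(10, 3, True)    # (3 * 2)**10 == 3**10 * 2**10

print(letters_only)  # 59049
print(with_marks)    # 60466176
```

Under these assumptions the letters-only bound is 59,049 and the bound with optional marks is 60,466,176, i.e. roughly 60 million.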

So Raed's estimates are without aerab/diacritical marks, just on letters.

However, practically speaking, "real" words (if there is such a thing for a label definition) would yield far fewer variants; this holds in most cases.

Having said that, we must plan for boundary cases, not just "real" cases as the theoretical limits must also be catered for.

Two possible solutions:

1. contain the variants by putting an upper limit
2. contain the language table to avoid generation of too many variants (harder to do, without significantly limiting linguistic expression)

If we choose option 1, then we need terminology and mechanisms to articulate and enable this.
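
One way option 1 might look in practice is sketched below in Python. This is only an illustration under stated assumptions: the variant table is hypothetical (not an agreed language table), the function name and the `limit` parameter are made up for the example, and the sketch ignores optional marks.

```python
from itertools import islice, product

# Hypothetical variant table: each code point maps to itself plus its variants.
# These entries are illustrative only, not an agreed language table.
VARIANT_TABLE = {
    "\u0643": ["\u0643", "\u06A9", "\u06AA"],  # ARABIC KAF, KEHEH, SWASH KAF
    "\u064A": ["\u064A", "\u06CC"],            # ARABIC YEH, FARSI YEH
}


def capped_variants(label, table, limit):
    """Return at most `limit` variants of `label` and a flag that is True
    when more variants exist than the limit allows."""
    # Code points without an entry in the table have only themselves.
    choices = [table.get(ch, [ch]) for ch in label]
    # Generate lazily and stop one past the limit, so the full m^n set
    # is never materialised.
    generated = list(islice(("".join(p) for p in product(*choices)), limit + 1))
    truncated = len(generated) > limit
    return generated[:limit], truncated


label = "\u0643\u064A\u0643"  # three letters: 3 * 2 * 3 = 18 possible variants
variants, truncated = capped_variants(label, VARIANT_TABLE, limit=5)
print(len(variants), truncated)  # 5 True
```

Because the variants are produced lazily, the cost is bounded by the limit rather than by the theoretical number of variants; the flag lets a registry or WHOIS service report that the list was cut off.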

I am not sure what is the best option at this time.

regards,
Sarmad




On Wed, Aug 17, 2011 at 9:32 AM, Manal Ismail <manal at tra.gov.eg> wrote:


        Does this have to do with using Diacritics ?

        --Manal

        ________________________________

        From: arabic-vip-bounces at icann.org on behalf of baher.esmat
        Sent: Wed 17/08/2011 02:30 PM
        To: Steve Sheng; Sarmad Hussain
        Cc: arabic-vip at icann.org
        Subject: Re: [arabic-vip] WHOIS related query




        On 8/16/11 9:15 PM, "Steve Sheng" <steve.sheng at icann.org> wrote:

        > Another question is a stupid question from me, how many variants could an
        > Arabic label have? Is it in the order of 10s, 100s or 1000s we are talking
        > about? This has obvious implications for WHOIS output and registry WHOIS
        > services.

        If my memory serves me right, Raed Al-Fayez of (.sa), also a member of the
        Arabic team, mentioned in a presentation at the ICANN meeting in Singapore
        that there were cases of variants - as per (.sa) policy - where the number
        of variants per a single label could be as many as ~64,000.

        Baher









