[arabic-vip] WHOIS related query
Manal Ismail
manal at tra.gov.eg
Wed Aug 17 23:49:39 UTC 2011
Thanks for all the clarifications ..
Frankly I was talking practically (which is hard to accurately calculate) not theoretically ..
But I fully agree with Sarmad that we should be catering for the worst case scenario or have a criterion that guarantees that we'll never reach that point ..
Having said that, I have to admit that I don't fully understand option 2 below .. If I understand correctly, containing the language table won't limit the theoretical number of possible variants across the whole script, right? So how would this solve the problem?
Kind Regards
--Manal
________________________________
From: sarmad.hussain at kics.edu.pk on behalf of Dr.Sarmad Hussain
Sent: Wed 17/08/2011 07:58 PM
To: Manal Ismail
Cc: baher.esmat; Steve Sheng; arabic-vip at icann.org
Subject: Re: [arabic-vip] WHOIS related query
Dear Manal and All,
Theoretically, if we have n letters in a label, and each letter has m variants, then the total number of possibilities is m^n. So for a 10-letter label with, say, three variants per letter (e.g. kaf), we have 3^10 variants, i.e. about 59,000. Now add an optional mark on each letter (two possibilities per letter: with mark or without mark; assuming these are considered equivalent); for a single sequence of n letters, there are 2^n possibilities, i.e. approx. 1,000 for n = 10. Thus the total possibilities with variants and marks would be 59K*1K, which is on the order of tens of millions, about 60 million (if my mathematics is correct).
So Raed's estimates are without aerab/diacritical marks, just on letters.
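The arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument: the function name `variant_count` and its parameters are made up for illustration, and it assumes every letter in the label has the same number of variants and that each letter independently may or may not carry a mark.

```python
def variant_count(n_letters: int, variants_per_letter: int,
                  optional_marks: bool = False) -> int:
    """Theoretical upper bound on the number of variant labels.

    Assumes every letter has the same number of variants (m^n), and,
    if optional_marks is set, that each letter may independently carry
    or omit a mark (a further factor of 2^n).
    """
    count = variants_per_letter ** n_letters
    if optional_marks:
        count *= 2 ** n_letters
    return count


# 10-letter label, three variants per letter, letters only
letters_only = variant_count(10, 3)
# same label, with an optional mark on every letter
with_marks = variant_count(10, 3, optional_marks=True)

print(letters_only)  # 59049
print(with_marks)    # 60466176
```

So the letters-only figure is about 59,000, and with optional marks the bound is roughly 60 million.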
However, practically speaking, "real" words (if there is such a thing for a label definition) would produce far fewer variants, and this covers most cases.
Having said that, we must plan for boundary cases, not just "real" cases as the theoretical limits must also be catered for.
Two possible solutions:
1. contain the variants by putting an upper limit
2. contain the language table to avoid generation of too many variants (harder to do, without significantly limiting linguistic expression)
If we choose option 1, then we need terminology and mechanisms to articulate and enable this.
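As a rough illustration of what option 1 might look like mechanically, here is a minimal sketch that enumerates variants lazily and stops at an upper limit. Everything in it is hypothetical: the variant table uses placeholder ASCII letters rather than real Arabic code points, and `capped_variants` and its `limit` parameter are names invented for this example, not part of any registry policy or tool.

```python
from itertools import islice, product

# Hypothetical variant table: placeholder letters stand in for
# Arabic code points and their variants.
VARIANT_TABLE = {
    "k": ["k", "K", "q"],  # a letter with three variants
    "y": ["y", "Y"],       # a letter with two variants
}


def capped_variants(label, table, limit):
    """Return at most `limit` variants of `label`, plus a flag saying
    whether more variants existed than were returned."""
    # Letters missing from the table have only themselves as a variant.
    choices = [table.get(ch, [ch]) for ch in label]
    # product() is lazy, so the full m^n set is never built in memory.
    taken = list(islice(product(*choices), limit + 1))
    truncated = len(taken) > limit
    return ["".join(v) for v in taken[:limit]], truncated


variants, truncated = capped_variants("kyk", VARIANT_TABLE, limit=5)
print(variants, truncated)
```

The flag matters for the terminology question: a consumer such as a WHOIS service would need to know whether the list it received is complete or was cut off at the limit.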
I am not sure which option is best at this time.
regards,
Sarmad
On Wed, Aug 17, 2011 at 9:32 AM, Manal Ismail <manal at tra.gov.eg> wrote:
Does this have to do with using diacritics?
--Manal
________________________________
From: arabic-vip-bounces at icann.org on behalf of baher.esmat
Sent: Wed 17/08/2011 02:30 PM
To: Steve Sheng; Sarmad Hussain
Cc: arabic-vip at icann.org
Subject: Re: [arabic-vip] WHOIS related query
On 8/16/11 9:15 PM, "Steve Sheng" <steve.sheng at icann.org> wrote:
> Another question is a stupid question from me, how many variants could an
> Arabic label have? Is it in the order of 10s, 100s or 1000s we are talking
> about? This has obvious implications for WHOIS output and registry WHOIS
> services.
If my memory serves me right, Raed Al-Fayez of (.sa), also a member of the
Arabic team, mentioned in a presentation at the ICANN meeting in Singapore
that there were cases of variants as per (.sa) policy where the number
of variants for a single label could be as many as ~64,000.
Baher