[arabic-vip] WHOIS related query

Siavash Shahshahani shahshah at irnic.ir
Thu Aug 18 03:59:19 UTC 2011


Option 2 is what many ccTLDs practically do, i.e., limit themselves to
part of the full table. This makes sense for a ccTLD as they are concerned
with a limited community. I don't think 'we' should adopt a universal
solution; it is the job of each registry to cope with this problem for
itself according to the nature of the TLD it operates. Further note that
this becomes a problem only if a bundling mechanism is used. A registry
that uses indexing doesn't have to worry too much about this multiplicity
of variants.
Siavash

On Thu, 18 Aug 2011 01:49:39 +0200, "Manal Ismail" <manal at tra.gov.eg>
wrote:
> Thanks for all the clarifications ..
> Frankly I was talking practically (which is hard to accurately calculate)
> not theoretically ..
> But I fully agree with Sarmad that we should be catering for the worst
> case scenario or have a criteria that guarantees that we'll never reach
> that point ..
>  
> Having said that, I have to admit that I don't fully understand option 2
> below .. if I understand right, containing the language table won't limit
> the theoretical number of possible variants across the whole script, right?
> so how would this solve the problem ?
>  
> Kind Regards
>  
> --Manal
> 
> ________________________________
> 
> From: sarmad.hussain at kics.edu.pk on behalf of Dr.Sarmad Hussain
> Sent: Wed 17/08/2011 07:58 PM
> To: Manal Ismail
> Cc: baher.esmat; Steve Sheng; arabic-vip at icann.org
> Subject: Re: [arabic-vip] WHOIS related query
> 
> 
> Dear Manal and All, 
> 
> Theoretically, if we have n letters in a label, and the letters have m
> variants each, then the total possibilities are m^n.  So for a 10 letter
> label, with say three variants per letter (e.g. kaf), we have 3^10
> variants, i.e. about 59,000.  Now add an optional mark on each letter (two
> possibilities: with mark, without mark per letter; assuming these are
> considered equivalent); for a single sequence of n letters, there are 2^n
> possibilities, i.e. 1,000 approx.  Thus total possibilities with variants
> and marks would be 59K*1K, which gives an order of tens of millions (if my
> mathematics is correct).
> 
> So Raed's estimates are without aerab/diacritical marks, just on letters.
> 
> However, practically speaking, "real" words (if there is such a thing for
> a label definition) would be fewer (this is most of the cases).
> 
> Having said that, we must plan for boundary cases, not just "real" cases
> as the theoretical limits must also be catered for.
> 
> Two possible solutions:
> 
> 1. contain the variants by putting an upper limit
> 2. contain the language table to avoid generation of too many variants
> (harder to do, without significantly limiting linguistic expression)
> 
> If we choose option 1, then we need terminology and mechanisms to
> articulate and enable this. 
> 
> I am not sure what is the best option at this time.
> 
> regards,
> Sarmad
> 
> 
> 
> 
> On Wed, Aug 17, 2011 at 9:32 AM, Manal Ismail <manal at tra.gov.eg> wrote:
> 
> 
> 	Does this have to do with using diacritics?
> 
> 	--Manal
> 
> 	________________________________
> 
> 	From: arabic-vip-bounces at icann.org on behalf of baher.esmat
> 	Sent: Wed 17/08/2011 02:30 PM
> 	To: Steve Sheng; Sarmad Hussain
> 	Cc: arabic-vip at icann.org
> 	Subject: Re: [arabic-vip] WHOIS related query
> 
> 
> 
> 
> 	On 8/16/11 9:15 PM, "Steve Sheng" <steve.sheng at icann.org> wrote:
> 
> 	> Another question is a stupid question from me, how many variants could
> 	> an Arabic label have? Is it in the order of 10s, 100s or 1000s we are
> 	> talking about? This has obvious implications for WHOIS output and
> 	> registry WHOIS services.
> 
> 	If my memory serves me right, Raed Al-Fayez of (.sa), also a member of the
> 	Arabic team, mentioned in a presentation at the ICANN meeting in Singapore
> 	that there were cases of variants, as per (.sa) policy, where the number
> 	of variants per a single label could be as many as ~64,000.
> 
> 	Baher
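
As a rough check on the arithmetic in Sarmad's message quoted above, the
counts can be reproduced in a few lines of Python. This is only a sketch
under the same simplifying assumptions he states: every one of the n letters
has exactly m variants, and each letter may independently carry or omit one
optional mark. The function name count_variants and its parameters are made
up for illustration and do not come from any registry's tooling.

```python
def count_variants(n_letters: int, variants_per_letter: int,
                   optional_mark: bool = False) -> int:
    """Theoretical upper bound on the number of variant labels.

    Assumes (hypothetically) that every letter has the same number of
    variants and, if optional_mark is True, that each letter can appear
    with or without a mark, independently of the other letters.
    """
    # Each position contributes m choices, doubled if a mark is optional.
    choices_per_position = variants_per_letter * (2 if optional_mark else 1)
    return choices_per_position ** n_letters


# 10-letter label, three variants per letter (the kaf example).
letters_only = count_variants(10, 3)
with_marks = count_variants(10, 3, optional_mark=True)

print(letters_only)  # 3**10
print(with_marks)    # (3*2)**10, the same as 3**10 * 2**10
```

Under these assumptions the letters-only bound is 3^10 = 59,049 and the bound
with optional marks is 6^10 = 60,466,176, i.e. roughly 60 million. Real labels
would normally fall well below this, since not every letter has variants.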

