[arabic-vip] WHOIS related query

Manal Ismail manal at tra.gov.eg
Wed Aug 17 23:49:39 UTC 2011


Thanks for all the clarifications ..
Frankly I was talking practically (which is hard to accurately calculate) not theoretically ..
But I fully agree with Sarmad that we should be catering for the worst case scenario or have a criterion that guarantees that we'll never reach that point ..
 
Having said that, I have to admit that I don't fully understand option 2 below .. if I understand correctly, containing the language table won't limit the theoretical number of possible variants across the whole script, right? So how would this solve the problem?
 
Kind Regards
 
--Manal

________________________________

From: sarmad.hussain at kics.edu.pk on behalf of Dr.Sarmad Hussain
Sent: Wed 17/08/2011 07:58 PM
To: Manal Ismail
Cc: baher.esmat; Steve Sheng; arabic-vip at icann.org
Subject: Re: [arabic-vip] WHOIS related query


Dear Manal and All, 

Theoretically, if we have n letters in a label, and each letter has m variants, then the total number of possibilities is m^n.  So for a 10-letter label, with say three variants per letter (e.g. kaf), we have 3^10 variants, i.e. about 59,000.  Now add an optional mark on each letter (two possibilities per letter: with mark or without mark; assuming these are considered equivalent); for a single sequence of n letters, there are 2^n possibilities, i.e. approx. 1,000 for n = 10.  Thus the total possibilities with variants and marks would be about 59K*1K, which is on the order of tens of millions, roughly 60 million (if my mathematics is correct).  
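
As a rough check on this arithmetic, here is a minimal Python sketch. The function name and the uniform assumptions (every letter has the same number of variants, and every letter may carry one optional mark) are illustrative only and are not taken from any registry policy.

```python
def count_variants(n_letters: int, variants_per_letter: int,
                   optional_mark: bool = False) -> int:
    """Upper bound on the number of variant labels.

    Assumes (for illustration) that every letter has the same number of
    variants and, if optional_mark is True, that each letter can appear
    with or without a single mark.
    """
    total = variants_per_letter ** n_letters   # m^n letter combinations
    if optional_mark:
        total *= 2 ** n_letters                # 2^n mark combinations
    return total


letters_only = count_variants(10, 3)
with_marks = count_variants(10, 3, optional_mark=True)
print(letters_only)  # 59049
print(with_marks)    # 60466176
```

Real labels will rarely have the same number of variants on every letter, so these figures are an upper bound for the stated assumptions rather than an estimate for any particular label.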

So Raed's estimates are without aerab/diacritical marks, just on letters.

However, practically speaking, the variants of "real" words (if there is such a thing for a label definition) would be fewer, and real words are most of the cases.

Having said that, we must plan for boundary cases, not just "real" cases as the theoretical limits must also be catered for.

Two possible solutions:

1. contain the variants by putting an upper limit
2. contain the language table to avoid generation of too many variants (harder to do, without significantly limiting linguistic expression)

If we choose option 1, then we need terminology and mechanisms to articulate and enable this. 
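
Option 1 could be sketched roughly as follows, assuming a hypothetical variant table that maps each letter to its allowed alternatives. The table contents (Latin placeholders, not real Arabic variants), the names `VARIANT_TABLE` and `capped_variants`, and the cap value are all made up for illustration; they do not come from any actual language table or policy.

```python
from itertools import islice, product

# Hypothetical variant table: each letter maps to itself plus its variants.
# Latin placeholders are used here; a real table would list code points.
VARIANT_TABLE = {
    "a": ["a", "A"],
    "k": ["k", "K", "q"],
}


def capped_variants(label, table, limit):
    """Return at most `limit` variants of `label`, plus a flag that is
    True when more variants exist than the limit allows."""
    # Letters missing from the table have only themselves as a variant.
    choices = [table.get(ch, [ch]) for ch in label]
    # product() yields combinations lazily, so the full m^n set is never built.
    generated = ("".join(combo) for combo in product(*choices))
    head = list(islice(generated, limit + 1))
    truncated = len(head) > limit
    return head[:limit], truncated


variants, truncated = capped_variants("kaka", VARIANT_TABLE, 10)
print(len(variants), truncated)  # 10 True
```

The flag is one possible way to tell a WHOIS client that the list was cut short; which variants fall inside the cap, and how that is communicated, would still need the terminology and mechanisms mentioned above.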

I am not sure what is the best option at this time.

regards,
Sarmad




On Wed, Aug 17, 2011 at 9:32 AM, Manal Ismail <manal at tra.gov.eg> wrote:


	Does this have to do with using diacritics?
	
	--Manal
	
	________________________________
	
	From: arabic-vip-bounces at icann.org on behalf of baher.esmat
	Sent: Wed 17/08/2011 02:30 PM
	To: Steve Sheng; Sarmad Hussain
	Cc: arabic-vip at icann.org
	Subject: Re: [arabic-vip] WHOIS related query
	
	
	
	
	On 8/16/11 9:15 PM, "Steve Sheng" <steve.sheng at icann.org> wrote:
	
	> Another question is a stupid question from me, how many variants could an
	> Arabic label have? Is it in the order of 10s, 100s or 1000s we are talking
	> about? This has obvious implications for WHOIS output and registry WHOIS
	> services.
	
	If my memory serves me right, Raed Al-Fayez of (.sa), also a member of the
	Arabic team, mentioned in a presentation at the ICANN meeting in Singapore
	that there were cases of variants, as per (.sa) policy, where the number
	of variants per a single label could be as many as ~64,000.
	
	Baher
	
	
	
	
	
	




