[arabic-vip] WHOIS related query
Siavash Shahshahani
shahshah at irnic.ir
Thu Aug 18 03:59:19 UTC 2011
Option 2 is what many ccTLDs practically do, i.e., limit themselves to
part of the full table. This makes sense for a ccTLD as they are concerned
with a limited community. I don't think 'we' should adopt a universal
solution; it is the job of each registry to cope with this problem for
itself according to the nature of the TLD it operates. Further note that
this becomes a problem only if a bundling mechanism is used. A registry
that uses indexing doesn't have to worry too much about this multiplicity
of variants.
Siavash
On Thu, 18 Aug 2011 01:49:39 +0200, "Manal Ismail" <manal at tra.gov.eg>
wrote:
> Thanks for all the clarifications ..
> Frankly I was talking practically (which is hard to calculate
> accurately), not theoretically ..
> But I fully agree with Sarmad that we should be catering for the worst
> case scenario or have a criterion that guarantees that we'll never reach
> that point ..
>
> Having said that, I have to admit that I don't fully understand option 2
> below .. if I understand right, containing the language table won't
> limit the theoretical number of possible variants across the whole
> script, right? So how would this solve the problem?
>
> Kind Regards
>
> --Manal
>
> ________________________________
>
> From: sarmad.hussain at kics.edu.pk on behalf of Dr.Sarmad Hussain
> Sent: Wed 17/08/2011 07:58 PM
> To: Manal Ismail
> Cc: baher.esmat; Steve Sheng; arabic-vip at icann.org
> Subject: Re: [arabic-vip] WHOIS related query
>
>
> Dear Manal and All,
>
> Theoretically, if we have n letters in a label, and the letters have m
> variants each, then the total possibilities are m^n. So for a 10 letter
> label, with say three variants per letter (e.g. kaf), we have 3^10
> variants, i.e. about 59,000. Now add an optional mark on each letter
> (two possibilities per letter: with mark or without mark; assuming
> these are considered equivalent); for a single sequence of n letters,
> there are 2^n possibilities, i.e. approx. 1,000 for n = 10. Thus total
> possibilities with variants and marks would be 59K*1K, i.e. about
> 60 million, which is on the order of tens of millions (if my
> mathematics is correct).
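
The arithmetic in the quoted paragraph can be checked with a minimal Python
sketch. It assumes every letter has the same number of variants and that
each letter may independently carry one optional mark; the function name
variant_upper_bound is hypothetical and does not come from any registry's
software.

```python
def variant_upper_bound(n_letters: int, variants_per_letter: int,
                        optional_marks: bool = False) -> int:
    """Theoretical upper bound on the number of variant labels.

    Assumes each of the n letters independently takes one of m variants
    (m^n combinations). If optional_marks is True, each letter may also
    appear with or without a mark, multiplying the count by 2^n.
    """
    count = variants_per_letter ** n_letters
    if optional_marks:
        count *= 2 ** n_letters
    return count


# 10-letter label, three variants per letter, letters only: 3^10
letters_only = variant_upper_bound(10, 3)
# Same label with an optional mark on every letter: 3^10 * 2^10 = 6^10
with_marks = variant_upper_bound(10, 3, optional_marks=True)

print(letters_only)  # 59049
print(with_marks)    # 60466176
```

Under these assumptions the letters-only bound is 59,049 and the bound with
optional marks is 60,466,176, i.e. about 60 million.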
>
> So Raed's estimates are without aerab/diacritical marks, just on
> letters.
>
> However, practically speaking, "real" words (if there is such a thing
> for a label definition) would have fewer variants, and these make up
> most of the cases.
>
> Having said that, we must plan for boundary cases, not just "real" cases
> as the theoretical limits must also be catered for.
>
> Two possible solutions:
>
> 1. contain the variants by putting an upper limit
> 2. contain the language table to avoid generation of too many variants
> (harder to do, without significantly limiting linguistic expression)
>
> If we choose option 1, then we need terminology and mechanisms to
> articulate and enable this.
>
> I am not sure what is the best option at this time.
>
> regards,
> Sarmad
>
>
>
>
> On Wed, Aug 17, 2011 at 9:32 AM, Manal Ismail <manal at tra.gov.eg> wrote:
>
>
> Does this have to do with using Diacritics ?
>
> --Manal
>
> ________________________________
>
> From: arabic-vip-bounces at icann.org on behalf of baher.esmat
> Sent: Wed 17/08/2011 02:30 PM
> To: Steve Sheng; Sarmad Hussain
> Cc: arabic-vip at icann.org
> Subject: Re: [arabic-vip] WHOIS related query
>
>
>
>
> On 8/16/11 9:15 PM, "Steve Sheng" <steve.sheng at icann.org> wrote:
>
> > Another question is a stupid question from me, how many variants
> > could an Arabic label have? Is it in the order of 10s, 100s or 1000s
> > we are talking about? This has obvious implications for WHOIS output
> > and registry WHOIS services.
>
> If my memory serves me right, Raed Al-Fayez of (.sa), also a member of
> the Arabic team, mentioned in a presentation at the ICANN meeting in
> Singapore that there were cases of variants as per (.sa) policy where
> the number of variants per single label could be as many as ~64,000.
>
> Baher