[arabic-vip] WHOIS related query

Dr.Sarmad Hussain sarmad at cantab.net
Wed Aug 17 17:58:13 UTC 2011


Dear Manal and All,

Theoretically, if a label has n letters and each letter has m variants,
then the total number of possibilities is m^n.  So for a 10-letter label
with, say, three variants per letter (e.g. kaf), we have 3^10 variants,
i.e. about 59,000.  Now add an optional mark on each letter (two
possibilities per letter: with mark or without mark; assuming these are
considered equivalent); for a single sequence of n letters there are 2^n
possibilities, i.e. about 1,000 for n = 10.  Thus the total possibilities
with variants and marks would be 59K*1K, i.e. about 60 million, which is on
the order of tens of millions (if my mathematics is correct).
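
As a quick check of the arithmetic above, here is a minimal Python sketch.
The function name variant_count and its parameters are made up for
illustration; the only assumptions are the ones stated above (every letter
has the same number of variants, and each letter may or may not carry one
optional mark).

```python
def variant_count(n_letters, variants_per_letter, optional_marks=True):
    """Theoretical upper bound on the number of variant labels.

    Assumes every letter has the same number of variants and, when
    optional_marks is True, that each letter is either marked or unmarked.
    """
    letter_forms = variants_per_letter ** n_letters        # m^n
    mark_patterns = 2 ** n_letters if optional_marks else 1  # 2^n
    return letter_forms * mark_patterns


letters_only = variant_count(10, 3, optional_marks=False)
with_marks = variant_count(10, 3)

print(letters_only)  # 59049
print(with_marks)    # 60466176
```

For 10 letters with three variants each, this gives 59,049 letter-only
variants and 60,466,176 (about 60 million) once optional marks are included.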

So Raed's estimates are without aerab/diacritical marks, just on letters.

However, practically speaking, "real" words (if there is such a thing for a
label definition) would have far fewer variants, and this covers most cases.

Having said that, we must plan for boundary cases, not just "real" cases as
the theoretical limits must also be catered for.

Two possible solutions:

1. contain the variants by putting an upper limit
2. contain the language table to avoid generation of too many variants
(harder to do, without significantly limiting linguistic expression)

If we choose option 1, then we need terminology and mechanisms to articulate
and enable this.
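
To make option 1 concrete, here is a rough Python sketch of one possible
mechanism: generate variants lazily and stop at a fixed cap, reporting
whether the cap was hit.  Everything in it is hypothetical: the VARIANTS
table is a toy example and not any registry's actual language table, and the
function name capped_variants and the limit parameter are invented for
illustration.

```python
from itertools import islice, product

# Hypothetical variant table, for illustration only (not a real policy table).
VARIANTS = {
    "\u0643": ["\u0643", "\u06A9", "\u06AA"],  # three kaf forms
    "\u064A": ["\u064A", "\u06CC", "\u0649"],  # three yeh forms
}


def capped_variants(label, limit):
    """Return at most `limit` variant labels and a flag saying whether
    more variants exist beyond the cap."""
    # Letters missing from the table have only themselves as a variant.
    choices = [VARIANTS.get(ch, [ch]) for ch in label]
    # product() yields combinations lazily, so we never build the full set.
    combos = product(*choices)
    # Take one extra item to detect whether the cap truncated the output.
    taken = list(islice(combos, limit + 1))
    truncated = len(taken) > limit
    return ["".join(c) for c in taken[:limit]], truncated


label = "\u0643\u064A\u0643"  # 3 letters, 3 variants each: 27 in total
first_ten, hit_cap = capped_variants(label, 10)
all_variants, hit_cap_all = capped_variants(label, 100)
```

Here first_ten holds 10 labels with hit_cap set, while all_variants holds all
27 with hit_cap_all unset.  Which variants fall inside the cap depends purely
on generation order in this sketch; a real policy would need to define that
order explicitly.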

I am not sure which is the best option at this time.

regards,
Sarmad




On Wed, Aug 17, 2011 at 9:32 AM, Manal Ismail <manal at tra.gov.eg> wrote:

> Does this have to do with using diacritics?
>
> --Manal
>
> ________________________________
>
> From: arabic-vip-bounces at icann.org on behalf of baher.esmat
> Sent: Wed 17/08/2011 02:30 PM
> To: Steve Sheng; Sarmad Hussain
> Cc: arabic-vip at icann.org
> Subject: Re: [arabic-vip] WHOIS related query
>
>
>
>
> On 8/16/11 9:15 PM, "Steve Sheng" <steve.sheng at icann.org> wrote:
>
> > Another question is a stupid question from me, how many variants could an
> > Arabic label have? Is it in the order of 10s, 100s or 1000s we are talking
> > about? This has obvious implications for WHOIS output and registry WHOIS
> > services.
>
> If my memory serves me right, Raed Al-Fayez of (.sa), also a member of the
> Arabic team, mentioned in a presentation at the ICANN meeting in Singapore
> that there were cases of variants, as per (.sa) policy, where the number
> of variants per a single label could be as many as ~64,000.
>
> Baher
>
>
>
>
>
>