[arabic-vip] IDN Table/Language table/Script Table

Sat Aug 13 17:20:43 UTC 2011

Hi,

On Wed, 10 Aug 2011 08:57:22 -0400, Andrew Sullivan
<ajs at anvilwalrusden.com> wrote:
> Dear colleagues,
> 
> On Tue, Aug 09, 2011 at 05:28:44AM -0700, iftakhar shah wrote:
> 
>> "An IDN Table is a table listing all those characters that a particular
>> TLD registry supports. If one or more of these characters are considered
>> a variant this is indicated next to that/those characters. It is also
>> indicated which character a particular character is a variant to. The
>> variant tables usually holds characters representing a specific language,
>> or they can be characters from a specific script. Therefore the variant
>> table is sometimes referred to as 'language variant table', 'language
>> table', 'script table' or something similar"
>>  
>> Reference 
>> http://www.icann.org/en/topics/idn/idn-glossary.htm
> 
> 
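To make the quoted definition concrete, here is a minimal sketch of such a
table as a data structure. Everything in it is an assumption for
illustration: the names IDN_TABLE, is_permitted and variant_labels are
hypothetical, and the entries (Kaf/Keheh, Yeh/Farsi Yeh, Beh) are only
commonly discussed examples, not any registry's actual table.

```python
from itertools import product

# Hypothetical toy table: permitted code point -> set of its variants.
# Illustrative only; not taken from any registry's published IDN table.
IDN_TABLE = {
    "\u0643": {"\u06A9"},  # ARABIC LETTER KAF, variant KEHEH
    "\u06A9": {"\u0643"},  # ARABIC LETTER KEHEH, variant KAF
    "\u064A": {"\u06CC"},  # ARABIC LETTER YEH, variant FARSI YEH
    "\u06CC": {"\u064A"},  # ARABIC LETTER FARSI YEH, variant YEH
    "\u0628": set(),       # ARABIC LETTER BEH, no variants listed
}


def is_permitted(label):
    """True if every character of the label is listed in the table."""
    return all(ch in IDN_TABLE for ch in label)


def variant_labels(label):
    """All other labels obtained by swapping characters for their variants."""
    if not is_permitted(label):
        raise ValueError("label contains characters outside the table")
    choices = [sorted({ch} | IDN_TABLE[ch]) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}
```

Note that nothing in this structure says whether the table belongs to a
language or to a script; that binding would have to be recorded separately.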
> In my opinion, one problem with that definition is that it is ambivalent
> about whether a table is bound to a language, or to a script.

This may be considered the beauty of it! Just as 'com' and 'info' are
meaningful TLDs for a number of different languages that use the Latin
alphabet, there may be cross-language TLDs in Arabic script or other
scripts. There is no reason to discriminate against these scripts when the
same thing is allowed for ASCII.

>  In the context of the top level, this may be an extremely important
> distinction.

It 'may', but not necessarily. ASCII labels serve more than just English
language TLDs. It becomes important only if the TLD itself allows variants.
This problem is moot for IDN ccTLDs because their language connection is
presumably known. For IDN gTLDs that are susceptible to variants, this is
admittedly a problem, but one for which there need not be a universal
pre-determined solution. In each such case the TLD applicant should be
expected to deal thoroughly with confusability aspects before the TLD is
allowed in the root, and this is part of the ICANN gTLD process. The manner
in which this is dealt with may be different for different TLD proposals.

> During DNS activities, there is no way to tell the linguistic context
> of a lookup.  That is, it is not possible to know what language, if
> any, might be relevant to the DNS lookup that is happening.  By
> reverse processing of the A-label (recall that only the A-label form
> is used in the DNS), it is possible to figure out what script(s) are
> contained in the original FQDN in U-label form.  (A DNS lookup
> contains the entire name to be looked up.  But IDNA is defined label
> by label, and there is no reason to suppose that one label has to be
> in the same script as another label; there's not even a requirement in
> IDNA that a given label needs to be in a single script, and such a
> rule could not be adopted wholesale.)  At a busy caching nameserver
> or at the root nameservers, however, performing the U-label/A-label
> transformation on all the labels in every lookup would be unrealistic
> for performance reasons.
> 
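For illustration, the reverse processing described above could be sketched
roughly as follows. This is an assumption-laden sketch, not a conforming
IDNA implementation: it uses Python's built-in "punycode" codec, performs no
IDNA validation, and guesses the script from the first word of each
character's Unicode name, because the standard library does not expose the
Unicode Script property. The function names are hypothetical.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks an A-label


def a_label_to_u_label(label):
    """Decode one A-label to Unicode; other labels are returned unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def rough_scripts(u_label):
    """Heuristic only: first word of each character's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in u_label}


def scripts_per_label(fqdn):
    """Process the name label by label, as IDNA itself is defined."""
    labels = fqdn.rstrip(".").split(".")
    return [rough_scripts(a_label_to_u_label(label)) for label in labels]
```

Even this much work runs once per label of every name, which gives a sense
of the per-lookup cost mentioned above. The result is a set of scripts per
label, and nothing in it identifies a language.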
> Therefore, for practical purposes, (1) at the root, any table will
> likely need to take into account all languages using the script(s) in
> question, and (2) any plan for handling variants will have to depend
> entirely on registration-time rules.

To me this sounds more like a critique of IDNA. I don't see the relevance
to the problem being discussed, such as you indicate below.

> 
> I therefore suggest that, at least for the root case, the ambivalence
> in the above definition would need to be addressed.
> 
> Best regards,
> 
> Andrew

Best regards,
Siavash