[UA-discuss] IANA IDN Tables

Asmus Freytag asmusf at ix.netcom.com
Mon Feb 26 19:48:13 UTC 2018


On 2/26/2018 7:51 AM, Andrew Sullivan wrote:
> On Mon, Feb 26, 2018 at 01:47:56PM +0300, Maxim Alzoba wrote:
>> As I understand all IDN tables, which passed 2012 application round
>> are to be allowed to use despite any changes in LGR (it was the part of the
>> discussion when LGRs were established).
> Correct.  But some people are updating in line with LGR tools already
> -- particularly when their communities are sensitive to the issues of
> general-purpose domains and need conflicting uses of the same script.
> This applies (for instance) to Han, certainly Latin and Arabic, and
> probably Cyrillic (and maybe even Latin, Cyrillic, and Greek, though I
> know of nobody who's been that careful yet).
>
> The previous "variants" approach derived from the JET work made the
> distinction between "blocked" and "allocatable" less plain than it is
> now, and the inter-writing-system effects of characters are also now
> plainer and so easier to represent.  The JET approach worked quite
> well for CJK when used in relative isolation, but has limitations when
> applied more generally, which is why the new approach was worked out.

One interesting development for Chinese is a clever bit of tweaking of
the algorithm that defines the set of "allocatable" variants.

The original JET approach was intended to lead to at most three possible
labels: one all-simplified label, one all-traditional label, plus the
mixed label as applied for.

Because some code points have more than one simplified or more than one
traditional variant, a simple-minded scheme would allow a combinatorial
explosion of allocatable labels in some cases.
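To make the explosion concrete, here is a small sketch. The variant table is illustrative only: it uses a few well-known one-to-many simplified-to-traditional mappings and is not taken from the Chinese Generation Panel's LGR. The helper names are hypothetical. A naive "all-traditional" scheme must allow every combination of per-character choices, so the count is the product of the per-character option counts.

```python
from itertools import product
from math import prod

# Illustrative only: toy one-to-many simplified -> traditional mappings.
# Not taken from any published LGR.
TRADITIONAL_VARIANTS = {
    "发": ["發", "髮"],
    "后": ["後", "后"],
    "台": ["臺", "檯", "颱", "台"],
}

def naive_all_traditional(label):
    """Every all-traditional label a naive scheme would allow (hypothetical helper)."""
    # Characters without an entry map only to themselves.
    choices = [TRADITIONAL_VARIANTS.get(ch, [ch]) for ch in label]
    return ["".join(combo) for combo in product(*choices)]

def count_naive(label):
    """Size of the naive set, computed without enumerating it."""
    return prod(len(TRADITIONAL_VARIANTS.get(ch, [ch])) for ch in label)

label = "发后台"
print(count_naive(label))               # 16  (2 * 2 * 4)
print(naive_all_traditional(label)[:3]) # ['發後臺', '發後檯', '發後颱']
```

A three-character label already yields 16 "all-traditional" candidates; each additional one-to-many character multiplies the count again.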

The new algorithm is able to limit the number of allocatable labels in
these cases to at most four, and fewer in the general case.
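For contrast, a crude way to bound the set, closer to the original JET target of three labels, is to fix a single preferred variant per code point for each form. This is a hypothetical sketch, not the Chinese Generation Panel's algorithm (which, as described above, allows up to four labels); the preferred-variant tables and the function name are assumptions for illustration.

```python
# Hypothetical preferred-variant tables: exactly one target per code point.
# Illustrative only; not from any published LGR. Choosing one target is
# lossy: both senses of 后 ("queen" and "after") are forced to 後 here.
PREFERRED_TRADITIONAL = {"发": "發", "后": "後", "台": "臺"}
PREFERRED_SIMPLIFIED = {"發": "发", "髮": "发", "後": "后",
                        "臺": "台", "檯": "台", "颱": "台"}

def bounded_allocatable(applied):
    """Applied-for label plus one all-simplified and one all-traditional form."""
    simplified = "".join(PREFERRED_SIMPLIFIED.get(ch, ch) for ch in applied)
    traditional = "".join(PREFERRED_TRADITIONAL.get(ch, ch) for ch in applied)
    # The set collapses duplicates, e.g. when the label is already all-simplified.
    return {applied, simplified, traditional}

print(sorted(bounded_allocatable("發后台")))  # three labels for a mixed label
```

The bound is trivially at most three, but the lossiness noted in the comment is exactly why a more careful algorithm is needed when code points have several legitimate variants.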

This would be a big win, as keeping the number of allocatable variants
small has benefits, especially as the number of allocatable FQDNs is the
product of the numbers of allocatable labels at each level.
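The multiplication across levels can be sketched as follows: every FQDN picks one allocatable label per level, so the set is a Cartesian product. The labels below are illustrative placeholders and the function name is hypothetical.

```python
from itertools import product

def allocatable_fqdns(levels):
    """levels: list of lists of allocatable labels, leftmost level first.
    Returns every FQDN formed by choosing one label per level (hypothetical helper)."""
    return [".".join(combo) for combo in product(*levels)]

# Illustrative: two allocatable labels at each of two levels.
second_level = ["发后台", "發後臺"]
top_level = ["中国", "中國"]
fqdns = allocatable_fqdns([second_level, top_level])
print(len(fqdns))  # 4
```

With four allocatable labels at each of three levels, the same product would already give 64 FQDNs, which is why capping the per-level count matters.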

Embedding the reduction into the algorithm has the advantage of making
the set of allocatable labels predictable (by evaluating the label
against the LGR). The LGR would fully conform to RFC 7940.
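A minimal sketch of what "predictable by evaluating the label" means: given a fixed table of typed variant mappings, the allocatable set is a deterministic function of the label. The data and the disposition rule here are assumptions: the rule "any blocked mapping blocks the label" is only loosely modelled on RFC 7940 actions and omits most of that format (contexts, whole-label rules, action ordering).

```python
from itertools import product

# Hypothetical typed variant mappings: code point -> [(variant, type)].
# The identity mapping is listed with type "self". Not from any real LGR.
VARIANTS = {
    "发": [("发", "self"), ("發", "allocatable"), ("髮", "blocked")],
    "台": [("台", "self"), ("臺", "allocatable"), ("檯", "blocked"), ("颱", "blocked")],
}

def allocatable_variants(label):
    """Allocatable variant labels under the simplified rule above."""
    # A label using any blocked mapping is blocked, so blocked options can be
    # pruned per position before combining; blocked labels are never built.
    options = [[cp for cp, kind in VARIANTS.get(ch, [(ch, "self")]) if kind != "blocked"]
               for ch in label]
    return ["".join(combo) for combo in product(*options)]

print(allocatable_variants("发台"))  # ['发台', '发臺', '發台', '發臺']
```

Because the result depends only on the label and the table, anyone holding the same LGR computes the same allocatable set.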

The number of blocked variants is still given by all the combinations of
variants that aren't allocatable. For some labels the numbers can be
formidable, but fortunately there is no need to enumerate them, even for
collision testing.
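Both claims can be illustrated without enumeration, assuming the variant sets are closed (every member is a variant of every other member). Under that assumption, the blocked count is the total number of variant labels (a product of set sizes) minus the allocatable count. A collision test can map each code point to a canonical member of its variant set and compare the resulting labels. The variant sets and function names are hypothetical.

```python
from math import prod

# Hypothetical variant sets, assumed closed (symmetric and transitive).
VARIANT_SETS = [
    {"发", "發", "髮"},
    {"台", "臺", "檯", "颱"},
]
CANONICAL = {cp: min(s) for s in VARIANT_SETS for cp in s}  # one representative per set
SET_SIZE = {cp: len(s) for s in VARIANT_SETS for cp in s}

def variant_label_count(label):
    """Number of variant labels (including the label itself), not enumerated."""
    return prod(SET_SIZE.get(ch, 1) for ch in label)

def blocked_count(label, allocatable_count):
    """Every variant label that is not allocatable is blocked."""
    return variant_label_count(label) - allocatable_count

def canonical_label(label):
    return "".join(CANONICAL.get(ch, ch) for ch in label)

def collides(a, b):
    """Two labels collide if they are variants of each other."""
    return canonical_label(a) == canonical_label(b)

print(blocked_count("发台", 4))   # 8  (3 * 4 variant labels, 4 allocatable)
print(collides("发台", "髮檯"))   # True
```

The collision check costs one table lookup per code point, regardless of how many blocked variants the label has.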

However, even the largest set of blocked variants pales compared to the
immense size of the namespace: roughly 20,000 code points raised to the
power of the maximum number of code points in a U-label.
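The arithmetic behind that comparison, as a quick sketch: the repertoire size of 20,000 comes from the text above, while the label length n is left as a parameter because the maximum depends on how many code points fit in the 63-octet A-label limit. The function names are hypothetical.

```python
REPERTOIRE = 20_000  # approximate repertoire size, per the text above

def namespace_size(n):
    """Number of possible labels of exactly n code points."""
    return REPERTOIRE ** n

def blocked_fraction(blocked, n):
    """Share of the n-code-point namespace taken up by `blocked` labels."""
    return blocked / namespace_size(n)

print(namespace_size(3))       # 8000000000000
print(blocked_fraction(8, 3))  # 1e-12
```

Even for a three-character label, a blocked set of a few hundred thousand labels would still be a vanishing fraction of the namespace, and the gap widens with every additional code point.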

I believe the Chinese Generation Panel is planning a presentation of the
scheme at ICANN61.

A./


