[UA-discuss] IANA IDN Tables
asmusf at ix.netcom.com
Mon Feb 26 19:48:13 UTC 2018
On 2/26/2018 7:51 AM, Andrew Sullivan wrote:
> On Mon, Feb 26, 2018 at 01:47:56PM +0300, Maxim Alzoba wrote:
>> As I understand all IDN tables, which passed 2012 application round
>> are to be allowed to use despite any changes in LGR (it was the part of the
>> discussion when LGRs were established).
> Correct. But some people are updating in line with LGR tools already
> -- particularly when their communities are sensitive to the issues of
> general-purpose domains and need conflicting uses of the same script.
> This applies (for instance) to Han, certainly Latin and Arabic, and
> probably Cyrillic (and maybe even Latin, Cyrillic, and Greek, though I
> know of nobody who's been that careful yet).
> The previous "variants" approach derived from the JET work made the
> distinction between "blocked" and "allocatable" less plain than it is
> now, and the inter-writing-system effects of characters is also now
> plainer and so easier to represent. The JET approach worked quite
> well for CJK when used in relative isolation, but has limitations when
> applied more generally, which is why the new approach was worked out.
One interesting development for Chinese is a clever bit of tweaking of the algorithm
that defines the set of "allocatable" variants.
The original JET approach was intended to lead to at most three possible labels:
one all-simplified label, one all-traditional label, plus one mixed label (as [...]).
Because some code points have more than one simplified or more than one traditional
variant, a simple-minded scheme would allow a combinatorial explosion of
allocatable labels in some cases.
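
To make the growth concrete, here is a minimal Python sketch of such a simple-minded scheme. The variant table uses placeholder letters rather than real Han code points, and the names are hypothetical; it illustrates only the counting, not the JET procedure or the new algorithm.

```python
from itertools import product

# Hypothetical variant table: each code point maps to itself plus its variants.
# Placeholder letters stand in for Han code points.
VARIANTS = {
    "a": ["a", "A"],        # one variant
    "b": ["b", "B", "8"],   # two variants
    "c": ["c"],             # no variants
}

def naive_variant_labels(label):
    """Enumerate every label formed by choosing a variant independently per position."""
    choices = [VARIANTS.get(cp, [cp]) for cp in label]
    return ["".join(combo) for combo in product(*choices)]

labels = naive_variant_labels("abcab")
print(len(labels))  # 2 * 3 * 1 * 2 * 3 = 36
```

Even with this tiny table, a five-code-point label already yields 36 candidates; the count multiplies with every additional position that has variants.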
The new algorithm is able to limit the number of allocatable labels
to at most four; fewer in the general case.
This would be a big win, as keeping the number of allocatable variants small
has benefits, especially as the number of allocatable FQDNs is the product of the
numbers of allocatable labels on each level.
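
A rough sketch of that multiplication, in Python. The per-level counts are made-up inputs chosen for illustration (four is the bound mentioned above, 36 is an arbitrary larger figure), and the function name is hypothetical.

```python
from math import prod

def allocatable_fqdn_count(labels_per_level):
    """Number of allocatable FQDNs, given the allocatable label count at each level."""
    return prod(labels_per_level)

# Three levels, each capped at four allocatable labels.
print(allocatable_fqdn_count([4, 4, 4]))     # 64

# The same three levels with 36 allocatable labels each.
print(allocatable_fqdn_count([36, 36, 36]))  # 46656
```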
Embedding the reduction into the algorithm has the advantage of making
the set of allocatable labels predictable (by evaluating the label
against the LGR).
The LGR would fully conform to RFC 7940.
The number of blocked variants is still defined by the permutations of variants
that aren't allocatable. For some labels, the numbers can be formidable, but
fortunately, there is no need to enumerate them, even for collision testing.
However, even the largest set of blocked variants pales compared to the
immense size of the namespace: (20,000 code points) to the power of the
maximal number of code points in a U-label.
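
One way to see why blocked variants need not be enumerated for collision testing: reduce every label to a canonical form and compare those. The Python sketch below assumes the variant relation is symmetric and transitive, so that code points fall into disjoint sets; the table and all names are hypothetical, and this is not necessarily how any registry or LGR tool implements the check.

```python
# Hypothetical disjoint variant sets, with placeholder letters for code points.
VARIANT_SETS = [{"a", "A"}, {"b", "B", "8"}]

# Map every code point to one fixed representative of its set
# (here simply the smallest by code point value).
REPRESENTATIVE = {cp: min(s) for s in VARIANT_SETS for cp in s}

def index_label(label):
    """Canonical form: each code point replaced by its set's representative."""
    return "".join(REPRESENTATIVE.get(cp, cp) for cp in label)

def collides(candidate, registered):
    """True if the candidate is a variant of (or equal to) any registered label."""
    taken = {index_label(r) for r in registered}
    return index_label(candidate) in taken

print(collides("AB", ["ab"]))  # True: "AB" and "ab" share a canonical form
print(collides("ac", ["ab"]))  # False
```

The cost is one pass over the label per comparison, regardless of how many blocked variants the label has.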
I believe the Chinese Generation Panel is planning a presentation of the