[UA-discuss] truly international IDNs, was Armenia

Andrew Sullivan ajs at anvilwalrusden.com
Wed Feb 13 01:53:22 UTC 2019


Hi,

On Tue, Feb 12, 2019 at 03:32:54PM -0500, John Levine wrote:

> The second version of IDNA, IDNA2008, recognized this problem and
> deliberately removed all the mappings.  The idea was that experts in
> different scripts and languages would create mappings that make sense
> for people who use those scripts and speak those languages.  The
> mappings would turn the user input into standardized U-labels that the
> IDN software can then use.

This isn't quite correct for the case of the dots in domain names.

There are two additional important wrinkles here.

First, IDNA is defined for _labels_, and not for _domain names_.  This
is perfectly clear in IDNA2008.  It is less clear in IDNA2003, because
while most of that specification _is_ about labels, there are some
places where the whole domain name is implicated.  This is
particularly true of label separators (the dots).  That brings us,
however, to two different problems.

First, domain names are distributed in their operation, and that means
that there is no way to be sure that the "whole domain name" is in one
script.  We see this today, quite commonly, where there are IDLs that
live under traditional LDH-labels.  For most Latin-based languages,
this isn't really a problem, but where you have multiple scripts where
at least one is not Latin, it's hard to be sure exactly which rules
ought to apply.

But more importantly, there is an additional problem with domain
names: the label separators we are used to seeing _don't appear_ in
the DNS.  A domain name like crankycanuck.ca. does not appear, in the
DNS, as a series of octets separated by a special character (.), but
instead as a series of octets bound by length indicators that also
function as label separators (conceptually, it's like
12crankycanuck2ca0; the final 0 is a null label to indicate the root).
This is, by the way, the reason it is possible to have a label with a
. in it in the DNS.  You rarely see these, but they sometimes show up
in the responsible person field of the SOA record.  Since the
separator never actually appears in the DNS and since you're supposed
to go label by label, this is a problem.
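
To make the length-prefixed layout concrete, here is a minimal sketch
in Python.  The helper name to_wire is made up for illustration, and
it assumes ASCII-only labels and ignores name compression and the
overall 255-octet limit on a name.

```python
def to_wire(labels):
    """Encode labels the way they appear on the wire: each label is
    preceded by a one-octet length, and a zero-length label is the root."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # assumption: ASCII-only labels
        if not 1 <= len(raw) <= 63:
            raise ValueError("a label must be 1 to 63 octets long")
        out.append(len(raw))  # the length octet doubles as the separator
        out += raw
    out.append(0)  # the null (root) label
    return bytes(out)


# No "." octet appears anywhere in the encoded name.
print(to_wire(["crankycanuck", "ca"]))

# A "." inside a label is just another octet of that label.
print(to_wire(["first.last", "example", "com"]))
```

Note that the function takes a list of labels, not a dotted string:
by the time you are producing wire format, something else has already
had to decide where the label boundaries are.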

Now, it _might_ be that an application that is attempting to handle
IDNs that are likely to be entered in a given locale should do some
sort of mapping of the normal stops in that locale: that's roughly
what RFC 5895 suggests.
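
A rough sketch of what such a locale-aware mapping might look like,
again in Python.  The table LOCALE_STOPS and the function
split_labels are hypothetical names, and the table contents are
assumptions for illustration: the ideographic full stop (U+3002) is
the case RFC 5895 discusses, while the Armenian full stop (U+0589)
entry is only there to match this thread.

```python
# Hypothetical table: extra characters a given locale treats as a
# label separator.  These entries are illustrative assumptions.
LOCALE_STOPS = {
    "ja": "\u3002",  # IDEOGRAPHIC FULL STOP
    "hy": "\u0589",  # ARMENIAN FULL STOP
}


def split_labels(user_input, locale):
    """Map the locale's stops to "." and split the input into labels."""
    stops = LOCALE_STOPS.get(locale, "")
    mapped = "".join("." if ch in stops else ch for ch in user_input)
    if mapped.endswith("."):
        mapped = mapped[:-1]  # drop one trailing dot (the root)
    return mapped.split(".")


print(split_labels("example\u3002jp", "ja"))
print(split_labels("\u0570\u0561\u0575\u0589am", "hy"))

# The same input in a locale without that mapping stays a single label.
print(split_labels("example\u3002jp", "en"))
```

The mapping happens before the split into labels, in the application,
and only the resulting labels would then go through IDNA processing.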

> If there were an Armenian mapping for IDNs, when the characters in a
> domain name are Armenian, it handles Armenian punctuation, and when
> the characters are Latin, Latin punctuation.

That won't, of course, work, because it is possible to have mixed code
point repertoires either within or between labels.  _Probably_ it
would be safe just to map all stops to ".", but nobody knows and the
last time we tried that it didn't work out.
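
To illustrate the mixed-repertoire case, here is a crude Python
sketch.  It guesses a script from the first word of each letter's
Unicode character name, which is only a stand-in for the real Unicode
Script property (the standard library does not expose that), and the
function names are made up for this example.

```python
import unicodedata


def scripts_in(label):
    """Crude guess at the scripts used in a label: the first word of
    each letter's Unicode character name (not the Script property)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def scripts_per_label(labels):
    """Return the sorted list of guessed scripts for each label."""
    return [sorted(scripts_in(label)) for label in labels]


# An Armenian label under Latin (LDH) labels: mixed between labels.
print(scripts_per_label(["\u0570\u0561\u0575", "example", "com"]))

# Armenian and Latin letters in one label: mixed within a label.
print(scripts_per_label(["\u0570\u0561\u0575abc", "am"]))
```

In both cases a rule of the form "use the punctuation of the script
the name is written in" has no single script to key on.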

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com

