[UA-discuss] truly international IDNs, was Armenia

Michael Casadevall michael at casadevall.pro
Wed Feb 13 06:50:52 UTC 2019


Replies inline

On 2/12/19 9:28 PM, John Levine wrote:
> In article <20190213015321.im3xzkmrbn2nsnp5 at mx4.yitter.info> you write:
>> But more importantly, there is an additional problem with domain
>> names: the label separators we are used to seeing _don't appear_ in
>> the DNS.
> 
> True.
> 
>>> If there were an Armenian mapping for IDNs, when the characters in a
>>> domain name are Armenian, it handles Armenian punctuation, and when
>>> the characters are Latin, Latin punctuation.
>>
>> That won't, of course, work, because it is possible to have mixed code
>> point repertoires either within or between labels.  _Probably_ it
>> would be safe just to map all stops to ".", but nobody knows and the
>> last time we tried that it didn't work out.
> 
> I agree we can't do it perfectly, but the question is whether we can
> do it better than we're doing it now.  We seem to agree that trying to
> do mapping without context has gone about as far as it can go, which
> isn't far enough.  Context free dots are particularly horrible since
> there are at least two kinds of dots (00b7 and 30fb) which can appear
> in U-labels in some contexts.
> 

One technical issue that comes up is how the stop is processed. In DNS
presentation format, the dot separates the labels of a name (each at
most 63 octets). On the wire, each of those boundaries becomes a length
octet, and label boundaries are also where DNS name compression
pointers apply, which is more or less mandatory to make replies fit
within the 512-byte UDP limit.
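
To make the point about stop processing concrete, here is a minimal
sketch of how an ASCII name in presentation format turns into
uncompressed wire format. The function name encode_name is made up for
illustration, and the sketch deliberately leaves out compression
pointers and the 255-octet limit on the whole name.

```python
def encode_name(name: str) -> bytes:
    """Encode an ASCII presentation-format name as uncompressed DNS wire format."""
    if name in ("", "."):
        return b"\x00"  # the root is a single zero-length label
    if name.endswith("."):
        name = name[:-1]  # the trailing dot only marks the root; drop it
    out = bytearray()
    for label in name.split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label must be 1..63 octets: {label!r}")
        out.append(len(raw))  # a length octet takes the place of the dot
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


wire = encode_name("www.example.com.")
```

Note that no 0x2E octet ends up in `wire`: every dot typed by the user
has been consumed as a label boundary before the name reaches the
protocol.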

This creates an entire nightmare of problems if a period (specifically
0x2E) is used in an IDN in a context where it doesn't represent a
label boundary. I don't know enough about Armenian or other languages
to say whether this is a problem in practice, but I can easily imagine
areas where this causes pain.

To expand on John's comment, context-based processing is a can of worms
that should be approached very carefully. The dot has a very specific
meaning within the DNS protocol itself (a single dot represents the
root zone), so allowing another character to have this function in a
U-label could easily lead to unpredictable results. This is especially
true in cases where you have an IDN label under an ASCII TLD.
> My question is whether we can come up with context sensitive mappings
> that are not horribly complicated and match what users expect.
> 
> For the case of Armenian, it seems like if you have aaa:aaa where aaa
> is Armenian text and : is the Armenian stop, it makes sense to map the
> : to an ASCII dot.  If you have aaa:lll (Latin text), maybe it does,
> or maybe since the user is shifting to Latin anyway it's not hard to
> type a dot instead.  Or maybe if you know the input's coming from an
> Armenian input device, you always treat : as a dot.  I don't know
> which of those, or something else, is best, but the current setup is
> clearly wrong.
> 

Arguably, the character used to encode a domain stop should match the
level it represents, i.e., you need a period to denote .com because
it's an ASCII TLD, even if the label next to it is Armenian. Done in
this manner, an Armenian separator with an Armenian TLD would at least
be straightforward and would reduce the number of places where things
can go wrong in U-label to A-label conversion. This would also make
things like the DNS search path more or less work as expected, with the
downside that it may be unintuitive for users.
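
As a rough sketch of one possible reading of that rule (the separator
has to match the script of the label it introduces), something like the
following would split a name into labels. Everything here is an
assumption for illustration: the names split_labels and is_armenian,
the choice of U+0589 (ARMENIAN FULL STOP) as the Armenian separator,
and the script test based on Unicode character names. It does not do
any U-label to A-label conversion.

```python
import re
import unicodedata
from typing import List

ARMENIAN_STOP = "\u0589"  # ARMENIAN FULL STOP, assumed separator for this sketch
HAY = "\u0570\u0561\u0575"  # example Armenian label (ho, ayb, yi)


def is_armenian(label: str) -> bool:
    """Rough script test: every character's Unicode name starts with ARMENIAN."""
    return bool(label) and all(
        unicodedata.name(ch, "").startswith("ARMENIAN") for ch in label
    )


def split_labels(name: str) -> List[str]:
    """Split on '.' or the Armenian stop, checking each separator against
    the script of the label to its right."""
    # With a capturing group, re.split returns labels and separators alternately.
    parts = re.split(f"([.{ARMENIAN_STOP}])", name)
    labels, seps = parts[0::2], parts[1::2]
    for sep, right in zip(seps, labels[1:]):
        expected = ARMENIAN_STOP if is_armenian(right) else "."
        if sep != expected:
            raise ValueError(f"separator {sep!r} does not match label {right!r}")
    return labels


armenian_only = split_labels(HAY + ARMENIAN_STOP + HAY)  # Armenian stop, Armenian TLD
mixed = split_labels(HAY + ".com")  # ASCII period before an ASCII TLD
```

Under this reading an Armenian stop in front of "com" is rejected
rather than silently mapped, which is exactly the part users may find
unintuitive.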

I do feel a better solution is needed here, but I'm not sure I have a
solid suggestion on how to handle it. Part of me wonders whether an EDNS
extension that allows U-labels to be resolved directly might be a path
forward to help reduce IDN pain in the future.

Michael



