[UA-discuss] truly international IDNs, was Armenia

John Levine john.levine at standcore.com
Wed Feb 13 02:28:09 UTC 2019


In article <20190213015321.im3xzkmrbn2nsnp5 at mx4.yitter.info> you write:
>But more importantly, there is an additional problem with domain
>names: the label separators we are used to seeing _don't appear_ in
>the DNS.

True.

>> If there were an Armenian mapping for IDNs, when the characters in a
>> domain name are Armenian, it handles Armenian punctuation, and when
>> the characters are Latin, Latin punctuation.
>
>That won't, of course, work, because it is possible to have mixed code
>point repertoires either within or between labels.  _Probably_ it
>would be safe just to map all stops to ".", but nobody knows and the
>last time we tried that it didn't work out.

I agree we can't do it perfectly, but the question is whether we can
do it better than we're doing it now.  We seem to agree that trying to
do mapping without context has gone about as far as it can go, which
isn't far enough.  Context-free dot mapping is particularly horrible
since there are at least two kinds of dots (U+00B7 and U+30FB) which
can appear in U-labels in some contexts.
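
To make the "some contexts" concrete, here is a rough sketch of the
two contextual rules as I read them in RFC 5892 (Appendix A.3 for
U+00B7, A.7 for U+30FB).  The function names are made up for
illustration, and the script test is a crude stand-in based on
character names, since Python's unicodedata has no script property.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"           # allowed only between two 'l' (Catalan)
KATAKANA_MIDDLE_DOT = "\u30fb"  # allowed only in a Japanese-script label


def _looks_japanese(ch):
    """Crude approximation of Script in {Hiragana, Katakana, Han}."""
    name = unicodedata.name(ch, "")
    return name.startswith(
        ("HIRAGANA LETTER", "KATAKANA LETTER", "CJK UNIFIED IDEOGRAPH")
    )


def dot_allowed_in_label(label, i):
    """Return True if the dot-like character at label[i] may stay in the label."""
    ch = label[i]
    if ch == MIDDLE_DOT:
        # Must have 'l' immediately before and immediately after.
        return 0 < i < len(label) - 1 and label[i - 1] == "l" and label[i + 1] == "l"
    if ch == KATAKANA_MIDDLE_DOT:
        # At least one other character in the label must be Japanese script.
        return any(_looks_japanese(c) for c in label)
    return False
```

A mapper that blindly turns every dot-like character into "." would
split a label such as "cel·la" in two, which is why these two need to
be excluded from any context-free rule.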

My question is whether we can come up with context-sensitive mappings
that are not horribly complicated and that match what users expect.

For the case of Armenian, it seems like if you have aaa:aaa where aaa
is Armenian text and : is the Armenian full stop (U+0589), it makes
sense to map the : to an ASCII dot.  If you have aaa:lll, where lll is
Latin text, maybe it does, or maybe since the user is shifting to
Latin anyway it's not hard to type a dot instead.  Or maybe if you
know the input's coming from an Armenian input device, you always
treat : as a dot.  I don't know which of those, or something else, is
best, but the current setup is clearly wrong.
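
As a minimal sketch of the first two options, assuming the stop is
U+0589 ARMENIAN FULL STOP and using character names as a rough test
for Armenian letters (Python's unicodedata has no script property).
The function and parameter names are hypothetical, and this is not a
proposal for the actual rule, just an illustration of how small a
context-sensitive mapping could be.

```python
import unicodedata

ARMENIAN_FULL_STOP = "\u0589"  # looks like a colon


def is_armenian_letter(ch):
    """Rough test: an alphabetic character whose Unicode name starts with ARMENIAN."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("ARMENIAN")


def map_armenian_stops(text, require_armenian_after=True):
    """Map U+0589 to '.' only when its neighbours make it look like a label separator.

    require_armenian_after=True  -> only aaa:aaa is mapped
    require_armenian_after=False -> aaa:lll is mapped too
    """
    out = []
    for i, ch in enumerate(text):
        if ch == ARMENIAN_FULL_STOP and 0 < i < len(text) - 1:
            before_ok = is_armenian_letter(text[i - 1])
            after_ok = is_armenian_letter(text[i + 1])
            if before_ok and (after_ok or not require_armenian_after):
                out.append(".")
                continue
        out.append(ch)  # everything else passes through untouched
    return "".join(out)
```

The third option, where the input device is known to be Armenian,
would need no context at all: it amounts to replacing every U+0589
with a dot, so the interesting question is only which of the two
context rules above users would expect.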

R's,
John


