<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Two things:<br>
<br>
1) By limiting the cross-script variants to code points that truly
appear identical, no matter the font, you would take the judgement
out of this. Two labels that collide in this way are really not
usable at the same time in the same zone because, on some level,
they are not just confusable but "identical" (on all but the code
point level). <br>
<br>
A registry that feels it must "protect" its investment in that kind
of clearly bogus situation doesn't have a very strong case.<br>
<br>
The situation is different for merely "similar" labels. These may be
difficult to tell apart, but as a user you can always establish,
without inspecting code points or comparing xn-- labels, whether two
labels are the same or not. The degree of acceptable similarity
between labels then becomes a matter of judgement, which is why I
leave them out of consideration for this approach.<br>
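As an illustration (a sketch, not any registry's actual tooling): Latin "a" (U+0061) and Cyrillic "а" (U+0430) look identical in most fonts, so two labels that differ only in that letter can be told apart only by their code points or their xn-- forms. The snippet uses Python's built-in "idna" codec, which implements IDNA2003 rather than IDNA2008; that is assumed to be good enough here because both labels consist of plain lowercase letters.<br>

```python
# Latin "a" (U+0061) and Cyrillic "a" (U+0430) render identically in most
# fonts, so these two labels cannot be told apart by eye.
latin_label = "paypal"
mixed_label = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

# Python's built-in "idna" codec follows IDNA2003, not IDNA2008; it is used
# here only to show that the ACE ("xn--") forms of the two labels differ.
latin_ace = latin_label.encode("idna")
mixed_ace = mixed_label.encode("idna")

print(latin_ace)                               # b'paypal'
print(mixed_ace.startswith(b"xn--"))           # True
print([f"U+{ord(c):04X}" for c in mixed_label])
```

Only the code point listing or the ACE form reveals the difference; nothing a user sees in a typical font does.<br>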
<br>
2) Your argument below assumes, for reasons I don't know, that this
technology cannot be applied selectively to new registrations (with
all existing registrations grandfathered). Nothing about variants
needs to be checked at lookup time; any processing would happen at
application time.<br>
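A minimal sketch of what "checked at application time, with existing registrations grandfathered" could look like. The variant table, `variant_key` and the `Registry` class are hypothetical names for illustration only; the table holds just a few Latin/Cyrillic pairs and assumes every variant set is symmetric and transitive, so that mapping each code point to a single representative is enough to detect collisions.<br>

```python
from dataclasses import dataclass, field

# Hypothetical table: each code point with a truly identical look-alike maps
# to one representative of its variant set (a few Latin/Cyrillic pairs only).
VARIANT_REP = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A  -> LATIN SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
    "\u043e": "o",  # CYRILLIC SMALL LETTER O  -> LATIN SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
}


def variant_key(label: str) -> str:
    """Labels that are blocked variants of each other share the same key."""
    return "".join(VARIANT_REP.get(ch, ch) for ch in label)


@dataclass
class Registry:
    labels: set[str] = field(default_factory=set)

    def grandfather(self, label: str) -> None:
        """Load an existing registration as-is; no variant check is applied."""
        self.labels.add(label)

    def apply(self, label: str) -> bool:
        """The variant check runs only here, when a new label is applied for."""
        key = variant_key(label)
        if any(variant_key(existing) == key for existing in self.labels):
            return False  # blocked: same label or a variant is already registered
        self.labels.add(label)
        return True

    def resolves(self, label: str) -> bool:
        """Lookup is a plain exact match; variants play no role here."""
        return label in self.labels


registry = Registry()
registry.grandfather("paypal")
registry.grandfather("p\u0430ypal")  # pre-existing collision, left untouched
print(registry.apply("p\u0430yp\u0430l"))  # False: blocked variant of "paypal"
print(registry.apply("example"))            # True
print(registry.resolves("p\u0430ypal"))     # True: grandfathered label still works
```

Note that `resolves` does no variant processing at all: lookup stays an ordinary exact match, and the grandfathered pair keeps working.<br>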
<br>
This is different from making a change to the protocol itself.<br>
<br>
A./<br>
<br>
PS: in case you think this is only a cross-script issue, have a
look at U+0259 and U+01DD (both Latin) or TAMIL LETTER KA and TAMIL
DIGIT ONE.<br>
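The Latin pair at U+0259 and U+01DD can be checked with Python's standard `unicodedata` module: the two characters have distinct names and code points, and neither NFC nor NFKC normalization changes either of them, so normalization alone will never make two labels that differ only in these characters compare equal.<br>

```python
import unicodedata

# The two Latin look-alikes: SCHWA and TURNED E.
pair = ("\u0259", "\u01dd")

for ch in pair:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0259 LATIN SMALL LETTER SCHWA
# U+01DD LATIN SMALL LETTER TURNED E

# Neither normalization form maps one onto the other (or changes either),
# so the two remain distinct code points after normalization.
for form in ("NFC", "NFKC"):
    normalized = [unicodedata.normalize(form, ch) for ch in pair]
    print(form, normalized == list(pair), normalized[0] == normalized[1])
```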
<br>
On 4/27/2017 5:57 PM, Jothan Frakes wrote:<br>
</div>
<blockquote
cite="mid:CAGrS0FL7TEwUzj2PTiUU91yMo3KV+wTA=8wzhaEwrnn40PhQWA@mail.gmail.com"
type="cite">
<div dir="auto">Asmus, I agree with the wisdom and the approach of
variant mapping like this.
<div dir="auto"><br>
</div>
<div dir="auto">If we were all starting from scratch on IDN
today, gosh, it would really be swell.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I am going to digress a tad on the homograph
discussion into an issue that came to mind from the points in
your postscripts.<br>
<div dir="auto"><br>
</div>
<div dir="auto">Backward compatibility and interop are a key
area where friction happens each time something changes and
renders characters invalid that were previously allowed. </div>
<div dir="auto"><br>
</div>
<div dir="auto">It has been a while since IDNA2008 (which
replaced IDNA2003), so one might reasonably expect something new.
</div>
<div dir="auto"><br>
</div>
<div dir="auto">There are still matters of supporting existing
stuff commercially and being mindful and sensitive to the
registrant experience.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Consider registries that sold domains to
people who built websites and started communicating on
IDNA2003-valid domains that IDNA2008 later invalidated.</div>
<div dir="auto"><br>
</div>
<div dir="auto">The registrant experience is not so positive
if the registry simply says "oopsie, you can't have that
anymore". A registry could point to some well-crafted wiggle
words in a registration agreement to justify invalidating
a registration in this manner, but the net effect is a
horrible registrant experience.</div>
<div dir="auto"><br>
</div>
<div dir="auto">So do we just chalk up those sub-optimal
registrant experiences to evolution? I suggest that doing so
would actually penalize pioneers and early adopters.</div>
<div dir="auto"><br>
</div>
<div dir="auto">In the transition from IDNA2003 to IDNA2008,
registries recognized this and were careful in how they
moved forward.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Where the specs collide, the registry is either
held back or has to build some bridge solution that supports
both as well as it can. This can prove challenging when other
registries use their own solutions.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I took the long way around the barn to pay
compliments to the efforts, and in the hope of inspiring us to
always keep in mind retroactive support and a positive
registrant experience with IDN.</div>
<div dir="auto"><br>
</div>
<div dir="auto">-jothan</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Apr 27, 2017 12:27, "Asmus Freytag"
<<a moz-do-not-send="true"
href="mailto:asmusf@ix.netcom.com">asmusf@ix.netcom.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_7648477781036115485moz-cite-prefix">On
4/27/2017 11:22 AM, Jothan Frakes wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div>
<div class="m_7648477781036115485gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">On Thu, Apr 27, 2017 at 7:37 AM,
Andre Schappo <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:A.Schappo@lboro.ac.uk"
target="_blank">A.Schappo@lboro.ac.uk</a>></span>
wrote:<br>
</div>
</div>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div style="word-wrap:break-word"> Some
thoughts, having now caught up with all the UA
emails on phishing.
<div><br>
</div>
<div>① Over the years there has been much
discussion about Cyrillic being used to
masquerade as ASCII domain names. I wonder
if the Russian-speaking community has been
having similar discussions about
ASCII being used to masquerade as Cyrillic
domain names. </div>
<div><br>
</div>
</div>
</blockquote>
<div><br>
<br>
Quick comments here (mostly for a wider reading
audience):<br>
<br>
1] We need to include Greek alongside Cyrillic and Latin
(or "ASCII", as we call it in this discussion):
all three are homograph-rich, with many visually
identical or near-identical characters across them.</div>
</div>
</div>
</div>
</blockquote>
<br>
Also, arguably, Armenian: the font used in the Unicode
charts is not representative, and many Armenian font
styles look more like Times or Helvetica, which yields
shapes like "հ ս ո օ" (resembling Latin h, u, n, o).<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
2] I used to believe that there was a bright
line between all Cyrillic and all Latin/ASCII.
Many wise people like Yuri and Dusan spent time
evolving my thinking, and through that process I
learned that the two may be intermixed in perfectly
normal use, and that this also varies by region.
We should not assume all of one or all of the other.</div>
<div><br>
</div>
</div>
<br>
</div>
</div>
</blockquote>
The limitation in thinking is that the "go-to" solution is
to try to ban some code points, or to ban them in certain
contexts. That leads to the call for single-script labels,
which, as we know, reduce but do not remove the
homograph attack surface.<br>
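To see why single-script labels reduce but do not remove the attack surface, here is a rough sketch. It approximates a character's script by the first word of its Unicode character name; that is an assumption made only because Python's standard library does not expose the Unicode Script property, and a real check would use the Script data from UAX #24 instead.<br>

```python
import unicodedata


def scripts_of(label: str) -> set[str]:
    # Heuristic for this sketch only: "LATIN SMALL LETTER P" -> "LATIN".
    # Digits and the hyphen would need special handling in a real check.
    return {unicodedata.name(ch).split()[0] for ch in label}


def is_single_script(label: str) -> bool:
    return len(scripts_of(label)) == 1


mixed = "p\u0430ypal"                  # Latin with one Cyrillic letter
whole_cyrillic = "\u0430\u0440\u0435"  # Cyrillic a, er, ie: looks like Latin "ape"

print(is_single_script(mixed))           # False: rejected by a single-script rule
print(is_single_script(whole_cyrillic))  # True: passes, yet mimics Latin "ape"
```

The mixed label is caught, but the all-Cyrillic label sails through while still being a homograph of an all-Latin one.<br>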
<br>
A more robust method is to make homoglyphs mutually
exclusive in the registry. If a registered label has one
code point at a certain position, the same label with the
homoglyph substituted at the same position would be
blocked. ("Blocked variant")<br>
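The blocked-variant rule can be sketched as follows. The `HOMOGLYPHS` table and the function names are hypothetical; the table holds a few symmetric Latin/Cyrillic pairs, and the sketch generates every combination of substitutions, which is fine for short labels but grows exponentially with the number of positions that have variants.<br>

```python
from itertools import product

# Hypothetical, symmetric homoglyph table: each code point lists the code
# points it is a blocked variant of (a few Latin/Cyrillic pairs only).
HOMOGLYPHS = {
    "a": ["\u0430"], "\u0430": ["a"],  # LATIN a  / CYRILLIC a
    "e": ["\u0435"], "\u0435": ["e"],  # LATIN e  / CYRILLIC ie
    "o": ["\u043e"], "\u043e": ["o"],  # LATIN o  / CYRILLIC o
    "p": ["\u0440"], "\u0440": ["p"],  # LATIN p  / CYRILLIC er
}


def variant_labels(label: str) -> set[str]:
    """All labels obtained by substituting homoglyphs at any positions."""
    choices = [[ch, *HOMOGLYPHS.get(ch, [])] for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


def is_blocked(label: str, registered: set[str]) -> bool:
    """A label is blocked if any of its variants is already registered."""
    return not variant_labels(label).isdisjoint(registered)


registered = {"paypal"}
print(is_blocked("p\u0430ypal", registered))  # True: Cyrillic a in position 2
print(is_blocked("\u0440aypal", registered))  # True: Cyrillic er in position 1
print(is_blocked("example", registered))      # False
```

The first two cases are exactly the mixed-script situation: because each Latin/Cyrillic mixture collides with the registered all-Latin label, it is blocked rather than merely discouraged.<br>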
<br>
The technology to specify this used to exist in two
slightly different forms, one for Arabic and one for
CJK, defined in separate RFCs with mutually
incompatible plain-text formats.<br>
<br>
With RFC 7940 there is, for the first time, a universal
XML schema to specify these kinds of relations. This
should make it easy to generate shared libraries and
toolsets that can read/process these definitions. As a
result, blocked variants are a technique that should
become a standard methodology for registries.<br>
<br>
If you have blocked variants defined, you can more safely
mix not just Cyrillic and Latin labels in the same zone, but
also Latin and Cyrillic inside a single label, without opening
yourself up to homograph attacks.<br>
<br>
RFC 7940 is occasionally misunderstood as a prescription
for how to design Label Generation Rules (aka IDN tables). It
is not; instead, it describes a universal data format (in XML)
that can represent pretty much anything needed for
registration policies at the code point level: for example,
you can define which code points to allow, next to which
other code points, and which variants to block.<br>
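As a rough illustration of that data model, the sketch below builds a tiny LGR fragment with Python's `xml.etree.ElementTree` and extracts the repertoire and the blocked variants from it. It uses only the `lgr`, `data`, `char` and `var` elements and the `cp` and `type` attributes; it is not a complete or validated LGR, and it ignores code point sequences, ranges, contextual rules and actions, which RFC 7940 also defines. The fragment is built in memory rather than read from a file only to keep the example self-contained, and the helper names are hypothetical.<br>

```python
import xml.etree.ElementTree as ET

LGR_NS = "urn:ietf:params:xml:ns:lgr-1.0"  # XML namespace defined by RFC 7940


def tag(local: str) -> str:
    """Qualify an element name with the LGR namespace ({ns}name form)."""
    return f"{{{LGR_NS}}}{local}"


def build_demo_lgr() -> str:
    """Serialize a tiny, hypothetical LGR fragment: a, Cyrillic a, and b."""
    root = ET.Element(tag("lgr"))
    data = ET.SubElement(root, tag("data"))
    # Latin a and Cyrillic a are blocked variants of each other (symmetric).
    for cp, var_cp in (("0061", "0430"), ("0430", "0061")):
        char = ET.SubElement(data, tag("char"), {"cp": cp})
        ET.SubElement(char, tag("var"), {"cp": var_cp, "type": "blocked"})
    ET.SubElement(data, tag("char"), {"cp": "0062"})  # Latin b, no variants
    return ET.tostring(root, encoding="unicode")


def load_lgr(xml_text: str) -> tuple[set[str], dict[str, set[str]]]:
    """Return the allowed repertoire and a map of blocked variants."""
    root = ET.fromstring(xml_text)
    repertoire: set[str] = set()
    blocked: dict[str, set[str]] = {}
    for char in root.iter(tag("char")):
        # Single code points only; RFC 7940 also allows code point sequences.
        ch = chr(int(char.get("cp"), 16))
        repertoire.add(ch)
        for var in char.findall(tag("var")):
            if var.get("type") == "blocked":
                blocked.setdefault(ch, set()).add(chr(int(var.get("cp"), 16)))
    return repertoire, blocked


repertoire, blocked = load_lgr(build_demo_lgr())
print(sorted(repertoire))
print(blocked)
```

Once loaded, the same repertoire and variant map can drive any registration-time check, which is the point of having one shared format instead of per-script plain-text tables.<br>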
<br>
It could use a bit of advertising. Perhaps it could be
mentioned in comments on the IDN guidelines? (As a
co-author, I'm not eligible to make such comments myself.)
Not least because it unifies the description of blocked
variants, it has a clear place in the infrastructure
needed to support universal acceptance.<br>
<br>
A./<br>
<br>
PS: For the root zone, we are planning to stick to
single-script labels, but also to implement blocked
variants across scripts. Some of the data in my
cross-script variants collection comes from the relevant
drafts for that project, some is derived from Unicode's
UTR #39, and some is based on my own knowledge of certain
scripts.<br>
<br>
PPS: I'm attaching an update of my cross-script variants
listing. The underlying data is kept in an XML file in the
RFC 7940 format; the HTML summary of that data is generated by a
simple tool. I would appreciate comments on the contents
and description from anyone.<br>
<br>
PPPS: you may have noticed that I'm not writing anything
about allocatable variants. Their effect on the DNS is
very different: they may be needed or useful in some
contexts, but the motivation is not security. RFC 7940
allows you to define them where needed, including with the
same semantics as in the existing RFCs if desired.<br>
<p><br>
</p>
</div>
</blockquote>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>