<div dir="auto">Asmus, I agree with the wisdom and the approach of variant mapping like this. <div dir="auto"><br></div><div dir="auto">If we were all starting from scratch on IDN today, gosh, it would really be swell.</div><div dir="auto"><br></div><div dir="auto">I am going to digress a tad from the homograph discussion into an issue that came to mind from the points in your postscripts.<br><div dir="auto"><br></div><div dir="auto">Backward compatibility and interop are a key area where friction happens each time something changes and renders characters invalid that were previously allowed.  </div><div dir="auto"><br></div><div dir="auto">It has been a while since IDNA2008 (which replaced IDNA2003), so one might reasonably expect something new.  </div><div dir="auto"><br></div><div dir="auto">There are still matters of supporting existing stuff commercially and being mindful and sensitive to the registrant experience.</div><div dir="auto"><br></div><div dir="auto">Consider registries that sold domains to people who made websites and started communication on IDNA2003-valid domains that IDNA2008 later invalidated.</div><div dir="auto"><br></div><div dir="auto">The registrant experience is not so positive if the registry simply says &quot;oopsie, you can&#39;t have that anymore&quot;.  A registry could quote some well-crafted wiggle words in a registration agreement to justify invalidating a registration in such a manner, but the net effect is a horrible registrant experience.</div><div dir="auto"><br></div><div dir="auto">So do we just chalk up those sub-optimal registrant experiences to evolution?  
I suggest that it actually would penalize pioneers and early adopters to do so.</div><div dir="auto"><br></div><div dir="auto">In the transition from IDNA2003 to IDNA2008, registries recognized this, and were careful in how they moved forward.</div><div dir="auto"><br></div><div dir="auto">The registry gets held back in cases where the specs collide, or it has to make some bridge solution that supports both to the best of its ability.  This can prove challenging where other registries may use their own solutions.</div><div dir="auto"><br></div><div dir="auto">I took the long way around the barn to pay compliments to the efforts, and to hopefully inspire us to always keep in mind retroactive support and a positive registrant experience with IDN.</div><div dir="auto"><br></div><div dir="auto">-jothan</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Apr 27, 2017 12:27, &quot;Asmus Freytag&quot; &lt;<a href="mailto:asmusf@ix.netcom.com">asmusf@ix.netcom.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <div class="m_7648477781036115485moz-cite-prefix">On 4/27/2017 11:22 AM, Jothan Frakes

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div>

            <div class="m_7648477781036115485gmail_signature" data-smartmail="gmail_signature">

              <div dir="ltr">On Thu, Apr 27, 2017 at 7:37 AM, Andre

                Schappo <span dir="ltr">&lt;<a href="mailto:A.Schappo@lboro.ac.uk" target="_blank">A.Schappo@lboro.ac.uk</a>&gt;</span>

                wrote:<br>

              </div>

            </div>

          </div>

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div style="word-wrap:break-word">

                Some thoughts, having now caught up with all the UA

                emails on phishing.

                <div><br>

                </div>

                <div>①  Over the years there has been much discussion

                  about Cyrillic being used to masquerade as ASCII

                  domain names. I wonder if the Russian-speaking

                  community have been having similar discussions with

                  respect to ASCII being used to masquerade as Cyrillic

                  domain names. </div>

                <div><br>

                </div>

              </div>

            </blockquote>

            <div><br>

              <br>

              Quick comments here (mostly for a wider reading audience):<br>

              <br>

              1] We need to include Greek alongside Cyrillic and Latin (or
              &quot;ASCII&quot;, as we call it in this discussion): all three are
              rich in homographs, with visually identical or
              near-identical characters across them.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Also arguably Armenian - the font used in the Unicode charts is not
    representative, and many Armenian font styles look more like Times
    or Helvetica, meaning that there are shapes like &quot;հ ս ո օ&quot; that
    closely resemble Latin letters.<br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

              2] I used to believe that there was a bright line between
              all Cyrillic and all Latin/ASCII. Many wise people like
              Yuri and Dusan spent time evolving my thinking, and I
              learned that these may be intermixed under perfectly
              normal use, and that this also varies by region.  We
              should not assume all of one or all of the other.</div>

            <div><br>

            </div>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    The limitation in thinking is that the &quot;go-to&quot; solution is to try to
    ban some code points, or to ban them in certain contexts, which
    leads to the call for single-script labels (which, as we know,
    reduces but does not remove the homograph attack surface).<br>

    <br>

    A more robust method is to make homoglyphs mutually exclusive in the

    registry. If a registered label has one code point at a certain

    position, the same label with the homoglyph substituted at the same

    position would be blocked. (&quot;Blocked variant&quot;)<br>
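    <br>
    As a rough illustration only, blocked variants can be sketched in
    Python along the following lines. The homoglyph table, the
    skeleton function and the Registry class are all hypothetical
    names made up for this sketch, the table is a tiny hand-picked
    sample rather than real registry data, and the sketch assumes the
    homoglyph relation is symmetric and transitive so that each code
    point can be mapped to a single representative.<br>
    <br>
```python
# Illustrative homoglyph table: each code point maps to one canonical
# representative. The entries and names are assumptions for this sketch,
# not data taken from any real registry or LGR.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u0585": "o",  # ARMENIAN SMALL LETTER OH
}


def skeleton(label):
    """Replace every code point by its canonical representative."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label)


class Registry:
    """Toy registry in which homoglyph variants are mutually exclusive."""

    def __init__(self):
        # Maps the skeleton of each registered label to that label.
        self._by_skeleton = {}

    def register(self, label):
        """Register the label; return False if it or a variant is taken."""
        key = skeleton(label)
        if key in self._by_skeleton:
            return False  # same label or a blocked variant already exists
        self._by_skeleton[key] = label
        return True
```
    <br>
    With this table, registering "paypal" succeeds, and a later
    attempt to register the same label with Cyrillic "а" substituted
    for Latin "a" is refused, because both labels reduce to the same
    skeleton. Note that nothing here bans a code point or a script;
    only the conflicting label is blocked.<br>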

    <br>

    The technology to specify this used to exist in two slightly

    different forms; once for Arabic and once for CJK. These were

    defined in separate RFCs, with mutually incompatible plain-text

    formats.<br>

    <br>

    With RFC 7940 there is, for the first time, a universal XML schema

    to specify these kinds of relations. This should make it easy to

    generate shared libraries and toolsets that can read/process these

    definitions. As a result, blocked variants are a technique that

    should become a standard methodology for registries.<br>

    <br>

    If you have blocked variants defined, then you can mix not just

    Cyrillic and Latin labels more safely, but also mix Latin and

    Cyrillic inside a single label without opening yourself up to

    homograph attacks.<br>

    <br>

    RFC 7940 is occasionally misunderstood as a prescription for how to
    design Label Generation Rules (aka IDN tables). It is not; it is
    instead a description of a universal data format (in XML) that can
    represent pretty much anything needed for registration policies (on
    the code point level): for example, you can define which code points
    to allow, which other code points they may appear next to, and
    which variants to block.<br>
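    <br>
    To give a feel for the data format, here is a hedged Python sketch
    that builds a small fragment in the style of RFC 7940 and reads
    the blocked variants back out of it. It relies on the lgr, data,
    char and var elements and the cp and type attributes of the RFC
    7940 namespace; the helper names q, build_lgr and blocked_variants
    are invented for illustration, and the result is only a fragment,
    not a complete or validated LGR (no metadata, no rules).<br>
    <br>
```python
import xml.etree.ElementTree as ET

# XML namespace used by RFC 7940 documents.
LGR_NS = "urn:ietf:params:xml:ns:lgr-1.0"


def q(tag):
    """Qualify a tag name with the LGR namespace (ElementTree notation)."""
    return "{" + LGR_NS + "}" + tag


def build_lgr(variant_pairs):
    """Build an lgr/data tree from {code point: [blocked variant cps]}.

    Code points are hex strings without a U+ prefix, e.g. "0061".
    """
    root = ET.Element(q("lgr"))
    data = ET.SubElement(root, q("data"))
    for cp, blocked in variant_pairs.items():
        char = ET.SubElement(data, q("char"), {"cp": cp})
        for var_cp in blocked:
            ET.SubElement(char, q("var"), {"cp": var_cp, "type": "blocked"})
    return root


def blocked_variants(root):
    """Return {code point: [blocked variant cps]} read from an lgr tree."""
    result = {}
    for char in root.iter(q("char")):
        result[char.get("cp")] = [
            var.get("cp")
            for var in char.findall(q("var"))
            if var.get("type") == "blocked"
        ]
    return result


# Latin a (0061) and Cyrillic a (0430) block each other, in both directions.
example = build_lgr({"0061": ["0430"], "0430": ["0061"]})
xml_bytes = ET.tostring(example)
```
    <br>
    Parsing the serialized bytes again with ET.fromstring and passing
    the result to blocked_variants gives back the same mapping, which
    is the kind of round trip a shared toolset would need to
    support.<br>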

    <br>

    It could use a bit of advertising. Perhaps it could be mentioned in

    comments to the IDN guidelines? (As a co-author, I&#39;m not eligible to

    make such comments myself). Not least because it unifies the

    description of blocked variants, it does have a clear place in the

    infrastructure needed to support universal acceptance.<br>

    <br>

    A./<br>

    <br>

    PS: For the root zone, we are planning to stick to single-script

    labels, but also to implement blocked variants across scripts. Some

    of the data in my cross-script variants collection comes from the

    relevant drafts for that project, other data comes from data derived

    from Unicode&#39;s UTR#39, and some is based on my own knowledge of

    certain scripts.<br>

    <br>

    PPS: I&#39;m attaching an update of my cross-script variants listing.

    The data for that exists in an XML file according to RFC7940; the

    HTML summary of that data is created by a simple tool. I would

    appreciate comments on the contents and description from anyone.<br>

    <br>

    PPPS: you may have noticed that I&#39;m not writing anything about

    allocatable variants. Their effect on the DNS is very different -

    they may be needed/useful in some contexts, but the motivation is not

    security. RFC 7940 allows you to define them where needed, including

    with the same semantics as in the existing RFCs if desired.<br>

    <p><br>

    </p>

  </div>

</blockquote></div></div>