<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Two things:<br>

      <br>

      1) By limiting the cross-script variants to code points that truly
      appear identical, no matter the font, you take the judgement out of
      this. Two such labels really cannot be used at the same time in the
      same zone because, on some level, they are not just confusable but
      "identical" (on all but the code point level). <br>

      <br>

      A registry that feels it must "protect" its investment in that kind
      of clearly bogus situation doesn't have a very strong case.<br>

      <br>

      The situation is different for merely "similar" labels. These may be
      difficult to tell apart, but as a user you can always establish
      (short of code point editing or comparing xn-- labels) whether two
      labels are the same or not. How much similarity between labels is
      acceptable becomes a matter of judgement, which is why such labels
      are left out of consideration for this approach.<br>

      <br>

      2) Your argument below assumes, for reasons I don't know, that this
      technology cannot be applied selectively to new registrations (with
      all existing registrations grandfathered). Nothing about variants
      needs to be checked at lookup time; any processing would happen at
      application time.<br>

      <br>

      This is different from making a change to the protocol itself.<br>

      <br>

      A./<br>

      <br>

      PS: In case you think this is only a cross-script issue, have a look
      at U+0259 and U+01DD (LATIN SMALL LETTER SCHWA and LATIN SMALL
      LETTER TURNED E, both Latin), or at TAMIL LETTER KA and TAMIL
      DIGIT ZERO.<br>
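      <br>
      For anyone who wants to check the PS mechanically, here is a minimal
      Python sketch using only the standard unicodedata module. It prints
      the Unicode names of both pairs and tests whether NFKC normalization
      unifies them. Using NFKC as the yardstick is an assumption made for
      this illustration, not something any registry policy mandates.<br>
<pre>
```python
import unicodedata

# The two pairs from the PS. Each pair looks (near-)identical in most
# fonts, and both members of a pair come from the same script.
PAIRS = [
    ("\u0259", "\u01dd"),  # two Latin letters
    ("\u0b95", "\u0be6"),  # a Tamil letter and a Tamil digit
]

def describe(ch):
    """Format a character as U+XXXX plus its Unicode character name."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

for a, b in PAIRS:
    # NFKC folds compatibility characters; check whether it folds these.
    same = unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)
    print(describe(a), "|", describe(b), "| equal after NFKC:", same)
```
</pre>
      Both pairs stay distinct after normalization, so only an explicit
      variant relation can keep two such labels from coexisting in a zone.<br>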

      <br>

      On 4/27/2017 5:57 PM, Jothan Frakes wrote:<br>

    </div>

    <blockquote

cite="mid:CAGrS0FL7TEwUzj2PTiUU91yMo3KV+wTA=8wzhaEwrnn40PhQWA@mail.gmail.com"

      type="cite">

      <div dir="auto">Asmus, I agree with the wisdom and the approach of
        variant mapping like this. 

        <div dir="auto"><br>

        </div>

        <div dir="auto">If we were all starting from scratch on IDN
          today, gosh, it would really be swell.</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">I am going to digress a tad on the homograph

          discussion into an issue that came to mind from the points in

          your postscripts.<br>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Backward compatibility and interop are key
            areas where friction arises each time a change renders
            previously allowed characters invalid.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">It has been a while since IDNA2008 (which
            replaced IDNA2003), so one might reasonably expect something
            new.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">There are still matters of supporting existing

            stuff commercially and being mindful and sensitive to the

            registrant experience.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Consider registries that sold domains to
            people who built websites and started communicating on
            IDNA2003-valid domains that IDNA2008 later invalidated.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">The registrant experience is not so positive

            if the registry simply says "oopsie, you can't have that

            anymore".  A registry could quote some well crafted wiggle

            words in a registration agreement in justifying invalidating

            a registration in such a manner, but the net effect is a

            horrible registrant experience.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">So do we just chalk up those sub-optimal

            registrant experiences to evolution?  I suggest that it

            actually would penalize pioneers and early adopters to do

            so.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">In the transition from IDNA2003 to IDNA2008,
            registries recognized this and were careful in how they
            moved forward.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Registries get held back in cases where the
            specs collide, or they have to build some bridge solution
            that supports both as well as they can. This can prove
            challenging where other registries use their own
            solutions.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">I took the long way around the barn to pay
            compliments to the efforts, and, I hope, to encourage us to
            always keep in mind retroactive support and a positive
            registrant experience with IDN.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">-jothan</div>


        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Apr 27, 2017 12:27, "Asmus Freytag"

          &lt;<a moz-do-not-send="true"

            href="mailto:asmusf@ix.netcom.com">asmusf@ix.netcom.com</a>&gt;

          wrote:<br type="attribution">

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000">

              <div class="m_7648477781036115485moz-cite-prefix">On

                4/27/2017 11:22 AM, Jothan Frakes wrote:<br>

              </div>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_extra">

                    <div>

                      <div class="m_7648477781036115485gmail_signature"

                        data-smartmail="gmail_signature">

                        <div dir="ltr">On Thu, Apr 27, 2017 at 7:37 AM,

                          Andre Schappo <span dir="ltr">&lt;<a

                              moz-do-not-send="true"

                              href="mailto:A.Schappo@lboro.ac.uk"

                              target="_blank">A.Schappo@lboro.ac.uk</a>&gt;</span>

                          wrote:<br>

                        </div>

                      </div>

                    </div>

                    <div class="gmail_quote">

                      <blockquote class="gmail_quote" style="margin:0 0

                        0 .8ex;border-left:1px #ccc

                        solid;padding-left:1ex">

                        <div style="word-wrap:break-word"> Some

                          thoughts, having now caught up with all the UA

                          emails on phishing.

                          <div><br>

                          </div>

                          <div>①  Over the years there has been much
                            discussion about Cyrillic being used to
                            masquerade as ASCII domain names. I wonder
                            whether the Russian-speaking community has
                            been having similar discussions about ASCII
                            being used to masquerade as Cyrillic domain
                            names. </div>

                          <div><br>

                          </div>

                        </div>

                      </blockquote>

                      <div><br>

                        <br>

                        Quick comments here (mostly for a wider reading
                        audience):<br>
                        <br>
                        1] We need to include Greek alongside Cyrillic
                        and Latin (or "ASCII", as we call it in this
                        discussion): all three are homograph-rich, with
                        many visually identical or near-identical
                        characters across them.</div>

                    </div>

                  </div>

                </div>

              </blockquote>

              <br>

              Also, arguably, Armenian: the font used in the Unicode
              charts is not representative, and many Armenian font
              styles look more like Times or Helvetica, so there are
              shapes like "հ ս ո օ" that closely resemble Latin h, u, n,
              and o.<br>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_extra">

                    <div class="gmail_quote">

                      <div><br>

                        2] I used to believe there was a bright line
                        between all-Cyrillic and all-Latin/ASCII. Many
                        wise people, like Yuri and Dusan, spent time
                        evolving my thinking, and through that process I
                        learned that the two may be intermixed in
                        perfectly normal use, and that this also varies
                        by region. We should not assume all of one or
                        all of the other.</div>

                      <div><br>

                      </div>

                    </div>

                    <br>

                  </div>

                </div>

              </blockquote>

              The limitation in thinking is that the "go-to" solution is
              to try to ban some code points, or to ban them in certain
              contexts. This leads to the call for single-script labels,
              a restriction that, as we know, reduces but does not remove
              the homograph attack surface.<br>

              <br>

              A more robust method is to make homoglyphs mutually
              exclusive in the registry. If a registered label has one
              code point at a certain position, the same label with the
              homoglyph substituted at that position is blocked (a
              "blocked variant").<br>
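              <br>
              A minimal Python sketch of that check, assuming a tiny
              hypothetical homoglyph table (five Latin letters and their
              Cyrillic look-alikes; a real registry would take its table
              from its label generation rules). It enumerates every label
              reachable by substituting homoglyphs at any set of positions
              and refuses a new application if any of them is already
              registered. The check runs only when a label is applied
              for; nothing happens at lookup time.<br>
<pre>
```python
from itertools import product

# Hypothetical, illustrative homoglyph table: five Latin letters and
# their Cyrillic look-alikes, declared in both directions.
PAIRS = [("a", "\u0430"), ("c", "\u0441"), ("e", "\u0435"),
         ("o", "\u043e"), ("p", "\u0440")]
HOMOGLYPHS = {}
for latin, cyrillic in PAIRS:
    HOMOGLYPHS.setdefault(latin, set()).add(cyrillic)
    HOMOGLYPHS.setdefault(cyrillic, set()).add(latin)

def variants(label):
    """All labels obtained by swapping homoglyphs at any set of positions
    (the original label is included)."""
    choices = [{ch} | HOMOGLYPHS.get(ch, set()) for ch in label]
    return {"".join(combo) for combo in product(*choices)}

def can_register(label, registered):
    """Application-time check: refuse the label if it, or any of its
    blocked variants, is already in the registry."""
    return variants(label).isdisjoint(registered)

registered = {"example"}
print(can_register("ex\u0430mple", registered))  # Cyrillic a at position 3
print(can_register("sample", registered))
```
</pre>
              The first call prints False, the second True. Enumeration
              grows exponentially with the number of substitutable
              positions, so where the variant relation is transitive a
              production system would more likely compare a canonical key
              (each code point mapped to one representative of its variant
              set); the enumeration is shown because it mirrors the
              definition above.<br>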

              <br>

              The technology to specify this previously existed in two
              slightly different forms, one for Arabic and one for CJK.
              These were defined in separate RFCs, with mutually
              incompatible plain-text formats.<br>

              <br>

              With RFC 7940 there is, for the first time, a universal

              XML schema to specify these kinds of relations. This

              should make it easy to generate shared libraries and

              toolsets that can read/process these definitions. As a

              result, blocked variants are a technique that should

              become a standard methodology for registries.<br>

              <br>

              If you have blocked variants defined, then you can more
              safely mix not just Cyrillic and Latin labels, but also
              Latin and Cyrillic inside a single label, without opening
              yourself up to homograph attacks.<br>

              <br>

              RFC 7940 is occasionally misunderstood as a prescription
              for how to design Label Generation Rules (aka IDN tables).
              It is not; it is instead a description of a universal data
              format (in XML) that can represent pretty much anything
              needed for registration policies at the code point level:
              for example, which code points to allow, which other code
              points they may appear next to, and which variants to
              block.<br>
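              <br>
              To make the data format concrete, a minimal Python sketch
              using the standard library's ElementTree. It builds a tiny
              hypothetical two-entry table in the RFC 7940 namespace (a
              char element per code point, with a var child carrying
              type="blocked") and reads the blocked variants back out.
              Only char and var elements are handled; ranges, contextual
              rules, whole-label rules and actions are ignored, and
              treating the literal type value "blocked" as the marker is
              an assumption about how a given table names its variant
              types.<br>
<pre>
```python
import xml.etree.ElementTree as ET

# Namespace used by RFC 7940 documents, in ElementTree's "{ns}tag" form.
NS = "{urn:ietf:params:xml:ns:lgr-1.0}"

def build_sample_lgr():
    """Hypothetical two-entry table: Latin a and Cyrillic a, each listed
    as a blocked variant of the other."""
    lgr = ET.Element(NS + "lgr")
    data = ET.SubElement(lgr, NS + "data")
    for cp, var_cp in (("0061", "0430"), ("0430", "0061")):
        char = ET.SubElement(data, NS + "char", cp=cp)
        ET.SubElement(char, NS + "var", cp=var_cp, type="blocked")
    return ET.tostring(lgr)

def to_text(cp_attr):
    """Convert a cp attribute (space-separated hex values) to a string."""
    return "".join(chr(int(h, 16)) for h in cp_attr.split())

def blocked_variants(xml_bytes):
    """Map each code point (or sequence) to the set of its blocked variants."""
    root = ET.fromstring(xml_bytes)
    table = {}
    for char in root.iter(NS + "char"):
        for var in char.findall(NS + "var"):
            if var.get("type") == "blocked":
                table.setdefault(to_text(char.get("cp")), set()).add(to_text(var.get("cp")))
    return table

print(blocked_variants(build_sample_lgr()))
```
</pre>
              The same reader would work on the bytes of a real table that
              lists its variants on char elements; anything relying on
              ranges or contextual rules needs a fuller implementation.<br>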

              <br>

              It could use a bit of advertising. Perhaps it could be
              mentioned in comments on the IDN guidelines? (As a
              co-author, I'm not eligible to make such comments myself.)
              Not least because it unifies the description of blocked
              variants, it has a clear place in the infrastructure
              needed to support universal acceptance.<br>

              <br>

              A./<br>

              <br>

              PS: For the root zone, we are planning to stick to

              single-script labels, but also to implement blocked

              variants across scripts. Some of the data in my
              cross-script variants collection comes from the relevant
              drafts for that project, some is derived from Unicode's
              UTR#39, and some is based on my own knowledge of certain
              scripts.<br>

              <br>

              PPS: I'm attaching an update of my cross-script variants
              listing. The data exists as an XML file in the RFC 7940
              format; the HTML summary of that data is generated by a
              simple tool. I would appreciate comments on the contents
              and description from anyone.<br>

              <br>

              PPPS: You may have noticed that I'm not writing anything
              about allocatable variants. Their effect on the DNS is
              very different: they may be needed or useful in some
              contexts, but the motivation is not security. RFC 7940
              allows you to define them where needed, including with the
              same semantics as in the existing RFCs if desired.<br>

              <p><br>

              </p>

            </div>

          </blockquote>

        </div>

      </div>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>