<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 4/21/2017 10:11 AM, Dusan Stojicevic

      wrote:<br>

    </div>

    <br>

    <blockquote cite="mid:01ff01d2bac2$40936930$c1ba3b90$@dukes.in.rs"

      type="cite">

      <pre wrap="">

And these are just brand names with Cyrillic... more of them can be made with other scripts (Armenian, Georgian, Greek, Arabic...).</pre>

    </blockquote>

    <br>

    Just hold on a minute. <br>

    <br>

    We've just done a pretty thorough first pass over cross-script

    homoglyphs (the identical-looking code points, not the "looks the

    same if you squint at them at arms-length" variety).<br>

    <br>

    The conclusion is that Armenian has a small number of letters
    (those resembling q, h, n, u, o, and possibly g) that might
    qualify. In some fonts they are rendered practically identically,
    in others not so much:<br>

    <img src="cid:part1.0739E826.1C138E3D@ix.netcom.com" alt=""><br>

    <br>

    They are also less "useful" for whole-script confusables, as they

    lack certain high-frequency letters like "e", "a", "i", and "s".<br>

    <pre><i>Armenian</i><i>

</i>   x x x    x   x       x

<b>etaoinshrdlcumwfgypbvkjxqz</b><b>

</b>x xxx xx  x      xx    x 

<i>Cyrillic

</i></pre>
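    <p>For anyone who wants to check the table mechanically, here is a
      minimal sketch. It takes the two marker rows exactly as printed
      above (column alignment included) and lists which Latin letters
      each row marks. The helper names and the choice of "top 8" as the
      high-frequency cutoff are my own assumptions, not part of the
      review.</p>
<pre>
```python
# Rows copied as printed in the table above; each "x" marks a Latin letter
# with a near-identical counterpart in that script.
LETTERS = "etaoinshrdlcumwfgypbvkjxqz"
ARMENIAN = "   x x x    x   x       x"
CYRILLIC = "x xxx xx  x      xx    x "


def marked_letters(row):
    """Return the Latin letters whose column carries an 'x' in row."""
    return [letter for letter, mark in zip(LETTERS, row) if mark == "x"]


def missing_high_frequency(row, top=8):
    """Return the top-N most frequent letters that the row leaves unmarked."""
    marked = set(marked_letters(row))
    return [letter for letter in LETTERS[:top] if letter not in marked]


print("Armenian:", "".join(marked_letters(ARMENIAN)))
print("Cyrillic:", "".join(marked_letters(CYRILLIC)))
print("Armenian lacks:", "".join(missing_high_frequency(ARMENIAN)))
```
</pre>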

    <p>Now for Georgian, the same review concluded there is no

      high-fidelity overlap (no near-identical pairs of code points).</p>

    <p>In Greek you have a real issue only to the extent that you show

      the address in uppercase. Most of the lowercase letters are pretty

      distinct (the exceptions being omicron, and nu (ν), which looks

      more than a little bit like "v"). We had a strong debate on whether

      to take uppercase into account when deciding which code points

      constitute cross-script variants.</p>

    <p>The conclusion we had was that the protocol is limited to

      lowercase for a reason.</p>

    <p>If you consider uppercase, you get different pairs based on the

      two cases.</p>

    <p>Capital nu (Ν) looks like "N"; lowercase nu (ν) looks like "v".

      If you require variants to be transitive (which efficient

      evaluation needs), then you get "n" as a variant of "v" in Latin!</p>

    <p>It works like this: lowercase n is a case variant of capital N,

      capital N is a (homoglyph-)variant of capital Nu, capital Nu is a

      (case-)variant of lowercase nu, and lowercase nu is a

      (homoglyph-)variant of v. When you traverse this chain, which is

      what defines transitivity, you can get from "n" to "v" inside the

      same script.</p>
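    <p>To make the chain concrete, here is a minimal sketch of how a
      transitive closure merges those four pairs into one variant set. The
      labels such as "latin:n" and the helper names are hypothetical,
      chosen only for illustration; this is a plain union-find, not the
      evaluation procedure any registry actually uses.</p>
<pre>
```python
# Hypothetical variant pairs taken from the chain described above.
# Labels such as "latin:n" are illustrative names, not registry syntax.
PAIRS = [
    ("latin:n", "latin:N"),    # case variant
    ("latin:N", "greek:Nu"),   # homoglyph variant (uppercase)
    ("greek:Nu", "greek:nu"),  # case variant
    ("greek:nu", "latin:v"),   # homoglyph variant (lowercase)
]


def build_variant_sets(pairs):
    """Group items into variant sets by taking the transitive closure."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for left, right in pairs:
        parent[find(left)] = find(right)

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


def same_set(pairs, first, second):
    """True if first and second end up in one variant set."""
    return any(first in g and second in g for g in build_variant_sets(pairs))


print(same_set(PAIRS, "latin:n", "latin:v"))      # all four pairs, case included
print(same_set(PAIRS[3:], "latin:n", "latin:v"))  # lowercase homoglyph pair only
```
</pre>
    <p>With the case pairs included, "n" and "v" collapse into a single
      set; with only the lowercase homoglyph pair, they stay apart.</p>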

    <p>We figured that we had reached the limit of what you can address

      with variants in the registries at this point.</p>

    <p>Finally, as for Arabic, I would like to see an example of a Latin

      label spoofed using only Arabic letters.</p>

    <p>(It's possible to write "English" using Chinese characters that

      vaguely look like letters of the alphabet, but while you can read

      such texts, they look rather odd).<br>

    </p>

    <blockquote cite="mid:01ff01d2bac2$40936930$c1ba3b90$@dukes.in.rs"

      type="cite">

      <pre wrap="">

Also agree entirely with Vittorio, and just want to add another layer of the problem - epic.com example use https, and while GeoTrust and at least one other CA have stopped issuing automated certificated for IDNs sometime ago for other reasons, this trend will be expected for others to follow.</pre>

    </blockquote>

    <br>

    Displaying some details about the domain/certificate owner (see my

    previous message) would seem to be more useful than showing an IDN

    as an impenetrable xn-- label. The former works for phishing attacks

    against any script; the latter is only useful for people who can be

    expected to work entirely without IDNs.<br>

    <br>

    A./<br>

  </body>

</html>