<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 4/21/2017 10:11 AM, Dusan Stojicevic
wrote:<br>
</div>
<br>
<blockquote cite="mid:01ff01d2bac2$40936930$c1ba3b90$@dukes.in.rs"
type="cite">
<pre wrap="">
And these are just brand names with Cyrillic... more of them can be made with other scripts (Armenian, Georgian, Greek, Arabic...).</pre>
</blockquote>
<br>
Just hold on a minute. <br>
<br>
    We've just done a pretty thorough first pass over cross-script
    homoglyphs (the identical-looking code points, not the "looks the
    same if you squint at them at arm's length" variety).<br>
<br>
The conclusion is that Armenian has a small number of letters (q, h,
n, u, o, and possibly g) that might qualify. In some fonts, they
are rendered practically identically, in others not so much:<br>
<img src="cid:part1.0739E826.1C138E3D@ix.netcom.com" alt=""><br>
<br>
    They are also less "useful" for whole-script confusables, as they
    lack certain high-frequency letters like "e", "a", "i", and "s".<br>
<pre><i>Armenian</i><i>
</i> x x x x x x
<b>etaoinshrdlcumwfgypbvkjxqz</b><b>
</b>x xxx xx x xx x
<i>Cyrillic
</i></pre>
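    <p>To make the coverage argument concrete, here is a minimal sketch
      in Python. It assumes only the Armenian look-alike set named above
      (q, h, n, u, o, and the doubtful g); the function names are
      hypothetical, and the label is assumed to consist of Latin letters
      only. It reports which letters of a label have no look-alike, so
      the label cannot be imitated in that script alone.</p>
    <pre>
```python
# Latin letters that, per the review described above, have a near-identical
# Armenian look-alike in at least some fonts ("g" was only a "possibly").
ARMENIAN_LOOKALIKES = set("qhnuo") | {"g"}


def uncovered_letters(label, lookalikes=ARMENIAN_LOOKALIKES):
    """Return the letters of label that have no look-alike, sorted."""
    return sorted(set(label.lower()) - set(lookalikes))


def whole_script_spoofable(label, lookalikes=ARMENIAN_LOOKALIKES):
    """True if every letter of label has a look-alike in the other script."""
    return not uncovered_letters(label, lookalikes)


print(uncovered_letters("epic"))       # ['c', 'e', 'i', 'p']
print(whole_script_spoofable("epic"))  # False
print(whole_script_spoofable("hung"))  # True
```
    </pre>
    <p>Any label containing "e", "a", "i" or "s" fails this check, which
      is why the Armenian overlap is of limited use to an attacker.</p>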
    <p>Now for Georgian, the same review concluded there is no
      high-fidelity overlap (no near-identical pairs of code points).</p>
<p>In Greek you have a real issue only to the extent that you show
the address in uppercase. Most of the lowercase letters are pretty
distinct (except for omicron, and nu (ν) looks more than a little
bit like "v"). We had a strong debate on whether to take uppercase
into account when deciding which code points constitute
cross-script variants.</p>
<p>The conclusion we had was that the protocol is limited to
lowercase for a reason.</p>
<p>If you consider uppercase, you get different pairs based on the
two cases.</p>
    <p>Capital Nu (Ν) looks like "N", lowercase nu (ν) looks like "v".
      If you require variants to be transitive (very necessary for
      optimized evaluation), then you get "n" as a variant of "v" in
      Latin!</p>
<p>It works like this: Lowercase n is a case variant of cap N, N is
a (homoglyph-)variant of Cap Nu, Cap Nu is a (case-)variant of
lowercase nu, lowercase nu is a (homoglyph-)variant of v. When you
traverse this chain, which is what defines transitivity, you can
get from "n" to "v" inside the same script.</p>
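    <p>The chain can be reproduced with a small sketch in Python. This
      is an illustration only, not how any registry actually evaluates
      variants: the four pairs are the ones listed above, and the names
      <code>PAIRS</code>, <code>variant_classes</code> and
      <code>are_variants</code> are made up for the example. It builds
      the transitive closure with a union-find structure and compares
      the result with and without the case pairs.</p>
    <pre>
```python
GREEK_CAP_NU = "\u039d"    # looks like Latin "N"
GREEK_SMALL_NU = "\u03bd"  # looks like Latin "v"

# (code point, code point, kind of relation), as in the chain above.
PAIRS = [
    ("n", "N", "case"),
    ("N", GREEK_CAP_NU, "homoglyph"),
    (GREEK_CAP_NU, GREEK_SMALL_NU, "case"),
    (GREEK_SMALL_NU, "v", "homoglyph"),
]


def variant_classes(pairs):
    """Group code points into classes under the transitive closure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _kind in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for cp in list(parent):
        classes.setdefault(find(cp), set()).add(cp)
    return list(classes.values())


def are_variants(a, b, pairs):
    """True if a and b end up in the same variant class."""
    return any(a in cls and b in cls for cls in variant_classes(pairs))


with_case = are_variants("n", "v", PAIRS)
lowercase_only = are_variants(
    "n", "v", [p for p in PAIRS if p[2] != "case"]
)
print(with_case, lowercase_only)  # True False
```
    </pre>
    <p>With the case pairs included, "n" and "v" fall into one class;
      with homoglyph pairs alone they stay apart.</p>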
<p>We figured that we had reached the limit of what you can address
with variants in the registries at this point.</p>
<p>Finally, as for Arabic, I would like to see an example of a Latin
label spoofed using only Arabic letters.</p>
<p>(It's possible to write "English" using Chinese characters that
vaguely look like letters of the alphabet, but while you can read
such texts, they look rather odd).<br>
</p>
<blockquote cite="mid:01ff01d2bac2$40936930$c1ba3b90$@dukes.in.rs"
type="cite">
<pre wrap="">
Also agree entirely with Vittorio, and just want to add another layer of the problem - epic.com example use https, and while GeoTrust and at least one other CA have stopped issuing automated certificated for IDNs sometime ago for other reasons, this trend will be expected for others to follow.</pre>
</blockquote>
<br>
    Displaying some details about the domain/certificate owner (see my
    previous message) would seem to be more useful than showing an IDN
    as an impenetrable xn-- label. The former works for phishing attacks
    against any script, the latter is only useful for people who can be
    expected to work entirely without IDNs.<br>
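    <p>For readers who have not looked at one, this short Python sketch
      shows what the xn-- form looks like. The name is purely
      illustrative, and note that Python's built-in <code>idna</code>
      codec implements the older IDNA 2003 rules, so treat it as a
      demonstration rather than a reference conversion.</p>
    <pre>
```python
# "b\u00fccher.example" is "bücher.example", an illustrative name only.
label = "b\u00fccher.example"

# Convert to the ASCII-compatible form that a browser might display.
ascii_form = label.encode("idna").decode("ascii")
print(ascii_form)  # xn--bcher-kva.example

# Converting back recovers the original Unicode label.
round_trip = ascii_form.encode("ascii").decode("idna")
print(round_trip == label)  # True
```
    </pre>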
<br>
A./<br>
</body>
</html>