[UA-discuss] IDN Implementation Guidelines [RE: Re : And now about phishing...]

Asmus Freytag (c) asmusf at ix.netcom.com
Fri Apr 21 19:00:45 UTC 2017

On 4/21/2017 10:11 AM, Dusan Stojicevic wrote:

> And these are just brand names with Cyrillic... more of them can be made with other scripts (Armenian, Georgian, Greek, Arabic...).

Just hold on a minute.

We've just done a pretty thorough first pass over cross-script 
homoglyphs (the identical-looking code points, not the "looks the same 
if you squint at them at arm's length" variety).

The conclusion is that Armenian has a small number of letters (q, h, n, 
u, o, and possibly g) that might qualify. In some fonts they are 
rendered practically identically; in others, not so much.

They are also less "useful" for whole-script confusables, as they lack 
certain high-frequency letters like "e", "a", "i", and "s".

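[Chart (image not preserved): the Latin letters in English frequency order, etaoinshrdlcumwfgypbvkjxqz, with one row marking the letters that have an Armenian homoglyph and one row marking those that have a Cyrillic homoglyph.]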
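To make the whole-script point concrete, here is a minimal Python sketch that checks whether a Latin label could be spelled entirely with lookalike letters from another script. The lookalike set is simply the Armenian list from this message (with the doubtful "g" included), and the function names are hypothetical; a real check would draw on a vetted confusables table, such as the data published with Unicode Technical Standard #39, rather than a hand-typed set.

```python
# Latin letters that, per the review described above, have a near-identical
# Armenian counterpart ("g" only possibly). Assumed from the message text,
# not an authoritative confusables table.
ARMENIAN_LOOKALIKES = frozenset("qhnuog")


def is_whole_script_spoofable(label: str, lookalikes: frozenset) -> bool:
    """True if every letter of the label has a lookalike in the other script.

    Only then could the whole label be rendered in that script without
    any letter that gives the spoof away.
    """
    letters = [ch for ch in label.lower() if ch.isalpha()]
    return bool(letters) and all(ch in lookalikes for ch in letters)


def missing_letters(label: str, lookalikes: frozenset) -> set:
    """Letters of the label that have no lookalike in the other script."""
    return {ch for ch in label.lower() if ch.isalpha() and ch not in lookalikes}


for name in ["hugo", "google", "epic"]:
    print(name,
          is_whole_script_spoofable(name, ARMENIAN_LOOKALIKES),
          sorted(missing_letters(name, ARMENIAN_LOOKALIKES)))
```

With this set, "hugo" could be spelled entirely in Armenian, while "google" cannot because "e" and "l" have no Armenian lookalikes; that gap in high-frequency letters is what limits Armenian as a whole-script spoofing tool.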

Now for Georgian: the same review concluded there is no high-fidelity 
overlap (no near-identical pairs of code points).

In Greek you have a real issue only to the extent that you show the 
address in uppercase. Most of the lowercase letters are pretty distinct; 
the exceptions are omicron (ο), which matches "o", and nu (ν), which 
looks more than a little like "v". We had a vigorous debate about 
whether to take uppercase into account when deciding which code points 
constitute cross-script variants.

We concluded that the protocol is limited to lowercase for a reason.

If you consider uppercase, you get different pairs depending on the case.

Capital nu (Ν) looks like Latin "N", while lowercase nu (ν) looks like 
"v". If you require variants to be transitive (very necessary for 
optimized evaluation), then you get "n" as a variant of "v" in Latin!

It works like this: Lowercase n is a case variant of cap N, N is a 
(homoglyph-)variant of Cap Nu, Cap Nu is a (case-)variant of lowercase 
nu, lowercase nu is a (homoglyph-)variant of v. When you traverse this 
chain, which is what defines transitivity, you can get from "n" to "v" 
inside the same script.
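The chain above is exactly what a union-find (disjoint-set) structure computes: once pairs are merged, any two code points connected by some path end up in the same set. The sketch below assumes variant relations can be modeled as plain (a, b) pairs; the pair list and helper names are illustrative and are not taken from any registry's actual variant tables.

```python
# Hypothetical variant pairs from the example chain.
# U+039D is GREEK CAPITAL LETTER NU, U+03BD is GREEK SMALL LETTER NU.
PAIRS = [
    ("n", "N"),            # case variant (Latin)
    ("N", "\u039d"),       # homoglyph: Latin N vs Greek capital nu
    ("\u039d", "\u03bd"),  # case variant (Greek)
    ("\u03bd", "v"),       # homoglyph: Greek small nu vs Latin v
]


class DisjointSet:
    """Union-find: merges variant pairs into transitive equivalence classes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def same_variant_set(pairs, a, b):
    """True if a and b are connected by some chain of variant pairs."""
    ds = DisjointSet()
    for x, y in pairs:
        ds.union(x, y)
    return ds.find(a) == ds.find(b)


print(same_variant_set(PAIRS, "n", "v"))             # True
print(same_variant_set([("\u03bd", "v")], "n", "v"))  # False
```

Including the uppercase pairs puts "n" and "v" into one set; with only the lowercase homoglyph pair ν/v they stay separate, which is the argument for evaluating variants on lowercase only.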

At that point we figured that we had reached the limit of what 
registries can address with variants.

Finally, as for Arabic, I would like to see an example of a Latin label 
spoofed using only Arabic letters.

(It's possible to write "English" using Chinese characters that vaguely 
look like letters of the alphabet, but while you can read such texts, 
they look rather odd).

> Also agree entirely with Vittorio, and just want to add another layer of the problem: the epic.com example uses https, and while GeoTrust and at least one other CA stopped issuing automated certificates for IDNs some time ago for other reasons, others can be expected to follow this trend.

Displaying some details about the domain/certificate owner (see my 
previous message) would seem to be more useful than showing an IDN as an 
impenetrable xn-- label. The former helps against phishing attacks in 
any script; the latter is only useful for people who can be expected to 
work entirely without IDNs.
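For reference, this is what the xn-- fallback looks like. The sketch uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than IDNA2008, so treat it as illustrative only; the helper name is hypothetical, and the Cyrillic spelling of "epic" is an assumed example of the kind of spoof discussed in this thread.

```python
def display_as_ace(hostname: str) -> str:
    """Return the ASCII-compatible (xn--) form of a hostname.

    Uses Python's built-in "idna" codec (IDNA 2003 rules), so results
    may differ from an IDNA2008 implementation for some inputs.
    """
    return hostname.encode("idna").decode("ascii")


# "epic" spelled with Cyrillic е (U+0435), р (U+0440), і (U+0456), с (U+0441)
spoof = "\u0435\u0440\u0456\u0441.com"

print(display_as_ace(spoof))             # an opaque xn--... label
print(display_as_ace("bücher.example"))  # xn--bcher-kva.example
print(display_as_ace("epic.com"))        # epic.com (ASCII is unchanged)
```

The spoofed name and a perfectly legitimate IDN both turn into opaque ASCII strings, so the xn-- form gives no help to anyone who actually reads the script in question.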
