[UA-discuss] Latin+Cyrillic — .com .?? .??

Fri Apr 28 01:44:18 UTC 2017

Two things:

1) By limiting the cross-script variants to code points that truly 
appear identical, no matter the font, you would take the judgement out 
of this. The two labels colliding are really not usable at the same time 
in the same zone, because, on some level, they are not just confusable, 
but "identical" (on all but the code point level).

A registry that feels that they must "protect" their investment in that 
kind of clearly bogus situation doesn't have a very strong case.

The situation is different for merely "similar" labels. For these, it 
may be difficult to tell them apart, but as a user, you can always 
establish (short of code point editing or comparing xn-- labels) 
whether two labels are the same or not. The degree of acceptable 
similarity between labels becomes a matter of judgement; hence the reason 
to leave them out of consideration for this approach.

2) Your argument below assumes - for what reason I don't know - that 
this technology cannot be applied selectively to new registrations (with 
all existing registrations grandfathered). There's nothing about 
variants that needs to be checked at lookup time; any processing would 
happen at application time.
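
To make the "application time only" point concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the tiny HOMOGLYPHS table (six Cyrillic/Latin pairs picked by hand, not the data from my listing), the variant_key helper, and the Zone class are hypothetical names, not any registry's actual system.

```python
# Hypothetical, hand-picked table mapping a code point to the representative
# of its homoglyph set. A real registry would load this from its own LGR data.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def variant_key(label: str) -> str:
    """Replace each code point by its representative, position by position."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label)


class Zone:
    def __init__(self, existing):
        # Existing registrations are grandfathered: they are loaded without
        # any collision check, even if they are variants of one another.
        self.labels = set(existing)
        self.keys = {variant_key(label) for label in existing}

    def apply(self, label: str) -> bool:
        """Process a NEW application; return True if the label is registered."""
        key = variant_key(label)
        if label in self.labels or key in self.keys:
            return False  # already taken, or a blocked variant of a taken label
        self.labels.add(label)
        self.keys.add(key)
        return True


zone = Zone(["example"])
print(zone.apply("\u0435xample"))  # Cyrillic IE in first position
print(zone.apply("unrelated"))
```

Nothing here runs at lookup time: the zone simply never contains the blocked label, so resolvers see no difference.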

This is different from making a change to the protocol itself.

A./

PS: in case you think this is a cross-script issue only, have a look at 
0259 and 01DD (both Latin) or TAMIL LETTER KA and TAMIL DIGIT ZERO.
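
For anyone who wants to check these pairs, a small sketch using only Python's standard unicodedata module. The describe and merged_by_normalization helpers are names made up for this example; the Tamil code points are looked up by character name rather than typed in.

```python
import unicodedata

# The two pairs mentioned in the PS.
PAIRS = [
    ("\u0259", "\u01dd"),
    (unicodedata.lookup("TAMIL LETTER KA"),
     unicodedata.lookup("TAMIL DIGIT ZERO")),
]


def describe(ch: str) -> str:
    """Return the code point and its Unicode character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def merged_by_normalization(a: str, b: str) -> bool:
    """True if NFC or NFKC maps both characters to the same string."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFKC")
    )


for a, b in PAIRS:
    print(describe(a), "|", describe(b),
          "| merged:", merged_by_normalization(a, b))
```

Normalization leaves each pair distinct, so any blocking has to come from the registry's own variant definitions.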

On 4/27/2017 5:57 PM, Jothan Frakes wrote:
> Asmus I agree with the wisdom and the approach of variant mapping like 
> this.
>
> If we were all starting from scratch on IDN today, gosh, it would 
> really be swell.
>
> I am going to digress a tad on the homograph discussion into an issue 
> that came to mind from the points in your postscripts.
>
> Backward compatibility and interop are a key area where friction 
> happens each time something changes and renders characters invalid 
> that were previously allowed.
>
> It has been a while since IDNA2008 (which replaced 2003) so one might 
> reasonably expect something new.
>
> There are still matters of supporting existing stuff commercially and 
> being mindful and sensitive to the registrant experience.
>
> Consider registries that sold domains to people who made websites and 
> started communication on IDNA2003-valid domains that IDNA2008 later 
> invalidated.
>
> The registrant experience is not so positive if the registry simply 
> says "oopsie, you can't have that anymore".  A registry could quote 
> some well crafted wiggle words in a registration agreement in 
> justifying invalidating a registration in such a manner, but the net 
> effect is a horrible registrant experience.
>
> So do we just chalk up those sub-optimal registrant experiences to 
> evolution?  I suggest that it actually would penalize pioneers and 
> early adopters to do so.
>
> In the transition from IDNA2003 to IDNA2008, registries recognized this, 
> and were careful in how they moved forward.
>
> The registry gets held back in cases where the specs collide -or- they 
> have to make some bridge solution that supports both to the best of 
> their ability.  This can prove challenging where other registries may 
> use their own solutions.
>
> I took the long way around the barn to pay compliments to the efforts, 
> and to hopefully inspire that we always keep in mind retroactive 
> support and a positive registrant experience with IDN.
>
> -jothan
>
> On Apr 27, 2017 12:27, "Asmus Freytag" <asmusf at ix.netcom.com 
> <mailto:asmusf at ix.netcom.com>> wrote:
>
>     On 4/27/2017 11:22 AM, Jothan Frakes wrote:
>>     On Thu, Apr 27, 2017 at 7:37 AM, Andre Schappo
>>     <A.Schappo at lboro.ac.uk <mailto:A.Schappo at lboro.ac.uk>> wrote:
>>
>>         Some thoughts, having now caught up with all the UA emails on
>>         phishing.
>>
>>         ①  Over the years there has been much discussion about
>>         Cyrillic being used to masquerade as ASCII domain names. I
>>         wonder if the Russian speaking community have been having
>>         similar discussions with respect to ASCII being used to
>>         masquerade as Cyrillic domain names.
>>
>>
>>
>>     Quick comments here (mostly for a wider reading audience):
>>
>>     1] Greek needs to be included alongside Cyrillic/Latin (or "ASCII",
>>     as we call it in this discussion): all three scripts are rich in
>>     homographs, with visually identical or near-identical letters.
>
>     Also arguably Armenian - the font used in the Unicode charts is
>     not representative, and many Armenian font styles look more like
>     Times or Helvetica, meaning that there are shapes like "հ ս ո օ"
>     that resemble Latin letters.
>>
>>     2] I used to believe that there was a bright line between all
>>     Cyrillic and all Latin/ASCII - and I learned through the process
>>     of many wise people like Yuri and Dusan spending time to evolve
>>     my thinking that these may be intermixed under perfectly normal
>>     use, and this also varies by region.  We should not assume all of
>>     one or all of the other.
>>
>>
>     The limitation in thinking is that the "go-to" solution is to try
>     to ban some code points, or to ban them in certain contexts. Which
>     leads to the call for single-script labels (which, as we know,
>     reduces, but does not remove, the homograph attack surface).
>
>     A more robust method is to make homoglyphs mutually exclusive in
>     the registry. If a registered label has one code point at a
>     certain position, the same label with the homoglyph substituted at
>     the same position would be blocked. ("Blocked variant")
>
>     The technology to specify this used to exist in two slightly
>     different forms; once for Arabic and once for CJK. These were
>     defined in separate RFCs, with mutually incompatible plain-text
>     formats.
>
>     With RFC 7940 there is, for the first time, a universal XML schema
>     to specify these kinds of relations. This should make it easy to
>     generate shared libraries and toolsets that can read/process
>     these definitions. As a result, blocked variants are a technique
>     that should become a standard methodology for registries.
>
>     If you have blocked variants defined, then you can mix not just
>     Cyrillic and Latin labels more safely, but also mix Latin and
>     Cyrillic inside a single label without opening yourself up to
>     homograph attacks.
>
>     RFC 7940 is occasionally misunderstood as a prescription for how to
>     design Label Generation Rules (aka IDN tables). It is not; it is
>     instead a description of a universal data format (in XML) that can
>     represent pretty much anything needed for registration policies
>     (on the code point level): for example, you can define which code
>     points to allow, next to what other code points and what variants
>     to block.
>
>     It could use a bit of advertising. Perhaps it could be mentioned
>     in comments to the IDN guidelines? (As a co-author, I'm not
>     eligible to make such comments myself). Not least because it
>     unifies the description of blocked variants, it does have a clear
>     place in the infrastructure needed to support universal acceptance.
>
>     A./
>
>     PS: For the root zone, we are planning to stick to single-script
>     labels, but also to implement blocked variants across scripts.
>     Some of the data in my cross-script variants collection comes from
>     the relevant drafts for that project, other data comes from data
>     derived from Unicode's UTR#39, and some is based on my own
>     knowledge of certain scripts.
>
>     PPS: I'm attaching an update of my cross-script variants listing.
>     The data for that exists in an XML file according to RFC 7940; the
>     HTML summary of that data is created by a simple tool. I would
>     appreciate comments on the contents and description from anyone.
>
>     PPPS: you may have noticed that I'm not writing anything about
>     allocatable variants. Their effect on the DNS is very different -
>     they may be needed/useful in some context, but the motivation is
>     not security. RFC 7940 allows you to define them where needed,
>     including with the same semantics as in the existing RFCs if desired.
>
>
