<html><head></head><body><div style="color:#000; background-color:#fff; font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px"><div id="yiv1531090464"><div id="yui_3_16_0_ym19_1_1535587971069_2877"><div style="color:#000;background-color:#fff;font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;" id="yui_3_16_0_ym19_1_1535587971069_2876"><div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3392"><div dir="ltr"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">I take a couple of things from this letter from the IP: <br clear="none"></span></div></div><div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3461"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- First, the IP considers security to be a major priority in our consideration of variants and/or confusibles. <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3709"><div dir="ltr" id="yui_3_16_0_ym19_1_1535587971069_3314"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- In pursuit of that end, the IP considers it important whether users will be able to distinguish between code points.  Especially when presented with only one code point, not two alternatives next to each other for comparison.  The IP's note talks specifically about diacritics below the line.  But those are hardly the only cases where this is an issue.<br clear="none"></span></div></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_4269"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_4928"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">Among languages written using the Latin script, pretty much all of them use the basic 26 letters (codepoints 0061 - 007A).  But we have identified over 100 additional glyphs in our repertoire.  
All of those have one characteristic in common: The vast majority of Internet users are not familiar with any language that uses them.  Because they are not familiar with those code points, they will have difficulty distinguishing between them.  <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5244"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5245"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">Consider, just by way of example, these 4 code points:</span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5461"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- ă (0103) <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5684"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- ǎ (01CE) <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5782"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- ā (0101)</span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5920"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- ã (00E3)<br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_6122"><div id="yui_3_16_0_ym19_1_1535587971069_3088"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">Someone who is not familiar with more than one of these will, inevitably, perceive the one that he is familiar with whenever presented with any of the 4.  People see what they expect to see, what is familiar.  That happens even when they might be physically capable of distinguishing between two code points IF they were presented with them side-by-side.  Because, in the kind of phishing attack discussed in the e-mail, they aren't presented with two options.  
They are presented with something that looks like what they are expecting to see, and don't see anything sufficiently amiss to doubt it.  <br></span></div><div id="yui_3_16_0_ym19_1_1535587971069_3089"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br></span></div><div id="yui_3_16_0_ym19_1_1535587971069_3090"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">A similar exercise can be done with: <br clear="none"></span></div></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_10875"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- è (00E8) <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_11087"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- é (00E9)<br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_7368"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">- ė (0117)<br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_11470"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">Someone who is familiar with two of them can probably distinguish between those two.  But someone who (like a far larger number of Internet users) is only familiar with one will see what he knows.  And someone who (like the vast majority of Internet users who use the Latin script) is not familiar with any of them will only notice that there is something above the letter -- but will not register what that something is. <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_11303"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_7517"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">This is not to say that every letter plus diacritic is a variant of that same letter with any other diacritic.  Far from it.  
But what we do have is something like this.  Take the letter A.  Our repertoire includes 24 variations, which gives 276 pairs (24 × 23 / 2).  Of those, one (0103 and 01CE) cannot be distinguished by eye at normal type sizes.  But another 51 are close enough that a normal user (i.e. someone who is not a trained linguist, who has not spent the last two years immersed in the various Latin script code points) will readily mistake one for another.  <br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_9458"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br clear="none"></span></div><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_9459"><div id="yui_3_16_0_ym19_1_1535587971069_3503"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">Now one might argue that all of those should be left as confusibles.  (Ignoring the detail that there are an additional 170 or so pairs which really are confusibles.)  But what, exactly, is the benefit of refusing to acknowledge that they are variants? <br></span></div><div id="yui_3_16_0_ym19_1_1535587971069_3504"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br></span></div><div id="yui_3_16_0_ym19_1_1535587971069_3505"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438">I confess I cannot see any benefit to the Internet user community of drastically constricting what we consider a variant.  The closest thing to a benefit that I can see is that, by declining to spend the time actually checking the various pairs, we can finish sooner.  But that is a benefit to <i id="yui_3_16_0_ym19_1_1535587971069_3261">us</i> as individuals who are on the Latin GP; it isn't a benefit to the people who will be using the results of our efforts. 
<br clear="none"></span></div></div><div dir="ltr" id="yui_3_16_0_ym19_1_1535587971069_3224"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3438"><br clear="none"></span></div><div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3385"> </div><div class="yiv1531090464signature" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3359">Bill Jouris<br clear="none">Inside Products<br clear="none">bill.jouris@insidethestack.com<br clear="none">831-659-8360<br clear="none">925-855-9512 (direct)</div><div class="yiv1531090464qtdSeparateBR" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5361"><br clear="none"><br clear="none"></div></div></div></div><div class=".yiv1531090464yahoo_quoted">  <div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3852" style="font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3851" style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div class="qtdSeparateBR"><br><br></div><div class="yiv1531090464yqt3693521218" id="yiv1531090464yqt88175"><div dir="ltr" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_5362"> <font size="2" face="Arial"> </font><hr size="1"> <b><span style="font-weight:bold;">From:</span></b> Sarmad Hussain &lt;sarmad.hussain@icann.org&gt;<br clear="none"> <b><span style="font-weight:bold;">To:</span></b> Latin GP &lt;latingp@icann.org&gt; <br clear="none"> <b><span style="font-weight:bold;">Sent:</span></b> Tuesday, August 28, 2018 11:58 PM<br clear="none"> <b><span style="font-weight:bold;">Subject:</span></b> [Latingp] From IP: Diacritics below a security risk?<br clear="none">  </div> <div class="yiv1531090464y_msg_container" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3850"><br clear="none"><div id="yiv1531090464">


<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3849">

Dear Latin GP members 

<div><br clear="none">

</div>

<div>Kindly find below some feedback from IP for your consideration.</div>

<div><br clear="none">

</div>

<div>Regards </div>

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3848">Sarmad <br clear="none">

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3847"><br clear="none">

<blockquote id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3856" type="cite">

<div class="yiv1531090464moz-forward-container" id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3855">

<div class="yiv1531090464MsoNormal" style="margin-bottom:12.0pt;"><br clear="none">

TO: LatinGP<br clear="none">

FROM: IP<br clear="none">

<br clear="none">

There are recent and widely published examples of phishing attacks using Latin IDNs in which the key features involved were diacritics below the letter. Here is an example:</div> 

<div><span style="font-family:sans-serif;"><img id="yiv1531090464_x0000_i1026" style="width:4.5625in;min-height:2.9791in;" src="cid:YSwES2segc1ZMoY0smB5" yahoo_partid="3" data-id="cb233ac9-4db4-8432-d85e-27b756c264b6" width="438" height="286" border="0"></span></div> 

<div><span style="font-family:sans-serif;">Of all diacritics, those below the letter can be particularly difficult to distinguish and are prone to clipping -- there is less space below the baseline than between the typical lowercase glyph and the top of the line.</span></div> 

<div><span style="font-family:sans-serif;">The example given above shows a further interaction with URL underlining, and not all display engines do as nice a job of interrupting the underline as in the screenshot above. For example, here is how one system will render this (using a designated UI font, Segoe UI):</span></div> 

<div><span style="font-family:sans-serif;"><img id="yiv1531090464_x0000_i1027" style="width:1.927in;min-height:.3333in;" src="cid:24QeV3jjjYNupwHq3n6o" data-id="869bcc6b-a669-2ea4-9ba5-3af375bcb28e" width="185" height="32" border="0"></span></div> 

<div><span style="font-family:sans-serif;">Note that this code point (U+1E33) is in the MSR, as is U+1E35 (LATIN SMALL LETTER K WITH LINE BELOW).</span></div> 

<div><span style="font-family:sans-serif;"><img id="yiv1531090464_x0000_i1028" style="width:2.052in;min-height:.4895in;" src="cid:3UppKy2vodlY3FNEQMwR" data-id="38769c4a-7f1e-7e82-6bb0-a1d97dfc2976" width="197" height="47" border="0"></span></div> 

<div><span style="font-family:sans-serif;">The second example contains U+1E35. While the effect does not show equally at all type sizes, at 12pt and below the LINE BELOW is reliably hidden. Here are the two examples at 10pt:</span></div> 

<div><span style="font-family:sans-serif;"><img id="yiv1531090464_x0000_i1029" style="width:1.052in;min-height:.5833in;" src="cid:9YMjyqfy4cjSEQ6H7USD" data-id="1ee9ba7a-f3b9-871d-1cce-7c6304dd12bf" width="101" height="56" border="0"></span></div> 

<div><span style="font-family:sans-serif;">The issue is not limited to "K". We see "B", "D", "L" and "N" with both DOT and LINE BELOW and "M" and "H" with DOT BELOW, all on the same page in the MSR.</span></div>

<div><span style="font-family:sans-serif;">It can be argued that users have no working understanding of typography and would not reliably interpret small gaps or bulges in the underline as being related to an unfamiliar code point. This appears to make all diacritics below security-sensitive; however, the initial determination belongs to the relevant GPs.</span></div> 

<div><span style="font-family:sans-serif;">Note, by the way, that the Devanagari LGR treats </span><span style="font-family:sans-serif;">sequences </span><span style="font-family:sans-serif;">containing NUKTA (a dot below) as variants in at least some cases, and recent community comments for that script are calling for more variant sequences. However, while the feature is graphically analogous (dot below), each script works differently and there is no single a priori solution.<br clear="none"></span></div>

<div><span style="font-family:sans-serif;"></span><span style="font-family:sans-serif;">The IP would like to encourage the LatinGP (and any other GP facing cases like this) to explicitly examine this example and other cases like it, where code points can become indistinguishable in common usage scenarios for IDNs, and formally conclude whether and how to take these into account when designing their LGR.</span></div> 


<div><span style="font-family:sans-serif;">At this point, the IP would expect the GP to:</span></div> 

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3863"><span style="font-family:sans-serif;">* explicitly discuss this and other scenarios like it</span></div> 

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3862"><span style="font-family:sans-serif;">* evaluate whether they constitute a security risk to the Root Zone</span></div> 

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3861"><span id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3860" style="font-family:sans-serif;">* come up with a reasoned decision as to whether and how to address them in the design of the Latin LGR; and finally</span></div> 

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3859"><span style="font-family:sans-serif;">* document both the decision and its rationale.</span></div>

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3858"><span style="font-family:sans-serif;">In coming to a decision, the GP may resolve:</span></div>

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3857"><span style="font-family:sans-serif;">1) to make them variants</span></div>

<div id="yiv1531090464yui_3_16_0_ym19_1_1535583740924_3854"><span style="font-family:sans-serif;">2) to list them for attention as confusable<br clear="none">

</span></div>

<div><span style="font-family:sans-serif;">3) to take no action, because the GP feels that they do not represent a special security risk.</span></div>

<div><span style="font-family:sans-serif;">As part of the review of the Latin LGR, the IP will look at the background and rationale offered by the Latin GP in coming to its conclusion. Note that if the IP feels that the facts considered and the rationale documented do not support the conclusion reached by the GP, it may raise objections at that time.</span></div>

</div>

</blockquote>

</div>

</div>

</div>

</div></div></div> </div> </div>  </div></div></body></html>