<div dir="ltr">Dear colleagues,<div><br></div><div>I thought I'd try to take up this thread again after some silence. Hopefully Michael is back from a nice holiday and can chime in on the discussion too (I think he might not have received <span class="" id=":6bg.1" tabindex="-1" style="">Sarmad's</span> earlier email).</div><div><br></div><div>Obviously I cannot say what was decided in Brussels, since I could not join the group, and that is why I had tried to put a question to our subgroup.</div><div><br></div><div>I think <span class="" id=":6bg.2" tabindex="-1" style="">Sarmad</span> has provided us with nearly all the additional references we should consider as guidance on how to approach this highly complex task. My conclusion is that it is more complex than reducing things to "<span class="" id=":6bg.3" tabindex="-1" style="">homoglyphs</span>", but I do not think that (at least linguistically) we have a strong definition of <span class="" id=":6bg.4" tabindex="-1" style="">homoglyphs</span> which would clearly set them apart from <span class="" id=":6bg.5" tabindex="-1" style="">confusables</span>, near-<span class="" id=":6bg.6" tabindex="-1" style="">confusables</span>, near-<span class="" id=":6bg.7" tabindex="-1" style="">homoglyphs</span> (and all the other terms we may have used to find an understanding of one another and the issues at hand).</div><div><br></div><div>Regarding the 'minority report' by Bill: skimming over it, I thought it would form an excellent basis for our chapter on variants, and in my eyes this work should not go to waste. Equally, we can re-use <span class="" id=":6bg.8" tabindex="-1" style="">Sarmad's</span> summary and expand it to integrate it into the introduction of the variant section of the proposal. 
As I tried to argue in the last <span class="" id=":6bg.9" tabindex="-1" style="">tele</span>-conference, I believe it is important that we present not only the results or outcome of our work, but also the path we took to arrive at it, which means we have to discuss, at least briefly, the different considerations guiding our work.</div><div><br></div><div>As a pragmatic step I would suggest we continue for the moment with the very useful tables Dennis created, adding those few additional variant pairs I had suggested in comments. I don't think including them in our 2-pass review would be too much overhead, and if both reviewers happen to come to the same conclusion that those potential variant pairs have a 3-5 rating (that is, that they are in fact not variants), we also do not need to have a theoretical discussion on the difference between <span class="" id=":6bg.10" tabindex="-1" style="">homoglyphs</span>, near-<span class="" id=":6bg.11" tabindex="-1" style="">homoglyphs</span>, <span class="" id=":6bg.12" tabindex="-1" style="">confusables</span>, etc. In this way, our decision would be driven by a careful analysis of the data, rather than by any preconceptions about what categorical relationship exists between some of these code points, that is, an <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;font-size:small;float:none;display:inline">a <span class="" id=":6bg.14" tabindex="-1" style="">posteriori</span> rather than an a <span class="" id=":6bg.15" tabindex="-1" style="">priori</span> analysis, if you will.<span> </span></span></div><div><br></div><div>I hope this is helpful, but let's keep up the discussion. 
I think we were making good progress with the tables and the 1-5 rating scale (rather than a binary choice).</div><div><br>Best wishes,</div><div><br></div><div><span class="" id=":6bg.16" tabindex="-1" style="">Meikal</span></div><div class="gmail_extra"><br><div class="gmail_quote">On 19 May 2018 at 05:10, <span class="" id=":6bg.17" tabindex="-1" style="">Sarmad</span> <span class="" id=":6bg.18" tabindex="-1" style="">Hussain</span> <span dir="ltr"><<a href="mailto:sarmad.hussain@icann.org" target="_blank"><span class="" id=":6bg.19" tabindex="-1" style="">sarmad</span>.<span class="" id=":6bg.20" tabindex="-1" style="">hussain</span>@<span class="" id=":6bg.21" tabindex="-1" style="">icann</span>.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div class="m_-2380519825626997100WordSection1"><p class="MsoNormal">Dear All,<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">This is indeed a complex matter to address, and it therefore requires this continued discussion.  
It may also be useful here to refer back to the <a href="https://www.icann.org/en/system/files/files/lgr-procedure-20mar13-en.pdf" target="_blank">RZ-LGR Procedure</a>.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">The RZ-LGR Procedure, while defining “IDN variants” says that:<u></u><u></u></p><ul style="margin-top:0in" type="disc"><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">“An IDN variant, as understood here, is an alternate code point (or sequence of code points) that could be substituted for a code point (or sequence of code points) in a candidate label to create a variant label that is considered the “same” in some measure by a given community of Internet users.”<u></u><u></u></li></ul><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">However, the Procedure also acknowledges immediately following the definition that:<u></u><u></u></p><ul style="margin-top:0in" type="disc"><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in"> “There is not general agreement of what that sameness requires, and many of the things people seem to want from that sameness are not technically achievable.”<u></u><u></u></li></ul><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">While noting the benefits of defining IDN variants, the procedure also acknowledges the limitations.  <u></u><u></u></p><ul style="margin-top:0in" type="disc"><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">“The primary benefit of the LGR process is as a mechanism that delivers hands-off evaluation for these aspects. 
<u></u><u></u></li><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">“By doing so, the process may not be able to replace case-by-case analysis altogether: there will still be a role for additional types of review, such as for String Similarity, and which are not included in the LGR process.”  <u></u><u></u></li></ul><p class="MsoNormal">So, not all matters can be settled in the LGR.  A line has to be drawn between “same” and “similar”.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">The LGR Procedure does note what is desirable within the scope of the LGR: <u></u><u></u></p><ul style="margin-top:0in" type="disc"><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">“the LGR process is designed to clear the table of all the straightforward, non-subjective cases, mainly by returning a “blocked” disposition.  <u></u><u></u></li><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">“Even for variants based on visual similarity, there exists a subset of evaluation rules that could be applied in an automated manner, obviating the need for further case-by case or even contextual review.”<u></u><u></u></li></ul><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">But it also notes that this should not go too far into the string similarity discussion: <u></u><u></u></p><ul style="margin-top:0in" type="disc"><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">“While the process described here could be expanded to address cases of visual similarity, that is not the primary intention”<u></u><u></u></li><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in"> “Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity.”<u></u><u></u></li></ul><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">One could infer from these 
statements in the RZ-LGR Procedure that:<u></u><u></u></p><ol style="margin-top:0in" start="1" type="1"><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">If two code points are considered the “same” by the user community, these should be included as IDN variants (this is not limited to visual similarity, but could also include semantic equivalence, as in Chinese, orthographic conventions or spelling simplification, as in Arabic, homophonic relations, as in Ethiopic, etc., as determined by the respective script community)<u></u><u></u></li><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">The “straightforward, non-subjective cases” of visual similarity could be included as IDN variants and blocked <u></u><u></u></li><li class="m_-2380519825626997100MsoListParagraph" style="margin-left:0in">Beyond these, the analysis goes into the realm of string similarity review, which is beyond the intention of the LGR<u></u><u></u></li></ol><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Generation Panels have been asked to draw the line based on these guidelines provided in the RZ-LGR Procedure.  For example, the Cyrillic GP agreed to consider homoglyph relations with other related scripts for this purpose.  The Neo-Brahmi GP has used a slightly different technique, where it treats as cross-script variants those code points which members of both script communities in question find “indistinguishable”, even if these are not homoglyphs (see the <a href="https://www.icann.org/news/blog/the-south-asian-eleven-progress-on-supporting-idns-in-scripts-from-the-region" target="_blank">blog</a> for some more details).  <u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Of course, the Latin GP also needs to draw these lines in its analysis to identify within-script and cross-script IDN variant cases.  
<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Regards,<br>Sarmad<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal"><u></u> <u></u></p><div><div style="border:none;border-top:solid #e1e1e1 1.0pt;padding:3.0pt 0in 0in 0in"><p class="MsoNormal"><b>From:</b> Latingp [mailto:<a href="mailto:latingp-bounces@icann.org" target="_blank">latingp-bounces@icann.<wbr>org</a>] <b>On Behalf Of </b>Bill Jouris<br><b>Sent:</b> Saturday, May 19, 2018 5:28 AM<span class=""><br><b>To:</b> Tan Tanaka, Dennis <<a href="mailto:dtantanaka@verisign.com" target="_blank">dtantanaka@verisign.com</a>>; Meikal Mumin <<a href="mailto:meikal@mumin.de" target="_blank">meikal@mumin.de</a>><br></span><b>Cc:</b> Tan Tanaka, Dennis via Latingp <<a href="mailto:latingp@icann.org" target="_blank">latingp@icann.org</a>><br><b>Subject:</b> Re: [Latingp] Variant cross-script analysis worksheets<u></u><u></u></p></div></div><div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3135"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"Helvetica Neue";color:black">It's been clear for some time, even before Brussels, that you think we should only look at homoglyphs.  (Also that you don't think that there are any in-script homoglyphs.  See the discussion about the schwa and the turned e.)  <br><br><u></u><u></u></span></p></div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3249"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"Helvetica Neue";color:black"><br><br><u></u><u></u></span></p></div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3250"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"Helvetica Neue";color:black">But there is a world of difference between agreeing, and merely deciding not to waste time arguing with a closed mind.  
Which, for me, is what happened in the discussion in Brussels.  <u></u><u></u></span></p></div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3154"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"Helvetica Neue";color:black"> <u></u><u></u></span></p></div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3155"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"Helvetica Neue";color:black">Bill Jouris<br>Inside Products<br><a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.com</a><br>831-659-8360<br>925-855-9512 (direct)<u></u><u></u></span></p></div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3268"><p class="MsoNormal" style="margin-bottom:12.0pt;background:white"><span style="font-size:12.0pt;font-family:"Helvetica Neue";color:black"><u></u> <u></u></span></p></div><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3274"><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3273"><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3272"><div id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3271"><div class="MsoNormal" align="center" style="text-align:center;background:white"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black"><hr size="1" width="100%" align="center"></span></div><p class="MsoNormal" style="background:white"><b id="m_-2380519825626997100yui_3_16_0_1_1526689491810_3276"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">From:</span></b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black"> "Tan Tanaka, Dennis" <<a href="mailto:dtantanaka@verisign.com" target="_blank">dtantanaka@verisign.com</a>><br><b>To:</b> Bill Jouris <<a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.<wbr>com</a>>; Meikal Mumin <<a href="mailto:meikal@mumin.de" target="_blank">meikal@mumin.de</a>> 
<br><b>Cc:</b> Michael Bauland <<a href="mailto:Michael.Bauland@knipp.de" target="_blank">Michael.Bauland@knipp.de</a>>; "Tan Tanaka, Dennis via Latingp" <<a href="mailto:latingp@icann.org" target="_blank">latingp@icann.org</a>><br><b>Sent:</b> Friday, May 18, 2018 1:43 PM<br><b>Subject:</b> Re: [Latingp] Variant cross-script analysis worksheets</span><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"><u></u><u></u></span></p></div><div><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"><u></u> <u></u></span></p><div id="m_-2380519825626997100yiv8870410561"><div><div><div><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black">I believe we delimited the scope of variants for the Latin script in the face to face meeting in Brussels, did we not?<u></u><u></u></span></p></div><div><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> <u></u><u></u></span></p></div><div id="m_-2380519825626997100yiv8870410561yqt58166"><div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in"><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><b><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black">From: </span></b><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black">Bill Jouris <<a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.<wbr>com</a>><br><b>Reply-To: </b>Bill Jouris <<a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.<wbr>com</a>><br><b>Date: </b>Friday, May 18, 2018 at 2:18 PM<br><b>To: </b>Dennis Tan Tanaka <<a href="mailto:dtantanaka@verisign.com" target="_blank">dtantanaka@verisign.com</a>>, Meikal Mumin <<a href="mailto:meikal@mumin.de" 
target="_blank">meikal@mumin.de</a>><br><b>Cc: </b>Michael Bauland <<a href="mailto:Michael.Bauland@knipp.de" target="_blank">Michael.Bauland@knipp.de</a>>, "Tan Tanaka, Dennis via Latingp" <<a href="mailto:latingp@icann.org" target="_blank">latingp@icann.org</a>><br><b>Subject: </b>[EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets<u></u><u></u></span></p></div></div><div><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> <u></u><u></u></span></p></div></div><div><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15024"><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black">It is pretty clear, if one reads the MSR-3 document, that we are supposed to deal with <i id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15069">Variants</i>.  Which include, <b id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15074">but are NOT limited to</b>, homoglyphs.  
<u></u><u></u></span></p></div></div><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15025"><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> <u></u><u></u></span></p></div></div><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15026"><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black">Bill Jouris<br>Inside Products<br><a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.com</a><br>831-659-8360<br>925-855-9512 (direct)<u></u><u></u></span></p></div></div><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15043"><div style="margin-left:.5in;margin-bottom:12.0pt"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> <u></u><u></u></span></p></div></div><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15047"><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15046"><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15045"><div id="m_-2380519825626997100yiv8870410561yui_3_16_0_ym19_1_1526652361242_15044"><div style="margin-left:.5in"><div class="MsoNormal" align="center" style="text-align:center;background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"><hr size="1" width="96%" align="center"></span></div></div><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><b><span style="font-size:10.0pt;font-family:"HelveticaNeue",serif;color:black">From:</span></b><span style="font-size:10.0pt;font-family:"HelveticaNeue",serif;color:black"> "Tan Tanaka, Dennis" <<a href="mailto:dtantanaka@verisign.com" 
target="_blank">dtantanaka@verisign.com</a>><br><b>To:</b> Meikal Mumin <<a href="mailto:meikal@mumin.de" target="_blank">meikal@mumin.de</a>> <br><b>Cc:</b> "<a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.<wbr>com</a>" <<a href="mailto:bill.jouris@insidethestack.com" target="_blank">bill.jouris@insidethestack.<wbr>com</a>>; Michael Bauland <<a href="mailto:Michael.Bauland@knipp.de" target="_blank">Michael.Bauland@knipp.de</a>>; "Tan Tanaka, Dennis via Latingp" <<a href="mailto:latingp@icann.org" target="_blank">latingp@icann.org</a>><br><b>Sent:</b> Friday, May 18, 2018 10:20 AM<br><b>Subject:</b> Re: [Latingp] Variant cross-script analysis worksheets</span><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"><u></u><u></u></span></p></div></div><div><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> <u></u><u></u></span></p></div><div id="m_-2380519825626997100yiv8870410561"><div><div><div id="m_-2380519825626997100yiv8870410561yqtfd43584"><div><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> </span><span style="font-family:"Arial",sans-serif;color:black"><u></u><u></u></span></p></div></div></div><div id="m_-2380519825626997100yiv8870410561yqtfd29825"><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:13.5pt;font-family:"HelveticaNeue",serif;color:black">we must deal with such confusable characters or sequences of characters in the context of variants</span><span style="font-family:"Arial",sans-serif;color:black"><u></u><u></u></span></p></div></div><div><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> </span><span 
style="font-family:"Arial",sans-serif;color:black"><u></u><u></u></span></p></div></div><div><div style="margin-left:.5in"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black">No, we don’t. Confusability is not in scope. We established the Latin panel will deal with homoglyphs or nearly homoglyphs (i.e. font variation) in the context of cross-scripts.</span><span style="font-family:"Arial",sans-serif;color:black"><u></u><u></u></span></p></div></div></div></div></div><div style="margin-left:.5in;margin-bottom:12.0pt"><p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"> <u></u><u></u></span></p></div></div></div></div></div></div></div></div></div></div><p class="MsoNormal" style="margin-bottom:12.0pt;background:white"><span style="font-size:12.0pt;font-family:"HelveticaNeue",serif;color:black"><u></u> <u></u></span></p></div></div></div></div></div></div></div></div></div></blockquote></div><br></div></div>