[CPWG] Variants and Process

gopal at annauniv.edu gopal at annauniv.edu
Fri Oct 22 01:18:08 UTC 2021


Dear Mr. Bill Jouris,

Thank you for the response.

Yes. I do understand the need for choosing the phrases to fit into the
context of ICANN.

Warmest Regards




Gopal T V
0 9840121302
https://vidwan.inflibnet.ac.in/profile/57545
https://www.facebook.com/gopal.tadepalli
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dr. T V Gopal
Professor
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
       (Res) 24454753
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 2021-10-22 01:19, Bill Jouris via CPWG wrote:
> Dear Dr. Gopal,
> 
> Thank you for that illustration.  It makes the point that there is no
> abrupt transition from identical to completely different.  Rather,
> there is a continuous variation in similarity.  The issue is where to
> draw the lines between Variant, Confusable, and Different.  That is,
> how much potential for confusion is acceptable.
> 
> That decision does involve a trade-off.  But the only trade-off that I
> can see is between the desire to allow registrants as much flexibility
> as possible to register exactly what they wish (in, for example, their
> own language) and the desire to minimize confusion for users of the
> DNS in general.
> 
> Oh yes, there is also potentially the desire of the contracted parties
> to have as many different choices to sell as possible -- although I
> would note that, in various ICANN meetings, the contracted parties and
> their representatives have been very adamant in insisting that this is
> not a consideration for them.  And hence would not seem like something
> that needs to be considered in making the trade-off decision.
> 
> Regards,
> 
> Bill Jouris
> 
>  On Thursday, October 21, 2021, 10:48:30 AM PDT, Bill Jouris via CPWG
> <cpwg at icann.org> wrote:
> 
> Dear Olivier,
> 
> That is the problem I see as well.  My sense is that both the Latin GP,
> and the Integration Panel (which is the next level higher) desire
> primarily to minimize the number of variants.  Two codepoints which
> are identical, such as the Latin schwa and the Latin turned E,
> obviously cannot be distinguished by anyone, and so are necessarily
> variants.  (Although one of my fellow Panel members argued against
> variant status even for that specific case.)  But how strict the
> constraints were on making two codepoints variants reflects that
> desire for minimization.  Also, in at least one case, the Integration Panel
> requested the Latin GP review (and modify) some variant findings
> because one set of codepoints which were variants of each other was
> "too large."  ("Too large" wasn't defined.  Nor was there indication
> of why one would care. Certainly it wouldn't impact the performance of
> the software doing the automatic filtering of proposed TLDs.)
> 
> Given that
>   a) the Panel members are experts,
>   b) we were doing side-by-side comparisons, and
>   c) we knew that we were looking at two different codepoints
> it seemed to me that if any of us couldn't tell the difference, then
> neither could the average user looking at a domain name in isolation.
> Setting a higher threshold seems to me like phishing, and especially
> pharming, enablement.
> 
> It also might appear that having a group of codepoints which are not
> variants, but which users cannot really distinguish, provides a
> marketing opportunity.  Not to sell to bad actors, who are typically
> one-off buyers and so not worth pursuing.  But to sell defensive
> registrations to legitimate registrants, who merely want to make sure
> that their customers find them.  Such defensive registrations would be
> likely to be renewed indefinitely, making them worthwhile even in a
> low margin business.**
> 
> Bill
> 
> ** 5 of the 7 members of the Latin Panel being employees of one or
> another of the contracted parties.  I believe most of them were
> sincerely making a good faith effort to do the right thing.  But their
> experience there may nevertheless have colored their perceptions.
> 
> 
>> On Thu, Oct 21, 2021 at 2:13 AM, Olivier MJ Crépin-Leblond
>> <ocl at gih.com> wrote:
>> 
>> Dear Bill,
>> 
>> Thank you for explaining this in further detail. The problem I see
>> with the process here is that *experts* have been used to notice a
>> difference. Because they are experts, they might be able to see
>> differences which the average Internet end user will not. And this
>> is the concern I have: is the panel of experts being conservative
>> enough in making their decisions? If there is any suspicion about
>> two characters being variants, would a conservative approach treat
>> them as variants?
>> What is the end goal of identifying variants? If it is to avoid the
>> use of IDNs for phishing, then the only approach possible should be
>> a conservative approach.
>> Kindest regards,
>> 
>> Olivier
>> 
>> On 21/10/2021 05:17, Bill Jouris via CPWG wrote:
>> 
>>> 
>> 
>> After some of the discussion in the chat in this morning's meeting,
>> I feel like a little more extended discussion about variants might
>> be helpful.
>> 
>> The repertoire for the Latin script consists of "codepoints" -- some
>> are letters and some are letters plus diacritics.  "Variants" are
>> pairs of codepoints which are indistinguishable.  That is, in the
>> process that the Panel used, 5 of the 7 experts on the panel
>> couldn't see a difference.  The Latin GP did not look at diacritics
>> per se.  Just at codepoints which might involve diacritics.
>> 
>> Thus, a codepoint consisting of a letter with a caron diacritic ( ̌ )
>> and a codepoint with the same letter combined with a breve
>> diacritic ( ̆ ) may always result in a variant pair, but only
>> because the Panel's comparison worked out that way.  For example, a
>> G with caron (ǧ) and a G with breve (ğ) are variants.   On the
>> other hand, a caron and a macron ( ¯ ) never result in a variant
>> pair.
>> 
>> However, some cases with diacritics are mixed.  For example, a
>> codepoint consisting of a letter with a dot above ( ˙ ) and a
>> codepoint consisting of a letter with an acute accent result in a
>> variant pair for letters C (ċ vs ć), N (ṅ vs ń), and Z (ż vs
>> ź ). But, in the Panel's original finding, not for letters E (ė vs
>> é), and I (i vs í).
>> 
>> (Note that a majority of the Panel found the vowels to produce
>> variants as well.  Just not a supermajority, as required by the
>> process the Panel had adopted.  As a result, the Panel's official
>> position is that, in various cases not just this one, even though a
>> majority of the experts, looking side by side, could not see a
>> difference, the average "reasonably careful user" will somehow
>> magically notice the difference when looking at a domain name.)
>> 
>> Then we have cross-script variants, including those identified by
>> other Panels.  For example, the Greek Panel found that the Greek
>> letter Iota was a variant both of the Latin letter I and the Latin
>> letter I with acute.   As a result I and I with acute became
>> variants.
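
The Iota case above can be pictured as variant sets being merged
whenever two codepoints share a common variant. What follows is only a
minimal sketch under that assumption (that the relation is treated as
transitive, as it was in this one case); the class and method names
are hypothetical and are not taken from any Panel tooling.

```python
# Sketch only: variant relations are assumed to be transitive and are
# merged with a small union-find structure. All names are hypothetical.
class VariantSets:
    def __init__(self):
        self.parent = {}

    def find(self, cp):
        # Register unseen codepoints as their own set, then walk to the root.
        self.parent.setdefault(cp, cp)
        while self.parent[cp] != cp:
            self.parent[cp] = self.parent[self.parent[cp]]  # path halving
            cp = self.parent[cp]
        return cp

    def add_variant_pair(self, a, b):
        # Merge the sets that contain a and b.
        self.parent[self.find(a)] = self.find(b)

    def are_variants(self, a, b):
        return self.find(a) == self.find(b)


sets = VariantSets()
sets.add_variant_pair("\u03b9", "i")       # Greek iota and Latin i
sets.add_variant_pair("\u03b9", "\u00ed")  # Greek iota and Latin i with acute
print(sets.are_variants("i", "\u00ed"))    # linked only through iota
```

Under this model, Latin i and i with acute end up in the same set
purely because both were paired with Greek iota, which mirrors the
outcome described above.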
>> 
>> But there is no Greek letter which is a variant of the Latin letter
>> E.  So we are left with a situation where the dot above diacritic
>> and the acute produce variants for all letters EXCEPT for the letter
>> E.  (When I suggested that, for consistency, we should make the
>> letter E case a variant as well, the response was "It is more
>> important that we follow our process than that we have
>> consistency.")
>> 
>> TLDs consist of a series of codepoints.  Proposed TLDs which differ
>> _only_ by one or more variants from another TLD will
>> automatically be rejected by the software.  For example, .çom
>> would be allowed, despite its similarity to .com, because C with
>> Cedilla is not a variant of C.  Also .сом (using Cyrillic
>> letters) would be allowed because, while C and the Cyrillic letter
>> Es are variants, and O and the Cyrillic letter O are variants, the
>> letter M and the Cyrillic letter Em are not variants (the Panel was
>> directed to ignore Upper Case when deciding what might confuse
>> users).  But .cóm would be rejected, because O and O with acute are
>> variants.
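
The rejection rule above can be sketched as mapping every codepoint to
one representative of its variant set and then comparing the mapped
labels. This is an illustration under assumptions, not the actual
filtering software: the table holds only the pairs named in this
message, and the names CANONICAL, canonical_label and is_blocked_by
are hypothetical.

```python
# Hypothetical variant table with only the pairs named in the message.
# Each codepoint maps to one representative of its variant set.
CANONICAL = {
    "\u0441": "c",  # Cyrillic es        -> Latin c
    "\u043e": "o",  # Cyrillic o         -> Latin o
    "\u00f3": "o",  # Latin o with acute -> Latin o
}


def canonical_label(label):
    # Replace each codepoint by its representative; others pass through.
    return "".join(CANONICAL.get(cp, cp) for cp in label)


def is_blocked_by(proposed, existing):
    # Blocked when the labels differ only by variant codepoints.
    return (proposed != existing
            and canonical_label(proposed) == canonical_label(existing))


for tld in ["c\u00f3m", "\u00e7om", "\u0441\u043e\u043c"]:
    print(tld, is_blocked_by(tld, "com"))
```

With this table, .cóm is blocked by .com, while .çom and the Cyrillic
.сом are not, because c with cedilla and Cyrillic em have no entry.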
>> 
>> "Confusables" are pairs of codepoints which some of the experts
>> could not distinguish, just not enough to be designated as variants.
>> Confusables are intended as suggestions for the panel which will
>> manually review the proposed TLDs.
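
The three outcomes (Variant, Confusable, Different) can be sketched as
a threshold on how many experts could not tell a pair apart. The 5 of
7 supermajority comes from the description above; treating "at least
one expert" as the floor for a Confusable is an assumption made here
for illustration, and the function name is hypothetical.

```python
def classify_pair(cannot_distinguish, panel_size=7, variant_threshold=5):
    """Classify a codepoint pair from the number of experts who could
    not see a difference. The lower bound for 'confusable' (one
    expert) is an assumption, not a documented Panel rule."""
    if not 0 <= cannot_distinguish <= panel_size:
        raise ValueError("vote count must be between 0 and panel_size")
    if cannot_distinguish >= variant_threshold:
        return "variant"
    if cannot_distinguish > 0:
        return "confusable"
    return "different"


for votes in (5, 4, 0):
    print(votes, classify_pair(votes))
```

Note that a simple majority (4 of 7) lands in "confusable" here, which
is the situation described for the vowels E and I.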
>> 
>> I hope this all will help everyone understand what we are looking at
>> here.
>> 
>> Regards,
>> Bill Jouris
>> 
>> _______________________________________________
>> CPWG mailing list
>> CPWG at icann.org
>> https://mm.icann.org/mailman/listinfo/cpwg
>> 
>> _______________________________________________
>> By submitting your personal data, you consent to the processing of
>> your personal data for purposes of subscribing to this mailing list
>> accordance with the ICANN Privacy Policy
>> (https://www.icann.org/privacy/policy) and the website Terms of
>> Service (https://www.icann.org/privacy/tos). You can visit the
>> Mailman link above to change your membership status or
>> configuration, including unsubscribing, setting digest-style
>> delivery or disabling delivery altogether (e.g., for a vacation),
>> and so on.
>> 
>> --
>> Olivier MJ Crépin-Leblond, PhD
>> http://www.gih.com/ocl.html
> 

