[CPWG] Variants and Process

gopal at annauniv.edu gopal at annauniv.edu
Sat Oct 23 02:32:34 UTC 2021


Dear Bill Jouris,

Many thanks again for your presentation to the CPWG on 6 October 2021.

It has been a fantastic effort by your seven-member team from six different
countries.

Ref Slide #12: UNICODE 00FE and 01A5

The quantification for decision making was based on a 5-point linear scale,
with the seven experts using only the "2-4" range, and this was done for
three popular typefaces.

I know this is just one sample, and your question in the next slide, "How Much
is Enough?", is vital.

Is there a tool or simulator that generalizes this to larger samples, different
languages and different quantification scales such as the Likert scale?

We could then anticipate the code generator within an acceptable confidence
interval.
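
To illustrate the sample-size concern, here is a minimal sketch, not a tool used by the Panel. It assumes each expert's judgement is reduced to a yes/no "cannot distinguish" vote and uses the standard Wilson score interval for a proportion; the function name and the 95% level (z = 1.96) are choices made for this example only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: number of raters who could not tell the pair apart (assumed encoding)
    n:         total number of raters
    z:         normal quantile, 1.96 for roughly 95% confidence
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 5 of 7 experts unable to see a difference
low, high = wilson_interval(5, 7)
print(f"{low:.2f} to {high:.2f}")
```

With only seven raters the interval is wide (roughly 0.36 to 0.92 for 5 of 7), which is one way to put a number on "How Much is Enough?".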

Once again, a big thank you from me for such nice work and presentation.

Please advise.

Sincerely,




Gopal T V
0 9840121302
https://vidwan.inflibnet.ac.in/profile/57545
https://www.facebook.com/gopal.tadepalli
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dr. T V Gopal
Professor
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
       (Res) 24454753
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 2021-10-23 03:28, Bill Jouris via CPWG wrote:
> Dear Roberto,
> 
> Not all that off-topic.  In general, you are correct that combinations
> of letters got ignored.  For example, a Latin letter R, followed by a
> Latin Letter N is, to my mind, hard to distinguish from a Latin letter
> M.  If you saw .corn, would you realize it was about maize, rather
> than being a normal .com?  But it didn't get considered in identifying
> variants.
> 
> The Sharp S is the exception.  The panel concluded that the Sharp S
> (ß) and a double S (ss) are variants.  Most variants are
> bidirectional -- that is, it doesn't matter which one was registered
> first, the other is blocked.  But this case is different.  If the name
> with a double S is registered first, then the Sharp S is indeed
> blocked.  However, if the name with Sharp S is registered first, then
> the variant is considered "allocatable."  That is the same name with a
> double S rather than Sharp S _can_ be registered, provided:
> 1) ALL of the instances of Sharp S in the name (if there is more than
> one) are changed to double S, and
> 2) the name is registered to the same registrant.
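
The asymmetric Sharp S rule described above can be sketched as follows. This is only an illustration of the two conditions as stated in this message, not the software actually used; the function name, its return values, and the sample names are hypothetical.

```python
SHARP_S = "\u00df"  # ß


def sharp_s_status(registered: str, candidate: str, same_registrant: bool) -> str:
    """Classify `candidate` against an already registered name.

    Returns "unrelated", "blocked" or "allocatable" (labels chosen for this sketch).
    """
    # Names are related only if they match once every ß is read as ss.
    if registered.replace(SHARP_S, "ss") != candidate.replace(SHARP_S, "ss"):
        return "unrelated"
    # Allocatable only in one direction: the ß name came first, ALL of its
    # ß were changed to ss in the candidate, and the registrant is the same.
    if SHARP_S in registered and SHARP_S not in candidate and same_registrant:
        return "allocatable"
    return "blocked"


print(sharp_s_status("strasse", "stra\u00dfe", True))   # ss first: ß is blocked
print(sharp_s_status("stra\u00dfe", "strasse", True))   # ß first, same registrant
print(sharp_s_status("stra\u00dfe", "strasse", False))  # ß first, other registrant
```

A name in which only some of the Sharp S instances were replaced still contains ß, so it falls through to "blocked".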
> 
> On the other hand, the possibility of substituting a vowel with
> diaeresis for the same vowel followed by E did not come up.  That is
> the way I learned German (as an American) long ago.  But the native
> German speakers on the Panel did not consider it worth worrying about.
> 
> 
> Sorry if that doesn't totally clarify things.  But that's all I've got
> on the subject.
> 
> Bill Jouris
> 
>  On Thursday, October 21, 2021, 11:47:20 PM PDT, Roberto Gaetano
> <roberto_gaetano at hotmail.com> wrote:
> 
>  Dear Bill,
> 
> I wonder whether I am off-topic with this question, but here it is
> anyway.
> Has the Latin GP considered an additional potential confusion coming
> from cases like the German equivalency between “ae” and “ä”
> or “ss” and “ß”? Just to give an example, the Austrian
> Touring Club (ÖAMTC) has the site oeamtc.at [2], as it is customary
> in German-speaking countries to get around the problem in this way.
> 
> This is most probably out of scope, because the work is likely to be
> limited to single characters and not combinations of characters, but
> from the user’s point of view it could be a source of confusion
> anyway.
> 
> Thanks,
> Roberto
> 
>> On 21.10.2021, at 19:47, Bill Jouris via CPWG <cpwg at icann.org>
>> wrote:
>> 
>> Dear Olivier,
>> 
>> That is the problem I see as well.  My sense is that both the Latin
>> GP and the Integration Panel (which is the next level higher)
>> desire primarily to minimize the number of variants.  Two codepoints
>> which are identical, such as the Latin schwa and the Latin turned E,
>> obviously cannot be distinguished by anyone, and so are necessarily
>> variants.  (Although one of my fellow Panel members argued against
>> variant status even for that specific case.)  But how strict the
>> constraints were on making two codepoints variants reflects that
>> desire for minimization.  Also, in at least one case, the Integration Panel
>> requested the Latin GP review (and modify) some variant findings
>> because one set of codepoints which were variants of each other was
>> "too large."  ("Too large" wasn't defined.  Nor was there indication
>> of why one would care. Certainly it wouldn't impact the performance
>> of the software doing the automatic filtering of proposed TLDs.)
>> 
>> Given that
>> a) the Panel members are experts,
>> b) we were doing side-by-side comparisons, and
>> c) we knew that we were looking at two different codepoints
>> it seemed to me that if any of us couldn't tell the difference, then
>> neither could the average user looking at a domain name in
>> isolation.  Setting a higher threshold seems to me like phishing,
>> and especially pharming, enablement.
>> 
>> It also might appear that having a group of codepoints which are not
>> variants, but which users cannot really distinguish, provides a
>> marketing opportunity.  Not to sell to bad actors, who are typically
>> one-off buyers and so not worth pursuing.  But to sell defensive
>> registrations to legitimate registrants, who merely want to make
>> sure that their customers find them.  Such defensive registrations
>> would be likely to be renewed indefinitely, making them worthwhile
>> even in a low margin business.**
>> 
>> Bill
>> 
>> ** 5 of the 7 members of the Latin Panel being employees of one or
>> another of the contracted parties.  I believe most of them were
>> sincerely making a good faith effort to do the right thing.  But
>> their experience there may nevertheless have colored their
>> perceptions.
>> 
>> Sent from Yahoo Mail on Android [1]
>> 
>> On Thu, Oct 21, 2021 at 2:13 AM, Olivier MJ Crépin-Leblond
>> <ocl at gih.com> wrote:
>> 
>> Dear Bill,
>> 
>> thank you for explaining this in further detail. The problem I see
>> with the process here, is that *experts* have been used to notice a
>> difference. Because they are experts, they might be able to see
>> differences which the average Internet end user will not. And this
>> is the concern I have: is the panel of experts being conservative
>> enough in making their decisions? If there is any suspicion about
>> two characters being variants, would a conservative approach treat
>> them as variants?
>> What is the end goal of identifying variants? If it is to avoid the
>> use of IDNs for phishing, then the only approach possible should be
>> a conservative approach.
>> Kindest regards,
>> 
>> Olivier
>> 
>> On 21/10/2021 05:17, Bill Jouris via CPWG wrote:
>> 
>> After some of the discussion in the chat in this morning's meeting,
>> I feel like a little more extended discussion about variants might
>> be helpful.
>> 
>> The repertoire for the Latin script consists of "codepoints" -- some
>> are letters and some are letters plus diacritics.  "Variants" are
>> pairs of codepoints which are indistinguishable.  That is, in the
>> process that the Panel used, 5 of the 7 experts on the panel
>> couldn't see a difference.  The Latin GP did not look at diacritics
>> per se.  Just at codepoints which might involve diacritics.
>> 
>> Thus, a codepoint consisting of a letter with a caron diacritic ( ̌ )
>> and a codepoint with the same letter combined with a breve
>> diacritic ( ̆ ) may always result in a variant pair, but only
>> because the Panel's comparison worked out that way.  For example, a
>> G with caron (ǧ) and a G with breve (ğ) are variants.  On the
>> other hand, a caron and a macron ( ¯ ) never result in a variant
>> pair.
>> 
>> However, some cases with diacritics are mixed.  For example, a
>> codepoint consisting of a letter with a dot above ( ˙ ) and a
>> codepoint consisting of a letter with an acute accent result in a
>> variant pair for the letters C (ċ vs ć), N (ṅ vs ń), and Z (ż vs
>> ź).  But, in the Panel's original finding, not for the letters E (ė vs
>> é) and I (i vs í).
>> 
>> (Note that a majority of the Panel found the vowels to produce
>> variants as well.  Just not a supermajority, as required by the
>> process the Panel had adopted.  As a result, the Panel's official
>> position is that, in various cases not just this one, even though a
>> majority of the experts, looking side by side, could not see a
>> difference, the average "reasonably careful user" will somehow
>> magically notice the difference when looking at a domain name.)
>> 
>> Then we have cross-script variants, including those identified by
>> other Panels.  For example, the Greek Panel found that the Greek
>> letter Iota was a variant both of the Latin letter I and the Latin
>> letter I with acute.   As a result I and I with acute became
>> variants.
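
The way a cross-script finding pulls two Latin codepoints into the same variant set can be sketched with a small union-find. This is an illustration only: it assumes variant relations are treated as transitive, as the Iota example suggests, and the function names and the lower-case codepoints used are choices made for this example.

```python
def build_variant_sets(pairs):
    """Group codepoints into variant sets by merging every pair (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return list(groups.values())


def are_variants(a, b, variant_sets):
    """True when both codepoints ended up in the same variant set."""
    return any(a in s and b in s for s in variant_sets)


# The two findings attributed to the Greek Panel in the paragraph above.
pairs = [
    ("\u03b9", "i"),       # Greek iota ~ Latin i
    ("\u03b9", "\u00ed"),  # Greek iota ~ Latin i with acute
]
sets_ = build_variant_sets(pairs)
print(are_variants("i", "\u00ed", sets_))  # linked through iota
print(are_variants("e", "\u00e9", sets_))  # no finding links e and e with acute
```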
>> 
>> But there is no Greek letter which is a variant of the Latin letter
>> E.  So we are left with a situation where the dot above diacritic
>> and the acute produce variants for all letters EXCEPT for the letter
>> E.  (When I suggested that, for consistency, we should make the
>> letter E case a variant as well, the response was "It is more
>> important that we follow our process than that we have
>> consistency.")
>> 
>> TLDs consist of a series of codepoints.  Proposed TLDs which differ
>> _only_ by one or more variants from another TLD will
>> automatically be rejected by the software.  For example, .çom
>> would be allowed, despite its similarity to .com, because C with
>> Cedilla is not a variant of C.  Also .сом (using Cyrillic
>> letters) would be allowed because, while C and the Cyrillic letter
>> Es are variants, and O and the Cyrillic letter O are variants, the
>> letter M and the Cyrillic letter Em are not variants (the Panel was
>> directed to ignore Upper Case when deciding what might confuse
>> users).  But .cóm could be rejected, because O and O with acute are
>> variants.
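
The automatic check described above can be sketched as a canonical-form comparison. This is a simplified illustration, not the actual filtering software: the variant sets below contain only the relations named in this paragraph, grouping them into sets assumes the relations are transitive, and the names `VARIANT_SETS`, `canonical_form` and `is_rejected` are hypothetical.

```python
# Only the variant relations mentioned in the paragraph above.
VARIANT_SETS = [
    {"c", "\u0441"},            # Latin c, Cyrillic es
    {"o", "\u043e", "\u00f3"},  # Latin o, Cyrillic o, Latin o with acute
]

# Map every codepoint to one representative of its variant set.
CANONICAL = {cp: min(group) for group in VARIANT_SETS for cp in group}


def canonical_form(label: str) -> str:
    """Replace each codepoint by its variant-set representative."""
    return "".join(CANONICAL.get(cp, cp) for cp in label)


def is_rejected(proposed: str, existing: list[str]) -> bool:
    """True when `proposed` differs from an existing TLD only by variants
    (an exact duplicate is caught as well)."""
    target = canonical_form(proposed)
    return any(canonical_form(tld) == target for tld in existing)


existing = ["com"]
print(is_rejected("c\u00f3m", existing))            # .cóm: o with acute is a variant of o
print(is_rejected("\u00e7om", existing))            # .çom: c with cedilla is not a variant of c
print(is_rejected("\u0441\u043e\u043c", existing))  # Cyrillic .сом: em is not a variant of m
```

Only the first label collides with .com; the other two pass because one codepoint in each has no variant relation to its Latin counterpart.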
>> 
>> "Confusables" are pairs of codepoints which some of the experts
>> could not distinguish, just not enough to be designated as variants.
>> Confusables are intended as suggestions for the panel which will
>> manually review the proposed TLDs.
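
The distinction between variants and confusables can be sketched as a threshold on the Panel's votes. This is an illustration under stated assumptions: the 5-of-7 figure comes from this thread and is read here as "at least 5", and the function name and the "distinct" label for pairs nobody confused are invented for the example.

```python
PANEL_SIZE = 7
VARIANT_THRESHOLD = 5  # "5 of the 7 experts", read here as at least 5


def classify_pair(cannot_distinguish: int,
                  panel_size: int = PANEL_SIZE,
                  threshold: int = VARIANT_THRESHOLD) -> str:
    """Classify a codepoint pair from the number of experts who saw no difference."""
    if not 0 <= cannot_distinguish <= panel_size:
        raise ValueError("vote count must be between 0 and the panel size")
    if cannot_distinguish >= threshold:
        return "variant"      # blocked automatically by the software
    if cannot_distinguish > 0:
        return "confusable"   # passed on as a hint for the manual review
    return "distinct"


for votes in (5, 4, 0):
    print(votes, classify_pair(votes))
```

Under these assumptions a pair that 4 of 7 experts could not tell apart, a simple majority, still lands in "confusable" rather than "variant".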
>> 
>> I hope this all will help everyone understand what we are looking at
>> here.
>> 
>> Regards,
>> Bill Jouris
>> 
>> _______________________________________________
>> CPWG mailing list
>> CPWG at icann.org
>> https://mm.icann.org/mailman/listinfo/cpwg
>> 
>> _______________________________________________
>> By submitting your personal data, you consent to the processing of
>> your personal data for purposes of subscribing to this mailing list
>> accordance with the ICANN Privacy Policy
>> (https://www.icann.org/privacy/policy) and the website Terms of
>> Service (https://www.icann.org/privacy/tos). You can visit the
>> Mailman link above to change your membership status or
>> configuration, including unsubscribing, setting digest-style
>> delivery or disabling delivery altogether (e.g., for a vacation),
>> and so on.
>> 
>> --
>> Olivier MJ Crépin-Leblond, PhD
>> http://www.gih.com/ocl.html
> 
> Links:
> ------
> [1]
> https://go.onelink.me/107872968?pid=InProduct&c=Global_Internal_YGrowth_AndroidEmailSig__AndroidUsers&af_wl=ym&af_sub1=Internal&af_sub2=Global_YGrowth&af_sub3=EmailSignature
> [2] http://oeamtc.at

