[CPWG] Variants and Process
gopal at annauniv.edu
Sat Oct 23 02:32:34 UTC 2021
Dear Bill Jouris,
Many thanks again for your presentation to the CPWG on 6 October 2021.
It was a fantastic effort by your seven-member team from six different
countries.
Ref Slide #12: UNICODE 00FE and 01A5
The quantification for decision making was based on a 5-point linear
scale, and the seven experts used only the "2-4" range. Also, this was
for just three popular typefaces.
I know this is just one sample, and your question on the next slide,
"How Much is Enough?", is vital.
Is there a tool or simulator that generalizes this to larger samples,
different languages, and different quantification scales such as the
Likert scale?
We could then anticipate the code generator within an acceptable
confidence interval.
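One hedged way to make the "How Much is Enough?" question concrete is to
look at how wide the uncertainty around a panel vote is. The sketch below
is only an illustration, not a tool used by the Latin GP. It assumes each
expert's rating has been reduced to a yes/no "cannot distinguish" vote,
treats the votes as independent, and applies the standard Wilson score
interval. The function name and the 50-of-70 sample are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# 5 of 7 experts could not distinguish the pair (panel size from the thread).
low7, high7 = wilson_interval(5, 7)
# Hypothetical larger sample with the same proportion.
low70, high70 = wilson_interval(50, 70)
print(f"7 raters:  {low7:.2f} to {high7:.2f}")
print(f"70 raters: {low70:.2f} to {high70:.2f}")
```

With 5 of 7 votes the 95% interval runs from roughly 0.36 to 0.92, so a
seven-member panel on its own says little about the wider user
population. The same proportion over 70 raters gives a much narrower
interval.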
Once again, a big thank you from me for such nice work and
presentation.
Please advise.
Sincerely,
Gopal T V
0 9840121302
https://vidwan.inflibnet.ac.in/profile/57545
https://www.facebook.com/gopal.tadepalli
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dr. T V Gopal
Professor
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
(Res) 24454753
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 2021-10-23 03:28, Bill Jouris via CPWG wrote:
> Dear Roberto,
>
> Not all that off-topic. In general, you are correct that combinations
> of letters got ignored. For example, a Latin letter R, followed by a
> Latin Letter N is, to my mind, hard to distinguish from a Latin letter
> M. If you saw .corn, would you realize it was about maize, rather
> than being a normal .com? But it didn't get considered in identifying
> variants.
>
> The Sharp S is the exception. The panel concluded that the Sharp S
> (ß) and a double S (ss) are variants. Most variants are
> bidirectional -- that is, it doesn't matter which one was registered
> first, the other is blocked. But this case is different. If the name
> with a double S is registered first, then the Sharp S is indeed
> blocked. However, if the name with Sharp S is registered first, then
> the variant is considered "allocatable." That is the same name with a
> double S rather than Sharp S _can_ be registered, provided:
> 1) ALL of the instances of Sharp S in the name (if there is more than
> one) are changed to double S, and
> 2) the name is registered to the same registrant.
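The asymmetric Sharp S rule described above can be sketched as follows.
This is an illustration only, not the software used for TLD filtering.
The function name and return labels are hypothetical, and treating every
case that fails the two conditions as "blocked" is an assumption.

```python
SHARP_S = "\u00df"  # ß


def fold_sharp_s(name: str) -> str:
    """Replace every Sharp S with a double S."""
    return name.replace(SHARP_S, "ss")


def sharp_s_disposition(registered: str, candidate: str, same_registrant: bool) -> str:
    """Return 'blocked', 'allocatable', or 'unrelated' for a candidate name."""
    if registered == candidate or fold_sharp_s(registered) != fold_sharp_s(candidate):
        return "unrelated"  # not a Sharp S / double S variant of the registered name
    if SHARP_S not in registered:
        return "blocked"  # double S form was registered first
    # Sharp S form was registered first: only the form with ALL Sharp S
    # replaced by double S, requested by the same registrant, is allocatable.
    if candidate == fold_sharp_s(registered) and same_registrant:
        return "allocatable"
    return "blocked"


print(sharp_s_disposition("strasse", "stra\u00dfe", True))
print(sharp_s_disposition("stra\u00dfe", "strasse", True))
```

The first call prints "blocked" and the second prints "allocatable",
matching the two directions of the rule.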
>
> On the other hand, the possibility of substituting a vowel with
> diaeresis for the same vowel followed by E did not come up. That is
> the way I learned German (as an American) long ago. But the native
> German speakers on the Panel did not consider it worth worrying about.
>
>
> Sorry if that doesn't totally clarify things. But that's all I've got
> on the subject.
>
> Bill Jouris
>
> On Thursday, October 21, 2021, 11:47:20 PM PDT, Roberto Gaetano
> <roberto_gaetano at hotmail.com> wrote:
>
> Dear Bill,
>
> I wonder whether I am off-topic with this question, but here it is
> anyway.
> Has the Latin GP considered an additional potential confusion coming
> from cases like the German equivalency between “ae” and “ä”
> or “ss” and “ß”? Just to give an example, the Austrian
> Touring Club (ÖAMTC) has the site oeamtc.at [2], as it is customary
> in German-speaking countries to get around the problem in this way.
>
> This is most probably out of scope, because the work is likely to be
> limited to single characters and not combinations of characters, but
> from the user’s point of view it could be a source of confusion
> anyway.
>
> Thanks,
> Roberto
>
>> On 21.10.2021, at 19:47, Bill Jouris via CPWG <cpwg at icann.org>
>> wrote:
>>
>> Dear Olivier,
>>
>> That is the problem I see as well. My sense is that both the Latin
>> GP, and the Integration Panel (which is the next level higher)
>> desire primarily to minimize the number of variants. Two codepoints
>> which are identical, such as the Latin schwa and the Latin turned E,
>> obviously cannot be distinguished by anyone, and so are necessarily
>> variants. (Although one of my fellow Panel members argued against
>> variant status even for that specific case.) But how strict the
>> constraints were on making two codepoints variants reflects that desire for
>> minimization. Also, in at least one case, the Integration Panel
>> requested the Latin GP review (and modify) some variant findings
>> because one set of codepoints which were variants of each other was
>> "too large." ("Too large" wasn't defined. Nor was there indication
>> of why one would care. Certainly it wouldn't impact the performance
>> of the software doing the automatic filtering of proposed TLDs.)
>>
>> Given that
>> a) the Panel members are experts,
>> b) we were doing side-by-side comparisons, and
>> c) we knew that we were looking at two different codepoints
>> it seemed to me that if any of us couldn't tell the difference, then
>> neither could the average user looking at a domain name in
>> isolation. Setting a higher threshold seems to me like phishing,
>> and especially pharming, enablement.
>>
>> It also might appear that having a group of codepoints which are not
>> variants, but which users cannot really distinguish, provides a
>> marketing opportunity. Not to sell to bad actors, who are typically
>> one-off buyers and so not worth pursuing. But to sell defensive
>> registrations to legitimate registrants, who merely want to make
>> sure that their customers find them. Such defensive registrations
>> would be likely to be renewed indefinitely, making them worthwhile
>> even in a low margin business.**
>>
>> Bill
>>
>> ** 5 of the 7 members of the Latin Panel were employees of one or
>> another of the contracted parties. I believe most of them were
>> sincerely making a good faith effort to do the right thing. But
>> their experience there may nevertheless have colored their
>> perceptions.
>>
>> Sent from Yahoo Mail on Android [1]
>>
>> On Thu, Oct 21, 2021 at 2:13 AM, Olivier MJ Crépin-Leblond
>> <ocl at gih.com> wrote:
>>
>> Dear Bill,
>>
>> thank you for explaining this in further detail. The problem I see
>> with the process here, is that *experts* have been used to notice a
>> difference. Because they are experts, they might be able to see
>> differences which the average Internet end user will not. And this
>> is the concern I have: is the panel of experts being conservative
>> enough in making their decisions? If there is any suspicion about
>> two characters being variants, would a conservative approach treat
>> them as variants?
>> What is the end goal of identifying variants? If it is to avoid the
>> use of IDNs for phishing, then the only approach possible should be
>> a conservative approach.
>> Kindest regards,
>>
>> Olivier
>>
>> On 21/10/2021 05:17, Bill Jouris via CPWG wrote:
>>
>> After some of the discussion in the chat in this morning's meeting,
>> I feel like a little more extended discussion about variants might
>> be helpful.
>>
>> The repertoire for the Latin script consists of "codepoints" -- some
>> are letters and some are letters plus diacritics. "Variants" are
>> pairs of codepoints which are indistinguishable. That is, in the
>> process that the Panel used, 5 of the 7 experts on the panel
>> couldn't see a difference. The Latin GP did not look at diacritics
>> per se. Just at codepoints which might involve diacritics.
>>
>> Thus, a codepoint consisting of a letter with a caron diacritic ( ̌
>> ) and a codepoint with the same letter combined with a breve
>> diacritic ( ̆ ) may always result in a variant pair, but only
>> because the Panel's comparison worked out that way. For example, a
>> G with caron (ǧ) and a G with breve (ğ) are variants. On the
>> other hand, a caron and a macron ( ¯ ) never result in a variant
>> pair.
>>
>> However, some cases with diacritics are mixed. For example, a
>> codepoint consisting of a letter with a dot above ( ˙ ) and a
>> codepoint consisting of a letter with an acute accent result in a
>> variant pair for letters C (ċ vs ć), N (ṅ vs ń), and Z (ż vs
>> ź ). But, in the Panel's original finding, not for letters E (ė vs
>> é), and I (i vs í).
>>
>> (Note that a majority of the Panel found the vowels to produce
>> variants as well. Just not a supermajority, as required by the
>> process the Panel had adopted. As a result, the Panel's official
>> position is that, in various cases not just this one, even though a
>> majority of the experts, looking side by side, could not see a
>> difference, the average "reasonably careful user" will somehow
>> magically notice the difference when looking at a domain name.)
>>
>> Then we have cross-script variants, including those identified by
>> other Panels. For example, the Greek Panel found that the Greek
>> letter Iota was a variant both of the Latin letter I and the Latin
>> letter I with acute. As a result I and I with acute became
>> variants.
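The way a cross-script finding pulls two Latin codepoints together can be
sketched as a transitive closure over variant pairs. This is an
illustration under the assumption that variant status is treated as
symmetric and transitive, as the Iota example suggests. The function name
is hypothetical and this is not the Panel's tooling.

```python
def build_variant_sets(pairs):
    """Group codepoints into variant sets using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sets = {}
    for codepoint in list(parent):
        sets.setdefault(find(codepoint), set()).add(codepoint)
    return list(sets.values())


# Greek iota is a variant of Latin i and of Latin i with acute.
sets = build_variant_sets([("\u03b9", "i"), ("\u03b9", "\u00ed")])
print(sets)
```

Both pairs end up in a single set, so i and i with acute are variants of
each other even though they were never compared directly.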
>>
>> But there is no Greek letter which is a variant of the Latin letter
>> E. So we are left with a situation where the dot above diacritic
>> and the acute produce variants for all letters EXCEPT for the letter
>> E. (When I suggested that, for consistency, we should make the
>> letter E case a variant as well, the response was "It is more
>> important that we follow our process than that we have
>> consistency.")
>>
>> TLDs consist of a series of codepoints. Proposed TLDs which differ
>> _only_ by one or more variants from another TLD will
>> automatically be rejected by the software. For example, .çom
>> would be allowed, despite its similarity to .com, because C with
>> Cedilla is not a variant of C. Also .сом (using Cyrillic
>> letters) would be allowed because, while C and the Cyrillic letter
>> Es are variants, and O and the Cyrillic letter O are variants, the
>> letter M and the Cyrillic letter Em are not variants (the Panel was
>> directed to ignore Upper Case when deciding what might confuse
>> users). But .cóm could be rejected, because O and O with acute are
>> variants.
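The automatic check described above can be sketched as a per-codepoint
comparison. This is a minimal illustration, not the actual filtering
software: the variant sets below contain only the pairs named in this
message (grouped on the assumption that variant status is transitive),
the function names are hypothetical, and labels are assumed to be
lower case with precomposed characters.

```python
VARIANT_SETS = [
    {"c", "\u0441"},            # Latin c, Cyrillic es
    {"o", "\u043e", "\u00f3"},  # Latin o, Cyrillic o, Latin o with acute
]

# Map every codepoint in a variant set to one representative of that set.
CANONICAL = {cp: min(group) for group in VARIANT_SETS for cp in group}


def canonical(label: str) -> str:
    """Replace each codepoint by the representative of its variant set."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


def differs_only_by_variants(proposed: str, existing: str) -> bool:
    """True if the labels differ, but only at positions holding variants."""
    return proposed != existing and canonical(proposed) == canonical(existing)


print(differs_only_by_variants("c\u00f3m", "com"))            # .cóm vs .com
print(differs_only_by_variants("\u00e7om", "com"))            # .çom vs .com
print(differs_only_by_variants("\u0441\u043e\u043c", "com"))  # Cyrillic .сом vs .com
```

Only the first comparison is True. C with cedilla is not in any variant
set, and Cyrillic em is not a variant of Latin m, so the other two labels
would pass the automatic check.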
>>
>> "Confusables" are pairs of codepoints which some of the experts
>> could not distinguish, just not enough to be designated as variants.
>> Confusables are intended as suggestions for the panel which will
>> manually review the proposed TLDs.
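The distinction between variants and confusables can be sketched as a
simple threshold on the panel vote. This is an illustration only. It
assumes each expert's judgement is reduced to a yes/no "cannot
distinguish" vote and that "5 of the 7" means at least five; the function
name and the "distinct" label are hypothetical.

```python
PANEL_SIZE = 7
VARIANT_THRESHOLD = 5  # supermajority described in the thread


def classify_pair(cannot_distinguish: int) -> str:
    """Classify a codepoint pair by how many experts could not tell it apart."""
    if not 0 <= cannot_distinguish <= PANEL_SIZE:
        raise ValueError("vote count must be between 0 and the panel size")
    if cannot_distinguish >= VARIANT_THRESHOLD:
        return "variant"     # rejected automatically by the software
    if cannot_distinguish > 0:
        return "confusable"  # passed on as a suggestion for manual review
    return "distinct"


print(classify_pair(5))
print(classify_pair(4))
```

A simple majority of 4 out of 7 yields "confusable" rather than
"variant", which is the situation described for the vowels with dot above
and acute.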
>>
>> I hope this all will help everyone understand what we are looking at
>> here.
>>
>> Regards,
>> Bill Jouris
>>
>> _______________________________________________
>> CPWG mailing list
>> CPWG at icann.org
>> https://mm.icann.org/mailman/listinfo/cpwg
>>
>> _______________________________________________
>> By submitting your personal data, you consent to the processing of
>> your personal data for purposes of subscribing to this mailing list
>> accordance with the ICANN Privacy Policy
>> (https://www.icann.org/privacy/policy) and the website Terms of
>> Service (https://www.icann.org/privacy/tos). You can visit the
>> Mailman link above to change your membership status or
>> configuration, including unsubscribing, setting digest-style
>> delivery or disabling delivery altogether (e.g., for a vacation),
>> and so on.
>>
>> --
>> Olivier MJ Crépin-Leblond, PhD
>> http://www.gih.com/ocl.html
>
> Links:
> ------
> [1]
> https://go.onelink.me/107872968?pid=InProduct&c=Global_Internal_YGrowth_AndroidEmailSig__AndroidUsers&af_wl=ym&af_sub1=Internal&af_sub2=Global_YGrowth&af_sub3=EmailSignature
> [2] http://oeamtc.at