[CPWG] Variants and Process

gopal at annauniv.edu
Thu Oct 21 10:04:44 UTC 2021


Dear Mr. Bill Jouris,

Thank you for the lucid explanation of the technicalities. They are all
very important.

However, please refer to the attached image to help visualize "Similar" 
and "Variant".

I believe it came from ICANN several years ago, and I may be able to
locate it on the Community Wiki when I find time. Either way, it is
useful.

There are further requirements due to:

Security
Predictability (variants should behave and function as users expect in 
their language and script environments)
Equivalency (variants must be managed by the same entity and direct 
users to related content)
Consistency (variants should behave similarly within and across TLDs and 
supporting technology)

The outcomes are bound to involve engineering compromises, and their
limitations should be stated clearly so that the implementation is
transparent.

I hope the attached image makes this clearer.

Sincerely,




Gopal T V
0 9840121302
https://vidwan.inflibnet.ac.in/profile/57545
https://www.facebook.com/gopal.tadepalli
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dr. T V Gopal
Professor
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
       (Res) 24454753
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 2021-10-21 09:47, Bill Jouris via CPWG wrote:
> After some of the discussion in the chat in this morning's meeting, I
> feel like a little more extended discussion about variants might be
> helpful.
> 
> The repertoire for the Latin script consists of "codepoints" -- some
> are letters and some are letters plus diacritics.  "Variants" are
> pairs of codepoints which are indistinguishable.  That is, in the
> process that the Panel used, 5 of the 7 experts on the panel couldn't
> see a difference.  The Latin GP did not look at diacritics per se.
> Just at codepoints which might involve diacritics.
> 
> Thus, a codepoint consisting of a letter with a caron diacritic ( ̌ )
> and a codepoint with the same letter combined with a breve diacritic (
>  ̆  ) may always result in a variant pair, but only because the
> Panel's comparison worked out that way.  For example, a G with caron
> (ǧ) and a G with breve (ğ) are variants.   On the other hand, a
> caron and a macron ( ¯ ) never result in a variant pair.
> 
> However some cases with diacritics are mixed.  For example, a
> codepoint consisting of letter with a dot above ( ˙ ) and a codepoint
> consisting of a letter with an acute accent results in a variant pair
> for letters C (ċ vs ć), N (ṅ vs ń), and Z (ż vs ź ). But, in
> the Panel's original finding, not for letters E (ė vs é), and I (i
> vs í).
> 
> (Note that a majority of the Panel found the vowels to produce
> variants as well.  Just not a supermajority, as required by the
> process the Panel had adopted.  As a result, the Panel's official
> position is that, in various cases not just this one, even though a
> majority of the experts, looking side by side, could not see a
> difference, the average "reasonably careful user" will somehow
> magically notice the difference when looking at a domain name.)
> 
> Then we have cross-script variants, including those identified by
> other Panels.  For example, the Greek Panel found that the Greek
> letter Iota was a variant both of the Latin letter I and the Latin
> letter I with acute.   As a result I and I with acute became variants.
> 
> But there is no Greek letter which is a variant of the Latin letter E.
>  So we are left with a situation where the dot above diacritic and the
> acute produce variants for all letters EXCEPT for the letter E.  (When
> I suggested that, for consistency, we should make the letter E case a
> variant as well, the response was "It is more important that we follow
> our process than that we have consistency.")
> 
> TLDs consist of a series of codepoints.  Proposed TLDs which differ
> _only_ by one or more variants from another TLD will automatically be
> rejected by the software.  For example, .çom would be allowed,
> despite its similarity to .com, because C with Cedilla is not a
> variant of C.  Also .сом (using Cyrillic letters) would be allowed
> because, while C and the Cyrillic letter Es are variants, and O and
> the Cyrillic letter O are variants, the letter M and the Cyrillic
> letter Em are not variants (the Panel was directed to ignore Upper
> Case when deciding what might confuse users).  But .cóm could be
> rejected, because O and O with acute are variants.
> 
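
The rejection rule described above can be sketched roughly as follows.
This is only an illustration in Python: the function names and the
small table are hypothetical, the table holds just a few of the pairs
named in this message, and it is not the Panel's or ICANN's actual
software. It assumes labels are compared codepoint by codepoint after
NFC normalization, and that a proposed label is flagged when every
position either matches or forms a listed variant pair.

```python
import unicodedata

# Illustrative subset only: variant pairs mentioned in the message above.
# The real repertoire is far larger; this table is an assumption.
VARIANT_PAIRS = {
    frozenset(pair)
    for pair in [
        ("\u01e7", "\u011f"),  # g with caron / g with breve
        ("\u010b", "\u0107"),  # c with dot above / c with acute
        ("\u1e45", "\u0144"),  # n with dot above / n with acute
        ("\u017c", "\u017a"),  # z with dot above / z with acute
        ("o", "\u00f3"),       # Latin o / o with acute
        ("c", "\u0441"),       # Latin c / Cyrillic es
        ("o", "\u043e"),       # Latin o / Cyrillic o
    ]
}


def codepoints_match(a: str, b: str) -> bool:
    """True if two codepoints are identical or a listed variant pair."""
    return a == b or frozenset((a, b)) in VARIANT_PAIRS


def differs_only_by_variants(proposed: str, existing: str) -> bool:
    """True if the labels differ, and differ only at variant positions."""
    p = unicodedata.normalize("NFC", proposed)
    e = unicodedata.normalize("NFC", existing)
    if p == e or len(p) != len(e):
        return False
    return all(codepoints_match(a, b) for a, b in zip(p, e))


if __name__ == "__main__":
    for label in ["c\u00f3m", "\u00e7om", "\u0441\u043e\u043c"]:
        print(label, differs_only_by_variants(label, "com"))
```

With this table, only the label with o with acute is flagged against
"com". The c with cedilla label and the all-Cyrillic label pass,
because c with cedilla and the Cyrillic em have no listed variant pair
with their Latin counterparts.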
> "Confusables" are pairs of codepoints which some of the experts could
> not distinguish, just not enough to be designated as variants.
> Confusables are intended as suggestions for the panel which will
> manually review the proposed TLDs.
> 
> I hope this all will help everyone understand what we are looking at
> here.
> 
> Regards,
> Bill Jouris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Similar_Variants_Good_Image_21Oct.jpg
Type: image/jpeg
Size: 81920 bytes
Desc: not available
URL: <https://mm.icann.org/pipermail/cpwg/attachments/20211021/3cc69d0c/Similar_Variants_Good_Image_21Oct-0001.jpg>

