[CPWG] Variants and Process

Cheryl Langdon-Orr langdonorr at gmail.com
Thu Oct 21 08:16:32 UTC 2021


Thank you Bill...

On Thu, Oct 21, 2021, 15:18 Bill Jouris via CPWG <cpwg at icann.org> wrote:

> After some of the discussion in the chat in this morning's meeting, I feel
> like a little more extended discussion about variants might be helpful.
>
> The repertoire for the Latin script consists of "codepoints" -- some are
> letters and some are letters plus diacritics.  "Variants" are pairs of
> codepoints which are indistinguishable.  That is, in the process that the
> Panel used, at least 5 of the 7 experts on the panel couldn't see a
> difference.  The Latin GP did not look at diacritics per se, just at
> codepoints which might involve diacritics.
>
> Thus, a codepoint consisting of a letter with a caron diacritic ( ̌ ) and
> a codepoint with the same letter combined with a breve diacritic (  ̆  )
> may always result in a variant pair, but only because the Panel's
> comparison worked out that way.  For example, a G with caron (ǧ) and a G
> with breve (ğ) are variants.   On the other hand, a caron and a macron ( ¯ )
> never result in a variant pair.
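
To make the "letter plus diacritic" idea concrete, here is a minimal Python sketch that splits a precomposed codepoint into its base letter and combining mark using the standard unicodedata module. The helper name describe() is hypothetical. This only illustrates what a codepoint contains; it does not reproduce the Panel's visual comparison, which looked at whole codepoints rather than at diacritics.

```python
import unicodedata

def describe(ch):
    """Return (base letter, [names of combining marks]) for one codepoint."""
    # NFD splits a precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    marks = [unicodedata.name(m) for m in decomposed[1:]]
    return base, marks

# G with caron, G with breve, A with macron
for ch in ["\u01e7", "\u011f", "\u0101"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))
```

Both G codepoints share the base letter g and differ only in the combining mark, which is why they are candidates for comparison in the first place.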
>
> However, some cases with diacritics are mixed.  For example, a codepoint
> consisting of a letter with a dot above ( ˙ ) and a codepoint consisting of
> the same letter with an acute accent result in a variant pair for the
> letters C (ċ vs ć), N (ṅ vs ń), and Z (ż vs ź).  But, in the Panel's
> original finding, not for the letters E (ė vs é) and I (i vs í).
>
> (Note that a majority of the Panel found the vowels to produce variants as
> well.  Just not a supermajority, as required by the process the Panel had
> adopted.  As a result, the Panel's official position is that, in various
> cases not just this one, even though a majority of the experts, looking
> side by side, could not see a difference, the average "reasonably careful
> user" will somehow magically notice the difference when looking at a domain
> name.)
>
> Then we have cross-script variants, including those identified by other
> Panels.  For example, the Greek Panel found that the Greek letter Iota was
> a variant both of the Latin letter I and the Latin letter I with acute.
> As a result, I and I with acute became variants.
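
The Iota example reads as if variant relationships are treated as transitive: two codepoints that share a common variant become variants of each other. Assuming that reading is right, the grouping can be sketched with a small union-find in Python. The function name variant_sets() and the lowercase codepoints chosen here are illustrative assumptions, not the Panel's actual tooling or data.

```python
def variant_sets(pairs):
    """Group codepoints into sets, treating 'is a variant of' as transitive."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# Greek iota is a variant of Latin i and of Latin i with acute.
pairs = [("\u03b9", "i"), ("\u03b9", "\u00ed")]
print(variant_sets(pairs))  # one set holding all three codepoints
```

Nothing in the input pairs i directly with i with acute; they land in the same set only through the shared Greek letter.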
>
> But there is no Greek letter which is a variant of the Latin letter E.  So
> we are left with a situation where the dot above diacritic and the acute
> produce variants for all letters EXCEPT for the letter E.  (When I
> suggested that, for consistency, we should make the letter E case a variant
> as well, the response was "It is more important that we follow our process
> than that we have consistency.")
>
> TLDs consist of a series of codepoints.  Proposed TLDs which differ *only*
> by one or more variants from another TLD will automatically be rejected
> by the software.  For example, .çom would be allowed, despite its
> similarity to .com, because C with Cedilla is not a variant of C.  Also .сом
> (using Cyrillic letters) would be allowed because, while C and the Cyrillic
> letter Es are variants, and O and the Cyrillic letter O are variants, the
> letter M and the Cyrillic letter Em are not variants (the Panel was
> directed to ignore Upper Case when deciding what might confuse users).
> But .cóm could be rejected, because O and O with acute are variants.
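
The rule above can be sketched in Python as a position-by-position comparison. This is a minimal sketch under stated assumptions: VARIANT_PAIRS holds only the pairs named in this message (the real tables are much larger), the function names are hypothetical, and comparing labels one codepoint at a time is my simplification rather than a description of the actual software.

```python
# Variant pairs taken from the examples in this message only.
VARIANT_PAIRS = {
    frozenset(("c", "\u0441")),  # Latin c / Cyrillic es
    frozenset(("o", "\u043e")),  # Latin o / Cyrillic o
    frozenset(("o", "\u00f3")),  # Latin o / o with acute
}

def are_variants(a, b):
    """True if the two codepoints are listed as a variant pair."""
    return frozenset((a, b)) in VARIANT_PAIRS

def differs_only_by_variants(label, existing):
    """True if label differs from existing, and only at variant positions."""
    if label == existing or len(label) != len(existing):
        return False
    return all(x == y or are_variants(x, y) for x, y in zip(label, existing))

for candidate in ["\u00e7om", "\u0441\u043e\u043c", "c\u00f3m"]:
    blocked = differs_only_by_variants(candidate, "com")
    print(candidate, "rejected" if blocked else "allowed")
```

A single non-variant position (C with cedilla, or Cyrillic em) is enough for the label to pass this automatic check, however similar it looks overall.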
>
> "Confusables" are pairs of codepoints which some of the experts could not
> distinguish, just not enough of them for the pair to be designated as
> variants.  Confusables are intended as suggestions for the panel which will
> manually review the proposed TLDs.
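
The distinction between variants and confusables comes down to a vote threshold, which can be sketched in Python as follows. The 5-of-7 supermajority is taken from the message; treating "some of the experts" as "at least one" is my assumption, as are the names PANEL_SIZE, VARIANT_THRESHOLD, and classify().

```python
PANEL_SIZE = 7
VARIANT_THRESHOLD = 5  # supermajority described in the message

def classify(votes_indistinguishable):
    """Classify a codepoint pair by how many experts saw no difference."""
    if not 0 <= votes_indistinguishable <= PANEL_SIZE:
        raise ValueError("vote count must be between 0 and the panel size")
    if votes_indistinguishable >= VARIANT_THRESHOLD:
        return "variant"      # blocked automatically
    if votes_indistinguishable >= 1:
        return "confusable"   # left to manual review (lower bound assumed)
    return "distinct"

for votes in (5, 4, 0):
    print(votes, classify(votes))
```

Under this sketch a simple majority of 4 out of 7 still yields only a confusable, which matches the vowel cases described earlier in the message.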
>
> I hope this all will help everyone understand what we are looking at here.
>
> Regards,
> Bill Jouris