<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 5/13/2019 10:08 AM, John Levine
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:alpine.OSX.2.21.9999.1905131201540.10862@ary.qy">On Mon,
13 May 2019, Ram Mohan wrote:
<br>
<blockquote type="cite">While it's a straightforward argument to
say no variants should be allowed
<br>
on the DNS, the reality in many linguistic locales is that
variants are a
<br>
part of everyday life. Not just in the Han script, but in Indic
and Arabic
<br>
scripts, among others. We can't wish them away, nor do we have
the luxury
<br>
of saying the DNS wasn't designed for it, so it shall never
support it.
<br>
</blockquote>
<br>
I think there's a large gap between "many writing systems can
write the same thing in different ways" and "those different ways
should be in the DNS."
<br>
<br>
It's easy to see why you'd block variants, but particularly given
the utter lack of tools to provision them, and no interest in
creating those tools, hard to see why you'd delegate them.
<br>
</blockquote>
<p>John, sorry, I'm with Ram on this one. Where I agree with you is
on the beneficial nature of blocked variants. They are a cheap and
underused tool to limit the attack surface for deceptive
registrations. <br>
</p>
<p>However, once you block a variant, you take away the option for
applicants to apply for the variant even if they have registered
the original label. Where variants are unrelated, that's not an
issue. But in many scripts you have situations where, for example,
different keyboards offer one of the variants but not the other,
even though both variants are used for the same letter.</p>
<p>By blocking such variants, you exclude one community from
"reaching" any label registered for another one.</p>
<p>We don't really have that situation in the Latin script, not even
with European languages. The closest you can get is that Danish
uses "ø" for the letter for which Swedish uses "ö". However, the
same word, like the name of the Danish capital, is often spelled
differently elsewhere in the word, e.g. Köpenhamn instead of
København. (Therefore, Copenhagen can simply apply for all three
spellings, and additional ones as well - such as the German
spelling; even if the two code points were variants, the labels
are not).<br>
</p>
<p>That reduces the case for making these two letters variants of
each other. However, between Arabic and Persian, you'll have cases
where geographic names differ ONLY by such local variant use. If
the variant is blocked, a registrant can reach only one community.<br>
</p>
<p>Not allowing someone to register both variants in this situation
causes just as many problems as ignoring the variant relation
altogether and letting an unrelated party register the variant.</p>
<p>There are many similar examples, and the best way to handle them
is to support allocatable variants. They still block unrelated
parties from registering the variant label, but allow
one applicant to register both. With the new LGR format, you can
express some further constraints so that only a limited number of
variant *labels* can be allocated.</p>
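<p>As a rough sketch of the difference (not any registry's actual
implementation): the code below treats Arabic kaf / Persian keheh
and Arabic yeh / Farsi yeh as variant pairs, purely for
illustration. The names VARIANTS, index_label and Registry are
made up for this example. The first applicant to register any
label in a variant set holds the whole set; everyone else is
blocked, but the holder may allocate further variants.</p>
<pre>
```python
# Illustrative variant pairs only (assumed for this sketch):
# U+0643 ARABIC LETTER KAF / U+06A9 ARABIC LETTER KEHEH
# U+064A ARABIC LETTER YEH / U+06CC ARABIC LETTER FARSI YEH
VARIANTS = {
    "\u0643": "\u06A9",
    "\u06A9": "\u0643",
    "\u064A": "\u06CC",
    "\u06CC": "\u064A",
}


def index_label(label):
    """Map each code point to the smaller member of its variant pair.

    All labels in one variant set share the same index label.
    """
    return "".join(min(ch, VARIANTS.get(ch, ch)) for ch in label)


class Registry:
    """Hypothetical registry with allocatable variants."""

    def __init__(self):
        self.owner_by_index = {}  # index label: applicant holding the set
        self.allocated = set()    # labels actually allocated

    def register(self, label, applicant):
        key = index_label(label)
        # The first applicant for any label in the set becomes its holder.
        owner = self.owner_by_index.setdefault(key, applicant)
        if owner != applicant:
            return False  # held by another party: blocked
        self.allocated.add(label)
        return True
```
</pre>
<p>With pure blocking, the second call by the same applicant would
also have to fail; that is the only difference in this sketch.</p>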
<p>For example, if you have a pair of code points that are variants,
and a pair of labels that contain 2 copies of each (in matching
positions) you would normally get a set of 4 variant labels. An
easy way to constrain that is to limit all variants to be from the
same subset (e.g. either all Persian, or all Arabic).</p>
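<p>To make the counting concrete, here is a minimal sketch in plain
Python rather than in the LGR format itself. The two variant pairs
and the ARABIC / PERSIAN subsets are assumptions chosen for
illustration, and the function names are hypothetical. A label with
two variant positions yields 4 variant labels; requiring all variant
code points to come from the same subset leaves 2.</p>
<pre>
```python
from itertools import product

# Assumed subsets and variant pairs, for illustration only.
ARABIC = {"\u0643", "\u064A"}    # kaf, yeh
PERSIAN = {"\u06A9", "\u06CC"}   # keheh, Farsi yeh
VARIANTS = {
    "\u0643": "\u06A9",
    "\u06A9": "\u0643",
    "\u064A": "\u06CC",
    "\u06CC": "\u064A",
}


def all_variant_labels(label):
    """Every label obtained by choosing either variant at each position."""
    choices = [sorted({ch, VARIANTS.get(ch, ch)}) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def allocatable_variant_labels(label):
    """Keep only labels whose variant code points share one subset."""

    def consistent(candidate):
        special = [ch for ch in candidate if ch in VARIANTS]
        all_arabic = all(ch in ARABIC for ch in special)
        all_persian = all(ch in PERSIAN for ch in special)
        return all_arabic or all_persian

    return {v for v in all_variant_labels(label) if consistent(v)}
```
</pre>
<p>The mixed labels remain in the variant set, so they are still
unavailable to unrelated parties; they are simply not allocatable.</p>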
<p>Allocatable variants still leave it to the discretion of the
applicant as to whether to apply for more than one variant. Some
people prefer automatic activation, where every applicant would
receive all variants. <br>
</p>
<p>There may be some cases where that would match the overall users'
expectations. For example, there are three sets of digits used
with the Arabic script, each being in "native" use in a different
region. Two of these sets even share many digit forms. Since any
numbers these digits represent are obviously the same, and in some
cases, users cannot tell which set of digits is used, forcing
activation may be justified.</p>
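<p>A small sketch of why the digit case is so clear-cut, assuming the
three sets are the ASCII digits, the Arabic-Indic digits (U+0660 to
U+0669) and the Extended Arabic-Indic digits (U+06F0 to U+06F9). The
function names are hypothetical. Folding every digit to ASCII shows
when two labels differ only in the digit set used.</p>
<pre>
```python
# Assumed digit sets for this sketch.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))

# Translation table mapping both non-ASCII sets onto ASCII digits.
FOLD = str.maketrans(
    ARABIC_INDIC + EXTENDED_ARABIC_INDIC,
    "0123456789" * 2,
)


def fold_digits(label):
    """Replace Arabic-Indic and Extended Arabic-Indic digits with ASCII."""
    return label.translate(FOLD)


def same_under_digit_variants(a, b):
    """True if two labels differ at most in which digit set they use."""
    return fold_digits(a) == fold_digits(b)
```
</pre>
<p>For instance, same_under_digit_variants("\u0661\u0662\u0663",
"\u06F1\u06F2\u06F3") is true: both labels read as 123.</p>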
<p>No matter how you come down on that last example, there's no
escaping the need to deal with scripts that do have such variants.</p>
<p>The LGR format in RFC 7940 is making a start in letting you
express more of your registry policies in a machine-readable
format than was possible before (even with the earlier IDN table
format extensions for Chinese and Arabic).</p>
<p>Time to put some of the other tools into place.</p>
<p>A./<br>
</p>
</body>
</html>