[vip] Suggested meta-questions to think about

Nadya Morozova nad.morozova at gmail.com
Thu Jun 23 10:46:17 UTC 2011


Hello all,

Please accept my apologies if I’m reiterating what has already been discussed,
and furiously argued, in the previous sessions that I missed. Having read
this thread starting from Patrik’s post, here are some thoughts; I hope
they help.

There are a number of very broad issues being discussed here, and it may
make sense to try and ring-fence those that this work group can address
within a reasonable time. I agree with James Seng that A1 and A2 should be
the priority.

I hate to come back to the definitions here, but it’s important to agree on
what we’re trying to regulate. For example, Patrik’s post mentions
case A4, the same word in two different languages. From a linguistic point of
view, this is not possible, as each language is a standalone system, so
cross-language similarities should probably be kept out of scope. It’d
be interesting to see a rare case where two TLD applications claim the same
string but in different languages.

Also, Daniel says that in Cyrillic, variants are word-based rather than
character-based and gives an example of E and Ё in one word. I’m not sure I
follow the example and logic and tend to disagree, although the exception of
“обед” given later on makes a point. I’ll be happy to have a separate
discussion with the Cyrillic group to clarify this, but as a linguist and
native Russian speaker, I do not see a problem with Ё and E forming
variant domain names. There is always a character layer, pure spelling with
no pronunciation issues, and that’s what we need to focus on, as that’s what
makes up an FQDN.

So, taking on board Siavash’s advice, I’ve made up a short list of working
definitions for the purpose of this discussion, just to make myself clear.
For me, an atomic unit here is a specific character within a specific
language, and the variations this character produces when forming a (domain)
name. A “variant” is then a string of characters that is similar to and
interchangeable with another string; all “variant” strings form a “bundle”,
an atomic domain unit that can be treated as one – cf. the SC & TC
treatments in ccTLD registries. If two strings are similar but one cannot be
mistaken for the other, they are not variants. I don’t know what to call
such similar strings, since within a language they are just “different
words” and no-one defines the degree of differentiation. I’ll use a random
word like “pancakes” to mean unique strings that are similar but not
interchangeable. Pancake cases may be useful where two words differ only in
diacritics.
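
To make the bundle idea concrete, here is a minimal Python sketch. It is only
an illustration under my own assumptions: the one-entry VARIANT_TABLE and the
function names bundle_key and group_into_bundles are hypothetical, not taken
from any registry policy. Each character is replaced by the representative of
its variant set, and strings that end up with the same key fall into one
bundle.

```python
import unicodedata
from collections import defaultdict

# Hypothetical variant table: each character maps to the representative of
# its variant set. Only the lowercase Cyrillic pair is listed here.
VARIANT_TABLE = {"\u0451": "\u0435"}  # ё -> е


def bundle_key(label: str) -> str:
    """Return the label with every character replaced by its representative."""
    # NFC folds a decomposed "е + combining diaeresis" into the single code
    # point ё, so both Unicode representations hit the same table entry.
    normalized = unicodedata.normalize("NFC", label).lower()
    return "".join(VARIANT_TABLE.get(ch, ch) for ch in normalized)


def group_into_bundles(labels):
    """Group labels so that labels with the same key share one bundle."""
    bundles = defaultdict(set)
    for label in labels:
        bundles[bundle_key(label)].add(label)
    return dict(bundles)


# "ёлка" and "елка" land in one bundle; "полка" is similar but not
# interchangeable (a "pancake"), so it gets a bundle of its own.
bundles = group_into_bundles(["ёлка", "елка", "полка"])
```

A table keyed by single characters keeps the decision at the character layer;
word-based exceptions would need a separate list on top of it.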

So, from my standpoint, there are several layers here:

1. Ring-fencing character variants within different scripts, with
sub-groups for specific languages where needed (for example, where the same
character is used differently in different languages – cf. Arabic Alif and
Cyrillic Yer (Hard sign); explained below). Any pancakes need to be
identified and not mixed with variants.

2. Determining policies to define all variants of a specific character
forming a bundle, its Unicode representations, font implications within a
language, and any cross-language specifics.

3. Where possible, forming recommendations on technical
implementations of those policies within the DNS or at higher levels.

Ok, so I have my own terms and my own plan of action. Starting with point
one and looking at the practical experiences presented at the Wednesday
session, here are my initial thoughts. This is part one of a series of
rants, and I plan to continue with French and more thoughts on Cyrillic in a
separate email.

I don’t speak Arabic and mostly base my assumptions on the Internets –
presentations, wiki, etc. Please accept my apologies if I’m wrong, I’ll
gladly stand corrected.

From what I see, most “variants” in Arabic scripts stem from the optional
tashkeel diacritics, which modify consonant letters to show which vowels to
read them with. Tashkeel are optional and vary in different scripts, so it is
impossible to distinguish between words written with and without
diacritics. That’s why ccTLD registries in the region treat them as variants
and block the other possible options once one of them is registered. To me,
this sounds reasonable, although policy work could help determine how these
variants are managed, and what can be done to simplify and improve
management of shadow-domains.
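
As a rough illustration of that blocking behaviour, here is a minimal Python
sketch. It rests on my own assumptions: tashkeel is approximated by the
Unicode range U+064B..U+0652 (fathatan through sukun), and the Registry class
and its register method are hypothetical, not any ccTLD's actual system.

```python
# Assumption: tashkeel is approximated by the code points U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_tashkeel(label: str) -> str:
    """Return the label with the optional tashkeel marks removed."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


class Registry:
    """Hypothetical registry that blocks every tashkeel variant of a name."""

    def __init__(self):
        self._taken = {}  # stripped form -> label that was registered

    def register(self, label: str) -> bool:
        """Register the label unless one of its variants is already taken."""
        key = strip_tashkeel(label)
        if key in self._taken:
            return False  # a variant is registered, so this one is blocked
        self._taken[key] = label
        return True
```

With this sketch, registering a name without diacritics makes a later request
for the same letters with fatha marks return False, and vice versa.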

Perhaps there’s a special case for the Arabic Hamza, a glottal stop
separating two syllables, which can be represented as a diacritic or written
on a carrier. If Hamza is required and cannot be omitted, then should words
without it be treated as variants of the word with Hamza?

By the way, in Russian, there’s a similar glottal stop situation with the
old character Yer or Hard Sign, ъ, often replaced by an apostrophe in modern
Russian. No other language using the Cyrillic alphabet has this character
except Bulgarian, where it denotes a specific sound. For Russian IDNs, should
the spelling with no Yer be a variant of the spelling with it, and vice versa?

There are a number of other characters in Russian that are somehow
“special”, including the mentioned Ё or characters that in some fonts may be
confusingly similar to other letters. In some cases, it is not reasonable to
treat these similarities as variants; instead, the confusion can be avoided
by prohibiting registration of names that can be confusingly similar to a
canonical string that has already been registered.
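
Such a prohibition could be sketched in Python as follows. This is only an
illustration: the LOOKALIKES table (a few Cyrillic letters that resemble Latin
ones in many fonts) and the names skeleton and can_register are my own
assumptions, not an agreed list or an existing registry API.

```python
# Hypothetical table of Cyrillic letters that many fonts draw like Latin ones.
LOOKALIKES = {
    "\u0430": "a",  # а
    "\u0435": "e",  # е
    "\u043e": "o",  # о
    "\u0440": "p",  # р
    "\u0441": "c",  # с
    "\u0443": "y",  # у
    "\u0445": "x",  # х
}


def skeleton(label: str) -> str:
    """Map each look-alike character to one shared representative."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


def can_register(candidate: str, registered) -> bool:
    """Return True if no registered name has the same skeleton."""
    taken = {skeleton(name) for name in registered}
    return skeleton(candidate) not in taken
```

Unlike a bundle, nothing is allocated to the first registrant here; the
look-alike name is simply refused.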

Perhaps Vladimir Shadrunov from the .tel Registry could share Telnic’s
experiences in defining language policies for Russian and other supported
IDN languages in .tel.

Kind regards,
Nadya Morozova


2011/6/20 Patrik Fältström <patrik at frobbit.se>

> Hi, I am sending this as an interested individual, and not as SSAC Chair...
>
> I have a few times this weekend already tried to explain my view on
> "variants", and after doing that in a chat, I felt it start to (for me) make
> sense, so I wanted to share with you.
>
> We have, I think, a problem divided in two different questions. And
> unfortunately many people think of the solution only the form of "answers to
> the second question". Let me try to explain.
>
> First, whether something is a variant or not (note: word is undefined), is
> actually a grayscale from "yes" to "no". There are various shades of gray
> there. For example:
>
> A.1. Two characters in Unicode really are to be treated as being
> equivalent. I presume one could say that the Han SC/TC issues fall in
> that category.
> A.2. Two different spellings of the same word in the same script and same
> language, like color/colour.
> A.3. Same word in the same language in two different scripts (bulgarian)
> A.4. Same word in two different languages
>
> And then there are many A.1.1, A.1.2, A.2.1 etc, and I did even hear today
> people say "two variants are two different accepted spellings of the same
> word that _sound_ the same". I do not even know where to put that.
>
> But one thing I think, because of that, should be done, and could be done,
> by people is to list all the different "variants" they can come up with...
>
> Then one draws the line: what is and what is not? Is the line drawn at
> A.1.1232 or A.2.56?
>
> Ok, given we have some agreement on what is a variant and not, we have to
> discuss what implications it has. I here also see a number of different
> questions to be answered. For example:
>
> B.1. Should an application with more TLDs than one be counted as one
> application if the TLDs in the application are variants of each other? And
> if so, should there be only one fee per application?
> B.2. Should two different variants be able to be managed by two different
> registries or not, and if not, what should happen with the variants? One
> primary and others like the bundling tactics in some TLDs (i.e. choice
> between "yes delegation" or "just block for other to register")?
>
> And then there might be a technical question in there...
> C.1. Given two domain names are variants of each other, is there something
> that can be done in the DNS from a technical point of view to express that,
> or can we only do delegations?
>
> The really tricky question is of course to really draw the line between
> variants and not variants. I think the line from a technical point of view,
> AND the implications on the second questions, should be for the new TLD
> approval process be as conservative as possible.
>
> Default answer: If someone want two domain names, just send in two
> applications.
>
> Exception: As you desperately need both and not only one of the domain
> names, you will get both treated as one application.
>
> Then ICANN ask IETF formally "can you please let us know if it is possible
> to have some kind of solution for _technically_ link two TLDs with each
> other, in a safe and stable way". Via a letter to IAB.
>
> Until and if IETF give such a solution, ICANN only have the following two
> alternatives for the ones that do get two variants approved:
>
> 1. Get both delegated
>
> 2. Get one delegated and the other blocked
>
> Then MAYBE there will be a third option:
>
> 3. Get both with some alias solution
>
> But these are things which are implications given a definition on what
> "variants" are, and that discussion is in the future -- although I am pretty
> sure some parties really would like to have certain solutions to the
> problem...
>
>   Patrik
>
>
>