[vip] Suggested meta-questions to think about

Patrik Fältström patrik at frobbit.se
Fri Jun 24 06:25:03 UTC 2011


All,

FWIW, I personally do not at all include "translations" as variants, but this kind of discussion is exactly what I was hoping to trigger.

A. Talk about different categories (across scripts, same or similar, rating system), and give examples, lots of examples

B. Discuss what categories are variants and not

And maybe I should not have sent this mail, as I am violating the very process I suggested myself :-)

   Regards, Patrik

On 24 jun 2011, at 14.19, Dillon, Chris wrote:

> Dear all,
> 
> Some attempts at "underloading" the word "variant":
> 
> I would like to give an example of Nadya's "same word in two different languages":
> .travel could be translated as .reise ('trip' in German or Norwegian, also '(to) travel' in the latter). It could also be translated as .reisen 'to travel' in German. This and similar cases are why this approach could never work. The same English word can be translated several ways even into one language.
> 
> I would also like to pick up on Urdu diacritics. I think the situation may be similar in Arabic and Farsi. In Urdu, apart from the Qur'an and texts for foreigners, texts are often written without indicating short vowels (a, i, u). Short a looks like an acute accent. Short i is also an acute accent but is written below the letter. Short u is another accent above the letter. For me this is actually rather like spelling, e.g. colour and color, or better cafe and café, but I can see why at least one of the Arabic registries is registering the basic form and then the three forms with diacritics to avoid confusion.
> 
> For me, Cyrillic cases where the same word, e.g. komercant, may be written with the hard sign (pre-Revolution spelling) or without it are also a matter of spelling.
> 
> I am now thinking I should have filed this under Andrew's rather than Nadya's e-mail, but I'm worried about losing the e-mail so I shall send it.
> 
> Regards,
> 
> Chris.
> ==
> Faculty Information Support Officer
> for Arts & Humanities and Laws
> Arts & Humanities Faculty Office
> Andrew Huxley Building
> UCL, Gower St, London WC1E 6BT
> Tel 020 7679 1599 (int. 31599)
> http://www.ucl.ac.uk/isd/staff/fiso/ah
> ________________________________
> From: vip-bounces at icann.org [vip-bounces at icann.org] on behalf of Nadya Morozova [nad.morozova at gmail.com]
> Sent: 23 June 2011 11:46
> To: Patrik Fältström
> Cc: vip at icann.org
> Subject: Re: [vip] Suggested meta-questions to think about
> 
> Hello all,
> Please accept my apologies if I am reiterating what has been discussed, and furiously argued, over the previous sessions that I missed. Having read this thread starting from Patrik’s post, here are some thoughts; I hope they help.
> There are a number of very broad issues being discussed here, and it may make sense to try and ring-fence those that this work group can address within reasonable time. I agree with James Seng that A1 and A2 should be of priority.
> I hate to come back to the definitions here, but it’s important to agree on what we’re trying to regulate. For example, Patrik’s post mentions case A.4, same word in two different languages. From a linguistic point of view, this is not possible, as each language is a standalone system, so cross-language similarities should probably be kept out of scope. It’d be interesting to see a rare case where two TLD applications claim the same string but in different languages.
> Also, Daniel says that in Cyrillic, variants are word-based rather than character-based and gives an example of E and Ё in one word. I’m not sure I follow the example and logic and tend to disagree, although the exception of “обед” given later on makes a point. I’ll be happy to have a separate discussion with the Cyrillic group to clarify this, but as a linguist and native Russian speaker, I do not see a problem with Ё and E forming variant domain names. There is always a character layer, pure spelling with no pronunciation issues, and that’s what we need to focus on, as that’s what makes up an FQDN.
> So, taking on board Siavash’s advice, I’ve made up a short list of working definitions for the purpose of this discussion, just to make myself clear. For me, an atomic unit here is a specific character within a specific language, and the variations this character produces when forming a (domain) name. Then “variant” can be a string of characters that is similar and interchangeable with another string; all “variant” strings form a “bundle”, an atomic domain unit that can be treated as one – cf. the SC & TC treatments in ccTLD registries. If two strings are similar but one cannot be mistaken for another, they are not variants. I don’t know what to call similar strings as in a language they are just “different words”, and no-one defines the degree of differentiation. I’ll use a random word like “pancakes” to mean unique strings that are similar but not interchangeable. Pancake cases may be useful where two words differ only in diacritics.
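One way to read the "bundle" definition above is as an equivalence class: each character is mapped to a canonical member of its variant set, and two labels are variants exactly when their canonical forms match. The sketch below assumes that reading. The table `VARIANT_TABLE` and the helpers `canonical`, `same_bundle` and `bundle` are hypothetical, and only the Ё/Е pair from this thread is listed; "pancakes" (similar but not interchangeable) are simply left out of the table.

```python
# Hypothetical per-language variant table (assumption, not an agreed policy):
# each character maps to the canonical member of its variant set.
VARIANT_TABLE = {
    "\u0401": "\u0415",  # Ё -> Е
    "\u0451": "\u0435",  # ё -> е
}


def canonical(label: str) -> str:
    """Replace every character by the canonical member of its variant set."""
    return "".join(VARIANT_TABLE.get(ch, ch) for ch in label)


def same_bundle(a: str, b: str) -> bool:
    """Two labels are variants (same bundle) if their canonical forms match."""
    return canonical(a) == canonical(b)


def bundle(label: str) -> set[str]:
    """Enumerate every label that belongs to the bundle of `label`."""
    # Inverse table: canonical character -> every character mapping to it.
    members: dict[str, set[str]] = {}
    for src, dst in VARIANT_TABLE.items():
        members.setdefault(dst, {dst}).add(src)
    results = {""}
    for ch in canonical(label):
        options = members.get(ch, {ch})
        results = {prefix + opt for prefix in results for opt in options}
    return results
```

With this table, "ёлка" and "елка" fall into one bundle, while "обед" and "обет" do not. Note that a bundle grows exponentially with the number of variant-bearing characters, which is one reason registries tend to block rather than delegate every member.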
> So, from my standpoint, there are several layers here:
> 
> 1.       Ring-fencing character variants within different scripts, with sub-groups for specific languages where needed (for example, where the same character is used differently in different languages – cf. Arabic Alif and Cyrillic Yer (Hard sign); explained below). Any pancakes need to be identified and not mixed with variants.
> 
> 2.       Determining policies to define all variants of a specific character forming a bundle, its Unicode representations, font implications within a language, and any cross-language specifics.
> 
> 3.       Where possible, forming recommendations on technical implementations of those policies within the DNS or at higher levels.
> Ok, so I have my own terms and my own plan of action. Starting with point one and looking at the practical experiences presented at the Wednesday session, here are my initial thoughts. This is part one of a series of rants, and I plan to continue with French and more thoughts on Cyrillic in a separate email.
> I don’t speak Arabic and mostly base my assumptions on the Internets – presentations, wiki, etc. Please accept my apologies if I’m wrong, I’ll gladly stand corrected.
> From what I see, most “variants” in Arabic scripts stem from the optional tashkeel diacritics, which modify consonant letters to show which vowels to read them with. Tashkeel are optional and their use varies between scripts, so words written with and without diacritics cannot be reliably told apart. That’s why ccTLD registries in the region treat them as variants and block the other possible forms once one variant is registered. To me, this sounds reasonable, although policy work could help determine how these variants are managed and what can be done to simplify and improve the management of shadow domains.
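The blocking practice described above can be sketched as "strip the optional marks, then reserve the bare form". This is a minimal sketch under a stated assumption: tashkeel is approximated by the Arabic harakat range U+064B..U+0652 (tanwin, fatha, damma, kasra, shadda, sukun), which also covers the Urdu short-vowel marks mentioned earlier in the thread. Combining Hamza marks such as U+0654 are deliberately left out, since whether Hamza is optional is still an open question. The names `TASHKEEL`, `strip_tashkeel` and `Registry` are hypothetical.

```python
# Assumption: treat U+064B..U+0652 as the optional tashkeel marks.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_tashkeel(label: str) -> str:
    """Return the bare form of a label with all optional vowel marks removed."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


class Registry:
    """Hypothetical registry that blocks diacritic variants of registered names."""

    def __init__(self) -> None:
        self._by_base: dict[str, str] = {}  # bare form -> label actually registered

    def register(self, label: str) -> bool:
        base = strip_tashkeel(label)
        if base in self._by_base:
            return False  # some variant of this name is already registered
        self._by_base[base] = label
        return True
```

For example, once the bare consonants k-t-b ("\u0643\u062a\u0628") are registered, the fully voweled "\u0643\u064e\u062a\u064e\u0628\u064e" is refused, and the reverse order behaves the same way.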
> Perhaps, there’s a special case for the Arabic Hamza, a glottal stop separating two syllables, which can be represented as a diacritic or use a carrier. If Hamza is required and cannot be omitted, then should words without it be treated as variants of the word with Hamza?
> By the way, in Russian, there’s a similar glottal stop situation with the old character Yer or Hard Sign, ъ, often replaced by an apostrophe in modern Russian. No other language using the Cyrillic alphabet has this character except Bulgarian, where it denotes a specific sound. For Russian IDNs, should the spelling without Yer be a variant of the spelling with it, and vice versa? There are a number of other characters in Russian that are somehow “special”, including the mentioned Ё or characters that in some fonts may be confusingly similar to other letters. In some cases, it is not reasonable to treat these similarities as variants; instead, the confusion can be avoided by prohibiting registration of names that are confusingly similar to a canonical string that has already been registered.
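The alternative suggested at the end of the paragraph above (prohibit confusable registrations rather than bundling them) could look roughly like this: reduce each label to a "skeleton" and refuse any new label whose skeleton is already taken. Unicode Technical Standard #39 defines a full skeleton algorithm with its own confusables data. The sketch instead uses a tiny hand-picked table of Cyrillic letters that resemble Latin ones, so `CONFUSABLES`, `skeleton` and `ConfusableGuard` are hypothetical illustrations, not that standard's algorithm.

```python
# Tiny hand-picked subset (assumption): Cyrillic letters that look like
# Latin ones in many fonts. A real policy would use complete data.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0443": "y",  # Cyrillic у
    "\u0445": "x",  # Cyrillic х
}


def skeleton(label: str) -> str:
    """Map a label to a form in which look-alike letters collapse together."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())


class ConfusableGuard:
    """Refuses any label whose skeleton matches an already registered label."""

    def __init__(self) -> None:
        self._skeletons: dict[str, str] = {}  # skeleton -> registered label

    def try_register(self, label: str) -> bool:
        sk = skeleton(label)
        if sk in self._skeletons:
            return False  # identical or confusingly similar name exists
        self._skeletons[sk] = label
        return True
```

Here the Latin "cop" is accepted first, after which the all-Cyrillic "сор" is refused because both reduce to the same skeleton. Unlike a variant bundle, the refused name is not delegated or reserved for anyone; it is simply unavailable.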
> Perhaps, Vladimir Shadrunov from the .tel Registry could share Telnic’s experiences in defining language policies for Russian and other supported IDN languages in .tel.
> Kind regards,
> Nadya Morozova
> 
> 
> 2011/6/20 Patrik Fältström <patrik at frobbit.se>
> Hi, I am sending this as an interested individual, and not as SSAC Chair...
> 
> I have a few times this weekend already tried to explain my view on "variants", and after doing that in a chat, I felt it start to (for me) make sense, so I wanted to share with you.
> 
> We have, I think, a problem divided into two different questions. And unfortunately many people think of the solution only in the form of "answers to the second question". Let me try to explain.
> 
> First, whether something is a variant or not (note: the word is undefined) is actually a grayscale from "yes" to "no". There are various shades of gray there. For example:
> 
> A.1. Two characters in Unicode really are to be treated as being equivalent. I presume one could say that the Han SC/TC issues fall in that category.
> A.2. Two different spellings of the same word in the same script and same language, like color/colour.
> A.3. Same word in the same language in two different scripts (Bulgarian)
> A.4. Same word in two different languages
> 
> And then there are many A.1.1, A.1.2, A.2.1 etc, and I did even hear today people say "two variants are two different accepted spellings of the same word that _sound_ the same". I do not even know where to put that.
> 
> But because of that, one thing I think should and could be done is for people to list all the different "variants" they can come up with...
> 
> Then one draws the line: what is a variant and what is not? Is the line drawn at A.1.1232 or A.2.56?
> 
> Ok, given we have some agreement on what is a variant and what is not, we have to discuss what implications it has. Here I also see a number of different questions to be answered. For example:
> 
> B.1. Should an application with more than one TLD be counted as one application if the TLDs in the application are variants of each other? And if so, should there be only one fee per application?
> B.2. Should two different variants be able to be managed by two different registries or not, and if not, what should happen with the variants? One primary and others like the bundling tactics in some TLDs (i.e. choice between "yes delegation" or "just block for other to register")?
> 
> And then there might be a technical question in there...
> C.1. Given two domain names are variants of each other, is there something that can be done in the DNS from a technical point of view to express that, or can we only do delegations?
> 
> The really tricky question is of course to really draw the line between variants and not variants. I think the line, both from a technical point of view AND in its implications for the second set of questions, should be drawn as conservatively as possible for the new TLD approval process.
> 
> Default answer: If someone wants two domain names, just send in two applications.
> 
> Exception: If you desperately need both, and not only one, of the domain names, both will be treated as one application.
> 
> Then ICANN formally asks the IETF, via a letter to the IAB: "Can you please let us know if it is possible to have some kind of solution to _technically_ link two TLDs with each other, in a safe and stable way?"
> 
> Unless and until the IETF provides such a solution, ICANN has only the following two alternatives for those who do get two variants approved:
> 
> 1. Get both delegated
> 
> 2. Get one delegated and the other blocked
> 
> Then MAYBE there will be a third option:
> 
> 3. Get both with some alias solution
> 
> But these are things which are implications given a definition on what "variants" are, and that discussion is in the future -- although I am pretty sure some parties really would like to have certain solutions to the problem...
> 
>  Patrik
> 
> 
> 
