[vip] The "Invisible Separator Characters" Issue

Fri Jul 29 09:22:26 UTC 2011

On 29/07/2011 07:37, Patrik Fältström wrote:
> On 28 jul 2011, at 18.06, Nicholas Ostler wrote:
>
>> Those concerned about this issue for their languages (notably Nepali, Persian etc.) may wish to consider this approach as a concrete option.
> This is also what is recommended in RFC 5892 section 2.8 and appendix A.1 and A.2.
Thanks for that. I note that RFC 5892 is restricted to the Arabic and 
Devanagari script cases, whereas the Unicode reference I gave 
(http://unicode.org/review/pr-96.html ) is more general (including other 
Brahmi-derived scripts of India and South-east Asia, notably Sinhalese, 
Khmer, and Malayalam, and the Mongolian "Uyghur" script, U+1800-18AF).
> What is problematic is how to handle the case when one can *not* tie the label to a language (because of implicit contexts) as the dns does not have language context.
>
> The DNS only have three parameters in a query: {owner, type, class}.
In fact, this may not be an impossible problem. The rules given by 
Unicode and RFC 5892 refer to properties of characters (e.g. 
Canonical_Combining_Class, Joining_Type:{L,D}, Joining_Type:{R,D}), not to 
languages as such. So, in principle, the rules would apply universally, 
regardless of local registries' languages.
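
To make the point concrete, here is a minimal sketch (in Python) of how the 
contextual rules in RFC 5892 appendix A.1 and A.2 could be checked from 
character properties alone. It is an illustration under assumptions, not a 
reference implementation: the function names are my own, and because 
Python's standard library does not expose Joining_Type, the JOINING_TYPE 
table is a small hand-copied excerpt covering only a few Arabic-script 
characters. A real implementation would load the full property data.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # Canonical_Combining_Class value named "Virama"

# ASSUMPTION: illustrative excerpt only. The full Joining_Type data is
# published by Unicode; the standard library does not provide it.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062e": "D",  # ARABIC LETTER KHAH
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
    "\u064e": "T",  # ARABIC FATHA (transparent)
}


def follows_virama(label, i):
    """True if the character before position i has combining class Virama."""
    return i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC


def zwnj_between_joiners(label, i):
    """Joining_Type:{L,D} T* ZWNJ T* Joining_Type:{R,D}, per appendix A.1."""
    j = i - 1
    while j >= 0 and JOINING_TYPE.get(label[j]) == "T":
        j -= 1  # skip transparent characters to the left
    if j < 0 or JOINING_TYPE.get(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and JOINING_TYPE.get(label[k]) == "T":
        k += 1  # skip transparent characters to the right
    return k < len(label) and JOINING_TYPE.get(label[k]) in ("R", "D")


def joiner_allowed(label, i):
    """Check the joiner at position i using character properties only."""
    cp = label[i]
    if cp == ZWJ:
        return follows_virama(label, i)
    if cp == ZWNJ:
        return follows_virama(label, i) or zwnj_between_joiners(label, i)
    raise ValueError("position i does not hold ZWJ or ZWNJ")
```

Note that joiner_allowed() takes only the label and a position. No language 
or registry parameter is needed, which is the sense in which the rules 
apply universally.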

Evidently, the radical solution of simply banning their use (as per the 
Indian approach to ccTLD in Devanagari) could be followed by some 
registries (if they chose, or were so constrained by national 
authorities). But they would have to reconcile themselves to finding the 
characters in other registries' TLDs.

Nicholas

-- 
Nicholas Ostler

nicholas at ostler.net
+44 (0)1225-852865, (0)7720-889319