[vip] The "Invisible Separator Characters" Issue
Nicholas Ostler
nicholas at ostler.net
Fri Jul 29 09:22:26 UTC 2011
On 29/07/2011 07:37, Patrik Fältström wrote:
> On 28 jul 2011, at 18.06, Nicholas Ostler wrote:
>
>> Those concerned about this issue for their languages (notably Nepali, Persian etc.) may wish to consider this approach as a concrete option.
> This is also what is recommended in RFC 5892 section 2.8 and appendix A.1 and A.2.
Thanks for that. I note that RFC 5892 is restricted to the Arabic and
Devanagari script cases, whereas the Unicode reference I gave
(http://unicode.org/review/pr-96.html) is more general (including other
Brahmi-derived scripts of India and South-east Asia, notably Sinhalese,
Khmer, and Malayalam, and the Mongolian "Uyghur" script, U+1800-18AF).
> What is problematic is how to handle the case when one can *not* tie the label to a language (because of implicit contexts) as the dns does not have language context.
>
> The DNS only has three parameters in a query: {owner, type, class}.
In fact, this may not be an impossible problem. The rules given by
Unicode and RFC 5892 refer to properties of characters (e.g.
Canonical_Combining_Class, Joining_Type:{L,D}, Joining_Type:{R,D}), not to
languages as such. So, in principle, the rules would apply universally,
regardless of local registries' languages.
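As a rough illustration of that point, here is a minimal Python sketch of how
such property-based rules might be checked with no language information at all.
It follows one reading of the ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER rules
in RFC 5892 appendices A.1 and A.2. The function names and the tiny
JOINING_TYPE table are hypothetical (Python's unicodedata module does not
expose Joining_Type), and treating every Mn/Me character as Joining_Type T is
an approximation; a real checker would load the full Unicode data files.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # Canonical_Combining_Class value for Virama

# Hypothetical, deliberately tiny Joining_Type table for illustration only.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062f": "R",  # ARABIC LETTER DAL
    "\u0631": "R",  # ARABIC LETTER REH
    "\u0648": "R",  # ARABIC LETTER WAW
}


def joining_type(ch):
    """Return an approximate Joining_Type for one character."""
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    # Approximation: nonspacing and enclosing marks are transparent (T).
    if unicodedata.category(ch) in ("Mn", "Me"):
        return "T"
    return "U"  # everything else is treated as non-joining


def after_virama(label, i):
    """True if the character before position i has combining class Virama."""
    return i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC


def zwnj_between_joiners(label, i):
    """True if label[i] sits between {L,D} T* on the left and T* {R,D} on the right."""
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def joiners_allowed(label):
    """Check every ZWJ/ZWNJ in label against the contextual rules sketched above."""
    for i, ch in enumerate(label):
        if ch == ZWJ and not after_virama(label, i):
            return False
        if ch == ZWNJ and not (
            after_virama(label, i) or zwnj_between_joiners(label, i)
        ):
            return False
    return True
```

Note that nothing in joiners_allowed takes a language or registry argument: the
same check accepts a Persian label with ZWNJ between joining letters and a
Devanagari label with ZWJ after a virama, and rejects either character in a
Latin context.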
Evidently, the radical solution of simply banning their use (as per the
Indian approach to ccTLD in Devanagari) could be followed by some
registries (if they chose, or were so constrained by national
authorities). But they would have to reconcile themselves to finding the
characters in other registries' TLDs.
Nicholas
--
Nicholas Ostler
nicholas at ostler.net
+44 (0)1225-852865, (0)7720-889319