[Latingp] Digraphs

Abdeslam Nasri abdeslam.nasri at gmail.com
Wed May 11 20:27:16 UTC 2016


Dear Chris and Colleagues,


Digraphs, or more generally sequences of code points, can be specified as
variants of a single code point.

An excerpt from the LAGER specification:

" A sequence of multiple code points can be specified as a variant of a
   single code point.  For example, the sequence of LATIN SMALL LETTER O
   (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be
   specified as a variant for an LATIN SMALL LETTER O WITH DIAERESIS
   (U+00F6) as follows:

       <char cp="00F6">
           <var cp="006F 0065"/>
       </char>

"
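To make the excerpt concrete, here is a minimal Python sketch that reads the
fragment above and lists the variant sequence declared for the code point. It
assumes the bare, namespace-free fragment exactly as quoted; a complete LGR
file has a root element and a namespace that this sketch does not handle. The
helper names cp_to_str and read_variants are hypothetical, not part of any
LAGER tooling.

```python
import xml.etree.ElementTree as ET

# The fragment as quoted from the specification (no namespace, no root <lgr>).
LGR_FRAGMENT = """
<char cp="00F6">
    <var cp="006F 0065"/>
</char>
"""


def cp_to_str(cp_attr):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in cp_attr.split())


def read_variants(xml_text):
    """Return {code point string: [variant strings]} for one <char> element."""
    char = ET.fromstring(xml_text)
    base = cp_to_str(char.get("cp"))
    return {base: [cp_to_str(var.get("cp")) for var in char.findall("var")]}


variants = read_variants(LGR_FRAGMENT)
for base, seqs in variants.items():
    for seq in seqs:
        # Show each variant with its length in code points.
        print(f"U+{ord(base):04X} has variant {seq!r} ({len(seq)} code points)")
```

The point of the sketch is only that the "cp" attribute of a variant may carry
more than one code point, which is what a digraph variant would need.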

In the typical case of digraphs, these are called the precomposed and
decomposed forms of a single letter. Normalization should exist in Unicode
so that these variants can be either allowed or blocked.


Kind Regards,
Abdeslam NASRI



2016-05-09 15:43 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:

> Dear Meikal,
>
>
>
> Thank you for your thoughts on digraphs.
>
>
>
> In that case, we would have blocked variants like i, dotless i and iota,
> where an application for a label containing one would block applications
> for labels containing any of the others.
>
>
>
> We would also have blocked variants that are digraphs, like ij, which
> could never be allocated at all. If we need to do this, it will be
> necessary to describe variants for ligature code points we have not yet
> analysed in the Latin ranges, as they aren’t in MSR-2.
>
>
>
> (This distinction is what I was finding difficult during the face-to-face
> meeting in Marrakech.)
>
>
>
> Incidentally, I’m fairly sure two code points could be a variant of one.
> (I wonder what happens with the Arabic ligature of laam and alif that looks
> like Greek gamma; in Urdu the two do not combine so closely, if at all.)
>
>
>
> Regards,
>
>
>
> Chris.
>
> --
>
> Research Associate in Linguistic Computing, Centre for Digital Humanities,
> UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599)
> www.ucl.ac.uk/dis/people/chrisdillon
>
>
>
> *From:* Meikal Mumin [mailto:meikal.mumin at uni-koeln.de]
> *Sent:* 09 May 2016 09:38
> *To:* Dillon, Chris <c.dillon at ucl.ac.uk>
> *Cc:* latingp at icann.org
> *Subject:* Re: [Latingp] Digraphs
>
>
>
> Dear Chris and colleagues,
>
>
>
> apologies for the late reply. I believe we don't need to exclude digraphs.
> We could simply set them up as variants, e.g. ij as the equivalent of i + j.
> It could be useful to verify with IP whether it is possible to declare a
> sequence of two code points as a variant of one; we had not encountered
> such a case with Arabic script.
>
>
>
> Best wishes,
>
>
>
> Meikal
>
>
>
> 2016-03-29 9:54 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:
>
> Dear colleagues,
>
>
>
> Mirjana’s recent research on Montenegrin has raised some interesting
> issues.
>
>
>
> One of them is digraphs.
>
> Currently we have digraphs like æ and œ in our repertoire, but Dutch ij
> (U+0133) as in vijf ‘five’ is white in MSR-2 (not compatible with IDNA
> 2008). Certainly many digraphs, including ij, are visually similar to their
> component letters. We could consider adding all digraphs to the list of
> criteria for exclusion, or adding them with exceptions (less good from a
> usability point of view). Incidentally, ß and & are probably excluded for
> other reasons: the Longevity Principle and Punctuation, respectively.
>
>
>
> What do you think?
>
>
>
> In French (translated): What should we do with the digraphs in our
> repertoire: allow them or not?
>
>
>
> Regards,
>
>
>
> Chris.
>
>
>

