[Latingp] Digraphs

Meikal Mumin meikal.mumin at uni-koeln.de
Mon May 16 13:26:02 UTC 2016


Dear Chris,

Could you clarify or exemplify what you mean by "I would suggest that we
take the approach 'combining mark X is required in the following
sequence(s) of code points only', rather than 'combining mark X is included
with any other code point'"?

Thanks,

Meikal
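
One possible reading of the two approaches quoted above, as a minimal Python sketch. Everything in it is an illustrative assumption rather than a panel decision: the tiny repertoire, the choice of U+0301 COMBINING ACUTE ACCENT after ẹ (U+1EB9) and ọ (U+1ECD), and the function names.

```python
# Hypothetical mini-repertoire, for illustration only.
COMBINING_ACUTE = "\u0301"
BASE_LETTERS = set("abcdefghijklmnopqrstuvwxyz") | {"\u1eb9", "\u1ecd"}

# Approach 1: the mark appears only inside explicitly listed sequences.
ALLOWED_SEQUENCES = {"\u1eb9\u0301", "\u1ecd\u0301"}


def valid_sequences_only(label: str) -> bool:
    """Accept the combining mark only as part of an enumerated sequence."""
    i = 0
    while i < len(label):
        # Prefer a listed sequence starting at position i, if there is one.
        seq = next((s for s in ALLOWED_SEQUENCES if label.startswith(s, i)), None)
        if seq is not None:
            i += len(seq)
        elif label[i] in BASE_LETTERS:
            i += 1
        else:
            return False
    return True


def valid_free_mark(label: str) -> bool:
    """Approach 2: the mark is a repertoire member that may follow any letter."""
    if not label or label[0] == COMBINING_ACUTE:
        return False
    allowed = BASE_LETTERS | {COMBINING_ACUTE}
    return all(ch in allowed for ch in label)


print(valid_sequences_only("\u1eb9\u0301bc"))  # True: listed sequence
print(valid_sequences_only("x\u0301"))         # False: x + acute is not listed
print(valid_free_mark("x\u0301"))              # True: mark accepted after any letter
```

Under the first approach, unintended combinations such as x + acute never enter the repertoire; under the second they do, unless further rules exclude them.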

2016-05-16 10:39 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:

> Dear Meikal & Abdeslam,
>
> Thank you for your emails. This correspondence is a good summary of
> answers to difficult questions, along these lines:
>
>    - Variants may consist of more than one code point.
>    - So far we have been able to exclude combining marks, but it is
>    doubtful that that will continue to be possible once more work has been
>    done on the use of the Latin Script in Africa. I would suggest that we take
>    the approach "combining mark X is required in the following sequence(s) of
>    code points only", rather than "combining mark X is included with any other
>    code point".
>    - As regards ij and most other ligatures, they would be unallocatable
>    variants, or possibly out-of-repertoire code points.
>    - I like the suggestion of waiting for the IP's informal comments
>    before releasing our draft repertoire. The Second Level Team's work,
>    however, could require a substantial effort to digest and so we should
>    probably wait.
>
> French summary (in translation): These emails form a useful synthesis of
> answers to some complicated questions:
>
>    - Variants may consist of more than one Unicode letter.
>    - If we need marks for combining Unicode letters, we could use them
>    only in limited cases.
>    - ij, etc. are perhaps a variant of i + j that could never exist in a
>    TLD, or perhaps entirely outside our repertoire.
>    - We will wait only until we receive the IP's informal comments before
>    inviting comments on our repertoire.
>
>
> Regards,
>
> Chris.
>
> On 14/05/2016 10:50, Meikal Mumin wrote:
>
> Dear colleagues,
>
>
>
> so that clarifies that question - thanks Abdeslam.
>
>
>
> Coming back to your questions, Chris: I believe combining marks could be
> excluded, as was done in the case of the Arabic LGR. Meanwhile, cases like
> ij could be declared variants of the sequence i + j, provided we see a need
> for including the former.
>
>
>
> If ligatures are not part of MSR-2, then I assume the problem has solved
> itself.
>
>
>
> Best,
>
>
>
> Meikal
>
> Dear colleagues,
>
>
>
> I would suggest waiting for the feedback from IP, but not for anything
> regarding second levels.
>
>
>
> Best,
>
>
>
> Meikal
>
>
>
> 2016-05-11 22:27 GMT+02:00 Abdeslam Nasri <abdeslam.nasri at gmail.com>:
>
> Dear Chris and Colleagues,
>
>
>
>
>
> Digraphs or more generally sequences of code points, can be specified as
> variants of a single code point.
>
>
>
> An excerpt from the LAGER specification:
>
> "A sequence of multiple code points can be specified as a variant of a
>    single code point.  For example, the sequence of LATIN SMALL LETTER O
>    (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be
>    specified as a variant for an LATIN SMALL LETTER O WITH DIAERESIS
>    (U+00F6) as follows:
>
>        <char cp="00F6">
>            <var cp="006F 0065"/>
>        </char>
> "
>
>
>
> In the typical case of digraphs, these are called the precomposed and
> decomposed forms of a single letter. Unicode should define a normalization
> for such pairs in order to allow these variants, or otherwise to block them.
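
A quick check with Python's standard `unicodedata` module may help separate the cases discussed here. It is only a sketch, and the variable names are illustrative. It suggests that ö (U+00F6) is the precomposed form of o + U+0308 only, that the sequence o + e is left untouched by normalization, and that ij (U+0133) is changed only by compatibility normalization (NFKC).

```python
import unicodedata

O_DIAERESIS = "\u00f6"   # ö as a single precomposed code point
O_PLUS_MARK = "o\u0308"  # o followed by COMBINING DIAERESIS
O_PLUS_E = "oe"          # the two-letter sequence from the excerpt
IJ_LIGATURE = "\u0133"   # LATIN SMALL LIGATURE IJ

# Canonical composition (NFC) joins o + U+0308 into U+00F6 ...
nfc_of_decomposed = unicodedata.normalize("NFC", O_PLUS_MARK)
# ... but leaves the sequence o + e and the ij ligature as they are.
nfc_of_digraph = unicodedata.normalize("NFC", O_PLUS_E)
nfc_of_ij = unicodedata.normalize("NFC", IJ_LIGATURE)
# Compatibility normalization (NFKC) replaces the ligature with i + j.
nfkc_of_ij = unicodedata.normalize("NFKC", IJ_LIGATURE)

print(nfc_of_decomposed == O_DIAERESIS)  # True
print(nfc_of_digraph == O_PLUS_E)        # True
print(nfc_of_ij == IJ_LIGATURE)          # True
print(nfkc_of_ij == "ij")                # True
```

On this reading, a relationship such as o + e with ö is not something normalization provides, so it would have to be declared explicitly as a variant, as in the excerpt above.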
>
>
>
>
>
> Kind Regards,
>
> Abdeslam NASRI
>
>
>
>
>
>
>
> 2016-05-09 15:43 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:
>
> Dear Meikal,
>
>
>
> Thank you for your thoughts on digraphs.
>
>
>
> In that case, we would have blocked variants like i, dotless i and iota,
> where an application for a label containing one would block applications
> for labels containing any of the others.
>
>
>
> We would also have a second kind of blocked variant, digraphs like ij,
> which could never be allocated at all. If we need to do this, it will be
> necessary to describe variants for ligature code points we have not yet
> analysed in the Latin ranges, as they aren’t in MSR-2.
>
>
>
> (This distinction is what I was finding difficult during the face-to-face
> meeting in Marrakech.)
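
The two kinds of blocked variant described above could be sketched roughly as follows. The variant table, the `UNALLOCATABLE` set and the function names are hypothetical, chosen only to illustrate the distinction; they are not taken from any LGR.

```python
# Hypothetical variant table: each code point maps to an index form.
VARIANT_INDEX = {
    "i": "i",
    "\u0131": "i",   # LATIN SMALL LETTER DOTLESS I
    "\u03b9": "i",   # GREEK SMALL LETTER IOTA
    "\u0133": "ij",  # LATIN SMALL LIGATURE IJ, variant of the sequence i + j
}

# Hypothetical set of code points that may never appear in an allocated label.
UNALLOCATABLE = {"\u0133"}


def index_label(label: str) -> str:
    """Replace every code point by its index form so variant labels collide."""
    return "".join(VARIANT_INDEX.get(ch, ch) for ch in label)


def can_allocate(label: str, registered: list[str]) -> bool:
    """Return True if the label may be allocated given existing registrations."""
    # Second kind: labels containing an unallocatable code point never succeed.
    if any(ch in UNALLOCATABLE for ch in label):
        return False
    # First kind: a registered label blocks all of its variant labels.
    taken = {index_label(r) for r in registered}
    return index_label(label) not in taken


registered = ["vijf"]
print(can_allocate("v\u0131jf", registered))  # False: blocked by "vijf"
print(can_allocate("v\u0133f", []))           # False: can never be allocated
print(can_allocate("vier", registered))       # True: no variant conflict
```

In the first case the outcome depends on which label is applied for first; in the second the label is refused regardless of what has been registered.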
>
>
>
> Incidentally, I’m fairly sure two code points could be a variant of one.
> (I wonder what happens with the Arabic ligature of laam and alif that looks
> like Greek gamma; in Urdu the two do not combine so closely, if at all.)
>
>
>
> Regards,
>
>
>
> Chris.
>
> --
>
> Research Associate in Linguistic Computing, Centre for Digital Humanities,
> UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599)
> www.ucl.ac.uk/dis/people/chrisdillon
>
>
>
> *From:* Meikal Mumin [mailto:meikal.mumin at uni-koeln.de]
> *Sent:* 09 May 2016 09:38
> *To:* Dillon, Chris <c.dillon at ucl.ac.uk>
> *Cc:* latingp at icann.org
> *Subject:* Re: [Latingp] Digraphs
>
>
>
> Dear Chris and colleagues,
>
>
>
> Apologies for the late reply. I believe we don't need to exclude digraphs.
> We could simply set them up as variants, e.g. ij as equivalent to i + j. It
> could be useful to verify with the IP whether it is possible to declare a
> sequence of two code points as a variant of one; we had not encountered
> such a case with the Arabic script.
>
>
>
> Best wishes,
>
>
>
> Meikal
>
>
>
> 2016-03-29 9:54 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:
>
> Dear colleagues,
>
>
>
> Mirjana’s recent research on Montenegrin has raised some interesting
> issues.
>
>
>
> One of them is digraphs.
>
> Currently we have digraphs like æ and œ in our repertoire, but Dutch ij
> (U+0133) as in vijf ‘five’ is white in MSR-2 (not compatible with IDNA
> 2008). Certainly many digraphs, including ij, are visually similar to their
> component letters. We could consider adding all digraphs to the list of
> criteria for exclusion, or adding them with exceptions (less good from a
> usability point of view). Incidentally, ß and & are probably excluded for
> other reasons, Longevity Principle and Punctuation, respectively.
>
>
>
> What do you think?
>
>
>
> French summary (in translation): What should we do with the digraphs in
> our repertoire: allow them or not?
>
>
>
> Regards,
>
>
>
> Chris.
>
>>
>
>