[Latingp] Digraphs
Chris Dillon
ccaacdi at ucl.ac.uk
Mon May 16 13:40:35 UTC 2016
Dear Meikal,
I think it's only a matter of time before combining marks are required,
but I think we should only allow them in restricted situations.
All other code points* may be used in any position with any other code
point(s). Combining marks would only be allowed in certain positions
with certain other code points. If, for example, ^x (x with a
circumflex), which does not exist as a pre-composed code point, were
required somewhere in Africa, the combining mark ^ would only be allowed
with x.
Is that better?
Regards,
Chris.
*as far as I know and except ß which may not be used label-initially
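
A minimal sketch of the restricted approach described above, assuming a hypothetical allow-list (ALLOWED_SEQUENCES) and a hypothetical helper name (is_label_allowed); neither comes from the LGR work itself. It accepts a combining mark only when it immediately follows one of the base code points listed for that mark:

```python
import unicodedata

# Hypothetical allow-list: each combining mark maps to the only base
# code points it may follow. The single entry mirrors the x + circumflex
# example from the message; it is illustrative, not a proposed repertoire.
ALLOWED_SEQUENCES = {
    "\u0302": {"x"},  # COMBINING CIRCUMFLEX ACCENT, allowed only after x
}


def is_label_allowed(label: str) -> bool:
    """Return True if every combining mark sits in an allowed sequence."""
    previous = None
    for ch in label:
        # General categories Mn, Mc and Me all start with "M".
        if unicodedata.category(ch).startswith("M"):
            if previous is None or previous not in ALLOWED_SEQUENCES.get(ch, set()):
                return False
        previous = ch
    return True


print(is_label_allowed("x\u0302a"))  # x + circumflex, then a
print(is_label_allowed("a\u0302"))   # circumflex after a base not on the list
```

Ordinary code points pass through unchecked, which matches the idea that only the combining marks are position-restricted.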
On 16/05/2016 14:26, Meikal Mumin wrote:
> Dear Chris,
>
> could you clarify or exemplify what you mean by "I would suggest that
> we take the approach "combining mark X is required in the following
> sequence(s) of code points only", rather than "combining mark X is
> included with any other code point"."?
>
> Thanks,
>
> Meikal
>
> 2016-05-16 10:39 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:
>
> Dear Meikal & Abdeslam,
>
> Thank you for your emails. This correspondence is a good summary
> of answers to difficult questions, along these lines:
>
> * Variants may consist of more than one code point.
> * So far we have been able to exclude combining marks, but it is
> doubtful that that will continue to be possible once more work
> has been done on the use of the Latin Script in Africa. I
> would suggest that we take the approach "combining mark X is
> required in the following sequence(s) of code points only",
> rather than "combining mark X is included with any other code
> point".
> * As regards ij and most other ligatures, they would be
> unallocatable variants, or possibly out-of-repertoire code points.
> * I like the suggestion of waiting for the IP's informal
> comments before releasing our draft repertoire. The Second
> Level Team's work, however, could require a substantial effort
> to digest and so we should probably wait.
>
> French summary (translated): These emails form a useful synthesis
> of answers to some complicated questions:
>
> * Variants may consist of more than one Unicode letter.
> * If we need combining marks for Unicode letters, we could use
> them only in limited cases.
> * Ij, etc. are perhaps a variant of i + j that could never exist
> in a TLD, or perhaps entirely outside our repertoire.
> * We will wait only until we receive the IP's informal comments
> before inviting comments on our repertoire.
>
>
> Regards,
>
> Chris.
>
> On 14/05/2016 10:50, Meikal Mumin wrote:
>
> Dear colleagues,
>
> so that clarifies that question - thanks Abdeslam.
>
> Coming back to your questions Chris - I believe combining
> marks could be excluded, as was done in the case of Arabic
> LGR. Meanwhile, cases like ij could be declared variants of the
> sequence i + j, provided we see a need for including the
> former.
>
> If ligatures are not part of MSR-2, then I assume the problem
> has solved itself.
>
> Best,
>
> Meikal
>
> Dear colleagues,
>
> I would suggest waiting for the feedback from IP, but not for
> anything regarding second levels.
>
> Best,
>
> Meikal
>
>
>
> 2016-05-11 22:27 GMT+02:00 Abdeslam Nasri <abdeslam.nasri at gmail.com>:
>
> Dear Chris and Colleagues,
>
> Digraphs or more generally sequences of code points, can
> be specified as variants of a single code point.
>
> An excerpt from the LAGER specification:
>
> "A sequence of multiple code points can be specified as a
> variant of a single code point. For example, the sequence of
> LATIN SMALL LETTER O (U+006F) then LATIN SMALL LETTER E (U+0065)
> might hypothetically be specified as a variant for an LATIN
> SMALL LETTER O WITH DIAERESIS (U+00F6) as follows:
>
> <char cp="00F6">
>   <var cp="006F 0065"/>
> </char>
> "
>
> In the typical case of digraphs, these are called the
> precomposed and decomposed forms of a single letter.
> A normalization should exist in Unicode for these variants
> to be allowed; otherwise they should be blocked.
>
> Kind Regards,
>
> Abdeslam NASRI
>
> 2016-05-09 15:43 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:
>
> Dear Meikal,
>
> Thank you for your thoughts on digraphs.
>
> In that case, we would have blocked variants like i,
> dotless i and iota, where application for a label
> containing one, would block applications for labels
> containing any of the others.
>
> We would also have blocked variants, such as digraphs like
> ij, which could never be allocated at all. If we need to
> do this, it will be necessary to describe variants for
> ligature code points we have not yet analysed in the
> Latin ranges, as they aren’t in MSR2.
>
> (This distinction is what I was finding difficult
> during the face-to-face meeting in Marrakech.)
>
> Incidentally, I’m fairly sure two code points could be
> a variant of one. (I wonder what happens with the
> Arabic ligature of laam and alif that looks like Greek
> gamma; in Urdu the two do not combine so closely, if
> at all.)
>
> Regards,
>
> Chris.
>
> --
>
> Research Associate in Linguistic Computing, Centre for
> Digital Humanities, UCL, Gower St, London WC1E 6BT Tel
> +44 20 7679 1599 (int 31599)
> www.ucl.ac.uk/dis/people/chrisdillon
>
> From: Meikal Mumin [mailto:meikal.mumin at uni-koeln.de]
> Sent: 09 May 2016 09:38
> To: Dillon, Chris <c.dillon at ucl.ac.uk>
> Cc: latingp at icann.org
> Subject: Re: [Latingp] Digraphs
>
> Dear Chris and colleagues,
>
> apologies for the late reply. I believe we don't need
> to exclude digraphs. We could simply set them up as
> variants, e.g. ij as equivalent of i + j. It could be
> useful to verify with IP, if it is possible to declare
> a sequence of two code-points as a variant of one - we
> had not encountered such a case with Arabic script.
>
> Best wishes,
>
> Meikal
>
> 2016-03-29 9:54 GMT+02:00 Dillon, Chris <c.dillon at ucl.ac.uk>:
>
> Dear colleagues,
>
> Mirjana’s recent research on Montenegrin has
> raised some interesting issues.
>
> One of them is digraphs.
>
> Currently we have digraphs like æ and œ in our
> repertoire, but Dutch ij (U+0133) as in vijf ‘five’
> is white in MSR-2 (not compatible with IDNA 2008).
> Certainly many digraphs, including ij, are visually
> similar to their component letters. We could
> consider adding all digraphs to the list of
> criteria for exclusion, or adding them with
> exceptions (less good from a usability point of
> view). Incidentally, ß and & are probably excluded
> for other reasons, Longevity Principle and
> Punctuation, respectively.
>
> What do you think?
>
> French (translated): What should we do with digraphs in our
> repertoire: allow them or not?
>
> Regards,
>
> Chris.
>
> …
>
>
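
The blocked-variant behaviour discussed in this thread (i, dotless i and iota blocking one another; a two-code-point sequence as a variant of one code point) can be sketched as follows. The table VARIANT_MAP and the helpers variant_key and is_blocked are hypothetical names with illustrative values; they are an assumption for demonstration, not the panel's variant sets or any LGR tool's API.

```python
from typing import Iterable

# Hypothetical variant table: each code point maps to the sequence chosen
# to represent its variant set. Values are illustrative only.
VARIANT_MAP = {
    "\u0131": "i",   # LATIN SMALL LETTER DOTLESS I
    "\u03b9": "i",   # GREEK SMALL LETTER IOTA
    "\u00f6": "oe",  # o with diaeresis, with the sequence o + e as variant
}


def variant_key(label: str) -> str:
    """Map every code point to its representative sequence."""
    return "".join(VARIANT_MAP.get(ch, ch) for ch in label)


def is_blocked(candidate: str, allocated: Iterable[str]) -> bool:
    """A candidate is blocked if an allocated label shares its variant key."""
    key = variant_key(candidate)
    return any(variant_key(existing) == key for existing in allocated)


print(is_blocked("v\u0131jf", ["vijf"]))     # dotless i against allocated i
print(is_blocked("moebel", ["m\u00f6bel"]))  # o + e against allocated o-diaeresis
```

Comparing keys rather than raw labels is one simple way to make an application for any member of a variant set block the others; an unallocatable variant such as the ij ligature would additionally need a rule rejecting it outright.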