<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Dear Meikal,<br>
<br>
I think it's only a matter of time before combining marks are
required, but we should allow them only in restricted
situations.<br>
<br>
All other code points* may be used in any position with any other
code point(s). Combining marks would only be allowed in certain
positions with certain other code points. If, for example, ^x (x
with a circumflex), which does not exist as a pre-composed code
point, were required somewhere in Africa, the combining mark ^
would only be allowed with x.<br>
<br>
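To illustrate the restricted approach, here is a minimal Python sketch. It assumes a hypothetical whitelist, ALLOWED_BASES, with a single illustrative entry (the combining circumflex U+0302 after x); neither the name nor the entry comes from any agreed repertoire.<br>
<pre>
```python
import unicodedata

# Hypothetical whitelist (an assumption for illustration): each combining
# mark maps to the base code points it may follow. The single entry mirrors
# the x + circumflex example.
ALLOWED_BASES = {
    "\u0302": {"x"},  # COMBINING CIRCUMFLEX ACCENT, allowed after x only
}


def label_is_valid(label):
    """Return True if every combining mark directly follows a permitted base."""
    previous = None
    for ch in label:
        if unicodedata.combining(ch) != 0:
            # A mark must be whitelisted and must follow one of its bases.
            if previous not in ALLOWED_BASES.get(ch, set()):
                return False
        previous = ch
    return True
```
</pre>
With this table, x followed by U+0302 passes, while the same mark after any other letter, or at the start of a label, is rejected.<br>
<br>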
Is that better?<br>
<br>
Regards,<br>
<br>
Chris.<br>
*As far as I know, and except for ß, which may not be used
label-initially.<br>
<br>
On 16/05/2016 14:26, Meikal Mumin wrote:<br>
</div>
<blockquote
cite="mid:CAKF4YGoTCvDsrmRi4dvSKsagzhxDVM-Ap3HHsPOcjUwX3XK_og@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr">Dear Chris,<br>
<br>
could you clarify or exemplify what you mean by "I would
suggest that we take the approach 'combining mark X is required
in the following sequence(s) of code points only', rather than
'combining mark X is included with any other code point'"?<br>
<br>
Thanks,<br>
<br>
Meikal<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-05-16 10:39 GMT+02:00 Dillon,
Chris <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:c.dillon@ucl.ac.uk" target="_blank">c.dillon@ucl.ac.uk</a>&gt;</span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
<div>
<p class="MsoNormal">Dear Meikal &amp; Abdeslam,<br>
<br>
Thank you for your emails. This correspondence is a
good summary of answers to difficult questions, along
these lines:</p>
<ul type="disc">
<li class="MsoNormal">
Variants may consist of more than one code point.</li>
<li class="MsoNormal">
So far we have been able to exclude combining marks,
but it is doubtful that that will continue to be
possible once more work has been done on the use of
the Latin Script in Africa. I would suggest that we
take the approach "combining mark X is required in
the following sequence(s) of code points only",
rather than "combining mark X is included with any
other code point".</li>
<li class="MsoNormal">
As regards ij and most other ligatures, they would
be unallocatable variants, or possibly
out-of-repertoire code points.</li>
<li class="MsoNormal">
I like the suggestion of waiting for the IP's
informal comments before releasing our draft
repertoire. The Second Level Team's work, however,
could require a substantial effort to digest and so
we should probably wait.</li>
</ul>
<p class="MsoNormal">Summary (translated from French): These emails
form a useful synthesis of answers to some complicated
questions:</p>
<p><span style="font-family:Symbol"><span>·<span
style="font:7.0pt 'Times New Roman'">
</span></span></span>Variants may consist of more than one
Unicode letter.</p>
<p><span style="font-family:Symbol"><span>·<span
style="font:7.0pt 'Times New Roman'">
</span></span></span>If combining marks are needed, they could
be used only in limited cases.</p>
<p><span style="font-family:Symbol"><span>·<span
style="font:7.0pt 'Times New Roman'">
</span></span></span>Ij, etc. are perhaps a variant of i + j
that could never exist in a TLD, or perhaps entirely outside our
repertoire.</p>
<p><span style="font-family:Symbol"><span>·<span
style="font:7.0pt 'Times New Roman'">
</span></span></span>We will wait only until we receive the
IP's informal comments before inviting comments on our
repertoire.</p>
<p class="MsoNormal"><br>
Regards,<br>
<br>
Chris.<span class=""><br>
<br>
On 14/05/2016 10:50, Meikal Mumin wrote:</span></p>
<span class="">
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">Dear colleagues, </p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">so that clarifies that
question - thanks Abdeslam.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Coming back to your
questions, Chris: I believe combining marks
could be excluded, as was done in the
Arabic LGR. Meanwhile, cases like ij could be
declared variants of the sequence i + j,
provided we see a need to include the
former.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">If ligatures are not part of
MSR-2, then I assume the problem has solved
itself.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Best,</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Meikal</p>
</div>
</div>
</blockquote>
</span>
<p class="MsoNormal">Dear colleagues, </p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">I would suggest waiting for the
feedback from IP, but not for anything regarding
second levels.</p>
</div>
<div>
<div class="h5">
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Best,</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Meikal</p>
</div>
<p class="MsoNormal"><br>
<br>
</p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">2016-05-11 22:27 GMT+02:00
Abdeslam Nasri &lt;<a moz-do-not-send="true"
href="mailto:abdeslam.nasri@gmail.com"
target="_blank">abdeslam.nasri@gmail.com</a>&gt;:</p>
<div>
<blockquote
style="border:none;border-left:solid #cccccc
1.0pt;padding:0cm 0cm 0cm
6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<p class="MsoNormal">Dear Chris and
Colleagues, </p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Digraphs, or more
generally sequences of code points,
can be specified as variants of a
single code point.</p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">An excerpt from
the LAGER specification:</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">"<span
style="font-size:10.0pt"> A
sequence of multiple code points
can be specified as a variant of a</span></p>
</div>
<pre> single code point. For example, the sequence of LATIN SMALL LETTER O</pre>
<pre> (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be</pre>
<pre> specified as a variant for an LATIN SMALL LETTER O WITH DIAERESIS</pre>
<pre> (U+00F6) as follows:</pre>
<pre> </pre>
<pre> &lt;char cp="00F6"&gt;</pre>
<pre>   &lt;var cp="006F 0065"/&gt;</pre>
<pre> &lt;/char&gt;</pre>
<div>
<p class="MsoNormal">"</p>
</div>
</div>
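<div>
<p class="MsoNormal">As a rough illustration of the excerpt, the Python sketch below expands a label using a variant table keyed by the same cp strings. The names VARIANTS, cp_to_text and variant_labels are hypothetical, and the table holds only the hypothetical ö / oe mapping from the excerpt; it is not an implementation of the specification.</p>
</div>
<pre>
```python
# Variant table keyed by the "cp" attribute strings used in the excerpt:
# a single code point maps to the sequences declared as its variants.
# The one entry is the hypothetical example from the excerpt.
VARIANTS = {"00F6": ["006F 0065"]}


def cp_to_text(cp):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(part, 16)) for part in cp.split())


def variant_labels(label):
    """Return labels made by replacing one occurrence of a code point
    with one of its variant sequences."""
    results = set()
    for source_cp, targets in VARIANTS.items():
        source = cp_to_text(source_cp)
        for target_cp in targets:
            target = cp_to_text(target_cp)
            for index, ch in enumerate(label):
                if ch == source:
                    results.add(label[:index] + target + label[index + 1:])
    return results
```
</pre>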
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">In the typical case
of digraphs, these are called the
precomposed and decomposed forms
of a single letter. A normalization
should exist in Unicode in order to
allow these variants; otherwise they
should be blocked.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Kind Regards,</p>
</div>
<div>
<p class="MsoNormal">Abdeslam NASRI</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
<div>
<div>
<div>
<p class="MsoNormal">2016-05-09
15:43 GMT+02:00 Dillon, Chris
&lt;<a moz-do-not-send="true"
href="mailto:c.dillon@ucl.ac.uk"
target="_blank">c.dillon@ucl.ac.uk</a>&gt;:</p>
</div>
</div>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0cm 0cm 0cm
6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Dear Meikal,</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Thank
you for your thoughts on digraphs.</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">In
that case, we would have blocked variants like i,
dotless i and iota, where an application for a
label containing one would block applications
for labels containing any of the others.</span></p>
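<p class="MsoNormal">A minimal Python sketch of this blocking behaviour follows. The variant set (i, dotless i U+0131, Greek iota U+03B9) is taken from the example above; the names VARIANT_SETS, blocking_key and is_blocked are hypothetical, and real variant sets would come from the agreed repertoire.</p>
<pre>
```python
# Hypothetical variant set based on the example: i, dotless i, Greek iota.
VARIANT_SETS = [{"i", "\u0131", "\u03b9"}]

# Map every member of a set to one representative (the lowest code point).
CANONICAL = {ch: min(group) for group in VARIANT_SETS for ch in group}


def blocking_key(label):
    """Replace each code point by its set representative."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


def is_blocked(candidate, allocated):
    """True if the candidate collides with an allocated label or a variant of one."""
    keys = {blocking_key(label) for label in allocated}
    return blocking_key(candidate) in keys
```
</pre>
<p class="MsoNormal">Once "vijf" is allocated, an application for the same label spelled with a dotless i or an iota maps to the same key and is blocked.</p>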
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">We
would also have blocked variants, such as digraphs like
</span>ij,<span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">
which could never be allocated at all. If we
need to do this, it will be necessary to describe
variants for ligature code points we have not
yet analysed in the Latin ranges, as they
aren’t in MSR-2.</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">(This
distinction is what I was finding difficult
during the face-to-face meeting in Marrakech.)</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Incidentally,
I’m fairly sure two code points could be a
variant of one. (I wonder what happens with
the Arabic ligature of laam and alif that looks
like Greek gamma; in Urdu the two do not
combine so closely, if at all.)</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Regards,</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Chris.</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif"
lang="EN-US">--</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif"
lang="EN-US">Research
Associate in Linguistic
Computing, Centre for
Digital Humanities, UCL,
Gower St, London WC1E
6BT Tel <a
moz-do-not-send="true"
href="tel:%2B44%2020%207679%201599" target="_blank">+44 20 7679 1599</a>
(int 31599)
<a
moz-do-not-send="true"
href="http://www.ucl.ac.uk/dis/people/chrisdillon" target="_blank"><span
style="color:#0563c1">www.ucl.ac.uk/dis/people/chrisdillon</span></a>
</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"
lang="EN-US">From:</span></b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"
lang="EN-US"> Meikal
Mumin [mailto:<a
moz-do-not-send="true"
href="mailto:meikal.mumin@uni-koeln.de" target="_blank">meikal.mumin@uni-koeln.de</a>]
<br>
<b>Sent:</b> 09 May 2016
09:38<br>
<b>To:</b> Dillon, Chris
&lt;<a
moz-do-not-send="true"
href="mailto:c.dillon@ucl.ac.uk" target="_blank">c.dillon@ucl.ac.uk</a>&gt;<br>
<b>Cc:</b> <a
moz-do-not-send="true"
href="mailto:latingp@icann.org" target="_blank">latingp@icann.org</a><br>
<b>Subject:</b> Re:
[Latingp] Digraphs</span></p>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">Dear
Chris and colleagues,</p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Apologies
for the late reply. I
believe we don't need
to exclude digraphs.
We could simply set
them up as variants,
e.g. ij as the equivalent
of i + j. It could be
useful to verify with
the IP whether it is possible
to declare a sequence
of two code points as
a variant of one; we
had not encountered
such a case with
the Arabic script.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Best
wishes,</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Meikal</p>
</div>
</div>
<div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">2016-03-29
9:54 GMT+02:00 Dillon,
Chris &lt;<a
moz-do-not-send="true"
href="mailto:c.dillon@ucl.ac.uk" target="_blank">c.dillon@ucl.ac.uk</a>&gt;:</p>
<blockquote
style="border:none;border-left:solid
#cccccc
1.0pt;padding:0cm 0cm
0cm
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Dear
colleagues,</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Mirjana’s
recent research on Montenegrin has raised some
interesting issues.</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">One of
them is digraphs.</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Currently
we have digraphs like æ and œ in our repertoire,
but Dutch ij (U+0133) as in vijf ‘five’ is
white in MSR-2 (not compatible with IDNA 2008).
Certainly many digraphs, including ij, are visually
similar to their component letters. We could consider
adding all digraphs to the list of criteria for
exclusion, or adding them with exceptions (less good
from a usability point of view). Incidentally,
ß and &amp; are probably excluded for other reasons:
the Longevity Principle and Punctuation,
respectively.</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">What do
you think?</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Summary
(translated from French): What should we do with the
digraphs in our repertoire: allow them or not?</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Regards,</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">Chris.</span></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><span
style="font-family:&quot;Century Gothic ,sans-serif&quot;,serif">…</span></p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>