[Latingp] How should combining diacritic marks be handled?

Sarmad Hussain sarmad.hussain at icann.org
Wed Jan 18 07:31:07 UTC 2017


Hi Michael, All,

 

>> I assume that we will need to define the context those combining marks
>> are allowed. ... But I guess Sarmad will know for certain.

The IP discusses this in Section 4.5, "Non-Spacing Combining Marks", of the
Overview and Rationale
<https://www.icann.org/en/system/files/files/msr-2-overview-14apr15-en.pdf>
document released as part of MSR-2
<https://www.icann.org/resources/pages/msr-2015-06-21-en>.  I encourage you
all to review it.  That section notes:

 

The actual set of combining marks allowable in the LGR will be smaller than
the set included in the MSR, because it will be limited to those marks that
are actually required for at least one combining sequence not expressible in
NFC. In addition, where the number of such attested sequences is known and
limited, GPs are encouraged to enumerate the sequences where feasible,
rather than adding the "bare" combining mark to the repertoire. This would
serve to prevent such marks from combining with every other allowed code
point in the GP's repertoire.

 

This suggests that where a pre-composed form is not encoded, it is
preferred to include the combining mark only together with the desired code
point(s), as a sequence <https://tools.ietf.org/html/rfc7940#section-5.1>,
to prevent over-generation.
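
The difference between enumerating a sequence and admitting a "bare"
combining mark can be sketched in a few lines of Python. This is only an
illustration, not the LGR tooling: the repertoire, the chosen sequence
(n + U+0308) and the function names are assumptions made up for the example.
Note that in the Unicode data shipped with Python, the n + U+0323 example
from this thread does compose to a single code point under NFC (U+1E47), so
a different sequence stands in for "not expressible in NFC" here.

```python
import unicodedata


def has_precomposed_form(sequence: str) -> bool:
    """Return True if NFC leaves no combining mark in the sequence."""
    composed = unicodedata.normalize("NFC", sequence)
    return not any(unicodedata.combining(ch) for ch in composed)


# Hypothetical repertoire: plain a-z plus ONE enumerated sequence.
# The combining mark U+0308 itself is deliberately not a member.
BASE_CODE_POINTS = set("abcdefghijklmnopqrstuvwxyz")
ALLOWED_SEQUENCES = {"n\u0308"}  # illustrative choice, not from any proposal


def is_valid_label(label: str) -> bool:
    """Check a label against the base code points and enumerated sequences."""
    label = unicodedata.normalize("NFC", label)
    i = 0
    while i < len(label):
        # Try the enumerated sequences before single code points.
        match = next(
            (s for s in ALLOWED_SEQUENCES if label.startswith(s, i)), None
        )
        if match is not None:
            i += len(match)
        elif label[i] in BASE_CODE_POINTS:
            i += 1
        else:
            # Covers a bare combining mark after any other base letter.
            return False
    return True
```

With this repertoire, is_valid_label("man\u0308a") is accepted because the
mark appears only in the enumerated sequence, while
is_valid_label("m\u0308a") is rejected. Had U+0308 been added as a bare
code point instead, it could follow every letter in the repertoire, which
is the over-generation the MSR text warns about.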

 

Regards,
Sarmad

 

-----Original Message-----
From: latingp-bounces at icann.org On Behalf Of Michael Bauland
Sent: Tuesday, January 17, 2017 1:55 PM
To: latingp at icann.org
Subject: Re: [Latingp] How should combining diacritic marks be handled?

 

Hi Mats, hi all,

On 16.01.2017 17:21, Mats Dufberg wrote:

> MSR2 contains a number of combining diacritic marks, e.g. U+0323
> COMBINING DOT BELOW. It might be that we find that some of the
> languages that should be supported require that code point in
> combination with, say, "n", i.e. "U+006E U+0323". Let us assume that
> there is no pre-composed equivalent code point.
>
> We can then justify the inclusion of U+0323. Will the Integration
> Panel then accept that code point in any context, or just in the
> specific context?

I assume that we will need to define the contexts in which those combining
marks are allowed. At least we did this for the middle dot of the "ela
geminada" in the Catalan language tables (see, e.g.,
<http://www.iana.org/domains/idn-tables/tables/sap_ca_1.0.txt>). But I
guess Sarmad will know for certain.

> If the IP requires that we justify combining diacritic marks for every
> context in which they will be allowed, then we have to go language by
> language to find all combinations to support.
>
> If the IP accepts the inclusion of a combining diacritic mark in any
> context as long as it is justified for one language, then we can go
> code point by code point as long as we can find justification for all
> Latin code points in MSR2 and we assume no more code points are needed.
>
> If the purpose of our work is to create a Latin IDN table that
> supports all listed languages (EGIDS value 4 or 5 as decided) then I
> cannot see how we can achieve that without inspecting all those languages.

Going by language instead of by character also has the advantage that we
can distribute the languages among the members of the group, so that
everybody works on a subset of the languages. If we distributed the
characters instead, everybody would have to get acquainted with every
single language.

Cheers,
Michael
