[Latingp] How should combining diacritic marks be handled?

Mats Dufberg mats.dufberg at iis.se
Tue Jan 17 09:27:46 UTC 2017


U+00B7 MIDDLE DOT is a special case: the IDNA standard (RFC 5892) permits it only in the context between two 'l' characters, i.e. "U+006C U+00B7 U+006C". (https://tools.ietf.org/html/rfc5892#page-16)
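
As an illustration only, that context rule could be checked along the following lines. This is a minimal sketch, not taken from any registry implementation; the function name middle_dot_context_ok is made up for this example, and it checks only the middle dot rule, not the other IDNA requirements for a label.

```python
MIDDLE_DOT = "\u00b7"  # U+00B7 MIDDLE DOT


def middle_dot_context_ok(label: str) -> bool:
    """Return True if every U+00B7 in label sits directly between two U+006C.

    Hypothetical helper for illustration; it covers only this one rule.
    """
    for i, ch in enumerate(label):
        if ch != MIDDLE_DOT:
            continue
        # Characters immediately before and after the middle dot
        # (empty string when the dot is at the start or end of the label).
        before = label[i - 1] if i > 0 else ""
        after = label[i + 1] if i + 1 < len(label) else ""
        if before != "l" or after != "l":
            return False
    return True
```

With this sketch, a label such as "col·lega" passes, while a middle dot at the start or end of a label, or next to any letter other than 'l', is rejected.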


Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899

-----Original Message-----
From: <latingp-bounces at icann.org> on behalf of Michael Bauland <Michael.Bauland at knipp.de>
Date: Tuesday 17 January 2017 at 09:55
To: "latingp at icann.org" <latingp at icann.org>
Subject: Re: [Latingp] How should combining diacritic marks be handled?

Hi Mats, hi all,

On 16.01.2017 17:21, Mats Dufberg wrote:
> MSR2 contains a number of combining diacritic marks, e.g. U+0323
> COMBINING DOT BELOW. It might be that we find that some of the languages
> that should be supported require that code point in combination with,
> say, "n", i.e. "U+006E U+0323". Let us assume that there is no
> pre-composed equivalent code point.
> We can then justify the inclusion of U+0323. Will the Integration
> Panel then accept that code point in any context, or just in that
> specific context?

I assume that we will need to define the context in which those combining
marks are allowed. At least we did this for the middle dot of the "ela
geminada" in the Catalan language tables (see, e.g.,
http://www.iana.org/domains/idn-tables/tables/sap_ca_1.0.txt). But I
guess Sarmad will know for certain.
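
To make the idea of a per-mark context concrete, here is a rough sketch of how such a restriction could be expressed. Everything in it is an assumption for illustration: the ALLOWED_BASES table and the function name are hypothetical, the single entry is just the "U+006E U+0323" example from this thread, and "combining mark" is approximated as any character with a non-zero canonical combining class.

```python
import unicodedata

# Hypothetical rule table: combining mark -> base characters it may follow.
# The only entry is the example from this thread (U+0323 after "n").
ALLOWED_BASES = {
    "\u0323": {"n"},  # U+0323 COMBINING DOT BELOW
}


def combining_marks_ok(label: str) -> bool:
    """Return True if every combining mark directly follows an allowed base."""
    previous = ""  # empty string means "start of label"
    for ch in label:
        # Approximation: treat a non-zero canonical combining class
        # as the test for "is a combining mark".
        if unicodedata.combining(ch) != 0:
            if previous not in ALLOWED_BASES.get(ch, set()):
                return False
        previous = ch
    return True
```

Under these assumptions "U+006E U+0323" is accepted, while U+0323 after any other letter, at the start of a label, or any combining mark missing from the table is rejected.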

> If the IP requires that we justify combining diacritic marks for every
> context it will be allowed for, then we have to go language by language
> to find all combinations to support.
> If the IP accepts the inclusion of a combining diacritic mark in any
> context as long as it is justified for one language, then we can go code
> point by code point as long as we can find justification for all Latin
> code points in MSR2 and we assume no more code points are needed.
> If the purpose of our work is to create a Latin IDN table that supports
> all listed languages (EGIDS value 4 or 5 as decided) then I cannot see
> how we can achieve that without inspecting all those languages.

Going by language instead of going by character also has the advantage
that we will be able to distribute the languages to members of the
group. Then everybody can work with a certain subset of all languages.
If we distributed the characters, everybody would have to get acquainted
with every single language.



     |       |
     | knipp |            Knipp  Medien und Kommunikation GmbH
      -------                    Technologiepark
                                 Martin-Schmeisser-Weg 9
                                 44227 Dortmund

     Dipl.-Informatiker          Fon:    +49 231 9703-284
                                 Fax:    +49 231 9703-200
     Dr. Michael Bauland         SIP:    Michael.Bauland at knipp.de
     Software Development        E-mail: Michael.Bauland at knipp.de

                                 Register Court:
                                 Amtsgericht Dortmund, HRB 13728

                                 Chief Executive Officers:
                                 Dietmar Knipp, Elmar Knipp