[Latingp] Repertoire and Latin Extended A

Mats Dufberg mats.dufberg at iis.se
Mon Jul 23 14:44:40 UTC 2018


The .SE IDN table is not a language table; it is a country table. It supports the Swedish language and the official minority languages of Sweden (including Yiddish). Many ccTLDs have country tables, not language tables.

I am not sure I understand your question about whether we (Latin GP) have used the ccTLD repertoires. If that is the question, the answer is no: we have used the language descriptions at, e.g., Omniglot and Wikipedia.

Perhaps you are asking why we have looked at the LGRs. My position is that we looked at them to find references to “missing” code points, and then to check whether any of those code points should be included.

The case of U+00FF was already known. French appears to provide support for including it.

As for U+0157, it is included in the Latvian LGR, but one language source (Omniglot) does not list it at all, and the other (Wikipedia) lists it as historic or of limited use. We should ask whether there is enough support to include it. I do not think that is clear.


Mats

---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/


From: "Tan Tanaka, Dennis" <dtantanaka at verisign.com>
Date: Monday, 23 July 2018 at 15:59
To: Mats Dufberg <mats.dufberg at iis.se>, "meikal.mumin at uni-koeln.de" <meikal.mumin at uni-koeln.de>, Bill Jouris <bill.jouris at insidethestack.com>, "Mirjana.Tasic at rnids.rs" <Mirjana.Tasic at rnids.rs>
Cc: ICANN Latin GP <latingp at icann.org>
Subject: Re: [Latingp] Repertoire and Latin Extended A

Hi Mats, have we also used the corresponding ccTLDs’ repertoires? For example, U+014F and U+00FF are included in DENIC’s table. And for Swedish, dotSE’s Latin table is larger than ICANN’s reference LGR.

-Dennis

From: Latingp <latingp-bounces at icann.org> on behalf of Mats Dufberg <mats.dufberg at iis.se>
Date: Monday, July 23, 2018 at 9:52 AM
To: Meikal Mumin <meikal.mumin at uni-koeln.de>, Bill Jouris <bill.jouris at insidethestack.com>, Mirjana Tasić <Mirjana.Tasic at rnids.rs>
Cc: Latin GP <latingp at icann.org>
Subject: [EXTERNAL] Re: [Latingp] Repertoire and Latin Extended A

Mirjana,

Since I promised to review the second-level LGRs to see whether any of those languages include characters that we have not included, I have done that, and these are my findings (Bill and Meikal have looked at some of them as well):

U+00FF is included for French. Omniglot does not list it, but Wikipedia does.

U+0157 is included for Latvian. Omniglot does not list it. Wikipedia does, but notes that today it is used only in the diaspora (as Meikal pointed out).

None of the other Latin Extended-A code points missing from our repertoire were found in the LGRs.

Code points I looked at:

00FF LATIN SMALL LETTER Y WITH DIAERESIS (French)
0109 LATIN SMALL LETTER C WITH CIRCUMFLEX
0125 LATIN SMALL LETTER H WITH CIRCUMFLEX
0135 LATIN SMALL LETTER J WITH CIRCUMFLEX
014F LATIN SMALL LETTER O WITH BREVE
0157 LATIN SMALL LETTER R WITH CEDILLA (Latvian)
0163 LATIN SMALL LETTER T WITH CEDILLA
0177 LATIN SMALL LETTER Y WITH CIRCUMFLEX
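For anyone re-checking the list above, the character names can be confirmed against the Unicode Character Database. A minimal sketch using Python's standard `unicodedata` module; the `REVIEWED` dict and the helper name `describe` are illustrative only, and the language notes are copied from the list.

```python
import unicodedata

# Code points from the list above, with the language notes given there.
# Names of the dict and helper are hypothetical, chosen for illustration.
REVIEWED = {
    0x00FF: "French",
    0x0109: None,
    0x0125: None,
    0x0135: None,
    0x014F: None,
    0x0157: "Latvian",
    0x0163: None,
    0x0177: None,
}


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX NAME' using Python's bundled Unicode data."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


for cp, lang in REVIEWED.items():
    note = f" ({lang})" if lang else ""
    print(describe(cp) + note)
```

The same helper also shows that the capital Ÿ is a different code point: `describe(0x0178)` gives LATIN CAPITAL LETTER Y WITH DIAERESIS, whereas the lowercase ÿ discussed in this thread is U+00FF.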

The LGRs can be found at https://www.icann.org/resources/pages/second-level-lgr-2015-06-21-en

*

A comment to Bill and Meikal:

The LGRs for German and Spanish list U+014F (ŏ) LATIN SMALL LETTER O WITH BREVE as *excluded*, not as included. You have to scroll down to “repertoire by code point” to see what is included. They also have a concept of an “extended code point”, which is in between included and excluded, e.g. U+00E0 for Spanish.

ÿ (U+00FF) is included in neither English nor German.


Mats



From: Latingp <latingp-bounces at icann.org> on behalf of Meikal Mumin <meikal.mumin at uni-koeln.de>
Date: Monday, 23 July 2018 at 00:11
To: Bill Jouris <bill.jouris at insidethestack.com>
Cc: ICANN Latin GP <latingp at icann.org>
Subject: Re: [Latingp] Repertoire and Latin Extended A

Dear colleagues,

On 22 July 2018 at 21:49, Bill Jouris <bill.jouris at insidethestack.com> wrote:
Hi Mirjana,


I've reviewed the repertoire we have (after adding Esperanto) and compared it to the Unicode table's Basic Latin, Latin-1 Supplement, and Latin Extended-A codepoints.

The following entry from Latin-1 Supplement is included in MSR-2 but not in our repertoire:
00FF    ÿ     Latin Small Letter Y with Diaeresis

This occurs rarely: in German in personal names (https://de.wikipedia.org/wiki/%C5%B8#Franz%C3%B6sisch), and in French in place names, among other uses (https://fr.wikipedia.org/wiki/%C5%B8#Fran%C3%A7ais).

The following entries from Latin Extended-A are included in MSR-3 but not included in our repertoire:

014F     ŏ   Latin Small Letter O with Breve
0157     ŗ    Latin Small Letter R with Cedilla
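The comparison described above amounts to a set difference restricted to a Unicode block range. A minimal Python sketch, assuming each repertoire is available as a set of integer code points; the function name `missing_in_block` and the two sample sets are hypothetical toy data, not the actual MSR or Latin GP repertoire.

```python
# Unicode block ranges (end-exclusive Python ranges):
# Basic Latin U+0000-U+007F, Latin-1 Supplement U+0080-U+00FF,
# Latin Extended-A U+0100-U+017F.
BLOCKS = {
    "Basic Latin": range(0x0000, 0x0080),
    "Latin-1 Supplement": range(0x0080, 0x0100),
    "Latin Extended-A": range(0x0100, 0x0180),
}


def missing_in_block(msr: set, repertoire: set, block: str) -> list:
    """Return, sorted, the code points in `msr` that fall in `block`
    but are absent from `repertoire`."""
    rng = BLOCKS[block]
    return sorted(cp for cp in msr - repertoire if cp in rng)


# Hypothetical toy data, for illustration only.
msr_sample = {0x00E9, 0x00FF, 0x0101, 0x014F, 0x0157}
repertoire_sample = {0x00E9, 0x0101}

for name in BLOCKS:
    missing = missing_in_block(msr_sample, repertoire_sample, name)
    print(name, [f"U+{cp:04X}" for cp in missing])
```

With real data, the two sets would be loaded from the MSR and from the current repertoire; restricting by block keeps the output aligned with the per-block lists above.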

FYI
ÿ is listed in the ICANN LGRs for German and for English (much to my amazement, as I have never encountered it previously), but does not appear in Omniglot, nor in the Wikipedia alphabet referenced in the LGR, for either language.

A quick search did not yield any evidence for English, but it did for German (see above).

ŏ is listed in the ICANN LGRs for German and for Spanish, but does not appear in Omniglot, nor in the Wikipedia alphabet referenced in the LGR, for either language.

A quick search did not yield any supporting evidence.

ŗ is listed in the ICANN LGR for Latvian, but does not appear in Omniglot, nor in the Wikipedia alphabet referenced in the LGR.

This page (https://de.wikipedia.org/wiki/%C5%96) says it was used historically in Latvian. This one (https://en.wikipedia.org/wiki/Latvian_orthography) clarifies that it is part of an older orthography still in use in diaspora communities.


LGR for language deu-Latn — German <https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-german-30aug16-en.html>

This is far larger than the set of characters used in German, even taking loanwords and borrowings from other languages into account. I would be interested to know who developed this and on what basis. Some of the cited sources return 404 errors.

LGR for language eng-Latn — English <https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-english-30aug16-en.html>

LGR for language spa-Latn — Spanish <https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-spanish-30aug16-en.html>

Bill Jouris
Inside Products
bill.jouris at insidethestack.com
831-659-8360
925-855-9512 (direct)

_______________________________________________
Latingp mailing list
Latingp at icann.org
https://mm.icann.org/mailman/listinfo/latingp


Best,

Meikal
