[Latingp] How Authoritative Is Omniglot Really?

Michael Bauland Michael.Bauland at knipp.de
Thu Jul 4 10:09:54 UTC 2019


Hi Bill,

On 03.07.2019 21:59, Bill Jouris wrote:
> Dear colleagues, 
> 
> When we were developing our repertoire, our go-to reference for what
> glyphs are used in any given language was Omniglot.  
> 
> At the ICANN meeting in Marrakech last week, I was talking to a group of
> people about diacritics and such.  And I mentioned in passing that (as
> shown in Omniglot https://www.omniglot.com/writing/spanish.htm ) the
> only diacritic used in Spanish is the tilde over an N.  A couple of
> native speakers of Spanish immediately corrected me, saying that the
> acute and diaeresis are also used.  (A quick search with Google confirms
> this.) 
> 
> The good news is, all of those glyphs are already in our repertoire.  So
> no immediate problem there. 
> 
> The bad news, it seems to me, is this: in how many /other/ languages
> does Omniglot fail to capture all of the diacritics or diacritic/letter
> combinations actually used?  And how many of those result in glyphs
> which are not in our repertoire currently?  (Which might resolve the
> mystery of why Unicode has so many pre-composed combinations which we
> didn't find.) 
> 
> I realize that answering that question necessarily involves going back
> through the repertoire research process again.  Presumably using other
> sources.  But I wonder if we can, in good conscience, fail to do so. 

I agree with you that Omniglot's entries for other languages are likely
to contain further errors. I wouldn't be surprised if more turned up.

The question is, what is the alternative? I can only speak for languages
that I know (English, German, Finnish). For these I can - with a high
degree of confidence - decide whether all glyphs have been included, but
not for the rest. So, even IF (and that's not a given) we find a better
source for our list of languages, who is to say that those lists of
glyphs are complete and correct? Those lists could just as well contain
superfluous or wrong glyphs.

Unless we find a native speaker for each of our languages who can list
all of its glyphs for us (and even then, he/she can be mistaken, so we
would probably need at least three independent native speakers per
language to get a reasonable degree of confidence), we will always have
the problem that whatever source we use may be incorrect.

My suggestion therefore is to go with the list we created and wait for
the public (or IP) comments. If someone complains and tells us we missed
a certain glyph, we will of course add it.

I fear we have to come to a conclusion in the near future. It's like
writing a book: whenever you re-read it, you will most likely find
another problem or something to improve. It's almost impossible to get
it perfect. At some point you will have to decide whether you want to
publish the book (even if not 100% perfect) or continue improving it
until the end of days/ICANN. ;-)

Considering that we're all volunteering our time here, I'd prefer to
come to a conclusion sooner rather than later. This does not mean that
if we find an actual error we shouldn't fix it. I want to submit
something that as far as we know is correct. However, we shouldn't spend
too much time searching for more potential errors at this point.

But that's of course only my personal opinion.

We can talk about this later today and get other opinions.

Michael

-- 
____________________________________________________________________
     |       |
     | knipp |            Knipp  Medien und Kommunikation GmbH
      -------                    Technologiepark
                                 Martin-Schmeisser-Weg 9
                                 44227 Dortmund
                                 Germany

     Dipl.-Informatiker          Fon:    +49 231 9703-0
                                 Fax:    +49 231 9703-200
     Dr. Michael Bauland         SIP:    Michael.Bauland at knipp.de
     Software Development        E-mail: Michael.Bauland at knipp.de

                                 Register Court:
                                 Amtsgericht Dortmund, HRB 13728

                                 Chief Executive Officers:
                                 Dietmar Knipp, Elmar Knipp

