[Comments-sinhala-lgr-02oct18] A quick review of the Sinhala proposal

Thu Nov 8 16:40:26 UTC 2018

- §3.2: It is unclear what exact scope of writing systems has been decided. For example, if Sanskrit is in scope, then the full set of four vocalic liquids (vocalic r/rr/l/ll) probably needs to be included in §5.2. However, the text only vaguely says “considered”.

- §3.2, “In addition, Myanmar script is also related.”: This statement seems to come from the comparison with Myanmar in §6.3.4, Sinhala and Myanmar. However, those pairs are similar only by accident, and they can’t suggest that the two scripts are related in the same sense in which Sinhala is related to the scripts covered by the NBGP.

- §3.3.1, “In addition, conjunct characters and touching letters are features of Sinhala text, but do not require representation in the root-zone for labels.”: Either provide the reasoning in place or simply don’t talk about the usage in labels at this stage.

- §3.3.1, “… ඥ (jna) the symbol is considered as representing ජ්+ඤ (j+na), identical to the consonant in contemporary Sinhala ඥ which has a code point U+0DA5.”: The existence of U+0DA5 SINHALA LETTER TAALUJA SANYOOGA NAAKSIKYAYA is simply because the j.nya conjunct was analyzed as a structure eligible to be encoded atomically. This character is meant to represent the conjunct j.nya and the conjunct also must be represented/encoded with this character. The phrasing “… identical to …” here is misleading.

- §3.3.1, “When modifiers are added to any of the above categories, including … they will be formed as follows …“: Unclear what is suggested here. Are the following examples meant to illustrate how conjunct consonants and touching consonants behave graphically just like plain/individual consonant letters?

- §3.3.1, “Special symbols ්ර (rakaranshaya) for ර (ra) and ්ය (yanshaya) for ය (ya) …”: Inline images are needed for representing rakaranshaya and yanshaya if they can’t form properly with text.

- §3.3.1, “… used in Sinhala writing when they occur after a consonant (from which the inherent vowel has been removed).”: The authors appear to analyze the symbols (rakaranshaya and yanshaya) as having their own inherent vowels. Under that analysis, the inherent vowel of the base consonant is removed by the very act of attaching a symbol, so one can’t exactly say the inherent vowel “has been removed” when a symbol is used.

- §3.3.1, “However, after ර් (r) not yanshaya but ය (ya) is used; so … or … are not accepted but …or …are.”: The examples are not designed properly. The first and second words correctly express the idea that yanshaya is not used after a killed ර, with both styles (killed ර vs rephaya) presented. However, the third word is a phonetic respelling, with a different phonetic sequence that contains an additional ය. The fourth word is okay by itself, but it should probably be accompanied by a spelling that does not use rephaya (thus corresponding to the first word).

- §3.3.2, Table 2: The pronunciations of the vocalic l and vocalic ll are suspicious. I understand the vocalic l and vocalic ll are not really used in Sinhala, but it seems the two letters’ names, instead of their actual pronunciations, are listed in the table. Note that the vocalic r and vocalic rr also appear to have letter names different from their pronunciations, but their pronunciations are correctly listed in this table.

- §3.3.3, “This is thus used to join consonants and form conjunct characters.”: U+0DCA doesn’t form conjuncts by itself. The requirement of ZWJ when forming a conjunct can’t be ignored in such statements.

- §3.3.3: Actually, as the requirement of ZWJ is not mentioned in the preceding sections (where it’s good to discuss the script’s behavior independently from its encoding), it should be emphasized here that ZWJ is required for forming not only typical conjuncts, but also the “special symbols” (rakaranshaya, yanshaya, and rephaya) and touching consonants. It’s also not emphasized enough anywhere in the proposal (even in §5.5) that excluding ZWJ is a major problem for Sinhala labels, because all the aforementioned consonantal structures rely on it.
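
To make the ZWJ dependence in the §3.3.3 comments concrete, here is a minimal Python sketch of the sequences as I understand the encoding (al-lakuna followed by ZWJ for conjuncts and the special symbols, ZWJ followed by al-lakuna for touching consonants). The example consonants and the helper names `codepoints` and `strip_zwj` are only illustrative choices, not anything taken from the proposal.

```python
import unicodedata

AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA (the vowel killer)
ZWJ = "\u200D"        # ZERO WIDTH JOINER

# A few letters used only as examples.
KA, YA, RA, VA, SHA = "\u0D9A", "\u0DBA", "\u0DBB", "\u0DC0", "\u0DC1"
VOWEL_SIGN_II = "\u0DD3"

# Assumed sequences for each consonantal structure.
STRUCTURES = {
    "rakaranshaya": SHA + AL_LAKUNA + ZWJ + RA,
    "yanshaya": KA + AL_LAKUNA + ZWJ + YA,
    "rephaya": RA + AL_LAKUNA + ZWJ + KA,
    "conjunct": KA + AL_LAKUNA + ZWJ + VA,
    "touching": KA + ZWJ + AL_LAKUNA + VA,
}

# The "Shri" example called out in section 5.5 of the proposal.
SHRI = STRUCTURES["rakaranshaya"] + VOWEL_SIGN_II


def codepoints(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def strip_zwj(text: str) -> str:
    """Simulate a repertoire that excludes ZWJ by dropping it."""
    return text.replace(ZWJ, "")


if __name__ == "__main__":
    for name, seq in STRUCTURES.items():
        print(f"{name:13} {' '.join(codepoints(seq))}")
    # Without ZWJ, the sequence is only a killed consonant followed by
    # a separate letter; the joined form is no longer requested.
    print("Shri without ZWJ:", " ".join(codepoints(strip_zwj(SHRI))))
    print(unicodedata.name(AL_LAKUNA), "/", unicodedata.name(ZWJ))
```

Every structure in the table contains U+200D, so dropping ZWJ from the repertoire leaves only the killed-consonant spelling for all of them.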

- §3.3.4, “… represents all the nasals”: Probably, “represents a general nasal sound” or “represents a context-dependent nasal sound”?

- §3.3.6, “One constraint for Sannjakas is that they cannot be followed by halanta.”: Is this a phonetic (so pre-nasalized stops cannot directly precede another consonant even ya, ra, or va) or a graphic statement (so pre-nasalized stops are not written with an attached vowel killer)?

- §5.2, Code Point Repertoire:

    * For writing the [f] sound, this lately invented structure represented by U+0DC6 ෆ SINHALA LETTER FAYANNA is often considered less used compared to the more popular form “ප combined with f“. The usage of these f-sound graphemes should be discussed.

    * About U+0DF2 ෲ SINHALA VOWEL SIGN DIGA GAETTA-PILLA, see the comment below for 5.4.

- §5.4, Code point not included:

    * Unclear why U+0DF2 ෲ SINHALA VOWEL SIGN DIGA GAETTA-PILLA is included when its independent form U+0D8E ඎ SINHALA LETTER IRUUYANNA is excluded.

    * It’s inappropriate to simply say “Usage unknown” for U+0D8E, U+0D8F, U+0D90, U+0DDF, and U+0DF3, as they’re apparently used in the standard Sanskrit alphabet at least. So they have known usage in Sanskrit and are probably not used for the Sinhala language.

    * The exclusion of U+0D9E ඞ SINHALA LETTER KANTAJA NAASIKYAYA and U+0DA6 ඦ SINHALA LETTER SANYAKA JAYANNA is concerning. It seems a stronger case is needed for excluding letters that are considered a part of the standard Sinhala alphabet and already have attestations (the word that uses U+0DA6). However I understand the standard alphabet itself is not as fixed as other Indic languages’.

- §5.5, “One of the most important deficiencies of not being able to have Top Level Domain with Rakar form is that one cannot have “ශ්‍රී” (Shri) in a top level domain name …”: The systematic necessity of ZWJ in the Sinhala encoding is not emphasized enough. Calling out ශ්‍රී here almost feels like a “fun fact”, while in fact the exclusion of ZWJ affects a great number of common words, and those words just cannot be encoded correctly without ZWJ. The exact effect of excluding ZWJ (although not a decision made by the Sinhala panel) should be thoroughly analyzed. See also the comment above for §3.3.3.

- §5.6, Akshar Formation Rules for Sinhala: See the comments below for §7.

- §6.1, In-Script Variants:

    * This list is nice (it can be ordered better though, according to either the shapes or code points). Proposals by the NBGP probably should undergo a similar set of criteria for identifying in-script variants. I do feel the criteria are strict (as these pairs are probably not that confusable) though.

    * “j. ඕ (U+0D95) and ඹ් (U+0DB9 U+0DCA)” is already disallowed by the akshar formation rule that prenasalized stops cannot be followed by a vowel killer.

- §7, “This section provides the WLE rules that are required by all the languages mentioned in section 3.2 when written in Sinhala Script.”: The authors need to clearly define a scope of languages. “All the languages mentioned” is vague.

- §7, “… for each of the "Indic Syllabic Category" as mentioned …”: The term “Indic Syllabic Category” can cause confusion with the Unicode character property of the same name. Should note this is not the Unicode property mentioned here.

- §7, Whole Label Evaluation (WLE) Rules: §5.6 basically suggests such a pattern: `V[B|X] | C[M][B|X] | CH | J[M][B]`. It’s questionable whether it’s necessary to split J from C when the argument for disabling H and X after J is weak. It’s unclear whether such a restriction should be introduced based on attestation instead of actual problems. Also, an attestation of visarga following a prenasalized stop already exists according to §5.6.5, so why is it disallowed? Atypical spellings (such as those of colloquial words and loan words) should not be treated as second-class use cases when underlying technical rules (instead of language policies) are being drafted.
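
The practical effect of splitting J from C in the pattern above can be sketched with a small Python check over strings of class letters. This is only an illustration under my own assumptions: I read V as vowel, C as consonant, M as vowel sign, B as anusvara, X as visarga, H as the vowel killer, and J as a prenasalized stop; I treat square brackets as "optional"; and I assume a label is one or more akshars. The names `is_valid_split`, `is_valid_merged`, and `rejected_only_by_split` are hypothetical.

```python
import re

# Pattern as summarized above: V[B|X] | C[M][B|X] | CH | J[M][B]
SPLIT_AKSHAR = r"(?:V[BX]?|CM?[BX]?|CH|JM?B?)"
# Variant for comparison: J is treated exactly like C.
MERGED_AKSHAR = r"(?:V[BX]?|CM?[BX]?|CH)"

SPLIT_LABEL = re.compile(f"{SPLIT_AKSHAR}+")
MERGED_LABEL = re.compile(f"{MERGED_AKSHAR}+")


def is_valid_split(classes: str) -> bool:
    """Check a string of class letters against the J-specific pattern."""
    return SPLIT_LABEL.fullmatch(classes) is not None


def is_valid_merged(classes: str) -> bool:
    """Check the same string with every J treated as a plain C."""
    return MERGED_LABEL.fullmatch(classes.replace("J", "C")) is not None


def rejected_only_by_split(classes: str) -> bool:
    """True if the sequence fails only because J is split from C."""
    return is_valid_merged(classes) and not is_valid_split(classes)


if __name__ == "__main__":
    for seq in ["CMX", "JMB", "JH", "JX", "JMX", "VH"]:
        print(seq, is_valid_split(seq), rejected_only_by_split(seq))
```

Under these assumptions, the only sequences that the split excludes are those with H or X in an akshar headed by J (for example `JH`, `JX`, `JMX`), which is exactly the restriction questioned above.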

Best,
梁海 Liang Hai
https://lianghai.github.io
