[Comments-malayalam-tamil-25sep18] A quick review of the Tamil proposal

Sun Oct 7 03:59:52 UTC 2018

- §3, “tholkəppɪyəm” and many other similar cases throughout the document: Use a common transliteration scheme (say, ISO 15919) consistently.

- §3.1, “The image below shows how vaṭṭeḻuttu got transformed as Tamil letters.”: The introduction below Figure 1 says the image is about Tamil Brahmi diverging to Vaṭṭeḻuttu and Tamil.

- §3.3.1, “It should also be noted that as per Tamil traditional grammar … in Tamil Traditional grammar.”: This is not a contrast between Tamil traditional grammar and Unicode’s terminology, but simply a failure to clearly distinguish a “consonantal sound/phoneme” from a “consonantal letter/grapheme” in common discussions (Indic consonantal graphemes don’t necessarily represent pure consonantal phonemes). Eg, the sentence “The Unicode Consonant set of Tamil comprises the following characters” can be rephrased to “Tamil consonantal letters are encoded as the following Unicode characters”, if the authors want to make the distinction clear.

- §3.3.1, Table 3: Oh c’mon, can we at least clean up the format so phonetic transcriptions are legible?

- §3.3.2, Example 1 and 2: Use a clear format if character names are used, eg, <TAMIL LETTER KA, TAMIL SIGN VIRAMA, TAMIL LETTER SSA>.

- §3.3.4, “The Visarga is also used in Tamil and represents a sound very close to /ḵ/”: “ḵ” is the ISO 15919 transliteration for ஃ, not a phonetic transcription. This sentence effectively means “ஃ represents a sound very close to /ஃ/”.

- §3.3.4, “To facilitate this modern usage apart from barring Visarga – Visarga combination …”: Is this saying the restriction on <…, visarga, visarga, …> is to be lifted in WLE? If yes, then there’s a conflict with the actual WLE rules in §7; if no, then since it’s perfectly valid to have <…, syllable-final visarga, pre-consonant-modifier visarga, …> in spelling (eg, a native visarga-ending word followed by a visarga-beginning loanword), the restriction should be lifted and it’s inappropriate to include rule 3 in §7.

- §4.1.2.4, ’AU LENGTH MARK “ௗ” (U+0BD7) is a character in Tamil which has been added to Unicode and is very rarely used in Modern Tamil.’: This character is encoded in Unicode for technical reasons, and is part of ஔ and ௌ’s canonical decompositions. Basically, because U+0BCC TAMIL VOWEL SIGN AU has its canonical decomposition as <U+0BC6 TAMIL VOWEL SIGN E, U+0BD7 TAMIL AU LENGTH MARK>, U+0BD7 is technically as much used as U+0BCC is. The rationale of “very rarely used in Modern Tamil” is therefore inappropriate, although the recommendation of excluding it on the exclusion-principle level is okay (the variation level is probably better?). Instead, probably state that the character is not used in valid Tamil text that is in Unicode Normalization Form C (NFC).

- §5.2, “for Tamil Language that NBGP has considered as given in 3.2”: Isn’t it stated in §3.2 that “they have not been considered in the present analysis”?

- §5.2, Table 5: Should note the “Indic syllabic category” column is not about the Unicode character property of the same name.

- §5.2.1, Table 6a: Are those “=” in the second column intended?

- §5.5, “… in the form of variables”: These are not variables but notation.

- §5.5.4, “3. A sequence of consonants …”: As it’s already stated in 5.5 that this section is about “Akshar formation”, it’s unclear why the authors are bringing phonetic syllables into the discussion. The so-called “CHCHC” is just 3 separate akshars, <CH, CH, C>, although phonetically they belong to the same syllable. As Tamil doesn’t ever have pre-base structures written before (to the left of) a pulli-ed consonant letter, it doesn’t need to follow Devanagari’s practice and can safely simplify its akshar formation logic to allow only a single base consonant. The special cases śrī and kṣV should be discussed as special cases (because they’re really special, unlike anything else in Tamil).

  - The akshar formation pattern is thus simplified to `(C[M]|V)[X] | C[H]` (note V is just a special case of CM when C is zero), or expanded to `C[M][X] | V[X] | C[H]` for the sake of being more comprehensible. But note this analysis hasn’t taken the akshar-preceding visarga/aytham into consideration.

- §6.1.3: My first impression is this should be “blocked” because <sa, virama, ra, vowel sign i> and <sha, virama, ra, vowel sign i> are indistinguishable, but I’m aware that I’m not familiar enough with the concept of “allocatable” vs “blocked”. If the intention is to explicitly allow the applicant to make both encodings aliases to each other (while still blocking other applicants from applying for the variant) so users can access the same domain no matter which encoding they use, then it’s good.

- §6.3: Should note such cases are already eliminated by the NFC requirement of IDNA2008?

- §6.4, Table 21: The bottom-right cell has a wrong rendering of the Malayalam text.

- §6.5.2 Allocatable variants: See the comment above for 6.1.3. (I’m not confident about my understanding of “allocatable” and “blocked”…)

- §7, “… by all the languages mentioned in section 3.2 …”: See the comment above for 5.2, “… for Tamil Language that NBGP has considered as given in 3.2”.

- §7, “Below are the specific WLE rules”: See comment above for §3.3.4, “To facilitate this modern usage apart from barring Visarga – Visarga combination …”. Also, it’s unclear how the authors achieved this set of rules from §5.5’s analysis. Considering the consonant-modifier visarga, the akshar pattern should be `[X]C[M][X] | V[X] | [X]C[H]` (or even allowing X to precede V, if there’s attestation).

- §11 Appendix A, Table 22: Based on the same level of similarity, the following pairs (and probably more) should also be considered: U+0B89 TAMIL LETTER U and U+0D09 MALAYALAM LETTER U, U+0BB5 TAMIL LETTER VA and U+0D35 MALAYALAM LETTER VA, U+0BB7 TAMIL LETTER SSA and U+0D37 MALAYALAM LETTER SSA.
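
To illustrate the §4.1.2.4 comment above, here is a minimal Python sketch using only the standard `unicodedata` module. The constant names and the `nfc` helper are mine, chosen for illustration; the sample spelling <KA, VOWEL SIGN E, AU LENGTH MARK> is just one example input.

```python
import unicodedata

# Code points named in the comment above (constant names are illustrative).
LETTER_KA = "\u0b95"        # TAMIL LETTER KA
LETTER_O = "\u0b92"         # TAMIL LETTER O
LETTER_AU = "\u0b94"        # TAMIL LETTER AU
VOWEL_SIGN_E = "\u0bc6"     # TAMIL VOWEL SIGN E
VOWEL_SIGN_AU = "\u0bcc"    # TAMIL VOWEL SIGN AU
AU_LENGTH_MARK = "\u0bd7"   # TAMIL AU LENGTH MARK


def nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# U+0BCC canonically decomposes to <U+0BC6, U+0BD7>.
au_sign_decomposition = unicodedata.decomposition(VOWEL_SIGN_AU)

# A decomposed spelling of the syllable "kau" ...
decomposed = LETTER_KA + VOWEL_SIGN_E + AU_LENGTH_MARK
# ... is recomposed by NFC, so U+0BD7 no longer appears in the result.
composed = nfc(decomposed)
mark_survives_nfc = AU_LENGTH_MARK in composed

# The independent vowel behaves the same way: <O, AU LENGTH MARK> -> AU.
composed_letter = nfc(LETTER_O + AU_LENGTH_MARK)
```

With these inputs, U+0BD7 only ever appears in the pre-normalization strings, which is the sense in which it is “as much used as U+0BCC” yet absent from NFC text.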

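The two akshar patterns from the §5.5.4 and §7 comments can be tried out with a rough Python sketch. It is only a sketch under assumptions: the code point ranges I picked for C, V and M are approximate (they also cover a few unassigned code points), X is taken to be only U+0B83, and the names `split_akshars` and `is_valid_extended` are hypothetical. The special cases śrī and kṣV are not handled.

```python
import re

# Assumed character classes (approximate ranges, not taken from the proposal).
C = "[\u0b95-\u0bb9]"                           # consonant letters
V = "[\u0b85-\u0b94]"                           # independent vowels
M = "[\u0bbe-\u0bc2\u0bc6-\u0bc8\u0bca-\u0bcc]"  # vowel signs
H = "\u0bcd"                                    # pulli (virama)
X = "\u0b83"                                    # visarga / aytham

# Simplified pattern: C[M][X] | V[X] | C[H] (C H is tried first so the
# pulli is never left dangling when splitting).
SIMPLE = f"(?:{C}{H}|{C}{M}?{X}?|{V}{X}?)"
# Extended pattern allowing a consonant-modifier visarga before C:
# [X]C[M][X] | V[X] | [X]C[H]
EXTENDED = f"(?:{X}?{C}{H}|{X}?{C}{M}?{X}?|{V}{X}?)"


def split_akshars(text: str):
    """Split text into akshars under SIMPLE, or return None if it does not fit."""
    if not re.fullmatch(f"{SIMPLE}+", text):
        return None
    return re.findall(SIMPLE, text)


def is_valid_extended(text: str) -> bool:
    """Check whether text is a sequence of akshars under EXTENDED."""
    return re.fullmatch(f"{EXTENDED}+", text) is not None


tamil = split_akshars("\u0ba4\u0bae\u0bbf\u0bb4\u0bcd")   # தமிழ்
athu = split_akshars("\u0b85\u0b83\u0ba4\u0bc1")          # அஃது
fa_simple = split_akshars("\u0b83\u0baa\u0bbe")           # ஃபா
fa_extended = is_valid_extended("\u0b83\u0baa\u0bbe")
double_visarga = is_valid_extended("\u0b85\u0b83\u0b83\u0baa\u0bbe")
```

Under these assumptions the simplified pattern rejects an akshar-initial ஃ, while the extended one accepts it, including a <…, visarga, visarga, …> sequence across an akshar boundary as discussed for §3.3.4.
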
Best,
梁海 Liang Hai
https://lianghai.github.io
