[Comments-lgr-proposal-malayalam-script-07may20] Comments on the Malayalam LGR proposal

梁海 lianghai at gmail.com
Wed Jun 17 09:28:16 UTC 2020


This latest proposal’s solution (three sequential variants blocking each other) on the “nta” variants is probably okay.

Detailed comments:

1. Considering that the Tamil LGR’s disposition for the two srī variants (<0BB8 SA, 0BCD VIRAMA, 0BB0 RA, 0BC0 VOWEL SIGN II> and <0BB6 SHA, 0BCD VIRAMA, 0BB0 RA, 0BC0 VOWEL SIGN II>) is allocatable, and how similar the Tamil srī and Malayalam nta cases are (in terms of the 1a vs. 1b pair in Table 9), either the Tamil srī disposition or the Malayalam nta disposition is problematic.

In both the Tamil srī case and the Malayalam nta 1a vs. 1b case (note the 1c variant is an orthographical variant of 1a/1b, different from the two cases discussed here), the encoding variants are meant to represent exactly the same written form, and are rendered identically in ideal environments. A careful discussion of the choices made for both the Tamil LGR (for comparison) and the Malayalam LGR is necessary to support the variant disposition in the proposal.
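
To make the comparison concrete, here is a minimal sketch using Python's standard unicodedata module. It only shows that the two Tamil srī sequences quoted above are distinct code point sequences that Unicode normalization (NFC) does not unify, which is why any relationship between them has to be declared by the LGR itself. The helper names (describe, same_after_nfc, differing_positions) are my own, for illustration only.

```python
import unicodedata

# The two Tamil srī encodings quoted above, as code point sequences.
SRI_WITH_SA = "\u0BB8\u0BCD\u0BB0\u0BC0"   # SA, VIRAMA, RA, VOWEL SIGN II
SRI_WITH_SHA = "\u0BB6\u0BCD\u0BB0\u0BC0"  # SHA, VIRAMA, RA, VOWEL SIGN II


def describe(seq):
    """Return 'XXXX NAME' for each code point in the sequence."""
    return [f"{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def same_after_nfc(a, b):
    """True if the two sequences become identical under NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def differing_positions(a, b):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


for label, seq in (("SA form", SRI_WITH_SA), ("SHA form", SRI_WITH_SHA)):
    print(label, describe(seq))

print("unified by NFC:", same_after_nfc(SRI_WITH_SA, SRI_WITH_SHA))
print("differ at index:", differing_positions(SRI_WITH_SA, SRI_WITH_SHA))
```

The sketch says nothing about rendering: whether the two sequences look identical depends on the font and shaping engine, not on the code points alone.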

2. As the proposal attempts to correct the inconsistency in how nta is treated in the published LGR-3, there should be a clear explanation of the existing issue in LGR-3 and of the proposed changes. In particular, an itemized changelog will be helpful. Without clear documentation, this extremely confusing issue will be even more difficult for anyone in the future to understand.

3.

> Page 22, Table 9

As 1a shows glyphs of both the failed shaping and successful shaping, 1b should also show both the failed and successful shaping results, otherwise the discussion context is not clear to readers. The currently shown glyphs are even misleading because it’s actually 1b instead of 1a that commonly fails in shaping. 

Also, it makes more sense to have both 1a and 1b’s intended shaping precede the failed shaping.

As <0D31 RRA, 0D4D VIRAMA, 0D31 RRA> is not shaped by normal fonts as a pair of side-by-side RRA (the proposal itself suggests as much in a later paragraph, which I discuss below), the variation suggested in the 3b variant’s “Glyph” cell (the second glyph) seems to be orthographical, rather than a variation in digital text encoding and shaping. This is inconsistent with the 1a row. As the orthographical variant is already suggested in the 3a row, the second glyph in the 3b row’s “Glyph” cell should be removed.

“It is rare to see a font that does not stack റ്റ, but instead of depending on that weak assumption, …” (page 24). Unless the proposal can clarify which common fonts do not stack <RRA, VIRAMA, RRA>, it is likely not a “weak assumption” to assume normal fonts do stack the sequence. If this is considered a weak assumption, then the whole expectation of how Indic shaping works in normal fonts is a weak assumption and cannot be trusted. The argument here should be orthographical, in order to make <RRA, RRA> and <RRA, VIRAMA, RRA> effectively a pair of variants.

4.

> Page 22, “1 b) is how some Microsoft fonts have encoded nta 0D7B + 0D4D + 0D31”

As repeatedly pointed out before, Windows’s text shaping engine does not support this sequence. It is misleading to claim “Microsoft fonts have encoded nta …”, considering all of Microsoft’s Malayalam fonts are produced to be used in environments where the sequence cannot be shaped. Windows’s own legacy encoding <0D28 NA, 0D4D VIRAMA, 200D ZWJ, 0D31 RRA> (which is actually the only encoding intended in both the Kartika and Nirmala UI fonts) is not discussed either (although it’s clear a ZWJ-dependent encoding cannot be handled by LGR).
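
For reference, a minimal sketch (again with Python's standard unicodedata module) that spells out the 1b sequence and the Windows legacy sequence quoted above, and flags which one depends on ZWJ. The constant and function names are my own, and the "contains U+200D" check is only a simplified stand-in for the point that a ZWJ-dependent encoding cannot be handled by LGR; it is not how an LGR is actually evaluated.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Sequences as quoted in this comment (names are illustrative only).
NTA_1B = "\u0D7B\u0D4D\u0D31"                    # 0D7B + 0D4D + 0D31
NTA_WINDOWS_LEGACY = "\u0D28\u0D4D\u200D\u0D31"  # NA, VIRAMA, ZWJ, RRA


def names(seq):
    """Unicode character names for each code point in the sequence."""
    return [unicodedata.name(ch) for ch in seq]


def depends_on_zwj(seq):
    """Simplified check: does the encoding contain U+200D at all?"""
    return ZWJ in seq


for label, seq in (("1b", NTA_1B), ("Windows legacy", NTA_WINDOWS_LEGACY)):
    print(label, names(seq), "ZWJ-dependent:", depends_on_zwj(seq))
```

Which of these sequences a given platform actually shapes is a separate question that code points alone cannot answer.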

See Section 5 and Table 1 in L2/19-345R2 (https://www.unicode.org/L2/L2019/19345r2-malayalam-nta.pdf) for a description of the Windows platform’s behavior. Note “supported by font but not platform” and the column for the Windows legacy encoding.

Certain third-party applications like Chrome and Firefox (and recently, the Chromium-based new Edge browser from Microsoft) use HarfBuzz instead of DirectWrite for text shaping. Therefore these specific environments on Windows are essentially more like the “Android/HarfBuzz” platform in L2/19-345R2’s Table 1 than like “Windows/DirectWrite”.

Best,
梁海 Liang Hai
https://lianghai.github.io


