[Neobrahmigp] Malayalam LGR Proposal 20190218

veena solomon veena.ycet at gmail.com
Tue Mar 5 02:57:02 UTC 2019


Thank you. I will take a look and fix the issues at the earliest.

On Mon 4 Mar, 2019, 10:29 PM Sarmad Hussain <sarmad.hussain at icann.org> wrote:

> Dear Veena and NBGP members,
>
> Thank you for sharing the revised version of the Malayalam proposal.
> Please find below feedback on the proposal.  IP would like you to consider
> some further details.
>
> Kindly update and share the proposal for further review by the IP members.
>
> Regards,
> Sarmad
> ------------------------------
>
> To: NeoBrahmi Generation Panel
> From: Integration Panel
>
> The Integration Panel has reviewed the updated draft for the Malayalam LGR
> dated 2019-02-20.
>
> We are noting good progress, but there remain some major pieces that still
> need work to remove inconsistencies and errors, as well as quite a number
> of detailed suggestions for additional improvements of the documentation of
> the proposal.
>
> The biggest issue is that the description of the various rules in the DOCx
> and their implementation in the XML do not yet match. While the listing
> of sequences in the XML now matches that in the DOCx, Section 6.1 in the
> document contains the wrong explanation of how to handle 0D33 0D33
> pairs, among other inconsistencies.
>
> For Section 7.1.1 we noted that 0D7B was added to the context for 0D4D in
> the DOCx but is *not implemented in the XML*. Is that an oversight?
>
> A further significant issue arises in the context of the new sequence
> Halant+RA. This must be addressed before the LGR can be finalized.
>
> A few of the suggested changes amount to minor, but essential corrections
> (e.g. fixing some code points in the XML and other details in the rules).
> The remaining items represent editorial issues.
>
> - Integration Panel
>
>
>
> *Detailed Recommendations:*
>
>
>
> *DOC:*
>
> (1) Section 7.1.1, the listing of members for "R" needs commas
>
> (2) Section 7.1.1, the listing of members for "R", the "glyph" for 0D4D
> needs a space for better layout - at least on some versions of Word this
> reorders around the opening parenthesis.
>
> (3) There are several discrepancies between the rules stated in the XML
> and the rules stated in Section 7.1.1
>
> We list here the corresponding rule name from the XML and whether a rule
> matches the document or not.
>
>
> (3.1) Rule 1:     H must be preceded by C or the M ു (0D41) or the L ൻ
> (0D7B)
>
>     For this rule, there are several discrepancies between DOCx and XML as
> well as between Section 7.1.1 and Section 6.1 (1).
>
>
>     In the XML this rule is implemented as "follows-only-C-or-0D41". *There
> is no mention of 0D7B*
>
>     Also, for the new sequence 0D4D 0D30, this rule is changed to
> "follows-only-C"
>     (H as part of this sequence could not be preceded by 0D41 or 0D7B as
> written)
>
>     Accordingly rule 1 *needs to be restated*, to be brought in alignment
> with
>     the XML - *and  *- the XML *needs to be amended* to match the rule
>     with respect to 0D7B if that is still the intent of the GP.
>
>      [In section 6.1, there is discussion about needing to allow 0D7B 0D4D
> 0D31,
>      but this is not defined as a variant nor allowed in the current
> iteration of
>      the LGR. If necessary change the discussion from "not disallowed" to
> "disallowed"]
>
>      If it is the intent to allow an H to be preceded also by U+0D7B
>
>
> (3.2) Rule 2:     M must be preceded by C
>
>     Matches the XML's "follows-only-C" applied to cp's tagged as Matra.
>
>     ( the use of "*only*" should be dropped in this and *all other rule
> names*
>      in the XML as it is implied by the use as a required context )
>
>
>
> (3.3) Rule 3:     B must be preceded by C, V or M
>
>     Matches the XML's "follows-only-C-V-or-M" applied to Anusvaram
>
>
>
> (3.4) Rule 4:     X must be preceded by C, V or M
>
>     Matches the XML's "follows-only-C-V-or-M" applied to Visargam
>
>
>
> (3.5) Rule 5:     L cannot be preceded by B, X or H
>
>     Matches the XML's "follows-B-X-or-H" used as "not-when" applied to
> Chillu
>
>
>
> (3.6) Rule 6:     Label does not begin with L
>
>     Matches the XML's "begins-with-L" used as a trigger for an action with
> disposition "invalid"
>
>
>
> (3.7) Rule 7:     The ള (0D33) cannot immediately follow ള (0D33)
>
>     This matches the XML's "followed-by-0D33" used as "not-when" context.
>     However, the rule does not apply to 0D33 pairs that are part of defined
>     sequences. Therefore the rule should be restated:
>
>         Rule 7:     The character ള (0D33) cannot immediately follow ള
> (0D33), except as part of a defined sequence
>
>
>
> (3.8) Rule 8: The റ (0D31) cannot immediately follow റ (0D31)
>
>     This matches the XML's "followed-by-0D31" used as "not-when" context.
>     However, the rule does not apply to 0D31 pairs that are part of defined
>     sequences. Therefore the rule should be restated:
>
>         Rule 8:     The character റ (0D31) cannot immediately follow റ
> (0D31), except as part of a defined sequence
>
>
>
> (4) The discussion of "Set 2" in Section 6.1 no longer matches the
>       solution proposed in the XML. This passage needs to be extensively
> rewritten as follows:
>
>
>
> (4.1) The text documents the earlier solution which disallowed some
> sequences.
>
>       This text should be removed, as it does not describe the actual
> solution, which
>       involves variant sequences and context rules.
>
>       "Therefore, NBGP has decided not to define Set 2 as variants, but to
> handle this case by using a WLE rule. The rule... "
>
>       -->
>
>        "Therefore, NBGP has decided to define a rule (rule 7 in Section
> 7).."
>
>        and replace the following paragraph with new text:
>
> "The sequences U+0D33 U+0D33    ( ളള ) / U+0D33 U+0D4D U+0D33  ( ള്ള )
> and U+0D33 U+0D33 U+0D4D U+0D33  ( ളള്ള ) / U+0D33 U+0D4D U+0D33 U+0D33
> ( ള്ളള ) have been defined as variant pairs. However, these sequences and
> variants are further constrained by context rules on both sequences and
> variants. To make the "null" variant well-behaved, none of the sequences,
> nor U+0D33 ( ള ), may be followed by a further U+0D33 . That limits all
> occurrences of U+0D33 to singletons or explicitly enumerated sequences. At
> the same time, the variant mappings are not defined if a sequence follows
> U+0D33 U+0D4D or follows U+0D4D U+0D33, in other words, if it is part of a
> longer sequence of 0D33 ( ള ) joined by Halant."
>
> (4.2) An explanation of the context rules involving "R" needs to be
> provided
>
> Immediately after it, add a paragraph:
>
> "If a reordrant matra follows a sequence it would graphically intervene,
> thus making the sequences no longer variants. Therefore, the variants are
> also not defined if a sequence is followed by a reordrant matra. These two
> context rules are combined into the single context on the variant mapping:
>
>      V1: A variant preceded by 0D33+Halant or followed by 0D33 or R or
> Halant+0D33 is not defined"
>
>
>
>  (4.3) The description of the analogous case of U+0D31 needs to be fixed:
>
> Change:
>
>
>     "but instead of depending on that weak assumption, a WLE rule has been
> added."
>
> To:
>
>     "but instead of depending on that weak assumption, sequences and
> variants have been defined in an entirely analogous manner to U+0D33 with a
> variant context:
>
>
>      V2: A variant preceded by 0D31+Halant or followed by 0D31 or R or
> Halant+0D31 is not defined"
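
[Editor's illustration] The variant contexts V1 and V2 quoted above can be read as a single parametrized check. The following is a minimal Python sketch of that reading, not the LGR's XML implementation. The function name `variant_defined` and its arguments are hypothetical; the members of R are taken from the definition quoted later in this message (five reordrant matras plus the sequence U+0D4D U+0D30).

```python
HALANT = "\u0D4D"

# Reordrant matras as listed in the definition of R in this message.
REORDRANT_MATRAS = {"\u0D46", "\u0D47", "\u0D48", "\u0D4A", "\u0D4B"}
# R also contains one sequence: Halant + RA.
HALANT_RA = "\u0D4D\u0D30"


def variant_defined(label: str, start: int, end: int, base: str) -> bool:
    """Return True if the variant mapping for label[start:end] is defined.

    base is U+0D33 for V1 and U+0D31 for V2 (hypothetical parametrization).
    """
    before = label[:start]
    after = label[end:]
    # "preceded by base+Halant"
    if before.endswith(base + HALANT):
        return False
    # "followed by base" or "followed by Halant+base"
    if after.startswith(base) or after.startswith(HALANT + base):
        return False
    # "followed by R": a reordrant matra or the sequence Halant+RA
    if after[:1] in REORDRANT_MATRAS or after.startswith(HALANT_RA):
        return False
    return True
```

For example, `variant_defined("\u0D15\u0D33\u0D33", 1, 3, "\u0D33")` treats the pair ളള after ക as having a defined variant, while the same pair followed by a reordrant matra such as U+0D47 does not.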
>
>
> (5) The added "community input" in the appendix is not easy to follow: it
> is unclear
>       what conclusions the GP drew from the feedback and what changes were
> made
>       or not made in response.
>
>        Perhaps the appendix could start with an opening paragraph:
>
>        "This appendix contains copies of all input related to the case of
> ള (0D33) + ള (0D33). For the adopted solution see  (Section 6.1)."
>
>
> (6) Reference to MSR needs to be to *final public version* of the MSR*:*
>
> [MSR]     Integration Panel, "Maximal Starting Repertoire — MSR-4 Overview
> and Rationale", 7 February 2019
> https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf
> (Accessed on 18th February, 2019)
>
> (7) The definitions in Section 7.1.1 define a category "R". This category
> has
>       several issues:
>
>     (a) It is not referred to in any of the rules
>           (because it appears in variant contexts only, see (4.1))
>
>      (b) It contains not just code points, but one sequence
>            (see XML.9 below)
>
>      (c) It is not discussed in the document, except in an appendix.
>
>      Perhaps a note should be added to 7.1.1 that "R" is used in variant
>     contexts and point the reader to Section 6.1 for details.
>
>
>
> (8) The technical term in Unicode for R is "reordrant" matra, and the IP
> recommends following that terminology where possible (instead of
> "reordering" matra).
>
>
>  This comment (8) applies to the XML as well.
>
>
>  (9) See discussion below for the XML on the sequence U+0D4D U+0D30:
> depending on how that feedback is resolved, the new sequence may turn out to
> be unnecessary and could be removed. Otherwise, it would be helpful to have
> a bit more explanation that describes how Halant+RA functions in a limited
> way as a reordrant matra and what the implication of that is for IDNs.
>
> *XML*
>
> (1) XML passes tool
>
>
>
> (2) Lines 370 and 402 each have a bogus code point:
>       D433 - presumably 0D33 and 0D31 respectively are intended instead
>
>
>
> (3) rule "follows-0D33 ..." has an extra "l" in "follows".
>
>
>
> (4) reference to MSR-4 needs to be to *final public version*:
>
> [MSR-4]     Integration Panel, "Maximal Starting Repertoire — MSR-4
> Overview and Rationale", 7 February 2019
> https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf
> (Accessed on 18th February, 2019)
>
> (5) Lines 359 and 391: the comment has an extra space before "precede" and
> "follwowed" should be spelled "followed"
>
>
>
> (6) On line 102 the word "prededed" should be "preceded".
>
>
>
> (7) On line 101 the code point 0D33 should be 0D31
>
>
>
> (8) On line 67, the last word should be "conjunct".
>
>
>
> (9) The "variable" R from the rules in section 7.1.1 is implemented as a
> "class"; this means it is not
>      possible to account for the sequence <U+0D4D U+0D30> as part of the
> definition of "R".
>
>
>
>     A possible fix would be to also define a *rule* R as follows:
>
>
>
>     <rule name="R" comment="Reordrant Malayalam matras, including sequence
> U+0D4D U+0D30">
>         <choice>
>             <class by-ref="R" />
>             <char cp="0D4D 0D30" />
>         </choice>
>     </rule>
>
>
>
>      and to change any <class by-ref="R"> on lines 354 and 386 to <rule
> by-ref="R" />
>
>
>
> (10) " or reordrant vowel" should become  " or reordrant matra" in two
> comments on rules.
>
>
>
> (11) After text has been added to the proposal document the [TBD] on line
> 68 should become:
>
> More details in Sections 6.1 "In-script Variants" and 7.1.1 "Variables or
> definitions" of the [Proposal]
>
> (12) The <description> could use expanded documentation on the context
> rules for variants.
>
>         Suggested text for the end of the "Variants" section:
>
>
>
> <p>Context Rules for Variants: some of the variants defined in this LGR
> are "effective null variants", that is,
>     some code points in the source map to "nothing" in the target with all
> other code points unchanged.
>     (Because mappings are symmetric, it does not matter whether it is the
> forward or reverse mapping that
>     maps to "null"). Such variants require a context rule to keep the
> variant set well-behaved. Symmetry requires
>     the same context rule for both forward and reverse mappings.</p>
>
>     <p>In other cases, the sequences or code points making up source and
> target are constrained by context
>     rules on the code points. In such a case, any variants require context
> rules that match the intersection
>     between the contexts for both source and target; otherwise a sequence
> might be considered valid in some
>     variant label when it would not be valid in an equivalent context in
> an original label.</p>
>
> (13) Suggested text for the end of the WLE section (add above "The rules
> are:")
>
>     <p>Note: the Reordrant Matras include one sequence. That requires an
> auxiliary rule R in addition to class R.</p>
>
>
>
> (14) The description of several of the character classes could be edited
> as follows to align better with the names for the character classes being
> described as well as general copy editing:
>
>
>
>     <p>Consonant: Malayalam is written in an abugida script derived
> ultimately from Brāhmī in which
>     every consonant carries an inherent a. More details in Section 3.8,
> "The Structure of
>     Malayalam Script" of the [Proposal].</p>
>
>     <p>Matra: Vowels other than the inherent vowel are written as vowel
> diacritics. They are referred to as Matras,
>     when they follow consonants. More details in Section 3.8, "The
> Structure of Malayalam Script" of the [Proposal].</p>
>
>     <p>Halant: A consonant can be combined with another consonant or
> conjunct
>     using the halant encoded as U+0D4D MALAYALAM SIGN VIRAMA. This strips
> off the implicit vowel.
>      More details in Section 3.8, "The Structure of Malayalam Script" of
> the [Proposal].</p>
>
>     <p>Anusvaram: In Malayalam, the anusvara, represented as ം (0D02), simply
> represents a consonant /m/ after a vowel,
>     though this /m/ may be assimilated to another nasal consonant. More
> details in Section 3.8, "The Structure of Malayalam
>     Script" of the [Proposal].</p>
>
>     <p>Visargam: /വിസർഗം,/ (visargam), or visarga, represents a consonant
> /h/ after a vowel,
>     and is transliterated as ḥ. Like the anusvara, it is a special symbol,
> and is never followed by an
>     inherent vowel or another vowel. More details in Section 3.8, "The
> Structure of Malayalam
>     Script" of the [Proposal].</p>
>
>     <p>Chillu: Chillu letters, aka "Chillaksharam", represent pure
> consonants without any vowel sound.
>     More details in Section 3.8, "The Structure of Malayalam Script" of
> the [Proposal].</p>
>
> (15) the word "-only-" can be deleted from all rule names in the XML as it
> is redundant
>
> (16) The list of rules in the <description> could be numbered (and split
> into separate lists) with additional information as follows:
>
>     <p>The rules are: </p>
>      <ul>
>          <li>1. H: must be preceded by C or 0D41</li>
>          <li>2. M: must be preceded by C</li>
>         <li>3. B: must be preceded by C, V or M</li>
>         <li>4. X: must be preceded by C, V or M</li>
>         <li>5. L: cannot be preceded by B, X or H</li>
>         <li>6. A label does not begin with L</li>
>      </ul>
>      <p>The following context rules  apply to code points U+0D33 and
> U+0D31 as well as to sequences ending in these code points:</p>
>      <ul>
>         <li>7. The character ള (0D33) cannot immediately follow ള (0D33),
> except as part of a defined sequence</li>
>         <li>8. The character റ (0D31) cannot immediately follow റ (0D31),
> except as part of a defined sequence</li>
>      </ul>
>     <p>The following context rules apply to variants:</p>
>      <ul>
>         <li>V1: A variant preceded by 0D33+Halant or followed by 0D33 or R
> or Halant+0D33 is not defined</li>
>         <li>V2: A variant preceded by 0D31+Halant or followed by 0D31 or R
> or Halant+0D31 is not defined</li>
>      </ul>
>
>     <p>More details in Section 6.1 "In-script Variants" and Section 7,
> "Whole Label Evaluation Rules (WLE)" of the [Proposal]</p>
>
>
> (18) Some reviewers found it difficult to relate rules 7 and 8 to the
> context rules defined. Add a further note at the end of the rules section:
>
> <p>Note: the implementation of Rules 7 and 8 relies on the fact that a
> context rule is not evaluated between code points in the same sequence. For
> example, if a label contains two adjacent U+0D33 U+0D33 surrounded by other
> code points, the two code points can only be interpreted as the sequence
> U+0D33 U+0D33 ളള because a singleton U+0D33  ള  is not allowed to be
> followed by another U+0D33 ള.</p>
>
> (19) update the comments on the following rules as follows:
>
> <rule name= "followed-by-0D33" comment="Section 7, WLE 7. The character ള
> (0D33) cannot immediately follow ള (0D33), except as part of a defined
> sequence">
>
>
>  <rule name= "followed-by-0D31" comment="Section 7, WLE 8. The character റ
> (0D31) cannot immediately follow റ (0D31), except as part of a defined
> sequence">
>
> <rule name= "follows-0D33-0D4D-or-followed-by-0D33-or-0D4D-0D33-or-R"
> comment="Section 6.1, V1: variant not defined if preceded by 0D33+Halant or
> followed by Halant+0D33 or 0D33 or R">
>
>
> <rule name= "follows-0D31-0D4D-or-followed-by-0D31-or-0D4D-0D31-or-R"
> comment="Section 6.1, V2: variant not defined if preceded by 0D31+Halant or
> followed by Halant+0D31 or 0D31 or R">
>
>             (Move the rule up, so it follows rule 7; reorder the remaining
> unnumbered <rule> elements so those
>              referring to 0D33 occur consistently before those referring
> to 0D31).
>
>
>
>
> (20) Change all instances of "prevent variant if" in XML to "variant not
> defined if" to match language elsewhere.
>
>
>  (21) Definition of sequence U+0D4D U+0D30
>
> Given the following excerpt from the repertoire table (from the XML, but
> shown here as formatted in the HTML format):
>
> Code point:    U+0D4D
> Script:        Malayalam
> Name:          MALAYALAM SIGN VIRAMA
> Reference:     [106]
> Tag:           Halant
> Context rule:  follows-only-C-or-0D41
>
> Code point:    U+0D4D U+0D30
> Glyph:         ്ര
> Script:        [Malayalam]
> Name:          MALAYALAM SIGN VIRAMA + MALAYALAM LETTER RA
> Context rule:  follows-only-C
>
> we notice that the sequence U+0D4D U+0D30  has a **more restrictive**
> context rule than the singleton (0D4D) that starts the sequence. As a
> result the difference in context rule becomes *ineffective*. We believe
> that this reflects a common misunderstanding about how "partitions" work in
> a label in the evaluation of context rules.
>
> In a label .... 0D41 0D4D 0D30 .... the partition .... {0D41} {0D4D}
> {0D30} .... would lead to a *valid *label (given that there is no context
> rule for 0D30). Therefore, the alternate partition .... {0D41} {0D4D 0D30}
> ...., even though it generates an invalid label, is *ignored*. It does
> not somehow "veto" or "override" the other legal partition.
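
[Editor's illustration] The partition argument above can be reproduced with a small Python sketch. It follows the evaluation model as described in this message (a label is accepted if at least one partition satisfies every context rule) and is not a conforming LGR processor. The repertoire is a deliberately tiny, hypothetical subset: class C is reduced to KA (0D15) and RA (0D30), and all function names are invented for illustration.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Item = Tuple[int, ...]
Rule = Callable[[Optional[int]], bool]

# Hypothetical stand-in for class C: only KA and RA.
CONSONANTS = {0x0D15, 0x0D30}


def follows_c(prev: Optional[int]) -> bool:
    return prev is not None and prev in CONSONANTS


def follows_c_or_0d41(prev: Optional[int]) -> bool:
    return prev is not None and (prev in CONSONANTS or prev == 0x0D41)


# Repertoire item -> required context (None means no context rule).
REPERTOIRE: Dict[Item, Optional[Rule]] = {
    (0x0D15,): None,                # KA
    (0x0D30,): None,                # RA
    (0x0D41,): follows_c,           # vowel sign U, a matra
    (0x0D4D,): follows_c_or_0d41,   # bare Halant
    (0x0D4D, 0x0D30): follows_c,    # sequence Halant + RA
}


def partitions(label: Item) -> Iterator[List[Item]]:
    """Yield every way to split the label into repertoire items."""
    if not label:
        yield []
        return
    for item in REPERTOIRE:
        n = len(item)
        if label[:n] == item:
            for rest in partitions(label[n:]):
                yield [item] + rest


def partition_is_valid(parts: List[Item]) -> bool:
    """Check each item's context rule against the preceding code point."""
    prev: Optional[int] = None
    for item in parts:
        rule = REPERTOIRE[item]
        if rule is not None and not rule(prev):
            return False
        prev = item[-1]
    return True


def label_is_valid(label: Item) -> bool:
    # One valid partition is enough; invalid ones do not veto it.
    return any(partition_is_valid(p) for p in partitions(label))
```

With the label `(0x0D15, 0x0D41, 0x0D4D, 0x0D30)`, the partition using the bare Halant passes and the partition using the sequence fails, so `label_is_valid` returns True. That is the ineffective restriction described above.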
>
> If it is somehow important to prevent Halant+Ra from following vowel sign
> U, then the context rule for Halant could be changed to
>
>     follows-only-C-or-0D41-and-precedes-anything-but-0D30
>
> That would force any combination of 0D4D 0D30 to use the defined sequence
> and its context rule. Writing such a rule requires expressing the
> equivalent of [^\u0D30] in Regex notation. The formulation would be a bit
> involved, but not too much.
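
[Editor's illustration] In regular-expression terms, the amended Halant context could look like the Python sketch below. It is only an analogy for the LGR rule, not LGR XML. Two assumptions: class C is approximated by the range U+0D15 to U+0D39, and a negative look-ahead is used instead of `[^\u0D30]` so that a Halant at the end of the label is still accepted. The names `BARE_HALANT`, `HALANT_RA` and `halants_ok` are hypothetical.

```python
import re

# Bare Halant: follows C or 0D41, and is not followed by 0D30.
BARE_HALANT = re.compile(r"(?<=[\u0D15-\u0D39\u0D41])\u0D4D(?!\u0D30)")
# Sequence Halant+RA: follows C only.
HALANT_RA = re.compile(r"(?<=[\u0D15-\u0D39])\u0D4D\u0D30")


def halants_ok(label: str) -> bool:
    """True if every Halant in the label satisfies one of the two contexts."""
    halants = {m.start() for m in re.finditer("\u0D4D", label)}
    allowed = {m.start() for m in BARE_HALANT.finditer(label)}
    allowed |= {m.start() for m in HALANT_RA.finditer(label)}
    return halants == allowed
```

Because the bare Halant can no longer precede 0D30, any Halant+RA has to satisfy the sequence's own context, so 0D15 0D41 0D4D 0D30 is rejected while 0D15 0D4D 0D30 and 0D15 0D41 0D4D are accepted.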
>
> We further note that the sequence occurs one more time in the LGR for the
> following definition:
>
>         R    →     Re-Ordering Matra
>                     R =  ( െ, േ,  ൈ, ൊ, ോ,  ് + ര)
>             U+0D46 (െ) U+0D47 (േ)U+0D48 (ൈ) U+0D4A (ൊ) U+0D4B (ോ) and [U+0D4D
> (്) U+0D30 (ര)]
>
> It is not possible for a context rule to directly affect such a
> definition. Therefore, for the purpose of the definition, any occurrence of
> U+0D4D U+0D30 would be a "reordrant matra".
>
> The definition of R is used *only* in cases where this sequence follows
> either 0D31 or 0D33. Even though it is the case that following 0D41 (or
> probably any non-consonant) this sequence does not act like a reordrant
> matra, the definition is never invoked in these cases, therefore, it would
> not be necessary to give this sequence a restricted context for the purpose
> of defining R.
>
> Finally, it is not a requirement that a sequence cited in the body of a
> rule must be listed as a sequence in the repertoire. The latter is only
> necessary if the sequence is to participate in context rule evaluation
> (that is, if the sequence can take the place of an "anchor" in a rule).
> That's not the case for "R" in the Malayalam LGR.
>
> The IP does not have enough data to make a final recommendation, because
> that would depend on the intent of the GP:
>
> (A) We performed some searches (after prefixing everything with a
> consonant 0D15), and found no matches for കു്ര , but matches for കു് (that
> is, without the RA). That seems to indicate that the GP is correct in that
> such a sequence of Halant+RA does not occur following an 0D41.
>
> (B) However, that does not necessarily mean that such a sequence must be
> prohibited for LGRs - that would be a separate conclusion.
>
> (C) On at least one test system used, the sequence displays fine; that is
> perhaps not surprising because RA itself is not a combining mark (കു്ര). Taking
> the U out in the middle of the sequence gets ക്ര, which is easily found
> in online documents -- and which displays the Halant+RA before the
> consonant.
>
> The IP can only give this conditional feedback:
>
>    - If the restriction is *not required* (that is, the only cost is
>    overproducing some labels that are "nonsense" but still recognizable) then
>    IP would recommend to delete that sequence from the repertoire, on the
>    grounds that the rule is a "spelling rule".
>    - If there is a valid claim that allowing this sequence to follow
>    0D41 is a concern from a user confusion/security perspective, then the GP
>    would need to provide a corrected version of the context rule for the bare
>    code point that actually works as intended.
>    - Neither of the options impacts the definition of R from the
>    perspective of the LGR; linguistically, the sequence cannot follow 0D41 and
>    still be reordrant (but that's not important here).
>
> *TXT*
>
> testing TBD: further testing awaits corrections of errors and omissions in
> the normative part of the XML.
>
>
> ------------------------------
>
>
>