[Neobrahmigp] IP Review of the Proposal for Devanagari LGR 20181212a

Akshat Joshi akshatj at cdac.in
Tue Jan 22 05:33:00 UTC 2019


Dear Dr. Sarmad,

I acknowledge the receipt of the IP comments. Will get back soon on this.

Regards,

Akshat

On 22-01-2019 07:46, Sarmad Hussain wrote:
>
> Dear Akshat, NBGP members,
>
> Please find attached the review of the Devanagari LGR proposal by the 
> IP members.
>
> An annotated file with the IP comments is attached; additional 
> comments follow below.
>
> Please let us know if you have any queries.  We look forward to your 
> final review.
>
> Regards,
> Sarmad
>
> To: Neo-Brahmi Generation Panel
>
> From: Integration Panel
>
> We have reviewed your latest LGR version dated 20181212a and noticed 
> several discrepancies between the XML file and the DOCx file. The 
> suggested changes to the DOCx file are explained here; the recommended 
> wording is found in the attached document 
> "LGR-Proposal_Devanagari_20181212a_IP_Review-2.docx".
>
> (1) The recent update added 3 additional in-script variant sets:
>
>   * ॲं U+0972 U+0902 - अँ U+0905 U+0901
>   * एँ U+090F U+0901 - ऍं U+090D U+0902
>   * ऑं U+0911 U+0902 - आँ U+0906 U+0901
>
>  These variants form a third class of variants where candra vowels 
> plus anusvara mimic candrabindu. This is missing from the XML 
> <description> which reads:
>
>     They fall into two broad categories:
>
>       * Vowel/Vowel sign followed by Nukta
>       * Unique Vowels and Vowel Signs required for Kashmiri
>
> This should be changed in the XML file to something like:
>
>     They fall into three broad categories:
>
>       * Vowel/Vowel sign followed by Nukta
>       * Unique Vowels and Vowel Signs required for Kashmiri
>       * Variants based on Candrabindu and Candra Vowel Signs followed
>         by Anusvara
>
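
For reference, the three variant pairs from item (1) can be spelled out code point by code point. The following is a minimal Python sketch using only the standard `unicodedata` module; the names `VARIANT_PAIRS` and `describe` are illustrative and are not part of the LGR or of any IP tooling.

```python
import unicodedata

# The three in-script variant pairs listed in item (1), as code point sequences.
VARIANT_PAIRS = [
    ((0x0972, 0x0902), (0x0905, 0x0901)),
    ((0x090F, 0x0901), (0x090D, 0x0902)),
    ((0x0911, 0x0902), (0x0906, 0x0901)),
]


def describe(seq):
    """Return 'U+XXXX NAME' for each code point in a sequence, joined by ' + '."""
    return " + ".join(f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in seq)


for left, right in VARIANT_PAIRS:
    print(describe(left), "<-->", describe(right))
```

In each pair, one member ends in candrabindu (U+0901) and the other in anusvara (U+0902), which is the pattern the third category describes.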
> (2) Defining variant targets that are sequences requires the formal 
> listing of sequences as repertoire elements. These sequences do not 
> "inherit" the same context rules as their constituent code points, so 
> care must be taken to supply the necessary matching context rules 
> explicitly, lest a sequence be used to unintentionally override a 
> restriction.
>
> In the document, context rules are listed only generically in section 
> 7. We think that, for purposes of clarity, they should be mentioned in 
> the discussion of variants (suggested text has been supplied).
>
> (3) Variant mappings may need additional context rules, and these have 
> been introduced in Sections 6.1.1 and 6.4.1 in the document. However, 
> it is important to note that RFC 7940 defines a variant as a tuple 
> consisting of both the mapping and the context. Therefore, symmetric 
> mappings must have formally matching context rules, even if logically 
> such contexts never occur.
>
> The XML file already had matching mappings, but we feel the text in 
> the DOCx file should contain an explanation of this process. Suggested 
> text has been supplied.
>
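
The symmetry requirement in item (3) can be sketched as a small check. This is a minimal illustration in Python, assuming each variant is modeled as a (source, target, context) tuple; the data and the function name `asymmetric` are hypothetical and are not taken from the XML file, which already has matching mappings.

```python
# Each variant is modeled as (source, target, context), following the point
# above that a variant is the mapping together with its context.
# The entries below are illustrative only, not copied from the XML file.
variants = {
    ("093E 0901", "0949 0902", "when(follows-only-C-or-CN)"),
    ("0949 0902", "093E 0901", "when(follows-only-C-or-CN)"),
    ("0901", "0945 0902", "when(follows-only-C-or-CN)"),
    ("0945 0902", "0901", None),  # reverse mapping with its context missing
}


def asymmetric(variants):
    """Return the variants whose reverse, with the same context, is not defined."""
    return sorted(
        (v for v in variants if (v[1], v[0], v[2]) not in variants),
        key=lambda v: (v[0], v[1]),
    )


for source, target, context in asymmetric(variants):
    print(f"{source} -> {target} [{context}] has no matching reverse mapping")
```

Because the context is part of the tuple, a reverse mapping that omits the context counts as a different variant, so both directions of the last pair are reported.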
> (4) In one case, the context rules on one of the sequences and two 
> variant mappings in the XML were incorrect. Instead of 
> when(follows-only-V-or-C-or-N-or-M) they should be 
> when(follows-only-C-or-CN).
>
> This needs to be fixed in the XML file (the discussion in the DOCx 
> file is correct, except for the missing context on the reverse 
> mapping, see issue (3).)
>
> (5) In some cases the XML contains context mappings on variants that, 
> while symmetric, are in fact redundant: not-when(preceded-by-H). Not 
> only do both code points/sequences that form source and target of the 
> variant relation have the same context required on the code point 
> level, but there are no other variant relations that could lead to the 
> introduction of a Halant in the label during variant processing.
>
> We suggest removing these redundant variant contexts, because it 
> simplifies the LGR and makes it easier to spot the cases where a 
> context is required and makes a difference. (See below for more detail).
>
> (6) There are some more minor editorial issues (commas, typos, usage) 
> that are noted in the attached document. Individually not of high 
> priority, but since the document has to be touched, they are worth 
> attending to.
>
> (7) As always, with any suggestions, we request that the GP consider 
> the issue and either accept these recommendations or make other 
> appropriate changes.
>
> (8) The label files were processed against the 20181212a XML file and 
> achieved matching results with our tools.
>
> (9) Please find attached below an excerpt from a more detailed 
> analysis of some of the variant sets as performed by the IP. The 
> conclusions should all be summarized above already, but the fuller 
> context may clarify the reasoning behind our recommendations.
>
> We are looking forward to receiving an updated and finalized proposal 
> soon, so we can complete the integration process.
>
>
> --- Integration Panel.
>
> ------------------------------------------------------------------------
>
>
>    Analysis of Candrabindu / Candra Vowel + Anusvara Variants
>
> _Problem statement_:
>
> In the 20181212a XML, the /code point/ context rules for one of the 
> sequences are _more permissive_ than for a constituent singleton 
> (0945), and that seems suspect:
>
> U+0945 | ॅ | Devanagari | DEVANAGARI VOWEL SIGN CANDRA E | matra | *follows-only-C-or-CN* | set 34
> U+0945 U+0902 | ॅं | [Devanagari] | DEVANAGARI VOWEL SIGN CANDRA E + DEVANAGARI SIGN ANUSVARA | | *follows-only-V-or-C-or-N-or-M* | set 1
>
> As written the sequence would also be allowed following any V or M or 
> N, meaning that, for example, ....0945 0945... could occur as part of 
> a label. That seems somehow implausible. In order to understand this 
> situation a bit better, the IP wrote down all the variants (from 
> section 6.4) and the context rules for both code points and variants 
> as found in the XML.
>
> This led to the following _analysis and recommendation_:
>
>     The following mappings were found to have been defined in the XML
>     as blocked variants:
>
>     Variant Set 1    0901 <--> 0945 0902      : when(follows-only-V-or-C-or-N-or-M)
>     Variant Set 2    093E 0901 <--> 0949 0902 : when(follows-only-C-or-CN)
>     Variant Set 3    0905 0901 <--> 0972 0902 : not-when(preceded-by-H)
>     Variant Set 4    090D 0902 <--> 090F 0901 : not-when(preceded-by-H)
>     Variant Set 5    0906 0901 <--> 0911 0902 : not-when(preceded-by-H)
>
>     All of these /code point/ Variant Sets are trivially transitive
>     (symmetric pair). Note the added context rule for each. The
>     /variant/ contexts for all Variant Sets are LHS (left-hand-side).
>
>     Here are the LHS /code point/ contexts that apply to the leading
>     code points or the _whole sequence_ (if different) in either
>     variant in these sets:
>
>     0901         Leading    when(follows-only-V-or-C-or-N-or-M)
>     0905         Leading    not-when(preceded-by-H)
>     0906         Leading    not-when(preceded-by-H)
>     090D         Leading    not-when(preceded-by-H)
>     090F         Leading    not-when(preceded-by-H)
>     0911         Leading    not-when(preceded-by-H)
>     093E         Leading    when(follows-only-C-or-CN)
>     0945         Leading    when(follows-only-C-or-CN)
>     0945 0902    Sequence   when(follows-only-V-or-C-or-N-or-M)
>     0949         Leading    when(follows-only-C-or-CN)
>     0972         Leading    not-when(preceded-by-H)
>
>     We see that 0945 / 0945 0902 is the only exception, where the
>     _sequence_ has a context different from the one applied to the
>     singleton leading code point. This appears to be *in error*. It
>     occurs in Variant Set 1, where the two variants 0901 / 0945 0902
>     have different /code point/ contexts from each other, with the
>     /code point/ context for 0901 being a superset of the context for
>     0945 (leading).
>
>     We conclude that the intent had been to restrict the variant to
>     the lowest common context and therefore the /variant /context
>     condition for variant set 1 as well as the /code point/ context
>     for sequence 0945 0902 should instead be set to
>     *when(follows-only-C-or-CN)*.
>
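
The comparison between sequence contexts and leading code point contexts can also be done mechanically. The following is a minimal Python sketch, assuming the contexts are held in a plain dictionary keyed by code point or sequence; the table is transcribed from the listing above (before correction), and `mismatched_sequences` is a hypothetical helper name.

```python
# Code point contexts as listed above for the 20181212a XML (before correction).
CONTEXTS = {
    "0901": "when(follows-only-V-or-C-or-N-or-M)",
    "0905": "not-when(preceded-by-H)",
    "0906": "not-when(preceded-by-H)",
    "090D": "not-when(preceded-by-H)",
    "090F": "not-when(preceded-by-H)",
    "0911": "not-when(preceded-by-H)",
    "093E": "when(follows-only-C-or-CN)",
    "0945": "when(follows-only-C-or-CN)",
    "0945 0902": "when(follows-only-V-or-C-or-N-or-M)",
    "0949": "when(follows-only-C-or-CN)",
    "0972": "not-when(preceded-by-H)",
}


def mismatched_sequences(contexts):
    """List sequences whose own context differs from their leading code point's."""
    result = []
    for key, ctx in contexts.items():
        code_points = key.split()
        leading_ctx = contexts.get(code_points[0])
        if len(code_points) > 1 and leading_ctx != ctx:
            result.append((key, ctx, leading_ctx))
    return result


for seq, seq_ctx, lead_ctx in mismatched_sequences(CONTEXTS):
    print(f"{seq}: sequence has {seq_ctx}, leading code point has {lead_ctx}")
```

The check only reports that the two contexts differ; deciding which one is more permissive still requires knowledge of the rule definitions.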
>     No sequence in any variant set contains any of the variants as
>     subsets, therefore none are “effective null variants”.
>
>     All variant sequences end in 0901 or 0902 each of which may be
>     followed by the same collection of code points under the WLE and
>     context rules of the LGR. Therefore, the LHS contexts (after
>     correction) limit the occurrence of variants to the same contexts
>     as are permitted for the sequences.
>
>     Having a context rule on the variant that matches the code point
>     contexts for both members of the variant set makes that
>     restriction explicit; formally it would be redundant, as long as
>     no permuted variant can create an adjacent (before or after)
>     context that would violate the /code point/ context for the
>     sequence after variant substitution.
>
>     For example, no variant set in the LGR results in a trailing
>     Halant (094D) so “not-when(preceded-by-H)” isn’t a /variant/
>     context that can be triggered if the /code point/ contexts on
>     either variant sequence are already “not-when(preceded-by-H)”.
>
>     All the variant sets involving consonants are cross-script
>     variants, so the /variant/ context “when(follows-only-C-or-CN)”
>     cannot be invalidated by a variant substitution immediately
>     prior, as long as both variants in the pair have /code point/
>     contexts of “when(follows-only-C-or-CN)”.
>
>     The one exception is Variant Set 1, where the two sequences have
>     *different /code point/ contexts*. In that case, the variant
>     context is needed and it must be set to equal the lowest common
>     /code point/ context for the two variants of that set.
>
>     We conclude that *the variant contexts for sets 2-5 can be removed
>     as redundant* (and the code point context for one sequence as well
>     as the variant context for variant set 1 should be *corrected* per above).
>
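
The redundancy conclusion can be illustrated with a deliberately simplified test: a variant context is treated as redundant when both members of the set already carry that same code point context. This Python sketch uses the corrected contexts for Variant Set 1 and ignores the additional condition that no variant substitution may change the adjacent context; all names (`MEMBER_CONTEXT`, `VARIANT_SETS`, `redundant_sets`) are hypothetical.

```python
C_CN = "when(follows-only-C-or-CN)"
VCNM = "when(follows-only-V-or-C-or-N-or-M)"
NOT_H = "not-when(preceded-by-H)"

# Code point context of each variant member (that of the leading code point,
# or of the whole sequence where it has its own), with the correction to
# 0945 0902 applied.
MEMBER_CONTEXT = {
    "0901": VCNM, "0945 0902": C_CN,
    "093E 0901": C_CN, "0949 0902": C_CN,
    "0905 0901": NOT_H, "0972 0902": NOT_H,
    "090D 0902": NOT_H, "090F 0901": NOT_H,
    "0906 0901": NOT_H, "0911 0902": NOT_H,
}

# Variant sets with their variant contexts; set 1 uses the corrected context.
VARIANT_SETS = {
    1: ("0901", "0945 0902", C_CN),
    2: ("093E 0901", "0949 0902", C_CN),
    3: ("0905 0901", "0972 0902", NOT_H),
    4: ("090D 0902", "090F 0901", NOT_H),
    5: ("0906 0901", "0911 0902", NOT_H),
}


def redundant_sets(variant_sets, member_context):
    """Return set numbers whose variant context equals both members' contexts."""
    return [
        number
        for number, (a, b, ctx) in sorted(variant_sets.items())
        if member_context[a] == member_context[b] == ctx
    ]


print(redundant_sets(VARIANT_SETS, MEMBER_CONTEXT))
```

Under these assumptions the check singles out sets 2-5, while set 1 keeps its variant context because its two members have different code point contexts.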
> ------------------------------------------------------------------------
>
-- 
Regards,
Akshat Joshi
C-DAC GIST

