[Neobrahmigp] IP Review of the Proposal for Devanagari LGR 20181212a

Sarmad Hussain sarmad.hussain at icann.org
Tue Jan 22 02:16:23 UTC 2019


Dear Akshat, NBGP members,

 

Please find attached the review of the Devanagari LGR proposal by the IP members.

 

An annotated file is attached with IP comments with additional comments below.

 

Please let us know if you have any queries.  We look forward to your final review.

 

 

Regards,
Sarmad

 

To: Neo-Brahmi Generation Panel

From: Integration Panel

 

We have reviewed your latest LGR version dated 20181212a and noticed several discrepancies between the XML file and the DOCx file. The suggested changes to the DOCx file are explained here, and the recommended wording is found in the attached document "LGR-Proposal_Devanagari_20181212a_IP_Review-2.docx".

 

(1) The recent update added 3 additional in-script variant sets:

 

*	ॲं  U+0972 U+0902 - अँ U+0905 U+0901 
*	एँ U+090F U+0901 - ऍं U+090D U+0902 
*	ऑं U+0911 U+0902 - आँ U+0906 U+0901 

 

 These variants form a third class of variants where candra vowels plus anusvara mimic candrabindu. This is missing from the XML <description> which reads:

They fall into two broad categories:

*	Vowel/Vowel sign followed by Nukta
*	Unique Vowels and Vowel Signs required for Kashmiri

This should be changed in the XML file to something like:

 

They fall into three broad categories:

*	Vowel/Vowel sign followed by Nukta
*	Unique Vowels and Vowel Signs required for Kashmiri
*	Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara
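For reference, the three sequence pairs listed under (1) can be spelled out from their code points. The following Python sketch uses only the standard unicodedata module; the names NEW_VARIANT_SETS and describe are illustrative and are not taken from the LGR files.

```python
import unicodedata

# Code point sequences copied from the three variant sets in item (1).
NEW_VARIANT_SETS = [
    ((0x0972, 0x0902), (0x0905, 0x0901)),
    ((0x090F, 0x0901), (0x090D, 0x0902)),
    ((0x0911, 0x0902), (0x0906, 0x0901)),
]

def describe(seq):
    """Return 'U+XXXX NAME + U+YYYY NAME' for a code point sequence."""
    return " + ".join(f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in seq)

for left, right in NEW_VARIANT_SETS:
    print(describe(left), "<-->", describe(right))
```

In each pair, one member ends in Candrabindu (0901) and the other in Anusvara (0902), which is the property the proposed third category describes.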

 

(2) Defining variant targets that are sequences requires the formal listing of sequences as repertoire elements. These sequences do not "inherit" the same context rules as their constituent code points, so care must be taken to supply the necessary matching context rules explicitly, lest a sequence be used to unintentionally override a restriction.

 

In the document, context rules are listed only generically in section 7; we think that, for purposes of clarity, they should also be mentioned in the discussion of variants (suggested text has been supplied).

 

(3) Variant mappings may need additional context rules, and these have been introduced in Sections 6.1.1 and 6.4.1 in the document. However, it is important to note that RFC 7940 defines a variant as a tuple consisting of both the mapping and the context. Therefore, symmetric mappings must have formally matching context rules, even if logically such contexts never occur.

 

The XML file already had matching mappings, but we feel the text in the DOCx file should contain an explanation of this process. Suggested text has been supplied.
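The symmetry requirement in (3) can be checked mechanically. Below is a minimal sketch, assuming each variant is held as a (source, target, context) tuple; the Variant type and the asymmetric function are hypothetical helpers, not part of any LGR tool. The sample data is Variant Set 2 from the analysis further down.

```python
from collections import namedtuple

# A variant treated as one tuple: the mapping plus its context.
Variant = namedtuple("Variant", "source target context")

def asymmetric(variants):
    """Return variants whose reverse mapping with the same context is missing."""
    present = set(variants)
    return [v for v in variants
            if Variant(v.target, v.source, v.context) not in present]

# Variant Set 2, listed in both directions with matching contexts.
ok = [
    Variant("093E 0901", "0949 0902", "when(follows-only-C-or-CN)"),
    Variant("0949 0902", "093E 0901", "when(follows-only-C-or-CN)"),
]
# The same pair with the context dropped from the reverse mapping.
bad = [ok[0], Variant("0949 0902", "093E 0901", None)]

print(asymmetric(ok))   # nothing to report
print(asymmetric(bad))  # both directions are flagged
```

Comparing whole tuples rather than just source and target is what makes a missing context on the reverse mapping show up as an error.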

 

(4) In one case, the context rules on one of the sequences and two variant mappings in the XML were incorrect. Instead of when(follows-only-V-or-C-or-N-or-M), they should be when(follows-only-C-or-CN).

 

This needs to be fixed in the XML file. (The discussion in the DOCx file is correct, except for the missing context on the reverse mapping; see issue (3).)

 

(5) In some cases the XML contains context mappings on variants that, while symmetric, are in fact redundant: not-when(preceded-by-H). Not only do both code points/sequences that form source and target of the variant relation have the same context required on the code point level, but there are no other variant relations that could lead to the introduction of a Halant in the label during variant processing.

 

We suggest removing these redundant variant contexts, because it simplifies the LGR and makes it easier to spot the cases where a context is required and makes a difference. (See below for more detail).

 

(6) There are some more minor editorial issues (commas, typos, usage) that are noted in the attached document. Individually they are not of high priority, but since the document has to be touched anyway, they are worth attending to.

 

(7) As always, with any suggestions, we request that the GP consider the issue and either accept these recommendations or make other appropriate changes.

 

(8) The label files were processed against the 20181212a XML file and achieved matching results with our tools.

 

(9) Please find attached below an excerpt from a more detailed analysis of some of the variant sets as performed by the IP. The conclusions should all be summarized above already, but the fuller context may clarify the reasoning behind our recommendations.

 

We are looking forward to receiving an updated and finalized proposal soon, so we can complete the integration process.


--- Integration Panel.

  _____  


 Analysis of Candrabindu / Candra Vowel + Anusvara Variants


Problem statement:

 

In the 20181212a XML, the code point context rule for one of the sequences is more permissive than that of one of its constituent singletons (0945), which seems suspect:


U+0945 | ॅ | Devanagari | DEVANAGARI VOWEL SIGN CANDRA E | matra | follows-only-C-or-CN | set 34

U+0945 U+0902 | ॅं | [Devanagari] | DEVANAGARI VOWEL SIGN CANDRA E + DEVANAGARI SIGN ANUSVARA | | follows-only-V-or-C-or-N-or-M | set 1

 

As written the sequence would also be allowed following any V or M or N, meaning that, for example, ....0945 0945... could occur as part of a label. That seems somehow implausible. In order to understand this situation a bit better, the IP wrote down all the variants (from section 6.4) and the context rules for both code points and variants as found in the XML. 
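The effect can be illustrated with a small Python sketch. It rests on two assumptions that are ours: the character classes are approximated by Unicode block ranges rather than taken from the class definitions in the LGR XML, and the two functions encode our reading of the rule names.

```python
# Simplified character classes (assumption: Unicode block ranges, not the
# exact class definitions used in the LGR XML).
C = {chr(cp) for cp in range(0x0915, 0x093A)}               # consonants
V = {chr(cp) for cp in range(0x0904, 0x0915)} | {"\u0972"}  # independent vowels
M = {chr(cp) for cp in range(0x093E, 0x094D)}               # vowel signs (matras)
N = {"\u093C"}                                              # nukta

def follows_only_c_or_cn(label, i):
    """True if position i is preceded by C, or by C followed by N."""
    if i >= 1 and label[i - 1] in C:
        return True
    return i >= 2 and label[i - 1] in N and label[i - 2] in C

def follows_only_v_or_c_or_n_or_m(label, i):
    """True if position i is preceded by a vowel, consonant, nukta or matra."""
    return i >= 1 and label[i - 1] in (V | C | N | M)

# KA, CANDRA E, then the sequence CANDRA E + ANUSVARA starting at index 2.
label = "\u0915\u0945\u0945\u0902"
print(follows_only_v_or_c_or_n_or_m(label, 2))  # permissive rule on the sequence
print(follows_only_c_or_cn(label, 2))           # rule on the singleton 0945
```

Under these assumptions the permissive rule accepts the sequence directly after another 0945, while the rule that applies to the singleton 0945 rejects it in the same position.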

 

This led to the following analysis and recommendation:

The following mappings were found to have been defined in the XML as blocked variants:


Variant Set 1: 0901 <--> 0945 0902 : when(follows-only-V-or-C-or-N-or-M)
Variant Set 2: 093E 0901 <--> 0949 0902 : when(follows-only-C-or-CN)
Variant Set 3: 0905 0901 <--> 0972 0902 : not-when(preceded-by-H)
Variant Set 4: 090D 0902 <--> 090F 0901 : not-when(preceded-by-H)
Variant Set 5: 0906 0901 <--> 0911 0902 : not-when(preceded-by-H)

 All of these Code point Variant Sets are trivially transitive (symmetric pair). Note the added context rule for each. The Variant contexts for all Variant Sets are LHS (left-hand-side).

Here are the LHS code point contexts that apply to the leading code points or the whole sequence (if different) in either variant in these sets:


0901 | Leading | when(follows-only-V-or-C-or-N-or-M)
0905 | Leading | not-when(preceded-by-H)
0906 | Leading | not-when(preceded-by-H)
090D | Leading | not-when(preceded-by-H)
090F | Leading | not-when(preceded-by-H)
0911 | Leading | not-when(preceded-by-H)
093E | Leading | when(follows-only-C-or-CN)
0945 | Leading | when(follows-only-C-or-CN)
0945 0902 | Sequence | when(follows-only-V-or-C-or-N-or-M)
0949 | Leading | when(follows-only-C-or-CN)
0972 | Leading | not-when(preceded-by-H)

We see that 0945 / 0945 0902 is the only case where the sequence has a context different from the one applied to its leading code point as a singleton. This appears to be an error. It occurs in Variant Set 1, where the two variants 0901 / 0945 0902 have different code point contexts from each other, with the code point context for 0901 being a superset of the context for 0945 (leading).

We conclude that the intent had been to restrict the variant to the lowest common context and therefore the variant context condition for variant set 1 as well as the code point context for sequence 0945 0902 should instead be set to when(follows-only-C-or-CN).
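The choice of the "lowest common context" can be sketched in Python as follows. The subset relation between the two rules is the one stated in the analysis above; the NARROWER_THAN table and the function name are illustrative only and do not correspond to anything in the XML.

```python
# Subset relation between context rules, as stated in the analysis: every
# context allowed by the key is also allowed by the value.
NARROWER_THAN = {
    "when(follows-only-C-or-CN)": "when(follows-only-V-or-C-or-N-or-M)",
}

def lowest_common_context(a, b):
    """Return the more restrictive of two context rules, if comparable."""
    if a == b:
        return a
    if NARROWER_THAN.get(a) == b:
        return a
    if NARROWER_THAN.get(b) == a:
        return b
    raise ValueError(f"no known ordering between {a!r} and {b!r}")

# Variant Set 1: code point context of 0901 versus that of leading 0945.
ctx_0901 = "when(follows-only-V-or-C-or-N-or-M)"
ctx_0945 = "when(follows-only-C-or-CN)"
print(lowest_common_context(ctx_0901, ctx_0945))
```

Rules with no known ordering raise an error rather than guessing, since the analysis only establishes the relation between these two rules.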

No sequence in any variant set contains any of the variants as subsets, therefore none are “effective null variants”.

All variant sequences end in 0901 or 0902 each of which may be followed by the same collection of code points under the WLE and context rules of the LGR. Therefore, the LHS contexts (after correction) limit the occurrence of variants to the same contexts as are permitted for the sequences.

Having a context rule on the variant that matches the code point contexts for both members of the variant set makes that restriction explicit; formally it would be redundant, as long as no permuted variant can create an adjacent (before or after) context that would violate the code point context for the sequence after variant substitution.

For example, no variant set in the LGR results in a trailing Halant (094D) so “not-when(preceded-by-H)” isn’t a variant context that can be triggered if the code point contexts on either variant sequence are already “not-when(preceded-by-H)”. 

All the variant sets involving consonants are cross-script variants, so a variant substitution immediately prior cannot cause the variant context “when(follows-only-C-or-CN)” to no longer be satisfied, as long as both variants in the pair have code point contexts of “when(follows-only-C-or-CN)”.

The one exception is Variant Set 1, where the two sequences have different code point contexts. In that case, the variant context is needed and it must be set to equal the lowest common code point context for the two variants of that set.

We conclude that the variant contexts for sets 2-5 can be removed as redundant (and the code point context for one sequence as well as variant context for variant set 1 should be corrected per above).
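The first half of this redundancy argument can be sketched in Python. The data is copied from the listings above, with Set 1 using the corrected variant context; the function and variable names are hypothetical. The sketch only tests whether both members already carry the variant context at the code point level. The second condition, that no variant substitution can introduce a violating neighbor, is argued in the text and is not checked here.

```python
C_OR_CN = "when(follows-only-C-or-CN)"
V_C_N_M = "when(follows-only-V-or-C-or-N-or-M)"
NOT_H = "not-when(preceded-by-H)"

# Code point contexts of the leading code points, as listed in the analysis.
LEADING_CONTEXT = {
    "0901": V_C_N_M, "0905": NOT_H, "0906": NOT_H, "090D": NOT_H,
    "090F": NOT_H, "0911": NOT_H, "093E": C_OR_CN, "0945": C_OR_CN,
    "0949": C_OR_CN, "0972": NOT_H,
}

# Variant sets with their variant context; Set 1 uses the corrected value.
VARIANT_SETS = {
    1: ("0901", "0945 0902", C_OR_CN),
    2: ("093E 0901", "0949 0902", C_OR_CN),
    3: ("0905 0901", "0972 0902", NOT_H),
    4: ("090D 0902", "090F 0901", NOT_H),
    5: ("0906 0901", "0911 0902", NOT_H),
}

def is_redundant(left, right, variant_context):
    """True if both members already carry the variant context themselves."""
    def lead(seq):
        return LEADING_CONTEXT[seq.split()[0]]
    return lead(left) == lead(right) == variant_context

for number, (left, right, ctx) in VARIANT_SETS.items():
    print(number, "redundant" if is_redundant(left, right, ctx) else "needed")
```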

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: LGR-Proposal_Devanagari_20181212a_IP_Review-2.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 767939 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/neobrahmigp/attachments/20190122/902ed0bc/LGR-Proposal_Devanagari_20181212a_IP_Review-2-0001.docx>