[Neobrahmigp] Tamil LGR v2.10 - 20190220

Sarmad Hussain sarmad.hussain at icann.org
Mon Mar 4 03:24:03 UTC 2019


Dear Shanmugam and NBGP members,

IP has suggested some minor edits to consider before final submission of the proposal, listed in their message below.  

As next steps, once these edits are incorporated, the proposal will be published at the proposals’ webpage <https://www.icann.org/resources/pages/lgr-proposals-2015-12-01-en>  as the final version after public comment, and then IP will undertake its evaluation.

Regards,
Sarmad

  _____  

TO: NeoBrahmi Generation Panel
From: Integration Panel

We reviewed the Tamil proposal dated 2019-02-20 and found a number of places where the text could be edited for further clarity and more consistency across the Root Zone LGR. Some of these suggestions have higher priority than others, but none are absolutely required. We leave that decision to the GP, but note that we plan to edit the version of the XML to be published with LGR-3 in line with the suggestions here.

There are two items that are MUST FIX.

- Integration Panel

DETAILED FEEDBACK

DOCX


In reviewing the new text it was noted that the LGR uses commas (,) to separate code points in sequences. This is unfortunate, because that use implies that the code points are a list of individual code points.

(1) please remove commas between code points that are part of the same sequence (many places in the document)

(2) please use a smaller point size in some cases so that sequences do not break across two lines (e.g. table 18)

(3) in table 24 change the header cell "Code points which cannot co-occur within a label" to "Sequences which cannot co-occur within a label". The same change is required in the table name.

(4) at the start of Section 7.1, there are some characters ("hri") that should be part of the link to section 6.1.3. Something went wrong.

(5) MUST FIX:  Section 6.2: 

In both the paragraph and Table 19, the code point value for TAMIL LETTER AI should be 0B90 and not 0B9C.

XML

(1) The XML is missing some comments for the new rule and action.

A comment should be added to the <rule> element as follows:

     <rule name="no-mix-sri-shri" comment="Section 7: WLE 4: Two 
        representations of 'Shri' cannot be mixed in the same label" >

A comment should be added to the <action> element as follows:

    <action disp="invalid" match="no-mix-sri-shri" comment="do not 
        mix two representations of 'Shri' in the same label" />

(Note: the phrasing of the comment text is based on the model of the Arabic LGR, ensuring consistency)

 

(2) In reviewing the XML section on "Character Classes" we noted some additional small edits:

<p>Virama: All consonants contain an implicit vowel (a). A special
sign is needed to denote that this implicit vowel is stripped off. This is known
as the Pulli and encoded as U+0BCD ( ் ) TAMIL SIGN VIRAMA. The virama thus 
joins two adjacent consonants. In Tamil, there are only two cases where this 
forms conjuncts. More details in Section 3.3.2, "Virama/Pulli" of the [Proposal].</p>

and

<p>Visarga: The Visarga (or Aytham) is used in Tamil to represent a sound very close to /ḵ/.
More details in Section 3.3.4, "Visarga/Aytham" of the [Proposal].</p>

This removes double names so that the item in front of each ":" matches the name of the character class, as promised at the head of the section. It also fixes two typos (extraneous "2" and misspelled "adjacent").

 

Similar edits have been made in other LGRs in the list of character classes, so this would increase consistency.

 

(3) MUST FIX: 

The rule "preceded-by-X" uses look-ahead to match a Visarga, so it should be named "precedes-X" instead. 

 

The naming issue arises because the way the rule is defined in Section 7 does not translate directly to a context rule.

 

Section 7, WLE 3, X cannot be preceded by X

 

The context should be "precedes-X" and the rule is applied to X as not-when="precedes-X". The convention is that context rules are named for the context, so that a reader can correctly understand the following line in the XML:

 

<char cp="0B83" not-when="precedes-X" ...> 

 

as meaning X (0B83) cannot precede X.
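
For illustration only, the renamed context could be sketched as below. This assumes the rule body consists solely of the anchor followed by a look-ahead for U+0B83; the body of the rule in the submitted XML is not reproduced here and may differ.

```xml
<!-- Sketch only: rule body assumed, not copied from the submitted XML -->
<rule name="precedes-X">
  <!-- the anchor stands for the code point that carries the not-when attribute -->
  <anchor/>
  <!-- look-ahead: the context matches when U+0B83 comes immediately after -->
  <look-ahead>
    <char cp="0B83"/>
  </look-ahead>
</rule>
```

With not-when="precedes-X" on 0B83, the code point is rejected whenever this context matches, that is, whenever it is immediately followed by another 0B83.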

------

Alternatively, it would be possible to change the rule to use <look-behind> and name it "follows-X". This leads to the equivalent:

 

<char cp="0B83" not-when="follows-X" ...>

 

However, in that case, the rule also has to be changed to move the location of the <anchor/> element to after the <look-behind> element.
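
As a rough sketch of this second alternative, assuming again that the rule only needs to test for an adjacent U+0B83 (the actual rule body may contain more), the look-behind form would place the anchor last:

```xml
<!-- Sketch only: rule body assumed, not copied from the submitted XML -->
<rule name="follows-X">
  <!-- look-behind: the context matches when U+0B83 comes immediately before -->
  <look-behind>
    <char cp="0B83"/>
  </look-behind>
  <!-- the anchor now sits after the look-behind element -->
  <anchor/>
</rule>
```

Either form disallows the same labels, namely those containing two consecutive 0B83; only the code point on which the violation is detected differs (the first of the pair with look-ahead, the second with look-behind).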

 

In the first alternative, the name of the context is closer to the wording of the rule in Section 7; in the second alternative, all contexts in the LGR would consistently use <look-behind>. The choice between these alternatives is a matter of preference left to the GP.

 

  _____  




