[Neobrahmigp] Singleton/Few cross-script variant code points

Dr. Ajay DATA ajay at data.in
Fri Jul 6 13:20:19 UTC 2018


This is as discussed. We shall review all the feedback, discuss this on today's call, and finalise the view.

Thank you.

On 6 July 2018 15:33:38 GMT+05:30, Akshat Joshi <akshatj at cdac.in> wrote:
>Dear All,
>
>Here is a brief discussion about this issue.
>
>By and large, we have been including in the cross-script variant 
>analysis all the cross-script variants which (or any combination of 
>which) could stand alone as a valid character/character sequence.
>
>Recently the IP has suggested that we may want to reconsider this where 
>only a small number of code points are involved, as that is indicative 
>of a very small overlap between the scripts.
>
>There are two kinds of such cases:
>
>    1. Cross-script variant sets made up of dependent characters *ONLY*
>
>    2. Cross-script variant sets which do include non-dependent 
>characters/sequences
>
>Let us take a look at each of them individually:
>
>*1. Cross-script variants made up of dependent characters only:*
>
>This is the case given in Example 2 by Pitinan:
>
>/Telugu ం (0C02) and Malayalam ം (0D02) are NOT variant code points, as 
>they are combining marks and cannot form variant labels. The same 
>applies to Telugu ః (0C03) and Malayalam ഃ (0D03)./
>
>If dependent characters (e.g. Vowel Signs, Anusvara, Visarga, 
>Chandrabindu etc.) are the *ONLY* cross-script variants between the 
>scripts involved, it is safe to assume that *NONE* of the labels 
>created entirely from the cross-script variants would be valid, since 
>dependent characters cannot form a label on their own. Hence we did 
>not include them in the cross-script variants of the script pair. 
>However, if there is even one non-dependent character (e.g. Consonant, 
>Vowel etc.) among the cross-script variants, then all such cases 
>should mandatorily be included in the cross-script variant table.
>
>*2. Cross-script variants which do include non-dependent 
>characters/sequences:*
>
>This is the case given in Example 1 by Pitinan:
>
>/Oriya ଠ (0B20) and Malayalam ഠ (0D20) are variant code points. /
>
>As both the code points involved in this pair are non-dependent, even 
>the smallest instances (a single code point), i.e. ଠ (Oriya) and ഠ 
>(Malayalam), are valid labels which look exactly alike. If we 
>concatenate instances of the same variant characters with one another, 
>we get, in theory, an infinite number of labels, as given below:
>
>ଠଠ - ഠഠ
>
>ଠଠଠ - ഠഠഠ
>
>ଠଠଠଠ - ഠഠഠഠ
>
>ଠଠଠଠଠ - ഠഠഠഠഠ
>
>......
>
>All of these look exactly alike, belong to totally different scripts, 
>and can gain independent existence if not included in the cross-script 
>variant set. This indicates that although the number of characters 
>seems small, it can create a large number of labels. The important 
>thing to note here is the presence of at least one non-dependent 
>character in the cross-script variant set.
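The concatenation argument above can be illustrated with a minimal Python sketch. The function name `variant_label_pairs` and the length limit are hypothetical, chosen only for illustration; the two code points are the ones given in Example 1.

```python
# Illustrative sketch only: enumerate look-alike label pairs built by
# repeating one cross-script variant code point from each script.
ORIYA_TTHA = "\u0B20"      # Oriya code point from Example 1
MALAYALAM_TTHA = "\u0D20"  # Malayalam code point from Example 1


def variant_label_pairs(a, b, max_len):
    """Return (label_a, label_b) pairs for repetition lengths 1..max_len."""
    return [(a * n, b * n) for n in range(1, max_len + 1)]


pairs = variant_label_pairs(ORIYA_TTHA, MALAYALAM_TTHA, 5)
for oriya_label, malayalam_label in pairs:
    oriya_codes = " ".join(f"{ord(ch):04X}" for ch in oriya_label)
    malayalam_codes = " ".join(f"{ord(ch):04X}" for ch in malayalam_label)
    # The labels render alike but are distinct code point sequences.
    print(f"{oriya_label} ({oriya_codes}) - {malayalam_label} ({malayalam_codes})")
```

Every pair produced this way is a distinct pair of strings, so without a variant relation each label in a pair could be registered independently; the number of such pairs grows with the permitted label length.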
>
>Hence, it is proposed that:
>
>If, in any two given scripts, all the potential cross-script variants 
>consist of dependent characters (e.g. Vowel Signs, Anusvara, Visarga, 
>Chandrabindu etc.) *ONLY*, then that entire set can be ignored and no 
>cross-script variants be proposed between those two scripts.
>
>If, in any two given scripts, there is *AT LEAST ONE* non-dependent 
>(e.g. Consonant, Vowel etc) cross-script variant character/sequence 
>present, all the potential cross-script variants be considered and 
>proposed between the two scripts.
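The proposed rule could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the NBGP's tooling: it assumes a "dependent" character can be approximated by the Unicode general categories Mn and Mc (combining marks), and that a sequence counts as dependent when its first code point is a mark. The names `is_dependent` and `proposed_variant_sets` are hypothetical.

```python
import unicodedata

# Assumption: "dependent" characters are approximated by combining marks
# (general categories Mn and Mc). This is not an official NBGP definition.
DEPENDENT_CATEGORIES = {"Mn", "Mc"}


def is_dependent(entry):
    """True if the entry (a code point or sequence) starts with a mark."""
    return unicodedata.category(entry[0]) in DEPENDENT_CATEGORIES


def proposed_variant_sets(candidates):
    """Apply the proposed rule to the candidate sets of one script pair.

    Keep every candidate set if at least one entry is non-dependent;
    otherwise propose no cross-script variants for the pair.
    """
    has_non_dependent = any(
        not is_dependent(entry)
        for variant_set in candidates
        for entry in variant_set
    )
    return list(candidates) if has_non_dependent else []


# Candidate sets taken from the examples in this thread.
telugu_malayalam = [("\u0C02", "\u0D02"), ("\u0C03", "\u0D03")]
oriya_malayalam = [("\u0B20", "\u0D20")]
telugu_kannada = [("\u0C02", "\u0C82"), ("\u0C03", "\u0C83"), ("\u0C30", "\u0CB0")]

print(proposed_variant_sets(telugu_malayalam))
print(proposed_variant_sets(oriya_malayalam))
print(proposed_variant_sets(telugu_kannada))
```

Under these assumptions the Telugu-Malayalam pair yields no variants (marks only), while the Oriya-Malayalam and Telugu-Kannada pairs keep all their candidate sets, including the combining marks, because each contains at least one consonant.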
>
>Regards,
>
>Akshat
>
>
>On 06-07-2018 12:20, Pitinan Kooarmornpatana wrote:
>>
>> Dear NBGP members,
>>
>> Kindly let me draw your attention to the issue of cross-script variant 
>> code points where there is only a single code point or there are only 
>> a few code points.
>>
>>  1. *Background*
>>
>> Currently NBGP proposals include all cross-script variant code points 
>> which can form well-formed cross-script variant labels, without 
>> considering how many cross-script variant code points there are 
>> between two scripts.
>>
>> /Example 1:/ Oriya ଠ (0B20) and Malayalam ഠ (0D20) *are* variant code 
>> points.
>>
>> They are consonants and can form cross-script variant labels such as 
>> ଠଠଠ (0B20 0B20 0B20) and ഠഠഠ (0D20 0D20 0D20).
>>
>> Oriya    | Malayalam
>> ଠ (0B20) | ഠ (0D20)
>>
>>
>> /Example 2:/ Telugu ం (0C02) and Malayalam ം (0D02) *are* *NOT* variant 
>> code points, as they are combining marks and cannot form variant 
>> labels. The same applies to Telugu ః (0C03) and Malayalam ഃ (0D03).
>>
>> Telugu   | Malayalam
>> ం (0C02) | ം (0D02)
>> ః (0C03) | ഃ (0D03)
>>
>>  2. *IP Feedback*
>>
>> With only a single consonant (or a single consonant plus two combining 
>> marks), the overlap between scripts appears rather limited (the case of 
>> /Example 1/ above). The IP would recommend dropping the variants. This 
>> feedback applies to Telugu, Kannada, Sinhala, Oriya and Malayalam. 
>> However, the GP decision will affect all NBGP proposals.
>>
>> The IP suggests dropping the following variant sets:
>>
>> Telugu   | Kannada  | Sinhala
>> ం (0C02) | ಂ (0C82) | ං (0D82)
>> ః (0C03) | ಃ (0C83) | ඃ (0D83)
>> ర (0C30) | ರ (0CB0) | ර (0DBB)
>>
>> Oriya    | Malayalam
>> ଠ (0B20) | ഠ (0D20)
>>
>>  3. *OPTIONS*
>>
>> *OPTION 1: *Do nothing.
>>
>> *OPTION 2: *Drop the suggested variant sets.
>>
>> Both options are valid. The final decision rests with the NBGP. Whichever 
>> option is selected, the proposals will be published for a public comment 
>> period of 40 days. The community and experts will also have a chance 
>> to comment there. After the public comment period has ended, the 
>> NBGP will consider all feedback and finalize the proposals accordingly.
>>
>> We’d like to request the NBGP to consider this issue prior to the 
>> NBGP-Sinhala call this evening and let’s aim to finalize the option 
>> during the call.
>>
>> Regards,
>>
>> Pitinan
>>
>>
>>
>> _______________________________________________ Neobrahmigp mailing 
>> list Neobrahmigp at icann.org 
>> https://mm.icann.org/mailman/listinfo/neobrahmigp
>
>-- Regards, Akshat Joshi C-DAC GIST
