[Neobrahmigp] Singleton/Few cross-script variant code points

Fri Jul 6 13:43:29 UTC 2018

Yes, I agree too. I felt the same yesterday when this was brought to our
attention.

On Fri, Jul 6, 2018, 6:51 PM Dr. Ajay DATA <ajay at data.in> wrote:

> This is as discussed. We shall review all the feedback, discuss this on
> today's call and finalise the view.
>
> Thank you.
>
> On 6 July 2018 15:33:38 GMT+05:30, Akshat Joshi <akshatj at cdac.in> wrote:
>>
>> Dear All,
>>
>> Here is a brief discussion about this issue.
>>
>> By and large, we have been including in the cross-script variant analysis
>> all cross-script variants which (alone or in any combination) could
>> stand as a valid character/character sequence.
>>
>> Recently the IP has suggested that we may want to reconsider this where
>> only a small number of code points are involved, as that indicates a very
>> small overlap between the scripts.
>>
>> There are two kinds of such cases:
>>
>>     1. Cross-script variant sets made up of dependent characters *ONLY*
>>
>>     2. Cross-script variant sets which do include non-dependent
>> characters/sequences
>>
>> Let us take a look at each of them individually:
>>
>> *1. Cross-script variants made up of dependent characters only:*
>>
>> This is the case in Example 2 given by Pitinan:
>>
>> *Telugu ం (0C02) and Malayalam ം (0D02) are NOT variant code points, as
>> they are combining marks and cannot form variant labels. The same applies
>> to Telugu ః (0C03) and Malayalam ഃ (0D03).*
>>
>> If dependent characters (e.g. Vowel Signs, Anusvara, Visarga,
>> Chandrabindu etc.) are the *ONLY* cross-script variants between
>> the scripts involved, it is safe to assume that *NONE* of the labels
>> created entirely from the cross-script variants would be valid. Hence we
>> did not include them in the cross-script variants of the script pair.
>> However, if there is even one non-dependent character (e.g. Consonant,
>> Vowel etc.) among the cross-script variants, then all such cases
>> must be included in the cross-script variant table.
>>
>> *2. Cross-script variants which do include non-dependent
>> characters/sequences:*
>>
>> This is the case in Example 1 given by Pitinan:
>>
>> *Oriya ଠ (0B20) and Malayalam ഠ (0D20) are variant code points.*
>>
>> As both code points in this pair are non-dependent, even the
>> smallest instances (a single code point), i.e. ଠ (Oriya) and ഠ (Malayalam),
>> are valid labels which look exactly alike. If we concatenate instances of
>> the same variant character with one another we get, in theory, an infinite
>> number of labels, as given below:
>>
>> ଠଠ - ഠഠ
>>
>> ଠଠଠ - ഠഠഠ
>>
>> ଠଠଠଠ - ഠഠഠഠ
>>
>> ଠଠଠଠଠ - ഠഠഠഠഠ
>>
>> .....
>>
>> All of these look exactly alike, belong to entirely different scripts and
>> could exist as independent labels if not included in the cross-script
>> variant set. This shows that although the number of characters appears
>> small, they can generate a large number of labels. The important thing to
>> note here is the presence of at least one non-dependent character in the
>> cross-script variant set.
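
To make the growth described above concrete, here is a minimal Python sketch that builds the look-alike label pairs by repeating each code point. The function name `repeated_label_pairs` and the constants are hypothetical and exist only for illustration; they are not part of any NBGP tooling.

```python
# Code points taken from Example 1 in this thread.
ORIYA_TTHA = "\u0B20"      # ଠ ORIYA LETTER TTHA
MALAYALAM_TTHA = "\u0D20"  # ഠ MALAYALAM LETTER TTHA


def repeated_label_pairs(a, b, max_len):
    """Return (label_a, label_b) pairs for lengths 1..max_len.

    Each pair repeats one code point n times, mirroring the list above.
    """
    return [(a * n, b * n) for n in range(1, max_len + 1)]


for left, right in repeated_label_pairs(ORIYA_TTHA, MALAYALAM_TTHA, 5):
    # The two labels differ in code points but are visually alike.
    print(f"{left} - {right}")
```

Every extra length adds another confusable pair, so a single non-dependent variant code point is enough to produce as many pairs as the maximum label length allows.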
>>
>> Hence, it is proposed that:
>>
>> If, for any two given scripts, all the potential cross-script variants
>> consist of dependent characters *ONLY* (e.g. Vowel Signs, Anusvara,
>> Visarga, Chandrabindu etc.), then that entire set can be ignored and no
>> cross-script variants need be proposed between those two scripts.
>>
>> If, for any two given scripts, there is *AT LEAST ONE* non-dependent
>> cross-script variant character/sequence (e.g. Consonant, Vowel etc.),
>> all the potential cross-script variants should be considered and proposed
>> between the two scripts.
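
The two-part rule proposed above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: "dependent" is approximated by the Unicode general categories for combining marks (Mn, Mc, Me), a sequence is treated as dependent when its first character is a combining mark, and the names `is_dependent` and `variants_to_propose` are hypothetical.

```python
import unicodedata


def is_dependent(seq):
    """Assumption: a sequence is dependent if it starts with a combining mark."""
    return unicodedata.category(seq[0]).startswith("M")


def variants_to_propose(candidate_sets):
    """Apply the proposed rule to the candidate variant sets of a script group.

    candidate_sets is a list of tuples, each holding the look-alike
    character/sequence from every script involved.
    """
    has_non_dependent = any(
        not is_dependent(seq) for group in candidate_sets for seq in group
    )
    # At least one non-dependent member: propose every candidate set.
    # Otherwise: ignore the entire group of candidates.
    return list(candidate_sets) if has_non_dependent else []


# Candidate sets taken from the examples and tables in this thread.
TELUGU_MALAYALAM = [("\u0C02", "\u0D02"), ("\u0C03", "\u0D03")]
ORIYA_MALAYALAM = [("\u0B20", "\u0D20")]
TELUGU_KANNADA_SINHALA = [
    ("\u0C02", "\u0C82", "\u0D82"),
    ("\u0C03", "\u0C83", "\u0D83"),
    ("\u0C30", "\u0CB0", "\u0DBB"),
]

print(len(variants_to_propose(TELUGU_MALAYALAM)))        # combining marks only
print(len(variants_to_propose(ORIYA_MALAYALAM)))         # one consonant
print(len(variants_to_propose(TELUGU_KANNADA_SINHALA)))  # marks plus a consonant
```

Under these assumptions the Telugu/Malayalam candidates are dropped entirely, while the Oriya/Malayalam and Telugu/Kannada/Sinhala sets are retained in full because each contains a consonant.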
>>
>> Regards,
>>
>> Akshat
>>
>> On 06-07-2018 12:20, Pitinan Kooarmornpatana wrote:
>>
>> Dear NBGP members,
>>
>>
>>
>> Kindly let me draw your attention to the issue of cross-script variant
>> code points where there is only a single code point or only a few
>> code points.
>>
>>
>>
>> 1. *Background*
>>
>> Currently NBGP proposals include all cross-script variant code points
>> that can form well-formed cross-script variant labels, without
>> considering how many cross-script variant code points there are between
>> the two scripts.
>>
>>
>>
>> *Example 1:* Oriya ଠ (0B20) and Malayalam ഠ (0D20) *are* variant code
>> points.
>>
>> They are consonants and can form cross-script variant labels such as
>> ଠଠଠ (0B20 0B20 0B20) and ഠഠഠ (0D20 0D20 0D20).
>>
>> Oriya       Malayalam
>> ଠ (0B20)    ഠ (0D20)
>>
>>
>>
>> *Example 2:* Telugu ం (0C02) and Malayalam ം (0D02) *are NOT* variant
>> code points, as they are combining marks and cannot form variant labels.
>> The same applies to Telugu ః (0C03) and Malayalam ഃ (0D03).
>>
>> Telugu      Malayalam
>> ం (0C02)    ം (0D02)
>> ః (0C03)    ഃ (0D03)
>>
>>
>>
>> 2. *IP Feedback*
>>
>> With only a single consonant (possibly plus two combining marks), the
>> overlap between scripts appears rather limited (the case of *Example 1*
>> above). The IP would recommend dropping the variants. This feedback
>> applies to Telugu, Kannada, Sinhala, Oriya and Malayalam. However, the GP
>> decision will affect all NBGP proposals.
>>
>>
>>
>> The IP suggests dropping the following variant sets:
>>
>> Telugu      Kannada     Sinhala
>> ం (0C02)    ಂ (0C82)    ං (0D82)
>> ః (0C03)    ಃ (0C83)    ඃ (0D83)
>> ర (0C30)    ರ (0CB0)    ර (0DBB)
>>
>> Oriya       Malayalam
>> ଠ (0B20)    ഠ (0D20)
>>
>>
>>
>> 3. *Options*
>>
>> *OPTION 1:* Do nothing.
>>
>> *OPTION 2:* Drop the suggested variant sets.
>>
>>
>>
>> Both options are valid. The final decision rests with the NBGP. Whichever
>> option is selected, the proposals will be published for a public comment
>> period of 40 days, during which the community and experts will also have
>> a chance to comment. After the public comment period has ended, the NBGP
>> will consider all feedback and finalize the proposals accordingly.
>>
>>
>>
>>
>>
>> We’d like to request the NBGP to consider this issue prior to the
>> NBGP-Sinhala call this evening and let’s aim to finalize the option during
>> the call.
>>
>>
>>
>> Regards,
>>
>> Pitinan
>>
>>
>> _______________________________________________
>> Neobrahmigp mailing list
>> Neobrahmigp at icann.org
>> https://mm.icann.org/mailman/listinfo/neobrahmigp
>>
>>
>> --
>> Regards,
>> Akshat Joshi
>> C-DAC GIST
>>
>>
>>
>>
>>
>>
>