[Neobrahmigp] Singleton/Few cross-script variant code points
Dr. Ajay DATA
ajay at data.in
Fri Jul 6 13:20:19 UTC 2018
This is as discussed. We shall review all the feedback, discuss it on today's call, and finalise the view.
Thank you.
On 6 July 2018 15:33:38 GMT+05:30, Akshat Joshi <akshatj at cdac.in> wrote:
>Dear All,
>
>Here is a brief discussion about this issue.
>
>By and large, we have been including in the cross-script variant
>analysis all the cross-script variants which (or any combination of
>which) could stand alone as a valid character/character sequence.
>
>Recently the IP has suggested that we may want to reconsider this where
>only a small number of code points are involved, as that indicates a
>very small overlap between the scripts.
>
>There are two kinds of such cases:
>
> 1. Cross-script variant sets made up of dependent characters *ONLY*
>
> 2. Cross-script variant sets which do include non-dependent
>characters/sequences
>
>Let us take a look at each of them individually:
>
>*1. Cross-script variants made up of dependent characters only:*
>
>This is the case given in Example 2 by Pitinan:
>
>/Telugu ం (0C02) and Malayalam ം (0D02) are NOT variant code points, as
>they are combining marks and cannot form variant labels. The same
>applies to Telugu ః (0C03) and Malayalam ഃ (0D03)./
>
>If dependent characters (e.g. Vowel Signs, Anusvara, Visarga,
>Chandrabindu etc.) are the *ONLY* cross-script variants between
>the scripts involved, it is safe to assume that *NONE* of the labels
>created entirely from those cross-script variants would be valid.
>Hence we did not include them in the cross-script variants of the
>script pair.
>
>However, if there is even one non-dependent character (e.g. Consonant,
>Vowel etc.) among the cross-script variants, then all such cases
>should mandatorily be included in the cross-script variant table.
>
>*2. Cross-script variants which do include non-dependent
>characters/sequences:*
>
>This is the case given in Example 1 by Pitinan:
>
>/Oriya ଠ (0B20) and Malayalam ഠ (0D20) are variant code points./
>
>As both code points in this pair are non-dependent, even the smallest
>instances (a single code point each), i.e. ଠ (Oriya) and ഠ
>(Malayalam), are valid labels which look exactly alike. If we
>concatenate instances of the same variant character with one another,
>we get, in theory, an infinite number of labels, as given below:
>
>ଠଠ - ഠഠ
>
>ଠଠଠ - ഠഠഠ
>
>ଠଠଠଠ - ഠഠഠഠ
>
>ଠଠଠଠଠ - ഠഠഠഠഠ
>
>......
>
>All of these look exactly alike, belong to totally different scripts,
>and can gain independent existence if not included in the cross-script
>variant set. This indicates that although the number of characters is
>seemingly small, they can create a large number of labels. The important
>thing to note here is the presence of at least one non-dependent
>character in the cross-script variant set.
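
The concatenation argument above can be illustrated with a small sketch. It is only an illustration, not part of any NBGP tooling; the function name repeated_label_pairs is hypothetical. It builds the repeated-character labels for Oriya 0B20 and Malayalam 0D20 and confirms that each pair consists of two different code point sequences, even though they render alike.

```python
# Illustrative sketch only: names below are hypothetical, not NBGP tooling.
ORIYA_TTHA = "\u0B20"      # Oriya letter from Example 1
MALAYALAM_TTHA = "\u0D20"  # Malayalam letter from Example 1


def repeated_label_pairs(max_len):
    """Return (Oriya label, Malayalam label) pairs of length 1..max_len."""
    return [(ORIYA_TTHA * n, MALAYALAM_TTHA * n) for n in range(1, max_len + 1)]


if __name__ == "__main__":
    for oriya, malayalam in repeated_label_pairs(5):
        # The two labels look alike but are distinct code point sequences.
        print(oriya, "-", malayalam, "distinct:", oriya != malayalam)
```

Every additional length yields one more look-alike pair, which is why a single non-dependent variant character is enough to produce an unbounded number of labels.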
>
>Hence, it is proposed that:
>
>If, in any two given scripts, all the potential cross-script variants
>consist of dependent characters *ONLY* (e.g. Vowel Signs, Anusvara,
>Visarga, Chandrabindu etc.), then that entire set can be ignored and no
>cross-script variants be proposed between those two scripts.
>
>If, in any two given scripts, there is *AT LEAST ONE* non-dependent
>cross-script variant character/sequence (e.g. Consonant, Vowel etc.)
>present, all the potential cross-script variants be considered and
>proposed between the two scripts.
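
The proposed rule can be sketched as a simple check. This is a minimal sketch under stated assumptions, not the NBGP's actual procedure: it assumes that "dependent" can be approximated by a Unicode general category starting with "M" (combining marks), and that a sequence counts as non-dependent when its first code point is not a mark. The function names is_dependent and should_propose_variants are hypothetical.

```python
import unicodedata


def is_dependent(seq):
    """Assumption: a sequence is 'dependent' if it starts with a combining
    mark (Unicode general category Mn, Mc or Me)."""
    return unicodedata.category(seq[0]).startswith("M")


def should_propose_variants(variant_sets):
    """Apply the proposed rule to the variant sets of one script pair/group.

    variant_sets is a list of tuples of look-alike characters/sequences.
    Returns True if AT LEAST ONE member is non-dependent (all sets are then
    proposed), and False if every member is dependent (all are ignored).
    """
    return any(not is_dependent(seq) for vs in variant_sets for seq in vs)


# Example 2: Telugu/Malayalam, Anusvara and Visarga only.
telugu_malayalam = [("\u0C02", "\u0D02"), ("\u0C03", "\u0D03")]
# Example 1: Oriya/Malayalam, a single consonant.
oriya_malayalam = [("\u0B20", "\u0D20")]

if __name__ == "__main__":
    print(should_propose_variants(telugu_malayalam))
    print(should_propose_variants(oriya_malayalam))
```

Under these assumptions the Telugu/Malayalam sets would be ignored, while the Oriya/Malayalam set would be proposed, matching the two cases described above.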
>
>Regards,
>
>Akshat
>
>
>On 06-07-2018 12:20, Pitinan Kooarmornpatana wrote:
>>
>> Dear NBGP members,
>>
>> Kindly let me draw your attention to the issue of cross-script variant
>> code points where there is only a single code point or only a few
>> code points.
>>
>> 1. *Background*
>>
>> Currently NBGP proposals include all cross-script variant code points
>> that can form well-formed cross-script variant labels, without
>> considering how many cross-script variant code points there are
>> between two scripts.
>>
>> /Example 1:/ Oriya ଠ (0B20) and Malayalam ഠ (0D20) *are* variant code
>> points.
>>
>> They are consonants and can form cross-script variant labels such as
>> ଠଠଠ (0B20 0B20 0B20) and ഠഠഠ (0D20 0D20 0D20).
>>
>> Oriya        Malayalam
>> ଠ (0B20)     ഠ (0D20)
>>
>> /Example 2:/ Telugu ం (0C02) and Malayalam ം (0D02) *are* *NOT* variant
>> code points, as they are combining marks and cannot form variant
>> labels. The same applies to Telugu ః (0C03) and Malayalam ഃ (0D03).
>>
>> Telugu       Malayalam
>> ం (0C02)     ം (0D02)
>> ః (0C03)     ഃ (0D03)
>>
>> 2. *IP Feedback*
>>
>> With only a single consonant (optionally plus two combining marks), the
>> overlap between scripts appears rather limited (the case of
>> /Example 1/ above). The IP would recommend dropping the variants. This
>> feedback applies to Telugu, Kannada, Sinhala, Oriya and Malayalam.
>> However, the GP decision will affect all NBGP proposals.
>>
>> The IP suggests dropping the following variant sets:
>>
>> Telugu       Kannada      Sinhala
>> ం (0C02)     ಂ (0C82)     ං (0D82)
>> ః (0C03)     ಃ (0C83)     ඃ (0D83)
>> ర (0C30)     ರ (0CB0)     ර (0DBB)
>>
>> Oriya        Malayalam
>> ଠ (0B20)     ഠ (0D20)
>>
>> 3. *OPTIONS*
>>
>> *OPTION 1: *Do nothing.
>>
>> *OPTION 2: *Drop the suggested variant sets.
>>
>> Both options are valid. The final decision depends on NBGP. Whichever
>> option is selected, the proposals will be published for a public
>> comment period of 40 days. The community and experts will also have a
>> chance to comment there. After the public comment period has ended,
>> NBGP will consider all feedback and finalize the proposals accordingly.
>>
>> We’d like to request the NBGP to consider this issue prior to the
>> NBGP-Sinhala call this evening and let’s aim to finalize the option
>> during the call.
>>
>> Regards,
>>
>> Pitinan
>>
>>
>>
>> _______________________________________________ Neobrahmigp mailing
>> list Neobrahmigp at icann.org
>> https://mm.icann.org/mailman/listinfo/neobrahmigp
>
>-- Regards, Akshat Joshi C-DAC GIST