[Neobrahmigp] Singleton/Few cross-script variant code points

Dr. Shanmugam Rajabadher shanfaace at gmail.com
Fri Jul 6 13:54:15 UTC 2018


Dear all,
I am happy that this has been considered and accepted by many.

Thanks and regards,
Dr.Shanmugam.

On Fri 6 Jul, 2018, 3:34 PM Akshat Joshi, <akshatj at cdac.in> wrote:

> Dear All,
>
> Here is a brief discussion about this issue.
>
> By and large, we have been including in the cross-script variant analysis
> all cross-script variants which (or any combination of which) could stand
> alone as a valid character or character sequence.
>
> Recently the IP has suggested that we may want to reconsider this where
> only a small number of code points are involved, as that indicates a very
> small overlap between the scripts.
>
> There are two kinds of such cases:
>
>     1. Cross-script variant sets made up of dependent characters *ONLY*
>
>     2. Cross-script variant sets which do include non-dependent
> characters/sequences
>
> Let us take a look at each of them individually:
>
> *1. Cross-script variants made up of dependent characters only:*
>
> This is the case in Example 2 given by Pitinan:
>
> *Telugu ం (0C02) and Malayalam ം (0D02) are NOT variant code points, as
> they are combining marks and cannot form variant labels. The same applies
> to Telugu ః (0C03) and Malayalam ഃ (0D03).*
>
> If dependent characters (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu
> etc.) are the *ONLY* cross-script variants between the scripts involved,
> it is safe to assume that *NONE* of the labels created entirely from the
> cross-script variants would be valid. Hence we did not include them in the
> cross-script variants of the script pair. However, if there is even one
> non-dependent character (e.g. Consonant, Vowel etc.) among the
> cross-script variants, then all such cases must be included in the
> cross-script variant table.
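
To illustrate the dependent/non-dependent distinction drawn above, here is a minimal Python sketch. It assumes that the Unicode general category can stand in for the distinction: any code point whose category starts with "M" (a mark) is treated as dependent. That is an assumption made for illustration only; the NBGP proposals define their own character classes, and the name `is_dependent` is hypothetical.

```python
import unicodedata

def is_dependent(ch: str) -> bool:
    """Approximate 'dependent character' as any Unicode mark (Mn, Mc, Me).

    Assumption: the actual proposals use their own character classes;
    the general category is only a convenient stand-in here.
    """
    return unicodedata.category(ch).startswith("M")

# Code points taken from the examples quoted in this thread.
examples = {
    "Telugu anusvara": "\u0C02",
    "Malayalam anusvara": "\u0D02",
    "Telugu visarga": "\u0C03",
    "Malayalam visarga": "\u0D03",
    "Oriya TTHA": "\u0B20",
    "Malayalam TTHA": "\u0D20",
}

for name, ch in examples.items():
    kind = "dependent" if is_dependent(ch) else "non-dependent"
    print(f"U+{ord(ch):04X} {name}: {kind}")
```

Under this approximation the anusvara and visarga signs come out as dependent and the two TTHA letters as non-dependent, which matches how the examples are treated in this thread.
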
>
> *2. Cross-script variants which do include non-dependent
> characters/sequences:*
>
> This is the case in Example 1 given by Pitinan:
>
> *Oriya ଠ (0B20) and Malayalam ഠ (0D20) are variant code points. *
>
> As both the code points involved in this pair are non-dependent, even the
> smallest instances (a single code point), i.e. ଠ (Oriya) and ഠ (Malayalam),
> are valid labels which look exactly alike. If we concatenate instances of
> the same variant characters with one another, we get, in theory, an
> infinite number of labels, as given below:
>
> ଠଠ - ഠഠ
>
> ଠଠଠ - ഠഠഠ
>
> ଠଠଠଠ - ഠഠഠഠ
>
> ଠଠଠଠଠ - ഠഠഠഠഠ
>
> .....
>
> All of these look exactly alike, belong to totally different scripts, and
> can gain independent existence if not included in the cross-script variant
> set. This indicates that although the number of characters is seemingly
> small, it can create a large number of labels. The important thing to note
> here is the presence of at least one non-dependent character in the
> cross-script variant set.
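
The concatenation argument above can be reproduced with a few lines of Python. This is only a sketch of the idea; the constant and function names are hypothetical, and it says nothing about any length limit that may apply to real labels.

```python
ORIYA_TTHA = "\u0B20"      # Oriya letter from Example 1
MALAYALAM_TTHA = "\u0D20"  # Malayalam letter from Example 1

def repeated_pair(n):
    """Return the n-fold repetition of each letter as a pair of labels."""
    return ORIYA_TTHA * n, MALAYALAM_TTHA * n

for n in range(1, 6):
    oriya, malayalam = repeated_pair(n)
    # The two strings render alike but consist of different code points,
    # so they never compare equal.
    print(n, oriya, malayalam, oriya == malayalam)
```

Every pair printed is a distinct Oriya/Malayalam label pair, which is why a single non-dependent variant code point is enough to generate many confusable labels.
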
>
> Hence, it is proposed that:
>
> If, in any two given scripts, all the potential cross-script variants
> consist of dependent (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu
> etc) characters *ONLY*, then that entire set can be ignored and no
> cross-script variants be proposed between those two scripts.
>
> If, in any two given scripts, there is *AT LEAST ONE* non-dependent (e.g.
> Consonant, Vowel etc) cross-script variant character/sequence present,
> all the potential cross-script variants be considered and proposed between
> the two scripts.
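
The proposed rule can be sketched as a small Python function. The sketch rests on two assumptions that are not part of the proposal itself: Unicode marks (general category "M") approximate dependent characters, and a sequence counts as non-dependent when it does not begin with a mark. All function and variable names are hypothetical.

```python
import unicodedata

def is_dependent(ch):
    # Assumption: Unicode marks (Mn, Mc, Me) approximate "dependent" characters.
    return unicodedata.category(ch).startswith("M")

def can_stand_alone(seq):
    # Assumption: a sequence is non-dependent if it does not begin with a mark.
    return bool(seq) and not is_dependent(seq[0])

def proposed_variants(candidate_sets):
    """Keep every candidate set if at least one contains a non-dependent
    character/sequence; otherwise propose no cross-script variants."""
    for variant_set in candidate_sets:
        if any(can_stand_alone(seq) for seq in variant_set):
            return list(candidate_sets)
    return []

# Candidate sets as listed in this thread, one tuple per variant set.
telugu_malayalam = [("\u0C02", "\u0D02"), ("\u0C03", "\u0D03")]
oriya_malayalam = [("\u0B20", "\u0D20")]
telugu_kannada_sinhala = [
    ("\u0C02", "\u0C82", "\u0D82"),
    ("\u0C03", "\u0C83", "\u0D83"),
    ("\u0C30", "\u0CB0", "\u0DBB"),
]

print(len(proposed_variants(telugu_malayalam)))        # 0
print(len(proposed_variants(oriya_malayalam)))         # 1
print(len(proposed_variants(telugu_kannada_sinhala)))  # 3
```

Read this way, the Telugu/Malayalam sets from Example 2 are ignored, while the Oriya/Malayalam set and the whole Telugu/Kannada/Sinhala group are retained, the latter because it contains one consonant set.
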
>
> Regards,
>
> Akshat
>
> On 06-07-2018 12:20, Pitinan Kooarmornpatana wrote:
>
> Dear NBGP members,
>
>
>
> Kindly let me draw your attention to the issue of cross-script variant code
> points where there is only a single code point or there are only a few code
> points.
>
>
>
>    1. *Background*
>
> Currently NBGP proposals include all cross-script variant code points
> that can form well-formed cross-script variant labels, without
> considering how many cross-script variant code points there are between
> the two scripts.
>
>
>
> *Example 1:* Oriya ଠ (0B20) and Malayalam ഠ (0D20) *are* variant code
> points.
>
> They are consonants and can form cross-script variant labels such as
> ଠଠଠ (0B20 0B20 0B20) and ഠഠഠ (0D20 0D20 0D20).
>
> Oriya        Malayalam
> ଠ (0B20)     ഠ (0D20)
>
>
>
> *Example 2:* Telugu ం (0C02) and Malayalam ം (0D02) *are* *NOT* variant
> code points, as they are combining marks and cannot form variant labels.
> The same applies to Telugu ః (0C03) and Malayalam ഃ (0D03).
>
> Telugu       Malayalam
> ం (0C02)     ം (0D02)
> ః (0C03)     ഃ (0D03)
>
>
>
>    2. *IP Feedback*
>
> With only a single consonant (possibly plus two combining marks), the
> overlap between the scripts appears rather limited (the case of *Example 1*
> above). The IP would recommend dropping the variants. This feedback applies
> to Telugu, Kannada, Sinhala, Oriya and Malayalam. However, the GP decision
> will affect all NBGP proposals.
>
>
>
> The IP suggests dropping the following variant sets:
>
> Telugu       Kannada      Sinhala
> ం (0C02)     ಂ (0C82)     ං (0D82)
> ః (0C03)     ಃ (0C83)     ඃ (0D83)
> ర (0C30)     ರ (0CB0)     ර (0DBB)
>
>
>
> Oriya        Malayalam
> ଠ (0B20)     ഠ (0D20)
>
>
>
>    3. *OPTIONS*
>
> *OPTION 1: *Do nothing.
>
> *OPTION 2: *Drop the suggested variant sets.
>
>
>
> Both options are valid. The final decision rests with the NBGP. Whichever
> option is selected, the proposals will be published for a public comment
> period of 40 days. The community and experts will also have a chance to
> comment there. After the public comment period has ended, the NBGP will
> consider all feedback and finalize the proposals accordingly.
>
>
>
>
>
> We’d like to request the NBGP to consider this issue prior to the
> NBGP-Sinhala call this evening and let’s aim to finalize the option during
> the call.
>
>
>
> Regards,
>
> Pitinan
>
>
> _______________________________________________
> Neobrahmigp mailing list
> Neobrahmigp at icann.org
> https://mm.icann.org/mailman/listinfo/neobrahmigp
>
>
> --
> Regards,
> Akshat Joshi
> C-DAC GIST
>
>
>