[Neobrahmigp] Singleton/Few cross-script variant code points

pavanaja at vishvakannada.com
Fri Jul 6 12:38:56 UTC 2018


I too agree

 

Regards,

Pavanaja

 

 

From: Neobrahmigp <neobrahmigp-bounces at icann.org> On Behalf Of Harish Chowdhary
Sent: Friday, July 6, 2018 3:58 PM
To: unsciil51 at gmail.com
Cc: sinhalagp at icann.org; neobrahmigp at icann.org
Subject: Re: [Neobrahmigp] Singleton/Few cross-script variant code points

 

+1

Thanks,
Harish Chowdhary,
Technology Analyst,
National Internet Exchange of India
ISOC FELLOW | inSIG FELLOW
IIREF FELLOW | UASG AMBASSADOR
www.nixi.in | www.indiaig.in | registry.in


From: Udaya Narayana Singh <unsciil51 at gmail.com>
Sent: Fri, 6 Jul 2018 15:53:32 GMT+0530
To: neobrahmigp at icann.org, Akshat Joshi <akshatj at cdac.in>
Cc: sinhalagp at icann.org
Subject: Re: [Neobrahmigp] Singleton/Few cross-script variant code points
 

I fully agree with the very pertinent observations given by Akshat here and with the solutions he has provided. I think this brings a great deal of clarity to the issue.

Regards,

 

Prof Udaya Narayana Singh

Chair-Professor, ACLiS

Amity University Haryana

Pachgaon-Manesar, Dt Gurgaon 

PIN 122413

Cell 9434050218; 9830132234

 

 

 

On Friday, 6 July, 2018, 3:34:41 PM IST, Akshat Joshi <akshatj at cdac.in> wrote:

 

 

Dear All,

Here is a brief discussion about this issue.

By and large, we have been including in the cross-script variant analysis all cross-script variants which (alone or in any combination) could stand as a valid character or character sequence.

Recently the IP suggested that we may want to reconsider this where only a small number of code points are involved, as that indicates a very small overlap between the scripts.

There are two kinds of such cases:

    1. Cross-script variant sets made up of dependent characters ONLY

    2. Cross-script variant sets which do include non-dependent characters/sequences

Let us take a look at each of them individually:

1. Cross-script variants made up of dependent characters only:

This is the case in Example 2 given by Pitinan:

Telugu ం (0C02) and Malayalam ം (0D02) are NOT variant code points, as they are combining marks and cannot form variant labels. The same applies to Telugu ః (0C03) and Malayalam ഃ (0D03).

If dependent characters (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu etc) are the ONLY cases of cross-script variants between the scripts involved, it is safe to assume that NONE of the labels made up entirely of those cross-script variants would be valid. Hence we did not include them in the cross-script variants of the script pair. However, if there is even one non-dependent character (e.g. Consonant, Vowel etc) among the cross-script variants, then all such cases must be included in the cross-script variant table.

2. Cross-script variants which do include non-dependent characters/sequences:

This is the case in Example 1 given by Pitinan:

Oriya ଠ (0B20) and Malayalam ഠ (0D20) are variant code points.

As both code points in this pair are non-dependent, even the smallest instances (a single code point each), i.e. ଠ (Oriya) and ഠ (Malayalam), are valid labels which look exactly alike. If we concatenate instances of the same variant character with one another, we get, in theory, an infinite number of labels, as shown below:

ଠଠ - ഠഠ

ଠଠଠ - ഠഠഠ

ଠଠଠଠ - ഠഠഠഠ

ଠଠଠଠଠ - ഠഠഠഠഠ

.....

All of these look exactly alike, belong to totally different scripts, and can be registered independently of one another if the pair is not included in the cross-script variant set. This shows that although the number of characters seems small, it can produce a large number of labels. The important thing to note here is the presence of at least one non-dependent character in the cross-script variant set.
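
To make the point concrete, here is a minimal Python sketch that generates the label pairs listed above by repeating the single variant code point. The helper name lookalike_pairs is hypothetical and is used only for illustration.

```python
# Illustrative only: lookalike_pairs is a hypothetical helper, not NBGP tooling.
ORIYA_TTHA = "\u0B20"      # Oriya letter, shown as ଠ
MALAYALAM_TTHA = "\u0D20"  # Malayalam letter, shown as ഠ


def lookalike_pairs(max_len):
    """Return (Oriya label, Malayalam label) pairs for lengths 1..max_len."""
    return [(ORIYA_TTHA * n, MALAYALAM_TTHA * n) for n in range(1, max_len + 1)]


for oriya, malayalam in lookalike_pairs(5):
    # Each pair renders alike but is made of different code points.
    print(f"{oriya} - {malayalam}")
```

Every pair differs in its code points even though the two labels render alike, and max_len can be raised without limit.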

Hence, it is proposed that:

If, in any two given scripts, all the potential cross-script variants consist of dependent (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu etc) characters ONLY, then that entire set can be ignored and no cross-script variants be proposed between those two scripts. 

If, in any two given scripts, there is AT LEAST ONE non-dependent (e.g. Consonant, Vowel etc) cross-script variant character/sequence present, all the potential cross-script variants be considered and proposed between the two scripts.
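
The proposed rule can be sketched roughly as follows. This is only an illustration under stated assumptions: "dependent" is approximated by the Unicode general categories Mn and Mc (combining marks), a sequence is treated as non-dependent when its first code point is not a combining mark, and the function names are hypothetical rather than part of any NBGP tooling.

```python
import unicodedata


def is_dependent(member):
    """Assumption: a member is 'dependent' if it starts with a combining mark."""
    return unicodedata.category(member[0]) in ("Mn", "Mc")


def variants_to_propose(candidate_sets):
    """Apply the proposed rule to the candidate variant sets of one script pair.

    candidate_sets is a list of tuples, each holding look-alike
    characters/sequences from the two scripts.
    """
    members = [m for variant_set in candidate_sets for m in variant_set]
    if any(not is_dependent(m) for m in members):
        # At least one non-dependent member: propose every candidate set.
        return list(candidate_sets)
    # Only dependent characters: propose nothing for this script pair.
    return []


# Candidate sets taken from the examples in this thread.
TELUGU_MALAYALAM = [("\u0C02", "\u0D02"), ("\u0C03", "\u0D03")]
ORIYA_MALAYALAM = [("\u0B20", "\u0D20")]
TELUGU_KANNADA = [("\u0C02", "\u0C82"), ("\u0C03", "\u0C83"), ("\u0C30", "\u0CB0")]

for name, sets in [("Telugu/Malayalam", TELUGU_MALAYALAM),
                   ("Oriya/Malayalam", ORIYA_MALAYALAM),
                   ("Telugu/Kannada", TELUGU_KANNADA)]:
    print(name, "->", len(variants_to_propose(sets)), "set(s) proposed")
```

Under these assumptions the Telugu/Malayalam pair yields no variants, while the Oriya/Malayalam and Telugu/Kannada pairs keep all of their candidate sets, including the combining marks in the latter.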

Regards,

Akshat

  

On 06-07-2018 12:20, Pitinan Kooarmornpatana wrote:

 

 

Dear NBGP members, 

 

Kindly let me draw your attention to the issue of cross-script variant code points where there is only a single code point or only a few code points.

 

1.     Background

Currently, NBGP proposals include all cross-script variant code points that can form well-formed cross-script variant labels, without considering how many cross-script variant code points there are between the two scripts.

 

Example 1: Oriya ଠ (0B20) and Malayalam ഠ (0D20) are variant code points.

They are consonants and can form cross-script variant labels such as ଠଠଠ (0B20 0B20 0B20) and ഠഠഠ (0D20 0D20 0D20).


Oriya        Malayalam
ଠ (0B20)     ഠ (0D20)

 

Example 2: Telugu ం (0C02) and Malayalam ം (0D02) are NOT variant code points, as they are combining marks and cannot form variant labels. The same applies to Telugu ః (0C03) and Malayalam ഃ (0D03).


Telugu       Malayalam
ం (0C02)     ം (0D02)
ః (0C03)     ഃ (0D03)

 

2.     IP Feedback

With only a single consonant (plus, in some cases, two combining marks), the overlap between scripts appears rather limited (the case of Example 1 above). The IP would recommend dropping the variants. This feedback applies to Telugu, Kannada, Sinhala, Oriya and Malayalam. However, the GP decision will affect all NBGP proposals.

 

The IP suggests dropping the following variant sets:


Telugu       Kannada      Sinhala
ం (0C02)     ಂ (0C82)     ං (0D82)
ః (0C03)     ಃ (0C83)     ඃ (0D83)
ర (0C30)     ರ (0CB0)     ර (0DBB)

 


Oriya        Malayalam
ଠ (0B20)     ഠ (0D20)
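
As a quick cross-check of which entries in the sets above are combining marks and which are letters, the Unicode general category of each code point can be looked up with Python's standard unicodedata module. This is only a sketch: the grouping labels and the helper name categories are illustrative, and the code points are copied from the tables above.

```python
import unicodedata

# Code points copied from the variant sets listed above.
SETS = {
    "Telugu/Kannada/Sinhala": [
        (0x0C02, 0x0C82, 0x0D82),
        (0x0C03, 0x0C83, 0x0D83),
        (0x0C30, 0x0CB0, 0x0DBB),
    ],
    "Oriya/Malayalam": [
        (0x0B20, 0x0D20),
    ],
}


def categories(code_points):
    """Return the Unicode general category of each code point."""
    return [unicodedata.category(chr(cp)) for cp in code_points]


for group, rows in SETS.items():
    for row in rows:
        cells = ", ".join(
            f"{cp:04X} {cat}" for cp, cat in zip(row, categories(row))
        )
        print(f"{group}: {cells}")
```

Category Mc denotes a spacing combining mark and Lo a letter, so the output separates the two combining-mark rows from the consonant rows.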

 

3.     OPTIONS

OPTION 1: Do nothing. 

OPTION 2: Drop the suggested variant sets.  

 

Both options are valid. The final decision rests with the NBGP. Whichever option is selected, the proposals will be published for a 40-day public comment period, during which the community and experts will also have a chance to comment. After the public comment period has ended, the NBGP will consider all feedback and finalize the proposals accordingly.

 

 

We’d like to request the NBGP to consider this issue prior to the NBGP-Sinhala call this evening and let’s aim to finalize the option during the call. 

 

Regards,

Pitinan

 

 

 

 

 

     

_______________________________________________
Neobrahmigp mailing list
Neobrahmigp at icann.org
https://mm.icann.org/mailman/listinfo/neobrahmigp

  

-- 
Regards,
Akshat Joshi
C-DAC GIST


-------------------------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
-------------------------------------------------------------------------------------------------------------------------------

