[Neobrahmigp] observation on valid invalid Bengali LGR testing

atiurk atiurk at cdac.in
Fri May 18 09:07:32 UTC 2018


Dear all,
Please find attached the observations obtained from the test results of the
proposed WLE rules for the Bengali LGR.


Observations on the testing of Proposed Bengali LGR Rules on corpus.



1. Halanta + C + halanta + C (here, halanta + ra one after the other)

যোগেন্দ্র্র: valid

As the rules place no limit on the number of consonants that may follow a
halanta, labels of this type are tagged as VALID.
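
To make this observation easier to reproduce, here is a minimal Python sketch
that counts how many halanta + consonant pairs follow one another in a label.
It is only an illustration, not the tool used for the tests. The function name
is hypothetical, the consonant range U+0995..U+09B9 is a rough assumption
rather than the MSR repertoire, and the escaped label is my transcription of
the example above.

```python
import re

# Assumption: U+09CD is the Bengali halanta (virama) and U+0995..U+09B9
# roughly covers the Bengali consonants; this is not the MSR repertoire.
HALANTA_CONSONANT_RUN = re.compile("(?:\u09CD[\u0995-\u09B9])+")


def longest_halanta_run(label):
    """Return the largest number of consecutive halanta + consonant pairs."""
    runs = [len(m.group(0)) // 2 for m in HALANTA_CONSONANT_RUN.finditer(label)]
    return max(runs, default=0)


# Transcription of the label from item 1 (assumed code point sequence).
label = "\u09AF\u09CB\u0997\u09C7\u09A8\u09CD\u09A6\u09CD\u09B0\u09CD\u09B0"
print(longest_halanta_run(label))
```

A rule that capped the length of such runs would need a threshold to compare
this count against; the proposed rules, as tested, have none.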



2. Hyphenated labels

যোগ্য-করা: invalid

যো-হুকুমের: invalid

যদু-বংশ: invalid

All hyphenated labels are tagged as INVALID, as hyphens are not included in the
MSR, i.e. they are ineligible for the root zone (as are digits).



3. Nukta character (joint vs. exploded)

যোগমায়া: invalid

যথাসময়ে: invalid

যথানিয়মে: invalid

যাবতীয়: invalid

য় (YYA, U+09DF) as a precomposed character is not included in the MSR.
For a label to be tagged as VALID, the letter must be typed as য (YA, U+09AF) +
nukta (U+09BC) to form য়. Labels with directly typed য় (U+09DF) are tagged as
INVALID in all tests.

Accommodated in WLE rules (see section 7.1, Rule no. 1(h) in proposed Bengali
LGR).
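
The difference between the two encodings can be checked with a short Python
sketch. It is an illustration only, not part of the test setup, and the
function names are hypothetical. It relies on the fact that U+09DF canonically
decomposes to U+09AF + U+09BC and is excluded from recomposition, so Unicode
normalization always yields the two-code-point form.

```python
import unicodedata

YYA_PRECOMPOSED = "\u09DF"  # BENGALI LETTER YYA (single code point)
YA = "\u09AF"               # BENGALI LETTER YA
NUKTA = "\u09BC"            # BENGALI SIGN NUKTA


def has_precomposed_yya(label):
    """True if the label contains the single code point U+09DF."""
    return YYA_PRECOMPOSED in label


def to_decomposed(label):
    """Normalize the label; U+09DF becomes U+09AF + U+09BC under NFC."""
    return unicodedata.normalize("NFC", label)


typed_directly = "\u09AF\u09BE\u09AC\u09A4\u09C0" + YYA_PRECOMPOSED
normalized = to_decomposed(typed_directly)
print(has_precomposed_yya(typed_directly), has_precomposed_yya(normalized))
```

Whether labels should be normalized before testing is a separate question; the
sketch only shows that the two forms are distinct code point sequences.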

4. Digits

খানা৩: invalid

Tagged as INVALID, as digits are not included in the MSR.

5. Punctuation

যাওয়া/হাত: invalid

All labels containing punctuation are tagged as INVALID, as punctuation marks
(like digits and hyphens) are not included in the MSR, i.e. are ineligible for
the root zone. This particular label has another possible reason for being
INVALID, namely the য় issue mentioned in no. 3 above.
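
For items 2, 4 and 5, a rough way to see why a label fails is to list the
characters that are digits or punctuation. The Python sketch below does this
using Unicode general categories. This is a heuristic assumption on my part,
not the actual MSR repertoire check, and the function name is hypothetical.

```python
import unicodedata


def excluded_characters(label):
    """List (char, code point, reason) for digits and punctuation in a label.

    Heuristic only: uses Unicode general categories (Nd, P*) as a stand-in
    for "not in the MSR"; it does not consult the real repertoire.
    """
    findings = []
    for ch in label:
        category = unicodedata.category(ch)
        if category == "Nd":
            reason = "digit"
        elif category.startswith("P"):
            reason = "punctuation"
        else:
            continue
        findings.append((ch, f"U+{ord(ch):04X}", reason))
    return findings


# Escaped transcriptions of two example labels (assumed code points).
hyphenated = "\u09AF\u09A6\u09C1-\u09AC\u0982\u09B6"
with_digit = "\u0996\u09BE\u09A8\u09BE\u09E9"
for label in (hyphenated, with_digit):
    print(excluded_characters(label))
```

A label can have more than one reason for rejection, as in item 5, so the
function returns every finding rather than stopping at the first.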




REGARDS
Dr. Atiur Rahman Khan
Principal Technical Officer (PTO)
NLP Research Lab, GIST Group
5th Floor, CDAC Innovation Park / 5 वीं मंजिल, सी-डैक इनोवैशन पार्क
Panchavati, Pashan, Pune / पंचवटी, पाषाण, पुणे
Pin: 411 008 / पिन: 411 008
