[Neobrahmigp] observation on valid/invalid Bengali LGR testing
atiurk
atiurk at cdac.in
Fri May 18 09:07:32 UTC 2018
Dear all,
Please find attached the observations drawn from the test results of the
proposed WLE rules for the Bengali LGR.
Observations on testing the proposed Bengali LGR rules on a corpus:
1. Halanta+C+halanta+C (here, halanta+ra twice in succession)
যোগেন্দ্র্র: valid
As there is no restriction on the number of consonants that may follow a
halanta, labels of this type are tagged as VALID.
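A minimal sketch, in Python, of how such a conjunct run can be measured. It is not part of the proposed LGR: the helper names and the consonant range U+0995..U+09B9 are assumptions made for illustration and are only a rough stand-in for the LGR's consonant set.

```python
HALANTA = "\u09cd"  # BENGALI SIGN VIRAMA (halanta)

# The label from item 1, spelled out by code point:
# YA, O, GA, E, NA, halanta, DA, halanta, RA, halanta, RA
LABEL = "\u09af\u09cb\u0997\u09c7\u09a8\u09cd\u09a6\u09cd\u09b0\u09cd\u09b0"


def is_consonant(ch):
    # Assumption: approximate the consonant class by the block range KA..HA.
    return "\u0995" <= ch <= "\u09b9"


def longest_conjunct(label):
    """Return the number of consonants in the longest C(+halanta+C)* run."""
    best = 0
    i = 0
    while i < len(label):
        if is_consonant(label[i]):
            run = 1
            i += 1
            # Extend the run while a halanta is followed by another consonant.
            while (i + 1 < len(label)
                   and label[i] == HALANTA
                   and is_consonant(label[i + 1])):
                run += 1
                i += 2
            best = max(best, run)
        else:
            i += 1
    return best
```

For the label above, `longest_conjunct(LABEL)` returns 4 (na, da, ra, ra joined by three halantas), which the rules as tested do not block.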
2. Hyphenated labels
যোগ্য-করা: invalid
যো-হুকুমের: invalid
যদু-বংশ: invalid
All hyphenated labels are tagged as INVALID, as hyphens are not included in the
MSR, i.e. they are ineligible for the root zone (as are digits).
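A small Python sketch of how such code points can be located in a label. It uses the Unicode general category as a proxy (digits are "N*", punctuation is "P*"); this is an assumption for illustration, not the actual MSR repertoire check, and the function name is hypothetical.

```python
import unicodedata


def ineligible_code_points(label):
    """List (char, code point, category) for digits and punctuation in label."""
    hits = []
    for ch in label:
        cat = unicodedata.category(ch)
        # Assumption: general category N* or P* stands in for "not in the MSR".
        if cat.startswith("N") or cat.startswith("P"):
            hits.append((ch, f"U+{ord(ch):04X}", cat))
    return hits
```

Called on the third label above, written as `"\u09af\u09a6\u09c1-\u09ac\u0982\u09b6"`, it returns `[("-", "U+002D", "Pd")]`, so the hyphen is the only code point reported.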
3. Nukta character (joint vs. exploded)
যোগমায়া: invalid
যথাসময়ে: invalid
যথানিয়মে: invalid
যাবতীয়: invalid
The precomposed character য় (YYA, U+09DF) is not included in the MSR.
Moreover, for a label to be tagged as VALID, this letter has to be typed as the
sequence য (YA, U+09AF) + nukta (U+09BC). Labels in which it is typed directly
as the single code point U+09DF will be tagged as INVALID in all tests.
This is accommodated in the WLE rules (see section 7.1, Rule no. 1(h) in the
proposed Bengali LGR).
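The difference between the two encodings can be shown with a short Python sketch. U+09DF is a composition exclusion in Unicode, so NFC normalization yields the two-code-point sequence; the helper names below are hypothetical, and whether normalization is applied before testing is an assumption, not something stated in the test setup.

```python
import unicodedata

YYA_PRECOMPOSED = "\u09df"      # BENGALI LETTER YYA, single code point
YA_PLUS_NUKTA = "\u09af\u09bc"  # BENGALI LETTER YA + BENGALI SIGN NUKTA


def uses_precomposed_yya(label):
    """True if the label contains the single code point U+09DF."""
    return YYA_PRECOMPOSED in label


def to_sequence_form(label):
    """Normalize to NFC, which replaces U+09DF by U+09AF U+09BC."""
    return unicodedata.normalize("NFC", label)
```

A label typed with U+09DF therefore compares unequal to the same word typed with the sequence until it has been passed through `to_sequence_form`.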
4. Digits
খানা৩: invalid
Tagged as INVALID, as digits are not included in the MSR.
5. Punctuated labels
যাওয়া/হাত: invalid
All punctuated labels are tagged as INVALID, as punctuation marks (like digits
and hyphens) are not included in the MSR, i.e. they are ineligible for the root
zone. This particular label has another possible reason for being INVALID,
namely the U+09DF issue described in no. 3 above.
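Since one label can fail for more than one reason, a checker may collect every reason instead of stopping at the first. The Python sketch below is illustrative only: the function name and messages are hypothetical, and the general-category test is an assumed proxy for the MSR repertoire.

```python
import unicodedata


def invalid_reasons(label):
    """Collect, in label order, each reason a code point is not acceptable."""
    reasons = []
    for ch in label:
        cat = unicodedata.category(ch)
        if cat[0] in "NP":
            # Assumption: digits and punctuation stand in for "not in the MSR".
            reasons.append(f"U+{ord(ch):04X} is a digit or punctuation mark")
        elif ch == "\u09df":
            reasons.append("U+09DF is the precomposed YYA")
    return reasons
```

For the label in this item, written with U+09DF as `"\u09af\u09be\u0993\u09df\u09be/\u09b9\u09be\u09a4"`, the function reports two reasons: the precomposed YYA and the slash (U+002F).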
REGARDS
Dr. Atiur Rahman Khan
Principal Technical Officer (PTO)
NLP Research Lab, GIST Group
5th Floor, CDAC Innovation Park / 5 वीं मंजिल, सी-डैक इनोवैशन पार्क
Panchavati, Pashan, Pune / पंचवटी, पाषाण, पुणे
Pin: 411 008 / पिन: 411 008