[Neobrahmigp] Observations on valid/invalid Bengali LGR testing

atiurk atiurk at cdac.in
Fri May 18 10:51:20 UTC 2018


Kindly ignore the previous email and treat the attached document as the
observations on the tests.
Regards


On May 18, 2018 at 2:37 PM atiurk <atiurk at cdac.in> wrote:

>   Dear all,
>  Please find attached the observations drawn from the test results of the
> proposed WLE rules for Bengali LGR.
> 
> 
>  Observations on the testing of Proposed Bengali LGR Rules on corpus.
> 
> 
> 
>  1. Halanta+C+halanta+C (here halanta+ra, one after the other)
> 
>  যোগেন্দ্র্র: valid
> 
>  As the rules do not limit the number of consonants that may follow a halanta,
> labels of this type have been tagged as VALID.
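
The cluster in this label can be inspected with a short script. This is only a sketch: the consonant range and the helper name `longest_conjunct` are assumptions made for illustration and are not taken from the proposed LGR or its test tooling.

```python
import re

HALANT = "\u09CD"  # BENGALI SIGN VIRAMA (halanta)
# Assumption: treat U+0995..U+09B9 (ka..ha) as the consonant block.
CONSONANT = "[\u0995-\u09B9]"

# A conjunct here is: consonant, then one or more (halanta + consonant) pairs.
CLUSTER = re.compile(f"{CONSONANT}(?:{HALANT}{CONSONANT})+")

def longest_conjunct(label: str) -> int:
    """Return the number of consonants in the longest halanta-joined run."""
    best = 0
    for match in CLUSTER.finditer(label):
        best = max(best, match.group().count(HALANT) + 1)
    return best

# The label from item 1, written with escapes so the code points are explicit.
label = "\u09AF\u09CB\u0997\u09C7\u09A8\u09CD\u09A6\u09CD\u09B0\u09CD\u09B0"
print(longest_conjunct(label))  # 4: na + da + ra + ra joined by three halantas
```

A rule that capped the length of such runs would reject this label; the rules as tested do not, hence the VALID tag.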
> 
> 
> 
>  2. Hyphenated labels
> 
>  যোগ্য-করা: invalid
> 
>  যো-হুকুমের: invalid
> 
>  যদু-বংশ: invalid
> 
>  All hyphenated labels are tagged as INVALID because the hyphen is not included
> in the MSR; hyphens, like digits, are ineligible for the root zone.
> 
> 
> 
>  3. Nukta character (joint vs. exploded)
> 
>  যোগমায়া: invalid
> 
>  যথাসময়ে: invalid
> 
>  যথানিয়মে: invalid
> 
>  যাবতীয়: invalid
> 
>  য় (YYA, U+09DF) as a precomposed character is not included in the MSR. For a
> label to be tagged as VALID, it must be typed as য (YA, U+09AF) + nukta (U+09BC).
> Labels containing the directly typed য় (U+09DF) are tagged as INVALID in all
> tests.
> 
>  This is accommodated in the WLE rules (see section 7.1, Rule no. 1(h) in the
> proposed Bengali LGR).
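
The difference between the two encodings can be seen with Python's standard `unicodedata` module. This is a minimal sketch, not a description of the test tool: the email does not say whether labels are normalized before testing, and the helper name `uses_precomposed_yya` is hypothetical. U+09DF is a Unicode composition exclusion, so NFC normalization yields the two-code-point form.

```python
import unicodedata

YYA_PRECOMPOSED = "\u09DF"       # BENGALI LETTER YYA, a single code point
YYA_DECOMPOSED = "\u09AF\u09BC"  # BENGALI LETTER YA + BENGALI SIGN NUKTA

def uses_precomposed_yya(label: str) -> bool:
    """True if the label contains the single code point U+09DF."""
    return YYA_PRECOMPOSED in label

# One of the labels from item 3, assumed here to be typed with U+09DF.
label = "\u09AF\u09BE\u09AC\u09A4\u09C0\u09DF"
print(uses_precomposed_yya(label))  # True

# NFC replaces U+09DF with YA + nukta, the form the MSR accepts.
normalized = unicodedata.normalize("NFC", label)
print(uses_precomposed_yya(normalized))  # False
print([f"U+{ord(ch):04X}" for ch in normalized[-2:]])  # ['U+09AF', 'U+09BC']
```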
> 
>  4. Labels containing digits
> 
>  খানা৩: invalid
> 
>  Tagged as INVALID because digits are not included in the MSR.
> 
>  5. Labels containing punctuation
> 
>  যাওয়া/হাত: invalid
> 
>  All labels containing punctuation are tagged as INVALID because punctuation
> marks, like digits and the hyphen, are not included in the MSR and are
> ineligible for the root zone. This particular label may also be INVALID for a
> second reason: the য় (U+09DF) issue described in no. 3 above.
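
To see which code points are responsible in cases like items 2, 4 and 5, a label can be scanned character by character. The sketch below is a rough approximation only: it uses Unicode general categories as a stand-in and does not reproduce the actual MSR repertoire, and the function name `flag_code_points` is hypothetical.

```python
import unicodedata

def flag_code_points(label: str) -> list:
    """List the code points that would explain an INVALID tag.

    Assumption: digits (category Nd), any punctuation (categories P*),
    and precomposed YYA (U+09DF) are treated as outside the repertoire.
    """
    reasons = []
    for ch in label:
        category = unicodedata.category(ch)
        if category == "Nd":
            reasons.append(f"U+{ord(ch):04X} digit")
        elif category.startswith("P"):
            reasons.append(f"U+{ord(ch):04X} punctuation")
        elif ch == "\u09DF":
            reasons.append("U+09DF precomposed YYA")
    return reasons

# Labels from items 2 and 4, written with escapes.
hyphenated = "\u09AF\u09A6\u09C1-\u09AC\u0982\u09B6"
with_digit = "\u0996\u09BE\u09A8\u09BE\u09E9"
print(flag_code_points(hyphenated))  # ['U+002D punctuation']
print(flag_code_points(with_digit))  # ['U+09E9 digit']
```

A label such as the one in item 5 would be reported twice if it is typed with U+09DF: once for the slash and once for the precomposed letter.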
> 
> 
> 
> 
>  REGARDS
>  Dr. Atiur Rahman Khan
>  Principal Technical Officer (PTO)
>  NLP Research Lab, GIST Group
>  5th Floor, CDAC Innovation Park / 5 वीं मंजिल, सी-डैक इनोवैशन पार्क
>  Panchavati, Pashan, Pune / पंचवटी, पाषाण, पुणे
>  Pin: 411 008 / पिन: 411 008
> 
> 
> -------------------------------------------------------------------------------------------------------------------------------
>  [ C-DAC is on Social-Media too. Kindly follow us at:
>  Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
> 
>  This e-mail is for the sole use of the intended recipient(s) and may
>  contain confidential and privileged information. If you are not the
>  intended recipient, please contact the sender by reply e-mail and destroy
>  all copies and the original message. Any unauthorized review, use,
>  disclosure, dissemination, forwarding, printing or copying of this email
>  is strictly prohibited and appropriate legal action will be taken.
> 
> -------------------------------------------------------------------------------------------------------------------------------
> 
  _______________________________________________
Neobrahmigp mailing list
Neobrahmigp at icann.org
https://mm.icann.org/mailman/listinfo/neobrahmigp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Observations on the testing of Proposed Bengali LGR Rules on corpus.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 14452 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/neobrahmigp/attachments/20180518/09482282/ObservationsonthetestingofProposedBengaliLGRRulesoncorpus-0001.docx>

