This file contains the Label Generation Rules (LGR) for the Oriya script as would be appropriate for the Root Zone. For more details on this proposal, see "Proposal for Generation Panel for Neo-Brahmi Scripts Label Generation Ruleset for the Root Zone [Proposal]". The format of this file follows [RFC 7940].
According to Section 5 "Repertoire" in [Proposal], the Oriya LGR contains 65 unique code points.
The repertoire is based on [MSR], which is a subset of Unicode 6.3 [Unicode 6.3].
Each code point is listed with its Glyph, Character Name, Unicode General Category (gc), Language with EGIDS, Indic Syllabic Category and Reference.
According to Section 6 "Variants" in [Proposal], this LGR defines cross-script variants that are "Confusing due to deviation from normally perceived character formations by larger linguistic community". These are not cases of mere visual similarity: they can confuse even a careful observer and are therefore proposed as variants.
Variant Disposition: Because these variants are confusingly similar, albeit in a peculiar way, it is proposed that they be treated as "blocking" variants. There is no preference among them: whichever label containing one of these variants is chosen first, any equivalent variant label should then be blocked.
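The blocking disposition described above can be sketched as follows. This is a minimal illustration, not the LGR's own processing: the variant pair is a placeholder that does not come from this LGR (the actual pairs are defined in the repertoire), and the names VARIANT_PAIRS, index_label and Registry are hypothetical.

```python
# Placeholder variant pair for illustration only (Latin "a" and Cyrillic "а").
# It is NOT taken from the Oriya LGR; substitute the pairs the LGR defines.
VARIANT_PAIRS = [("a", "\u0430")]


def build_index_map(pairs):
    """Map both members of each variant pair to a single representative."""
    index = {}
    for left, right in pairs:
        representative = min(left, right)
        index[left] = representative
        index[right] = representative
    return index


def index_label(label, index):
    """Replace each code point by its representative to get a comparison key."""
    return "".join(index.get(ch, ch) for ch in label)


class Registry:
    """Toy registry: the first applied-for label wins, its variants are blocked."""

    def __init__(self, pairs):
        self._index = build_index_map(pairs)
        self._allocated = {}  # comparison key -> label that was allocated first

    def apply(self, label):
        key = index_label(label, self._index)
        holder = self._allocated.get(key)
        if holder is None:
            self._allocated[key] = label
            return "allocated"
        return "already allocated" if holder == label else "blocked"
```

Because there is no preference among the variants, the sketch keeps no notion of a preferred form: the outcome depends only on which label was applied for first.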
The basic characters in Oriya are classified into seven main categories. They are Consonants, Vowels, Halant, Nukta, Visarga, Candrabindu and Anusvara.
Consonant: Oriya is written with a syllabic alphabet in which every consonant has an inherent vowel. Diacritics, which can appear above, below, before or after the consonant they belong to, are used to change the inherent vowel. More details in Section "3.4 Notable features" of the [Proposal].
Matra sign (Dependent Vowel): It is used to represent a vowel sound that is not inherent to the consonant. Dependent vowels are referred to as "matras" in Sanskrit. They are always depicted in combination with a single consonant, or with a consonant cluster. More details in Section "3.12 Matra sign: (Dependent Vowel)" of the [Proposal].
The Implicit Vowel Killer Halant: It is the character used after a consonant to "strip" it of its inherent vowel. The halant form of a consonant is the form produced by adding the halant "୍" (U+0B4D), also known as Virama, to the nominal shape. A Halant follows all but the last consonant in every Oriya syllable. More details in Section "3.7 The Implicit Vowel Killer Halant" of the [Proposal].
Nukta: The nukta sign is used in Oriya, as in other Indian scripts. A small number of consonants take it to represent sounds found only in words borrowed from Perso-Arabic. It is commonly used with “ଡ” U+0B21, “ଢ” U+0B22, “କ” U+0B15, “ଖ” U+0B16, “ଗ” U+0B17, “ଚ” U+0B1A, “ଜ” U+0B1C, and “ଫ” U+0B2B to show that words containing these consonants with a nukta are to be pronounced in the Perso-Arabic style. More details in Section "3.8 Nukta" of the [Proposal].
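As a rough illustration of the nukta context, the sketch below checks that every nukta in a label directly follows one of the eight consonants listed above. Treating that list as exhaustive is an assumption made for this example; the text only says the nukta is commonly used with them. The function name nukta_positions_ok is hypothetical.

```python
NUKTA = "\u0B3C"  # ORIYA SIGN NUKTA

# The eight consonants named in the paragraph above.
NUKTA_BASES = {
    "\u0B21", "\u0B22", "\u0B15", "\u0B16",
    "\u0B17", "\u0B1A", "\u0B1C", "\u0B2B",
}


def nukta_positions_ok(label):
    """Return True if each nukta directly follows one of the listed consonants.

    Assumption: the listed consonants are the only permitted bases.
    """
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True
```

For example, nukta_positions_ok("\u0B21\u0B3C") holds because U+0B21 is in the list, while a label that starts with a nukta fails the check.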
Visarga and Avagraha: The Visarga “ଃ” (U+0B03) is frequently used in Sanskrit and represents a sound very close to /h/. More details in Section "3.9 Visarga & Avagraha" of the [Proposal].
Nasalization: Candrabindu: Candrabindu denotes nasalization of the preceding vowel, as in ଅଁଳା /ãala/, the name of a seasonal fruit (U+0B05 U+0B01 U+0B33 U+0B3E). Oriya users commonly use it for writing the words and sounds of the Sanskrit language. More details in Section "3.10 Nasalization: Candrabindu" of the [Proposal].
Anusvara: Anusvara replaces a conjunct group of a Nasal Consonant+Halant+Consonant belonging to that particular varga. The Anusvara represents a homorganic nasal. Before a non-varga consonant the Anusvara represents a nasal sound. More details in Section "3.11 Anusvara" of the [Proposal].
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR]. They are marked with ⍟.
These rules have been formulated so that they can be adopted for LGR specification.
The following symbols are used in the WLE rules:
C → Consonant
M → Matra
V → Vowel
B → Anusvara
H → Halant
N → Nukta
C1 → {ଡ (U+0B21), ଢ (U+0B22), କ (U+0B15), ଖ (U+0B16), ଗ (U+0B17), ଚ (U+0B1A), ଜ (U+0B1C), ଫ (U+0B2B)}
X → Visarga
D → Candrabindu
The rules are:
More details in Section "7 Whole Label Evaluation Rules (WLE)" of the [Proposal].
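To make the symbol notation concrete, the sketch below maps each code point of a label to one of the symbols listed above. It is only an approximation under stated assumptions: the ranges for V, C and M are taken from the layout of the Unicode Oriya block and are broader than the 65 code point repertoire of this LGR, and the names symbol_of and to_symbols are hypothetical.

```python
def symbol_of(ch):
    """Return the WLE symbol for one code point, or "?" if it is not classified.

    Assumption: V, C and M are approximated by Unicode Oriya block ranges,
    not by the LGR's actual repertoire.
    """
    cp = ord(ch)
    if cp == 0x0B4D:
        return "H"  # Halant
    if cp == 0x0B3C:
        return "N"  # Nukta
    if cp == 0x0B02:
        return "B"  # Anusvara
    if cp == 0x0B03:
        return "X"  # Visarga
    if cp == 0x0B01:
        return "D"  # Candrabindu
    if 0x0B05 <= cp <= 0x0B14:
        return "V"  # independent vowels
    if 0x0B15 <= cp <= 0x0B39:
        return "C"  # consonants
    if 0x0B3E <= cp <= 0x0B4C:
        return "M"  # matras (dependent vowel signs)
    return "?"


def to_symbols(label):
    """Render a label as a space-separated sequence of WLE symbols."""
    return " ".join(symbol_of(ch) for ch in label)
```

Applied to the Candrabindu example given earlier (U+0B05 U+0B01 U+0B33 U+0B3E), to_symbols returns "V D C M". A rule written over these symbols could then be checked against such a sequence.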
The Neo-Brahmi Generation Panel (NBGP) has been formed by members having experience in linguistics and computational linguistics. Under the Neo-Brahmi Generation Panel, there are nine scripts belonging to separate Unicode blocks. Each of these scripts will be assigned a separate LGR; however, the NBGP ensures that the fundamental philosophy behind building each LGR is in sync with that of all other Brahmi-derived scripts.
The NBGP considered all languages with EGIDS scale 1 to 4 and found that the Oriya script is also used to write other spoken languages.
The following references are cited in this document: