This file contains Label Generation Rules (LGR) for the Devanagari script as would be appropriate for the Root zone. For more details on this proposal, see “Proposal for a Devanagari Script Root Zone Label Generation Rule-Set (LGR)” [Proposal]. The format of this file follows [RFC 7940].
The Neo-Brahmi Generation Panel (NBGP) proposes 83 unique code points for inclusion in the Devanagari LGR [Proposal], in addition to thirteen sequences. The two sequences U+0931 U+094D U+092F and U+0931 U+094D U+0939 restrict the character U+0931 (DEVANAGARI LETTER RRA) to those specific contexts; outside them it is not permitted on its own. Accordingly, while U+0931 is not listed by itself, it brings the total of distinct code points to 84.
A number of other sequences have been defined in connection with the definition of variants, bringing the total repertoire entries to 96 (see “Variants” below).
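The context restriction on U+0931 described above can be sketched in a few lines of Python. This is an illustration only, not the LGR's own rule machinery; the function name `rra_in_allowed_context` is hypothetical, and the check covers only the two sequences named in the text.

```python
RRA = "\u0931"      # DEVANAGARI LETTER RRA
HALANT = "\u094D"   # DEVANAGARI SIGN VIRAMA (Halant)
# Code points that may follow RRA + Halant, per the two listed sequences.
ALLOWED_AFTER_RRA_HALANT = {"\u092F", "\u0939"}  # YA, HA


def rra_in_allowed_context(label: str) -> bool:
    """Return True if every U+0931 in the label starts one of the two sequences.

    Hypothetical helper: it models only the U+0931 restriction, not the
    rest of the repertoire or the WLE rules.
    """
    for i, ch in enumerate(label):
        if ch != RRA:
            continue
        following = label[i + 1:i + 3]
        if len(following) != 2:
            return False
        if following[0] != HALANT or following[1] not in ALLOWED_AFTER_RRA_HALANT:
            return False
    return True
```

For example, `rra_in_allowed_context("\u0931\u094D\u092F")` accepts the first sequence, while a label consisting of U+0931 alone is rejected.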
The repertoire includes code points used by languages written in Devanagari that fall within [EGIDS] scale 1 to 4. Boro, Braj, Dhundari, Mundari and Kharia are additionally covered. Although listed at EGIDS scale 4, Saraiki is not covered, because the Devanagari script is “no longer in use” by the Saraiki community. For more details, see Section 5 “Repertoire” in [Proposal]. A non-exhaustive list of languages using each code point can be found in the comments.
The repertoire is based on [MSR-3], which is a subset of Unicode 6.3 [Unicode 6.3].
According to Section 6 “Variants” in [Proposal], this LGR defines variants which are “Confusing due to deviation from normally perceived character formations by the larger linguistic community”. These are not cases of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and are hence being proposed as variants. They fall into two broad categories:
Variant Disposition: All variants are of type “blocked”, making labels that differ only by these variants mutually exclusive: whichever label containing one of these variants is chosen first would be delegated, while the other label should be blocked.
In addition to these, cross-script variant analysis of Devanagari has been carried out by the NBGP. Possible cross-script variant cases were found with the Gurmukhi and Bengali scripts and are described in Appendix 1 of the [Proposal].
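The “blocked” disposition for in-script variants can be illustrated with a small sketch. The variant sets below are placeholder ASCII strings, not the variants defined by this LGR, and the names `variant_key` and `Registry` are hypothetical. The sketch also assumes that variant sets do not overlap, so plain string replacement is enough.

```python
# Hypothetical placeholder variant sets (NOT the LGR's real variants).
# Each tuple lists strings that are variants of one another.
VARIANT_SETS = [("a", "b"), ("cd", "e")]


def variant_key(label: str) -> str:
    """Map a label to a key shared by all labels differing only by variants."""
    key = label
    for members in VARIANT_SETS:
        representative = members[0]
        for other in members[1:]:
            key = key.replace(other, representative)
    return key


class Registry:
    """Toy first-come-first-served registry applying the blocked disposition."""

    def __init__(self) -> None:
        self._delegated = {}  # variant key -> label that was delegated first

    def apply(self, label: str) -> str:
        key = variant_key(label)
        holder = self._delegated.get(key)
        if holder is None:
            self._delegated[key] = label
            return "delegated"
        if holder == label:
            return "already delegated"
        return "blocked"
```

With this toy registry, applying for `"xay"` and then `"xby"` delegates the first label and blocks the second, because both map to the same key.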
Devanagari is an alphasyllabary and the heart of the writing system is the akshar. It is this unit that is instinctively recognized by users of the script. The writing system of Devanagari can be summed up as composed of Consonants, the Implicit Vowel Killer (Halant), Vowels, Anusvara, Candrabindu, Nukta and Visarga.
Consonants: Devanagari consonants all contain an implicit schwa /ə/. To make a full syllable, consonants may be followed by certain code points from one or more of the other groups (see “WLE rules” below). See Section “3.3.1 The Consonants” of the [Proposal].
Halant: All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins consonants to create conjuncts, which generally combine 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, this LGR will not enforce any length limit. See Section 3.3.2 “The Implicit Vowel Killer: Halant” in [Proposal].
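As an illustration of how the Halant chains consonants into conjuncts, the following sketch counts the consonants in each Halant-joined cluster of a string. It assumes the basic consonant range U+0915 to U+0939 from the Unicode Devanagari block, which is not necessarily identical to the consonants in this LGR's repertoire; the function names are hypothetical.

```python
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (Halant)


def is_consonant(ch: str) -> bool:
    # Assumption: basic Unicode consonant range KA..HA, not the LGR repertoire.
    return "\u0915" <= ch <= "\u0939"


def conjunct_sizes(text: str) -> list:
    """Return the number of consonants in each Halant-joined cluster."""
    sizes = []
    count = 0
    i = 0
    n = len(text)
    while i < n:
        if is_consonant(text[i]):
            count += 1
            # The cluster continues only if a Halant and another consonant follow.
            if i + 2 < n and text[i + 1] == HALANT and is_consonant(text[i + 2]):
                i += 2
                continue
            sizes.append(count)
            count = 0
        i += 1
    return sizes
```

For instance, U+0915 U+094D U+0937 forms one cluster of two consonants, and U+0938 U+094D U+0924 U+094D U+0930 U+0940 forms one cluster of three. Consistent with the text, the sketch places no upper limit on cluster length.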
Vowels: There are separate code points for vowels that are pronounced independently at the beginning of a syllable or after a vowel sound. To indicate a vowel sound other than the implicit schwa following a consonant, a vowel sign (Matra) is attached to the consonant. There is an equivalent Matra for each vowel except U+0905. See Section “3.3.3 Vowels” of the [Proposal].
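The correspondence between independent vowels and Matras can be sketched as a lookup table. The pairs below are taken from the Unicode Devanagari code chart and cover only a subset of vowels; they are not a listing of this LGR's repertoire, and `attach_vowel` is a hypothetical helper.

```python
SCHWA_VOWEL = "\u0905"  # DEVANAGARI LETTER A, which has no Matra

# Independent vowel -> corresponding vowel sign (Matra); illustrative subset.
MATRA_FOR_VOWEL = {
    "\u0906": "\u093E",  # AA
    "\u0907": "\u093F",  # I
    "\u0908": "\u0940",  # II
    "\u0909": "\u0941",  # U
    "\u090A": "\u0942",  # UU
    "\u090B": "\u0943",  # VOCALIC R
    "\u090F": "\u0947",  # E
    "\u0910": "\u0948",  # AI
    "\u0913": "\u094B",  # O
    "\u0914": "\u094C",  # AU
}


def attach_vowel(consonant: str, vowel: str) -> str:
    """Return the consonant carrying the given vowel sound.

    The implicit schwa (U+0905) needs no Matra, so the bare consonant
    is returned; any other vowel is written with its Matra.
    """
    if vowel == SCHWA_VOWEL:
        return consonant
    return consonant + MATRA_FOR_VOWEL[vowel]
```

For example, `attach_vowel("\u0915", "\u0908")` yields U+0915 U+0940, while passing U+0905 returns the bare consonant U+0915.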
Anusvara: The Anusvara shows a nasal at the end of a syllable. See Section “3.3.4 The Anusvara” of the [Proposal].
Candrabindu: A Candrabindu denotes nasalization of the preceding vowel. Present-day Hindi users tend to replace the Candrabindu by the Anusvara. See Section “3.3.5 Nasalization: Candrabindu” of the [Proposal].
Nukta: The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic, English and other non-Aryan sources. It is also placed under U+0921 and U+0922 to indicate flapped sounds. Apart from this, the Santali language uses the Nukta adjoined to certain vowels and vowel signs. See Section “3.3.6 Nukta” of the [Proposal].
Visarga: The Visarga (U+0903), representing an aspiration at the end of a syllable, is frequently used in Sanskrit. See Section “3.3.7 Visarga and Avagraha” of the [Proposal].
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-3]. They are marked with ⍟.
These rules ensure that the Devanagari label conforms to akshar formation norms for the Devanagari script. These norms are exclusively presented as context rules.
The following symbols are used in the names and comments for WLE rules:
The rules are:
See Section “7 Whole Label Evaluation Rules (WLE)” of the [Proposal].
Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari that fall within EGIDS scale 1 to 4.
For additional details and contributors, see Sections 4 and 8 of the [Proposal].
References [0] to [11] refer to the Unicode Standard versions in which the corresponding code points were initially encoded. References [100] and up correspond to sources given in [Proposal] for justifying the inclusion of the corresponding code points. A single code point or range may have multiple source reference values.
In addition, the following references are cited in this document:
For more details on references [0] and up and [100] and up, refer to the Table of References below.
]]>