This file contains Label Generation Rules (LGR) for the Gurmukhi script as would be appropriate for the Root zone. For more details on this proposal see "Proposal for Generation Panel for Neo-Brahmi Scripts Label Generation Ruleset for the Root Zone [Proposal]". The format of this file follows [RFC 7940].
According to Section 5, "Repertoire", in [Proposal], the Gurmukhi script LGR contains 56 unique code points.
The repertoire is based on [MSR], which is a subset of Unicode 6.3 [Unicode 6.3].
Each code point is listed with its Glyph, Character Name, Unicode General Category (gc), Indic Syllabic Category and Reference.
According to Section 6, "Variants", in [Proposal], this LGR defines variants which are "Confusing due to deviation from normally perceived character formations by larger linguistic community". These are not cases of mere visual similarity: they can confuse even a careful observer and are therefore proposed as variants. In addition, NBGP carried out a cross-script variant analysis among all the scripts under its ambit. Gurmukhi and Devanagari are closely related scripts, and many characters in each script can be confused with characters in the other.
Variant Disposition: Because the variants are confusingly similar, albeit in a peculiar way, it is proposed that they be treated as "blocking". There is no preference among these variants: whichever label containing one of the variants is chosen first, the equivalent variant label should be blocked.
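The blocking disposition described above can be sketched as follows. This is only an illustration under stated assumptions: the variant pair uses placeholder ASCII letters rather than real Gurmukhi or Devanagari code points, pairs are assumed to be disjoint, and the names build_canonical_map and Registry are hypothetical, not part of [RFC 7940] or of this LGR.

```python
# Hypothetical variant pairs; placeholder letters stand in for real code points.
VARIANT_PAIRS = [("a", "b")]

def build_canonical_map(pairs):
    """Map both members of each (disjoint) variant pair to one representative."""
    canon = {}
    for x, y in pairs:
        rep = min(x, y)  # no preference among variants; any stable choice works
        canon[x] = rep
        canon[y] = rep
    return canon

class Registry:
    """First-come allocation in which variant labels block each other."""

    def __init__(self, canon):
        self.canon = canon
        self.taken = {}  # canonical form -> label that was allocated first

    def apply(self, label):
        key = "".join(self.canon.get(ch, ch) for ch in label)
        if key in self.taken:
            # The label itself or one of its variant labels is already allocated.
            return "blocked"
        self.taken[key] = label
        return "allocated"

registry = Registry(build_canonical_map(VARIANT_PAIRS))
first = registry.apply("xay")   # chosen first, so it is allocated
second = registry.apply("xby")  # variant of the first label, so it is blocked
```

Because the two members of a pair collapse to the same key, the outcome does not depend on which variant label is applied for first, which reflects the "no preference" statement above.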
Gurmukhi is an alphasyllabary and the heart of the writing system is the Akshar. It is this unit that is instinctively recognized by users of the script. The writing system of Gurmukhi can be summed up as composed of Consonants, the Implicit Vowel Killer (Halant), Vowels, Bindi, Tippi, Addak, Nukta and Visarga.
Consonants: Gurmukhi consonants have an implicit schwa /ə/ included in them. However, the same consonants are also used, without any modification, to represent consonant sounds that do not carry the /ə/ vowel. More details in Section "3.3.1 The Consonants" of the [Proposal].
Halant: All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This sign is known as the Halant "੍" (U+0A4D). More details in Section "3.3.2 The Implicit Vowel Killer: Halant" of the [Proposal].
Vowels: Punjabi has ten vowels /ਅ(ə), ਆ(a), ਇ(I), ਈ(i), ਉ(U), ਊ(u), ਏ(e), ਐ(ɛ), ਓ(o) and ਔ(ɔ)/. Of these, three /ਅ(ə), ਇ(I), ਉ(U)/ are short vowels and seven /ਆ(a), ਈ(i), ਊ(u), ਏ(e), ਐ(ɛ), ਓ(o) and ਔ(ɔ)/ are long vowels. More details in Section "3.3.3 Vowels" of the [Proposal].
Bindi: The bindi (ਂ) represents a homorganic nasal. Bindi is used with all long vowels /ਆ, ਈ, ਊ, ਏ, ਐ, ਓ, ਔ/, with the short vowel ਉ, and with the matras of long vowels /ਾ, ੀ, ੇ, ੈ, ੋ, ੌ/, except the matra ੂ. More details in Section "3.3.4.1 The Bindi" of the [Proposal].
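As a rough illustration of the Bindi description above, the following sketch checks that every Bindi in a label directly follows one of the vowels or matras named in that paragraph. It reflects only that paragraph, not the actual context rule of this LGR, and the names BINDI_HOSTS and bindi_placement_ok are hypothetical.

```python
BINDI = "\u0A02"  # GURMUKHI SIGN BINDI

# Characters that, per the paragraph above, may carry a Bindi.
BINDI_HOSTS = (
    set("\u0A06\u0A08\u0A0A\u0A0F\u0A10\u0A13\u0A14")  # long vowels
    | {"\u0A09"}                                        # short vowel U
    | set("\u0A3E\u0A40\u0A47\u0A48\u0A4B\u0A4C")       # long matras except UU
)

def bindi_placement_ok(label):
    """Return True if each Bindi directly follows a character in BINDI_HOSTS."""
    for i, ch in enumerate(label):
        if ch == BINDI and (i == 0 or label[i - 1] not in BINDI_HOSTS):
            return False
    return True

ok_after_long_matra = bindi_placement_ok("\u0A15\u0A3E\u0A02")  # KA + AA matra + Bindi
ok_after_uu_matra = bindi_placement_ok("\u0A15\u0A42\u0A02")    # KA + UU matra + Bindi
```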
Tippi: Tippi (ੰ) is used to nasalize the short vowels /ə/ and /I/ in all positions, and /U/ and /u/ after a consonant. Tippi therefore comes with the mātrās of /ə/ and /I/, i.e. mukta (without any vowel sign) and ਿ, both with vowel carriers and with consonants, as in ਸੰ and ਸਿੰ. The mātrās of /U/ and /u/, i.e. ੁ and ੂ, take Tippi after a consonant. In addition, Tippi is used in gemination of the nasal consonants ਙ, ਞ, ਨ and ਮ. More details in Section "3.3.4.2 The Tippi" of the [Proposal].
Addak: Addak is used to mark the gemination of the following consonant. In Punjabi, addak usually comes with mukta, aunkar (ੁ) and sihari (ਿ), the vowel signs of the short vowels /ə, u and i/, and geminates the consonant that follows it. More details in Section "3.3.4.3 The Addak" of the [Proposal].
Nukta: Termed pairin bindi in Punjabi, the nukta is used with the consonants ਸ, ਖ, ਗ, ਜ, ਫ and ਲ to represent phonemes in words from Sanskrit and Perso-Arabic sources. More details in Section "3.3.4.4 Nukta" of the [Proposal].
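The Nukta restriction described above lends itself to a small check. The sketch below assumes that a Nukta is acceptable only when it directly follows one of the six consonants named in that paragraph. The function name nukta_placement_ok is hypothetical, and the code is not the LGR's own rule definition.

```python
NUKTA = "\u0A3C"  # GURMUKHI SIGN NUKTA

# SA, KHA, GA, JA, PHA, LA: the consonants named in the paragraph above.
NUKTA_BASES = {"\u0A38", "\u0A16", "\u0A17", "\u0A1C", "\u0A2B", "\u0A32"}

def nukta_placement_ok(label):
    """Return True if each Nukta directly follows one of NUKTA_BASES."""
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True

ja_with_nukta = nukta_placement_ok("\u0A1C\u0A3C")  # JA + Nukta
ka_with_nukta = nukta_placement_ok("\u0A15\u0A3C")  # KA + Nukta
```

Here ja_with_nukta is True because JA is one of the six listed consonants, while ka_with_nukta is False because KA is not.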
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR]. They are marked with ⍟.
These rules have been drafted to ensure that a prospective Gurmukhi label conforms to the akshar formation norms of the Gurmukhi script. These norms are presented exclusively as context rules.
The following symbols are used in the WLE rules:
C → Consonant
M → Matra
V → Vowel
B → Bindi
D → Tippi
A → Addak
H → Halant (Virama)
N → Nukta
M1 → { ਿ(U+0A3F), ੁ(U+0A41) } (Short matras)
M2 → M - M1 (Long matras)
V1 → { ਅ (U+0A05), ਇ (U+0A07), ਉ (U+0A09)} (Short Vowels)
V2 → V - V1 (Long vowels)
C1 → {ਖ (U+0A16), ਗ (U+0A17), ਜ (U+0A1C), ਫ (U+0A2B), ਲ (U+0A32), ਸ (U+0A38)}
C2 → { ਰ (U+0A30), ਵ (U+0A35), ਹ (U+0A39)}
C3 → C - {ਙ(U+0A19), ਞ(U+0A1E), ਣ(U+0A23), ਹ(U+0A39), ੜ(U+0A5C)}
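The derived classes M2 and V2 in the list above are plain set differences. A minimal sketch, assuming that V and M contain exactly the vowels and matras named in the script description above (the authoritative membership is given by the repertoire of this LGR), and using a hypothetical helper fmt for display:

```python
# Vowels and matras as named in the script description (assumed complete).
V = {0x0A05, 0x0A06, 0x0A07, 0x0A08, 0x0A09,
     0x0A0A, 0x0A0F, 0x0A10, 0x0A13, 0x0A14}
V1 = {0x0A05, 0x0A07, 0x0A09}  # short vowels

M = {0x0A3E, 0x0A3F, 0x0A40, 0x0A41, 0x0A42,
     0x0A47, 0x0A48, 0x0A4B, 0x0A4C}
M1 = {0x0A3F, 0x0A41}          # short matras

V2 = V - V1  # long vowels
M2 = M - M1  # long matras

def fmt(code_points):
    """Format a set of code points as sorted U+XXXX strings."""
    return [f"U+{cp:04X}" for cp in sorted(code_points)]

print("V2:", fmt(V2))
print("M2:", fmt(M2))
```

Under these assumptions V2 has seven members, matching the seven long vowels named in the description. C3 would be obtained the same way once the full consonant set C is taken from the repertoire.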
The rules are:
More details in Section "7 Whole Label Evaluation Rules (WLE)" of the [Proposal].
Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts will be assigned a separate LGR; however, the Neo-Brahmi GP will ensure that the fundamental philosophy behind building each LGR is in sync with that of all other Brahmi-derived scripts. This is the Gurmukhi LGR, which caters to the Punjabi language written using the Gurmukhi script. Punjabi (EGIDS 2) is the only language currently using the Gurmukhi script.
The following references are cited in this document: