This file contains Label Generation Rules (LGR) for the Tamil script, as appropriate for the Root Zone. For more details on this proposal, see "Proposal for a Tamil Script Root Zone Label Generation Rule-Set (LGR)" [Proposal]. The format of this file follows [RFC 7940].
According to Section 5 "Repertoire" of [Proposal], the Tamil LGR contains 48 unique code points and 4 sequences.
The repertoire is based on [MSR-2], which is a subset of Unicode 6.3 [Unicode 6.3].
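Because the file follows [RFC 7940], the split between single code points and sequences can be checked mechanically. In that format, the `cp` attribute of a `<char>` element holds either one hexadecimal code point or a space-separated sequence of them. The Python sketch below counts both kinds in a tiny hypothetical fragment, not the actual Tamil LGR. It assumes the RFC 7940 namespace and ignores `<range>` elements.

```python
import xml.etree.ElementTree as ET

# XML namespace used by RFC 7940 LGR documents.
LGR_NS = "{urn:ietf:params:xml:ns:lgr-1.0}"

# Hypothetical, heavily truncated fragment for illustration only;
# the real Tamil LGR declares 48 code points and 4 sequences.
SAMPLE_LGR = """<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <data>
    <char cp="0B95"/>
    <char cp="0BCD"/>
    <char cp="0BB8 0BCD 0BB0 0BC0">
      <var cp="0BB6 0BCD 0BB0 0BC0" type="allocatable"/>
    </char>
  </data>
</lgr>"""


def count_repertoire(xml_text):
    """Return (single code points, sequences) declared by <char> elements.

    <var> children are not counted, and <range> elements are ignored.
    """
    root = ET.fromstring(xml_text)
    singles = sequences = 0
    for char in root.iter(LGR_NS + "char"):
        # A sequence is written as space-separated hex code points in cp.
        if len(char.get("cp").split()) == 1:
            singles += 1
        else:
            sequences += 1
    return singles, sequences


print(count_repertoire(SAMPLE_LGR))  # (2, 1)
```

Run against the full LGR file, the same count should give 48 and 4 if the repertoire is declared without `<range>` elements.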
According to Section 6 "Variants" of [Proposal], this LGR defines four sequences as variants. Two of them are variants of single code points. The other two are allocatable variants of each other, because they look exactly alike and can confuse even a careful observer.
Variant Disposition: Because these variants are confusingly similar, they have been assigned the "blocked" type. There is no preference among these variants: whichever label containing either of them is registered first blocks the other, equivalent variant label. The variant Shri, described in Section 6.1.3, is a case where exactly the same visual form is rendered with two distinct sequences. Moreover, users intend both sequences to represent the same akshar, Shri, regardless of which one they type. Hence, both sequences must be treated as the same in variant analysis, and any label formed with either form should be made available to the same entity. This variant pair is therefore proposed as an allocatable variant.
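The blocked/allocatable disposition described above can be sketched as a simplified model. This is not the Root Zone's actual procedure. The variant table is hypothetical, and the two Shri encodings are assumed to be <U+0BB8 U+0BCD U+0BB0 U+0BC0> and <U+0BB6 U+0BCD U+0BB0 U+0BC0>. The strings "P1" and "P2" are placeholders standing in for a blocked pair. A variant label is treated as blocked if any substitution in it is blocked. Registration is first come, first served, and an allocatable variant goes only to the entity already holding its counterpart.

```python
from itertools import product

# Assumed encodings of the allocatable Shri pair.
SHRI_SA = "\u0BB8\u0BCD\u0BB0\u0BC0"
SHRI_SHA = "\u0BB6\u0BCD\u0BB0\u0BC0"

# Hypothetical variant table; "P1"/"P2" are placeholders for a blocked pair.
VARIANTS = {
    SHRI_SA: [(SHRI_SHA, "allocatable")],
    SHRI_SHA: [(SHRI_SA, "allocatable")],
    "P1": [("P2", "blocked")],
    "P2": [("P1", "blocked")],
}


def segment(label):
    """Split a label greedily (longest match) into table keys and single chars."""
    longest = max(map(len, VARIANTS))
    parts, i = [], 0
    while i < len(label):
        for n in range(min(longest, len(label) - i), 0, -1):
            piece = label[i:i + n]
            if piece in VARIANTS or n == 1:
                parts.append(piece)
                i += n
                break
    return parts


def variant_labels(label):
    """Map each variant label to its disposition: any blocked part blocks it."""
    options = [[(p, None)] + VARIANTS.get(p, []) for p in segment(label)]
    result = {}
    for combo in product(*options):
        types = {t for _, t in combo if t is not None}
        if types:  # skip the combination that reproduces the original label
            text = "".join(s for s, _ in combo)
            result[text] = "blocked" if "blocked" in types else "allocatable"
    return result


def register(label, owner, registry):
    """Register label for owner unless a variant is already taken.

    A taken variant is acceptable only if it is allocatable and held by
    the same owner.
    """
    if label in registry:
        return False
    for variant, disposition in variant_labels(label).items():
        holder = registry.get(variant)
        if holder is not None and (disposition == "blocked" or holder != owner):
            return False
    registry[label] = owner
    return True


registry = {}
print(register(SHRI_SA, "alice", registry))   # True
print(register(SHRI_SHA, "bob", registry))    # False: allocatable only to alice
print(register(SHRI_SHA, "alice", registry))  # True
```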
In addition to these, this LGR defines 6 cross-script variants with the Malayalam script.
Tamil is an alphasyllabary, and the heart of its writing system is the Akshar: the unit that users of the script recognize instinctively. The Tamil writing system can be summed up as being composed of Consonants, the Implicit Vowel Killer (Halant), Vowels, and Visarga/Aytham.
Consonants: More details in Section "3.3.1 The Consonants" of the [Proposal].
Virama/Pulli: All consonants contain an implicit vowel (a). A special sign, the virama "்" (U+0BCD), is needed to denote that this implicit vowel has been stripped off. The virama thus joins two adjacent consonants. In Tamil, there are only two cases where this forms conjuncts. More details in Section "3.3.2 Virama2/Pulli" of the [Proposal].
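At the code-point level, stripping the implicit vowel amounts to appending U+0BCD after the consonant. The sketch below illustrates this with க்ஷ (KA, VIRAMA, SSA), which is assumed here as an example of a conjunct formed through the virama. The helper names are hypothetical.

```python
import unicodedata

VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)


def is_consonant(ch):
    """True for a single assigned Tamil consonant letter (U+0B95..U+0BB9)."""
    return (
        len(ch) == 1
        and "\u0B95" <= ch <= "\u0BB9"
        and unicodedata.name(ch, "").startswith("TAMIL LETTER ")
    )


def kill_vowel(consonant):
    """Append the virama so that the consonant loses its implicit 'a'."""
    if not is_consonant(consonant):
        raise ValueError(f"not a Tamil consonant: {consonant!r}")
    return consonant + VIRAMA


def describe(text):
    """List each code point of text together with its Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


KSSA = kill_vowel("\u0B95") + "\u0BB7"  # KA + VIRAMA + SSA
for line in describe(KSSA):
    print(line)
# U+0B95 TAMIL LETTER KA
# U+0BCD TAMIL SIGN VIRAMA
# U+0BB7 TAMIL LETTER SSA
```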
Vowels: Separate symbols exist for all Vowels; they are used when a vowel is pronounced independently, either at the beginning of a word or after another vowel sound. To indicate a Vowel sound other than the implicit one following a consonant, a Vowel sign (Matra) is attached to the consonant. Since every consonant has a built-in ‘a’, there are equivalent Matras for all vowels except அ. More details in Section "3.3.3 Vowels" of the [Proposal].
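The Vowel-to-Matra correspondence can be checked against Unicode character names with Python's `unicodedata` module. This sketch assumes that each independent vowel named TAMIL LETTER X pairs with a sign named TAMIL VOWEL SIGN X, which holds in the Tamil block. It should report that only அ lacks a Matra.

```python
import unicodedata

PREFIX = "TAMIL LETTER "


def vowel_matra_pairs():
    """Pair each independent vowel (U+0B85..U+0B94) with its Matra, if any."""
    pairs, without_matra = {}, []
    for cp in range(0x0B85, 0x0B95):
        name = unicodedata.name(chr(cp), "")
        if not name.startswith(PREFIX):
            continue  # unassigned code point inside the vowel range
        vowel = name[len(PREFIX):]  # e.g. "AA", "AI"
        try:
            pairs[chr(cp)] = unicodedata.lookup("TAMIL VOWEL SIGN " + vowel)
        except KeyError:
            without_matra.append(chr(cp))  # no vowel sign with that name
    return pairs, without_matra


pairs, missing = vowel_matra_pairs()
print(len(pairs), missing)  # 11 ['அ']
print("\u0B95" + pairs["\u0B8A"])  # KA + VOWEL SIGN UU, rendered as கூ
```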
Visarga/Aytham: The Visarga is also used in Tamil, where it represents a sound very close to /ḵ/. The Visarga is always followed by a stop consonant. More details in Section "3.3.4 Visarga/Aytham" of the [Proposal].
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-2]. They are marked with ⍟.
These rules have been drafted to ensure that a prospective Tamil label conforms to the akshar formation norms of the Tamil script. These norms are expressed exclusively as context rules.
The following symbols are used in the WLE rules:
C → Consonant
M → Matra
V → Vowel
X → Visarga/Aytham
H → Virama/Pulli
The rules are:
More details in Section 7 "Whole Label Evaluation Rules (WLE)" of the [Proposal].
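The authoritative context rules are those defined in the LGR and in [Proposal]. As a rough illustration only, the sketch below classifies code points into the C, M, V, X and H classes by Unicode name. It then checks three assumed rules: a Matra (M) or Virama (H) must immediately follow a Consonant (C), and a Visarga/Aytham (X) must be immediately followed by a Consonant. These are not the LGR's rules verbatim. In particular, the sketch does not restrict X to stop consonants.

```python
import unicodedata


def char_class(ch):
    """Classify a Tamil character as C, M, V, X or H by its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("TAMIL VOWEL SIGN "):
        return "M"  # Matra
    if name == "TAMIL SIGN VIRAMA":
        return "H"  # Virama/Pulli
    if name == "TAMIL SIGN VISARGA":
        return "X"  # Visarga/Aytham
    if name.startswith("TAMIL LETTER "):
        # Independent vowels occupy U+0B85..U+0B94; consonants come after.
        return "V" if ch < "\u0B95" else "C"
    return None  # outside these five classes


def wle_violations(label):
    """Return (index, message) pairs for each breach of the assumed rules."""
    classes = [char_class(ch) for ch in label]
    problems = []
    for i, cls in enumerate(classes):
        prev = classes[i - 1] if i > 0 else None
        nxt = classes[i + 1] if i + 1 < len(classes) else None
        if cls in ("M", "H") and prev != "C":
            problems.append((i, f"{cls} must follow a consonant"))
        if cls == "X" and nxt != "C":
            problems.append((i, "X must be followed by a consonant"))
    return problems


print(wle_violations("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"))  # தமிழ்: []
print(wle_violations("\u0B8E\u0B83\u0B95\u0BC1"))  # எஃகு: []
print(wle_violations("\u0B95\u0BBF\u0BCD"))  # [(2, 'H must follow a consonant')]
```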
Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts will be assigned a separate LGR; however, the Neo-Brahmi GP has ensured that the fundamental philosophy behind building these LGRs is consistent across all Brahmi-derived scripts. This is the Tamil LGR, which caters to the Tamil language written in the Tamil script.
The following references are cited in this document: