[arabic-vip] Review of Arabic Script Definitions - 22Sep11

Behnam Esfahbod behnam at esfahbod.info
Sun Sep 25 00:45:10 UTC 2011


Dear Manal, all,

Please find below some points on the definitions document.

1. Joining and Non-Joining Letters

I think the definitions for Joining and Non-Joining letters
are not accurate enough. In fact, we should use Unicode's
definitions for these properties. In the Unicode Standard,
Chapter 8 (Table 8-3, page 248 of the latest edition), the
following categories are defined, based on the Joining_Type
character property.

1.1. Non-Joining Characters: Those characters that do not
connect to letters before or after them; e.g. U+0621 ARABIC
LETTER HAMZA, U+0674 ARABIC LETTER HIGH HAMZA, and U+200C
ZERO WIDTH NON-JOINER (ZWNJ).

1.2. Right-Joining Characters: Those characters that connect
to the letter before them; i.e. all letters based on Alef,
Reh, Dal, and Waw, and a few other letters.

1.3. Dual-Joining Characters: Those characters that connect
to the letters before and after them; i.e. all other Arabic
letters.

1.4. Join-Causing Characters: Those characters that connect
to the letters before and after them, but do not change
shape themselves; i.e. only U+200D ZWJ and U+0640 TATWEEL.

With respect to those categories, we can have the following
definitions:

1.5. Non-Joining Letters: The group of characters in 1.1
which are letters (by Unicode's definition); e.g. U+0621
ARABIC LETTER HAMZA and U+0674 ARABIC LETTER HIGH HAMZA.

1.6. Right-Joining Letters: The group of characters in 1.2
which are letters; i.e. all letters based on Alef, Reh, Dal,
and Waw, and a few other letters.

1.7. Dual-Joining Letters: The group of characters in 1.3
which are letters; i.e. all other Arabic letters.
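
To make the derivation in 1.5 to 1.7 concrete, here is a
minimal sketch in Python. Python's standard library does not
expose Joining_Type, so the JOINING_TYPE table below is a
small hand-copied excerpt (an assumption for illustration,
not a complete list); the helper names are hypothetical.
"Letter" is taken to mean a General_Category starting with
"L".

```python
import unicodedata

# Hand-copied excerpt of Joining_Type values (illustrative, not complete).
# U = Non-Joining, R = Right-Joining, D = Dual-Joining, C = Join-Causing
JOINING_TYPE = {
    "\u0621": "U",  # ARABIC LETTER HAMZA
    "\u0674": "U",  # ARABIC LETTER HIGH HAMZA
    "\u200C": "U",  # ZERO WIDTH NON-JOINER
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0631": "R",  # ARABIC LETTER REH
    "\u0648": "R",  # ARABIC LETTER WAW
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0633": "D",  # ARABIC LETTER SEEN
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u200D": "C",  # ZERO WIDTH JOINER
    "\u0640": "C",  # ARABIC TATWEEL
}


def is_letter(ch):
    """True if the character's General_Category is a letter category (L*)."""
    return unicodedata.category(ch).startswith("L")


def letters_of_type(joining_type):
    """Characters in the excerpt with the given Joining_Type that are letters."""
    return {
        ch for ch, jt in JOINING_TYPE.items()
        if jt == joining_type and is_letter(ch)
    }


non_joining_letters = letters_of_type("U")    # 1.5
right_joining_letters = letters_of_type("R")  # 1.6
dual_joining_letters = letters_of_type("D")   # 1.7

# "Joining Letter" as the union proposed in 3.4.
joining_letters = right_joining_letters | dual_joining_letters
```

Note that ZWNJ falls out of the Non-Joining Letters group
because it is a format character, not a letter.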

2. Ligature

Unicode's definition for "Ligature" says "a combination of
two or more characters". Why are we saying "one or more
Arabic Letters"?

If the idea is to simplify the definition for the Arabic
script, I don't see why we should use "one or more letters"
instead of "two or more letters".

If our report discusses any Arabic ligature that is made of
only one Arabic letter plus combining marks, could you
please point it out?
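
As a small illustration of the "two or more" reading, the
sketch below decomposes one ligature that Unicode encodes as
a presentation form, U+FEFB ARABIC LIGATURE LAM WITH ALEF
ISOLATED FORM, and counts the letters in it. This is only
one example, chosen for illustration; it says nothing about
ligatures that are not encoded as code points.

```python
import unicodedata

lam_alef = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

# Compatibility decomposition expands the ligature into its components.
parts = unicodedata.normalize("NFKD", lam_alef)

# Keep only the components whose General_Category is a letter category.
letters = [ch for ch in parts if unicodedata.category(ch).startswith("L")]

for ch in letters:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```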

3. Forms of a Letter

3.1. In this section, the word "ligature" is misused in the
definitions for the four shaping forms. What you meant here
is "the group of letters that are joined together", which is
not the definition of "ligature". What we have been using in
technical contexts for this concept is "joining run". I
strongly recommend we agree on a term for this concept
before we deliver our report. Does anyone have another term
in mind that we can use here?

3.2. In the definitions of "Initial form" and "Medial form",
instead of "joining letter" it should be "dual-joining
letter", since only dual-joining letters connect to the
letter after them.

3.3. In the definition of "Final form", the correct wording
is "It is the form of a *joining* letter". The word
"joining" is missing from the current text.

3.4. Because of 3.3, we should first define "Joining Letter"
in the section "Joining and Non-Joining Letters" as the
union of "Right-Joining Letters" and "Dual-Joining Letters".
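
To show how "joining run" and the four forms relate to the
Unicode joining classes from section 1, here is a rough
sketch in Python. The JOINING_TYPE table is a hand-copied
excerpt (an assumption for illustration), the function names
are hypothetical, and combining marks (which are transparent
to joining) are ignored for simplicity. The input is meant
to contain letters only.

```python
# Hand-copied excerpt of Joining_Type values (illustrative, not complete).
# U = Non-Joining, R = Right-Joining, D = Dual-Joining, C = Join-Causing
JOINING_TYPE = {
    "\u0621": "U",  # ARABIC LETTER HAMZA
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0633": "D",  # ARABIC LETTER SEEN
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u0645": "D",  # ARABIC LETTER MEEM
}


def joins_forward(ch):
    """True if ch can connect to the character after it (logical order)."""
    return JOINING_TYPE.get(ch, "U") in ("D", "C")


def joins_backward(ch):
    """True if ch can connect to the character before it (logical order)."""
    return JOINING_TYPE.get(ch, "U") in ("R", "D", "C")


def connected(a, b):
    """True if adjacent characters a, b (in that order) join to each other."""
    return joins_forward(a) and joins_backward(b)


def forms(text):
    """Shaping form of each character: isolated, initial, medial, or final."""
    result = []
    for i, ch in enumerate(text):
        prev_joins = i > 0 and connected(text[i - 1], ch)
        next_joins = i + 1 < len(text) and connected(ch, text[i + 1])
        if prev_joins and next_joins:
            result.append("medial")
        elif prev_joins:
            result.append("final")
        elif next_joins:
            result.append("initial")
        else:
            result.append("isolated")
    return result


def joining_runs(text):
    """Split text into maximal groups of characters that are joined together."""
    runs = []
    for i, ch in enumerate(text):
        if i > 0 and connected(text[i - 1], ch):
            runs[-1] += ch   # ch continues the current run
        else:
            runs.append(ch)  # ch starts a new run
    return runs


word = "\u0633\u0644\u0627\u0645"  # SEEN, LAM, ALEF, MEEM
print(forms(word))
print(joining_runs(word))
```

In this sketch the word splits into two joining runs,
SEEN-LAM-ALEF and MEEM alone, because ALEF does not connect
to the letter after it.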

4. Writing Style

4.1. I think "both use the Arabic script" should be replaced
by something like "both are different styles of writing one
script, the Arabic script".

5. Label Valid Character

5.1. Would you please help me understand this? We have "A
Label Valid Character is represented by a sequence of one or
more Label Valid Code Points." What do you mean by more than
one code point for a character?

Thanks, all, for all the effort put into writing this document.
-Behnam

