[ChineseGP] proposal to eliminate the divergence between us

Wang Wei wangwei at cnnic.cn
Mon Dec 29 07:53:32 UTC 2014


Dear Yoneya-san and Hotta-san,

 

Please accept my belated but best wishes for Christmas and the New Year.

We have recently carried out the following work, which I outline here for
your comments:

 

Under the current rules borrowed from CDNC, every Hanzi in the CGP repertoire
belongs to a variant mapping set (the minimum set size is 1, meaning the code
point has no variants). Likewise, every Kanji code point in the JGP repertoire
may belong to some variant mapping set (we acknowledge that JPRS practice has
no variants so far, but we assume that some kind of variant mapping will be
defined for the JGP repertoire).

 

All variant mapping sets fall into FIVE scenarios:

 

1)      the variant mapping set in JPRS ⊂ the variant mapping set in CDNC




In CGP:

愛 611B (0);爱(86),愛(886);愛(0),爱(0);

爱 7231 (0);爱(86),愛(886);愛(0),爱(0);



In JGP:

愛 611B(2,3);611B(2,3);
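To make the CGP row format concrete, here is a minimal Python sketch that extracts the variant mapping set from one CGP table row. It assumes, from the examples in this mail, that the last `;`-separated field of a CGP row lists the variant set as comma-separated `char(ref)` items; the function name `variant_set` is hypothetical, and JGP rows use a different layout that this sketch does not handle.

```python
def variant_set(row):
    """Return the characters in the last non-empty ';' field of a CGP row
    (assumed here to be the variant mapping set)."""
    fields = [f.strip() for f in row.split(";") if f.strip()]
    # Each item looks like "char(ref)"; keep only the character part.
    return {item.split("(")[0].strip() for item in fields[-1].split(",")}


# Scenario 1 CGP row for U+611B, written with escapes for portability.
row = "\u611B 611B (0);\u7231(86),\u611B(886);\u611B(0),\u7231(0);"
print(sorted(hex(ord(ch)) for ch in variant_set(row)))  # ['0x611b', '0x7231']
```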

 

2)      the variant mapping set in CDNC ⊂ the variant mapping set in JPRS




In CGP:

刊 520A (0);刊 520A(86),刊 520A(886);刊(0),刋(0);

刋 520B (0);刊 520A(86),刊 520A(886);刊(0),刋(0);



In JGP:

刊 520A(2,3);520A(2,3);

刋 520B(2,3);520B(2,3);

栞 681E(2,3);681E(2,3);

*: this example is ONLY an assumption

 

3)      the variant mapping set in CDNC = the variant mapping set in JPRS




In CGP 

一 4E00 (0); 一 4E00(86),一 4E00(886); 一(0),壱(0),壹(0),弌(0);

壱 58F1 (0); 壹 58F9(86),壹 58F9(886); 一(0),壱(0),壹(0),弌(0);

壹 58F9 (0); 壹 58F9(86),壹 58F9(886); 一(0),壱(0),壹(0),弌(0);

弌 5F0C (0); 一 4E00(86),一 4E00(886); 一(0),壱(0),壹(0),弌(0);

 

In JGP:

一 4E00(2,3);4E00(2,3);

壱 58F1(2,3);58F1(2,3);

壹 58F9(2,3);58F9(2,3);

弌 5F0C(2,3);5F0C(2,3);

*: this example is ONLY an assumption

 

4)      the variant mapping set in CDNC ∩ the variant mapping set in JPRS = ∅




The code point exists ONLY in the JGP table:

辻 8FBB(2,3);8FBB(2,3);

 

5)      the variant mapping set in CDNC ∩ the variant mapping set in JPRS ≠ ∅

and

the variant mapping set in CDNC ≠ the variant mapping set in JPRS (neither
set contains the other)




No specific example so far.
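The five scenarios amount to comparing two sets. As a rough illustration (not part of either LGR), the Python sketch below classifies one pair of variant mapping sets; the function name `classify` is hypothetical, code points are plain integers, and a code point absent from the CDNC table is assumed to have an empty CDNC set. The disjointness test runs first because the empty set is technically a subset of every set.

```python
from __future__ import annotations


def classify(jprs: frozenset[int], cdnc: frozenset[int]) -> int:
    """Return the scenario number (1-5) for one pair of variant mapping sets."""
    if not (jprs & cdnc):
        return 4  # no code point in common
    if jprs == cdnc:
        return 3  # identical sets
    if jprs < cdnc:
        return 1  # JPRS set is a proper subset of the CDNC set
    if cdnc < jprs:
        return 2  # CDNC set is a proper subset of the JPRS set
    return 5      # overlapping, but neither set contains the other


# Examples from this mail (scenario 2 uses the assumed JGP set).
print(classify(frozenset({0x611B}), frozenset({0x611B, 0x7231})))  # 1
print(classify(frozenset({0x520A, 0x520B, 0x681E}), frozenset({0x520A, 0x520B})))  # 2
print(classify(frozenset({0x8FBB}), frozenset()))  # 4
```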


 

In the past we discussed the variant problem many times, but mainly in terms
of two types: allocatable and blocked. However, we think another type in the
XML draft, "out-of-repertoire", which Asmus recommended in his mail, may help
resolve the conflict between JGP and CGP.

The basic principle is: "any variant label with a code point out of
repertoire is invalid". We think this "out-of-repertoire" type and the
consequent "invalid" action will greatly reduce the complexity of
coordinating variant mappings between us.
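Read literally, the principle is a membership test on every code point of a label. A minimal Python sketch, assuming a repertoire is simply a set of code points; the name `is_invalid` and the one-character JGP repertoire are illustrative only:

```python
from __future__ import annotations


def is_invalid(label: str, repertoire: set[int]) -> bool:
    """A label is invalid if any of its code points is outside the repertoire."""
    return any(ord(ch) not in repertoire for ch in label)


# Hypothetical JGP repertoire fragment for scenario 1: only U+611B.
jgp_repertoire = {0x611B}

print(is_invalid("\u611B", jgp_repertoire))        # False
print(is_invalid("\u7231", jgp_repertoire))        # True
print(is_invalid("\u611B\u7231", jgp_repertoire))  # True
```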

 

For scenario 1:

In CGP:

愛 611B (0);爱(86),愛(886);愛(0),爱(0);

爱 7231 (0);爱(86),愛(886);愛(0),爱(0);

In JGP:

愛 611B(2,3);611B(2,3);

 

JGP takes 爱 7231 into its variant mapping set, but marks it as
"out-of-repertoire" and applies the "invalid" action in the WLG process,
which means that 爱 7231 will never be generated in any label.

 

JGP LGR:

<language>und-Jpan</language>

<char cp="611B" tag="sc:Hani">

    <var cp="611B" type="alloc" comment="identity" /> 

    <var cp="7231" type="out-of-repertoire-var" /> <!-- Hans, JGP should exist. -->

</char>

WLE rules:

<action disp="invalid" any-variant="out-of-repertoire-var"

comment="any variant label with a code point out of repertoire is invalid"/>

<action disp="allocatable" all-variants="alloc" />
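A tool could read the JGP fragment above and list which targets it marks as out of repertoire. The Python sketch below does this with the standard `xml.etree.ElementTree` module; the `<data>` root element is a hypothetical wrapper so the fragment parses on its own, and treating "code points with a `<char>` element" as the repertoire is a simplification that holds for this fragment only.

```python
import xml.etree.ElementTree as ET

# The JGP scenario-1 <char> element, wrapped in a hypothetical <data> root.
JGP_FRAGMENT = """
<data>
  <char cp="611B" tag="sc:Hani">
    <var cp="611B" type="alloc" comment="identity"/>
    <var cp="7231" type="out-of-repertoire-var"/>
  </char>
</data>
"""

root = ET.fromstring(JGP_FRAGMENT.strip())

# Repertoire (in this fragment): code points that have a <char> element.
repertoire = {int(c.get("cp"), 16) for c in root.iter("char")}

# Variant targets the LGR marks as out of repertoire.
oor_targets = {
    int(v.get("cp"), 16)
    for v in root.iter("var")
    if v.get("type") == "out-of-repertoire-var"
}

# Sanity check: no out-of-repertoire target may also be in the repertoire.
assert not (repertoire & oor_targets), "out-of-repertoire target listed as char"

print(sorted(hex(cp) for cp in repertoire))   # ['0x611b']
print(sorted(hex(cp) for cp in oor_targets))  # ['0x7231']
```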

 

CGP LGR:

<language>und-Hani</language>

<char cp="611B" tag="sc:Hani">

    <var cp="611B" type="trad" comment="identity" /> <!-- Jpan -->

    <var cp="7231" type="simp" />

</char>

<char cp="7231" tag="sc:Hani">

    <var cp="611B" type="trad" /> <!-- Jpan -->

    <var cp="7231" type="simp" comment="identity" />

</char>

WLE rules:

         <action disp="blocked" any-variant="block" />

         <action disp="allocatable" only-variants="simp both" />

         <action disp="allocatable" only-variants="trad both" />

         <action disp="blocked" any-variant="simp trad" />

         <action disp="allocatable" comment="catch-all" />
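The ordered WLE rules work as a first-match decision list over the variant types used to build each variant label. The Python sketch below applies the CGP rules above to every variant label of the two-character label 愛愛. It is a simplification, not an implementation of the LGR specification: it assumes each position of a variant label carries exactly one variant type and models only the `any-variant` and `only-variants` conditions; the table, rule list and function names are hypothetical.

```python
from itertools import product

# CGP scenario-1 variant table: source code point -> {target: variant type}.
CGP_VARIANTS = {
    0x611B: {0x611B: "trad", 0x7231: "simp"},
    0x7231: {0x611B: "trad", 0x7231: "simp"},
}

# CGP WLE rules in order; the first matching rule decides the disposition.
# Each entry: (disposition, condition, listed variant types).
CGP_RULES = [
    ("blocked", "any-variant", {"block"}),
    ("allocatable", "only-variants", {"simp", "both"}),
    ("allocatable", "only-variants", {"trad", "both"}),
    ("blocked", "any-variant", {"simp", "trad"}),
    ("allocatable", None, set()),  # catch-all
]


def variant_labels(label):
    """Yield (variant label, set of variant types used) for each permutation."""
    choices = [CGP_VARIANTS[ord(ch)].items() for ch in label]
    for combo in product(*choices):
        yield "".join(chr(cp) for cp, _ in combo), {t for _, t in combo}


def disposition(types):
    """Return the disposition of the first rule whose condition matches."""
    for disp, condition, listed in CGP_RULES:
        if condition is None:
            return disp
        if condition == "any-variant" and types & listed:
            return disp
        if condition == "only-variants" and types <= listed:
            return disp
    raise ValueError("no rule matched")  # unreachable with a catch-all


results = {text: disposition(types) for text, types in variant_labels("\u611B\u611B")}
for text, disp in results.items():
    print(" ".join(f"U+{ord(ch):04X}" for ch in text), disp)
```

Under these assumptions the all-traditional and all-simplified labels come out allocatable, while the two mixed labels fall through to the `any-variant="simp trad"` rule and are blocked.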

 

 

For scenario 2:

In CGP: 

刊 520A (0);刊 520A(86),刊 520A(886);刊(0),刋(0);

刋 520B (0);刊 520A(86),刊 520A(886);刊(0),刋(0);



In JGP:

刊 520A(2,3);520A(2,3);

刋 520B(2,3);520B(2,3);

栞 681E(2,3);681E(2,3);

 

Now it is CGP's turn: CGP takes 栞 681E into its variant mapping set, but
marks it as "out-of-repertoire" and applies the "invalid" action in the WLG
process, which means that 栞 681E will never be generated in any label.

 

CGP LGR:

<language>und-Hani</language>

<char cp="520A" tag="sc:Hani">

    <var cp="520A" type="both" comment="identity" />

    <var cp="520B" type="block" />

    <var cp="681E" type="out-of-repertoire-var" /> <!-- Jpan -->

</char>

<char cp="520B" tag="sc:Hani">

    <var cp="520A" type="both" />

    <var cp="520B" type="block" comment="identity" />

    <var cp="681E" type="out-of-repertoire-var" /> <!-- Jpan -->

</char>

<char cp="681E" tag="sc:Hani"> <!-- Jpan -->

    <var cp="520A" type="block" />

    <var cp="520B" type="block" />

    <var cp="681E" type="out-of-repertoire-var" comment="identity"/> 

</char>
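Before the two LGRs are compared, it may help to confirm that each variant table is internally consistent. A minimal Python sketch, assuming the usual expectation that variant mappings are symmetric (a → b implies b → a) and transitive (a → b and b → c imply a → c); the table is transcribed from the CGP scenario-2 LGR above, and the helper names are hypothetical.

```python
# CGP scenario-2 variant table: source code point -> {target: variant type}.
CGP_S2 = {
    0x520A: {0x520A: "both", 0x520B: "block", 0x681E: "out-of-repertoire-var"},
    0x520B: {0x520A: "both", 0x520B: "block", 0x681E: "out-of-repertoire-var"},
    0x681E: {0x520A: "block", 0x520B: "block", 0x681E: "out-of-repertoire-var"},
}


def is_symmetric(table):
    """Every mapping a -> b has a reverse mapping b -> a."""
    return all(a in table.get(b, {}) for a, targets in table.items() for b in targets)


def is_transitive(table):
    """If a -> b and b -> c, then a -> c."""
    return all(
        c in table[a]
        for a, targets in table.items()
        for b in targets
        for c in table.get(b, {})
    )


print(is_symmetric(CGP_S2), is_transitive(CGP_S2))  # True True
```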

WLE rules:

         <action disp="invalid" any-variant="out-of-repertoire-var" 

comment="any variant label with a code point out of repertoire is invalid"/>

         <action disp="blocked" any-variant="block" />

         <action disp="allocatable" only-variants="simp both" />

         <action disp="allocatable" only-variants="trad both" />

         <action disp="blocked" any-variant="simp trad" />

         <action disp="allocatable" comment="catch-all" />

         

JGP LGR:

<language>und-Jpan</language>

<char cp="520A" tag="sc:Hani">

    <var cp="520A" type="alloc" comment="identity" />

    <var cp="520B" type="block" />

    <var cp="681E" type="block" />

</char>

<char cp="520B" tag="sc:Hani">

    <var cp="520A" type="block" />

    <var cp="520B" type="alloc" comment="identity" />

    <var cp="681E" type="block" /> 

</char>

<char cp="681E" tag="sc:Hani">

    <var cp="520A" type="block" />

    <var cp="520B" type="block" />

    <var cp="681E" type="alloc" comment="identity"/> 

</char>

WLE rules:

<action disp="blocked" any-variant="block" />

<action disp="allocatable" all-variants="alloc" />

 

 

 

For Scenario 3:

 

In CGP:

一 4E00 (0); 一 4E00(86),一 4E00(886); 一(0),壱(0),壹(0),弌(0);

壱 58F1 (0); 壹 58F9(86),壹 58F9(886); 一(0),壱(0),壹(0),弌(0);

壹 58F9 (0); 壹 58F9(86),壹 58F9(886); 一(0),壱(0),壹(0),弌(0);

弌 5F0C (0); 一 4E00(86),一 4E00(886); 一(0),壱(0),壹(0),弌(0);



In JGP:

一 4E00(2,3);4E00(2,3);

壱 58F1(2,3);58F1(2,3);

壹 58F9(2,3);58F9(2,3);

弌 5F0C(2,3);5F0C(2,3);

 

JGP needs to create its own mapping set covering all four code points above,
together with the corresponding rules; otherwise, this case falls into
scenario 1.
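One way JGP could produce such a mapping set is to list every code point as a variant of every other, in the same `<char>`/`<var>` layout as the JGP LGRs in this mail. The Python sketch below generates that layout with the standard `xml.etree.ElementTree` module; the `alloc`/`block` type names are borrowed from the scenario-2 JGP example purely as placeholders, and which types JGP would actually choose remains open.

```python
import xml.etree.ElementTree as ET


def variant_set_xml(cps, identity_type="alloc", other_type="block"):
    """Build <char>/<var> elements for a fully connected variant set.

    Every code point maps to itself (tagged as the identity) and to every
    other code point in cps. The type names are placeholders.
    """
    root = ET.Element("data")  # hypothetical wrapper element
    for src in cps:
        char = ET.SubElement(root, "char", {"cp": f"{src:04X}", "tag": "sc:Hani"})
        for dst in cps:
            if dst == src:
                attrs = {"cp": f"{dst:04X}", "type": identity_type, "comment": "identity"}
            else:
                attrs = {"cp": f"{dst:04X}", "type": other_type}
            ET.SubElement(char, "var", attrs)
    return root


root = variant_set_xml([0x4E00, 0x58F1, 0x58F9, 0x5F0C])
print(len(root.findall("char")), len(root.findall("char/var")))  # 4 16
print(ET.tostring(root[0], encoding="unicode"))
```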

 

 

For Scenario 4:

For example, a code point that exists ONLY in the JGP table:

 辻 8FBB(2,3);8FBB(2,3);

 

 

CGP will probably not include this code point in its repertoire.

No extra work or rules are needed.

 

 

For Scenario 5:



 

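Actually, we have not found any code points that fit this scenario.

But the solution would follow scenarios 1 and 2. For example, with A a code
point found only in the JGP variant set and C one found only in the CGP
variant set: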

 

For JGP, "C" will be included but marked as "out-of-repertoire".

For CGP, "A" will be included but marked as "out-of-repertoire".
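The rule behind scenarios 1, 2 and 5 is the same set difference applied in both directions: each side adds, as `out-of-repertoire-var`, whatever the other side's variant set has and its own lacks. A minimal Python sketch; the function name is hypothetical, and `A`, `B`, `C` are placeholder characters, with B assumed to be the code point the two sets share.

```python
from __future__ import annotations


def out_of_repertoire_additions(jgp: set[str], cgp: set[str]) -> tuple[set[str], set[str]]:
    """Return (added to JGP, added to CGP) as out-of-repertoire variants."""
    return cgp - jgp, jgp - cgp


# Scenario 5 with placeholders: JGP set {A, B}, CGP set {B, C}.
print(out_of_repertoire_additions({"A", "B"}, {"B", "C"}))  # ({'C'}, {'A'})

# Scenario 1 from this mail: only JGP needs an addition (U+7231).
print(out_of_repertoire_additions({"\u611B"}, {"\u611B", "\u7231"}))
```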

 

 

In conclusion, the "out-of-repertoire" type and the "invalid" action give
us a conservative and simple way to reach consensus on the variant mappings
and rules.

According to our analysis of the CGP table and the JPRS table:

4983 code points fit Scenario 1.

840 code points fit Scenario 3.

170 code points fit Scenario 4.

 

Since JGP has not yet decided whether variant relationships will exist in
the JGP repertoire, we have no figures for scenario 2 and scenario 5. But we
believe the above solution can also be applied to scenarios 2 and 5, no
matter what kind of variant mapping JGP eventually produces.

 

 

All of the above is our proposal for settling the divergence at minimum cost
to both of us.

What do you think? I look forward to your reply.

 

 

Best Regards,

Wei Wang

 

 

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: For Yoneya and Hotta.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 200649 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/chinesegp/attachments/20141229/13264847/ForYoneyaandHotta-0001.docx>

