[ChineseGP] Re: proposal to eliminate the divergence between us

Wang Wei wangwei at cnnic.cn
Mon Jan 19 13:02:36 UTC 2015


Dear Hotta,

	I do love the term "passively defined variants", which precisely
describes the variants 'borrowed' or 'adopted' from other GPs' repertoires.

	Let me explain what we (CGP) will do when we encounter such
"passively defined variants":
	We will include the code point in the repertoire, and mark every
whole label that contains a "passively defined variant" as "blocked" or
"invalid".
	"Blocked" means the label could be activated and allocated in the
future, depending on the registration policy.
	"Invalid" means the label will never be activated, since the
"passively defined variant" does not exist in the original repertoire, i.e.
it is "out-of-repertoire".

	For example
	In CGP: 
		刊520A (0);刊520A(86),刊520A(886);刊(0),刋(0); 
		刋520B (0);刊520A(86),刊520A(886);刊(0),刋(0);

	In JGP: 
		刊 520A(2,3);520A(2,3);
		刋 520B(2,3);520B(2,3);
		栞 681E(2,3);681E(2,3);


	A possible solution for CGP is to include 栞 681E(2,3) in its
repertoire and mark it as a "passively defined variant" or "out-of-repertoire".
	The consequent whole-label action is then "block" or "invalid",
depending on linguists' advice and registration policy.
	Does that make sense to JGP?
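
The whole-label handling described above could be sketched roughly as follows. This is only an illustration: the helper `label_disposition`, the set names and the `passive_policy` switch are hypothetical, and ordinary (non-passive) variant handling is deliberately left out.

```python
# Sketch only: names and the policy switch are illustrative assumptions,
# not part of any LGR specification.

def label_disposition(label, repertoire, passive, passive_policy="invalid"):
    """Return a whole-label disposition for `label`.

    repertoire     -- code points originally in the GP's repertoire
    passive        -- passively defined (adopted) code points
    passive_policy -- "blocked" or "invalid", chosen by policy
    """
    if passive_policy not in ("blocked", "invalid"):
        raise ValueError("passive_policy must be 'blocked' or 'invalid'")
    cps = [ord(ch) for ch in label]
    # A code point known to neither set makes the label invalid outright.
    if any(cp not in repertoire and cp not in passive for cp in cps):
        return "invalid"
    # One passively defined variant decides the fate of the whole label.
    if any(cp in passive for cp in cps):
        return passive_policy
    return "allocatable"


cgp_repertoire = {0x520A, 0x520B}   # 刊, 刋
passive = {0x681E}                  # 栞, adopted from the JGP table

print(label_disposition("\u520a", cgp_repertoire, passive))
# allocatable
print(label_disposition("\u520a\u681e", cgp_repertoire, passive))
# invalid
print(label_disposition("\u520a\u681e", cgp_repertoire, passive, "blocked"))
# blocked
```

Whether the policy argument should be "blocked" or "invalid" is exactly the open choice left to linguists' advice and registration policy.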

	I think we have already reached a consensus on the variant mapping
principles.
	Could you please give us some more examples of different variant
cases? We can work on them together and try to make the solution clearer.


Regards
Wang Wei


-----Original Message-----
From: chinesegp-bounces at icann.org [mailto:chinesegp-bounces at icann.org] On
Behalf Of HiroHOTTA
Sent: 18 January 2015 23:16
To: Dillon, Chris
Cc: Yoshiro YONEYA; ChineseGP at icann.org
Subject: Re: [ChineseGP] proposal to eliminate the divergence between us

Hi Chris, 

JGP is investigating how each language GP should generate its LGR.  
We finally have come to understand that, for example, JGP should have at
least "passively defined" variants that are defined in CGP and/or KGP.

Borrowing Wang Wei's example,
Let's suppose
  CGP defines 
    一4E00 (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0); 
    壱58F1 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0); 
    壹58F9 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0); 
    弌5F0C (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
  JGP defines 
    '一' and '壱' as independent (non-variant) characters, and 
    '壹' and '弌' are not in the JGP repertoire.

In such a case, JGP must define '一', '壱', '壹', and '弌' as variants, and
'一' and '壱' are both allocatable when '一' or '壱' is applied for
registration. In JGP, we call these "passively defined variants"; it seems
that Chris uses 'inherited' or 'adopted' for this.

"Passively defined variants" are necessary if they are not automatically
calculated in the output of the IP. I hope each language GP and the IP will
discuss the possibility and specification of this (automatic)
calculation.
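
One way such an automatic calculation might look is sketched below. It is an assumption-laden illustration, not a specification: the function names `merge_variant_sets` and `passive_variants` are hypothetical, and each panel's table is simplified to a list of variant sets of code points.

```python
def merge_variant_sets(*tables):
    """Merge variant sets from several panels into disjoint sets.

    Each table is an iterable of sets of code points. Sets that share
    any code point are unioned, so "variant in any table" becomes
    "variant in the merged result".
    """
    merged = []
    for table in tables:
        for vset in table:
            current = set(vset)
            untouched = []
            for other in merged:
                if other & current:
                    current |= other      # overlapping sets collapse into one
                else:
                    untouched.append(other)
            untouched.append(current)
            merged = untouched
    return merged


def passive_variants(own_table, *other_tables):
    """Map each of a panel's own code points to the variants it acquires
    only because another panel defines them."""
    merged = merge_variant_sets(own_table, *other_tables)
    result = {}
    for own_set in own_table:
        for cp in own_set:
            full = next(m for m in merged if cp in m)
            result[cp] = full - set(own_set)
    return result


# Hypothetical data taken from the example above.
cgp = [{0x4E00, 0x58F1, 0x58F9, 0x5F0C}]   # 一, 壱, 壹, 弌 form one CGP set
jgp = [{0x4E00}, {0x58F1}]                  # independent characters in JGP

for cp, extra in sorted(passive_variants(jgp, cgp).items()):
    print(f"{cp:04X}", sorted(f"{v:04X}" for v in extra))
# 4E00 ['58F1', '58F9', '5F0C']
# 58F1 ['4E00', '58F9', '5F0C']
```

The sketch only derives the mappings; which of the resulting variants are allocatable or blocked remains a per-panel disposition decision.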

Hiro


On Fri, 9 Jan 2015 12:16:26 +0000
"Dillon, Chris" <c.dillon at ucl.ac.uk> wrote:
> Dear Qichao,
> Effectively the CN rule becomes a general rule once the mappings are made
> compatible. Variants in any language LGR may block labels even if they come
> from an LGR where they would not be blocked.
> I like that particular example especially, as it indicates a first come
> first served situation where a label including a low frequency character
> (and which is not a variant in all the language LGRs) blocks a label with a
> high frequency character.
> Again this is presuming the “each language has a descriptive LGR of its
> own situation (with or without variant mappings) and then another LGR (with
> possibly adopted* variant mappings) which is compatible with other LGRs for
> the script” model.
> *I have been using the word “inherited” for this, but “adopted” (i.e.
> from other languages using the script) may be better.
> Regards,
> Chris.
> --
> Research Associate in Linguistic Computing, Centre for Digital 
> Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 
> 31599) 
> www.ucl.ac.uk/dis/people/chrisdillon<http://www.ucl.ac.uk/dis/people/chrisdillon>
> 
> From: Qichao [mailto:qichao at cnnic.cn]
> Sent: 09 January 2015 03:21
> To: Dillon, Chris; Sarmad Hussain
> Cc: Yoshiro YONEYA; ChineseGP at icann.org; hotta at jprs.co.jp
> Subject: Re: Re: [ChineseGP] proposal to eliminate the divergence 
> between us
> 
> Dear Chris,
> 
>       I agree with your understanding: if there is a variant mapping in
> any language, the mapping will exist in all languages. So the mappings are
> formally unified, but C, J and K could still apply different rules for
> the allocated/blocked cases.
> 
> Your case is a nice example, but there is a minor error: the
> situation
> 
> 'where .XX壱 was allocated, disallowing the separate allocation of .XX一'
> 
>   is a rule only for language tag CN, while '.XX壱' and '.XX一' could
> both be allocated for language tag JP or KR; these are in a mapping
> similar to that of tag CN but with a different disposition rule.
> 
> Maybe my understanding is wrong, and I hope the IP can give a common example
> for C, J and K.
> 
> 
> Best Regards
> 
> ________________________________
>                     Qichao via foxmail
> 
> From: Dillon, Chris<mailto:c.dillon at ucl.ac.uk>
> Sent: Thursday, 8 January 2015, 9:04 PM
> To: Sarmad Hussain<mailto:sarmad.hussain at icann.org>
> Cc: yoshiro.yoneya at jprs.co.jp<mailto:yoshiro.yoneya at jprs.co.jp>; 
> ChineseGP at icann.org<mailto:ChineseGP at icann.org>; 
> hotta at jprs.co.jp<mailto:hotta at jprs.co.jp>
> Subject: Re: [ChineseGP] proposal to eliminate the divergence between us
> 
> Dear Sarmad,
> That is certainly an example of what I mean. The possible issue with CJK
> is that there will be many such examples. We will only know how many when
> we have tables for SC, TC, J and K.
> The CJK tables with compatible mappings* (i.e. not the tables which
> describe the situation in SC, TC, J and K) will probably end up with:
> “If it’s a variant in any language, it will be a variant in the tables
> with compatible mappings*.”
> *If my understanding is right, there will be no single amalgamated table.
> From a Japanese (Traditional Chinese and Korean) perspective, it is weird
> to have, for example, 髮 ‘hair’ and 發 ‘send’ as variants (as they are
> both 发 in Simplified Chinese) and on slide 8 of Professor Kim’s
> presentation, there was a possible situation where .XX壱 was allocated,
> disallowing the separate allocation of .XX一, although 一 is a much more
> basic character. However, I am yet to discover cases (and there would need
> to be quite a few of them) which could be party-stoppers.
> The Procedure is very well made and I believe it will be possible to
> follow it.
> Regards,
> Chris.
> 
> From: Sarmad Hussain [mailto:sarmad.hussain at icann.org]
> Sent: 08 January 2015 11:01
> To: Dillon, Chris
> Cc: ChineseGP at icann.org<mailto:ChineseGP at icann.org>; Wang Wei; 
> yoshiro.yoneya at jprs.co.jp<mailto:yoshiro.yoneya at jprs.co.jp>; 
> hotta at jprs.co.jp<mailto:hotta at jprs.co.jp>
> Subject: RE: [ChineseGP] proposal to eliminate the divergence between us
> 
> Dear Chris,
> If I understand this correctly, here is an example from Arabic script which
> could be relevant:
> U+06A9 (ک) and U+06AA (ڪ) are distinct letters (not variants) in the Sindhi
> language (see http://www.omniglot.com/writing/sindhi.htm).
> However, these code points are considered variants of U+0643 (ك) by other
> language communities (e.g. Arabic language).
> Therefore, they are being considered as variants by the ArabicGP (see
> Table 3 in
> https://community.icann.org/download/attachments/47253587/Arabic%20Variant%20Analysis%20for%20LGR%200.8.pdf?version=2&modificationDate=1419700233000&api=v2).
> Regards,
> Sarmad
> From: Dillon, Chris [mailto:c.dillon at ucl.ac.uk]
> Sent: Monday, January 05, 2015 3:09 PM
> To: Wang Wei; 
> yoshiro.yoneya at jprs.co.jp<mailto:yoshiro.yoneya at jprs.co.jp>; 
> hotta at jprs.co.jp<mailto:hotta at jprs.co.jp>
> Cc: ChineseGP at icann.org<mailto:ChineseGP at icann.org>; Sarmad Hussain
> Subject: RE: [ChineseGP] proposal to eliminate the divergence between us
> 
> Dear colleagues,
> 新年快樂
> 明けましておめでとうございます
> 새 해 福 많이 받으세요
> Or/. Happy New Year!
> I am wondering whether there may be a way of making the proposal below
> work, without the JGP’s having to define variant sets and mappings (well,
> only a small number in scenario 2).
> Scripts used by many languages, for example Cyrillic and Arabic (I’ll
> leave out Latin as it is used by so many languages it may cause confusion)
> may be in a situation where some implementations of the script define
> variants (cf. SC and TC) and some don’t (cf. Japanese). One possible
> approach could be that languages which don’t define variants inherit the
> variant sets and mappings from the languages using the script that do define
> variants. I’m copying Sarmad in on this email, as this is a phenomenon
> which may have occurred in the work of one of the other GPs.
> I reckon this approach would work for cases 1, 3 and 4 below. 
> (Actually 5 too as long as there are no examples of it…) That only leaves
> us with cases in scenario 2 such as 栞 (a variant which only exists in the
> Japanese table) for which a mapping to 刊 and 刋 would need to be created.
> For all other cases, the SC/TC mappings would be inherited.
> Regards,
> Chris.
> 
> From: chinesegp-bounces at icann.org<mailto:chinesegp-bounces at icann.org> 
> [mailto:chinesegp-bounces at icann.org] On Behalf Of Wang Wei
> Sent: 29 December 2014 07:54
> To: yoshiro.yoneya at jprs.co.jp<mailto:yoshiro.yoneya at jprs.co.jp>; 
> hotta at jprs.co.jp<mailto:hotta at jprs.co.jp>
> Cc: ChineseGP at icann.org<mailto:ChineseGP at icann.org>
> Subject: [ChineseGP] proposal to eliminate the divergence between us
> 
> Dear Yoneya San and Hotta San
> 
> Please kindly accept my belated but best wishes for Christmas and the New
> Year.
> Recently, we carried out the following work, which I outline here for
> your comments:
> 
> Any Hanzi in the CGP repertoire belongs to a variant mapping set (the
> minimum set size is 1, which means the code point has no variant) under
> the current rules borrowed from CDNC; and any Kanji code point in the JGP
> repertoire may also belong to some variant mapping set (we acknowledge that
> there is no variant in JPRS practice so far, but we assume that there will
> be some kind of variant mapping definition in the JGP repertoire).
> 
> All the variant mapping sets can be divided into FIVE scenarios:
> 
> 
> 
> 1.       the variant mapping set in JPRS ⊂ the variant mapping set in CDNC
> In CGP
> 愛 611B (0);爱(86),愛(886);愛(0),爱(0);
> 爱 7231 (0);爱(86),愛(886);愛(0),爱(0);
> 
>  In JGP
> 愛611B(2,3);611B(2,3);
> 
> 
> 
> 
> 2.       the variant mapping set in CDNC ⊂ the variant mapping set in
> JPRS
> In CGP:
> 刊520A (0);刊520A(86),刊520A(886);刊(0),刋(0);
> 刋520B (0);刊520A(86),刊520A(886);刊(0),刋(0);
> 
> In JGP:
> 刊 520A(2,3);520A(2,3);
> 刋 520B(2,3);520B(2,3);
> 栞 681E(2,3);681E(2,3);
> *: this example is ONLY an assumption
> 
> 
> 
> 
> 3.       the variant mapping set in CDNC = the variant mapping set in JPRS
> In CGP
> 一4E00 (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
> 壱58F1 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);
> 壹58F9 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);
> 弌5F0C (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
> 
> In JGP:
> 一 4E00(2,3);4E00(2,3);
> 壱 58F1(2,3);58F1(2,3);
> 壹 58F9(2,3);58F9(2,3);
> 弌 5F0C(2,3);5F0C(2,3);
> *: this example is ONLY an assumption
> 
> 
> 
> 
> 4.       the variant mapping set in CDNC ∩ the variant mapping set in
> JPRS = ∅
> The code point UNIQUELY exists in the JGP table
> 辻8FBB(2,3);8FBB(2,3);
> 
> 
> 
> 
> 5.       the variant mapping set in CDNC ∩ the variant mapping set in
> JPRS ≠ ∅
> 
> and
> the variant mapping set in CDNC ≠ the variant mapping set in JPRS
> 
> No specified example so far
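
The five scenarios amount to a comparison of two sets of code points, which might be sketched as below. The function name `classify_scenario` is hypothetical, and scenario 5 is read here as "overlapping, but neither set contains the other", which is an interpretation rather than something stated in the list.

```python
def classify_scenario(cdnc_set, jprs_set):
    """Return the scenario number (1-5) for a pair of variant mapping sets.

    Both arguments are collections of code points; an empty collection
    means the code point is absent from that panel's table.
    """
    cdnc, jprs = set(cdnc_set), set(jprs_set)
    if cdnc == jprs:
        return 3          # identical sets
    if not (cdnc & jprs):
        return 4          # nothing in common
    if jprs < cdnc:
        return 1          # JPRS set strictly inside the CDNC set
    if cdnc < jprs:
        return 2          # CDNC set strictly inside the JPRS set
    return 5              # partial overlap only


# Examples from the list (the JGP sets for 2 and 3 are assumptions there too).
print(classify_scenario({0x611B, 0x7231}, {0x611B}))                   # 1
print(classify_scenario({0x520A, 0x520B}, {0x520A, 0x520B, 0x681E}))   # 2
print(classify_scenario({0x4E00, 0x58F1, 0x58F9, 0x5F0C},
                        {0x4E00, 0x58F1, 0x58F9, 0x5F0C}))             # 3
print(classify_scenario(set(), {0x8FBB}))                              # 4
```

The order of the checks matters: the disjoint test runs before the subset tests so that an empty CDNC set is reported as scenario 4 rather than scenario 2.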
> 
> 
> In the past, we discussed the variant problem many times, but mainly
> on the basis of two types: allocatable and blocked. However, we think another
> type ("out-of-repertoire") in the XML draft, which was recommended in Asmus'
> mail, may help resolve the conflict between JGP and CGP.
> The basic principle is "any variant label with a code point
> out-of-repertoire is invalid". We think this “out-of-repertoire” type and
> the consequent “invalid” action will tremendously decrease the complexity of
> variant mapping coordination between us.
> 
> For scenario 1:
> In CGP
> 愛 611B (0);爱(86),愛(886);愛(0),爱(0);
> 爱 7231 (0);爱(86),愛(886);愛(0),爱(0);
> In JGP
> 愛 611B(2,3);611B(2,3);
> 
> JGP takes 爱 7231 into its variant mapping set, but marks it as
> “out-of-repertoire” and takes the “invalid” action in the WLG process, which
> means that 爱 7231 will never be generated in labels.
> 
> JGP LGR:
> <language>und-Jpan</language>
> <char cp="611B" tag="sc:Hani">
>     <var cp="611B" type="alloc" comment="identity" />
>     <var cp="7231" type="out-of-repertoire-var" /> <!--Hans, JGP should exist.-->
> </char>
> WLE rules:
> <action disp="invalid" any-variant="out-of-repertoire-var"
>         comment="any variant label with a code point out of repertoire is invalid"/>
> <action disp="allocatable" all-variants="alloc"  />
> 
> CGP LGR:
> <language>und-Hani</language>
> <char cp="611B" tag="sc:Hani">
>     <var cp="611B" type="trad" comment="identity" /> <!-- Jpan -->
>     <var cp="7231" type="simp" />
> </char>
> <char cp="7231" tag="sc:Hani">
>     <var cp="611B" type="trad" /> <!-- Jpan -->
>     <var cp="7231" type="simp" comment="identity" />
> </char>
> WLE rules:
>          <action disp="blocked" any-variant="block" />
>          <action disp="allocatable" only-variants="simp both" />
>          <action disp="allocatable" only-variants="trad both" />
>          <action disp="blocked" any-variant="simp trad" />
>          <action disp="allocatable" comment="catch-all" />
> 
> 
> For scenario 2:
> In CGP:
> 刊520A (0);刊520A(86),刊520A(886);刊(0),刋(0);
> 刋520B (0);刊520A(86),刊520A(886);刊(0),刋(0);
> 
> In JGP:
> 刊 520A(2,3);520A(2,3);
> 刋 520B(2,3);520B(2,3);
> 栞 681E(2,3);681E(2,3);
> 
> Now it is CGP’s turn to take 栞 681E into its variant mapping set, but mark
> it as “out-of-repertoire” and take the “invalid” action in the WLG process,
> which means that 栞 681E will never be generated in labels.
> 
> CGP LGR
> <language>und-Hani</language>
> <char cp="520A" tag="sc:Hani">
>     <var cp="520A" type="both" comment="identity" />
>     <var cp="520B" type="block" />
>     <var cp="681E" type="out-of-repertoire-var" /> <!-- Jpan -->
> </char>
> <char cp="520B" tag="sc:Hani">
>     <var cp="520A" type="both" />
>     <var cp="520B" type="block" comment="identity" />
>     <var cp="681E" type="out-of-repertoire-var" /> <!-- Jpan -->
> </char>
> <char cp="681E" tag="sc:Hani"> <!-- Jpan -->
>     <var cp="520A" type="block" />
>     <var cp="520B" type="block" />
>     <var cp="681E" type="out-of-repertoire-var" comment="identity"/>
> </char>
> WLE rules:
>          <action disp="invalid" any-variant="out-of-repertoire-var"
>                  comment="any variant label with a code point out of repertoire is invalid"/>
>          <action disp="blocked" any-variant="block" />
>          <action disp="allocatable" only-variants="simp both" />
>          <action disp="allocatable" only-variants="trad both" />
>          <action disp="blocked" any-variant="simp trad" />
>          <action disp="allocatable" comment="catch-all" />
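
How an ordered rule list like the one above might be applied to a variant label can be sketched as follows. This is a simplified model under stated assumptions, not the evaluation procedure of the XML draft: the function `evaluate` and the tuple layout are hypothetical, the first matching action is taken to win, and `only-variants` is treated the same as `all-variants` because every position in these tables carries a typed mapping.

```python
def evaluate(actions, variant_types):
    """Return the disposition of a variant label.

    actions       -- ordered list of (disp, condition, type_list) tuples
    variant_types -- the variant type of each mapping used to build the
                     label (empty for the originally applied-for label)
    """
    types = set(variant_types)
    for disp, condition, type_list in actions:
        if condition is None:                 # catch-all action
            return disp
        wanted = set(type_list.split())
        if condition == "any-variant" and types & wanted:
            return disp
        if condition in ("all-variants", "only-variants"):
            if types and types <= wanted:
                return disp
    return None                               # no action matched


# The CGP rules for scenario 2, transcribed by hand.
CGP_ACTIONS = [
    ("invalid", "any-variant", "out-of-repertoire-var"),
    ("blocked", "any-variant", "block"),
    ("allocatable", "only-variants", "simp both"),
    ("allocatable", "only-variants", "trad both"),
    ("blocked", "any-variant", "simp trad"),
    ("allocatable", None, None),
]

print(evaluate(CGP_ACTIONS, ["both", "out-of-repertoire-var"]))  # invalid
print(evaluate(CGP_ACTIONS, ["both", "block"]))                  # blocked
print(evaluate(CGP_ACTIONS, ["simp", "both"]))                   # allocatable
print(evaluate(CGP_ACTIONS, ["simp", "trad"]))                   # blocked
```

Placing the "invalid" action first is what makes a single out-of-repertoire code point override every other rule.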
> 
> JGP LGR:
> <language>und-Jpan</language>
> <char cp="520A" tag="sc:Hani">
>     <var cp="520A" type="alloc" comment="identity" />
>     <var cp="520B" type="block" />
>     <var cp="681E" type="block" />
> </char>
> <char cp="520B" tag="sc:Hani">
>     <var cp="520A" type="block" />
>     <var cp="520B" type="alloc" comment="identity" />
>     <var cp="681E" type="block" />
> </char>
> <char cp="681E" tag="sc:Hani">
>     <var cp="520A" type="block" />
>     <var cp="520B" type="block" />
>     <var cp="681E" type="alloc" comment="identity"/>
> </char>
> WLE rules:
>  <action disp="blocked" any-variant="block" />
>  <action disp="allocatable" all-variants="alloc"  />
> 
> 
> 
> For Scenario 3:
> 
> In CGP
> 一4E00 (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
> 壱58F1 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);
> 壹58F9 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);
> 弌5F0C (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
> 
> In JGP:
> 一 4E00(2,3);4E00(2,3);
> 壱 58F1(2,3);58F1(2,3);
> 壹 58F9(2,3);58F9(2,3);
> 弌 5F0C(2,3);5F0C(2,3);
> 
> JGP needs to create its own mapping set that includes all 4 code points
> above, together with the corresponding rules; otherwise, it will fall into
> scenario 1.
> 
> 
> For Scenario 4:
> A UNIQUE code point that ONLY exists in the JGP table, such as
>  辻8FBB(2,3);8FBB(2,3);
> 
> 
> CGP will probably not include this code point in its repertoire.
> No extra work or rules are needed.
> 
> 
> For Scenario 5:
> [cid:image006.jpg at 01D02C04.FD28B860]
> 
> Actually, we have not found any code points that fit this scenario.
> But the solution will follow scenario 1 or 2, for example:
> 
> For JGP, “C” will be included but marked as “out-of-repertoire”
> For CGP, “A” will be included but marked as “out-of-repertoire”
> 
> 
> In conclusion, the “out-of-repertoire” type and the “invalid” action provide
> us with a conservative and simple way to reach a consensus on the variant
> mapping and rules.
> According to our analysis of the CGP table and the JPRS table:
> There are 4983 code points that fit Scenario 1
> There are 840 code points that fit Scenario 3
> There are 170 code points that fit Scenario 4
> 
> Since JGP has not yet decided whether variant relationships exist in the JGP
> repertoire, we don’t have analytical numbers for scenario 3 and scenario
> 5. But we believe that the above solution can also be applied to
> scenarios 3 and 5, no matter what kind of variant mapping JGP produces.
> 
> 
> All of the above is our proposal to settle the divergence at minimum cost
> for both of us.
> What do you think about it? Looking forward to your reply.
> 
> 
> Best Regards,
> Wei Wang


_______________________________________________
ChineseGP mailing list
ChineseGP at icann.org
https://mm.icann.org/mailman/listinfo/chinesegp



