[ChineseGP] Integration Panel Considerations on 20151115 Chinese LGR draft

Sun Mar 6 12:04:45 UTC 2016

Dear Asmus, thank you for your valuable suggestion.

We will keep working on it.

From: Asmus Freytag [mailto:asmusf at ix.netcom.com]
Sent: March 5, 2016 7:51
To: 王伟 <wangwei at cnic.cn>
Cc: Sarmad Hussain <sarmad.hussain at icann.org>; ChineseGP at icann.org; hotta at jprs.co.jp
Subject: Re: Integration Panel Considerations on 20151115 Chinese LGR draft

Dear Wang Wei,

The Integration Panel has not been able to discuss your contribution as a full panel. Therefore, I will give you my personal feedback on some issues, with the hope that this will be useful for any deliberations you will have among yourselves or with the other IP members who will be at ICANN55.

I will send, under separate cover, an HTML version of your XML file. It provides a different way of looking at the data than the XML file, and you might find it useful. It's a rather large file, so I will send a compressed version.

A./

Comments below:

On 2/24/2016 10:00 PM, 王伟 wrote:

Dear Asmus

Thanks for reviewing the CGP XML document and pointing out the problems in it.

I have fixed the flaws concerning the reference id, reflective type, usage of “blocked”, etc.

The XML looks much improved, but we haven't had the chance to test it yet. But I did run it through my tool.

Lines 70409-70410 had some minor issues:

     <action disp="allocatable" only-variants="r-simp r-trad r-both" common="original label"/>
     <action disp="blocked"     any-variant="simp trad both r-simp r-trad r-both" "block any other mixed labels" />

common --> comment
"block... --> comment="block

I believe those were the only syntax issues I found.
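The two corrections above can be checked mechanically. The following is a minimal sketch, not the tool mentioned in this message: it only tests XML well-formedness with Python's standard library and flags attribute names outside a small set. The names `is_well_formed`, `unknown_attributes`, and `KNOWN_ATTRIBUTES` are hypothetical, and the attribute set is assumed from the lines quoted in this thread rather than taken from the full LGR schema.

```python
import xml.etree.ElementTree as ET

# The two <action> lines as originally quoted in this thread.
original = [
    '<action disp="allocatable" only-variants="r-simp r-trad r-both" common="original label"/>',
    '<action disp="blocked"     any-variant="simp trad both r-simp r-trad r-both" "block any other mixed labels" />',
]

# The same lines with the two suggested corrections applied.
corrected = [
    '<action disp="allocatable" only-variants="r-simp r-trad r-both" comment="original label"/>',
    '<action disp="blocked"     any-variant="simp trad both r-simp r-trad r-both" comment="block any other mixed labels" />',
]

# Assumption: only the attribute names seen in this thread, not the full schema.
KNOWN_ATTRIBUTES = {"disp", "only-variants", "any-variant", "comment"}


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def unknown_attributes(fragment: str) -> set:
    """Return attribute names that are not in KNOWN_ATTRIBUTES."""
    return set(ET.fromstring(fragment).attrib) - KNOWN_ATTRIBUTES


for line in original + corrected:
    ok = is_well_formed(line)
    extra = unknown_attributes(line) if ok else None
    print(ok, extra)
```

The first original line parses but carries the misspelled `common` attribute; the second does not parse at all because the quoted string has no attribute name. A schema-aware validator would catch both in a single pass.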

Before answering the questions in the PDF, I’d like to explain the development principles of the CGP repertoire.

1)  The core set of the CGP repertoire is the intersection of the CDNC table (the latest version, 2015) and the MSR

2)  The intersection of China's official Normalized Hanzi List for Common Use and the MSR

3)  The intersection of IICore and the MSR (to include the other IICore code points not covered in steps 1 and 2)
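The three principles amount to a union of three intersections with the MSR. A minimal sketch follows, assuming each source list is available as a set of integer code points; the function name `build_repertoire` and the tiny input sets are hypothetical placeholders, not actual list contents.

```python
def build_repertoire(msr, cdnc, normalized_hanzi, iicore):
    """Combine the three principles; every step is limited to the MSR."""
    step1 = cdnc & msr                 # 1) CDNC table within the MSR (core set)
    step2 = normalized_hanzi & msr     # 2) Normalized Hanzi List within the MSR
    step3 = iicore & msr               # 3) IICore within the MSR
    repertoire = step1 | step2 | step3
    # Code points contributed only by step 3, i.e. not covered by steps 1 and 2.
    iicore_only = step3 - (step1 | step2)
    return repertoire, iicore_only


# Toy placeholder values: small integers standing in for code points.
msr = {1, 2, 3, 4, 5, 6}
cdnc = {1, 2, 9}
normalized_hanzi = {2, 3}
iicore = {3, 4, 8}

repertoire, iicore_only = build_repertoire(msr, cdnc, normalized_hanzi, iicore)
print(sorted(repertoire))   # code points in the combined repertoire
print(sorted(iicore_only))  # code points added by step 3 alone
```

Anything outside the MSR (the placeholders 8 and 9 here) is dropped regardless of which source list contains it. A code point in the MSR but in none of the three sources is likewise excluded, which is the situation described for the 7 DotAsia characters in A3.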

Considering the principles above, the answers are listed as below:

Q1: The IP would like to know the rationale for adding the 22 IICORE characters that are not used in existing IDN tables (see Section 4.22). We note that over half of them are included in IICORE for support of Korean.

A1: We added the 22 IICore code points not to support Korean, but to maximize coverage of IICore.

Noted.

Q2: The IP would like to learn the reasoning by the CGP with regards to deciding which code points from the J0 set to include. We were unable to identify a pattern to distinguish the 94 that were selected from the 50 that were excluded. For example, we can see that a few of these 50 have glyph shapes that are distinctively ‘Japanese’ but on the other hand, many are shared with other IRG sources.

A2: If a JP code point happens to be included in the above repertoire, CGP experts will analyze its variant relationships with other code points, whether or not it turns out to be a variant.

We don’t intend to analyze every code point in the JGP repertoire and discuss whether to add it to the CGP repertoire.

To make sure I understand this, then: the cutoff you used for code points that are not specific to Chinese is the IICORE set, and any partial coverage of the J0 set would be due to partial coverage of that set in IICORE?

Q3: The IP wonders whether it is wise not to include the 7 characters that were deemed useful by DotAsia for Singapore and other Chinese constituencies (listed explicitly in Section 4.1).

A3: Thanks for reminding us of the 7 characters from DotAsia. These 7 code points are not covered by the above 3 principles.

We will have a CDNC meeting this coming March to discuss whether a new, 4th principle should be adopted, to allow all code points used in New gTLD IDN applications.

OK. This means that the issue is still open at this point. We'll wait for a report from CGP on the resolution.

Q4: The IP would like to see a more detailed analysis of the 62 characters IICORE encoded in the block: CJK UNIFIED IDEOGRAPHS EXTENSION B and which are not included in the CLGR draft. These are all HKSCS characters and are specific to Cantonese. The main reason that they are encoded in Extension B (instead of the main block or Extension A) is because they were processed later by the IRG. The IP feels that any rationale for excluding them (whether or not this is because of their encoding in a Supplementary Plane), would need to be specifically documented by the Chinese Generation Panel.

A4: We didn’t include the code points in Extension B, whose code point values are 5 hex digits long (from 20000 to 2A6D6), because they are not fully supported in IT systems in China.

If the IP thinks it is necessary and insists on including Extension B, we will discuss it in the coming CDNC meeting.

I would expect that such implementation issues are transient in nature and, over time, support for code points above U+FFFF will become more widespread and more reliable. For example, I expect Emoji are as popular in China as everywhere else, and IT systems will support them (not for IDNs, but for text and SMS of course). Most of them are supplementary characters in Unicode, which will put pressure on upgrading IT systems to handle code points beyond U+FFFF.
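To make the implementation issue concrete: code points above U+FFFF need a surrogate pair in UTF-16 and four bytes in UTF-8, and that is where systems built around 16-bit characters tend to fail. The sketch below uses only standard Python; the helper names are hypothetical. The sample code points are U+4E2D (main CJK block), U+20000 (the first code point of Extension B) and U+1F600 (an emoji).

```python
def is_supplementary(cp: int) -> bool:
    """True for code points beyond the Basic Multilingual Plane."""
    return cp > 0xFFFF


def utf16_code_units(cp: int) -> list:
    """Return the UTF-16 code units for a code point (surrogate pair if needed)."""
    if not is_supplementary(cp):
        return [cp]
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


def utf8_length(cp: int) -> int:
    """Number of bytes needed to encode the code point in UTF-8."""
    return len(chr(cp).encode("utf-8"))


for cp in (0x4E2D, 0x20000, 0x1F600):
    units = " ".join(f"{u:04X}" for u in utf16_code_units(cp))
    print(f"U+{cp:04X} supplementary={is_supplementary(cp)} "
          f"utf16=[{units}] utf8_bytes={utf8_length(cp)}")
```

The Extension B code point and the emoji take the same path through this code, which is the basis for expecting that emoji support also exercises handling of Extension B characters.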

Because the proposed restriction appears to impact a specific geographic region and user community, it would seem important to get direct input from representatives of that user community on how critical these are for Cantonese and how widespread their support is in IT systems for that language. While I'm sure that the IP understands the need to be conservative, when the impact is culturally one-sided like this, I'm sure the IP would like to have the documentation that the restrictions are supported by or at least acceptable to the affected user communities. The procedure requires us to avoid the impression of bias.

I hope these answers help.

Please give us further advice and corrections on any inadequacies in our work.

I'm sure I'm speaking for the IP as a whole when I say that we are looking forward to getting a complete draft of an LGR proposal, with all the principles and background for the decisions fully discussed.

Also, the <description> element in the XML file could be extended a bit to give a brief overview of the method used to calculate dispositions for variant labels. The idea is that there should be just enough information that a knowledgeable reader can understand the XML without having to read the full proposal, but not so much as to duplicate all the details from the latter. (The <description> would cite specific sections of the proposal document, as was done for LGR-1, which was just published.)

Hope you are having a good meeting, if you are attending ICANN55.

A./

Best Regards

Wang Wei

From: Asmus Freytag [mailto:asmusf at ix.netcom.com]
Sent: December 18, 2015 13:52
To: Wang Wei <wangwei at cnic.cn>
Cc: Sarmad Hussain <sarmad.hussain at icann.org>
Subject: Integration Panel Considerations on 20151115 Chinese LGR draft

Dear Wang Wei,

Please find attached the Integration Panel's review of the latest draft that you shared with us, complete with a few requests for additional information. These are about the repertoire at this point; we are still reviewing the variants.

Thanks,

A./
