[ChineseGP] Issues raised in the CJK meeting today

Tue Jul 1 07:25:39 UTC 2014

Yep, I agree.
In fact, it is unnecessary to use the same principle for the root and second (or lower) levels.

2014-07-01 

Zhiwei Yan 

From: 齐超 
Sent: 2014-07-01 15:16:14 
To: Asmus Freytag 
Cc: ChineseGP at icann.org 
Subject: Re: [ChineseGP] Issues raised in the CJK meeting today 

Hello Asmus,

   I appreciate your detailed explanation.

From your mail and the Variant Rules document, I understand that the XML model can hold all the variants from RFC 3743 and can also define the variant rules (registration rules), which makes it look a little more complicated than plain text.

As for the overlaps between those scripts, there ought to be a single variant mapping for a given code point, like 'U+4E7E;U+....', but the root zone LGR would have several variant rules for it, coming from Hani, Kanji or Hanja.

In second-level registration experience, variant mappings and variant rules are applied in '.ASIA', '.CN' or '.JP'. The LGR could be completed soon if the variant mapping is fully accepted.

I think it is more difficult to extend the root LGR mappings and rules to the second level, because that may change current registration practice.

   Thanks.

齐超 via foxmail

From: Asmus Freytag
Sent: Tuesday, July 1, 2014, 2:29 AM
To: 齐超
Cc: chinesegp-bounces; integrationpanel
Subject: Re: Issues raised in the CJK meeting today
On 6/30/2014 10:16 AM, 齐超 wrote:

Hello, Asmus and CGP Members, 

  I am Qichao from CNNIC, and my job is registration support for Chinese Domain Names based on RFC 3743.

I was not at ICANN London, and I wonder what the issues between LGR XML and RFC 3743 are, because Appendix B in http://tools.ietf.org/html/draft-davies-idntables-07#appendix-B describes how to translate RFC 3743 to XML. So I guess the issue may not be a technical problem, but a question arising from the different scripts.

Dear Qichao,

I would be interested in confirming that the method in appendix B is correct. Just to make sure that we do not have a technical issue.

About the scripts question: 

  For example, the appendix in the draft lists a code point from .ASIA:

U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6

and the U+4E7E/U+4E81 entries from CDNC are basically the same:
U+4E7E(0);U+4E7E(86),U+5E72(86),U+4E7E(886);U+4E7E(0),U+4E81(0),U+5E72(0),U+5E79(0),U+69A6(0),U+6F27(0);
U+4E81(0);U+5E72(86),U+4E7E(886);U+4E7E(0),U+4E81(0),U+5E72(0),U+5E79(0),U+69A6(0),U+6F27(0);

but in JPRS's script, U+4E7E has no variants, and U+4E81 does not even exist:
4E7E(2,3);4E7E(2,3);	# 20-05, CJK UNIFIED IDEOGRAPH-4E7E
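
The comparison above can be reproduced with a small script. The following is only a minimal Python sketch: it assumes nothing about the meaning of each column beyond "first field is the source code point", and the helper names parse_line and related are hypothetical, not part of RFC 3743 or the XML draft. It drops comments and parenthesised tags, then collects every other code point mentioned on a line.

```python
import re

# Matches a code point with or without the "U+" prefix, e.g. "U+4E7E" or "4E7E".
CODE_POINT = re.compile(r"(?:U\+)?([0-9A-Fa-f]{4,6})")

def parse_line(line):
    """Split one table line into fields, each a list of code points (as ints)."""
    body = line.split("#", 1)[0].strip()       # drop a trailing "# comment"
    body = re.sub(r"\([^)]*\)", "", body)      # drop tags such as "(86)" or "(2,3)"
    fields = [f.strip() for f in body.split(";")]
    if fields and fields[-1] == "":            # a trailing ";" leaves an empty field
        fields.pop()
    return [[int(cp, 16) for cp in CODE_POINT.findall(f)] for f in fields]

def related(line):
    """All code points a line associates with its source, excluding the source."""
    fields = parse_line(line)
    source = fields[0][0]
    return {cp for field in fields[1:] for cp in field} - {source}

# The three U+4E7E lines quoted above (.ASIA, CDNC, JPRS).
asia = "U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6"
cdnc = ("U+4E7E(0);U+4E7E(86),U+5E72(86),U+4E7E(886);"
        "U+4E7E(0),U+4E81(0),U+5E72(0),U+5E79(0),U+69A6(0),U+6F27(0);")
jprs = "4E7E(2,3);4E7E(2,3);\t# 20-05, CJK UNIFIED IDEOGRAPH-4E7E"

print(related(asia) == related(cdnc))   # True: same set of related code points
print(related(jprs))                    # set(): no variants in the JPRS line
```

The sketch deliberately ignores which column a variant appears in, so it only shows that the .ASIA and CDNC lines relate U+4E7E to the same code points while the JPRS line relates it to none.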

A code point can be defined in multiple scripts across many second-level TLDs, but only one definition is required in the root.

There are several definitions of "script", which can make things confusing.

One definition is that used by Unicode (and many linguists). With that definition, Japanese uses three scripts, even four if you count the use of Latin letters and digits.

Another definition is that of ISO 15924. In that system, there is a single code for Japanese (Jpan).

The LGR Root Zone project uses the second definition of script (as defined in the "Procedure" document).

Because of that, the root will have an overlap between the LGRs for "und-Hani", "und-Jpan" and potentially "und-Kore". Each of these LGRs will need to be integrated for use with the root.

So I think the definition of a code point's variants and its mapping for the root is a 'necessary' constraint; some unnecessary differences in variant rules will appear after that.

Because the root is a shared resource, if Hani defines an allocatable (a) variant X a->Y and Jpan defines a code point X, but either doesn't define Y or doesn't define any variant relation, then the Integration Panel would have to create the blocked (x) variant

    X x-> Y

for use in registering labels that are tagged with "Jpan".
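
The integration step just described can be sketched in Python. This is a simplified illustration under stated assumptions, not the Integration Panel's actual procedure: the data layout (a "repertoire" set plus a "variants" dictionary keyed by (source, target)) and the function name integrate are hypothetical, and X and Y are the placeholder code points from the text.

```python
def integrate(lgrs):
    """Add a blocked ("x") variant wherever another LGR defines a mapping
    for a code point that this LGR contains but does not map itself."""
    # Collect every (source, target) pair defined by any script LGR.
    all_pairs = set()
    for lgr in lgrs.values():
        all_pairs.update(lgr["variants"])

    merged = {}
    for tag, lgr in lgrs.items():
        variants = dict(lgr["variants"])       # keep the LGR's own dispositions
        for source, target in all_pairs:
            if source in lgr["repertoire"] and (source, target) not in variants:
                variants[(source, target)] = "x"   # blocked, never allocatable
        merged[tag] = {"repertoire": set(lgr["repertoire"]), "variants": variants}
    return merged

lgrs = {
    # Hani defines the allocatable variant X a-> Y.
    "und-Hani": {"repertoire": {"X", "Y"}, "variants": {("X", "Y"): "a"}},
    # Jpan contains X but defines neither Y nor any variant relation.
    "und-Jpan": {"repertoire": {"X"}, "variants": {}},
}

merged = integrate(lgrs)
print(merged["und-Jpan"]["variants"])   # {('X', 'Y'): 'x'}
print(merged["und-Hani"]["variants"])   # {('X', 'Y'): 'a'}
```

A real integration would also have to deal with symmetry and transitivity of variant relations, which this sketch leaves out.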

The reason is that an application for XXX (whether tagged as Jpan or Hani) must prevent labels XXY, XYY, XYX, YYX and YYY from being delegated to a different applicant.
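
To make the set of conflicting labels concrete, here is a minimal Python sketch that enumerates every variant label of an applied-for label. The function name variant_labels and the dictionary layout are assumptions made for illustration only.

```python
from itertools import product

def variant_labels(label, variants):
    """Return all labels obtained by replacing any subset of code points
    with one of their variants, excluding the original label itself."""
    # For each position, the original code point plus its variants.
    choices = [[ch] + sorted(variants.get(ch, ())) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}

# X has the single variant Y, as in the example above.
blocked = variant_labels("XXX", {"X": {"Y"}})
print(sorted(blocked))
# ['XXY', 'XYX', 'XYY', 'YXX', 'YXY', 'YYX', 'YYY']
```

For a three-code-point label with one variant per position, the enumeration gives 2^3 - 1 = 7 labels; besides the five named above it also produces YXX and YXY.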

This situation does not exist on the second level, unless the second level also supports both Hani and Jpan.

The SSAC has recommended using the root zone LGR for the second level. This cannot be understood as a strict requirement, because there are real differences between the levels. My personal understanding of this recommendation is that the intent is to reduce the differences as much as is practical.

Just to give one example: the root, for non-IDN labels, does not allow hyphens or digits. So there is a difference from the second and lower levels already. I don't think SSAC intended to say that the second level should stop supporting hyphens and digits.
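
The difference for non-IDN labels can be shown with a short Python sketch. It is a simplification under stated assumptions: the root check is reduced to "ASCII letters only" as described above, the second-level check is the usual letter-digit-hyphen form with a 63-octet limit, and the function names are hypothetical. Other restrictions, such as reserved labels, are not modelled.

```python
import re

# Root, non-IDN: ASCII letters only (no digits, no hyphen).
ROOT_ASCII = re.compile(r"[A-Za-z]+")
# Second level: letters, digits, hyphen; no hyphen at the start or end.
LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def valid_at_root(label):
    return len(label) <= 63 and ROOT_ASCII.fullmatch(label) is not None

def valid_at_second_level(label):
    return len(label) <= 63 and LDH.fullmatch(label) is not None

for label in ("example", "my-site1", "-bad"):
    print(label, valid_at_root(label), valid_at_second_level(label))
# example True True
# my-site1 False True
# -bad False False
```

A label such as "my-site1" is acceptable at the second level but not in the root, which is the kind of built-in difference between the levels that the paragraph above points out.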

Many of the rules in IDNA 2008, for example the bidi rules or the rules about mixing the two types of Arabic digits, are automatically satisfied in the root, because digits, hyphens, and mixing with Latin letters are not supported.

Again, we see that the second level and the root cannot be identical.

So, as a result, there needs to be a discussion about what features of the root zone LGR can be usefully extended to the second level, to make the second level more predictable.

A./

I am not an expert in CJK languages, but I hope I can offer some clues.

Thanks.

                Qichao

------------------ Original Message ------------------
>From: Asmus Freytag <asmusf at ix.netcom.com>
>Reply-To: 
>To: LGR Mailing List <lgr at icann.org>
>Subject: [ChineseGP] [lgr] Issues raised in the CJK meeting today
>Date: Tue, 24 Jun 2014 10:09:10 -0700
>

All,

Following up on issues raised in the CJK meeting in London today.

XML format and RFC 3743.

The XML format should allow expressing some policies that go beyond the root policies assumed in the "Procedure". Therefore, the format should be able to capture the full RFC 3743.

I would appreciate it if someone knowledgeable about RFC 3743 could look over the latest spec as well as the "Variant Rules" document in the LGR project wiki and let me know if there are any limitations of the XML format.

Variant Rules
https://community.icann.org/download/attachments/43989034/Variant%20Rules.pdf?version=1&modificationDate=1396991883000&api=v2

Latest XML spec:
http://tools.ietf.org/html/draft-davies-idntables-07

A separate issue relates to the question of possible inconsistencies between the 2nd level and the root. The root is a shared resource, which leads to some constraints not relevant for the second level. Such constraints might be called "necessary". What I would be interested in is examples (even hypothetical/imaginary ones) of "unnecessary" or "random" differences between the levels.

A./