[ChineseGP] Again,CJK Variants

Dillon, Chris c.dillon at ucl.ac.uk
Mon Sep 22 09:01:35 UTC 2014


Dear Professor Zhang,

Thank you for your email. I think it's a good idea, as you suggest, that the CGP works on the reasoning in areas such as the issue of large numbers of labels, "variant", "异体字" and "变体字" in Chinese for the next two meetings. Once the reasoning is clear, it's easy to translate, and I'm happy to help make the translation smooth English.

Two of the characters in the list you gave raise interesting issues:
1. 亊 U+4E8A is an unofficial form; I don't think it appeared in pre-War Japanese government tables. The official form was the same as the modern form: 事 U+4E8B.
2. 礦 U+7926: In Japanese, 鉱 is a 常用漢字 (i.e. it is in the Ministry of Education's list). 礦 U+7926 is not in the 常用漢字 list and is regarded as a 旧字体 (old character form). 砿 U+783F is not in the 常用漢字 list either and is regarded as a 拡張新字体 (expanded new form). As you know, when simplification was done in Japan only the characters in the 常用漢字 list were simplified. Those outside it exist as old forms or expanded new forms.

Regards,

Chris.
--
Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) http://www.ucl.ac.uk/dis/people/chrisdillon

From: chinesegp-bounces at icann.org [mailto:chinesegp-bounces at icann.org] On Behalf Of ZhangJoe
Sent: 22 September 2014 02:14
To: ChineseGP at icann.org
Subject: [ChineseGP] Again,CJK Variants
Importance: High

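Variants revisited, CGP-JGP
Date: 2014-09-20~21
To: ICANN/CGP members
From: Zhang Zhoucai
Subject: Again, Variants Definition & Concept
Note: I wrote this in Chinese because I do not have time at present to rewrite it in English, and some points are hard to express in English. My apologies for that for now.
First, I feel that we still lack communication and consensus on the basic concept and basic terminology of "变体字" (variant character). Although the English term is "Variant" in both cases, within the scope of our TLD work the concept has changed: it is no longer the "异体字" (variant form) of traditional usage. I do not know who began translating it as "变体字", or when; the English term Chris suggests, "Lexical Alternate", is also fairly close. This wording is wise: it avoids many political problems and fits the reality of our project better.
Before we discuss Mapping and Allocatable/Blocked, I suggest that we thoroughly settle the definition and concept of Chinese Variants/CJK Variants.
I therefore suggest that the working language of the next fortnightly conference call be Chinese, so that at least the Chinese-speaking members reach consensus first. The draft CJK Variant definition that I wrote can serve as a target for the discussion.
This is not a question of a broad versus a narrow sense of Variant; rather, the meaning has shifted. For Chinese, nobody originally regarded Simplified/Traditional as Variants: the Mainland did not accept that view, and neither did Hong Kong or Taiwan. Now all agree that they are "变体" (variants). For some variant forms, each side of the Strait treats its own form as standard and the other's as the variant; in the TLD context there is no need to argue about which is standard and which is variant. Take 决 U+51B3 and 決 U+6C7A: they correspond, their sound and meaning are the same, and they are variants of each other. Extending this to CJK, mainly C-J, crosses language contexts. Nobody has ever defined cross-language variant characters before, but the correspondences objectively exist. The difference is that "same sound + same meaning" no longer works as a test; they can only be delimited by "same meaning" + "same origin and same usage". There may be many such CJK variants, but from the TLD point of view we can select only the commonly used, high-frequency, and easily confused characters and compile them into CJK variant sets.
Below are some examples of common CJK Variants.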
Hans         Hant         Jpan
事 U+4E8B    事 U+4E8B    亊 U+4E8A
处 U+5904    處 U+8655    処 U+51E6
壤 U+58E4    壤 U+58E4    壌 U+58CC
对 U+5BF9    對 U+5C0D    対 U+5BFE
专 U+4E13    專 U+5C08    専 U+5C02
恼 U+607C    惱 U+60F1    悩 U+60A9
插 U+63D2    插 U+63D2    挿 U+633F
曾 U+66FE    曾 U+66FE    曽 U+66FD
樱 U+6A31    櫻 U+6AFB    桜 U+685C
栈 U+6808    棧 U+68E7    桟 U+685F
气 U+6C14    氣 U+6C23    気 U+6C17
焰 U+7130    焰 U+7130    焔 U+7114
烧 U+70E7    燒 U+71D2    焼 U+713C
兽 U+517D    獸 U+7378    獣 U+7363
瘦 U+7626    瘦 U+7626    痩 U+75E9
发 U+53D1    發 U+767C    発 U+767A
矿 U+77FF    礦 U+7926    砿 U+783F
团 U+56E2    團 U+5718    団 U+56E3
脑 U+8111    腦 U+8166    脳 U+8133
图 U+56FE    圖 U+5716    図 U+56F3
厅 U+5385    廳 U+5EF3    庁 U+5E81
稳 U+7A33    穩 U+7A69    穏 U+7A4F
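
Since look-alike characters are easy to pair with the wrong code point, a list like the one above can be checked mechanically. The following is only a minimal Python sketch: the names `ENTRY`, `parse_row` and `mismatches` are hypothetical helpers introduced for illustration, and the row format (a character followed by its `U+XXXX` value, with or without spaces) is assumed from the list itself.

```python
import re

# One entry = a single character followed by its stated code point,
# e.g. "对U+5BF9" or "对 U+5BF9". (Pattern is an assumption based on the list.)
ENTRY = re.compile(r"(\S)\s*U\+([0-9A-Fa-f]{4,6})")

def parse_row(row):
    """Return [(character, stated_code_point), ...] for one row of the list."""
    return [(ch, int(hexval, 16)) for ch, hexval in ENTRY.findall(row)]

def mismatches(row):
    """Return the entries whose character does not match its stated code point."""
    return [(ch, cp) for ch, cp in parse_row(row) if ord(ch) != cp]

row = "对U+5BF9對U+5C0D対U+5BFE"
print(parse_row(row))
print(mismatches(row))  # an empty list means the row is self-consistent
```

Running `mismatches` over every row would flag any entry where the glyph and the code point disagree.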

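Introducing Japanese variants (mainly the Japanese new character forms) into the Chinese variant sets has both advantages and disadvantages. The disadvantage is that it makes the variant sets more complex, although linking only high-frequency, commonly used characters should reduce that complexity. The advantages are, first, that it broadens the international reach of TLD labels and, second, that it can prevent domain-name confusion, whether inadvertent or malicious.
Take 图书馆 (library) as an example: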
图书馆
圖書館
図書館
圗  舘
啚
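Here 图 has 5 variants (4 if the Japanese 図 is not counted), 书 has 2 variants, and 馆 has 3 variants (from a philological point of view there are far more than this).
The number of possible label combinations is 4*2*3=24 without Japanese and 5*2*3=30 with Japanese. Based on actual usage and frequency in each country and region, and after removing the "ill-formed strings", the actual Allocatable Labels may be only these three: 图书馆 for hans, 圖書館 for hant, 図書館 for jpan.
Similar examples:
医学会 for hans, 醫學會 for hant, 医学会 for jpan
音乐艺术 for hans, 音樂藝術 for hant, 音楽芸術 for jpan
开发 for hans, 開發 for hant, 開発 for jpan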
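
The label counts in the 图书馆 example can be reproduced by taking the Cartesian product of the per-character variant sets. This is a minimal Python sketch, not a proposed LGR procedure: the variant sets and the three allocatable labels are taken from the example, while the names `candidate_labels`, `without` and `not_allocatable` are hypothetical, and treating every other combination as non-allocatable is an assumption made here for illustration.

```python
from itertools import product

# Variant sets for each position of the label, as listed in the example.
VARIANT_SETS = [
    ["图", "圖", "図", "圗", "啚"],  # 5 forms, including the Japanese 図
    ["书", "書"],                    # 2 forms
    ["馆", "館", "舘"],              # 3 forms
]

def candidate_labels(variant_sets):
    """Every label formed by picking one variant per position."""
    return ["".join(chars) for chars in product(*variant_sets)]

def without(variant_sets, excluded):
    """Copy of the variant sets with the excluded characters removed."""
    return [[c for c in vs if c not in excluded] for vs in variant_sets]

all_labels = candidate_labels(VARIANT_SETS)
labels_without_ja = candidate_labels(without(VARIANT_SETS, {"図"}))

# The three labels the example expects to be allocatable, keyed by script tag.
ALLOCATABLE = {"hans": "图书馆", "hant": "圖書館", "jpan": "図書館"}

# Assumption for this sketch: everything else is generated but not allocatable.
not_allocatable = [l for l in all_labels if l not in ALLOCATABLE.values()]

print(len(all_labels))         # 30 = 5*2*3
print(len(labels_without_ja))  # 24 = 4*2*3
print(len(not_allocatable))    # 27 = 30 - 3
```

The product grows multiplicatively with label length, which is why restricting the variant sets to common, high-frequency characters matters for keeping the number of generated labels manageable.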
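Should the CJK Variant Labels derived from CJK Variants like these be treated as one TLD entity, or as several entities? The pros and cons need to be discussed between the CGP and the JGP, and the question also needs to be decided by ICANN's overall strategy.
We can note that, in the cross-language case, the symmetry and transitivity of Variants hold in the vast majority of cases.
The few asymmetric or non-transitive cases mostly depend on context and are extremely rare, for example the relation between 发-發-発 and 髪. They need to be handled manually, and such isolated cases should not keep affecting the discussion of the general rules.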
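
Symmetry and transitivity can be tested directly on a set of variant pairs. The following minimal Python sketch makes several assumptions: the function names are hypothetical, the regular case uses the 对-對-対 row from the list above, and the pairs chosen for the irregular case (发 paired with 髪 because Simplified 发 also serves for the "hair" character, while 發 and 発 are not paired with 髪) are one illustrative reading of the example, not an agreed mapping.

```python
from itertools import combinations, product

def symmetric_closure(pairs):
    """Add (b, a) for every (a, b), so the relation is symmetric."""
    pairs = set(pairs)
    return pairs | {(b, a) for a, b in pairs}

def is_transitive(relation):
    """True if a~b and b~d always imply a~d (ignoring the trivial a == d)."""
    return all(
        (a, d) in relation
        for (a, b), (c, d) in product(relation, repeat=2)
        if b == c and a != d
    )

# Regular case: every member of the set is a variant of every other member.
regular = symmetric_closure(combinations(["对", "對", "対"], 2))

# Irregular case (illustrative assumption): 发 is paired with 髪,
# but 發 and 発 are not.
irregular = symmetric_closure(
    [("发", "發"), ("发", "発"), ("發", "発"), ("发", "髪")]
)

print(is_transitive(regular))    # True
print(is_transitive(irregular))  # False: 發~发 and 发~髪, but not 發~髪
```

A check of this kind could be used to list the exceptional sets that need manual handling, leaving the general rule untouched for all the sets that pass.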



From: chinesegp-bounces at icann.org [mailto:chinesegp-bounces at icann.org] On Behalf Of ZhangJoe
Sent: 1 September 2014 18:53
To: 'Wang Wei'; ChineseGP at icann.org
Subject: [ChineseGP] Proposed CJK Variants Definition
Importance: High

Dear Colleagues,
Enclosed please find the proposed CJK Variants Definition for discussion.
Regards,

Zhang Zhoucai

From: chinesegp-bounces at icann.org [mailto:chinesegp-bounces at icann.org] On Behalf Of Wang Wei
Sent: 25 August 2014 15:32
To: ChineseGP at icann.org
Subject: [ChineseGP] Memo of CGP Fortnightly Meeting 21st August


Dear colleagues and Hanchuan



Thanks for attending the second fortnightly meeting.

We need to complete the following tasks before the next meeting.



1)      Discuss CGP repertoire slimming plan within the CGP and with CDNC.

2)      Redefine the "variant". It would be appreciated if Prof. Zhang could give a general, compatible definition of "variant" that suits C, J and K, helping J and K understand the situation better.

3)      Provide the final version to IP.

4)      Provide some coordination examples on variant mapping to the Japanese and Korean communities (mainly to J). Qichao and Zhiwei will work out these examples and let Kenny share them with the JGP.



Please feel free to let me know if there is anything I have forgotten to note.



Regards

Wang Wei



-----Original Message-----
From: chinesegp-bounces at icann.org [mailto:chinesegp-bounces at icann.org] On Behalf Of Wang Wei
Sent: 08 August 2014 02:46
To: ChineseGP at icann.org
Subject: [ChineseGP] Memo of CGP Fortnightly Meeting 7th August



Dear Hanchuan and Colleagues



Thank you for attending the meeting.

Here are some tasks we need to do before the next fortnightly meeting:



1)      Improve the proposal, specifying the language, repertoire, etc. (I will send out an updated document next week.)

2)      Provide some coordination examples on variant mapping to the Japanese and Korean communities, to help everyone better understand the coordination principles.

3)      Select Kenny and other representatives for the KGP meeting next Tuesday.



Please feel free to let me know if there is anything I have forgotten to mention.



Regards

Wang Wei

