[ChineseGP] [Koreangp] Proposed Action items before Seoul meeting

Dillon, Chris c.dillon at ucl.ac.uk
Wed Apr 29 08:04:41 UTC 2015


Dear Professor Zhang,

Thank you for your summary of issues that we will face in the mid to long term.

·         It seems, fortunately, to be a smaller problem than I would have expected, but there seems to be no solution.

·         I suspect that a solution for (3) could be based on what happens with labels containing the sorts of variants listed in (2). However, (3) would be a new departure for labels: for example, English www.pictures.com and www.photo.com, and German www.foto.com, have traditionally been treated as probably being different sites (I haven’t checked this example; if I’m unlucky, perhaps some of them belong to the same company!).

·         (4) may well be desirable, but difficult for all panels to implement.
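One way to read the suggestion that (3) could build on the handling of (2) is to treat whole words the way character variants are treated: replace a listed word with each of its synonyms to obtain variant labels. The Python sketch below assumes a hypothetical table `WORD_VARIANTS` built only from the synonym pairs in point (3); `string_variants` is illustrative and is not part of any panel's LGR.

```python
# Hypothetical word-level variant table, built only from the synonym pairs
# listed in point (3); real data would have to come from the panels.
WORD_VARIANTS: list[set[str]] = [
    {"數位化", "数字化"},
    {"激光", "镭射"},
    {"便当", "弁当"},
]


def string_variants(label: str) -> set[str]:
    """Return labels obtained by replacing one occurrence of a listed word
    in `label` with one of its synonyms (one substitution at a time)."""
    results: set[str] = set()
    for group in WORD_VARIANTS:
        for word in group:
            start = label.find(word)
            while start != -1:
                for other in group - {word}:
                    results.add(label[:start] + other + label[start + len(word):])
                start = label.find(word, start + 1)
    return results


print(string_variants("激光笔"))  # {'镭射笔'}
```

Only single substitutions are generated here; a label containing several listed words would need the substitutions combined, and the set of variant labels would grow accordingly.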

2. I have fallen into a trap, as I have no list of variant issues that are dealt with at font level. I’m sure I’ve seen such a list (I did find a short one, which I have attached) and would be grateful if any colleague has a better list. In this case, the Chinese SimSun font has one dot and the Korean Malgun Gothic font has two.

I do hope we meet in Seoul.

Regards,

Chris.
--
Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon

From: Joe Zhang [mailto:joezhang43 at hotmail.com]
Sent: 28 April 2015 04:28
To: Dillon, Chris; hotta at jprs.co.jp; KoreanGP at icann.org; ChineseGP at icann.org; JapaneseGP at icann.org
Subject: Re: [ChineseGP] [Koreangp] Proposed Action items before Seoul meeting

Dear Chris,
I will study the to-do list and your comments in depth, which may take some time.
Before finishing my homework, I would like to make two points for consideration:

1.       Up to now, our work has been limited to label generation based on characters (and their variants); we have not yet defined string- or word-based rules. In particular, we have not yet defined language- or context-sensitive strings and words. This seems quite complicated, but working on it in the next stage is inevitable. For example:

(1)     Exception treatment, such as 发-發-発 and 发/髮

(2)     C-J sensitive: 滨-濱-浜, 艺-藝-芸

(3)     Synonyms as labels (word-based variant strings): 數位化-数字化, 激光-镭射, 便当-弁当

(4)     Rules such as: no mixing of simplified, traditional and variant characters within a label (important for the CGP).
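Rule (4) could, for instance, be checked by requiring that a single tag covers every character in the label. This is only a sketch covering simplified versus traditional: the tag table `TAGS` and the function `is_unmixed` are hypothetical, and a real LGR would take this information from its variant data rather than from a hand-written dictionary.

```python
# Hypothetical tags: which convention(s) each character belongs to.
# Characters identical in simplified and traditional usage carry both tags.
TAGS: dict[str, set[str]] = {
    "发": {"simp"},
    "發": {"trad"},
    "数": {"simp"},
    "數": {"trad"},
    "位": {"simp", "trad"},
    "字": {"simp", "trad"},
    "化": {"simp", "trad"},
}


def is_unmixed(label: str) -> bool:
    """True if at least one tag ("simp" or "trad") is shared by every character.

    Characters missing from TAGS get an empty tag set, so the label fails.
    """
    common = {"simp", "trad"}
    for ch in label:
        common &= TAGS.get(ch, set())
    return bool(common)


print(is_unmixed("数字化"), is_unmixed("數位化"), is_unmixed("发數"))  # True True False
```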

2.       Some visible differences among Hanzi, Hanja and Kanji are what Unicode/Unihan calls Z-differences (typeface differences, which are unified under a single code point). For example, you wrote:

“I have also been looking for differences between Traditional Chinese characters and Korean hanja. So far I have found one: characters with the progression radical tend to start with two dots in hanja: 逃 and only one in Traditional Chinese: 逃.”
In fact, both are encoded as U+9003 and are simply rendered in different fonts.
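Because the one-dot and two-dot forms are the same encoded character, nothing in the label data distinguishes them; only the font does. A quick Python check (an illustration, not panel tooling) shows what an LGR actually sees for this character:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of `ch`."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Whichever font draws it (one dot or two on the radical), a label holds
# the same single code point, so code-point comparison cannot tell them apart.
print(describe("逃"))  # U+9003 CJK UNIFIED IDEOGRAPH-9003
```

This is consistent with treating such differences at font level rather than in the LGRs.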

If I am able to attend the coming Seoul meeting, we can discuss this in detail.
Looking forward to seeing you there.
Regards,


Zhang

From: chinesegp-bounces at icann.org [mailto:chinesegp-bounces at icann.org] On Behalf Of Dillon, Chris
Sent: 27 April 2015 20:43
To: hotta at jprs.co.jp; KoreanGP at icann.org; ChineseGP at icann.org; JapaneseGP at icann.org
Subject: Re: [ChineseGP] [Koreangp] Proposed Action items before Seoul meeting


Dear colleagues,



Here are some comments, as requested by Hiro.

I reckon I have now caught up after missing the Dallas meeting.



I believe Mr Yoneya’s algorithm will work.



I have spent some time looking for exceptions to various statements in it, e.g. Slide 5: “there exists at least one identical ideograph”. (No exception found.)

It is fortunate that 機 ‘machine’ / 机 ‘desk’ and 発 ‘send’ / 髪 ‘hair’ seem to be the only cases where (at least commonly used) different characters in Japanese are the same character in Simplified Chinese. (I haven’t spent as much time looking for characters that are separate in Chinese but brought together in Japanese. 弁 replaces at least three characters in Chinese, but I think none of them are common. I can imagine a .弁当 TLD, so that may be good news for bento companies.)
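The search described above, for distinct Japanese characters that collapse into one Simplified Chinese character, amounts to grouping a correspondence table by its target. A minimal Python sketch, assuming a hypothetical table `JA_TO_SIMP` that contains only the examples mentioned here (real data would come from the Japanese and Chinese LGR-1s); `merges` is an illustrative helper, not existing tooling:

```python
from collections import defaultdict

# Hypothetical Japanese -> Simplified Chinese table, limited to the
# examples discussed in this message; not real LGR data.
JA_TO_SIMP: dict[str, str] = {
    "機": "机",  # 'machine'
    "机": "机",  # 'desk'
    "発": "发",  # 'send'
    "髪": "发",  # 'hair'
}


def merges(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group source characters by target; keep targets reached by 2+ sources."""
    groups: dict[str, list[str]] = defaultdict(list)
    for src, dst in mapping.items():
        groups[dst].append(src)
    return {dst: srcs for dst, srcs in groups.items() if len(srcs) > 1}


print(merges(JA_TO_SIMP))  # {'机': ['機', '机'], '发': ['発', '髪']}
```

The opposite case (one Japanese character standing for several Chinese ones, as with 弁) is the same grouping applied to a table mapping Chinese characters to Japanese ones.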



I note the options for the disposition of variants not defined in the LGR-1s (Slide 6), i.e.:

- Blocked if the variant is not in the LGR-1 / Allocatable otherwise

- Blocked if the variant is not in the LGR-1 / Inherit its original disposition in the LGR-1 (Allocatable/Simp/Trad/Both)
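The two options differ only in what happens when the variant does exist in the LGR-1. A Python sketch of that decision, under the interpretation that "its original disposition" means the disposition the variant character has in that LGR-1; the `Option` names, the dictionary representation of an LGR-1 and the sample entries are all hypothetical:

```python
from enum import Enum


class Option(Enum):
    ALLOCATABLE_OTHERWISE = 1  # first option: allocatable if present
    INHERIT_OTHERWISE = 2      # second option: inherit the LGR-1 disposition


def disposition(variant: str, lgr1: dict[str, str], option: Option) -> str:
    """Decide the disposition of a variant character under one of the options.

    `lgr1` maps each character in the LGR-1 repertoire to its disposition
    there ("allocatable", "simp", "trad" or "both").
    """
    if variant not in lgr1:
        return "blocked"  # both options block variants outside the LGR-1
    if option is Option.ALLOCATABLE_OTHERWISE:
        return "allocatable"
    return lgr1[variant]


# Hypothetical excerpt of a Chinese LGR-1 (dispositions invented for illustration).
LGR1_CN = {"发": "simp", "發": "trad"}

print(disposition("発", LGR1_CN, Option.INHERIT_OTHERWISE))      # blocked
print(disposition("發", LGR1_CN, Option.ALLOCATABLE_OTHERWISE))  # allocatable
print(disposition("發", LGR1_CN, Option.INHERIT_OTHERWISE))      # trad
```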



Both case studies are most interesting. I note that there are some labels, e.g. 予园 (the first character, I think, used only in Japan and the second only in Simplified Chinese), that in an ideal world we would perhaps prefer not to be allocatable, but I suspect that blocking them would add horrendous complexity.



I note that the Japanese LGR-1 is difficult to understand, as the characters are not visible.



I have also been looking for differences between Traditional Chinese characters and Korean hanja. So far I have found one: characters with the progression radical tend to start with two dots in hanja: 逃 and only one in Traditional Chinese: 逃.



Looking forward to Seoul,



Regards,



Chris.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Chinese and Japanese variants chinese-stackexchange-com.png
Type: image/png
Size: 2702 bytes
Desc: Chinese and Japanese variants chinese-stackexchange-com.png
URL: <http://mm.icann.org/pipermail/chinesegp/attachments/20150429/55810935/ChineseandJapanesevariantschinese-stackexchange-com-0001.png>


More information about the ChineseGP mailing list