[UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese

Mark Svancarek marksv at microsoft.com
Tue Nov 7 00:26:38 UTC 2017


Haha, we added that in after discussing open dot for so long 😉 It definitely supports your conclusion that the list might grow.

(Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5895 don't mention that.)


From: Jim DeLaHunt [mailto:jfrom.uasg at jdlh.com]
Sent: Monday, November 6, 2017 4:24 PM
To: Mark Svancarek <marksv at microsoft.com>; ua-discuss at icann.org
Cc: Simon Cousins <simon at allegravita.com>
Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese


Mark:

Thank you for these citations. I will make a note of them.

So, RFC5895 "Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008" <https://www.rfc-editor.org/rfc/rfc5895.txt>, section 2 "The General Procedure", says,

4. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002) can be mapped to the FULL STOP before label separation occurs. There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly. This step was chosen because some input mechanisms do not allow the user to easily enter proper label separators.  Only the IDEOGRAPHIC FULL STOP character (U+3002) is added in this mapping because the authors have not fully investigated the applicability of other characters and the environments where they should and should not be considered domain name label separators.
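
A minimal sketch of the step quoted above, in Python: map U+3002 to U+002E first, then split on U+002E. The function name `split_labels_rfc5895` is hypothetical and does not come from the RFC. The sketch covers only this one step, not the other mapping steps in RFC5895.

```python
from typing import List

FULL_STOP = "\u002e"              # "."
IDEOGRAPHIC_FULL_STOP = "\u3002"  # "。"


def split_labels_rfc5895(domain: str) -> List[str]:
    """Hypothetical helper: map U+3002 to U+002E, then split into labels.

    Only U+3002 is mapped, matching the quoted step; no other
    full-stop-like characters are treated as separators here.
    """
    mapped = domain.replace(IDEOGRAPHIC_FULL_STOP, FULL_STOP)
    return mapped.split(FULL_STOP)
```

Under this sketch, an input typed with the open dot yields the same labels as the same input typed with the ASCII dot, which is the behaviour UASG004 expects of software.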

And UTS #46 "Unicode IDNA Compatibility Processing" <http://www.unicode.org/reports/tr46/>, section 2.3 "Notation", says,

In this document, a label is a substring of a domain name. That substring is bounded on both sides by either the start or the end of the string, or any of the following characters, called label-separators:

  1.  U+002E ( . ) FULL STOP
  2.  U+FF0E ( ． ) FULLWIDTH FULL STOP
  3.  U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
  4.  U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
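
As an illustration of the four label-separators listed above, here is a small Python sketch that splits a domain name on any of them and rewrites them to the ASCII dot. The names `LABEL_SEPARATORS`, `split_labels`, and `normalize_separators` are hypothetical. This only illustrates the notation; it is not an implementation of UTS #46 processing.

```python
import re

# The four label-separators from the list above:
# U+002E, U+FF0E, U+3002, U+FF61.
LABEL_SEPARATORS = "\u002e\uff0e\u3002\uff61"

# Character class matching any single label-separator.
_SEPARATOR_RE = re.compile("[" + re.escape(LABEL_SEPARATORS) + "]")


def split_labels(domain):
    """Split a domain name into labels on any of the four separators."""
    return _SEPARATOR_RE.split(domain)


def normalize_separators(domain):
    """Rewrite every label-separator to U+002E FULL STOP."""
    return _SEPARATOR_RE.sub(".", domain)
```

Keeping the separators in one constant means that, if the list does grow, only that constant needs to change.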

From my point of view as a UASG explainer, this is good and sufficient grounding for a recommendation that apps treat U+3002 as a label separator. I would go further and warn people that this list might grow; that U+FF0E and U+FF61 may be on their way.

It would be good to have a footnote somewhere linking our recommendation to those documents. I see UASG007 cites RFC5895 in general (I don't see a citation to UTS #46 in UASG007). Actually, this probably belongs more in a wiki somewhere, a list of the things UASG recommends and why we recommend them. This will help us as we bring more UA explainers up to speed.

(Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5895 don't mention that.)

Thank you for the citations, Mark.  Best regards,
      —Jim DeLaHunt, Vancouver, Canada

On 2017-11-06 15:42, Mark Svancarek wrote:
Correction, UTS#46 and RFC5895

From: Mark Svancarek
Sent: Monday, November 6, 2017 3:19 PM
To: 'Simon Cousins' <simon at allegravita.com>; Jim DeLaHunt <jfrom.uasg at jdlh.com>; ua-discuss at icann.org
Subject: RE: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese

I advocated open dot equivalency based on UTS#46 when writing UASG007.

From: ua-discuss-bounces at icann.org On Behalf Of Simon Cousins
Sent: Friday, November 3, 2017 1:23 PM
To: Jim DeLaHunt <jfrom.uasg at jdlh.com>; ua-discuss at icann.org
Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese

When typing in Chinese using any (probably all) of the common text input methods, hitting the period key on a standard keyboard will always insert an open dot.

[Attached image: image001.png]
I have two browsers installed on this Windows 10 PC, Chrome and Edge. Both, when typing Chinese into the browser URL bar, insert an ASCII dot when the period key is hit. I’d assume just about all contemporary browsers would do this. It’d be crazy annoying to the user, otherwise.

[Attached image: image002.png]

Best, S.

Simon Cousins | 夏明
CEO, Allegravita LLC & 北京乐微塔营销咨询有限公司
USA: 32 W 39 St 4th Floor, New York NY 10018
China: 北京市海淀区苏州街55号3层01-A509
simon at allegravita.com | +1 347 850-3360 | +86 139 1010-5401




From: ua-discuss-bounces at icann.org On Behalf Of Jim DeLaHunt
Sent: Friday, November 3, 2017 4:17 PM
To: ua-discuss at icann.org
Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese


Don:

> Does anyone have reference or even perception to how widely used the Open Dot is in Chinese, Japanese and/or other script?

Sure. Starting references: <https://en.wikipedia.org/wiki/Chinese_punctuation#Punctuation_marks>, <https://en.wikipedia.org/wiki/Japanese_punctuation#Full_stop>.

In my experience developing publishing software and fonts for high-end Japanese typography, I can confirm that U+3002 [。] is routine in Japanese language text.

However.

In English language orthography, the U+002E FULL STOP [.] is used as a delimiter between fields of structured data, as well as a sentence-ending punctuation mark. Consider a phone number like 212.555.1212, or a date like 3.11.17. I think it is clearer to understand the U+002E between labels of a domain name as a delimiter rather than as a sentence-ending full stop.

In Chinese and Japanese language orthography, what are the marks conventionally used as delimiters in everyday text?

And,

> UASG004 states “We expect software to transform the ‘open dot’ to a standard ASCII dot “.”, thus making use of the already registered domain name.”

Do we know what the source is for this expectation?  Did it come from perspectives informed about Chinese and Japanese culture?  Did this perspective show that U+3002 [。] would be preferred as a delimiter between domain name labels, over U+002E [.] or other punctuation?   Or did we at UASG make a guess at the Chinese and Japanese perspective?

If we are not confident that U+3002 [。] is preferred as a delimiter by people in those cultures, I think UASG should consider very carefully before advocating its use.

Best regards,

              —Jim DeLaHunt, Vancouver, Canada


On 2017-11-02 17:10, Don Hollander wrote:
G’day:

The UASG has in the past indicated that good practice is to treat the Open Dot as a label delimiter, just like the traditional full-stop.

The ideographic full stop (U+3002 [。]) is used in languages such as Chinese or Japanese to mark the end of a sentence. UASG004 states “We expect software to transform the ‘open dot’ to a standard ASCII dot “.”, thus making use of the already registered domain name.”
We found that some browsers do this.

As we go through the Linkification review, we’re not seeing this happen for social media communications apps.

Does anyone have reference or even perception to how widely used the Open Dot is in Chinese, Japanese and/or other script?

Don



Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander





--
    --Jim DeLaHunt, jdlh at jdlh.com     http://blog.jdlh.com/ (http://jdlh.com/)
      multilingual websites consultant

      355-1027 Davie St, Vancouver BC V6E 4L2, Canada
         Canada mobile +1-604-376-8953



-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 1106 bytes
Desc: image001.png
URL: <http://mm.icann.org/pipermail/ua-discuss/attachments/20171107/9470abdd/image001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 1216 bytes
Desc: image002.png
URL: <http://mm.icann.org/pipermail/ua-discuss/attachments/20171107/9470abdd/image002.png>

