[UA-discuss] Guidelines on linkification for URLs with non-ASCII characters

Tan Tanaka, Dennis dtantanaka at verisign.com
Tue Jun 6 17:48:37 UTC 2023

Hi Maria,

Thanks for addressing the observation. You are correct that script mixing does not make sense in a single-script context, but since this paper may be intended for a wider audience, a clarification is warranted. The updated version provides one, so thank you. Just one minor edit: the “more information” references in 1.4.d should be 3.3 and 3.4 (the IDN guidelines and Unicode Technical Standard 39, respectively).


From: Maria Kolesnikova <masha at cctld.ru>
Date: Tuesday, June 6, 2023 at 6:37 AM
To: Dennis Tan Tanaka <dtantanaka at verisign.com>, "UA-discuss at icann.org" <ua-discuss at icann.org>
Subject: [EXTERNAL] RE: [UA-discuss] Guidelines on linkification for URLs with non-ASCII characters


Dear All, Dennis,

Thank you for all the comments that we have received on the Guidelines on linkification for URLs with non-ASCII characters so far. We really appreciate it!

Based on your input, the Russian Working Group has prepared an updated version of the guidelines that takes into account local languages in which script mixing is allowed by default.
Initially, we focused mostly on the Cyrillic script and the local languages based on it, which is why the script-mixing issue was not considered from this angle.

Here are both versions of the document for your convenience: redlined and updated.
Paragraph 1.4 on script mixing has been extended with point (d), along with some other additions.

We hope our work will be helpful for the whole community. If you, as UASG members, find this document valuable, you are welcome to use the proposed guidelines as a basis for further development.

With warm wishes,

Maria Kolesnikova

From: Tan Tanaka, Dennis <dtantanaka at verisign.com>
Sent: Thursday, May 4, 2023 4:54 PM
To: masha at cctld.ru; ua-discuss at icann.org
Subject: Re: [UA-discuss] Guidelines on linkification for URLs with non-ASCII characters

Maria, thanks for sharing this.

One observation on 1.4.a (script mixing): some writing systems, notably Japanese, mix Unicode scripts within a single label. These scripts are Hiragana, Katakana and Han, and Romaji (Latin) may be used in conjunction with them as well.
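To illustrate the point, here is a minimal sketch of script detection for a single label. It assumes a tiny, hypothetical code-point table (`SCRIPT_RANGES`) instead of the real Unicode Script data, and the names `scripts_in_label` and `is_mixed_script` are illustrative only. It shows that a naive "more than one script means reject" rule flags a legitimate Japanese label just as it flags a Latin/Cyrillic spoof.

```python
# Approximate, hypothetical script table covering only a few blocks.
# Production code should use the Unicode Character Database
# (Scripts.txt and ScriptExtensions.txt) instead of hard-coded ranges.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"), (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x00D6, "Latin"), (0x00D8, 0x00F6, "Latin"),
    (0x00F8, 0x024F, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3041, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]

# Characters treated here as script-neutral (Unicode "Common") in a label.
NEUTRAL = set("0123456789-")


def scripts_in_label(label: str) -> set[str]:
    """Return the set of scripts used by a U-label, ignoring neutral characters."""
    found = set()
    for ch in label:
        if ch in NEUTRAL:
            continue
        cp = ord(ch)
        for lo, hi, script in SCRIPT_RANGES:
            if lo <= cp <= hi:
                found.add(script)
                break
        else:
            found.add("Unknown")  # not covered by this simplified table
    return found


def is_mixed_script(label: str) -> bool:
    """Naive rule: any label using more than one script counts as mixed."""
    return len(scripts_in_label(label)) > 1


print(scripts_in_label("日本のサイト"))  # Han, Hiragana and Katakana
print(scripts_in_label("p\u0430ypal"))   # Latin plus a Cyrillic "а"
```

Both labels come out as mixed-script under the naive rule, even though only the second one is a typical spoofing pattern. This is why a per-language or UTS #39-style allowance is needed.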

Here are a few resources that touch on that:

ICANN IDN guidelines 4.1 https://www.icann.org/en/system/files/files/idn-guidelines-22sep22-en.pdf
Unicode Technical Standard 39 (Restriction-Level Detection, Highly Restrictive) http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
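As a rough sketch of the "Highly Restrictive" level referenced above, the check below accepts a label whose scripts form a single script or fit within one of the combinations UTS #39 allows (Latin + Han + Hiragana + Katakana, Latin + Han + Bopomofo, Latin + Han + Hangul). It assumes the caller already has the label's set of Script property values. It also simplifies UTS #39: Script_Extensions, resolved script sets and the identifier-profile requirement are not modelled. The function name `is_highly_restrictive` is hypothetical.

```python
# Script combinations permitted at the "Highly Restrictive" level of
# UTS #39 (in addition to any single-script string).
ALLOWED_COMBINATIONS = (
    frozenset({"Latin", "Han", "Hiragana", "Katakana"}),
    frozenset({"Latin", "Han", "Bopomofo"}),
    frozenset({"Latin", "Han", "Hangul"}),
)

# Script values that do not count toward mixing (simplified treatment).
IGNORED = {"Common", "Inherited"}


def is_highly_restrictive(scripts: set[str]) -> bool:
    """Check a label's script set against the Highly Restrictive level.

    `scripts` is assumed to be the set of Script property values of the
    label's characters, obtained elsewhere from Unicode data.
    """
    relevant = set(scripts) - IGNORED
    if len(relevant) <= 1:  # single script (or only Common/Inherited)
        return True
    return any(relevant <= combo for combo in ALLOWED_COMBINATIONS)


print(is_highly_restrictive({"Han", "Hiragana", "Katakana"}))  # Japanese label
print(is_highly_restrictive({"Latin", "Cyrillic"}))            # typical spoof
```

With this rule, the Japanese label passes while a Latin/Cyrillic mix is still caught. That matches the intent of the observation on 1.4.a.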

Hope this is useful.


From: UA-discuss <ua-discuss-bounces at icann.org> on behalf of "UA-discuss at icann.org" <ua-discuss at icann.org>
Reply-To: Maria Kolesnikova <masha at cctld.ru>
Date: Thursday, May 4, 2023 at 7:15 AM
To: "UA-discuss at icann.org" <ua-discuss at icann.org>
Subject: [EXTERNAL] [UA-discuss] Guidelines on linkification for URLs with non-ASCII characters


Dear all,

We are happy to share with you the Guidelines on linkification for URLs with non-ASCII characters, recently developed by the Russian Working Group on Universal Acceptance.
The document provides best practices for identifying domain names and email addresses in non-ASCII scripts within text and for automatically creating hyperlinks to them. It may be helpful for software developers implementing linkification mechanisms. The document also includes some proposals on how to behave if script mixing is detected in any label of a domain name.
We hope these short guidelines will be of some assistance in your work on Universal Acceptance implementation.
If you have any comments on the document, we would be glad to hear them.
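For readers new to the topic, a minimal linkification sketch might look like the following. It is not taken from the guidelines. The regular expressions, the TLD rule (an `xn--` A-label or at least two letters), the `https://` and `mailto:` schemes, and the name `linkify` are all assumptions made for illustration. It relies on Python's Unicode-aware `\w` so that labels in any script are recognised, and it HTML-escapes all text.

```python
import html
import re

# One label in any script: letters/digits, inner hyphens allowed.
# For str patterns, \w is Unicode-aware, so [^\W_] matches letters and
# digits of any script (but not combining marks).
LABEL = r"[^\W_](?:(?:[^\W_]|-)*[^\W_])?"
# Hypothetical TLD rule: an A-label (xn--...) or at least two letters.
TLD = r"(?:xn--[a-z0-9-]+|[^\W\d_]{2,})"
DOMAIN = rf"(?:{LABEL}\.)+{TLD}(?![\w-])"
EMAIL = rf"[\w.+-]+@{DOMAIN}"
PATTERN = re.compile(rf"(?P<email>{EMAIL})|(?P<domain>{DOMAIN})")


def _anchor(m: re.Match) -> str:
    text = m.group(0)
    # Assumed schemes: mailto: for e-mail addresses, https:// for domains.
    href = ("mailto:" if m.group("email") else "https://") + text
    return f'<a href="{html.escape(href)}">{html.escape(text)}</a>'


def linkify(text: str) -> str:
    """Wrap detected domain names and e-mail addresses in <a> elements."""
    parts, pos = [], 0
    for m in PATTERN.finditer(text):
        parts.append(html.escape(text[pos:m.start()]))  # plain text between matches
        parts.append(_anchor(m))
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


print(linkify("Сайт пример.рф, почта info@пример.рф."))
```

The sketch keeps the U-label form in the `href` and does not handle paths, ports, combining marks, or scripts written without spaces (such as Japanese), where detecting link boundaries needs extra heuristics.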

With best regards,
Maria Kolesnikova
