[Idngwg] Minutes of meeting and AIs from 1 Dec.

Mon Dec 5 17:19:14 UTC 2016

Dear all,

I will look through Sarmad's report below, and it might affect my text, but since we have a meeting soon I am sending it to you now, and I might update it later.

<AI2>
An analysis of homoglyphs (*) MUST be performed on the IDN table or tables (LGR or text table) for a specific TLD (or DNS zone). The analysis MUST be done both within each IDN table and between all IDN tables for the TLD (or zone). Homoglyphs can occur within a single Unicode script or between different Unicode scripts. The analysis done for the root zone can be a source for discovering possible homoglyphs, but it should be noted that the character set there has been limited to letters and characters equivalent to letters. For example, digits and punctuation are not permitted in the root zone and are therefore excluded from that analysis. Well-known homoglyphs across different Unicode scripts are found in the Armenian, Cyrillic, Greek, and Latin scripts.
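
As a rough illustration of the analysis within and between IDN tables, here is a minimal Python sketch. The homoglyph classes, the table names and the function `find_homoglyph_pairs` are assumptions made up for this example; a real analysis would start from a reviewed homoglyph source and the actual IDN tables of the TLD.

```python
from itertools import combinations

# Hypothetical homoglyph classes: each set groups code points that look alike.
# This short list is for illustration only and is not a reviewed source.
HOMOGLYPH_CLASSES = [
    {0x0061, 0x0430},                  # Latin a, Cyrillic a
    {0x006F, 0x03BF, 0x043E, 0x0585},  # Latin o, Greek omicron, Cyrillic o, Armenian oh
    {0x0070, 0x0440},                  # Latin p, Cyrillic er
]


def find_homoglyph_pairs(tables):
    """Return (cp1, table1, cp2, table2) tuples for look-alike code points.

    `tables` maps an IDN table name to the set of code points it permits.
    Pairs are reported both within one table and between different tables.
    """
    # Flatten all tables into (code point, table name) entries.
    entries = sorted((cp, name) for name, cps in tables.items() for cp in cps)
    found = []
    for (cp1, t1), (cp2, t2) in combinations(entries, 2):
        if cp1 == cp2:
            continue  # the same code point in two tables is not a homoglyph pair
        if any(cp1 in cls and cp2 in cls for cls in HOMOGLYPH_CLASSES):
            found.append((cp1, t1, cp2, t2))
    return found


# Hypothetical IDN tables for one TLD.
tables = {
    "latin": {0x0061, 0x0062, 0x006F, 0x0070},
    "cyrillic": {0x0430, 0x0431, 0x043E},
}
pairs = find_homoglyph_pairs(tables)
for cp1, t1, cp2, t2 in pairs:
    print(f"U+{cp1:04X} ({t1}) ~ U+{cp2:04X} ({t2})")
```

Each reported pair is a candidate for the harmonization step; the sketch only finds pairs and makes no decision about how to handle them.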

If homoglyphs are found, harmonization MUST be performed. The goal of the harmonization is to achieve a system in which it is not possible to register two domain names under the same TLD (domain) that are homographs of each other. This is needed to reach a workable and secure system.

There are different ways to handle the possible homoglyphs, and the decision on which way to go is up to the registry of the actual TLD (domain). One possibility is to exclude code points so that there are no homographs. If code points are homoglyphs only when one of them is in a certain position of the label or together with some other code point, then contextual rules can be used to prohibit such positions. A third technique is to use blocking variant rules. If the homoglyphs are from different IDN tables, then the variant rules must operate on all the IDN tables.
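
The blocking variant technique could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `REPRESENTATIVE` mapping, the `skeleton` function and the `Zone` class are hypothetical names invented here, and the mapping covers just a few Latin/Cyrillic pairs. The idea is that every label is reduced to a key in which each homoglyph is replaced by one representative code point, and a registration is refused if another label with the same key already exists in the zone, regardless of which IDN table the label came from.

```python
# Hypothetical mapping from a homoglyph to one representative code point.
# Illustrative subset only; a registry would derive this from its own analysis.
REPRESENTATIVE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A  -> LATIN SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
    "\u043e": "o",  # CYRILLIC SMALL LETTER O  -> LATIN SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
}


def skeleton(label):
    """Replace every code point by its representative, if it has one."""
    return "".join(REPRESENTATIVE.get(ch, ch) for ch in label)


class Zone:
    """Toy registry for one TLD that blocks homograph labels."""

    def __init__(self):
        self._by_skeleton = {}

    def register(self, label):
        """Return True if registered, False if blocked by an existing label."""
        key = skeleton(label)
        if key in self._by_skeleton:
            return False  # same label or a homograph is already registered
        self._by_skeleton[key] = label
        return True


zone = Zone()
first = zone.register("pace")                       # all Latin
second = zone.register("\u0440\u0430\u0441\u0435")  # all Cyrillic, looks the same
third = zone.register("pack")                       # different label
print(first, second, third)
```

Because the key is computed with one mapping for the whole zone, the rule operates across all IDN tables of the TLD, which is the requirement stated above for homoglyphs from different tables.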

*) "In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar." (<https://en.wikipedia.org/wiki/Homoglyph>)

**) Reference at Unicode Consortium
</AI2>

---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/

From: <idngwg-bounces at icann.org> on behalf of Sarmad Hussain <sarmad.hussain at icann.org>
Date: Sunday 4 December 2016 at 03:19
To: idngwg <idngwg at icann.org>
Subject: Re: [Idngwg] Minutes of meeting and AIs from 1 Dec.

Dear All,

Regarding AI1 below, I have inquired about what has been published, and it seems there are some sources which we need to discuss in case we need to refer to them.  Here are the options:

1.       Some work has come out of Unicode’s Technical Standard 39<http://unicode.org/reports/tr39/>.  There are two files available here<ftp://ftp.unicode.org/Public/security/revision-02>:

a.       Confusables.txt<ftp://ftp.unicode.org/Public/security/revision-02/confusables.txt> is the larger set, which has the data we need but also much more, as the definition of confusables is perhaps broader than the strict homoglyphs we may want.

b.       Intentional.txt<ftp://ftp.unicode.org/Public/security/revision-02/intentional.txt> is perhaps the subset which we may be looking for, though the list seems ominously short, and would need a more thorough review (which I am happy to perform after our discussion).

2.       RFC 5992 has data in its appendices which also list confusable code points, but the data is not restricted to homoglyphs.

3.       The Root Zone LGR proposals<https://www.icann.org/resources/pages/lgr-proposals-2015-12-01-en> will also provide a reasonably comprehensive list (though we already discussed that these may still be limited for the second level).  For example, see the list in Section 6 of the Armenian script proposal<https://www.icann.org/en/system/files/files/armenian-lgr-proposal-05nov15-en.pdf> already published for a subset.  Latin, Greek and Cyrillic GPs are also working on such lists, so we will have a multi-script community confirmation, once we have the other proposals.
However, it is interesting to note that this work remains largely limited to Cyrillic, Greek and Latin homoglyphs.  Analysis is needed for other scripts.  We have recently concluded the analysis for Lao, Khmer and Thai scripts as part of the Root Zone LGR work (and the communities have not found significant homoglyph contexts – e.g. see the Khmer and Lao proposals published (Thai on its way soon)).  However, there may be some work needed for Neo-Brahmi scripts.
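
For discussion, a file such as confusables.txt could be read with a few lines of Python. The sketch below assumes the semicolon-separated layout with `#` comments used by current versions of the file (source code point, target code point sequence, type); the layout in revision-02 may differ and should be checked. The sample lines are written for this example and are not copied from any revision, and `parse_confusables` is a hypothetical helper name.

```python
# Illustrative input only; not copied from a specific revision of the file.
SAMPLE = """\
# source ; target ; type # comment
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
006D ; 0072 006E ; MA # LATIN SMALL LETTER M -> r, n (multi-code-point target)
"""


def hex_to_text(field):
    """Convert space-separated hex code points to a string."""
    return "".join(chr(int(h, 16)) for h in field.split())


def parse_confusables(text):
    """Return a dict mapping each source string to its target string."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue  # not a data line in the assumed layout
        mapping[hex_to_text(fields[0])] = hex_to_text(fields[1])
    return mapping


mapping = parse_confusables(SAMPLE)
for source, target in mapping.items():
    print(repr(source), "->", repr(target))
```

Since the confusables data is broader than strict homoglyphs, the resulting mapping would still need manual review before it is used as a homoglyph list.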

Regards,
Sarmad

From: idngwg-bounces at icann.org [mailto:idngwg-bounces at icann.org] On Behalf Of Sarmad Hussain
Sent: Friday, December 02, 2016 2:00 PM
To: idngwg at icann.org
Subject: [Idngwg] Minutes of meeting and AIs from 1 Dec.

Dear All,

Please find attached a summary of the meeting of the WG on 1 Dec.  Please let me know if there are any changes or suggestions.

The meeting had the following AIs:

S. No. | Action Item | Owner
1 | Find out if there are existing lists of homoglyphs which can be referenced | SH
2 | Divide the new recommendation on harmonization of LGRs into three recommendations: explaining harmonization, addressing cross-script homoglyphic variants, and addressing within-script variants caused by two different LGRs | MD
3 | Write a new recommendation on how to address existing registrations which are not harmonized, giving flexibility to registries | KF
4 | Re-write the recommendation on automatic activation based on the current input for further discussion | EC

The next meeting is scheduled for 8 Dec. at 11am UTC.

The attached notes of the meeting and the recording of the meeting are available at the IDNGWG wiki page at https://community.icann.org/display/IDN/IDN+Implementation+Guidelines.

Regards,
Sarmad