[Latingp] Draft variants principles document - Dotless I

Tan Tanaka, Dennis dtantanaka at verisign.com
Fri Apr 6 19:58:26 UTC 2018


During the Brussels workshop we talked about doing visual tests, etc. We asked the IP whether such tests would be acceptable to support the definition of variant sets. They replied no: visual confusion is out of scope. We discussed it, accepted that reality, and moved on.

About the Japanese case: what they told the Japanese GP is consistent with what the IP told us in Brussels regarding cross-script homoglyphs. So I see no problem here.

Dotless I case: we were requested to analyze this case due to compatibility issues vis-à-vis IDNA2003, not on grounds of visual confusion. So, all I’m asking is to focus on the ask.

-Dennis

From: Mats Dufberg <mats.dufberg at iis.se>
Date: Friday, April 6, 2018 at 5:21 AM
To: Dennis Tan Tanaka <dtantanaka at verisign.com>, Bill Jouris <bill.jouris at insidethestack.com>
Cc: Mirjana Tasić <mirjana.tasic at rnids.rs>, Latin GP <latingp at icann.org>, Michael Bauland <michael.bauland at knipp.de>
Subject: [EXTERNAL] Re: [Latingp] Draft variants principles document - Dotless I

Dennis,

I believe that a visual test of running text is relevant for Internet identifiers. Internet identifiers are often found in running text. What Bill's test shows (which is no surprise to us) is that people have a hard time noticing minute differences of no relevance.

I think we all agree that homoglyph pairs should be handled by variant rules, and that is because they are visually similar to the extreme.

At the ICANN 61 meeting, the IP actually proposed to the Japanese GP that some pairs of characters from different Unicode scripts (but from a Japanese perspective belonging to the Japanese script) be treated as variants even though those character pairs are not homoglyphs.

Yes, SMALL LETTER I and SMALL LETTER DOTLESS I are interesting because of the complexity of up-casing and down-casing in different locales. But if we are allowed to take upper case into consideration, there are other interesting cases. The upper-case forms of LATIN SMALL LETTER D WITH STROKE (U+0111), LATIN SMALL LETTER ETH (U+00F0) and LATIN SMALL LETTER D WITH TAIL (U+0256) are homoglyphs of one another, which opens the door to injection of "false" domains.
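
For illustration only, the upper-case mappings of the three code points named above can be inspected with a short Python sketch. It assumes Python's built-in, locale-independent case mapping (str.upper) and the standard unicodedata module; the names describe_upper and upper_report are made up for this example and do not come from the thread.

```python
import unicodedata

# Lower-case code points named in the message above.
lower_forms = ["\u0111", "\u00f0", "\u0256"]

def describe_upper(ch):
    """Return (lower code point, upper code point, upper character name)."""
    up = ch.upper()  # Unicode default case mapping, not locale-specific
    return (f"U+{ord(ch):04X}", f"U+{ord(up):04X}", unicodedata.name(up))

upper_report = [describe_upper(ch) for ch in lower_forms]
for row in upper_report:
    print(row)
```

The three upper-case results are distinct code points (U+0110, U+00D0, U+0189), even though they are commonly drawn with the same glyph shape.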


Mats

---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/

From: "Tan Tanaka, Dennis" <dtantanaka at verisign.com>
Date: Thursday 5 April 2018 at 20:27
To: Bill Jouris <bill.jouris at insidethestack.com>, Mats Dufberg <mats.dufberg at iis.se>, Michael Bauland <michael.bauland at knipp.de>
Cc: Mirjana Tasić <mirjana.tasic at rnids.rs>, ICANN Latin GP <latingp at icann.org>
Subject: Re: [Latingp] Draft variants principles document - Dotless I

Bill, thanks for this.

I have to question, though, the relevance of your experiment. Is a visual test of running text relevant for internet identifiers? And on the subject of visual similarity, I believe this has been discussed extensively and this panel has agreed that visual similarity is outside the scope of our work.

The case of the “small dotless I” and “small letter I” is interesting because of the treatment under different locale settings. The focus of our analysis should be on that, taking into account the needs and expectations of different internet users, including the Turkish community.
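
As a rough sketch of the locale issue, the following Python compares the default (locale-independent) case mapping with a simplified Turkish-style lower-casing. The helper turkish_lower is hypothetical and handles only the two I/İ rules; it is an assumption for illustration, not a complete locale implementation and not something proposed in this thread.

```python
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
SMALL_I = "\u0069"        # LATIN SMALL LETTER I
CAPITAL_I_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

def default_round_trip(text):
    """Upper-case then lower-case with Python's locale-independent mappings."""
    return text.upper().lower()

def turkish_lower(text):
    """Hypothetical helper: lower-case with the Turkish I/İ rules applied first."""
    text = text.replace(CAPITAL_I_DOT, SMALL_I)  # İ -> i
    text = text.replace("I", DOTLESS_I)          # I -> ı
    return text.lower()

# Under the default mappings, dotless i does not survive a case round trip.
print(default_round_trip(DOTLESS_I) == SMALL_I)

# The same upper-case label lower-cases differently under the two rule sets.
print("ISTANBUL".lower(), turkish_lower("ISTANBUL"))
```

Under the default mappings the dotless i collapses onto the ordinary i after a round trip through upper case, whereas a Turkish-aware mapping keeps the two letters apart.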

-Dennis

From: Bill Jouris <bill.jouris at insidethestack.com>
Reply-To: Bill Jouris <bill.jouris at insidethestack.com>
Date: Thursday, April 5, 2018 at 2:05 PM
To: Dennis Tan Tanaka <dtantanaka at verisign.com>, Mats Dufberg <mats.dufberg at iis.se>, Michael Bauland <michael.bauland at knipp.de>
Cc: Mirjana Tasić <mirjana.tasic at rnids.rs>
Subject: [EXTERNAL] Re: [Latingp] Draft variants principles document - Dotless I

I've given some more thought to the Dotless I question.  It occurred to me that there are actually two approaches to the question: analysis and experiment.  So I ran an experiment.  Here are the results:

A dozen subjects were tested.  All were well-educated native speakers of English.  Approximately 1/3 are involved in IT, but none are network experts and none are involved in ICANN.

The subjects were given a paragraph to read (on the subject of variants).  In one word, the lower case I was replaced by a dotless I.  The number of subjects who noticed when reading the paragraph: Zero.

The subjects were then told that the substitution had been made, that it was in the first sentence, and shown the dotless I for information.  Half managed to locate the substitution in 1 or 2 re-reads of the sentence; half took 3 or more tries to spot the substitution – even though they knew what the substitution was and knew that it was there to find.  In short, misreading is the expected result of a substitution.

Accordingly, it is again recommended that U+0069 and U+0131 be determined to be blocked variants.

Happily, the results are the same as the analysis.  I have updated the document with this information.

Bill Jouris
Inside Products
bill.jouris at insidethestack.com
831-659-8360
925-855-9512 (direct)

________________________________
From: "Tan Tanaka, Dennis via Latingp" <latingp at icann.org>
To: "Tan Tanaka, Dennis via Latingp" <latingp at icann.org>
Sent: Thursday, April 5, 2018 9:41 AM
Subject: [Latingp] Draft variants principles document

Need assistance with developing the sections for special cases:

https://docs.google.com/document/d/1IrT_kfildf1SumYUqjkaIkMT-TYx9IRqtuPMV4YvKXU/edit#heading=h.5xs6hwfrrh41

Thanks,
Dennis