[gnso-rpm-wg] [Ext] Data Collection Suggestions for Q4 of Trademark Claims - machine readable semantics of the non-exact matches document, analysis of the historical registration data and potential Issues for Registries and Registrars.

Thu Jul 20 04:18:32 UTC 2017

Thank you very much for this very detailed and helpful follow up, Maxim. As you may have noticed, staff put a very brief placeholder on this point in the updated Trademark Claims document that was used for the call earlier today. We will find a way to incorporate these additional details you have just provided, and I am sure the co-chairs will take this suggestion into consideration as well, when preparing a proposal for how to go about collecting all the data requested for the Working Group.

Thanks and cheers
Mary

From: Maxim Alzoba <m.alzoba at gmail.com>
Date: Wednesday, July 19, 2017 at 12:44
To: "gnso-rpm-wg at icann.org" <gnso-rpm-wg at icann.org>
Cc: Mary Wong <mary.wong at icann.org>, Amr Elsadr <amr.elsadr at icann.org>
Subject: Re: [Ext] Data Collection Suggestions for Q4 of Trademark Claims - machine readable semantics of the non-exact matches document, analysis of the historical registration data and potential Issues for Registries and Registrars.

Hello All,

Here are my thoughts about machine-readable semantics of the non-exact matches document, analysis of the historical registration data and
potential issues for Registries and Registrars.

The following text is based on an analysis of "PROPOSAL FOR SMARTER NON-EXACT MATCHES", 29 MAY 2017 DRAFT v1.0.
I have assumed that the wording "Sunrise and/or" is to be removed, in line with
the recent conversations during meetings of the PDP WG.

For simplicity I will use the word 'rule', as in computer science, for each suggested 'non-exact match criterion', so
'Character Removal non-exact match criterion' will be shortened to 'Character Removal rule'.

To avoid confusing traditional Claims with the Claims that would result from a potential implementation of the suggested Non-Exact
Matches, I will use the term 'potential claims' for the latter.

During the last month (including our F2F meeting in Joburg) we heard many voices saying 'too many hits are to be expected'
and almost the same number of voices saying 'it is not going to be a lot', which means that
we do not have enough material to make a proper judgment.

So this text is a simplified description of the way we might quantify the number/frequency of potential claims
by applying the mentioned 'Rules' to historical registrations (available to ICANN via escrow) and historical TMCH records
(available to ICANN via Deloitte & IBM).

I'd like to underline from the very beginning that we will not need to disclose commercially sensitive information;
only anonymized results of the processing will be required: the number of claims per historical registration (average/maximum)
for marks of different lengths in symbols, the number of variants (= the number of non-exact matches for all applicable rules
combined) for historical registrations (average/maximum), and
the combined number of potential claims.

Why I am talking about these numbers: the first will give us an understanding of how many notifications would have been required
in the past with the new rules applied, and the second will allow software/hardware engineers (IBM, most probably, given
the structure of the TMCH) to estimate the methods to use for implementation, the size of the database, the speed of its growth at the current
rate of TMCH increase, and, what is quite important, the costs (I am not sure we will be happy to learn that all we need for the
implementation .... is a supercomputer of sorts). The last will give us an understanding of the percentage of potential claims among
historical registrations.
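
As a rough illustration of the kind of anonymized output described above, here is a minimal Python sketch. The function name `summarize`, the field names and the sample counts are all assumptions made up for this example; no real escrow or TMCH data is involved, and only aggregates leave the function.

```python
from statistics import mean


def summarize(claims_per_registration, variants_per_string):
    """Reduce per-record counts to anonymized aggregates only."""
    total = len(claims_per_registration)
    with_claims = sum(1 for c in claims_per_registration if c > 0)
    return {
        "claims_per_registration_avg": mean(claims_per_registration),
        "claims_per_registration_max": max(claims_per_registration),
        "variants_avg": mean(variants_per_string),
        "variants_max": max(variants_per_string),
        "potential_claims_total": sum(claims_per_registration),
        "percent_registrations_with_claims": 100.0 * with_claims / total,
    }


# Made-up counts, for illustration only (not real data).
claims = [0, 0, 2, 1, 0]   # potential claims hit by each historical registration
variants = [5, 7, 6]       # non-exact variants generated per string
summary = summarize(claims, variants)
for name, value in summary.items():
    print(name, value)
```

The same reduction could be run separately for each mark length if the breakdown by length is wanted.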

To make such an assessment, the rules need to be converted to machine-readable semantics, so,
for example,
 Rule 7, Digit Addition, will be something like:
 string OR string+'1'
 Rule 8, “Cheap” and “Buy”, will be:
 string OR string+'cheap' OR 'cheap'+string OR string+'buy' OR 'buy'+string

(The particular semantics will depend on the programming language used for the assessment;
the examples provided are enough for an average programmer to understand the idea.)
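
To make the two examples above concrete, here is a minimal Python sketch. The function names, the `digits` and `words` parameters and the sample marks are assumptions for illustration only; the proposal itself may define the rules differently (for instance, which digits count for Rule 7).

```python
def digit_addition(mark, digits="1"):
    """Rule 7 sketch: the mark itself, or the mark followed by one digit."""
    return {mark} | {mark + d for d in digits}


def cheap_and_buy(mark, words=("cheap", "buy")):
    """Rule 8 sketch: the mark itself, or the mark with a word before or after it."""
    variants = {mark}
    for word in words:
        variants.add(mark + word)
        variants.add(word + mark)
    return variants


def potential_claims(label, marks, rules):
    """Return the marks for which at least one single rule produces the label."""
    return sorted(m for m in marks if any(label in rule(m) for rule in rules))


marks = ["example", "acme"]  # made-up stand-ins for TMCH records
rules = [digit_addition, cheap_and_buy]
print(potential_claims("cheapacme", marks, rules))
print(potential_claims("example1", marks, rules))
```

Here each rule is applied on its own; combining several rules on one string is a separate question.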

Rules 11 (Goods and Services and Industry Keywords) and 12 (Commonly Abused Terms) cannot be assessed as easily:
they require the creation of dictionaries of words, which is a separate task and not a simple one (but with the results of
the suggested assessment it will be easy to understand how the numbers grow with the number of entries in those dictionaries of strings).

Important note: to make such an assessment we will need to establish the Rules of Combination (which rules, and in which order,
can be applied to the string in question prior to the comparison of the result with TMCH entries).
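
Why the order and the number of combined rules matter can be shown with a small Python sketch. Everything here is hypothetical: the two toy rules `add_digit` and `add_cheap`, the helper names and the `max_rules` limit are assumptions for this example, not part of the proposal.

```python
from itertools import permutations


def add_digit(s):
    """Toy rule: the string itself, or the string followed by '1'."""
    return {s, s + "1"}


def add_cheap(s):
    """Toy rule: the string itself, or 'cheap' placed after or before it."""
    return {s, s + "cheap", "cheap" + s}


def apply_in_order(mark, ordered_rules):
    """Apply each rule in turn to every variant produced so far."""
    variants = {mark}
    for rule in ordered_rules:
        variants = {v for s in variants for v in rule(s)}
    return variants


def all_combinations(mark, rules, max_rules):
    """Union over every ordered selection of up to max_rules distinct rules."""
    result = {mark}
    for k in range(1, max_rules + 1):
        for ordered in permutations(rules, k):
            result |= apply_in_order(mark, ordered)
    return result


digit_then_cheap = apply_in_order("acme", [add_digit, add_cheap])
cheap_then_digit = apply_in_order("acme", [add_cheap, add_digit])
print(sorted(digit_then_cheap - cheap_then_digit))
print(len(all_combinations("acme", [add_digit, add_cheap], max_rules=2)))
```

The two orders produce different strings, and the number of variants grows with every additional rule allowed in a combination, so the assessment result depends directly on the chosen Rules of Combination.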

Additional thoughts:

1. Issues for Registrars due to Spam:
The number of potential claim notices (it needs to be quantified via a test against historical data; current estimates vary from "not many"
to "almost 99%") might generate an additional volume of unwanted claim notices from Registrars. To answer how many, we
need the suggested assessment.

Spam lists usually need only a few signals of unwanted activity from different persons/organisations, and a rise in the number
of unwanted notices (not all registrants want a lot of e-mails from registrars) might lead to a rise in positive hits
(from the perspective of the spam list organisation), which might endanger the business activities of registrars
(inclusion of Registrar IP addresses/AS/domain names in the spam lists will cause technical and operational issues,
such as non-visibility to Internet, inability to  e.t.c.).

2. Potential chilling effect on the market. Each claim registration adds an additional 0.25 USD per domain; currently
this applies to around 100 out of 10k domains, and Registries can cope with that. An increase in the probability of claim registrations
(coupled with evergreen claims for the proposed additional services) will make the numbers higher, and Registries will
have to share the costs with the Registrars and, via Registrars, with Registrants.

3. The current design of the non-exact match document seems to be ASCII-centric, and it could be an area for further
improvement to support IDNs and non-English Latin-script languages.

4. Rule 2 (Fat-finger Typos) needs to take into account the AZERTY keyboard layout (second in number of users after QWERTY
 ... it might be history soon, but not yet).
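
On point 4 above, a minimal Python sketch of why the keyboard layout changes the result of a fat-finger rule. This is an assumption-heavy illustration: it treats a "fat-finger typo" as replacing one letter with its left or right neighbour on the same row, uses only the letter rows of the QWERTY and (French) AZERTY layouts, and all names are invented for the example. The proposal's actual definition of Rule 2 may differ.

```python
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
AZERTY_ROWS = ("azertyuiop", "qsdfghjklm", "wxcvbn")


def neighbours(rows):
    """Map each key to the keys directly left and right of it on its row."""
    adjacent = {}
    for row in rows:
        for i, key in enumerate(row):
            adjacent[key] = set(row[max(i - 1, 0):i] + row[i + 1:i + 2])
    return adjacent


def fat_finger_variants(mark, rows):
    """Strings obtained by replacing exactly one letter with a same-row neighbour."""
    adjacent = neighbours(rows)
    variants = set()
    for i, ch in enumerate(mark):
        for repl in adjacent.get(ch, ()):
            variants.add(mark[:i] + repl + mark[i + 1:])
    return variants


print(sorted(fat_finger_variants("aq", QWERTY_ROWS)))
print(sorted(fat_finger_variants("aq", AZERTY_ROWS)))
```

The same string yields different typo variants under the two layouts, so a rule built only on QWERTY adjacency would miss typos made on AZERTY keyboards.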

I am not against the implementation of the proposed model (after proper assessment)
if it is used solely for notification of the TM owners who have registered their trademarks in the TMCH.

The reason is that only a person from the potentially affected organisation (the owner of the TM
registered in the TMCH) can say which of the suggested hits (potential claims) are dangerous for their organisation
and which to protect against (to avoid the effect of death by a thousand cuts, where a company gets involved
in too many unnecessary processes at the same time).

Sincerely Yours,

Maxim Alzoba
Special projects manager,
International Relations Department,
FAITID

m. +7 916 6761580(+whatsapp)
skype oldfrogger

Current UTC offset: +3.00 (.Moscow)

On Jul 12, 2017, at 00:13, Amr Elsadr <amr.elsadr at icann.org> wrote:

Hi Maxim,

I hope this email finds you well. I’m reaching out to you to follow up on an action item for you from the WG F2F meeting in Johannesburg. During the meeting, it was indicated that you may have some follow up suggestions on data collection regarding question 4 on Trademark Claims (the newly added question on the proposals of non-exact matches).

Please do review the attached updated table, and send any suggestions you have to the WG mailing list. It is our hope to wrap up the refinement of the questions, and the identification of data requirements this week.

Thanks, Maxim.

Amr
