[gnso-rpm-wg] Critique of INTA survey

Tue Aug 29 21:05:42 UTC 2017

Hi folks,

I'm not sure how many have had a chance to read the INTA materials for
tomorrow's call yet, or have any background in statistics, but the
survey has truly deep and fatal flaws, making any conclusions drawn
from it entirely unreliable and non-robust.

I could write 50 pages on this (I've read the report three times now,
in horror), but I'll keep it relatively brief (and make these
statements in advance of the call, so that Lori or INTA/Nielsen have a
chance to rebut).

The entire basis of statistical inference is that one can make
statements about an entire population with a certain level of
confidence using only data from a subset of that population (i.e. the
sample in question). Prerequisites are that (a) the sample be random,
and (b) the sample be of sufficient size. INTA's study fails on both
counts (self-selected and unrepresentative sample, and a mere 33
responses).

INTA claims to represent 7,000 organizations as members:

https://www.inta.org/About/Pages/Overview.aspx

While they acknowledge on page 5 of the slides the small sample size
and suggest "some caution", alarm bells should be ringing regarding
that small sample size. Page 6 then demonstrates how unrepresentative
and non-random that sample is, with 52% of the 33 respondents having
total revenue exceeding $5 billion/year, and a whopping 77% (27%+52%)
having revenues exceeding $1 billion. This is hardly representative of
typical TM owners. Similarly, 39% of this sample had 25,000 or more
employees, and 78% (39%+39%) had 5,000 or more employees.

Throughout the report, the slides say "INTA members" (i.e. wrongly
attempting to extrapolate and assert a truth about the entire
population, rather than limiting the statements to the sample of 33
respondents).

Basic sanity checks were not done on those extrapolations/inferences.
On page 25, the report asserts that "more than 4 in 10 members have
applied to operate a new TLD".

45% of 7,000 members implies that 3,150 INTA members applied for new
gTLDs. That cannot be correct. The total number of applications from
everyone was 1,930 -- see
https://newgtlds.icann.org/en/program-status/statistics -- and the
number from brand owners is a subset of that total (664 according to
https://icannwiki.org/Brand_TLD, which overstates the number of
distinct applicants, since some filed multiple applications). If one
extrapolated the same rate to the entire universe of trademark holders
(i.e. including non-INTA members), which numbers in the millions, it
would be even more obvious how unrepresentative and non-random this
sample is relative to a "typical" TM holder. The sample is highly
skewed towards the largest of the large organizations, who happened to
self-select by responding to this survey.
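
The sanity check above is simple arithmetic, and can be sketched in a
few lines. The figures are the ones quoted in this message; the
variable and function names are illustrative only, and "more than 4 in
10" is taken to mean 45%, as in the paragraph above.

```python
# Figures quoted in this message; names are illustrative, not from the report.
INTA_MEMBERS = 7000        # membership claimed by INTA
CLAIMED_SHARE = 0.45       # "more than 4 in 10", taken here as 45%
TOTAL_APPLICATIONS = 1930  # all new gTLD applications, from everyone
BRAND_APPLICATIONS = 664   # brand TLD applications per ICANNWiki


def implied_applicants(population: int, share: float) -> int:
    """Members implied by applying a sample share to the whole population."""
    return round(population * share)


implied = implied_applicants(INTA_MEMBERS, CLAIMED_SHARE)
print(f"Implied INTA applicants: {implied}")
print(f"Exceeds ALL applications filed: {implied > TOTAL_APPLICATIONS}")
print(f"Multiple of brand applications: {implied / BRAND_APPLICATIONS:.1f}x")
```

Any extrapolation that implies more applicants than there were
applications in total fails the check, whatever the exact share used.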

Throughout the report, important data on confidence intervals is
missing, obscuring the fact that the level of confidence is extremely
low (and the margin of error is high) due to the small sample size.
[confidence intervals are statements like "+/- 5%, 19 times out of 20"]
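
To put a rough number on that, here is a minimal sketch of the
textbook margin-of-error calculation for a proportion. It assumes a
simple random sample (which this survey is not), the normal
approximation, a worst-case proportion of 50%, and a finite population
correction for 7000 members; the function name is illustrative. Under
those assumptions, 33 responses give roughly +/- 17 percentage points,
19 times out of 20.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level ("19 times out of 20")


def margin_of_error(n: int, population: int, p: float = 0.5, z: float = Z_95) -> float:
    """Margin of error for a proportion estimated from a simple random sample.

    Uses the normal approximation with a finite population correction.
    p = 0.5 is the worst case (widest interval).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


moe = margin_of_error(n=33, population=7000)
print(f"+/- {moe * 100:.1f} percentage points")
```

Since the respondents were self-selected, even this wide interval
understates the real uncertainty: the formula says nothing about
selection bias.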

There are actually calculators that let one know how big a sample
should be, in order to have a certain level of confidence and/or a
margin of error.

e.g. see: https://www.surveymonkey.com/mp/sample-size-calculator/

For a population size of 7000 members (INTA's total membership) and a
95% confidence level, with a huge 10% margin of error, you'd still
need 95 survey responses. Yet there were only 33 responses. This is
particularly important to keep in mind for the charts with percentages
(pp. 17 and beyond), where the margin of error, even with proper
sampling, would be enormous. Furthermore, those responses would have
had to be RANDOMLY sampled for the calculation to be valid, which we
know isn't the case. If you wanted a smaller margin of error, say
+/- 5%, you would need an even larger sample (in this case, 365).
Another useful calculator is at:

https://www.surveymonkey.com/mp/margin-of-error-calculator/
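
The sample sizes quoted above (95 and 365) can be reproduced with the
standard formula for estimating a proportion, adjusted for a finite
population. This is a sketch under the usual assumptions (simple random
sampling, worst-case proportion of 50%, z = 1.96 for 95% confidence);
the function name is illustrative, and the online calculators may
differ slightly in how they round.

```python
import math


def required_sample_size(population: int, margin: float,
                         z: float = 1.96, p: float = 0.5) -> int:
    """Responses needed for a given margin of error at a given confidence.

    n0 is the size needed for an infinite population; the second step
    applies the finite population correction and rounds up.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for margin in (0.10, 0.05):
    needed = required_sample_size(population=7000, margin=margin)
    print(f"+/- {margin:.0%}: {needed} responses needed (survey had 33)")
```

Note that these are completed, randomly sampled responses, not
invitations sent.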

ICANN has done surveys, by Nielsen even, that didn't suffer from these
deficiencies, e.g. see:

https://newgtlds.icann.org/en/reviews/cct/registrant-survey-faqs-25sep15-en

A key takeaway from that work was "Due to a low response rate to
emailed invitations to complete the survey, ICANN then worked with
Domain Tools to procure a larger sample of WHOIS records." They took
greater care in that study to have *randomized* samples, too, along
with the larger sample size.

While it is somewhat interesting to have a glimpse into brand
protection of some of the largest companies, ultimately this study is
not robust.

In summary, any conclusions from this INTA study really need to be
taken with a grain of salt, due to the small sample size, combined
with the non-random and unrepresentative sample itself. Indeed, many
of the conclusions need to be read as the *opposite* of what the study
suggests (i.e. if defensive costs are $150K/year for companies with $5
billion+ in revenues, that's a drop in the bucket, and would be much,
much smaller for a "typical" TM owner). To correct these deficiencies,
future surveys need to use random samples (easily done, e.g. by
randomly sampling the USPTO database or other national registries) and
have a much larger sample size. Understandably, that costs money, but
that's what it takes to do things properly.
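
As a rough illustration of how little is involved in drawing the sample
itself, here is a minimal sketch. It assumes the registry has already
been exported as a list of registration identifiers; the identifiers,
function name and seed below are all hypothetical, and no real USPTO
interface is being described.

```python
import random
from typing import List, Optional, Sequence


def draw_random_sample(registration_ids: Sequence[str], sample_size: int,
                       seed: Optional[int] = None) -> List[str]:
    """Draw a simple random sample, without replacement, from a registry export."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible/auditable
    return rng.sample(registration_ids, sample_size)


# Hypothetical stand-in for a registry export of 100,000 registrations.
registry = [f"TM-{i:07d}" for i in range(1, 100_001)]
invitees = draw_random_sample(registry, sample_size=365, seed=2017)
print(len(invitees), "owners selected;", len(set(invitees)), "distinct")
```

Not everyone invited will respond, so more owners than the target
number of responses would have to be drawn, and a low response rate
can reintroduce self-selection.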

Sincerely,

George Kirikos
416-588-0269
http://www.leap.com/

On Mon, Aug 28, 2017 at 4:02 PM, Mary Wong <mary.wong at icann.org> wrote:
> Dear all,
>
>
>
> The proposed agenda for our next Working Group call, scheduled for 0300 UTC
> on Thursday 31 August, is as follows:
>
>
>
> 1. Roll call (via Adobe Connect and phone bridge only); updates to
>    Statements of Interest
> 2. Review and discuss results of INTA Cost Impact Survey
> 3. Next steps/next meeting
>
>
>
> For Agenda Item #2, please review the survey results here:
> https://community.icann.org/download/attachments/61606864/INTA%20Cost%20Impact%20Report%20revised%204-13-17%20v2.1.pdf?version=1&modificationDate=1500376749000&api=v2
>
>
>
> Lori Schulman of INTA, and a member of this Working Group, also did a
> presentation of the results to the Competition, Consumer Protection &
> Consumer Trust (CCT) Review Team recently that may be helpful to review:
> https://community.icann.org/download/attachments/61606864/ICANN%20New%20gTLD%20Survey%20Update%2010May%20Final.pdf?version=1&modificationDate=1501098808000&api=v2.
> We are hopeful that Lori will be able to join us for this call, to
> facilitate our review and discussion.
>
>
>
> Thanks and cheers
>
> Mary