[gnso-rpm-wg] Critique of INTA survey

Wed Aug 30 17:47:17 UTC 2017

Hi Everyone:

With the exception of the hyperbole at the outset and some slight garbling on confidence level definitions, I largely agree with George. 

The extremely low response rate and (more importantly) the lack of randomness in the sample of respondents essentially prohibit one from drawing any conclusions about the population as a whole. IF the selection were random, the survey's margin of error would be about +/- 18% at a 95% confidence level. That would mean that, for any percentage the survey reports, we could be 95% confident that the true figure for the whole membership lies within about 18 percentage points of the reported figure; roughly 1 time in 20, the true figure would fall outside that range.
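A minimal sketch of where a figure like +/- 18% comes from. It assumes a simple random sample, the worst-case proportion p = 0.5, z = 1.96 for 95% confidence, and a finite population correction for 6,600 members. With 33 responses it gives roughly 17 percentage points. Calculators differ slightly in rounding and in whether they apply the correction, so treat this as a ballpark check, not the survey's own method. The formula says nothing about bias: it only applies if the sample is random.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample of size n drawn without replacement from a
    finite population; p=0.5 is the worst case (largest margin).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: shrinks the error slightly when n is a
    # noticeable fraction of the population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

moe = margin_of_error(n=33, population=6600)
print(f"+/- {moe:.1%}")
```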

However, "random" means that the 33 members who responded were picked essentially out of a hat filled with the names of all 6,600 members. That is not what happened. The survey probably received responses from INTA's most DNS-savvy members: those who found the purpose of the survey interesting, or for whom the questions seemed more straightforward. This significantly skews the results. George's email demonstrates this in more detail.

I don’t think the survey deals with the skewed data set fairly or honestly. The survey characterizes the findings as traits of the entire membership rather than as traits of the population that responded. This can’t be defended.

For instance, I don’t think it is correct to say: "Vast majority (97%) of members registered domain names in past 24 months, with 9 in 10 registering new TLDs," and "9 in 10 members have registered new TLD domains in the past two years in the Sunrise Period." I think new TLD owners would be very surprised (and happy) to hear this. It would be accurate to say that "97% of the respondents registered..." (see slides 9 and 11).

More harmful to the credibility of the study are statements such as:
"3 in 4 members (76%) have incurred costs for internet monitoring of trademarks in the past 2 years, with more than half (57%) of the members spending $10k or more." (see slide 12)
"On average, INTA members spend $150,000 per year on defensive actions." (see slides 10 & 27)
These are the types of quotes that make their way into print and come to be believed ("INTA members spend $150K each in defensive efforts, a ~$1 billion cost to industry!"). As George noted, it was generally the larger companies that responded, so this survey cannot establish that each of the remaining (smaller) INTA members averages $150,000 per year in defensive spend.

There is another interesting facet to the asserted $150K/year spend rate. One company spent $5.2MM. Assuming this $5.2MM spend was over a two-year period, and taking $292K as the average two-year spend across all 33 respondents, the other 32 respondents averaged (33 x $292K - $5.2MM) / (32 x 2) = about $69,000/year. So, excluding one outlier, the per-year spend of the brand owners who chose to answer the survey is under half of what the study states. Why didn't the study make this clear? (see slide 10)
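The outlier adjustment above can be reproduced in a few lines. This sketch uses the same assumptions as the text: $292K is the average two-year spend per respondent (as the formula implies), and the $5.2MM outlier is also a two-year figure. It gives $146K per year for all 33 respondents (the study's ~$150K) versus about $69K per year for the other 32.

```python
# Assumptions (from the text): $292K is the average two-year spend per
# respondent, and the single $5.2MM outlier is also a two-year figure.
respondents = 33
avg_two_year_spend = 292_000
outlier_two_year_spend = 5_200_000
years = 2

total_two_year_spend = respondents * avg_two_year_spend
# Remove the outlier, then average the remainder per respondent per year.
others_per_year = (total_two_year_spend - outlier_two_year_spend) / ((respondents - 1) * years)
all_per_year = avg_two_year_spend / years

print(f"All 33 respondents: ${all_per_year:,.0f} per year")
print(f"Excluding outlier:  ${others_per_year:,.0f} per year")
```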

I am not sure of the purpose of the study, but there are uses that can be made of it:

There was one conclusion I could draw. The study states that UDRP and Sunrise were the favored rights protection mechanisms, used to a major or moderate extent by 67% and 64% of the respondents, respectively. The next most utilized RPMs were Trademark Claims and URS (36% and 27%, respectively). To me this says that, to those who are in the know, Sunrise is a highly valued RPM and, therefore, should be continued. (Sorry, George) (see slides 15 and 51)

Also, the study makes clear one fact that we had already supposed: that businesses are not aware of new gTLDs, or of domain utility in general. There are several data sets that point to this. Rather than focusing only on costs and on abuse prevention and mitigation, brand education could also describe the benefits of domains as strategic tools that provide brands' customers with greater access to products and with indicia of reliability.

I know this was way pedantic. Sorry. I can’t be on the call as it is at 4AM my time but I’d be pleased to respond to comments or questions. 

Best regards,

Kurt

> On Aug 29, 2017, at 10:05 PM, George Kirikos <icann at leap.com> wrote:
> 
> Hi folks,
> 
> I'm not sure how many have had a chance to read the INTA materials for
> tomorrow's call yet, or have any background in statistics, but the
> survey has truly deep and fatal flaws, making any conclusions drawn
> from it entirely unreliable and non-robust.
> 
> I could write 50 pages on this (I've read the report three times now,
> in horror), but I'll keep it relatively brief (and make these
> statements in advance of the call, so that Lori or INTA/Nielsen have a
> chance to rebut).
> 
> The entire basis of statistical inference is that one can make
> statements about an entire population with a certain level of
> confidence using only data from a subset of that population (i.e. the
> sample in question). Prerequisites are that (a) the sample be random,
> and (b) the sample be of sufficient size. INTA's study fails on both
> counts (self-selected and unrepresentative sample, and a mere 33
> responses).
> 
> INTA claims to represent 7,000 organizations as members:
> 
> https://www.inta.org/About/Pages/Overview.aspx
> 
> While they acknowledge on page 5 of the slides the small sample size
> and suggest "some caution", alarm bells should be ringing regarding
> that small sample size. Page 6 then demonstrates how unrepresentative
> and non-random that sample is, with 52% of the 33 respondents having
> total revenue exceeding $5 billion/year, and a whopping 79% (27%+52%)
> having revenues exceeding $1 billion. This is hardly representative of
> typical TM owners. Similarly, 39% of this sample had 25,000 or more
> employees, and 78% (39%+39%) had 5,000 or more employees.
> 
> All throughout the report, the slides say "INTA members" (i.e. wrongly
> attempting to extrapolate and assert a truth about the entire
> population, rather than limiting the statements to be applicable only
> to the sample of 33 respondents).
> 
> Basic sanity checks were not applied to those
> extrapolations/inferences. On page 25, the report asserts that "more
> than 4 in 10 members have applied to operate a new TLD".
> 
> 45% of 7000 members implies that 3,150 INTA members applied for new
> gTLDs. That can't be correct. The total number of applications from
> everyone was 1,930; see
> https://newgtlds.icann.org/en/program-status/statistics. The
> number from brand owners is a subset of that total (664 according to
> https://icannwiki.org/Brand_TLD, and even that overstates the number of
> distinct brand applicants, since some filed multiple applications).
> If one extrapolated that to the entire universe of trademark holders
> (i.e. including non-INTA members, millions of TM owners), it would be
> even more obvious how unrepresentative and non-random the data in this
> sample is relative to a "typical" TM holder. This sample is highly
> skewed toward the largest of the large organizations, which happened
> to self-select into responding to this survey.
> 
> All throughout the report, important data on confidence intervals is
> missing, obscuring the fact that the level of confidence is extremely
> low (and the margin of error is high) due to the small sample size.
> [confidence intervals are statements like "+/- 5%, 19 times out of 20"]
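To make the missing intervals concrete, here is a sketch of a 95% normal-approximation (Wald) interval for a single reported percentage. It is applied to the 45% figure from page 25 with n = 33. It assumes a random sample, which this survey was not. With so few responses, a Wilson interval would be a more careful choice. Even so, the interval runs from roughly 28% to 62%.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a sample proportion.

    z=1.96 corresponds to 95% confidence ("19 times out of 20"). Valid only
    for a random sample; the bounds are clipped to [0, 1].
    """
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

low, high = wald_interval(0.45, 33)
print(f"45% of 33 respondents: 95% interval [{low:.0%}, {high:.0%}]")
```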
> 
> There are actually calculators that tell you how large a sample
> must be to achieve a given confidence level and margin of error.
> 
> e.g. see: https://www.surveymonkey.com/mp/sample-size-calculator/
> 
> For a population size of 7000 members (INTA's total membership) and a
> 95% confidence level, with a huge 10% margin of error, you'd still
> need 95 survey responses. Yet there were only 33 responses. This is
> particularly important to keep in mind for charts with percentages
> (pp. 17 and beyond), where the margin of error, even if sampled
> properly, would be enormous. Furthermore, those would have had to
> have been RANDOMLY sampled responses to be proper, which we know isn't
> the case. If you wanted a smaller margin of error, say +/- 5%, you
> would need an even larger sample size (in this case, 365). Another
> useful calculator is at:
> 
> https://www.surveymonkey.com/mp/margin-of-error-calculator/
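Calculators like these implement some variant of Cochran's sample-size formula. A minimal sketch of one common form reproduces the 95 and 365 figures for a population of 7,000. It assumes simple random sampling, the worst-case proportion p = 0.5, and a finite population correction.

```python
import math

def required_sample_size(population, margin, z=1.96, p=0.5):
    """Cochran's sample-size formula with a finite population correction.

    Assumes simple random sampling and the worst-case proportion p=0.5.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)      # adjust for a finite population
    return math.ceil(n)

for margin in (0.10, 0.05):
    print(f"+/- {margin:.0%}: {required_sample_size(7000, margin)} responses")
```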
> 
> ICANN has done surveys, by Nielsen even, that didn't suffer from these
> deficiencies, e.g. see:
> 
> https://newgtlds.icann.org/en/reviews/cct/registrant-survey-faqs-25sep15-en
> 
> A key takeaway from that work was "Due to a low response rate to
> emailed invitations to complete the survey, ICANN then worked with
> Domain Tools to procure a larger sample of WHOIS records." They took
> greater care in that study to have *randomized* samples, too, along
> with the larger sample size.
> 
> While it is somewhat interesting to have a glimpse into brand
> protection of some of the largest companies, ultimately this study is
> not robust.
> 
> In summary, any conclusions from this INTA study really need to be
> taken with a grain of salt, due to the small sample size, combined
> with the non-random and unrepresentative sample itself. Indeed, many
> of the conclusions need to be read as the *opposite* of what the study
> suggests (i.e. if defensive costs are $150K/year for companies with $5
> billion+ in revenues, that's a drop in the bucket, and would be much,
> much smaller for a "typical" TM owner). To correct these deficiencies,
> future surveys need to be random (easily done, e.g. by randomly sampling
> the USPTO database or other national registries) and have a much larger
> sample size. Understandably, that costs money, but that's what it
> takes to do things properly.
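Drawing the random sample suggested above is the easy part. This minimal sketch assumes a hypothetical export of trademark owners from a national registry, already reduced to one entry per owner. The identifiers below are made up, and no real USPTO interface is used. Every owner has the same chance of being invited. Random invitations do not by themselves fix low response rates, which can reintroduce self-selection. That is why the ICANN/Nielsen study mentioned above went on to procure a larger sample.

```python
import random

def draw_random_sample(owner_ids, sample_size, seed=None):
    """Simple random sample without replacement: each owner is equally likely.

    `owner_ids` is a hypothetical list with one entry per trademark owner.
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    return rng.sample(owner_ids, sample_size)

# Hypothetical population of 100,000 distinct trademark owners.
population = [f"owner-{i:06d}" for i in range(100_000)]

# 400 is a hypothetical target; for a population this size, +/- 5% at 95%
# confidence needs roughly 383 completed responses (before non-response).
invitees = draw_random_sample(population, 400, seed=2017)
print(len(invitees), invitees[:3])
```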
> 
> Sincerely,
> 
> George Kirikos
> 416-588-0269
> http://www.leap.com/
> 
> 
> On Mon, Aug 28, 2017 at 4:02 PM, Mary Wong <mary.wong at icann.org> wrote:
>> Dear all,
>>
>> The proposed agenda for our next Working Group call, scheduled for 0300 UTC
>> on Thursday 31 August, is as follows:
>>
>> 1. Roll call (via Adobe Connect and phone bridge only); updates to Statements
>>    of Interest
>> 2. Review and discuss results of INTA Cost Impact Survey
>> 3. Next steps/next meeting
>>
>> For Agenda Item #2, please review the survey results here:
>> https://community.icann.org/download/attachments/61606864/INTA%20Cost%20Impact%20Report%20revised%204-13-17%20v2.1.pdf?version=1&modificationDate=1500376749000&api=v2
>>
>> Lori Schulman of INTA, and a member of this Working Group, also did a
>> presentation of the results to the Competition, Consumer Protection &
>> Consumer Trust (CCT) Review Team recently that may be helpful to review:
>> https://community.icann.org/download/attachments/61606864/ICANN%20New%20gTLD%20Survey%20Update%2010May%20Final.pdf?version=1&modificationDate=1501098808000&api=v2.
>> We are hopeful that Lori will be able to join us for this call, to
>> facilitate our review and discussion.
>>
>> Thanks and cheers
>> 
>> Mary
>>
>> _______________________________________________
>> gnso-rpm-wg mailing list
>> gnso-rpm-wg at icann.org
>> https://mm.icann.org/mailman/listinfo/gnso-rpm-wg
