[NCAP-Discuss] [Ext] Re: Draft final Study 1 report

Danny McPherson danny at tcb.net
Wed Apr 29 20:27:59 UTC 2020


On 2020-04-28 20:17, Karen Scarfone wrote:

> Here are my responses to your questions. I've copied the text of your
> original email below for reference.

Thanks for the prompt response Karen, comments inline!

> 1. I stated in the report that "there does not appear to be any recent
> academic research into the causes of name collisions or name collision
> mitigation strategies." You disagreed with that and cited nine
> examples of recent work. One was a 2020 posting to the NCAP mailing
> list from Jeff Schmidt about corp.com. One was a blog posting from
> 2019 on how pen testers can take advantage of name collisions. The
> other seven examples are from 2017 or earlier. Perhaps you and I are
> interpreting "recent" differently, but you haven't provided any
> examples of academic (or industry) research into name collisions from
> the past three years, and all the examples except the pen tester blog
> posting were already reviewed for the draft report. Would it be
> clearer if I reworded my statement to say, "There does not appear to
> be any academic or industry research during the past three years into
> the causes of name collisions or name collision mitigation
> strategies"?
> 
> 2. I based my assertion about finding the causes for name collisions
> on evidence from previously identified causes. Sections 3.5 and 3.6 of
> the draft report contain most of this information. Causes mentioned
> include:
> * Shortened name usage
> * Search list processing
> * User error and misconceptions
> * Client software misconfiguration
> * Browser prefetching
> * Third-party applications or plug-ins
> * Web crawlers
> * Malware
> * Web Proxy Auto-Discovery (WPAD) protocol
> * Expired registrations
> * Intentional acquisition of colliding names

As noted on the call a moment ago, if you're not doing honeypotting to 
qualify risks then you could at least look at the labels to determine 
the riskiness of some strings.  This is what Duane Wessels did in the 
mid-2000s, what Interisle did thereafter at scale, and what Verisign, 
JAS, and others did subsequently.  There are some techniques that can 
considerably improve identification of these risky / unicorn strings, 
including this work from November 2017, which I certainly consider 
recent, especially since this WG started in 2018, IIRC:

Client-side Name Collision Vulnerability in the New gTLD Era: A 
Systematic Study
https://dl.acm.org/doi/pdf/10.1145/3133956.3134084

But that also doesn't mean earlier work didn't allow ICANN Org or others 
to "test" strings for riskiness proactively vs. hoping to break 
("interrupt") things during an initial delegation to notify potentially 
impacted parties.  Interisle did that at ICANN's direction, and it's 
well understood, IMO.

Furthermore, most of the riskiest classes of attacks where MitM and the 
like can occur involve service discovery protocols whose labels are 
detectable at the root level (today, at least -- although QNAME 
minimization is in fact having a _significant_ impact on visibility at 
the root, which is something else that has changed and considerably 
affects any analysis).
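To make the label-detection idea concrete, here is a minimal sketch of 
the kind of check I mean -- flagging root-level query names whose 
leftmost label matches a known service-discovery protocol.  The label 
set here is illustrative, not an exhaustive catalogue, and the function 
name is my own:

```python
# Illustrative set of service-discovery labels visible in QNAMEs at
# the root. A real analysis would use a much larger catalogue.
SERVICE_DISCOVERY_LABELS = {
    "wpad",       # Web Proxy Auto-Discovery
    "isatap",     # intra-site automatic tunnel addressing
    "_ldap",      # SRV-style directory lookups
    "_kerberos",  # Kerberos KDC discovery
    "autodiscover",
}

def is_service_discovery_query(qname: str) -> bool:
    """Return True if the leftmost label of a query name suggests a
    service-discovery protocol looking for a rendezvous point."""
    first_label = qname.rstrip(".").split(".")[0].lower()
    return first_label in SERVICE_DISCOVERY_LABELS
```

Run over root query logs for a candidate string, this kind of filter 
surfaces exactly the queries where a colliding delegation would hand an 
attacker a rendezvous point.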

> There is no evidence that there's a single root cause of most name
> collisions. The evidence is overwhelming that there are many root
> causes, and that the types of root causes in the list above are not
> ones you would identify by analyzing datasets. Datasets might give you
> a starting point, but most of the analysis work would need to be
> carried out on a case-by-case basis outside those datasets. But I
> don't see any evidence that there's a substantial number of
> unexplained name collisions happening, let alone causing problems.

I don't agree with this wholesale - especially for the class of service 
discovery protocols like WPAD and the thousands of others that can be 
identified specifically through the QNAME, which are arguably the 
riskiest because someone is looking for a service to rendezvous with.  
I think you could apply a framework like the one above tomorrow, set 
some unacceptable thresholds, and unless someone wants to do outreach, 
consider anything above some bar too risky.  Interisle had this mostly 
right, I think, but the fact that 1000+ strings needed to be delegated 
to enable business plans "complicated things", IMO.
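The thresholding step itself is trivial once you have the query data.  
A rough sketch, with an entirely illustrative threshold and my own 
function name:

```python
from collections import Counter

def flag_risky_strings(qnames, threshold):
    """Count observed root-level queries per candidate string (the
    rightmost label) and flag strings whose query volume meets or
    exceeds a risk threshold. The threshold value is a policy
    decision, not something this sketch can supply."""
    counts = Counter(q.rstrip(".").split(".")[-1].lower() for q in qnames)
    return {s for s, n in counts.items() if n >= threshold}
```

The hard part is not the code; it's agreeing on what the threshold 
should be and who does outreach when a string exceeds it.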


> 3. Regarding controlled interruption, Jeff Schmidt's email
> (https://mm.icann.org/pipermail/ncap-discuss/2020-April/000282.html)
> already said much of what I was going to say. Controlled interruption
> has been highly effective at mitigating name collisions--there's
> overwhelming evidence of that--and that encompasses all current root
> causes. Unless a major new root cause is identified that controlled
> interruption can't effectively mitigate, I do not see the need to
> study mitigation strategies other than controlled interruption. I
> don't even know *how* you would study mitigation strategies unless you
> know which root cause needs to be addressed and how controlled
> interruption is insufficient. I will state this more clearly in
> Section 6 of the report.

Warren and I both commented on this on the call.  I'm still of the 
opinion that controlled interruption has provided some value, but it's 
brute force, inherently reactive, ignores whole classes of cases where 
the signal will never make it to the client / user, and doesn't allow 
any proactive delegation feasibility "test" (to borrow from Neuman) to 
yield useful data a priori or other insights prior to delegation.
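The "signal never makes it to the client" point is easy to see if you 
look at what the signal actually is: controlled interruption answers A 
queries with the marker address 127.0.53.53.  A client only "gets" the 
signal if something on its side actually inspects that answer -- a 
sketch, with my own function name:

```python
# ICANN's controlled-interruption marker address for A-record answers.
CONTROLLED_INTERRUPTION_IP = "127.0.53.53"

def saw_controlled_interruption(a_records):
    """Return True if any resolved IPv4 address is the marker, i.e.
    the interruption signal actually reached this vantage point.
    Anything that swallows or rewrites the answer before it reaches
    the client never sees the marker at all."""
    return CONTROLLED_INTERRUPTION_IP in set(a_records)
```

Anywhere that check would return False despite a colliding query -- 
proxies, middleboxes, software that ignores loopback answers silently 
-- is a place the mitigation provides no notification.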

Thanks,


-danny



