[NCAP-Discuss] Current Status of the NCAP Project

Jeff Schmidt jschmidt at jasadvisors.com
Fri Nov 4 16:27:20 UTC 2022


Team:

I have reviewed the current draft and one thing really jumps out:

From this section:

> 6.4 Recommendation X - ICANN should replace the existing Name Collision 
> Management Framework [Merged with 6.X Recommendation X - ICANN
> should adopt the following name collision process.] “with the recommended
> Name Collision Risk Assessment Workflow”? 

The following sentence appears to be the sole justification for the most significant recommendation:

> The findings from the various study reports and the input from responses to the Board 
> questions make it clear that a broader set of actions including PCA and, potentially, 
> ACA are necessary to acquire the Critical Diagnostic Measurements necessary to inform
> a risk assessment. 

Having read the previous 60 pages, I can say this isn't (at all) clear. What is clear is that there are many trade-offs and many competing objectives and opinions. There is no perfect solution. There is no chance a reader of this document will come away convinced that "it is clear" we must do all of these complex, expensive, high-risk, never-before-done things like PCA and ACA, especially since our own NCAP Study 1 says exactly the opposite! Casey's latest document highlights the reality of difficult trade-offs and competing objectives.

This is a leap not (at all) supported by data, logic, rigorous scientific process, or past experience.

In general, I cannot support the current draft, which recommends multiple root delegations and honeypots for every applied-for string. I find this approach plainly unnecessary and high-risk to the point of recklessness. The risks associated with this path far outweigh the benefits.

(1) Delegating each string to the TRT (presumably a contractor or ICANN itself) *doubles* the number of delegations/root change activities compared to the previous round. In 2012, .foo was delegated once - to the new .foo registry. This was deliberate, to reduce risk and root zone changes. We considered an intermediate delegation (CI being handled by a third party) but intentionally chose against this approach to reduce (re-)delegations. In the currently envisioned process, .foo would be delegated once to the TRT and then again to the new .foo registry.

The IANA-Verisign root management process is a low-volume process, and most requests span calendar weeks. IANA-Verisign processes to do what is envisioned here somehow in "bulk" don't currently exist, so we'd be asking IANA-Verisign to create new processes. Recall that in the 2012 round, the various "root scaling" research pieces (including one from SSAC and one from RSAC) stated that the "size" of the root zone was less of a concern than the rate of change. The currently envisioned process would *double* the number of root changes. Root zone changes are not zero cost and not zero risk. We'd better have a darn good reason to *double* the number of changes.
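To put rough numbers on it, here is a minimal back-of-envelope sketch in Python. The round size is purely an illustrative assumption of mine, not an NCAP or ICANN figure:

    # Back-of-envelope arithmetic for root zone churn under the two models.
    # APPLIED_FOR_STRINGS is an illustrative assumption, not a real figure.
    APPLIED_FOR_STRINGS = 1000

    # 2012 model: one root change per string (delegate straight to the registry).
    changes_2012 = APPLIED_FOR_STRINGS * 1

    # Envisioned model: delegate to the TRT, then re-delegate to the registry.
    changes_trt = APPLIED_FOR_STRINGS * 2

    print(f"2012-style root changes: {changes_2012}")  # 1000
    print(f"TRT-style root changes:  {changes_trt}")   # 2000 - double the churn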

(2) Both TLD honeypots of the envisioned variety and the "passive" data collection scheme (delegate and serve a skeleton zone) create enormous risk and uncertainty. These things have never been done before (Casey notes SiteFinder is the closest historical analog, and that created massive unintended consequences). Not only do we have the "but for" issue that will subject the operator and ICANN to liability under every conceivable global privacy and cybersecurity framework, but there is also good reason to worry that, operationally, the cure will be worse than the disease for those impacted. Not only would we be causing those impacted to exfiltrate data (what Verisign calls "controlled exfiltration"), but we would also change their operational dynamics in ways we don't understand and have zero experience with.

This concern is shared by OCTO and some in the DNS-OARC community: "As you discuss in the proposal, all the configurations would represent a change from the current behavior (the root sending authoritative NXDOMAIN replies). Changing this behavior could potentially cause disruption to some systems, so the benefit of this new behavior has to be weighed against the risk of disruption. We note that some in the DNS-OARC technical community have brought up similar issues in their Mattermost discussion system." (OCTO to Matt Thomas, email to NCAP list, 8/1/2022). Again, we'd better have a darn good reason to do these things.
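To make the status quo OCTO refers to concrete, here is a minimal sketch of my own (not OCTO's), assuming the dnspython package is installed; .corp stands in as a collision-prone string that remains undelegated:

    # Minimal sketch of today's behavior: a query under an undelegated TLD
    # ends in an authoritative NXDOMAIN originating from the root zone.
    # Assumes the dnspython package; "fileserver.corp." is a stand-in for
    # any colliding name under an undelegated string.
    import dns.resolver

    try:
        dns.resolver.resolve("fileserver.corp.", "A")
    except dns.resolver.NXDOMAIN:
        # The status quo the proposal would change: the name does not exist,
        # the lookup fails fast, and no application traffic follows.
        print("NXDOMAIN: the query dies here; nothing is exfiltrated")

Under a honeypot or skeleton-zone delegation, that same query would instead be referred onward, and the follow-on traffic - the "controlled exfiltration" - would reach the new operator.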

Monkeying with the root zone is deadly serious. Verisign has a term I love: they refer to the root as being "unnaturally perfect." When we invented Controlled Interruption, we considered all of this very carefully, ultimately took the absolutely most conservative approach we could think of, and tested it extensively. Why 127.0.53.53 and not 127.53.53.53? We found some older Cisco router firmware improperly defaulted localhost to 127/16 instead of 127/8. Why did we specifically recommend against an IPv6 address? We tested everything we could think of: ::ffff:7f00:0/104 (the IPv4-mapped form of 127/8) usually dumped to a default route or tunnel. FC00::/7 and FE80::/10 were unevenly implemented, particularly in Juniper's (BSD-based) implementations, and occasionally dumped to a default route. We were extremely, extremely, extremely conservative. We wanted to be *sure* that we wouldn't cause data to leave the host that would not have otherwise.
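For the curious, the 127.0.53.53 logic can be illustrated with Python's ipaddress module. The /16 network below models the buggy firmware behavior described above; it is an illustration of mine, not the actual Cisco code:

    # Correct stacks treat all of 127.0.0.0/8 as loopback; the buggy firmware
    # described above effectively treated only 127.0.0.0/16 that way.
    import ipaddress

    correct_loopback = ipaddress.ip_network("127.0.0.0/8")
    buggy_loopback = ipaddress.ip_network("127.0.0.0/16")  # the misimplementation

    ci_addr = ipaddress.ip_address("127.0.53.53")    # the address CI chose
    alt_addr = ipaddress.ip_address("127.53.53.53")  # the rejected alternative

    # 127.0.53.53 stays on-host under both interpretations.
    print(ci_addr in correct_loopback, ci_addr in buggy_loopback)    # True True
    # 127.53.53.53 would have left the host on the buggy gear.
    print(alt_addr in correct_loopback, alt_addr in buggy_loopback)  # True False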

I can tell 'ya I was personally nervous to the point of nausea during the first few CI implementations in the 2012 round. There was real risk of something going sideways. We spent months testing everything we could think of. We talked to all the major vendors that would talk to us. We had every imaginable piece of gear in our labs, offices, and basements. We were ready with EBERO-like procedures in case the worst happened; it was a nail-biting experience. Fortunately, it seems to have worked out, by any reasonable definition of "worked out." We now have a decade and a thousand strings of operational experience with the 2012 procedures; prudence dictates that the bar must be exceedingly high to conjure up something different.

Four years and 100 meetings later, NCAP hasn't come up with anything that is demonstrably and objectively better. NCAP's own Study 1 says that.

What NCAP has come up with is a bunch of ideas (most previously considered, exhaustively) and a few folks hoping real hard that these ideas will somehow be better, for some unclear definition of "better." In fact, the things being considered carry real risk of making things worse, both procedurally (by creating quagmire and controversy) and technically/operationally (through impacts to the root, root management processes, and the end systems experiencing collisions). I get the real sense that some in this group simply don't recognize the diligence applied a decade ago. There were really good reasons we did the things we did in 2012, and no evidence has emerged warranting the types of dangerous schemes being considered.

Makes zero sense to me. I will likely be submitting a dissenting opinion to be published within NCAP's final work product. Folks interested in contributing to the dissenting opinion, please contact me offline.

Thanks,
Jeff


