[NCAP-Discuss] Thoughts on proposed name collision frameworks

Thu Feb 16 22:56:20 UTC 2023

Dear Discussion Group,

In the Feb 1 DG meeting, I expressed some of my concerns about the mechanisms that are currently being proposed as recommendations by the DG.  I would like to document my thoughts here on the list and open it for further discussion.

PASSIVE COLLISION ASSESSMENT.
-------------------
Passive collision assessment is a mechanism for 1) getting higher-fidelity name collision telemetry data from the DNS, where traditional data sets (e.g., DITL) are becoming less reliable; 2) consolidating name collision query information in one place to facilitate analysis; and 3) potentially detecting name collisions ahead of time, before third parties are affected by the disruption associated with controlled interruption (or its successor).

I support passive collision assessment.

I think passive collision assessment is an important part of any new name collision framework.  It is low risk, low cost, and results in increased telemetry data.  More details are found in section 3.6.5 ("Name Collision Telemetry") of (the current version of) the study 2 report [1].  However, it is not a replacement for an end-system alerting mechanism like controlled interruption; it is just about collecting data.

ACTIVE COLLISION ASSESSMENT.
-------------------
Active collision assessment is a mechanism for 1) disrupting transport- and application-layer communications from end systems; 2) intercepting application-layer communications and sending custom content to inform the end system of what caused the disruption; and 3) creating additional telemetry data surrounding name collisions.

I strongly oppose active collision assessment.

Benefits: While one of the primary benefits promised is the ability to communicate to the user about the name collisions experienced, data suggest that web browsers (the primary mode of user interaction) account for only a minority of the applications that have been affected by controlled interruption.  More details are found in section 3.6.3 ("Root Cause Identification") of (the current version of) the study 2 report.  Support of IPv6 is another of the benefits promised by active collision assessment.  However, as mentioned in a previous email discussion [2], alerting coverage is not expected to benefit from IPv6 support except in the case of IPv6-only systems.  See also section 3.6 ("Evaluation and Comparison of Proposed Alerting and Data Collection Techniques" aka "Alerting Effectiveness and Coverage") of (the current version of) the study 2 report.  Finally, active collision assessment promises increased telemetry data, as mentioned earlier.

Costs: The benefits come at a cost.  First, the fact that applications would be sending application-layer data that, in many cases, was bound for private entities presents a potential security and privacy risk that is only fully known by the affected third parties.  In cases where TLS is used, the user is almost certainly presented with certificate errors.  And while those errors alone might be considered "alerts", I believe they raise the wrong sort of flags to the users and the organizations with which they are associated.  Attempting to make a sort-of wildcard certificate is infeasible, based on my knowledge of certificates.  This is detailed more in section 3.6.2 ("User Experience") of (the current version of) the study 2 report.  However, even in the case that valid certificates *can* be made, that is itself a security concern.  To me, this exemplifies the systemic harm mentioned in a recent discussion group meeting, wherein users lose trust in the DNS because it was violated by deceitful behavior.

In summary, aside from the added telemetry data, the proposed benefits are marginal, in my professional opinion, and the costs greatly outweigh the benefits.

REJECT ALL.
-------------------
On Feb 1, I discussed a mechanism that is sort of a mix of controlled interruption and active collision assessment.  The idea is that names resolve with a wildcard A/AAAA to a public IP address, just as with active collision assessment.  However, there is *no* attempt to intercept communications.  All communications are rejected with TCP RST or ICMP port unreachable.
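
To illustrate what an end system would observe under reject-all, here is a minimal Python sketch.  It assumes the rejecting host simply has no listener on the port, so its TCP stack answers the SYN with a RST.  The function name `probe` and its return labels are hypothetical and not part of any proposal.

```python
import socket

def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify how a TCP connection attempt to (host, port) ends."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"   # the three-way handshake completed
    except ConnectionRefusedError:
        return "rejected"       # the peer answered the SYN with a TCP RST
    except socket.timeout:
        return "timeout"        # no answer at all (e.g., silently dropped)
    except OSError:
        return "error"          # unreachable network, bad address, etc.
```

Under reject-all, a colliding name would be expected to yield "rejected" for every TCP port, with the refusal coming from a public address rather than from the local machine as with 127.0.53.53.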

While there is no avenue to inform the end user or end system of a name collision problem by sending custom application-layer content, there is another avenue for the user to get root cause analysis: reverse DNS.  With controlled interruption, the special IP address 127.0.53.53 was used for (at least) two purposes: 1) to keep transport-layer traffic local to the machine; and 2) to serve as a special/unique identifier that could be "looked up" using a web search.  Unfortunately, analysis showed that root cause identification scored low for controlled interruption.  Many of those who were affected did not even notice the special address.  However, using a public IP address allows the reverse DNS to be looked up; this is a fairly common identity lookup for system administrators.

Moreover, the name returned by the reverse lookup could itself be that of a Web server that hosts content associated with name collisions, so an end user or system administrator could go directly there.
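
The reverse lookup a system administrator might perform can be sketched in Python as follows.  This is only an illustration: the helper names `reverse_name` and `explain` and the returned strings are hypothetical, and any real deployment would publish its own address and PTR record.

```python
import ipaddress
import socket

# Address used by controlled interruption in the 2012 round.
CI_ADDRESS = ipaddress.ip_address("127.0.53.53")

def reverse_name(addr: str) -> str:
    """Return the PTR query name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(addr).reverse_pointer

def explain(addr: str) -> str:
    """Describe what an address returned for a colliding name points to."""
    if ipaddress.ip_address(addr) == CI_ADDRESS:
        return "controlled-interruption"
    try:
        # Performs a live reverse (PTR) lookup via the system resolver.
        hostname, _aliases, _addrs = socket.gethostbyaddr(addr)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return f"no PTR record found for {reverse_name(addr)}"
    return f"{addr} reverse-resolves to {hostname}"
```

For example, `reverse_name("192.0.2.53")` gives the query name `53.2.0.192.in-addr.arpa`; 192.0.2.53 is a documentation address used here only as a placeholder.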

Benefits:
 - User experience is (theoretically) no worse than controlled interruption, which is the baseline.
 - It does not have the privacy concerns associated with transmission of application-layer data.
 - It brings the same telemetry benefits as active collision assessment.
 - It allows system administrators to use reverse DNS to identify root cause.

Costs:
 - It exposes transport-layer communications from end users and systems.

In summary, I think that the reject-all proposal addresses some of the deficiencies of the alerting mechanisms from the 2012 round (i.e., controlled interruption exclusively) at a much lower cost of security/privacy/user experience than active collision assessment.

CONTROLLED INTERRUPTION
-------------------
Even with the reject-all proposal, in which servers do not attempt to accept or respond to transport- or application-layer communications, some may still be concerned with the fact that transport-layer communications are exposed, along with the IP address and port from which they originated.  Traditional controlled interruption is still the baseline and definitely the most conservative and the most tested approach.

SUMMARY
-------------------
I oppose active collision assessment, as currently defined.  There are too many unknowns associated with security, privacy, and user experience concerns.

Controlled interruption could be a feasible alerting mechanism again, so long as it is combined with something like passive collision assessment, which allows analysis prior to more invasive alerting techniques.

The reject-all proposal addresses some of the deficiencies of controlled interruption with lower costs and fewer unknowns than active collision assessment.  Its risks are somewhere in between those of controlled interruption and active collision assessment.  I could probably support this proposal, but I'm interested in the feedback of the group.

Thanks,
Casey

[1] https://docs.google.com/document/d/13SQnZt1HHeD9i1cSds-kj16mxRQgxp6hpb2K1kLqB1U/edit?usp=sharing
[2] https://mm.icann.org/pipermail/ncap-discuss/2022-December/001050.html