[NCAP-Discuss] Thoughts on proposed name collision frameworks

Casey Deccio casey at deccio.net
Wed Mar 8 01:06:12 UTC 2023


Apologies in advance for the length...


> On Mar 6, 2023, at 10:05 AM, Jeff Schmidt <jschmidt at jasadvisors.com> wrote:
> 
> For the record, and the editors’ benefit, I agree with what Casey has nicely summarized below.
>  
> A few comments:
>  
> 1). Although I disagree with the term “passive collision assessment” (it involves a delegation and associated changes in behavior; it is certainly not “passive”), I agree that this mechanism provides some benefits to observe collisions. This is particularly true given the potential reduction of visibility in DITL.

Fair enough.  I used the term because it's what seems to have been adopted.  Perhaps "Trial TLD delegation to empty zone" (TTDEZ) is more accurate.

> HOWEVER I have 2 concerns about this approach:
>  
> a). “Passive” collision assessment is more subject to gaming than DITL. “Passive” collision assessment (the delegation) would occur at a known time, likely after the applicants are known, and will be trivially poisoned by gaming (whether in reality or the mere specter of gaming), which will call any and all data collected via this means into question. While it is theoretically possible to game all data sources, DITL having occurred in the past is generally more resistant. Not perfect, but more resistant. Gaming is a material risk to ICANN’s program which – even accusations or the “specter” of gaming – could cause quagmire and controversy. Any analysis will have to look at multiple datasets.

I agree that this data set is subject to gaming.  However, I have no reason to believe that the problem is unique to this data set, or even that this data set is more susceptible to it than others.  Where there is economic incentive, a way will be found.

> b). Given the energy in 2012 about root zone changes, we need an Opinion from the IANA on impact to their processes and potentially RSAC about the impact to root servers if we were to *double* the number of changes/delegations associated with a TLD. They may not be concerned, but it would be irresponsible of this group to recommend something without asking!

I think this is a fine consideration.

>  
> Again, those are my concerns. I could get behind “passive” collision assessment if those issues are addressed. I understand the hoped-for benefits.

>  
> 2). For all the reasons Casey mentioned below, and all the reasons previously discussed ad nauseam, I too strongly oppose honeypots (marketed as “active collision assessment”) as I think it is being proposed.
>  
> The “reject all” honeypot approach is an improvement over a traditional “interactive” honeypot, but I remain unconvinced that the improvements are worth the material increase in cost and risk over the well proven CI approach. Juice not worth the squeeze.

I appreciate the thoughts on this.  And while I'm not suggesting that "reject-all" is the right solution (it is merely proposed for discussion), let me tease a little more detail out of the phrases "worth the material increase in cost and risk over the well proven CI approach" and "Juice not worth the squeeze".

CI is definitely the most conservative approach in terms of disclosure of name collision-like activity from end systems.  No transport- or application-layer communications leave the end system.
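
To make that concrete, here's a rough sketch (Python, purely illustrative; the colliding name "internal.example" is made up) of what CI looks like from an affected end system: the name resolves to the marker address, and anything the client does next stays on loopback.

import socket

def check_for_controlled_interruption(name: str) -> bool:
    """Resolve a name and report whether it returns the CI marker address."""
    try:
        addr = socket.gethostbyname(name)
    except socket.gaierror:
        return False  # NXDOMAIN or other resolution failure: no CI signal
    # 127.0.53.53 is loopback, so any follow-up transport- or
    # application-layer attempt never leaves the end system.
    return addr == "127.0.53.53"

if __name__ == "__main__":
    print(check_for_controlled_interruption("internal.example"))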

Nonetheless, there are some additional questions for consideration.

1) Telemetry.  Is it sufficient to rely on quantitative-only metrics, reports from a severely filtered ICANN reporting form, and third-party posts to user forums to know "how well" it is really going?  The truth is that currently (and for the past 10+ years) we have only known *some* of what's going on.  The survey responses (few though they might be) showed that more than half of respondents didn't notice 127.0.53.53.  So it's possible that all the evidence of 127.0.53.53 we have observed represents fewer than half of the occurrences.  That's a problem for me.

But back to your question about cost-benefit analysis or the juice being worth the squeeze.  The benefit (or "juice") of telemetry is that we can actually be (more) informed about who is impacted, how much they are impacted, and what action might be taken--for the currently impacted third parties or future third parties that might be impacted.  We can do this either by inference, through quantitative measures, or directly, through due diligence reach-out.  And it could be performed by contracted individuals (e.g., a TRT) or by at-large researchers.  However, we cannot do either of these without data.  We have gotten by with the current approach, but as far as I'm concerned, we have only buried our heads in the sand.

There are two other proposals in place -- one for "active collision assessment" (aka "interactive honeypot") and one for "reject-all".  The telemetry-related benefit of both (independent of any other characteristic) is that they allow transport- and (in the case of active collision assessment) application-layer communications to take place between affected clients and some honeypot server.  This increases the telemetry, such that additional data is available for due diligence, analysis, reach-out, etc.
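
As a rough illustration of what the transport-layer-only variant could collect (my own sketch, not text from either proposal; the bind address and port are arbitrary), a "reject-all" style listener might record nothing more than who connected and when, then close:

import socket
import time

def run_reject_all(bind_addr: str = "0.0.0.0", port: int = 8080) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((bind_addr, port))
        srv.listen(128)
        while True:
            conn, (src_ip, src_port) = srv.accept()
            # Telemetry: source address, source port, timestamp -- no
            # application-layer bytes are read or written.
            print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                  f"connection from {src_ip}:{src_port} to tcp/{port}")
            conn.close()

if __name__ == "__main__":
    run_reject_all()

The point is only that the collected record is network/transport-layer metadata; whether even that is acceptable is exactly the data management question below.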

There is also a third method to increase the telemetry data available, even if controlled interruption (127.0.53.53) is still used in lieu of any other: require the registry to log queries during the controlled interruption period and make them available to ICANN (or whoever the analysis team is) for analysis and reach-out.  This is another compromise between "traditional" controlled interruption and the less conservative proposals.
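
As a sketch of what that would give the analysis team (the log layout here -- "timestamp src_ip qname qtype" per line -- is my assumption, not a defined format):

from collections import Counter

def summarize_query_log(path: str, top_n: int = 20) -> None:
    per_source = Counter()
    per_name = Counter()
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip malformed lines
            _ts, src_ip, qname, _qtype = fields[:4]
            per_source[src_ip] += 1
            per_name[qname.lower()] += 1
    print("Top query sources:", per_source.most_common(top_n))
    print("Top queried names:", per_name.most_common(top_n))

if __name__ == "__main__":
    summarize_query_log("ci-period-queries.log")  # hypothetical file name

Nothing exotic -- just enough aggregation to know who to reach out to and what names they are leaking.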

Finally, I'll add that using something like FarSight/DomainTools DNSDB and/or NXDOMAIN feed is invaluable to supplement what little data we have -- even if that data set is not expanded by the use of additional mechanisms.  But the problems are primarily that 1) it tells us the *what* but not the *who*; 2) its coverage is not ubiquitous; and 3) it is a third-party dependency.  The first is probably the biggest limitation, but #2 and #3 should be noted nonetheless.  In other words, I view this as "better than nothing" but not "as good as we might have it".


2) Data management.  This is perhaps the most significant question of cost (or "squeeze") associated with the added telemetry.  I am not a data management expert (that is, with regard to laws and regulations).  However, there *are* data management experts, and I think that if the group feels that the telemetry data is valuable enough (see previous item), then it becomes a matter of 1) what data is being transmitted and 2) how that data is being stored/managed/protected.

As I have expressed before, I am strongly opposed to passing application-layer data (e.g., "active collision assessment"); I believe that it is a threat to both security and privacy to attempt application-layer communications with end systems -- not to mention user experience.  However, I suspect that network- and transport-layer information is less problematic and would give any analysis team so much more to go on than what they have now for understanding the problem space and pursuing active reach-out.  I can't imagine that this group is the first that has faced this, and I don't think it is an insurmountable problem.  For example, OARC maintains the DITL data by restricting the machines that it resides on, the individuals that can access it (i.e., members only), and more.  I know that the data is not the same, but I'm just saying that there is some precedent.  Finally, DNS query collection should be the most innocuous and, in fact, is the minimum needed -- not just for "passive collision assessment" but for any disruptive mechanism (controlled interruption, reject-all).  It surprises me that this has not been done thus far.
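
As one example of the kind of minimization a data management expert might recommend (again my sketch, not a vetted policy; the key is a placeholder): truncate source addresses to a /24 and keyed-hash the full address before storage, so events can still be grouped by network without the raw address sitting in the clear.

import hashlib
import hmac
import ipaddress

SECRET_KEY = b"rotate-me-per-collection"  # hypothetical secret, stored separately

def pseudonymize_ipv4(addr: str) -> str:
    ip = ipaddress.IPv4Address(addr)
    network = ipaddress.IPv4Network(f"{ip}/24", strict=False)
    tag = hmac.new(SECRET_KEY, str(ip).encode(), hashlib.sha256).hexdigest()[:8]
    return f"{network.network_address}/24#{tag}"

if __name__ == "__main__":
    print(pseudonymize_ipv4("192.0.2.123"))  # -> "192.0.2.0/24#<tag>"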


> Furthermore, as noted in my “comparison” grid, a reject all honeypot would discard a decade of awareness/content in the 127.0.53.53 signaling which has shown to be an effective way to communicate with administrators through logs. The notification effectiveness of any honeypot (I’ll call them “interactive” and “reject all”) would be materially less than CI for that reason.

Again, I suggest care be used with regard to blanket statements such as "has shown to be an effective way to communicate with administrators through logs" or "The notification effectiveness of any honeypot ... would be materially less than CI for that reason."  There is definitely data showing that it is being found (in addition to other reports, see sections 4, 5, and 9 of the root cause analysis [1]) and that problems with name collisions are being corrected (see section 8).  But there is also data showing that it has not been seen or that its meaning was not clear (see section 9).  I feel that filling in the blanks for future applications/delegations is important, so we're not patting ourselves on the back with such a limited understanding of what is actually happening.
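
For what it's worth, "communicating through logs" presumes that someone (or something) is actually looking.  The check itself is trivial -- something like the following, where the log paths are placeholders -- which makes it all the more telling that so many respondents never noticed the marker.

import glob

MARKER = "127.0.53.53"

def scan_logs(pattern: str = "/var/log/*.log") -> None:
    for path in glob.glob(pattern):
        try:
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if MARKER in line:
                        print(f"{path}:{lineno}: {line.rstrip()}")
        except OSError:
            continue  # unreadable file; skip it

if __name__ == "__main__":
    scan_logs()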

Also, while 127.0.53.53 does now have a decade of experience behind it, what does that really mean?

1. Does it mean that a new signaling method would now be confusing, or that someone might rule out name collisions if they see an address other than 127.0.53.53?  Let me suggest that an individual or org that has experienced name collisions (and fixed the problem) will likely not experience them again (though it is possible).  And for a first-timer, does it really matter what the signal used to be?

2. While 127.0.53.53 has a decade of experience, sysadmins have been using other tools a lot longer and in other situations, including reverse DNS.  nslookup (or dig -x) can easily be used to issue a reverse lookup on an IP address -- something that could not be done with 127.0.53.53.  If these tools returned a value of:

there-is-a-problem-with-your-dns.please-visit.name-collisions.icann.org.

for a given IP address, does this give an administrator additional hints?  Is it somehow discounted because it's not 127.0.53.53?  Also, it is very easy to translate variants of that domain name to an IP address, such that an administrator *could* visit a Web server at that IP address for more information. That is:

*.foo => 192.0.2.1
192.0.2.1 => there-is-a-problem-with-your-dns.please-visit.name-collisions.icann.org.
there-is-a-problem-with-your-dns.please-visit.name-collisions.icann.org. => 192.0.2.2
please-visit.name-collisions.icann.org. => 192.0.2.2
name-collisions.icann.org => 192.0.2.2

(Please note that this is not the same as active collision assessment!)
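
(For the curious, the lookups above need nothing more than the standard library; the addresses and the signal name are taken from the illustration, not from any real deployment.)

import socket

def inspect_collision_address(addr: str) -> None:
    try:
        ptr_name, _aliases, _addrs = socket.gethostbyaddr(addr)
    except (socket.herror, socket.gaierror):
        print(f"No PTR record for {addr}")
        return
    print(f"{addr} reverses to {ptr_name}")
    try:
        # The signal name itself resolves, so an administrator could visit
        # the Web server there for more information.
        info_addr = socket.gethostbyname("name-collisions.icann.org")
        print(f"name-collisions.icann.org resolves to {info_addr}")
    except socket.gaierror:
        print("informational name did not resolve")

if __name__ == "__main__":
    inspect_collision_address("192.0.2.1")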

We don't know the answer to this, and we have no experience with it.  However, what we do know is:

- Finding 127.0.53.53 has not been universal, and, according to the survey, fewer than half are finding it.
- Reverse DNS lookup (via nslookup or similar tools) predates 127.0.53.53 and is a more general technique for identifying the name associated with an IP address.
- In the case of reject-all, the user experience is expected to be nearly the same.

However, as mentioned above, additional data (in the form of network- and transport-layer information) would be exposed, and there is a data management issue to be resolved.  I don't think the answer is immediate, but I don't think we're the first ones with that problem.  It just depends on how much we desire the telemetry data.  I think it is of value to the parties potentially affected by name collisions, those to whom ICANN contracts for the purposes of due diligence, and the research community at large.

Casey