[NCAP-Discuss] Name collisions dataset comparisons discussion

Wed Apr 19 18:44:39 UTC 2023

On 19 Apr 2023, at 14:29, Jeff Schmidt wrote:

> Thanks, Jim!

We’re having fun now.  Comments inline.

>> I believe there are two places to consider manipulation
>> resistance: one is with the data itself and one is with
>> how the data is used.

> For the purposes of this exercise, I’m considering the datasets’ inherent resistance to manipulation, namely, how difficult it is for a bad actor to intentionally cause false/manipulative data to be present in the dataset. Based on the collection methodologies, some datasets inherently resist manipulation more than others.

> Agree that it will also be up to the analysts to sort out manipulation, but having reliable datasets is essential for reliable analysis! Taking the extreme position for illustration, if the only data one is looking at is manipulated, how will they know what is real and what isn’t? Garbage in, garbage out.

>> I still do not understand how any of the data collection itself is resistant to manipulation.

> Take the extreme cases of DITL and Farsight (most resistant to manipulation) and authoritative DNS (least resistant).

> How would an attacker reliably insert bad data into Farsight or DITL? In the case of DITL, the collection only takes place once a year and on a confidential date. In the case of Farsight, their sensor locations are not published, so it’s impossible to know where to send manipulative queries. Thus, the datasets inherently resist manipulation. Not perfectly, of course, but they resist. Also, there is a lengthy, lengthy history on these datasets, so an analyst can make apples-to-apples comparisons as a hint to detect manipulation.

I see your point with respect to Farsight. They have an advantage by using obfuscation about their input sources. In my opinion, and certainly opinions may vary, I would not regard this as “high” resistance. For me, to get a rating of “high” I would want to see an actual tangible tool or method in place. It’s nice to say “your secret is safe with me” but it’s an entirely different matter to see that stand the “test of time”.

Similarly, with respect to DITL, the resistance is grounded in the fact that the date of collection is obfuscated. In addition to the same argument as above, I would add that a malefactor could maintain a sustained load of bad data long enough to overlap the unannounced collection window, and thereby manipulate the DITL data collection. The success of this method is certainly open for debate.

So, for me at least, perhaps I could agree with a “medium” resistance, but I want to think about it some more. Nonetheless, I cannot agree with “high”.

> The other extreme, authoritative DNS following delegation, is trivially manipulated. Bad actors can watch for delegation and send queries to that server whenever they please. Moreover, DNS (being UDP) is source spoofable, so they may (depending, depending) be able to impersonate any range of sources they choose. Finally, there will be no history so the analyst won’t have any sense of what “normal” looks like. This is a lot for an analyst charged with figuring out what’s going on to deal with!
>
> I think Warren showed how trivial it was to manipulate “top n” standings on the ICANN root (hardly a small root constellation) using only a very small number of cloudy VMs. That same exercise would be very difficult (not impossible, but difficult) to apply to say Farsight or DITL.
>
>> What mitigates manipulation, which I believe we are
>> trying to say in the report, is the comparison of the data
>> collected to historical data, i.e., analysis of the trends
>> that can be discerned and considered during the name
>> collision assessment.

Agree with all of the above.

> Availability of history is a strong help to an analyst attempting to ferret out manipulation, but it is not the only factor.

I agree that trend analysis is important and the best tool we have at this point in time, apart from the TRT that will of course be conducting the assessment. The TRT is the best tool of all and has quite a responsibility.

> Does that help?

It helps me, thanks!

Jim

> Thanks,
> Jeff