[NCAP-Discuss] Name collisions dataset comparisons discussion

Jeff Schmidt jschmidt at jasadvisors.com
Wed Apr 19 18:29:58 UTC 2023


Thanks, Jim!

> I believe there are two places to consider manipulation
> resistance: one is with the data itself and one is with
> how the data is used.

For the purposes of this exercise, I'm considering the datasets' inherent resistance to manipulation, namely, how difficult it is for a bad actor to intentionally cause false or manipulative data to be present in the dataset. Based on the collection methodologies, some datasets inherently resist manipulation more than others.

Agreed that it will also be up to the analysts to sort out manipulation, but having reliable datasets is essential for reliable analysis! Taking the extreme position for illustration: if the only data one is looking at is manipulated, how can one know what is real and what isn't? Garbage in, garbage out.

> I still do not understand how any of the data collection itself is resistant to manipulation.

Take the extreme cases of DITL and Farsight (most resistant to manipulation) and authoritative DNS (least resistant). How would an attacker reliably insert bad data into Farsight or DITL? In the case of DITL, the collection only takes place once a year, on a confidential date. In the case of Farsight, the sensor locations are not published, so it's impossible to know where to send manipulative queries. Thus, the datasets inherently resist manipulation. Not perfect, of course, but resistant. There is also a lengthy history behind these datasets, so an analyst can make apples-to-apples comparisons as a hint for detecting manipulation.
The other extreme, authoritative DNS following delegation, is trivially manipulated. Bad actors can watch for the delegation and send queries to that server whenever they please. Moreover, DNS (being UDP) is source-spoofable, so they may (depending on the network path) be able to impersonate any range of sources they choose. Finally, there will be no history, so the analyst won't have any sense of what "normal" looks like. That is a lot to deal with for an analyst charged with figuring out what's going on!
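To make the spoofing point concrete, here is a minimal sketch (mine, not from the dataset reports; the query name is hypothetical) of a DNS query in RFC 1035 wire format. Note that nothing in the message identifies the sender; the "source" an authoritative server logs comes entirely from the IP header, which an attacker sending over UDP can forge.

```python
import struct

def build_dns_query(qname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # Question: length-prefixed labels, root terminator, QTYPE, QCLASS (IN=1)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack(">HH", qtype, 1)

msg = build_dns_query("example.internal")
# The message itself carries no sender identity at all: an attacker
# can send this over UDP with any forged source address, and the
# authoritative server's logs will attribute the query to that address.
```

The point of the sketch is that the query payload is trivial to generate at scale, and attribution rests solely on an unauthenticated IP header.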

I think Warren showed how trivial it was to manipulate "top n" standings on the ICANN root (hardly a small root constellation) using only a very small number of cloud VMs. The same exercise would be very difficult (not impossible, but difficult) to apply to, say, Farsight or DITL.
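A toy simulation of that effect (all names and counts hypothetical, not Warren's actual experiment): a few attacker VMs each sending a modest flood are enough to vault a previously unseen string into the top of a query-volume ranking.

```python
from collections import Counter

# Organic query volume per candidate string (hypothetical numbers).
organic = Counter({"corp": 500, "home": 400, "mail": 300, "lan": 250})

# A handful of attacker VMs, each sending a modest flood for one name.
attacker_vms = 3
queries_per_vm = 200
manipulated = organic.copy()
manipulated["evilname"] += attacker_vms * queries_per_vm  # 600 queries

top_before = [name for name, _ in organic.most_common(2)]
top_after = [name for name, _ in manipulated.most_common(2)]
# "evilname" now tops the ranking despite having zero organic traffic.
```

The asymmetry Jeff describes falls out directly: a ranking built from an open, always-on, known collection point can be moved with trivial resources, whereas the attacker cannot target sensors whose locations and collection windows are unknown.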

> What mitigates manipulation, which I believe we are
> trying to say in the report, is the comparison of the data
> collected to historical data, i.e., analysis of the trends
> that can be discerned and considered during the name
> collision assessment.

Availability of history is a strong aid to an analyst attempting to ferret out manipulation, but it is not the only factor.
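As one illustration of how an analyst might use that history (a sketch of my own, with hypothetical numbers; real analysis would be more sophisticated): compare each name's current query count against its historical mean and flag large deviations.

```python
from statistics import mean, stdev

def flag_anomalies(history, current, threshold=3.0):
    """Flag names whose current query count deviates from their
    historical mean by more than `threshold` standard deviations.
    `history` maps name -> list of past per-period counts."""
    flagged = []
    for name, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma == 0:
            sigma = 1.0  # avoid division by zero on flat history
        z = (current.get(name, 0) - mu) / sigma
        if abs(z) > threshold:
            flagged.append(name)
    return flagged

history = {
    "corp": [480, 510, 495, 505, 490],
    "home": [400, 390, 410, 405, 395],
}
current = {"corp": 500, "home": 4000}  # "home" suddenly spikes
```

With no history at all, as in the new-delegation case above, there is no baseline to compute and this kind of check is simply unavailable.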

Does that help?
Thanks,
Jeff
