[rssac-caucus] [Ext] Re: FOR REVIEW: Harmonizing the Anonymization of Queries to the Root

John Bond john.bond at icann.org
Fri Feb 23 11:18:02 UTC 2018


> On 22 Feb 2018, at 19:09, Geoff Huston <gih at apnic.net> wrote:
> 
> 
> 
>> On 22 Feb 2018, at 10:51 pm, John Bond <john.bond at icann.org> wrote:
>> 
>> Hi Paul, 
>> 
>> Thanks for the response
>> 
>>> 
>>>>> 3.3 ipcrypt
>>>> The one-to-one mapping also means it is susceptible to a known-plaintext attack. How severe that is remains unknown, but the lack of prefix preservation would likely make any attack harder [than Cryptopan attacks].
>>> 
>>> A known-plaintext attack returns the key used, or allows the attacker some other way of de-anonymizing other addresses. That is not possible in the methods other than Cryptopan. However, if I can inject a query using a known source address to a particular root using an identifiable QNAME, I can find the result in the anonymized PCAP. What is important is that an attacker cannot use this to then determine the random key that was used.
>> I don't believe that someone needs to send a specifically crafted DNS query to reveal the true addresses. I suspect many researchers, some of whom are on this list, can already identify popular resolvers by looking at their DNS traffic signatures. Further, I believe that comparing an anonymised DITL with one from a previous year that had not been anonymised would allow one to start correlating traffic patterns. Statistical frequency analysis would likely reverse mappings as well. These attacks fundamentally rely on statistics and pattern correlation; therefore, as the dataset grows, it becomes easier to reverse the anonymisation.
>> 
>> We should also consider the attack you suggest, where a user can poison the dataset by injecting unique QNAME queries that identify individual users. I believe this is very similar to how Geoff's ad network research works, and also how many ad networks work. So at the very least Geoff and Facebook will be able to reverse a lot of the anonymised addresses; rotating the salt would make this attack much harder. For example, the negative TTL in the root is 86400; if we rotated the salt every 5 minutes [and we have a perfect world] then the aforementioned attack would only be able to reverse ~0.3% of a user's* traffic.
> 
> 
> And many ad networks can generate the attribution data (which IP addresses use which DNS resolvers). If that's the aim then it seems to me that this measure will do little to obfuscate that relationship. If the aim is to prevent working back to which user is making which query, then roll on QNAME Minimisation!
My understanding is that the aim of this work is to make the data anonymous. I am saying that if the mapping is one-to-one and stays the same across time, then there are ways, albeit not entirely trivial ones, that would allow an attacker to unmask that mapping.
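
As a rough illustration of the salt-rotation idea, here is a minimal Python sketch. It is not ipcrypt or Cryptopan: it swaps in a truncated HMAC-SHA256 purely to show the rotation, so unlike ipcrypt it is not strictly one-to-one (collisions are possible). The names `master_key`, `salt_for`, `anonymize` and `exposed_fraction` are hypothetical, and the negative TTL is assumed to be one day.

```python
import hashlib
import hmac
import ipaddress

ROTATION_SECONDS = 300        # salt lifetime: 5 minutes, as suggested in this thread
NEGATIVE_TTL_SECONDS = 86400  # assumption: a negative-caching TTL of one day


def salt_for(master_key: bytes, timestamp: int) -> bytes:
    """Derive a per-window salt from a (hypothetical) secret master key."""
    window = timestamp // ROTATION_SECONDS
    return hmac.new(master_key, str(window).encode(), hashlib.sha256).digest()


def anonymize(addr: str, master_key: bytes, timestamp: int) -> str:
    """Map an IPv4 address to a pseudonym that is stable only within one window."""
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(salt_for(master_key, timestamp), packed, hashlib.sha256).digest()
    # Truncate to 32 bits so the pseudonym still looks like an IPv4 address.
    return str(ipaddress.IPv4Address(digest[:4]))


def exposed_fraction() -> float:
    """Best-case share of a day's traffic linkable from one injected query."""
    return ROTATION_SECONDS / NEGATIVE_TTL_SECONDS


if __name__ == "__main__":
    key = b"example-key-not-for-production"
    print(anonymize("192.0.2.1", key, 0))    # same pseudonym for t in [0, 300)
    print(anonymize("192.0.2.1", key, 300))  # next window: different pseudonym
    print(f"{exposed_fraction():.2%}")
```

Under these assumptions a single injected query links an address to its pseudonym for one five-minute window only, i.e. 300 / 86400 of a day, which is where the rough ~0.3% figure comes from. With a fixed salt the same pseudonym would persist across the whole capture.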

> 
> 
>> 
>>>> 
>>>>> 4 ASN and recommendation 3
>>>> I'm strongly opposed to this, as it would make de-anonymising the information and the known-plaintext attacks mentioned above much simpler to execute.
>>> 
>>> Are you suggesting that we remove the recommendation (which Geoff Huston made) or simply make it clear that it is optional?
>> I personally think it should be removed. At the very least this would allow a researcher to reverse the IP addresses of most/all ISP and public resolver infrastructure.
> 
> AS number information is helpful in many cases in understanding the root cause of anomalous query patterns. If you want to effectively stop all forms of such analysis then that's OK, but at some point the inherent value of these logs - in providing a tool to observe and comprehend DNS behaviours - gets lost in a fog of progressive obfuscation of this log data.
I completely agree that at some point the value is lost; I'm sure that to a researcher the raw un-anonymised logs are always going to be the ideal. This document should try to find a balance between providing useful information and maintaining privacy, specifically with things like GDPR in mind. The inclusion of this recommendation is IMO on the wrong side of that equation. In relation to GDPR it seems to me to constitute an 'indirect identifier' and as such should be discouraged.





