[rssac-caucus] FOR REVIEW: Harmonizing the Anonymization of Queries to the Root

Wessels, Duane dwessels at verisign.com
Fri Mar 2 00:56:55 UTC 2018


Brian,

Yeah, I wouldn't mind seeing hashing as one of the options (for completeness if nothing else). 

I don't really know if efficiency will be a deciding factor.  They might all be efficient enough.  

As far as I can see, hashing's main advantage is irreversibility.  One of the proposed methods (mixing full addresses with truncation) is irreversible, but only for IPv4.  None of the techniques offers IPv6 irreversibility.

Of course hashing gets you collisions, so that's the tradeoff.
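A minimal sketch of the salted-hash option Brian describes below, in Python. It assumes HMAC-SHA-256 keyed with a secret shared among the participating operators plus the UTC date as the rotating component. The secret, the function names, and the 64-bit truncation are illustrative choices, not part of any proposal. The secret matters: if the salt were built only from public values such as the date or the root ZSK, anyone could hash all 2^32 IPv4 addresses and reverse the mapping.

```python
import hashlib
import hmac
import ipaddress
from datetime import datetime, timezone

# Hypothetical secret known only to the participating root server operators.
SHARED_SECRET = b"example-shared-secret"


def daily_salt(when: datetime) -> bytes:
    """Rotating salt: the shared secret combined with the UTC date (changes daily)."""
    return SHARED_SECRET + when.astimezone(timezone.utc).strftime("%Y-%m-%d").encode()


def hash_address(addr: str, when: datetime, out_bytes: int = 8) -> str:
    """Keyed one-way hash of an IPv4 or IPv6 address, truncated to out_bytes."""
    packed = ipaddress.ip_address(addr).packed
    digest = hmac.new(daily_salt(when), packed, hashlib.sha256).digest()
    # Truncation is where collisions come from: fewer output bits, more collisions.
    return digest[:out_bytes].hex()
```

Two servers hashing the same source address on the same UTC day get the same value, so their data sets can be joined; the value changes when the date rolls over.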

DW



> On Mar 1, 2018, at 11:20 AM, Brian Dickson <brian.peter.dickson at gmail.com> wrote:
> 
> Hi, all, and sorry for the top-post reply...
> 
> Most (or all) of the proposals use encryption as the anonymization technique.
> 
> I wonder if the goals might be better achieved with some kind of crypto one-way hash instead.
> 
> Selecting a common hash, and common salt or salting method, allows queries which are hashed with the same salt, at different root servers, to be matched up, which I think is a useful property.
> 
> Examples of a rotating salt (reducing the long-term identification problem by increasing the amount of work) might include:
> - daily UTC (or hourly, or whatever)
> - current (root) ZSK (rotates every N days, IIRC)
> - others?
> 
> One or more (identical) salts plus the original address would give the same result regardless of which root server received the query. Rotating the salts means more work to identify a given IP based on a hash, while still allowing trivial combining of multiple sources into larger data sets within a given interval.
> 
> Thoughts?
> Brian
> 
> This would be efficient (hashes are reasonably cheap to compute), and isn't directly reversible.
> 
> On Wed, Feb 21, 2018 at 2:53 PM, Paul Hoffman <paul.hoffman at icann.org> wrote:
> On Feb 13, 2018, at 2:59 PM, Wessels, Duane via rssac-caucus <rssac-caucus at icann.org> wrote:
> > In addition I would really like to see some kind of summary (table perhaps) that presents the following for the various techniques:
> >
> > - advantages / disadvantages
> 
> I don't think that is possible to do in a clean fashion. The advantages/disadvantages change radically if you are a:
> - RSO
> - Researcher
> - Person who wants your IP address completely anonymized
> 
> > - cryptographic strength (I realize this could be difficult since not all are well-studied at this point).
> 
> You also have to define what you mean by "cryptographic strength". If you mean "how much effort would I need to find the random key so I can de-anonymize the rest of the dataset", 3.1 (mixing with truncation) would require 2^128 operations, 3.2 (Cryptopan) would require 2^128 operations unless the RSO used shortcuts to keep certain CIDR classes together, and 3.3 (ipcrypt) should take 2^128 if there are no attacks on the cipher.
> 
> > - efficiency (i.e. CPU time to anonymize some amount of (DITL) data).
> 
> That's also difficult to measure given that no one has spent time optimizing the implementations. Please remember that you only run the mixing function when the mapping is not already in the table, and for the vast majority of lookups it will already be there, unless you are under a DDoS that is using randomized source addresses. Also, if you are about to change the key and start another run, you can pre-fill the table from the previous table and reduce the processing time even further.
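The table-based approach Paul describes can be sketched as a memoizing table in front of whatever anonymization function is used. This is an illustration under stated assumptions: `example_mix` is a hypothetical HMAC-based stand-in for the real mixing function (3.1, 3.2, or 3.3), and "pre-filling" is read here as computing new-key mappings for every address seen in the previous run before the new run starts.

```python
import hashlib
import hmac
import ipaddress
from typing import Callable, Dict, Iterable


def example_mix(key: bytes, addr: str) -> str:
    """Hypothetical stand-in for the real mixing function."""
    packed = ipaddress.ip_address(addr).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()[:16]


class MappingTable:
    """Memoizes address -> anonymized value so the mixing function runs once per address."""

    def __init__(self, key: bytes, mix: Callable[[bytes, str], str] = example_mix):
        self.key = key
        self.mix = mix
        self.table: Dict[str, str] = {}
        self.mix_calls = 0  # how often the (possibly expensive) mixing function ran

    def anonymize(self, addr: str) -> str:
        value = self.table.get(addr)
        if value is None:  # only a table miss pays for the mixing function
            value = self.mix(self.key, addr)
            self.mix_calls += 1
            self.table[addr] = value
        return value

    def prefill(self, addresses: Iterable[str]) -> None:
        """Before a run with a new key, precompute mappings for previously seen addresses."""
        for addr in addresses:
            self.anonymize(addr)
```

For example, `MappingTable(b"new-key").prefill(old.table.keys())` warms the table for the next key period. A DDoS with randomized source addresses is the case where almost every lookup misses.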
> 
> > - whether or not "decryption with the same key" is a property of the technique
> 
> That is only a property of 3.3 (ipcrypt)
> 
> > - known implementations
> 
> For 3.1, the implementation is trivial. For 3.2, there are links to the implementations we know about (although they are not well documented). For 3.3, the implementation is given in the reference.
> 
> > Also I would like to better understand if the different techniques have any different cryptographic properties when there is at least one known true -> anonymized mapping.  I think we should assume it is trivial for a consumer of the anonymized data to inject beacon queries that would enable them to know the anonymized value of a specific source IP.
> 
> For 3.1, there is no linkage between any mappings: that's inherent in AES. For 3.2, there is a linkage if the mapping is in the same prefix as the address in question. In 3.3, if there is no known problem with the algorithm, there is no linkage between any mappings.
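To illustrate the prefix linkage Paul mentions for 3.2, here is a toy prefix-preserving scheme following the Crypto-PAn construction: each output bit is the input bit XORed with a pseudorandom function of the preceding input bits. It substitutes HMAC-SHA-256 for the AES-based function Crypto-PAn actually uses, so it is not compatible with existing Crypto-PAn implementations. The key and function names are hypothetical. Two addresses sharing an n-bit prefix map to outputs sharing exactly an n-bit prefix. As a result, one known beacon mapping reveals the anonymized prefix of every other address in the same network.

```python
import hashlib
import hmac
import ipaddress


def prf_bit(key: bytes, prefix_bits: str) -> int:
    """One pseudorandom bit derived from the input bits seen so far (HMAC stand-in for AES)."""
    return hmac.new(key, prefix_bits.encode(), hashlib.sha256).digest()[0] & 1


def prefix_preserving(key: bytes, addr: str) -> str:
    """Output bit i = input bit i XOR PRF(first i input bits)."""
    ip = ipaddress.ip_address(addr)
    bits = format(int(ip), f"0{ip.max_prefixlen}b")
    out = "".join(str(int(b) ^ prf_bit(key, bits[:i])) for i, b in enumerate(bits))
    return str(type(ip)(int(out, 2)))  # keep the same address family


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two addresses of the same family have in common."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    return x.max_prefixlen - (int(x) ^ int(y)).bit_length()
```

For instance, 192.0.2.10 and 192.0.2.200 share a /24, and so do their anonymized forms under any key.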
> 
> --Paul Hoffman
> _______________________________________________
> rssac-caucus mailing list
> rssac-caucus at icann.org
> https://mm.icann.org/mailman/listinfo/rssac-caucus
> 
> 



