[rssac-caucus] Handing the anonymization document off to RSSAC

Tue Apr 10 11:11:52 UTC 2018

On 09/04/2018 21:42, Paul Vixie wrote:

> anonymizing at /48 for v6 and /24 for v4 isn't enough. even the
> least capable data scientist, using data that's less than a millionth
> of the other data in google's or facebook's or cambridge analytica's
>  possession, can _trivially_ deanonymize that.

Couldn't that same data scientist also reverse anything that maintains a
1:1 relationship between input and output?

At least truncating the data does ensure that some portion of the input
data is intentionally destroyed. I think there's a balance to be struck
between the (to us, desirable) property that this is prefix-preserving
and the added difficulty of de-anonymization that comes from the N:1
mapping it creates.
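To make the N:1 point concrete, here is a minimal Python sketch of truncation at the /24 (IPv4) and /48 (IPv6) boundaries under discussion, using the standard-library ipaddress module. The function name `truncate` and the sample addresses (taken from the documentation ranges) are hypothetical illustrations, not the procedure specified in the document.

```python
import ipaddress

def truncate(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Keep only the first v4_prefix / v6_prefix bits of addr; zero the rest.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False accepts an address with host bits set and discards them.
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)

# Sample sources from the documentation ranges (RFC 5737 / RFC 3849).
sources = ["192.0.2.1", "192.0.2.200", "198.51.100.7",
           "2001:db8:1234:5678::1", "2001:db8:1234:abcd::2"]

truncated = [truncate(a) for a in sources]

# The retained prefix bits are unchanged, so the result is prefix-preserving,
# but five distinct inputs collapse to three outputs: the mapping is N:1 and
# the discarded host bits cannot be recovered from the output alone.
print(len(set(sources)), len(set(truncated)))  # prints "5 3"
```

By contrast, any 1:1 pseudonymization scheme would still yield five distinct outputs for these five sources, which is exactly the linkability that truncation deliberately gives up.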

If there are arguments to be made against prefix truncation then they
should be properly documented *in the paper*.

> please re-think this. you're making decisions about third party 
> safety

You appear to be shifting the goalposts. The document doesn't mention
safety.

The entire documented rationale for the RSSAC study, and therefore for
anonymization, seems to be this single sentence in the Introduction:

> Some operators are uncomfortable sharing IP addresses of the query 
> sources and some are even legally prevented from doing so.

GDPR seems to be the main driver for this right now. I'm (currently)
satisfied that pseudonymization of IP addresses by truncation meets any
obligations we might have there.

Ray