[rssac-caucus] Handing the anonymization document off to RSSAC
Paul Vixie
paul at redbarn.org
Tue Apr 10 16:40:12 UTC 2018
Ray Bellis wrote:
> On 09/04/2018 21:42, Paul Vixie wrote:
>
>> anonymizing at /48 for v6 and /24 for v4 isn't enough. even the
>> least capable data scientist, using data that's less than a millionth
>> of the other data in google's or facebook's or cambridge analytica's
>> possession, can _trivially_ deanonymize that.
>
> Couldn't that same data scientist also reverse anything that maintains a
> 1:1 relationship between input and output?
this isn't 1:1, so i don't understand your question.
> You appear to be shifting the goal posts. The document doesn't mention
> safety.
i apologize-- i was not viewing your comments in the context of the
document.
> The entire documented rationale for the entire RSSAC study and therefore
> anonymization seems to be this single sentence in the Introduction:
>
>> Some operators are uncomfortable sharing IP addresses of the query
>> sources and some are even legally prevented from doing so.
>
> GDPR seems to be the main driver for this right now. I'm (currently)
> satisfied that pseudonymization of IP addresses by truncation satisfies
> any obligations we might have there.
i think there's a crypto-lite proposal that preserves identity of
endpoints but destroys their associativity. if true, this is likely to
do a better job of assuaging GDPR and similar concerns.
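
as a rough illustration of what "preserves identity but destroys associativity" could look like: the proposal is not specified in this thread, so the sketch below substitutes a plain keyed hash (HMAC-SHA256). the function name `pseudonymize`, the 8-byte token length, and the key handling are all assumptions made for illustration, not details of any actual proposal.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize(addr: str, key: bytes) -> str:
    """Map an IP address to a stable opaque token (hypothetical helper).

    The same address always yields the same token under the same key,
    so repeat queries from one endpoint remain linkable. Neighbouring
    addresses yield unrelated tokens, so the network they share is hidden.
    """
    ip = ipaddress.ip_address(addr)
    digest = hmac.new(key, ip.packed, hashlib.sha256).digest()
    # Truncating to 8 bytes is an arbitrary choice for readability.
    return digest[:8].hex()


key = b"operator-secret-key"  # assumption: kept private by the operator
a1 = pseudonymize("192.0.2.17", key)
a2 = pseudonymize("192.0.2.17", key)
b = pseudonymize("192.0.2.18", key)
print(a1 == a2)  # same endpoint, same token
print(a1 == b)   # adjacent endpoint, different token
```

the mapping is only as private as the key: anyone holding it can enumerate the IPv4 address space and rebuild the whole table.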
some of my sensor operators just always report 0.0.0.0 for the source
address. the document ought to mention this as an extreme example of
privacy preservation. and for my dayjob's purposes, this doesn't matter.
but for most forms of dns research, it's important to know that the same
endpoint re-asked the same question, or asked several questions. in that
sense, prefix masking discards more information than the crypto-lite
proposal i saw. this means there is a tradeoff between the
deanonymization risk of exposing that a dns transaction and some non-dns
activity came from the same network, vs. knowing that two or more dns
transactions came from the same endpoint.
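
a minimal sketch of the prefix-masking side of that tradeoff, using the /24 and /48 lengths quoted above. the function name `truncate` is hypothetical; the addresses are documentation ranges.

```python
import ipaddress


def truncate(addr: str, v4_bits: int = 24, v6_bits: int = 48) -> str:
    """Zero the host bits of an address (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    bits = v4_bits if ip.version == 4 else v6_bits
    # strict=False lets ip_network accept an address with host bits set.
    net = ipaddress.ip_network(f"{ip}/{bits}", strict=False)
    return str(net.network_address)


# Two distinct endpoints in one /24 become indistinguishable, so
# "the same endpoint asked twice" can no longer be told apart from
# "two neighbours each asked once".
first = truncate("192.0.2.17")
second = truncate("192.0.2.200")
print(first, second, first == second)

# The surviving prefix still names the network, which is what can be
# joined against non-dns data about that network.
print(truncate("2001:db8:1:2::5"))
```

reporting 0.0.0.0 for every source is the limiting case of the same operation, with a prefix length of zero.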
the context of my remarks is the giant pendulum of history, which swung
too far in the direction of "let google and facebook run giant vacuums
and hope they and their customers behave ethically, even though there is
no transparency", and is now swinging in the direction of "if the user
or operator doesn't believe the collection is in their best interests,
and if they have not given verifiable and revocable permission for it
to be collected, and if the laws of the land don't support it, then
assume it's bad and prohibit it blanket-wise."
the right answer is somewhere in the middle of those pendulum swings.
here in rssac-caucus we can afford to consider what's right rather than
only what's practical or inevitable. i hope we take that opportunity.
--
P Vixie