[NCAP-Discuss] VI/VIN concerns (was: Re: Remaining Public Comments to Discuss)

Tue Mar 12 16:46:38 UTC 2024

> On Mar 4, 2024, at 9:40 PM, Michael Puckett <michael.puckett at icann.org> wrote:
> 
> I’m starting a thread for topics from public comments that need to be discussed by the DG.
> ...
> Legal/privacy concerns for VI/VIN (Brief <https://itp.cdn.icann.org/public-comment/proceeding/Draft%20NCAP%20Study%202%20Report%20and%20Responses%20to%20Questions%20Regarding%20Name%20Collisions-19-01-2024/submissions/ICANN%20org/Visible%20Interruption%20(VI)%20and%20Visible%20Interruption%20and%20Notification%20(VIN)%20-%20Privacy%20and%20data%20protection%20review-26-02-2024.pdf> from ICANN Legal):

I have re-read the legal brief from ICANN.  Here are a few thoughts (note: I am not a lawyer).

> "While VI/VIN can be useful tools for preventing conflicts, they also pose privacy and data protection
> related risks that should be considered....
> "ICANN’s or the third parties’ interests might be outweighed considering the significant risks and negative impacts on privacy and data protection"

I mentioned on the call last week -- and I reiterate -- that the document does very little to weigh utility or benefit against risks and harm.  Much of that is documented in the Study 2 documents, but let me do a quick summary.

I will try to be very objective about this, but I would also like to reiterate what I feel are the benefits of doing this, as well as the risks of not doing it.

When DNS queries that expect a negative response from public DNS servers are inadvertently leaked, the risk is that they are answered by a third party that does not respect the querier's privacy and might even have malicious intent.  That third party could be any entity in control of a DNS server to which those queries are directed, which might be a TLD operator or any entity to which namespace has been delegated -- directly or indirectly.

Thus, the utility and benefits of any mechanisms are about reducing this risk of third-party surveillance, disruption, or theft.  This is what the "privacy and data protection related risks" should be weighed against.

How do we reduce the exposure of affected parties to third parties?  Well, we started with the notification mechanism known as controlled interruption (CI) -- which is mentioned in the document.  Here is my brief summary of benefits and shortcomings:

Since 2014, Controlled Interruption (CI) has provided a mechanism for indirect notification, via communication interruption and application disruption.  As shown in the root cause analysis, there is evidence that the CI IP address has been found -- in many cases.  However, the telemetry associated with CI is extremely limited.  We only know where the CI IP address has been found based on: 1) Web searches (i.e., online forums; see Section 5 of the root cause analysis report); 2) self-reporting through ICANN's form (see Section 4 of the root cause analysis report); and 3) results from a survey distributed to system administrators (see Section 9 of the root cause analysis report).  However, #1 and #2 were limited to those that actually discovered the controlled interruption IP address, #2 was limited to those that found the ICANN report submission page and reported, despite the very high bar ("reasonable belief that the name collision presents a clear and present danger to human life"), and #3 was a relatively small sample.  In short, we know that we have a limited view, but we don't really know what we don't know.  We might infer that it's "working" based on the downward trend of collision-related queries at the root, but, as has been pointed out extensively, that inference is complicated because the query data collected at the root servers is a mixed bag -- it offers some insights, but there are growing concerns about its representativeness.

Summary: Based on limited data, we might suspect that CI has been "successful" and that any harm that occurred was associated with CI itself, as opposed to third-party surveillance.  However, there really has been limited data to confirm this -- to know what's really going on and what potential harm might still remain -- both for users of TLDs delegated since 2014 and users of future TLDs.

No Interruption (NI) and Visible Interruption (VI) are attempts to get more data, so we can more effectively assess who is at risk and even initiate outreach efforts.  The former addresses caching and other issues that come with analyzing only root server query data, which is what we had with CI.  But it still reveals only the IP address of the DNS resolver.  The latter makes client IP addresses and ports available, to help identify organizations and even users potentially at risk, with greater fidelity.  With both, additional data is leaked to the public Internet.  But that is in the name of minimizing potential risk of harm to users and organizations leaking queries associated with colliding namespaces.

Visible Interruption and Notification (VIN) provides the same data as VI but also involves: 1) the exchange of application-layer data that might include HTTP request data, such as the requested path, query string, or request body (which might include username/password data, session information, or other sensitive data), browser version, and OS version; and 2) the return of a "message" that might (or might not) be seen by an actual human, to more directly notify them that they are experiencing name collisions.
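
To make the difference in disclosure concrete, here is a rough sketch of what each method exposes, following the descriptions above.  It is illustrative only: the field names are hypothetical shorthand and are not taken from the Study 2 report, and it assumes that VI also reveals the resolver IP address, as NI does.

```python
# Illustrative sketch only. Field names are hypothetical shorthand.
# Assumption: VI reveals everything NI does, plus client IP and port.

NI_FIELDS = frozenset({"resolver_ip"})

# VI adds network- and transport-layer data about the client.
VI_FIELDS = NI_FIELDS | {"client_ip", "client_port"}

# VIN adds application-layer data on top of what VI reveals.
VIN_ONLY_FIELDS = frozenset({
    "http_path",
    "http_query_string",
    "http_request_body",
    "browser_version",
    "os_version",
})
VIN_FIELDS = VI_FIELDS | VIN_ONLY_FIELDS

DISCLOSED = {"NI": NI_FIELDS, "VI": VI_FIELDS, "VIN": VIN_FIELDS}


def extra_disclosure(more, less):
    """Return the fields disclosed by method `more` but not by method `less`."""
    return DISCLOSED[more] - DISCLOSED[less]


if __name__ == "__main__":
    print(sorted(extra_disclosure("VI", "NI")))
    print(sorted(extra_disclosure("VIN", "VI")))
```

Under these assumptions, everything VIN adds over VI is application-layer data, which is where usernames, passwords, and session information could appear.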

The question asked by ICANN Legal is whether the benefits of doing VI (and VIN) outweigh the risks.  What I have written here is about the benefits of doing it -- something that was almost completely left out of the document.

> "Furthermore, the lack of transparency in data processing,

This is an unfair and possibly untrue statement.  I think that the proposals in the Study 2 report have provided as much transparency as they could.  Even if the exact methodologies have not been spelled out, the objectives of those methodologies have been made pretty clear, from my perspective.

> the absence of data minimization practices,

To be fair, the discussion group laid all the methods out in order of data minimization: NI, CI, VI, VIN.  Between VI and VIN, VI *is* the data minimization.

>  and the risk of exposing sensitive data of data subjects without legal basis make VI/VIN a concern regarding compliance with data protection laws."

My second comment on the call last week was the point that while VI and VIN both involve the "risk of exposing sensitive data", the data disclosed is very different, in my opinion.  That is, what is disclosed at the application layer with VIN is potentially much more sensitive than that disclosed at the network and transport layers with VI.  Referring back to my previous comment about data minimization, these are two very different things and need to be considered differently, not simply lumped together.

> "However, due to the unpredictable nature of the data collected and the potentially high residual risks
> associated with data collection and further processing, implementing these privacy safeguards appears
> to be challenging, if not impossible...."
> 
> "As it will be difficult or even impossible to implement suitable mitigation measures this could have legal and reputational
> consequences for the entities conducting VI/VIN"

ICANN surely cannot be the first to implement data privacy techniques.  Calling this "challenging, if not impossible" sounds overly harsh and negative.

> 
> "Overall, the benefits of conducting VI/VIN may be overridden by the interests or fundamental rights and
> freedoms of the data subjects if the processing of personal data for this purpose involves significant risks
> or negative impacts on their privacy and data protection"

This is true, but again, the benefits need to be considered.

> 
> "Implementing the other safeguards to limit the amount of personal data processed would require knowing in advance what data would be collected when conducting VI/VIN, which is impossible."

Section 3.5 is pretty clear about what data would be disclosed and stored.  I don't understand this statement at all.

> "When a TLD is undergoing VI/VIN, the DNS queries from any system that attempts to resolve a domain
> name under that TLD are ultimately received by the TLD’s name servers. This means that information
> about the domains being queried is processed and this information may include personal data. If the
> entity conducting the assessment is not transparent about their data collection practices, this could be
> seen as invasive and raise concerns about data privacy."

This is the same as pre-CI, CI, and NI.  So I'm not sure what this paragraph adds.

This is my 2 cents.

Casey