[NCAP-Discuss] Workflow methods

Thu Nov 30 22:48:04 UTC 2023

> On Nov 30, 2023, at 3:50 AM, Thomas, Matthew <mthomas at verisign.com> wrote:
> 
> Thanks for bringing this discussion to the group.  You have some good points in here that need to be teased out in the text of the document.  Risk, as you mention, is a bit vague in our denotation; however, I would offer that as you laid out in the email, risk is better described in the multiple dimensions of user/system disruption and data disclosure _combined_.

I understand that, and I agree that everything needs to be considered holistically.  What I meant about separating these two things (user/system disruption and data disclosure) is that if we know what amount of data disclosure and collection is acceptable and all other things (user/system disruption) are effectively the same, then it seems to me that we should enable the disclosure of and collect *that* level of data all the time, for the sake of data completeness and consistency.  Internet data analysis is already hard enough with "complete" data; purposefully making it less complete makes it even harder to analyze.

However, the group consensus might be that data collection depends on the TLD string and might vary from string to string.  That is what we need to find out.

> In terms of data collection, I don’t believe the DG was saying just collect “counts” by method (I’m not even sure what you mean by method here). 

By "method (3)", I was referring to reject-all, specifically this sentence from the "Sections 1 - 3" doc, section 3.5.3 ("reject all") [1]:

"This method supports the requirements associated with data privacy as no data is collected beyond a record that a connection was made."

It could just be that I'm interpreting it wrong, and additional clarity needs to be added.  I am happy to update this, but I didn't want to make edits on what might be considered a controversial point, without understanding the consensus of the group.

In any case, for the time being I've made a comment in the doc.

> I’m pretty sure the DG was suggesting the client IP, port, etc.

Of course, that would be extremely useful :)

>  
> Could you offer the group what fields in each of the four would be?  I’ll offer a strawman below:
>  
> HINFO – This is DNS data at the authoritative name server. The most useful fields are likely to be the full Qname, timestamp, the source IP of query, protocol.  Maybe some other fields might be useful but those are likely the “big” ones. This would be recorded for every single query.

Yes, if you add query type, then these are the basic fields we would want.  That said, the data might be most effectively collected with something that has some precedent, like full pcap (used with DITL) or BIND logs.  Of course, those would include more information than the "essentials", but that is okay by me.
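
As a rough illustration of the basic fields listed above (full Qname, timestamp, source IP, protocol, plus query type), a per-query record could be sketched as follows.  The class name, field names, and the JSON-lines output are assumptions made for this example only, not anything the group has agreed on; pcap or BIND logs would carry the same information and more.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DnsQueryRecord:
    """One record per query seen at the authoritative name server.

    Field names are hypothetical; they mirror the fields discussed in
    this thread rather than any agreed schema.
    """
    timestamp: float  # seconds since the Unix epoch
    qname: str        # full query name as received
    qtype: str        # query type, e.g. "A" or "AAAA"
    src_ip: str       # source IP of the query (typically a recursive resolver)
    protocol: str     # transport, e.g. "udp" or "tcp"


def to_log_line(record: DnsQueryRecord) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    example = DnsQueryRecord(
        timestamp=1701384484.0,
        qname="host.example.",
        qtype="A",
        src_ip="192.0.2.1",  # documentation address range
        protocol="udp",
    )
    print(to_log_line(example))
```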

> CI – Same data fields as #1 above.

Yes.

> Reject All – Now you have two data sets.  You would still have the DNS data from #1/2 above as well as the connection attempts on the honeypot server. That second server I would expect to log very similar data – timestamp, protocol, port, source IP. But those connections are from the impacted end system, not recursive resolvers.

Yes, this sounds right to me.  See also my note about collection methods above.
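
To make the second data set concrete, here is a minimal sketch of what a connection-attempt record on the honeypot server might hold, using the fields from the strawman (timestamp, protocol, port, source IP).  The names `ConnectionAttempt` and `attempts_by_port` are hypothetical and exist only for this example; the per-port summary is one possible view of the data, not a replacement for keeping the individual records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConnectionAttempt:
    """One connection attempt logged by the reject-all server.

    Unlike the DNS data, src_ip here is the impacted end system,
    not a recursive resolver. Field names are hypothetical.
    """
    timestamp: float  # seconds since the Unix epoch
    protocol: str     # transport, e.g. "tcp" or "udp"
    dst_port: int     # port the end system tried to reach
    src_ip: str       # address of the connecting end system


def attempts_by_port(attempts: Iterable[ConnectionAttempt]) -> Counter:
    """Count attempts per (protocol, destination port) pair."""
    return Counter((a.protocol, a.dst_port) for a in attempts)


if __name__ == "__main__":
    log = [
        ConnectionAttempt(1701384484.0, "tcp", 443, "198.51.100.7"),
        ConnectionAttempt(1701384485.5, "tcp", 443, "198.51.100.8"),
        ConnectionAttempt(1701384486.0, "tcp", 445, "198.51.100.7"),
    ]
    for (proto, port), count in sorted(attempts_by_port(log).items()):
        print(proto, port, count)
```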

> Application layer – I think here you would collect the DNS data again from #1/2 as well as the same #3. I think us getting overly prescriptive on exactly what or if application data is logged here is not needed.  Since we have long said each name collision is its own snowflake, the TRT should have the flexibility to pivot/adapt here as needed.

I think what you are saying is to collect the same data as #3 (which includes DNS data from 1 and 2, of course).  If so, I think that's right.

(To be clear, I still do not find this an acceptable option at all because it allows application-layer data to be sent, but I'm nonetheless stating what the data collection should look like, should this be a thing.)

>  
> Let’s please try to continue this thread and have an active and productive discussion on the mailing list so that this information can get incorporated into the document.

Thanks,
Casey

[1] https://docs.google.com/document/d/13SQnZt1HHeD9i1cSds-kj16mxRQgxp6hpb2K1kLqB1U/edit?usp=sharing