[NCAP-Discuss] Root Cause Analysis Reports - Final Call for Comments

Casey Deccio casey at deccio.net
Wed Aug 10 15:18:24 UTC 2022


Matt,

We might not come to agreement on this, particularly not in this public forum.

> On Aug 10, 2022, at 5:42 AM, Thomas, Matthew <mthomas at verisign.com> wrote:
> 
>> Section 7.3.1
>>  
> >> The criteria are explained more fully above (see 1 and 2), but in short, it is 100% of a *sample* of queries, and it is compared to previous work on the subject.
>  
> This only raises more questions about the reliability, accuracy, and potential bias of using sampled query data. My concerns remain even with your additional explanatory text.  There is no data-driven foundation as to why 5 queries and 100% adherence are appropriate.  This is a fundamental premise that is currently unmotivated and not properly supported. Stating this as an approximate lower-bound measurement is fine, but it needs to be treated as such, and it doesn’t support making definitive statements like your conclusion.

You keep mentioning bias, but until you explain to me how *this* sample is biased, I fail to see it.  Using a sample is a completely acceptable approach.  I've already explained the reasoning for 100% minimized queries within the sample.  With regard to the minimum of 5 non-root queries, it is a judgment call, the intent of which is to provide enough query behavior that it can be tested for qname minimization, without unnecessarily ruling out clients that might otherwise qualify.  A negative match in this case rules out resolvers with 80% or less minimization, which is below the 95% mentioned in the de Vries paper.
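
To make that concrete, here is a minimal illustrative sketch of the classification step (Python; not the report's actual code).  It assumes each source IP already has its non-root queries flagged as minimized-looking or not, per the de Vries passive criteria:

    # Illustrative only -- not the report's actual code.
    MIN_NON_ROOT_QUERIES = 5  # the judgment call discussed above

    def classify_source(minimized_flags):
        # minimized_flags: one boolean per non-root query from this source IP,
        # True if the query matched the minimized-query criteria.
        if len(minimized_flags) < MIN_NON_ROOT_QUERIES:
            return "insufficient data"
        return "minimizing" if all(minimized_flags) else "non-minimizing"

    # With a 5-query sample and a 100% requirement, any source with even one
    # non-minimized query shows an observed rate of at most 4/5 = 80% and is
    # excluded from the "minimizing" set -- that is the 80% figure above.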

>  
> >>I'm not exactly sure what you're saying.  The de Vries paper covers many facets of qname minimization, and they are very careful to distinguish which parts can be applied and compared elsewhere--and in how they apply them.  I've summarized some of those points in my introductory text of this email.  And again: 1) we used the same methodology for determining minimized queries in passive analysis as they did, which was based on the findings from their active analysis; 2) we applied the analysis of minimized queries to resolvers using metrics also from their paper; and 3) our technique is a heuristic.
>  
> >>The selection criteria ignore more selective QNM criteria defined in the RFC, such as the QTYPE (e.g., A and NS), and exclude multiple implementations of QNM techniques (e.g., nonce second-level labels, underscore labels, asterisk labels, etc.).
>  
> Yes – I’m saying profiling known QNM implementations for ground truth needs to be done (again – the de Vries paper was years ago). Furthermore, the de Vries knowledge still doesn’t make it into your selection criteria. Why not test for the QTYPE being NS or A?  Also, QNM in 2018/2019 was not a standards-track RFC.  Things have changed.  Industry implementations have changed. The RFC is now standards track. Applying a 2018/2019 heuristic without profiling current standards is ignoring the state of the art.

There is a lot that *could* be done.  But please, let's not miss the forest for the trees.  You are making some wonderful assertions about what needs to be done.  Those might well be true, but 1) we are working with prior work, peer reviewed and published, and 2) for the purposes of *this* report, it is sufficient.  We are looking at *trends* with a *sample* of data.  Testing your hypotheses about the way things *currently* operate would be great, but it sounds like another project.

> Another point is that this analysis did not apply its measurements to “resolvers”; it applied them to any IP that queried the RSS during DITL.  There is no reason to believe that some IP sending 5 queries to the RSS is a recursive resolver.  This is a big distinction and again goes to why the thresholds in #1 are not motivated.

Yes, identifying the role of IP addresses querying the root is currently an unsolved problem, as far as I know.  So we do the best we can with the data we have; there is precedent for this.  And in this case, we are looking at trends of querying IP addresses, so it doesn't really matter whether a given address is a resolver or not -- because we are matching it to its name collision query behavior from other parts of the report.

>   And to the last point of it being a heuristic: sure, it is a heuristic, but without proper motivation and reasoning, the results are just a measurement of an unmotivated heuristic and need to be treated as such.

Sorry, I don't understand why you keep saying "unmotivated".  The point of a heuristic is that we are approximating based on data we have because we don't have exact numbers.

>  
> >>See introductory text, parts 3A and 3B in particular.
> 
> This has nothing to do with 3A or 3B. The state of the art has moved significantly since 2018/2019 and de Vries with regard to QNM.  I’d encourage you to reexamine the current data from 2022 (actually from 2019 on) at DITL to better understand QNM deployment.  I’m looking at A-root data now and I can clearly see a sizeable percentage of QNM traffic that you are not capturing because of the items I mentioned previously. I can’t stress this point enough.

With all due respect, "I'm looking at A-root data now" does not help me.  This is in part because my analysis targets 2016 through 2021, but mostly the earlier years.  This is also in part because I have no idea what methodology you are using.  While you suggest that the de Vries study is out of date (and it might be), the rigor with which they studied resolver behavior was very impressive, using multiple vantage points and data sets, passive and active analysis, etc.  This is academic rigor.

But again, the whole point of the analysis is to get a sample of resolvers that do not exhibit qname minimization behaviors, with which we can re-run the name collisions analysis to look for trends.  Some amount of false positives and false negatives is okay.

>  
> >>Sorry, I'm not sure what you are getting at here.  If you are referring to the longitudinal measurement plot, it was intended to show the % of ASNs over time with at least one qname minimizing resolver, as a deployment trend.  Nothing more, nothing less.
>  
> The point is that measuring QNM at an ASN or even IP level longitudinally within the context of NCAP makes no sense. Having one IP in a large ASN that might be doing QNM tells us nothing material about name collisions.

This had nothing to do with name collisions.  It was part of the longitudinal analysis of qname minimization, which was simply part of the context.
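
For what it's worth, the metric behind that plot is simple: for each DITL year, the percentage of ASNs containing at least one source IP classified as qname-minimizing.  A rough illustrative sketch (again, not the report's code; the ip_to_asn lookup is assumed, not shown):

    from collections import defaultdict

    def asn_deployment_trend(classified_by_year, ip_to_asn):
        # classified_by_year: {year: {source_ip: "minimizing" | "non-minimizing"}}
        trend = {}
        for year, verdicts in classified_by_year.items():
            asn_has_min = defaultdict(bool)
            for ip, verdict in verdicts.items():
                asn = ip_to_asn(ip)
                asn_has_min[asn] = asn_has_min[asn] or (verdict == "minimizing")
            trend[year] = 100.0 * sum(asn_has_min.values()) / len(asn_has_min) if asn_has_min else 0.0
        return trend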

> >>Perhaps - but that is not within the scope of this report.  Some of that discussion appears in the "Fourteen Years..." paper.
>  
> Can you please share this paper b/c I can’t find it? 

It will be available publicly soon.  I will send you a link privately.

> >> This is, of course, unrelated to the qname minimization analysis.  But it is an interesting hypothesis that could be tested.
>  
> It is absolutely within scope if you are claiming a finding in the Root Cause Analysis Report that name collisions decrease, based on your measurements using this passive DNS data.

As I said before, it is unrelated to the qname minimization analysis, which is the topic of this thread.  It's a very interesting idea.

> There are existing measurements of the difference between positive and negative referral query rates; not sure why this would need to be tested, given what we already know about resolver treatment of NXD responses and the differences in positive/negative TTLs.

I suppose because measurement, or at least applying previous work to current data, is what we do?

> 
> >>With regard to the large public resolver, I'm sorry, but this is very vague.  Without any documentation and/or empirical analysis to go on, I have nothing with which to re-assess or improve my analysis.
>  
> See: https://blog.apnic.net/2022/06/02/more-mysterious-dns-root-query-traffic-from-a-large-cloud-dns-operator/  This was also disclosed, I believe, at DNS-OARC events years ago.


Thanks.

> >>Please remember that 1) we are interested in *trends* over time and 2) we only need samples--not complete data--to get those trends.  The samples are taken from the resolvers identified as non-qname-minimizing in 2021.  The process and the ultimate sample sizes are well documented in the text and the table.
>  
> Trends require a consistent measurement of the underlying study over time.  You have not motivated or shown why an IP should be treated as equivalent or consistent for measurement over time.  Nor have you shown that a 13-query sample is representative.  Saying this sampling of data is representative is not supportable without additional evidence.  The same needs to be said about the measurements taken from the passive recursive data used for the root cause analysis report: there is no data or proof showing that this subset of queries to the root is representative and unbiased, given that they were collected by a biased subset of operators who were willing to deploy those data collection probes.

I'm not really even sure how to respond to this.  This is just the way measurement, sampling, and statistics work.


I completely stand by the qname minimization methodology used and its application in the name collisions analysis.  I also agree that there might be other ways of doing it, but that doesn't make what I have done and written *wrong*.  Finally, please remember the *point* of this exercise, which is not perfection but approximation.

Casey