[RSSAC Caucus] 48 HOUR LAST CALL: UPDATED RSSAC Advisory on Metrics for the DNS Root Servers and Root Server System

Tue Feb 18 22:29:47 UTC 2020

> On 19 Feb 2020, at 5:57 am, Fred Baker <fred at isc.org> wrote:
> 
> 
> 
>> On Feb 13, 2020, at 5:25 PM, Geoff Huston <gih at apnic.net> wrote:
>> 
>> I could imagine a test of sending 10 (or some other not too small, not too large number) back-to-back queries to a root server and checking that all queries receive a response. A highly loaded server instance would not necessarily provide all 10 responses, while a server instance operating within its designed query load parameters would provide all the responses.
> 
> I'll echo Paul and Duane's comments here. On this one, I have a question of statistical validity. RFC 6928 recommends a TCP initial window of ten because that is a number that can be reasonably expected to traverse the open Internet if initiated as a burst. Matt has gone so far as to tell me that his measurements suggest that some TCP Offload Engines appear to successfully send bursts of 60K bytes, or about 40 segments, back to back. So I wonder whether we would learn anything from a ten-packet burst: do we need a 100-packet burst, or something else, and why would we need that?
> 
> In any event, I think we would need something resembling suggested text, and some evidence that the measurement tests a case that eluded the existing tests. That's not "push back" as much as "what do we learn if we add this one?"

So, to take "what do we learn if we add this one" first: what we learn is the extent to which individual service instances are "coping" with the query load imposed on them.
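A minimal sketch of the probe being discussed, in Python, might look like the following. It assumes a literal IPv4 or IPv6 server address, UDP to port 53, a query for the root NS RRset with recursion not desired, and a single overall receive timeout. The names `build_query` and `burst_probe`, the default train of 10 queries and the 2-second timeout are hypothetical illustrative choices, not part of any agreed RSSAC metric. On its own, a count below `n` cannot distinguish load shedding at the server from loss in the network.

```python
import random
import socket
import struct
import time


def build_query(qid: int, qname: str = ".", qtype: int = 2) -> bytes:
    """Build a minimal DNS query (one question, class IN). qtype 2 = NS."""
    # Header: ID, flags = 0 (standard query, RD=0), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    # Encode the name as length-prefixed labels; the root is a single zero byte.
    labels = [label for label in qname.strip(".").split(".") if label]
    name = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + name + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def burst_probe(server: str, n: int = 10, timeout: float = 2.0, port: int = 53) -> int:
    """Send n queries back to back over UDP; return how many were answered.

    Hypothetical helper: assumes `server` is a literal IP address.
    """
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    # Distinct random query IDs so each response can be matched to its query.
    ids = set(random.SystemRandom().sample(range(65536), n))
    sock = socket.socket(family, socket.SOCK_DGRAM)
    try:
        # A connected UDP socket only delivers datagrams from this peer.
        sock.connect((server, port))
        for qid in ids:
            sock.send(build_query(qid))
        answered = set()
        deadline = time.monotonic() + timeout
        while answered != ids:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break
            if len(data) >= 12:
                rid, flags = struct.unpack("!HH", data[:4])
                # Count only responses (QR bit set) carrying one of our IDs.
                if flags & 0x8000 and rid in ids:
                    answered.add(rid)
        return len(answered)
    finally:
        sock.close()
```

For example, `burst_probe("198.41.0.4", n=10)` would send a train of ten queries to the IPv4 address of a.root-servers.net. Repeating the run many times, and at several train lengths, is what would make it possible to reason statistically about any missing responses.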

The other questions are more about the details of the measurement. If a train of UDP packets is injected into the network, what is the anticipated success rate of transmission through the network? Does a train of 10 packets have a higher probability of all its packets getting through than a train of 100? And so on. The ‘signal’ this measurement would be looking for is missing responses, and the inference would be that an overloaded UDP service sheds load by discarding incoming queries.
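To make the 10 versus 100 question concrete, here is a back-of-the-envelope model in Python. It rests on an assumption, not a measured property of any path: every packet, query or response, is lost independently with the same probability `p`. Under that assumption a query/response exchange succeeds with probability (1 - p)^2, and all N exchanges in a train succeed with probability (1 - p)^(2N). Real loss is often bursty, which is exactly the concern with back-to-back trains, so this only illustrates how the numbers scale with train length. The function names are hypothetical.

```python
from math import comb


def exchange_success(p: float) -> float:
    """P(one query/response pair survives), assuming independent per-packet loss p."""
    return (1.0 - p) ** 2


def prob_all_answered(p: float, n: int) -> float:
    """P(all n exchanges in a train succeed) from background loss alone."""
    return exchange_success(p) ** n


def prob_at_least_missing(p: float, n: int, k: int) -> float:
    """P(at least k of n responses missing) with no load shedding (binomial tail)."""
    q = 1.0 - exchange_success(p)  # per-exchange failure probability
    return sum(comb(n, i) * q**i * (1.0 - q) ** (n - i) for i in range(k, n + 1))
```

With p = 0.01, for instance, `prob_all_answered(0.01, 10)` is about 0.82 while `prob_all_answered(0.01, 100)` is about 0.13, so a 100-query train would show missing responses in most runs from path loss alone. `prob_at_least_missing` gives the chance of seeing at least k missing responses without any load shedding, which is one way to judge whether an observed count points at the server rather than the network.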

It's late in the process to raise this for this particular incarnation of the metrics document. If we think that updating the RSS metrics is an ongoing effort, then another response may well be to keep this in mind for the next round of document revisions.

Geoff