[RSSAC Caucus] RSS Metrics work party -- purpose of metrics

Fred Baker fredbakersba at gmail.com
Mon Jul 22 08:45:53 UTC 2019


On Jul 21, 2019, at 10:52 PM, fujiwara at jprs.co.jp wrote:
> I don't oppose metrics designed to measure service levels first.
> However, service level should not depend on vantage point locations.

Speaking strictly for myself, and quite happy to be considered wrong if that's the consensus.

I agree. That said, I think a fair bit of the work party discussion has been about possible metrics that are "interesting" from the viewpoint of a PhD candidate writing a dissertation, or of someone trying to answer a question that is not fundamental, rather than metrics with operational utility. What I would like to focus on first is operational utility. PhD candidates can measure anything they want at any time, using tools such as RIPE Atlas; I'm not sure the same is true of anything business-related.

Where I get a little crazy is discussion of "service level agreements" when none have been defined or are actually being requested. If we're going to discuss that at all, I would personally expect that we first learn what we can accurately measure, decide whether that has any utility in characterizing the service, and only then talk about service expectations and thresholds.

In terms of the document at hand, to my mind we want to do the following (a rough sketch of the first two appears after the list):
  - ensure that we are in fact delivering IANA-signed resource records
  - ensure that we are in fact delivering the latest available set of them
  - measure the timeliness of delivery within the context the RSO controls
  - measure RSO availability.
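
As a minimal sketch of the first two, assuming Python with the dnspython package and picking a.root-servers.net arbitrarily: ask for the SOA with the DO bit set, confirm an RRSIG covering it comes back, and report the serial. Comparing that serial against the serial in the zone IANA publishes answers "latest available set"; actually validating the signature chain against the IANA trust anchor is more work than shown here.

import dns.message
import dns.query
import dns.rdatatype

A_ROOT = "198.41.0.4"  # a.root-servers.net; any letter would do

query = dns.message.make_query(".", dns.rdatatype.SOA, want_dnssec=True)
response = dns.query.udp(query, A_ROOT, timeout=3)

soa_serial = None
signed = False
for rrset in response.answer:
    if rrset.rdtype == dns.rdatatype.SOA:
        soa_serial = rrset[0].serial
    # An RRSIG covering the SOA shows signed data came back; this does
    # not by itself prove the signature validates against the trust anchor.
    if rrset.rdtype == dns.rdatatype.RRSIG and rrset.covers == dns.rdatatype.SOA:
        signed = True
print(f"serial={soa_serial} rrsig_present={signed}")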

To my mind, the first two can be measured anywhere, and may as well be measured at or near the root server. The third and fourth are useful in characterizing an RSO in the context of a vantage point (e.g., the servers that are reachable via anycast from that vantage point), and they characterize the *RSS* at that vantage point if the servers measured are the ones a resolver at that vantage point would be expected to use (which might mean taking the minimum among the set, or the median).
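
For the third and fourth, a rough sketch along the same lines (the choice of letters and addresses below is illustrative, not a recommendation): time a handful of ". SOA" queries to each address reachable from the vantage point, then take the minimum and the median across the set.

import statistics
import time

import dns.exception
import dns.message
import dns.query

# Three long-stable root server IPv4 addresses; a real probe would
# cover every letter reachable from this vantage point.
SERVERS = {
    "a.root-servers.net": "198.41.0.4",
    "f.root-servers.net": "192.5.5.241",
    "k.root-servers.net": "193.0.14.129",
}

def median_rtt_ms(address, tries=5):
    # Median RTT in milliseconds for a ". SOA" query; None if all tries time out.
    samples = []
    for _ in range(tries):
        query = dns.message.make_query(".", "SOA")
        start = time.monotonic()
        try:
            dns.query.udp(query, address, timeout=2)
        except dns.exception.Timeout:
            continue
        samples.append((time.monotonic() - start) * 1000.0)
    return statistics.median(samples) if samples else None

rtts = {name: median_rtt_ms(addr) for name, addr in SERVERS.items()}
reachable = [v for v in rtts.values() if v is not None]
for name in sorted(rtts):
    print(name, "timeout" if rtts[name] is None else f"{rtts[name]:.1f} ms")
if reachable:
    # Minimum approximates the best server a resolver here could select;
    # the median is closer to what an unselective resolver would see.
    print(f"min={min(reachable):.1f} ms median={statistics.median(reachable):.1f} ms")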

I think the thing that Duane is pushing back on is what I would characterize as research questions - measurements that characterize the service from the viewpoint of a resolver somewhere else. There are several possible problems there, not the least of which is that one is necessarily measuring things that are not fixable by the RSO in question. That's not to say that such measurements are not interesting or useful; it's to say that they lend themselves to "gaming the system", as we have discussed, and are almost by definition an infinite set - there is always something new that someone might think of. It's also to push back on the concept of measuring user experience; to my way of thinking, the "users" of the RSS are primarily resolvers, of which there are thousands, not laptops or cell phones, of which there are bazillions.

In the context of "delivering IANA-signed resource records", we are discussing verifying the IANA signature. I think there may also be value in proving that the RSO in question delivers *all* of the resource records that IANA includes in a zone transfer, and that it delivers *only* the resource records that IANA would include in a zone transfer. Per reports in the media, there are attacks in the DoH space in which TXT resource records are uploaded and used to control botnets. We have also seen cases historically in which certain resource records have been removed from the system entirely by some entity, or replaced with CNAMEs to other services; I'm referring to Google names in China, and similar events with the same or other names in other domains. RSSAC001 is fairly explicit about the set of resource records we are delivering (the current set of IANA-signed records, all of them and only them), and we should be measuring whether we are meeting that expectation.
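
To make the "all of them and only them" check concrete, a sketch under two loud assumptions: that the letter being checked permits AXFR (c.root-servers.net historically has; many do not), and that the published copy and the transferred copy are at the same serial, since otherwise the diff is just zone churn. It pulls IANA's published zone and a transfer from one server, then diffs the record sets.

import urllib.request

import dns.query
import dns.rdatatype
import dns.zone

IANA_ZONE_URL = "https://www.internic.net/domain/root.zone"
ROOT_SERVER = "192.33.4.12"  # c.root-servers.net

def rr_tuples(zone):
    # Flatten a zone into comparable (owner, type, rdata) tuples.
    out = set()
    for name, rdataset in zone.iterate_rdatasets():
        for rdata in rdataset:
            out.add((str(name), dns.rdatatype.to_text(rdataset.rdtype), str(rdata)))
    return out

published = dns.zone.from_text(
    urllib.request.urlopen(IANA_ZONE_URL).read().decode(),
    origin=".", relativize=False)
served = dns.zone.from_xfr(
    dns.query.xfr(ROOT_SERVER, ".", relativize=False), relativize=False)

# Check the SOA serials match before trusting the diff.
pub, srv = rr_tuples(published), rr_tuples(served)
print("missing from server:", len(pub - srv))  # "all of them"
print("extra on server:", len(srv - pub))      # "only them"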

My two yen...

