[ICANN-CSC] July 2023 IANA Naming Function Performance Report

Rick Wilhelm Rwilhelm at PIR.org
Thu Aug 10 14:56:35 UTC 2023


Thanks Kim, for the detail about the test.

I think that the point that I’m trying to make is that the IANA SLAs are there to measure the effectiveness of the IANA function and things that are within its responsibility.

In this case, the timer on the SLA is inclusive of time that is outside of the IANA demarcation of responsibility.  And thus, as we’ve seen, IANA can be doing a perfectly good job (even doing a fair bit of parallelism) and still miss the SLA because of non-responsiveness of the server on the other end.

I’m wondering if there is some sort of a factor that can/should be added to account for latency that is not due to IANA?

Thx
Rick


From: Kim Davies <kim.davies at iana.org>
Date: Wednesday, August 9, 2023 at 3:41 PM
To: Rick Wilhelm <Rwilhelm at PIR.org>, Amy Creamer <amy.creamer at iana.org>, Bart Boswinkel via ICANN-CSC <icann-csc at icann.org>
Subject: [EXTERNAL] Re: [ICANN-CSC] July 2023 IANA Naming Function Performance Report
________________________________
Hi Rick, Hi all,

Generally speaking, a lack of network response from one or more nameservers has a compounding effect across a test run in our systems. Assuming that most nameservers have both an IPv4 and IPv6 address, this means that we would send four queries (we test each IP address via both TCP and UDP), and then we retry them 3 times before giving up for each query. The current timeout is set to 5 seconds, so this means a minimum of 60 seconds of test time eaten up for each unresponsive nameserver per sub-test. There are efficiencies through parallel execution of the tests, but that is offset by the fact we query for different kinds of records throughout a test run (SOA, DNSKEY, NS, A/AAAA, RD-bit set). In more pathological cases, if there are multiple nameservers all in the same network that are unreachable, it can multiply the test time further.
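The 60-second figure above can be reproduced with a short calculation. This is only a sketch of the arithmetic, not IANA's test code: the function and parameter names are hypothetical, it assumes "retry 3 times" means three attempts in total per query, and it assumes the attempts for one nameserver run one after another.

```python
def worst_case_seconds(addresses=2, transports=2, attempts=3, timeout_s=5):
    """Serial wait for one unresponsive nameserver in one sub-test.

    Assumption: every attempt runs to the full timeout before the next starts.
    """
    # One query per (IP address, transport) pair: IPv4 and IPv6, UDP and TCP.
    queries = addresses * transports
    return queries * attempts * timeout_s


print(worst_case_seconds())             # dual-stack nameserver
print(worst_case_seconds(addresses=1))  # single-address nameserver
```

With the defaults this gives 4 queries x 3 attempts x 5 seconds = 60 seconds per unresponsive nameserver per sub-test, before multiplying by the number of record types queried or the number of unreachable nameservers.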

I looked at one test run from the individual case that caused us to exceed our SLAs last month and there were a total of 316 DNS queries sent that were not responded to throughout the course of that one test run, which took 14 minutes and 40 seconds to complete.
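As a rough cross-check of these numbers, the sketch below compares the total timeout wait with the observed run time. It assumes every unanswered query waited the full 5-second timeout mentioned earlier, which the message does not state explicitly, and the function names are hypothetical.

```python
def serial_wait_seconds(unanswered_queries, timeout_s=5):
    """Total wait if every unanswered query timed out one after another."""
    return unanswered_queries * timeout_s


def implied_overlap(unanswered_queries, run_seconds, timeout_s=5):
    """Average number of timeouts that must have been pending at once."""
    return serial_wait_seconds(unanswered_queries, timeout_s) / run_seconds


run_seconds = 14 * 60 + 40   # 14 minutes 40 seconds
threshold_seconds = 10 * 60  # technical check threshold of 10 minutes

print(serial_wait_seconds(316))                     # seconds of timeout wait if fully serial
print(round(implied_overlap(316, run_seconds), 2))  # average concurrent timeouts
print(run_seconds - threshold_seconds)              # seconds over the threshold
```

Under that assumption, 316 timeouts would take 1580 seconds if run strictly in series, so an 880-second run implies that on average about 1.8 timeouts were pending at the same time, and the run exceeded the 10-minute threshold by 280 seconds.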

Thanks for the reference to Spec 10. It seems the pertinent standard is <1500ms response for TCP and <500ms response for UDP, for 95% of queries. Since our lookups are a one-shot, real-time blocking operation, as opposed to the passive ongoing tests done around the clock for gTLD SLA monitoring, it is a bit of a different proposition. Also, I understand ICANN does SLA monitoring from multiple sites and aggregates the performance, which is not something we do in IANA today.
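To make the contrast concrete, a percentile-style check of the kind described above might look like the following. This is an illustrative sketch only: the thresholds are the ones quoted in this message, the function name and data layout are hypothetical, and it is not how either ICANN's monitoring or IANA's technical checks are actually implemented.

```python
# Thresholds as quoted above, in milliseconds.
THRESHOLD_MS = {"udp": 500, "tcp": 1500}


def meets_threshold(rtts_ms, transport, required_fraction=0.95):
    """Return True if enough sampled queries answered within the threshold.

    rtts_ms holds one round-trip time per query in milliseconds,
    with None standing in for a query that was never answered.
    """
    if not rtts_ms:
        return False  # no samples, nothing to certify
    limit = THRESHOLD_MS[transport]
    within = sum(1 for rtt in rtts_ms if rtt is not None and rtt < limit)
    return within / len(rtts_ms) >= required_fraction


samples = [120] * 18 + [None, None]     # 18 answered, 2 unanswered
print(meets_threshold(samples, "udp"))  # 90% within threshold
```

A check like this tolerates a small share of unanswered queries across many samples, whereas a one-shot blocking test has to wait out every timeout before it can finish.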

Happy to discuss this further.

kim



From: ICANN-CSC <icann-csc-bounces at icann.org> on behalf of Rick Wilhelm via ICANN-CSC <icann-csc at icann.org>
Reply-To: Rick Wilhelm <Rwilhelm at PIR.org>
Date: Wednesday, August 9, 2023 at 4:38 AM
To: Amy Creamer <amy.creamer at iana.org>, Bart Boswinkel via ICANN-CSC <icann-csc at icann.org>
Subject: Re: [ICANN-CSC] July 2023 IANA Naming Function Performance Report

Amy, et al,

Thanks for sending over the report.  This might be a better topic for discussion at the next meeting than for the list, but I’ll try to frame it coherently:

Regarding the missed SLA:

From what I can gather in reading the footnote, it seems that during the execution of the “One request (that) exceeded the technical check threshold of 10 minutes”, operations happened normally (i.e. kicked off normally, experienced no IANA-induced interruptions, etc.), and the extra time was spent waiting.

It seems to me that the “technical check (retest)” should be designed with a DNS query timeout value that is sufficiently short such that waiting for the timeout(s) to expire does not cause the SLA to be violated.

I’m not familiar with the exact details of the test design, but I’d point folks to the Base Registry Agreement, Specification 10 (https://itp.cdn.icann.org/en/files/registry-agreements/base-registry-agreement-30-04-2023-en.html); search for “will be considered unanswered” for examples of language that contemplates non-responsive services.

Happy to discuss.

Thanks
Rick




From: ICANN-CSC <icann-csc-bounces at icann.org> on behalf of Amy Creamer via ICANN-CSC <icann-csc at icann.org>
Date: Tuesday, August 8, 2023 at 1:39 PM
To: Bart Boswinkel via ICANN-CSC <icann-csc at icann.org>
Subject: [EXTERNAL] [ICANN-CSC] July 2023 IANA Naming Function Performance Report
________________________________
Dear CSC,

Please find attached the IANA Naming Function Performance report for July 2023. During the month of July, we met 98.3% of the SLA thresholds.  The shortfall was due to missing the following SLA:

Technical Check (Retest) - Routine (Technical): One change request had unreachable nameservers, which caused the technical check to exceed its threshold of 10 minutes.  This exception relates to time spent waiting for nameserver responses, i.e. the time spent waiting for each query to time out, multiplied by the number of retries.

We look forward to answering any questions you may have about the report.


Regards,

Amy Creamer
Director of Operations, IANA Services
Email: amy.creamer at iana.org
Phone: +1-424-537-8917