[NCAP-Discuss] Outline of possible phases

Wed Aug 16 19:55:46 UTC 2023

I've added to the discussion that Rubens started here on a few more items, but I've decided to keep the rest of my discussion in a single email.

> On Aug 2, 2023, at 3:42 PM, Rubens Kuhl via NCAP-Discuss <ncap-discuss at icann.org> wrote:
> 
> 
> Phase 1:
> TRT looks into data from DITL, from L-Root and from OSINT about that string.
> NPT asks IANA for delegation of string into the root.
> 
> Option 1: Delegated zone only contains SOA, dotless NS, dotless DNSKEY, NSEC records and RRSIGs for all the records.

Addressed in a separate email.

https://mm.icann.org/pipermail/ncap-discuss/2023-August/001221.html

> Option 2: Delegated zone contains records for Google Ads and for an ad-network popular in China (Geoff Houston style, expanded to cover countries that block Google).

There is some discussion of this in (what is currently) section 3.6.6 of the Sections 1 - 3 document:

https://docs.google.com/document/d/13SQnZt1HHeD9i1cSds-kj16mxRQgxp6hpb2K1kLqB1U/edit?usp=sharing

In my mind, the biggest issue that needs to be addressed before moving forward with this is the following.  Because these are artificially generated queries, the resulting measurement data does not necessarily reflect legitimate activity by end users and systems.  The data can still be useful for learning about network configuration (in this case, whether or not a network's resolvers are answering for a given TLD themselves, rather than sending the queries to authoritative servers on the Internet), but again, it does not reflect normal system/user activity.  It also comes at a cost.  To me, the biggest cost is the noise at the root.  I include the following paragraph, which is from the Sections 1 - 3 document:

-- start of quote --
Queries observed at authoritative DNS servers—both TLD and root servers—will include queries from both actual end systems and the [ad] measurements herein proposed.  Without further filtering and processing, the queries from the [ad] measurements will affect the data and metrics associated with “normal” behavior.  At the very least, the two types of queries should be made distinguishable from one another to make accurate and meaningful assessments of the data.  This is possible at the TLD authoritative servers by using query names whose second label is distinguishable.  However, for measurements at root servers, this might not be possible due to a growing percentage of resolvers that use qname minimization and for which the second label will not be visible.
-- end of quote --
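
To make the point in the quoted paragraph concrete, here is a minimal sketch of why a distinguishing second label helps at the TLD servers but not at the root.  The marker label "ad-measure", the TLD "example-tld", and the function names are all hypothetical, chosen only for illustration; the sketch also assumes that a qname-minimizing resolver sends only the TLD label to a root server.

```python
from typing import List, Optional

# Hypothetical second-level label used to tag ad-measurement queries.
MARKER = "ad-measure"


def labels(qname: str) -> List[str]:
    """Split a query name into lowercase labels, ignoring the trailing root dot."""
    return [label for label in qname.rstrip(".").lower().split(".") if label]


def minimized_for_root(qname: str) -> str:
    """Name a qname-minimizing resolver would send to a root server.

    Assumption: only the rightmost (TLD) label is revealed to the root.
    """
    parts = labels(qname)
    return parts[-1] + "." if parts else "."


def is_ad_measurement(seen_qname: str) -> Optional[bool]:
    """Classify a query name as seen at an authoritative server.

    Returns True or False when the second label is visible, and None when
    it is not, in which case the two kinds of traffic cannot be separated.
    """
    parts = labels(seen_qname)
    if len(parts) < 2:
        return None
    return parts[-2] == MARKER


full_name = "u123.ad-measure.example-tld."
print(is_ad_measurement(full_name))                      # full name visible
print(is_ad_measurement(minimized_for_root(full_name)))  # minimized view at the root
```

With the full name, the marker is visible and the query can be set aside; with the minimized name, the classifier has nothing to go on, which is the noise problem described above.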

Because of the noise and the difficulty of distinguishing between data from the ad measurements and data from other sources, I would strongly suggest that if this option is used, it be used only *after* a period in which option 1 is used.  For what it's worth, though, my leaning is that it's not worth the cost.

> Duration: no more and no less than 30 days.
> Timing: can start right after “Reveal Day”.
> Order: follows the application evaluation order (“ICANN Draw”) (even though is done per string, not per application)
> Pace: set by IANA to always have spare capacity following RSSAC/OCTO guidance on root zone scaling

I don't know exactly what all these terms mean, but asking more generally, will there be time in between the 30-day collection period and the next phase for an analysis to be done?  In other words, how much time is left for the TRT to do an analysis?

> 
> Question for the DG: do we pick one of the options above or leave that to TRT discretion to decide based on DITL/L-Root/OSINT ? If the later, we should probably separate these phases.
> 
> Phase 2: (optional)
> TRT looks into data from phase 1, and if there is concern of possible issues, decides whether to run or not phase 2. The data collection basis for this phase is the phase 1 report showing that there are possible collision issues.

Should there be a default here?  In other words, should the TRT recommend phase 2 unless the analysis shows that collisions are sufficiently low?  And how do we define that?

> 
> Option 1: Minimal honeypot
> Option 1A: Minimal honeypot RSTs all TCP (what about SPDY ?) and returns unreachable for UDP and ICMP

This is #3 (Transport-layer rejection at publicly available IP address) from my Aug 8 email.

(See https://mm.icann.org/pipermail/ncap-discuss/2023-August/001209.html)

> Option 1B: Minimal honeypot just don’t answer anything

I see that this option was not included in my Aug 8 email.  I have heard this proposed by some.  Personally, I would not support this option.
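
To illustrate the practical difference between options 1A and 1B from the client's side, here is a minimal sketch of a TCP probe.  It is only an illustration under simple assumptions: the function name and the result labels are hypothetical, and it looks only at TCP (not UDP or ICMP).  With a RST (1A), the client learns immediately that the connection failed; with silence (1B), it waits until its own timeout expires.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connection and report how the far end reacted.

    "accepted": the handshake completed
    "rejected": the connection was refused (a RST came back), as in option 1A
    "silent":   nothing came back before the timeout, as in option 1B
    "error":    some other failure (e.g., no route, resolution failure)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        return "rejected"
    except socket.timeout:
        return "silent"
    except OSError:
        return "error"
```

The design point is that "rejected" returns quickly, while "silent" costs the client the full timeout, possibly multiplied by retries.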

> Option 2: Controlled Interruption
> DNS servers respond with a wildcard with A, AAAA and SRV records (like 2012 CI)

This is #2 (Resolution to loopback using special IP address in 127/8) from my Aug 8 email.
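
As a rough sketch of how an analyst or operator might recognize a controlled-interruption answer in logs, the following classifies an address returned for a name.  The address 127.0.53.53 is my assumption about the "special IP address" (it is the one used in ICANN's earlier controlled interruption); the function name and the result labels are hypothetical.

```python
import ipaddress

# Assumed controlled-interruption address (from the earlier CI deployment).
CI_ADDRESS = ipaddress.ip_address("127.0.53.53")
LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")


def classify_answer(addr: str) -> str:
    """Classify an address returned in a DNS answer."""
    ip = ipaddress.ip_address(addr)
    if ip == CI_ADDRESS:
        return "controlled-interruption"
    if ip.version == 4 and ip in LOOPBACK_V4:
        return "other-127/8"
    return "not-in-127/8"


for answer in ("127.0.53.53", "127.0.0.1", "192.0.2.10"):
    print(answer, classify_answer(answer))
```

Because the answer is in 127/8, the traffic stays on the querying host, and the unusual address is what gives an administrator something to search for.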

> Option 3: Ads-based measurement, if Phase 1 is kept to only be an empty zone

See my previous comments on ads-based measurement above.

Is it being suggested that any or all of these could be applied?  If multiple, would they be applied serially or at the same time?  If multiple (and particularly if serially), would the data collection period be extended beyond 90 days to accommodate the different techniques?

> Duration: no more than 90 days, on TRT’s discretion to end it before 90 days.
> Timing: if going to be performed, it needs to start no more than 30 days after phase 1, but should start sooner if allowed by TRT workload

Again, will there be time in between the 90-day collection period and the next phase for an analysis to be done?  In other words, how much time is left for the TRT to do an analysis?
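
To put numbers on the question, here is a small sketch that lays out the timeline implied by the durations quoted above.  It assumes that "no more than 30 days after phase 1" is measured from the end of phase 1; the start date and the function name are hypothetical.

```python
from datetime import date, timedelta

PHASE1_DAYS = 30       # "no more and no less than 30 days"
MAX_GAP_DAYS = 30      # phase 2 starts "no more than 30 days after phase 1"
PHASE2_MAX_DAYS = 90   # "no more than 90 days"


def timeline(phase1_start: date,
             gap_days: int = MAX_GAP_DAYS,
             phase2_days: int = PHASE2_MAX_DAYS) -> dict:
    """Compute phase boundaries; gap_days is the only analysis window before phase 2."""
    if not 0 <= gap_days <= MAX_GAP_DAYS:
        raise ValueError("gap must be between 0 and 30 days")
    if not 0 < phase2_days <= PHASE2_MAX_DAYS:
        raise ValueError("phase 2 must last between 1 and 90 days")
    phase1_end = phase1_start + timedelta(days=PHASE1_DAYS)
    phase2_start = phase1_end + timedelta(days=gap_days)
    phase2_end = phase2_start + timedelta(days=phase2_days)
    return {
        "phase1_end": phase1_end,
        "phase2_start": phase2_start,
        "phase2_end": phase2_end,
        "analysis_window_days": gap_days,
    }


# Hypothetical start date, longest allowed gap and phase 2.
for key, value in timeline(date(2026, 1, 1)).items():
    print(key, value)
```

Under these assumptions, the TRT has at most 30 days to analyze phase 1 data before phase 2 must begin, and the proposal as quoted does not specify any analysis window after phase 2 ends.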

> Requirement: DNS servers and DS record is unchanged in the root zone from phase 1, to minimize RZM load. Only zone content changes.
> 
> Question for the DG: do we pick one of the options above or leave that to TRT discretion to decide based on phase 1 ?