[NCAP-Discuss] [Ext] Re: An Approach to Measuring Name Collisions Using Online Advertisement

Wed Jun 8 18:27:36 UTC 2022

All that being said, I think ad network data may be an interesting dataset to collect when evaluating problem strings like c/h/m for potential future release from Purgatory. In those bespoke situations that are not in the mainline application processing, there is no reason not to try new things. Maybe after we gain some comfort with that dataset, it could be a candidate for more widespread use in the future.

Jeff

From: Jeff Schmidt <jschmidt at jasadvisors.com>
Sent: Wednesday, June 8, 2022 1:17 PM
To: Steve Sheng <steve.sheng at icann.org>; ncap-discuss at icann.org
Subject: Re: [Ext] Re: [NCAP-Discuss] An Approach to Measuring Name Collisions Using Online Advertisement

While leveraging the Google ad network to collect data is clever, we have absolutely no idea what sorts of data it will generate and if the data is at all useful for the purpose. It's truly a science experiment. Not that I'm opposed to science projects or evaluating new approaches, but this is something that needs to be production ready and defensible if NCAP is going to recommend it.

We know a lot about the other collision metrics we've discussed: they've been used for decades or longer, published, peer-reviewed, and are back-test-able. Throwing new, unproven, un-back-test-able Google ad network data into the mix at this late date, with no history, guidelines, thresholds, etc. - as a gatekeeper in the application process - seems a recipe for disaster. We can't even discuss thresholds or criteria in advance because we have no idea what the data will look like! And it's not testable until after the first delegation so we have to recommend it first then hope for the best. We have no baseline and nothing to compare it to. What is "good" and what is "bad?"

Also, the Google ad network approach will only tell us about a subset, of unknown representativeness, of the networks/infrastructure where browsers that process Google ads are attached. False negatives and false positives (results that are not representative of the local network segment/infrastructure) may occur in situations where the DNS resolution infrastructure is unrelated to the local network segment (people using Google DNS, Cloudflare DNS, OpenDNS, a VPN, etc.), all of which are increasingly common. Additionally, Google ad networks are blocked on a not-insignificant number of networks globally.

Also, I believe this approach relies on a unique relationship between Google and a researcher (allowing JavaScript network libraries, which are typically blocked, to run in ads), and it may not be directly transferable to other ad networks where those unique relationships may not exist. Unfortunately, if limited to Google (itself a large Registry Operator), perceptions of conflicts of interest and unequal access to data may become an issue.

I have my doubts if this adds anything to our understanding of collisions; certainly one wonders if the proverbial juice is worth the squeeze. See page 25 of our final report:

"Even though all of our HTTP honeypot pages contained the overt request to contact
us, JAS received not a single notification. Reviewing our HTTP logs, less than 8% of
DNS resolutions ultimately led to the retrieval of one of our HTTP honeypot pages.
Reviewing the HTTP logs further, less than 12% of those 8% reported an HTTP
user-agent that could be considered a user-facing application (i.e. a Browser)."

Net-net: The vast majority of the things requesting colliding DNS names weren't human-facing/browser-y things, i.e. the things that would be running Google ads.
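
To put a number on that, the two figures quoted from the report can be multiplied together. This is only a rough sketch: it treats "less than 8%" and "less than 12%" as upper bounds, and the constant and function names are made up for illustration, not taken from the report.

```python
# Upper bounds quoted from page 25 of the JAS final report.
HTTP_RETRIEVAL_SHARE = 0.08   # "less than 8%" of DNS resolutions led to an HTTP honeypot retrieval
BROWSER_SHARE_OF_HTTP = 0.12  # "less than 12% of those 8%" reported a browser-like user-agent


def browser_share_of_dns(http_share: float, browser_share: float) -> float:
    """Upper bound on the fraction of DNS resolutions that trace back to a browser."""
    return http_share * browser_share


upper_bound = browser_share_of_dns(HTTP_RETRIEVAL_SHARE, BROWSER_SHARE_OF_HTTP)
print(f"Browser-like share of colliding DNS resolutions: under {upper_bound:.2%}")
```

In other words, on those numbers fewer than about 1 in 100 colliding DNS resolutions came from something that could plausibly render an ad.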

Finally, this approach is ripe for gaming and will be gamed if it is a gatekeeper in a future application round. All the approaches to ad-fraud / click-fraud might be used to steer this type of study any way an unscrupulous actor desires. The folks with experience running ad network research have never had to deal with adversaries actively working against them seeking to manipulate the results.

This is a solution in search of a problem, IMHO. What specific problems are we solving and is a science experiment justified?

Jeff

From: Steve Sheng <steve.sheng at icann.org>
Date: Friday, April 29, 2022 at 9:32 AM
To: Jeff Schmidt <jschmidt at jasadvisors.com>, ncap-discuss at icann.org
Subject: Re: [Ext] Re: [NCAP-Discuss] An Approach to Measuring Name Collisions Using Online Advertisement
Hi Jeff,

  This is not an SSAC recommendation or an official SSAC work product, but a thought piece generated by some SSAC members.

  I hope this clarifies things; let me know if you have further questions.

Best
Steve

From: Jeff Schmidt <jschmidt at jasadvisors.com>
Date: Thursday, April 28, 2022 at 6:08 PM
To: Steve Sheng <steve.sheng at icann.org>, ncap-discuss at icann.org
Subject: [Ext] Re: [NCAP-Discuss] An Approach to Measuring Name Collisions Using Online Advertisement

For clarity, is this an "SSAC Recommendation" or any sort of official SSAC work-product, or is it a thought piece generated by people that happen to also be SSAC Members?

Thx,
Jeff

From: NCAP-Discuss <ncap-discuss-bounces at icann.org> on behalf of Steve Sheng <steve.sheng at icann.org>
Date: Thursday, April 28, 2022 at 4:57 PM
To: ncap-discuss at icann.org
Subject: [NCAP-Discuss] An Approach to Measuring Name Collisions Using Online Advertisement
Dear NCAP Discussion Group,

   The SSAC NCAP WP, a group of SSAC members who are actively following and participating in the NCAP work, has developed a proposal to measure name collisions using advertisement-based measurement. It will complement the passive collision analysis discussed in the discussion group by providing more direct data on the collision rates for a given candidate TLD string, as well as the variance in collision rates between countries and between individual networks. The data may also help with mitigating name collisions.

   Please see the attached proposal for your consideration.

Best
Steve Sheng
On behalf of SSAC NCAP WP