[NCAP-Discuss] Honeypot refresher
Danny McPherson
danny at tcb.net
Thu Apr 30 15:24:07 UTC 2020
On 2020-04-30 10:35, Jeff Schmidt via NCAP-Discuss wrote:
> We are stuck in a Groundhog Day cycle of re-litigation on ancient
> issues.
I'll defer to Warren on his comments here but there are pros and cons,
certainly.
Some comments inline.
> Some suggest (repeatedly) that Controlled Interruption was designed
> the way it was because we don’t like data and honeypots (redirecting
> colliding lookups to some Internet host controlled by “good guys”)
> would generate the data panacea we always wanted but never had.
> Suddenly all of our questions would be answered, toast would never
> burn, and the coronavirus would be cured. This is wrong. Honeypots
> create significant new risks; we concluded that the risks created by a
> honeypot approach were worse than the rewards and suggested
> (Recommendation 12) alternative approaches to gathering more data.
> This issue was discussed extensively in the JAS Phase 2 report.
Jeff, while I certainly understand the primitives, was any actual legal
analysis done on JAS's conclusion there? I don't see any citations and
have always wondered whether, intuitive as it is, it benefited from any
legal advice.
Further, I am familiar with large-scale honeypots, as previously noted,
e.g.:
Tracking Global Threats with the Internet Motion Sensor
https://pdfs.semanticscholar.org/5ac5/c8dab89963c69198a7597d4a81a7bad980b3.pdf
The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets
https://www.usenix.org/legacy/event/sruti05/tech/full_papers/cooke/cooke.pdf
> Verisign in their public comments, agreed with us (quoting from
> Section 2 of their comment to the JAS Phase 2 Report):
>
> <Verisign quote>
>
> Verisign maintains its position that directing requesters to an
> internal address during the controlled interruption period is
> preferable to an external honeypot, because as previously stated, it
> avoids “controlled exfiltration” where sensitive traffic from an
> installed system – without the advance consent of the user or system
> administrator – may be drawn outside the local network. This risk is
> acknowledged in Google Registry’s comments advocating for the
> external honeypot [5]:
>
> “Unfortunately, some protocols will send sensitive information
> unsolicited (e.g., login.example/login.php?user=fred and HTTP
> cookies). The honeypot will specifically not log this sort of
> information, but this doesn't change the fact that the information has
> been communicated over the Internet.”
>
> </Verisign quote>
Indeed, I recall that text and don't disagree, but I would love to see a
study of the legal risks and liabilities, as opposed to the downstream
risks of not doing something.
> https://forum.icann.org/lists/comments-name-collision-26feb14/pdfTWUAZM3gBN.pdf
>
>
> Please refer to section 3.1.6 “Alternatives to Controlled
> Interruption” for a detailed discussion of Honeypot (Section 3.1.8)
> (and other) approaches:
>
> https://www.icann.org/en/system/files/files/name-collision-mitigation-final-28oct15-en.pdf
As stated in the document, JAS selected the Controlled Interruption
solution "that offers the most value with the least risk". I think
revisiting this is not Groundhog Day. That was a unilateral decision by
JAS and ICANN, and I understand why it was expeditious and convenient
given the backlog of applicants and businesses awaiting delegation
(because this work didn't happen before applications were accepted). But
it was not a community decision, and the occurrence and risks of
collisions still persist. They are arguably worse in some areas, with
more internet-connected devices using the DNS for rendezvous and service
discovery functions, which may present new risks (or opportunities).
Further, what labels should be reserved versus the "constrained" set in
the original AGB (e.g., at the TLD or SLD level)?
From your no-bid justification it's apparent that you didn't think this
work should happen at all, and yet you acknowledged in your brief
yesterday, as well as in the actions with corp.com (which are largely
the result of .CORP), that there are significant risks of collisions,
even in F20 companies that are surely the best resourced. These
positions seem conflicted to me.
The whole point of this work is to provide some predictability to
applicants when applying for new gTLDs (and to solidify what to do with
CORP, HOME, and MAIL). This speaks to Neuman's "give me a test and some
predictability please", which I wholly agree with and have all along. I
continue to believe that work such as label-based analysis affords the
best proactive predictability -- and can measurably be impacted with
outreach to query sources (e.g., developers, operators, etc.) IF
necessary. What data enables this, and what considerations should be
factored in (e.g., availability, negative caching, forwarders, recursive
resolvers and other intermediaries, local roots, operators that
synthesize responses, stringent and increasing privacy implications and
data masking/anonymization, and qname minimization), are but some
examples of an evolving landscape.
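To make the label-based analysis idea concrete, here is a minimal sketch
of the sort of tally one might run over a DNS query log: count distinct
query sources per candidate TLD label as one rough proxy for collision
exposure. The log format, field names, and candidate set here are
illustrative assumptions only, and a real analysis would have to account
for the caveats above (negative caching, forwarders, qname minimization,
privacy masking, etc.) before drawing conclusions:

```python
# Illustrative sketch only: tally distinct query sources per candidate
# TLD label from a hypothetical query log. Log format is assumed to be
# "<source-id> <qname>" per line; all names here are made up.
from collections import defaultdict

CANDIDATE_LABELS = {"corp", "home", "mail"}  # labels under discussion

def tally_queries(log_lines):
    """Return {label: number of distinct sources that queried it}."""
    sources = defaultdict(set)
    for line in log_lines:
        src, _, qname = line.strip().partition(" ")
        # Rightmost label of the query name, ignoring a trailing root dot.
        tld = qname.rstrip(".").rsplit(".", 1)[-1].lower()
        if tld in CANDIDATE_LABELS:
            sources[tld].add(src)
    return {label: len(s) for label, s in sources.items()}

sample = [
    "resolver-a mailserver.corp",
    "resolver-b printer.home",
    "resolver-a wpad.corp",
]
print(tally_queries(sample))  # {'corp': 1, 'home': 1}
```

Distinct-source counts (rather than raw query volume) matter because a
single busy resolver can otherwise dominate the totals.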
-danny
> Jeff