[NCAP-Discuss] Addressing concerns in context

Wed Dec 7 21:51:17 UTC 2022

A few comments inline.

On 5 Dec 2022, at 12:54, Casey Deccio wrote:

> Hi all,
>
> I mentioned in last Wednesday's call that I wanted to make some clarifications with regard to some of the back-and-forth discussions that were had on the mailing list in recent weeks.  The discussions got complicated because they lost context over time.  This message is intended to restore some context and address three points in particular.  Please note that they are not about endorsing a particular solution but to address concerns about the proposed solutions.
>
>
> 1. The criticality of IPv6 support in manifesting name collisions.
>
> First, let me be clear on the following: 1) I am on board with promoting and helping with increased adoption of IPv6; and 2) I am a researcher in the area of Internet measurement, and as such, for me, more data is better.
>
> That being said, an IPv4-only name collision alerting system is effective for all systems and applications that include *any sort* of IPv4 connectivity.  It is only deficient for systems that *lack IPv4 support*.  In other words, it is independent of general IPv6 growth, promotion, or support, until systems and applications drop IPv4 functionality.
>
> Finally, in saying this, I am neither promoting controlled interruption nor saying that we should not desire IPv6 support.  I am merely stating that in terms of today's Internet (and frankly for the foreseeable future), IPv6 support is not essential to manifesting name collisions, and it certainly does not make IPv6-supporting solutions "vastly superior" by that characteristic alone.  That is simply a mischaracterization.

Personally, I believe a solution that covers IPv6 as well as IPv4 is superior to a solution that only covers one of them.  I would hope you would agree.  If not, then you and I will have to accept that we don’t agree.

If you agree that covering both is superior to covering one of them, then I would ask if your concern is the use of the adverb “vastly”?  If so, then I withdraw my use of it and note for everyone reading this that the term is not used anywhere in the current draft of the work product.  Hopefully, this means Casey no longer has any concerns.

If your concern is something else, would you please say more because I’m not understanding your concern.

>
> For more, please see the sub-section "Application Coverage" under "Alerting Effectiveness and Coverage" in the comparison document.
>
>
> 2. The comparison of risk and disruptiveness between active collision assessment and controlled interruption.
>
> It was stated that "active collision assessment and controlled interruption are equivalently risky and equivalently disruptive."
>
> "Equivalently disruptive."  Both active collision assessment and controlled interruption involve 1) DNS servers returning an answer to a requesting client, where a negative response was the expected response; and 2) the application acting on that answer (i.e., with transport-layer communications), where it previously would not have.  Thus, in both cases, communications are interrupted.
>
> The interruptions, however, are not equivalent.  Controlled interruption is expected to always return a "quick-response" error, while the error behavior for active collision assessment varies, depending on several factors, including network configuration, server configuration, and more.

There must be something I’m missing because I don’t understand the details of your assessment and the point you are making.

I will certainly agree that if one looks at the details of the implementations the interruptions are not equal.  They have different effects, which is intentional.  However, as far as I can tell they are equivalent to the extent that they seek to achieve a similar goal: alerting the user.  They just do it in a slightly different way, with the intent of improving the actual delivery of that alert.

More specifically:

What do you mean by “quick-response”?  In both cases an error is provided to the client.  In the case of controlled interruption, that “error” is provided in one round trip, i.e., one query and one response.  In the case of Active Collision Assessment that error may take more than one round trip to be provided.  In actual network terms, CI would provide the error most often in less than 1 second.  In ACA it may take more than 1 second.  Are you suggesting that CI is better because it responds in less than 1 second?

In terms of error behavior, your root cause analysis document showed quite explicitly that the use of the special IP address 127.0.53.53 in the response was not as useful as it was hoped it would be.  I believe it’s reasonable to infer this means the error behavior upon receipt was quite random on the part of a client.  A client did not actually “see” this response in such a way as to do something useful with it, and thus the user got little to no information about the problem without doing some investigation.  Given that, I’m having trouble understanding how this is better than the error responses being proposed in ACA, which are tailored to the client so that useful information can be provided directly to the user.
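
For illustration only, here is a minimal Python sketch of what it would take for software to "see" the controlled interruption answer.  The function name and the sample inputs are hypothetical; the only detail taken from this thread is the address 127.0.53.53.

```python
import ipaddress

# Address named in this thread as the controlled-interruption response.
CI_ADDRESS = ipaddress.ip_address("127.0.53.53")


def is_controlled_interruption(addresses):
    """Return True if any resolved address equals the CI address.

    Hypothetical helper: `addresses` is an iterable of address strings,
    e.g. the A records returned for a lookup.
    """
    for addr in addresses:
        try:
            if ipaddress.ip_address(addr) == CI_ADDRESS:
                return True
        except ValueError:
            # Skip anything that is not a valid IP address string.
            continue
    return False
```

A check like this only helps where software or an administrator actually inspects the answer; a client that does not will simply try to connect to the loopback address and fail in whatever way it normally does.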

In any case, back to my question, although the details of the disruption are different and thus not equal, I do not understand how they are not equivalent.

Would you please say more?

>
> "Equivalently risky."  As I have mentioned before, "risk" is a term that is both highly loaded and highly ambiguous.  Nonetheless, the very fact that transport-layer communications leave the client with active collision assessment is a significant leap from transport-layer communications never leaving the host systems, as it is with controlled interruption.
>
> More importantly, active controlled interruption involves intercepting communications for proposed ports/applications and returning custom content (ports and applications that are not even enumerated at this point).  Perhaps the two biggest concerns are that 1) application-layer data *will be* communicated; and 2) the user experience *will be* fraught with issues, including (but not limited to) TLS certificate warnings.  Also, the last time something like this was done at similar scale (Site Finder), the public outcry was huge.
>
> For more, please see the sections entitled "Operational Continuity, Security, and Privacy", "User Experience", and "Public Response" in the comparison document.

Yes, I agree, we can all define risk differently and in a way that makes our respective point more significant.  Let me step back.

The fact that application data will be communicated is intentional.  So calling that an unacceptable risk is disingenuous.  It is a risk to be noted, explained, and justified.

Suggesting the user experience will be fraught with issues is an opinion.  The mitigation of those issues will have to be addressed by the developer of the software to be used.  The whole point of the software is to be fail-safe, and I would assert a good engineer is more than capable of addressing this concern given that it is a requirement to do so.

And addressing the concern is what distinguishes this work from SiteFinder, i.e., we’ve learned the lesson.

>
>
> 3. The evaluation of controlled interruption in meeting its goals.
>
> It has been stated that "controlled interruption did not achieve its desired objective."  This is based on the finding from the root cause analysis that "[c]ontrolled interruption is effective at disruption, but not at root cause identification." The problem is that it focuses only on the second half of the sentence and ignores the first half that indicates that it was "effective at disruption."  Disruption was part of the goal related to alerting.  But more important than that was *change*.  Note that another finding from the root cause analysis is the following: "Usage of private DNS suffixes colliding with newly delegated TLDs has decreased over time."  Measurements show that potential collisions have consistently decreased since 2014.  To me, this is the more important part--the fact that changes in the ecosystem were observed related to queries associated with name collisions.  Even though we have some qualitative data that help us anecdotally understand problem resolution, we cannot definitively point to controlled interruption as the cause for the decrease in queries.  Nevertheless, the trend suggests that there was a clear decrease in queries associated with name collisions, which is a "good thing".
>
> For more, please see "Findings" section in the root cause analysis doc.

The objective being discussed in the context for the statement “controlled interruption did not achieve its desired objective” was the desire to notify the user.  CI failed at this, as detailed in your root cause analysis.

Jim

>
>
> Cheers,
> Casey