[ksk-rollover] [Ext] Re: Starting discussion on acceptable criteria for proceeding with the root KSK roll

Fri Jan 5 09:40:36 UTC 2018

On 5.1.2018 09:30, Erwin Lansing via ksk-rollover wrote:
> David,
> 
>> On 5 Jan 2018, at 02.29, David Conrad <david.conrad at icann.org> wrote:
>>
>> I share this concern, but TBH, from my experience in the outreach I
>> was involved with personally, the response was bimodal, either:
>>
>> A) boredom, having to listen to yet another talk on stuff they’d
>> already dealt with (e.g., NANOGs, RIPE meetings, etc)
>>
>> - or -
>>
>> B) incomprehension, not even knowing what the letters DNS stand for.
>> (e.g., CIO/CTO forums, non-technical venues)
>>
>> The reality is that finding the right people to speak to to ensure
>> resolvers are properly configured for the KSK rollover turns out to be
>> quite hard.
>>
> I, and most people on this list, are definitely in group A.  Those talks
> are good breaks to check email during conferences :-)
> 
> But seriously, that goes to the heart of the problem.  The people trying
> to fix the issue (A) are not the people actually using the service (B).
>  That’s both a problem to reach those people that may need to act in
> some way, but also might lead to misunderstandings about how the world
> looks from the viewpoint of the other group.
> 
>> To be very clear, we don’t want to continue postponing. What we’re
>> looking for is for the community to tell us in the ICANN Org how to
>> move forward. We were surprised by the 8145 data (i.e., that we were
>> actually getting data and that the number of misconfigurations we were
>> seeing was as high as it was). We’ve done a bit of analysis and
>> from what little we’ve been able to ascertain, there doesn’t appear to
>> be anything fundamentally broken with the architecture or
>> implementations, rather misconfiguration happens. This isn’t
>> surprising. However, now that we know concretely there will be
>> brokenness, how much is the community willing to tolerate (and what
>> metrics can we use to ensure we’re below that threshold).
>>
>>
> So we don’t want to not do the rollover, we know our data is incomplete,
> and we know there will be an unknown amount of fallout.  From the data
> that we do have through 8145, is there any indication that the amount of
> known brokenness is decreasing?  Could that be used as an indicator
> that, despite all the tremendous effort from ICANN and others over the
> last months, we have no way to decrease the known fallout further,
> thereby assuming there’s nothing more we can do to prevent the unknown
> fallout either?
> 
> Erwin

In my opinion, the important metric is the *derivative* of:
# of users behind KSK-2017-capable resolvers
vs.
# of users behind KSK-2010-only resolvers

- If the portion of users who are ready for the roll is increasing, it
might make sense to wait a little bit longer.
- If the portion is steady or even getting worse, we need to roll ASAP.
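
A minimal sketch of this metric, assuming we somehow had per-day
estimates of the number of users behind each class of resolver. The
sample layout, the function names and the epsilon threshold are all
hypothetical; getting trustworthy user counts is the hard part and is
not solved here.

```python
from typing import List, Tuple

# One sample: (day, users behind KSK-2017-capable resolvers,
#              users behind KSK-2010-only resolvers).
# The user counts are hypothetical inputs, not real measurements.
Sample = Tuple[int, float, float]


def ready_share(sample: Sample) -> float:
    """Fraction of observed users whose resolver is ready for the roll."""
    _, ready, not_ready = sample
    total = ready + not_ready
    if total <= 0:
        raise ValueError("sample contains no users")
    return ready / total


def share_trend(samples: List[Sample]) -> List[float]:
    """Finite-difference derivative of the ready share, per day."""
    ordered = sorted(samples)
    trend = []
    for prev, cur in zip(ordered, ordered[1:]):
        days = cur[0] - prev[0]
        if days <= 0:
            raise ValueError("two samples share the same day")
        trend.append((ready_share(cur) - ready_share(prev)) / days)
    return trend


def suggestion(samples: List[Sample], epsilon: float = 1e-4) -> str:
    """'wait' while the share is still growing, otherwise 'roll'.

    epsilon is an arbitrary, assumed threshold for "still growing".
    """
    trend = share_trend(samples)
    if not trend:
        raise ValueError("need at least two samples")
    return "wait" if trend[-1] > epsilon else "roll"
```

Only the most recent interval decides the outcome here; averaging over
several intervals would be less sensitive to noise in the estimates.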

This approach has its own problems, as far as I can see:
0. This metric requires a mapping resolver => number of its users.
Geoff Huston's method can theoretically provide that for big resolvers,
but his method (web ads) has an inherent selection bias.
Is it good enough? I do not dare to guess.

1. Both RFC 8145 *and* IETF draft-huston-kskroll-sentinel suffer from
unquantified selection bias at the moment.

RFC 8145 data show the 'best resolvers', i.e. those that are managed by
people who updated their software, but this tells us *nothing* about the
real state of the Internet at large.
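
For illustration, a rough sketch of how RFC 8145 key tag queries seen at
a root server could be sorted into the two groups. It assumes the
query-name form of the signal ("_ta-" followed by hyphen-separated
four-digit hex key tags) and the root key tags 19036 (KSK-2010) and
20326 (KSK-2017). The function names are hypothetical. Resolvers that
never send the signal do not show up at all, which is exactly the
selection bias described above.

```python
KSK_2010_TAG = 19036  # 0x4a5c
KSK_2017_TAG = 20326  # 0x4f66

HEX_DIGITS = set("0123456789abcdef")


def parse_key_tag_qname(qname: str) -> set:
    """Return the key tags signalled in a root-zone '_ta-xxxx[-xxxx]' name."""
    label = qname.rstrip(".").lower()
    if "." in label or not label.startswith("_ta-"):
        raise ValueError("not a root-zone key tag query: %r" % qname)
    tags = set()
    for part in label[len("_ta-"):].split("-"):
        # Each key tag is encoded as exactly four hex digits.
        if len(part) != 4 or not set(part) <= HEX_DIGITS:
            raise ValueError("malformed key tag in %r" % qname)
        tags.add(int(part, 16))
    return tags


def classify_signal(qname: str) -> str:
    """Classify one signal as ksk2017-capable, ksk2010-only or other."""
    tags = parse_key_tag_qname(qname)
    if KSK_2017_TAG in tags:
        return "ksk2017-capable"
    if KSK_2010_TAG in tags:
        return "ksk2010-only"
    return "other"
```

Counting signals this way yields resolvers (or rather query sources),
not users, so it still needs the resolver => users mapping from point 0.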

If there were new versions of software implementing
draft-huston-kskroll-sentinel, we could attempt to quantify its
deployment, but I'm personally not convinced that it will get real
deployment anytime soon, so its data will again have such a strong
selection bias that they will be useless.

As other people mentioned already in this discussion, we need to roll to
a) have a fire drill for future emergencies
b) improve trust in the DNSSEC technology (most likely by going to
stronger algs/keys in future rolls)

This leads me to the conclusion that we do not have, and most likely
will not have, relevant data anyway, so it is pointless to postpone the
roll any further. People will fix their stuff when it breaks.

(Geoff and others, I will be more than happy if you prove that I'm wrong
and that you have drawers full of data!)

-- 
Petr Špaček  @  CZ.NIC