[ksk-change] planned vs. emergency (was Re: [ksk-rollover] root zone KSK ...)

Sun Sep 21 23:54:12 UTC 2014

On 9/21/2014 6:16 PM, David Conrad wrote:
> Mike,
>
> On Sep 21, 2014, at 2:12 PM, Michael StJohns <msj at nthpermutation.com> wrote:
>> What I think you're saying above is basically, "I don't want a system that can deal with the most likely single compromise scenarios, but that I want to do a full scale trust reboot every so often and require 100 of 1000s (millions?) of manual updates of trust anchors.”
> What I want or don’t want is, of course, irrelevant.
>
> It might be interesting to explore assumptions.  For example, what do you believe is “the most likely single compromise scenario”?  And, what do you think the penetration of 5011 will be in validating resolvers now and in (say) 5 years?

The single, simplest compromise scenario is where someone manages to set 
the revocation bit of the current sole KSK during one of the regular 
re-signings.  That revokes the sole trust anchor and takes DNSSEC 
offline for a large portion of the network.  If I wanted to cause the 
most mischief for the least amount of effort, I'd try and attack things 
there.

With respect to 5011 penetration - easiest thing is to ask the vendors.
AFAIK, it's included in all the commercial for-sale products as well as
in the custom-built versions used by various providers.

>
> I am assuming:
>
> a. for all intents and purposes, the likelihood of _any_ compromise/loss of the root key is statistically equivalent.
Bad assumption.  First, it's "a" root key, not "the" root key - or should
be.  Second, a stand-by key locked away in a safe with appropriate
physical safeguards, and maybe even split into N of K slices and
encrypted, is probably less vulnerable than an active key, which can be
misused to revoke itself or to sign a set of ZSKs that weren't supposed
to be signed.

The probability of the compromise of the root key system is the product
of the probabilities of compromise of each individual trust anchor
private key (assuming the compromises are independent), and that's the
critical metric.  So if you have a 1 in a million chance to compromise
one key, and the probability is similar between all keys, the
probability of compromising the SYSTEM if you're using 2 keys is 1 in a
trillion, and 1 in a quintillion for 3 keys.
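
As a rough sketch of that arithmetic, assuming the keys fail
independently and each has the same illustrative 1 in a million chance
of compromise (the function name is made up for this example):

```python
from fractions import Fraction
from math import prod

def system_compromise_probability(per_key_probabilities):
    """Probability that EVERY trust anchor key is compromised.

    Assumes the individual compromises are independent, so the
    probabilities simply multiply.
    """
    return prod(per_key_probabilities, start=Fraction(1))

# Illustrative figure from the text: 1 in a million per key.
p = Fraction(1, 10**6)

for n in (1, 2, 3):
    print(n, "key(s):", system_compromise_probability([p] * n))
```

Exact fractions are used so the 1 in a trillion and 1 in a quintillion
figures come out without floating point rounding.  If the keys share a
failure mode (same HSM, same staff, same ceremony), the independence
assumption breaks and the real number is worse.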
> b. regardless of (a), we _must_ be capable of dealing with a statistically unlikely event occurring.
Yes, but... you already have a way of dealing with clear field trust 
rebooting.  It's well documented. It's just clumsy as hell and requires 
manual intervention.    If you're looking for something better just for 
that occurrence, you're probably going to be looking for a long while.  
You can't magically bootstrap trust from no trust.

OTOH dealing with less than catastrophic single key compromises seems to 
be well within the possibility of automated and secure and is exactly 
what 5011 was designed to accomplish.

> c. touching the root key for any reason increases the probability of catastrophic failure/compromise by an infinitesimal but non-zero amount.
No.  Touching the *only* root key does that. Touching one root key where 
the others are locked away decouples the fate of the system from the 
fate of the key.

> d. changing the root key of the DNS is and will continue to be an infrequent event (both because of (c) but more likely the PITA-ness of changing the key).
This is a circular argument: we won't change the key because we
haven't changed the key, because it's painful to change the key, so we
won't change the key.  Then there's where you measure pain.  At the
signing end, it's just one more key generation ceremony followed by the
appropriate KSK signing ceremonies.  At the validating end, it's having
5011 working correctly (and ideally automatically).  There will be early
pain - there always is when you start doing something new.  But the more
often you do it the better you get at it, which is the whole point of
doing a key cycle earlier rather than later.

>
> In addition, I’m assuming:
>
> e. few large scale organizations will be comfortable with a signal being sent from somewhere out of their control that results in permanent changes to critical configuration information.
How many large scale organizations do you know that manually provide a
list of trusted CAs for the browsers their employees use?  How many of
them do you know that check each and every browser release for the
inclusion of new CAs?  It's not exactly a signal, but it has similar
effects.

Another item in this space is anti-virus data.  I know of no large scale
organization that breaks apart and manually verifies the virus
signatures provided to it before passing them on to its employees.

> f. it is hard to implement 5011 correctly.
> g. people will continue to ship crap code.
> h. as a result of a combination of (e), (f), and (g), some people won’t be able to enable 5011 support even if it does exist.

f.  It is hard to implement DNSSEC correctly.
g.  people will continue to ship crap code including outdated DNSSEC 
trust anchor information.
h.  as a result of f and g,  some websites will be unreachable that 
should be reachable and vice versa.
>
> And of course (not really an assumption, but),
>
> i. 5011 cannot help in the event of a catastrophic key compromise.
Repeat - NOTHING can help in the event of a compromise of the entire 
root key set except doing a complete and total trust reboot.
>
> The above assumptions leave me questioning the benefit of assuming any roll can or should be treated as “planned”.

I believe that your assumptions are flawed or are missing the data that
would support them.

>
>> 5011 is for the normal update and supercession of keys short of a complete trust reboot.
> I guess this is where I get stuck: I don’t see how we will ever (or even should) get to a point where we see superseding the root key as a ‘normal’ thing.  If we assume people are dependent upon DNSSEC, I see mucking about with the root key as equivalent to juggling with an armed H-bomb: it isn’t something you want to normalize.

Stop saying "THE" root key.  As long as we stick with a single root
key, all of your single key catastrophic predictions will come true.
It's the major vulnerability of the system as currently implemented.

>   
> Regards,
> -drc
>
> P.S. An honest question: how often do root X.509 CAs roll their root keys?
It's kind of irrelevant, but somewhere between 5 and 20 years.  It's 
irrelevant since there are something like 50+ of them in common use (and 
probably another 50 in slightly sparser use).  The compromise of any one 
of them is unlikely to have the same effect as taking out the single 
DNSSEC KSK trust anchor key.

Later, Mike

ps:

One of the more useful analysis methods I've seen for thinking about
things like this is Schneier's attack tree.  First come up with what you
want to do in an attack, next list the possible ways to get there.  For
each of the ways, list what you need to accomplish (either an "or" or an
"and" branch for each level you recurse - e.g. either steal the password
from the guy's wallet, use social engineering, or break the passwd
file).  Then assign cost and probability of success to each branch of
the tree and its subtrees.  Then use the tree to calculate probability
and cost for accomplishing the overall attack.  For "or" branches, you
take the best probability or lowest cost - you may end up calculating
both versions of the tree.  For "and" branches, it's the sum of the costs
of the subtrees and the product of their probabilities.
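
A minimal sketch of that calculation, using the password example.  The
Node class, the two sub-steps under "break the passwd file", and every
cost and probability below are made up purely for illustration:

```python
from dataclasses import dataclass, field
from math import prod
from typing import List

@dataclass
class Node:
    name: str
    kind: str = "leaf"          # "leaf", "or", or "and"
    cost: float = 0.0           # only meaningful on leaves
    prob: float = 0.0           # only meaningful on leaves
    children: List["Node"] = field(default_factory=list)

def best_probability(node: Node) -> float:
    """'or' takes the best child probability; 'and' multiplies them."""
    if node.kind == "leaf":
        return node.prob
    probs = [best_probability(c) for c in node.children]
    return max(probs) if node.kind == "or" else prod(probs)

def lowest_cost(node: Node) -> float:
    """'or' takes the cheapest child; 'and' sums the child costs."""
    if node.kind == "leaf":
        return node.cost
    costs = [lowest_cost(c) for c in node.children]
    return min(costs) if node.kind == "or" else sum(costs)

# Hypothetical numbers, chosen only to show the mechanics.
tree = Node("get the password", "or", children=[
    Node("steal it from the guy's wallet", cost=100, prob=0.125),
    Node("social engineering", cost=50, prob=0.25),
    Node("break the passwd file", "and", children=[
        Node("obtain a copy of the file", cost=200, prob=0.5),
        Node("crack the hash", cost=300, prob=0.75),
    ]),
])

print("best probability:", best_probability(tree))
print("lowest cost:", lowest_cost(tree))
```

The two passes are kept separate because, as with these numbers, the
most probable branch and the cheapest branch need not be the same one.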

The "attack" I listed at the beginning (revoke the root trust anchor
set) has several branches - only one of which is sneaking in the
revocation bit as part of the normal signing.  Other approaches are
brute force, key extraction, compromise of the key backups, etc.

Of course, this doesn't necessarily help if someone comes up with an 
unanticipated attack.  All you can do then is hope your defense in depth 
has mitigated at least some branches of that attack's attack tree.