[ksk-change] planned vs. emergency (was Re: [ksk-rollover] root zone KSK ...)

Michael StJohns msj at nthpermutation.com
Mon Sep 22 04:02:52 UTC 2014

On 9/21/2014 10:55 PM, David Conrad wrote:
> Mike,
> [snipping for brevity, not necessarily agreement]
> On Sep 21, 2014, at 4:54 PM, Michael StJohns <msj at nthpermutation.com> wrote:
>>> a. for all intents and purposes, the likelihood of _any_ compromise/loss of the root key is statistically equivalent.
>> Bad assumption. [...]  So if you have a 1 in a million chance to compromise one key, and the probability is similar between all keys, the probability of compromising the SYSTEM if you're using 2 keys is 1 in a trillion and 1 in a quintillion for 3 keys.
> If the risk of compromise is constrained to key handling (which I gather you are assuming), given the processes and frequency of key use, then I will again assert that the probability of compromise of 1, 2, and 3 keys is, pragmatically speaking, statistically equivalent and essentially zero. Even a 1-in-a-million failure rate when you exercise the system once a quarter is beyond any reasonable timeframe that we might consider.

*sigh* There is a non-zero chance of compromise of any given key. See 
for example the CA compromises of 2011.  Those compromises generally 
take a non-zero time to accomplish, so you mostly have some time between 
the exploit being public and the possible compromise of the next key.  
Except that we have no next key (nor did most of the CAs, who either 
issued a new root and re-signed all the subordinates or went bankrupt).
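
As a rough sketch of the arithmetic both sides are arguing over, the
figures can be reproduced as below. It assumes compromises are
independent events, which is exactly the point in dispute, and uses
the thread's illustrative 1-in-a-million figure. The function names
are made up for this example.

```python
from fractions import Fraction


def all_keys_compromised(per_key: Fraction, keys: int) -> Fraction:
    """Probability that every one of `keys` keys is compromised.

    Assumes the compromises are independent (an assumption, not a fact).
    """
    return per_key ** keys


def any_compromise(per_event: Fraction, events: int) -> Fraction:
    """Probability of at least one compromise across independent events."""
    return 1 - (1 - per_event) ** events


# Illustrative figure from the thread, not a measured rate.
P = Fraction(1, 10**6)

for n in (1, 2, 3):
    print(n, "key(s), all compromised:", all_keys_compromised(P, n))

# One key ceremony per quarter for ten years is 40 uses of the process.
print("single key, 10 years of quarterly use:", float(any_compromise(P, 40)))
```

If the compromises share a cause (the same HSM model, the same
process flaw), the independence assumption fails and the multi-key
numbers are far too optimistic.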

> Out of curiosity, how do you deal with the keyset size issue?

You mean the over-the-air size of the keyset? You live with the 
occasional need to fall back to TCP.  Or you move from RSA to EC to 
reduce the byte size of the key set while increasing the number of 
keys.  That has to be done at some point.
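
To put rough numbers on the RSA versus EC trade-off, here is a
back-of-the-envelope size estimate for a root DNSKEY RRset plus one
RRSIG. The assumptions are mine, not from the thread: "EC" is taken
to mean ECDSA P-256 (64-byte public key, 64-byte signature), RSA
keys use the 3-byte exponent 65537, and the DNS message header,
question section and any EDNS data are left out. The key mixes are
hypothetical.

```python
# Per-record overhead at the root: owner name (1 byte for "."),
# type (2), class (2), TTL (4), rdlength (2).
RR_OVERHEAD_ROOT = 1 + 2 + 2 + 4 + 2

# DNSKEY RDATA fixed fields: flags (2), protocol (1), algorithm (1).
DNSKEY_FIXED = 2 + 1 + 1

# RRSIG RDATA fixed fields (18 bytes) plus a 1-byte root signer name.
RRSIG_FIXED = 18 + 1


def rsa_dnskey_rdata(modulus_bits: int, exponent_bytes: int = 3) -> int:
    """DNSKEY RDATA size for RSA: 1-byte exponent length, exponent, modulus."""
    return DNSKEY_FIXED + 1 + exponent_bytes + modulus_bits // 8


# ECDSA P-256 public key is 64 bytes (assumed curve, see lead-in).
P256_DNSKEY_RDATA = DNSKEY_FIXED + 64


def keyset_bytes(key_rdata_sizes, signature_sizes) -> int:
    """Approximate wire size of the DNSKEY records plus their RRSIGs."""
    keys = sum(RR_OVERHEAD_ROOT + size for size in key_rdata_sizes)
    sigs = sum(RR_OVERHEAD_ROOT + RRSIG_FIXED + size for size in signature_sizes)
    return keys + sigs


# Hypothetical mixes: one RSA-2048 key versus three P-256 keys,
# each set signed once.
one_rsa = keyset_bytes([rsa_dnskey_rdata(2048)], [256])
three_ec = keyset_bytes([P256_DNSKEY_RDATA] * 3, [64])

print("1 x RSA-2048 keyset:", one_rsa, "bytes")
print("3 x P-256 keyset:  ", three_ec, "bytes")
```

Under these assumptions three EC keys take less space on the wire
than a single RSA-2048 key, which is the "more keys, fewer bytes"
trade described above.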

>> OTOH dealing with less than catastrophic single key compromises seems to be well within the possibility of automated and secure and is exactly what 5011 was designed to accomplish.
> You appear to have much more faith than I that code will operate as intended when it is exercised with a frequency appropriate to dealing with critical infrastructure.
> Or do you believe we should revise the key handling policies and processes to roll _much_ more frequently?

I'd suggest 1-2 years.  Or basically once every 4-8 times you do a ZSK 

>>> c. touching the root key for any reason increases the probability of catastrophic failure/compromise by an infinitesimal but non-zero amount.
>> No.  Touching the *only* root key does that. Touching one root key where the others are locked away decouples the fate of the system from the fate of the key.
> It does NOT fully decouple the fate of the system since you have to begin using the locked away key, which means you have to exercise (likely little used) processes and place that formerly-locked away key into use.  All of that increases risk “by an infinitesimal but non-zero amount”.

Until the safe is opened and the key is placed into service it does not 
share fate.  Once in service it *may* share fate depending on the 
configuration of the processes.   Ideally, if the compromise was an 
attack that can be mitigated, then the new key shares the benefit of 
that mitigation.

>>> d. changing the root key of the DNS is and will continue to be an infrequent event (both because of (c) but more likely the PITA-ness of changing the key).
>> This is a circular argument: we won't change the key, because we haven't changed the key because it's painful to change the key so we won't change the key.
> No. We won’t change the key frequently because (a) there is no operational reason that forces the key to change, (b) there is a risk — no matter how slight — that we might screw up, (c) it is expensive and time consuming to drag the necessary people into the secure facilities to spend the 2+ hours necessary to do the key handling appropriately, and (d), it is likely that rolling the key _will_ break things, the only question is how much and who will be affected.
> 5011, at least theoretically, gives us the ability to roll the key frequently and/or for non-critical reasons. However, given operational realities, it isn’t clear to me that is necessary/useful/helpful. Since we have to deal with a “full trust reboot” and that provides a superset of functionality to 5011, I’m still unclear as to why we care about 5011.

I think you're underestimating by perhaps several orders of magnitude 
the cost of a "full trust reboot".   Either that or the cost isn't 
important because you expect we'll never do one.

If you're going to do a full manual trust reboot in the next year - it 
should be interesting to see a) how long it takes to get the new key 
out, and b) what happens when you revoke the old one.  (Or if you're 
testing the emergency version, how bad it gets when you revoke the 
existing key before having the next one deployed.)

Or are you going to wait to do the full manual trust reboot until 
something happens - hopefully after we're all dead?

If the end result of the discussion is already predisposed to be one of 
these results, I literally have nothing else to contribute. I think 
they're both bad ideas, and based on a flawed risk/reward analysis.

>>> P.S. An honest question: how often do root X.509 CAs roll their root keys?
>> It's kind of irrelevant, but somewhere between 5 and 20 years.
> I’ve been told (informally) the X.509 root CAs do not roll their root keys, period. It might be useful to get an authoritative answer on that question.

*sigh* Language issues.  Generally what happens is that a new CA (using 
the same or a new CA instance name) is placed into service using a new 
key pair.  No further signatures are made with the old key and the old 
CA certificate remains in the various browsers until revoked, removed 
from distribution or the CA certificate expires. The new CA certificate 
is distributed to the browsers several years or so in advance of the 
need to depend upon it.   You can see this pattern by examining the 
browser CA lists, and looking at what CAs are signing which servers over 
time.  The largest group of these was the replacement of the old 1024-bit 
RSA keys.

There's also the occasional re-sign of a self-signed CA certificate 
(changing the validity time without changing keys or other contents of 
the CA certificate).  The new certificate is basically chained to the 
old certificate and replaces the old one in the browser CA trust store 
when it's seen.

>> It's irrelevant since there are something like 50+ of them in common use
> If you’re one of those CAs and your root’s private key is compromised, the fact that you have 49 competitors is unlikely to be much of a consolation.  The point being that from the perspective of the CA, the loss of the key is an existential risk and your policies and processes are designed to deal with that risk. I see some parallels in the handling of the root key. It might be useful to understand how the CAs deal with that risk.

They mostly don't, or aren't prepared for it.  DigiNotar being a prime 
example and bankruptcy the result.  Depending on the risk structure and 
CPS of the CA, a compromise of the key material might result in a new CA 
key being created, and all the subordinate CA certificates of the old CA 
being re-signed.  That has the nice benefit that the addition of the new 
CA to the CA cert stores rehabilitates potentially millions of certificates.

The comment about irrelevancy of the CA model is that none of these are 
universal global roots of trust.  They compete and mostly that causes 
really interesting interoperability problems.  Failure of one of them is 
not going to have the universal/broad impact that the failure of the 
single DNSSEC root of trust would have.

> Regards,
> -drc

Going back to trust reboot - think about the timeline used for the 
original key creation and signing ceremonies.  Pretend a compromise 
happens "now".  How long until DNSSEC is back up using the trust reboot 
process?  Oh yeah - the compromise happened because the HSM you're 
using was found to be insecure.  Ready,..... GO!

By the way, your attacker *is* using 5011.  Since he now has access to 
the trust anchor private key, he's using it to place new trust anchors 
for the BOA, Google and the IRS local resolvers by intercepting and 
replacing root zone queries.
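
For a sense of the timing involved: as I read RFC 5011, a resolver
does not trust a newly seen key immediately. It waits out an add
hold-down time of 30 days or the original TTL of the DNSKEY RRset
that first carried the key, whichever is greater. A minimal sketch
of that calculation follows; the function name and the dates are
hypothetical, and this is not a full 5011 state machine.

```python
from datetime import datetime, timedelta, timezone

# Add hold-down floor as described in RFC 5011 (my reading of the RFC).
ADD_HOLD_DOWN = timedelta(days=30)


def earliest_trust_time(first_seen: datetime, original_ttl_seconds: int) -> datetime:
    """Earliest time a resolver could start trusting a newly seen key.

    Uses the greater of the 30-day hold-down and the original TTL of
    the DNSKEY RRset in which the key first appeared.
    """
    hold = max(ADD_HOLD_DOWN, timedelta(seconds=original_ttl_seconds))
    return first_seen + hold


# Hypothetical example: key first seen on the date of this message,
# in an RRset with a two-day TTL.
seen = datetime(2014, 9, 22, tzinfo=timezone.utc)
print(earliest_trust_time(seen, 172800).date())
```

The same hold-down applies whether the new key comes from the
legitimate operator or from someone holding a compromised trust
anchor, which is why the clock matters in both the planned and the
emergency case.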

