[ksk-rollover] Concerns about KSK generation software quality.

Fri Feb 16 19:59:52 UTC 2018

In the "Suggested update" thread, Andres pointed me
(http://mm.icann.org/pipermail/ksk-rollover/2018-February/000385.html)
at the IANA DNSSEC-Keytools KSK generation tools -
https://github.com/iana-org/dnssec-keytools/blob/master/kskgen/kskgen.c

I took a quick look at the code, and much of it scares me.

I suspect that Andres was talking about the pkcs11 CKA_LABEL label,
which turns into a "keypair label" - from the code, this is unique
because (dnssec-keytools/common/pkcs11_dnssec.c):
/*
* CKA_LABEL HACK
* AEP Keyper can only display 7 characters and
* cannot change the HSM internal CKA_LABEL once created.
* So, we label them with a monotonically increasing
* string based on seconds since epoch.
*/

This is (AFAICT), simply an internal label for the key, but is
distinct / unrelated to the DNSSEC keytag (as far as I can see). The
DNSSEC key tag is generated (by dnssec_keytag in
dnssec-keytools/common/dnssec.c:25) purely from the [Flags, Prot, Alg,
PubKey] (RFC4034, Appendix B).

The comments interested me, so I looked a little bit further, and now
I'm filled with unease.

What is concerning to me is that the "void *pkcs11_genrsakey(int
bits,int flags)" function
(dnssec-keytools/common/pkcs11_dnssec.c:1121) tries 10 times to find a
unique keypair label, but if it cannot, throws an error about DNSSEC
keytags:
if(ts > 0) { /* DUPLICATE - try again */
   logger_warning("Found %u duplicate keys labeled \"%s\". Try
again...",ts,label);
   if(++tries < 10) {
      sleep(1);
   goto regen;
   }
logger_error("Can't get a unique DNSSEC tag after %d tries. Giving
up...",tries);
  goto err;
}

Note that this all happens before the actual key generation (~1248).
While this is obviously just a misleading error message (it's not a
DNSSEC tag, it is a key label), it makes me wonder about how carefully
the whole CKA_LABEL HACK bit has been reviewed.

Looking a bit further, the keylabel comes from get_monotonic_str()
(dnssec-keytools/common/pkcs11_dnssec.c:1087). This bit of code seems,
er, interesting.
It starts with:
static time_t t1=0;

I'm surprised by the "static time_t" -  function-local static
variables lead to all sorts to hard to debug side effects. Anyway, I
also cannot understand how the output of this could cause duplicates,
and so don't understand why this is being tried 10 times - if there
**is** already a keypair with this label, surely this is evidence of
much larger issue, and simply trying again is not a good idea?

A quick look also shows a bunch (13) of gotos, including at least 3
instances of "goto err;" without any useful sort of log message being
printed (e.g: if((rv=pfl->C_FindObjects(sh,hKeys,PKCS11_MAX_KEYS_PER_SLOT,&ts))
!= CKR_OK) goto err;   -- this will fail with no output as to why.)

A *very* quick skim of the source code also shows all sorts of typoes:
if(t1) { /* since we are dropping the last base32 char, wait to
gurantee uniqueness */    ---  guarantee

The CKA_LABEL is based on a time based sting for uniquness.      ---
string, uniqueness

logger_error("Key generaton failed");   --- generation

* create a CSR - a simple private key proof of possesion   ---- possession

\return NULL if failed; otherwise newlly crated key record associated
with pk. --- newly created

/* setup computation for DS rcords */    --- records

/*! check to see if we have access to the private key in this key
strust    --- struct?

/*! log in to HSM slot referrenced by pk   ---  referenced

/* get correspoding private key if possible */   --- corresponding

While typos in error messages and comments are (clearly) not going to
cause breakage, it does make me concerned that this code, which
generates the single most critical DNSSEC keys, has not received
sufficient careful review. These sorts of obvious typos are not
themselves a problem, but rather indicative of a larger issue. While
the obvious retort is "It is publicly posted in GitHub, and we asked
the community to have a look. Send pull requests!", it appears that
this hasn't resulted in enough review. I used to program C, but it has
been long enough that now I'm only qualified to say "Well, that
doesn't seem right!" - how do we get (more) review from people who are
better qualified?

The KSK ceremonies provide important transparency - how can we get
more review of the tools used in the ceremonies? Having people fly in,
and then having the tools fail, or, worse yet, questions about the
correctness of the result does not seem good.

W

-- 
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf