[CWG-Stewardship] Fwd: Re: [DTA - SLE] SLE Document with clarifying background info.

Andrew Sullivan ajs at anvilwalrusden.com
Sat May 9 21:29:07 UTC 2015


On Sat, May 09, 2015 at 11:06:14AM +0000, James Gannon wrote:
> Without reference to this specific situation as I am not familiar enough I would tend to somewhat disagree that its prudent engineering to maintain the status quo.

But we're not maintaining the status quo.  We're changing something
else.  In this case, we're changing the party monitoring performance,
and I'm suggesting that we should not change _at the same time_ the
things to be monitored.  That's because right now, there are certain
performance targets that are regularly exceeded.  After the cutover, if
the performance targets themselves will not have changed, it will be a
simple matter to compare before and after, and see whether any
performance changed.  If at the same time we change the targets
themselves, then evaluating conformance to the service levels pre- and
post-transition will be harder.

> Continuous improvement is a key part of any well run and functioning engineering system.

We agree about this, but it doesn't mean that one makes every change
at the same time.  Indeed, the fact of _continuous_ improvement ought
to mean that you can do small increments of change all the time.  I am
suggesting that would be appropriate in this case.

> If there is a possibility to improve a system or a process, and the risks have been quantified and assessed, then it is prudent to do so in many cases. 

This sentence is very near to being an analytic truth, but it's not
relevant to the present case, where nobody has done (as near as I can
tell) _any_ quantification of risks to the system by changing lots of
things at once.  Every one of these changes needs to be reviewed in
depth, and the more things we change the more we have to review.  We
_lower_ the risk by changing as little as possible, and there is no
reason to suppose that we could not in a subsequent effort alter the
service levels themselves.

I'm sure someone will argue that the quantification comes from the
fact that IANA is vastly exceeding these service level commitments
already.  If that is true, then there is no reason not to make the
change after the transition, too, because IANA is hardly going to
object to changing their service levels to be in line with what they
actually do.  (Indeed, the IETF's experience here is that IANA is
entirely responsive to such adjustments.)

> In this case we seem to have a system which is run well enough that it is outperforming its current requirements, so unless there is a definite risk to the system I don't see why it's not a good time to examine this.

That puts the burden of proof exactly the wrong way around.  To me,
part of the reason we've ended up with such a complicated and
difficult-to-review proposal is that many seem to want to use this
opportunity to make unrelated changes that they want.  That is a
mistake if we want a successful transition.

> If anything it's a credit to the IANA and its running to be able to consider tightening the operating parameters and tolerances of the system. 
>

That's irrelevant.  The point is that a stable transition depends on
changing the minimum we need to get the transition to happen.  We're
already making many changes.  Why add this?

> As an analogy if you're running a datacenter and have data to show you're providing five nines of availability (99.999%) you generally wouldn't continue to only offer a 3 nines SLA (99.9%.

To begin with, the reasonableness of the new performance commitments
is not under dispute.  I think these are fine proposals for targets.
I was disputing the claim that the existing ones are not fit for
purpose, however.  That is a strong claim, and I think it needs at
least a little bit of argument other than "IANA is exceeding these."
The purpose of the service commitments is to ensure stability of the
registration system on the Internet.  There's a claim that the current
commitments are not fit to do that, and I haven't seen a single
argument to that effect.  So, the analogy is just irrelevant to this
question.

But anyway, why wouldn't I leave the three-nines offering in place?
Adding the additional-nine commitment exposes me to a risk that I am
not currently exposed to.  I go from being able to have outage time of
more than 8 hours a year without incurring penalties to being
vulnerable to penalties that kick in after less than an hour.  That's
real exposure, and for customers that demand 4 or 5 nines I will
charge a higher price.  I might run the datacentre as a five-nines
centre, but I won't offer that commitment to anyone but the premium
customers.  If I did otherwise, it would be irresponsible.
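
As a rough check on those outage figures, here is a minimal sketch of
the arithmetic.  It assumes a 365-day year and that the commitment is
measured over the whole year; the function name is only illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored (assumption)


def allowed_downtime_hours(availability_percent: float) -> float:
    """Yearly outage budget, in hours, for a given availability commitment."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9),
                       ("four nines", 99.99),
                       ("five nines", 99.999)]:
        hours = allowed_downtime_hours(pct)
        print(f"{label} ({pct}%): {hours:.3f} hours, "
              f"or {hours * 60:.1f} minutes, per year")
```

Under those assumptions, three nines allows about 8.76 hours of outage
a year, four nines a little under 53 minutes, and five nines a little
over 5 minutes.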

> Instead you'll show the progress and stability you've increased by, and you'll offer 4 nines (99.99%) to your customers.
>

If I did that without getting either more money from customers or showing
that I could attract or keep customers this way, then I should be
fired.

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com

