[gnso-rds-pdp-wg] Apologies, and some reflections on requirements

Sat Jun 25 10:25:26 UTC 2016

Suffice it to say I agree with all of Andrew's points. I had also been feeling uncomfortable with our direction, but had not known the best way to bring it up. Thanks, Andrew, for doing it for me!

-J

On 25/06/2016, 03:03, "gnso-rds-pdp-wg-bounces at icann.org on behalf of Andrew Sullivan" <gnso-rds-pdp-wg-bounces at icann.org on behalf of ajs at anvilwalrusden.com> wrote:

>Dear colleagues,
>
>Apologies first.  I'm not going to be in Helsinki.  I'm in the middle
>of a move from NH back to Toronto, and it turns out that my movers'
>understanding of, "I need to leave on $date," entails arranging things
>such that goods will arrive after $date.  Alas, in this case the goods
>arrive Monday.  I will attempt to follow the ICANN meetings remotely
>next week, but I expect it will be tricky.
>
>I have been deeply dissatisfied with the way the work is going, and I
>believe it is because I see a mismatch between what we are trying to do
>and the kind of system we are trying to do it to.  In particular, I think
>we are trying to treat the RDS as a single monolithic system, and
>attempting to build "requirements" that match that assumption.  Here
>is an effort to sketch why I think that.  I didn't have time to write
>a short note, &c. &c.  Sorry this is long.
>
>Since the very introduction of the competitive-registrar model (and
>arguably before that), the RDS has been a distributed database.  It is
>far less successful than the other distributed database we all know
>and love -- DNS -- but it is nevertheless distributed.
>
>The distribution comes from different parties having various parts of
>the data.  In so-called "thin" registries, this was always the case.
>The registry has names and nameservers and, since the invention of
>registrars, knows who the registrar is.  But if you wanted to know
>certain kinds of data, you had to ask the registrar in question.
>
>Because in (say) 1999-2001 nobody had anything better than the
>whois/rwhois/whois++ protocol(s) to deliver this kind of data, a whole
>bunch of bad compromises got enshrined in policy.  First, we continued
>to use whois and its descendants (anything on port 43) as the model
>for all of this.  The plain fact is that whois was obsolete nearly at
>birth.  It's a terrible protocol, and should be taken behind the ice
>house and put out of its misery.
>
>Second, in order to "fix up" whois, clients were created all over the
>Internet that built in a bunch of assumptions about whom to ask for
>what data.  The consequence of this was that clients routinely got bad
>data as they queried the wrong server.  Old registrar data hung around
>even after a transfer.  When I worked on the org transition from
>Verisign to PIR in 2003 (?), it took a long time before whois clients
>stopped asking Verisign about org data.  And so on.
>
>Third, in an attempt to hack around the above technical flaws in an
>already-obsolete protocol, "thick whois" gained popularity in possibly
>the worst possible arrangement known to data science.  Instead of
>insisting that registries hold the data and that registrars and
>everyone else treat the registry data as The Truth, we created "thick"
>whois in registries _without allowing registrars to stop their
>service_.  Any half-competent database person will tell you that
>storing "the same data" in two places that don't have tight
>connections is an excellent way to create data inconsistency, but is
>not a good way to arrive at the truth.  (Latterly, as though
>illustrating the tendency of people to double down on bad ideas, there
>have been suggestions that ICANN should run the One Giant RDS of the
>Universe and hold all the data in a central place.  What could
>possibly go wrong?)
>
>The thread running through this history of error is the idea that the
>RDS is one system.  But like the DNS, it only appears to be one
>system.  It's actually a "distributed database", where in this case
>the distribution is separable on organization lines.  That is,
>registries -- including ICANN, who can be thought of in this case as
>both the registry and registrar for the root zone -- have some data.
>Registrars have some other data.  Resellers and privacy/proxy services
>have yet other data.  In many cases, the data does not need to be
>shared across these organizational lines to make it queryable by humans.
>
>The reason that isn't clear to most of us is that whois -- the RDS
>we use today -- _was_ designed as a monolithic system.  It was
>designed that way because back when it was created -- RFC 812 is from
>_1982_! -- the database _was_ a monolithic database.  Whois (the
>protocol and the client program) continues to have all the
>deficiencies for distributed use that you might expect of a program or
>protocol designed to talk to exactly one authoritative service.
>Whois++ and rwhois attempted to graft on to this basic protocol some
>distributed operation, but the graft didn't really take and the
>ornamental shrub now looks like a weed.
>
>People have nevertheless internalized the whois-based thinking, which
>is why we keep asking things like, "What data should be collected?"
>In a distributed system like this, that's barely interesting, for the
>commercial interests in this case all militate against collecting data
>that nobody needs for any function.  Instead, we should ask what data
>should be collected _by different actors_.  This implicitly involves
>describing what those actors are doing to require the data.
>
>The nice thing, of course, is that protocol designers have done _a
>lot_ of this work for us, when they were working on RDAP.  They did
>this because they were trying to come up with use cases for the
>protocol, which finally did away with the monolithic-system thinking
>of whois and offers us a protocol designed precisely to work in the
>distributed-database environment that is the actual registration
>system.  That we even still have a work step that involves evaluating
>what protocol we're going to use for all this makes me a little ill.
>
>It seems to me that we can just say that we have to embrace the
>distributed-database fact.  For first, it's a fact of how registration
>actually works now.  If we don't agree with that, I think we should
>give up.  Second, it's consistent with how every single other thing on
>the Internet that has not crashed and burned works.  The Internet
>cannot scale depending on monolithic systems.  And nobody has the
>power to impose one anyway.
>
>Once we have done that, there are still important policy issues about
>what data ought to be collected by anyone, under what conditions they
>might reveal it to someone else (and who that someone else is), and so
>on.  But there are empirical tests for whether some of the answers
>people are proposing really match the distributed nature of the
>system.  If they don't, we can close off those avenues of inquiry,
>because they'll never be productive.
>
>Best regards,
>
>A
>
>
>-- 
>Andrew Sullivan
>ajs at anvilwalrusden.com