[gnso-rds-pdp-wg] Apologies, and some reflections on requirements

Sat Jun 25 11:47:12 UTC 2016

This is extremely interesting and important information. Could you please explain, in technical terms, how registrants and users might access such a distributed database, whether to modify or correct their data or simply to view it?

Thank you in advance,

Nathalie


On Jun 25, 2016, at 6:25 AM, James Gannon <james at cyberinvasion.net> wrote:

> Suffice it to say I agree with all of Andrew's points. I had also been feeling uncomfortable with our direction, but had not known the best way to bring it up. Thanks, Andrew, for doing it for me!
> 
> -J
> 
> On 25/06/2016, 03:03, "Andrew Sullivan" <ajs at anvilwalrusden.com> wrote:
> 
>> Dear colleagues,
>> 
>> Apologies first.  I'm not going to be in Helsinki.  I'm in the middle
>> of a move from NH back to Toronto, and it turns out that my movers'
>> understanding of, "I need to leave on $date," entails arranging things
>> such that goods will arrive after $date.  Alas, in this case the goods
>> arrive Monday.  I will attempt to follow the ICANN meetings remotely
>> next week, but I expect it will be tricky.
>> 
>> I have been deeply dissatisfied with the way the work is going, and I
>> believe it is because I see a mismatch between what we are trying to do
>> and the kind of system we are trying to do it to.  In particular, I think
>> we are trying to treat the RDS as a single monolithic system, and
>> attempting to build "requirements" that match that assumption.  Here
>> is an effort to sketch why I think that.  I didn't have time to write
>> a short note, &c. &c.  Sorry this is long.
>> 
>> Since the very introduction of the competitive-registrar model (and
>> arguably before that), the RDS has been a distributed database.  It is
>> far less successful than the other distributed database we all know
>> and love -- DNS -- but it is nevertheless distributed.
>> 
>> The distribution comes from different parties having various parts of
>> the data.  In so-called "thin" registries, this was always the case.
>> The registry has names and nameservers and, since the invention of
>> registrars, knows who the registrar is.  But if you wanted to know
>> certain kinds of data, you had to ask the registrar in question.
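To make the thin-registry split concrete, here is a minimal Python sketch. It assumes hypothetical in-memory tables (`THIN_REGISTRY`, `REGISTRARS`) standing in for what a real registry and its registrars would serve over whois or RDAP. The function name `thin_lookup` is likewise illustrative. The only point is that answering one question takes two queries to two different organizations.

```python
# Hypothetical, in-memory stand-ins for a thin registry and two registrars.
# Real operators expose this data over whois/RDAP, not Python dicts.
THIN_REGISTRY = {
    "example.com": {
        "nameservers": ["ns1.example.net", "ns2.example.net"],
        "registrar": "registrar-a",
    },
}

REGISTRARS = {
    "registrar-a": {"example.com": {"registrant": "Example Holder"}},
    "registrar-b": {},
}


def thin_lookup(domain: str) -> dict:
    """Two-step lookup: ask the registry, then ask the registrar it names."""
    reg_record = THIN_REGISTRY.get(domain)
    if reg_record is None:
        raise KeyError(f"{domain} is not registered")
    # The registry only knows *which* registrar; contact data lives there.
    contact = REGISTRARS[reg_record["registrar"]].get(domain, {})
    return {**reg_record, **contact}


print(thin_lookup("example.com"))
```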
>> 
>> Because in (say) 1999-2001 nobody had anything better than the
>> whois/rwhois/whois++ protocol(s) to deliver this kind of data, a whole
>> bunch of bad compromises got enshrined in policy.  First, we continued
>> to use whois and its descendants (anything on port 43) as the model
>> for all of this.  The plain fact is that whois was obsolete nearly at
>> birth.  It's a terrible protocol, and should be taken behind the ice
>> house and put out of its misery.
>> 
>> Second, in order to "fix up" whois, clients were created all over the
>> Internet that built in a bunch of assumptions about whom to ask for
>> what data.  The consequence of this was that clients routinely got bad
>> data as they queried the wrong server.  Old registrar data hung around
>> even after a transfer.  When I worked on the .org transition from
>> Verisign to PIR in 2003 (?), it took a long time before whois clients
>> stopped asking Verisign about .org data.  And so on.
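The stale-referral problem can be sketched the same way. This illustrative Python example uses only hypothetical names and data. It compares two clients after a transfer:

- A client that hard-codes whom to ask keeps getting the losing registrar's leftover record.
- A client that asks the registry for the current sponsor each time gets the current record.

```python
# Hypothetical registry table: which registrar sponsors each domain right now.
registry = {"example.com": "registrar-a"}

# Hypothetical registrar databases. The losing registrar's old record is
# never cleaned up, so stale data lingers there after a transfer.
registrar_data = {
    "registrar-a": {"example.com": "old contact (stale)"},
    "registrar-b": {"example.com": "current contact"},
}

# A client that decided once "whom to ask" and never rechecks.
client_cache = {"example.com": "registrar-a"}


def transfer(domain: str, new_registrar: str) -> None:
    """Only the registry's sponsorship record changes on transfer."""
    registry[domain] = new_registrar


def cached_lookup(domain: str) -> str:
    """Ask whichever registrar the client remembers (may be wrong)."""
    return registrar_data[client_cache[domain]][domain]


def referral_lookup(domain: str) -> str:
    """Ask the registry who the registrar is, then ask that registrar."""
    return registrar_data[registry[domain]][domain]


transfer("example.com", "registrar-b")
print(cached_lookup("example.com"))    # old contact (stale)
print(referral_lookup("example.com"))  # current contact
```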
>> 
>> Third, in an attempt to hack around the above technical flaws in an
>> already-obsolete protocol, "thick whois" gained popularity in what is
>> possibly the worst arrangement known to data science.  Instead of
>> insisting that registries hold the data and that registrars and
>> everyone else treat the registry data as The Truth, we created "thick"
>> whois in registries _without allowing registrars to stop their
>> service_.  Any half-competent database person will tell you that
>> storing "the same data" in two places that don't have tight
>> connections is an excellent way to create data inconsistency, but is
>> not a good way to arrive at the truth.  (Latterly, as though
>> illustrating the tendency of people to double down on bad ideas, there
>> have been suggestions that ICANN should run the One Giant RDS of the
>> Universe and hold all the data in a central place.  What could
>> possibly go wrong?)
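Here is a minimal sketch of why two loosely connected copies drift apart. It assumes a hypothetical `Store` type for a thick registry and a registrar. Each store holds registrant data, and nothing synchronizes them. All names and values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A hypothetical holder of registrant data (registry or registrar)."""
    name: str
    records: dict = field(default_factory=dict)


thick_registry = Store("thick registry")
registrar = Store("registrar")

# At registration time the same data is written to both places
# (a fresh dict per store, so they are independent copies).
for store in (thick_registry, registrar):
    store.records["example.com"] = {"email": "old@example.net"}

# Later the registrant updates their email at the registrar. Nothing couples
# the two stores, so the registry copy silently keeps the old value.
registrar.records["example.com"]["email"] = "new@example.net"


def consistent(domain: str, *stores: Store) -> bool:
    """True only if every store holds an identical record for the domain."""
    first = stores[0].records.get(domain)
    return all(s.records.get(domain) == first for s in stores[1:])


print(consistent("example.com", thick_registry, registrar))  # False
```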
>> 
>> The thread running through this history of error is the idea that the
>> RDS is one system.  But like the DNS, it only appears to be one
>> system.  It's actually a "distributed database", where in this case
>> the distribution is separable along organizational lines.  That is,
>> registries -- including ICANN, who can be thought of in this case as
>> both the registry and registrar for the root zone -- have some data.
>> Registrars have some other data.  Resellers and privacy/proxy services
>> have yet other data.  In many cases, the data does not need to be
>> shared across these organizational lines to make it queryable by humans.
>> 
>> The reason that isn't clear to most of us is that whois -- the RDS
>> we use today -- _was_ designed as a monolithic system.  It was
>> designed that way because back when it was created -- RFC 812 is from
>> _1982_! -- the database _was_ a monolithic database.  Whois (the
>> protocol and the client program) continues to have all the
>> deficiencies for distributed use that you might expect of a program or
>> protocol designed to talk to exactly one authoritative service.
>> Whois++ and rwhois attempted to graft onto this basic protocol some
>> distributed operation, but the graft didn't really take and the
>> ornamental shrub now looks like a weed.
>> 
>> People have nevertheless internalized the whois-based thinking, which
>> is why we keep asking things like, "What data should be collected?"
>> In a distributed system like this, that's barely interesting, for the
>> commercial interests in this case all militate against collecting data
>> that nobody needs for any function.  Instead, we should ask what data
>> should be collected _by different actors_.  This implicitly involves
>> describing what those actors are doing to require the data.
>> 
>> The nice thing, of course, is that protocol designers have done _a
>> lot_ of this work for us when they were working on RDAP.  They did
>> this because they were trying to come up with use cases for the
>> protocol, which finally did away with the monolithic-system thinking
>> of whois and offers us a protocol designed precisely to work in the
>> distributed-database environment that is the actual registration
>> system.  That we even still have a work step that involves evaluating
>> what protocol we're going to use for all this makes me a little ill.
>> 
>> It seems to me that we can just say that we have to embrace the
>> distributed-database fact.  First, it's a fact of how registration
>> actually works now.  If we don't agree with that, I think we should
>> give up.  Second, it's consistent with how every single other thing on
>> the Internet that has not crashed and burned works.  The Internet
>> cannot scale depending on monolithic systems.  And nobody has the
>> power to impose one anyway.
>> 
>> Once we have done that, there are still important policy issues about
>> what data ought to be collected by anyone, under what conditions they
>> might reveal it to someone else (and who that someone else is), and so
>> on.  But there are empirical tests for whether some of the answers
>> people are proposing really match the distributed nature of the
>> system.  If they don't, we can close off those avenues of inquiry,
>> because they'll never be productive.
>> 
>> Best regards,
>> 
>> A
>> 
>> 
>> -- 
>> Andrew Sullivan
>> ajs at anvilwalrusden.com
>> _______________________________________________
>> gnso-rds-pdp-wg mailing list
>> gnso-rds-pdp-wg at icann.org
>> https://mm.icann.org/mailman/listinfo/gnso-rds-pdp-wg