[gnso-rds-pdp-wg] Five models of RDS (was Re: Apologies, and some reflections on requirements)

Thu Jun 30 19:32:45 UTC 2016

Thanks, Andrew.  This is VERY helpful.

Model V is the one I'd build if I weren't so concerned about the plethora of local privacy laws and law enforcement regimes.  Once we have a single repository owned by ICANN, we have a single entity which may be pressured by law enforcement or government agencies world-wide to divulge PII. Where that repository resides is also of concern due to issues of jurisdiction.  Lastly, the ownership of the repository and the operation of the associated web services must fall upon ICANN with all the cost and SLA concerns already raised by the community.

For these reasons, I prefer a nonfederated approach such as Model IV.

/marksv

-----Original Message-----
From: gnso-rds-pdp-wg-bounces at icann.org [mailto:gnso-rds-pdp-wg-bounces at icann.org] On Behalf Of Andrew Sullivan
Sent: Thursday, June 30, 2016 7:35 PM
To: gnso-rds-pdp-wg at icann.org
Subject: [gnso-rds-pdp-wg] Five models of RDS (was Re: Apologies, and some reflections on requirements)

Hi,

Reading further in the thread, I realise that perhaps not everything that I said was perfectly clear.  I'll respond to some other mails downthread, but before I do that I want to make sure we're all talking about the same thing.  Some of this explanation is in some of the background material we have, but the relation to what I'm talking about obviously isn't.  So here's some more explanation.  This may be a little tedious for those already familiar with the history, but it seems better to lay this out in more detail so that it's clear what we're talking about.

When I respond to other mails, I'll refer to these "Model I" through "Model V" descriptions, because these models are what I was thinking about when I wrote my earlier note.

On Fri, Jun 24, 2016 at 10:03:33PM -0400, Andrew Sullivan wrote:

> Since the very introduction of the competitive-registrar model (and 
> arguably before that), the RDS has been a distributed database.  It is 
> far less successful than the other distributed database we all know 
> and love -- DNS -- but it is nevertheless distributed.

I have a feeling that people have conflated the "distributed" and "federated" discussion with some other points I was trying to make (and indeed, I'm not sure I was totally clear).  So let me try to walk through a bunch of different ways that any RDS can work.

I know that some people find diagrams easier, so I've tried to create some.  I'm sorry I'm so bad at diagrams.  Since email is a lousy way to inline diagrams, I'll make some references to some external diagrams I've shared.

MODEL I

Consider
<https://docs.google.com/drawings/d/1KCLGP8pClFTLoGp1uqVF-ao80NRP3QAxkkies5h3NDQ/edit?usp=sharing>.
This is an approximate picture of the very earliest registration and RD systems.  You registered a name with the NIC.  The NIC also prepared the NICNAME directory, which was originally literally a document, published on paper.  Oral histories tell me that the very earliest version of the "store" was a piece of paper kept in Jon Postel's pocket, but how true that is I don't know (I was certainly not involved at the time, since I was in senior elementary school and didn't have a connection to the Internet).

There appear to have been small iterations on this model, but the central features are that, for any given registry (or registry operator, in fact), there is a single point of registration and a single source of data for any registration data service.  I've called this Model I.  To be clear, in this model there _could_ be multiple data sources for all registration data, because (for example) some country code TLDs ran their registries separately using this model.  In that iteration of the model, whois clients had built-in lists of whois servers to consult.

MODEL II

The big change in the early whois evolution was the addition of rwhois and friends, which corresponded to the period in which registrars were added to the registration systems.  I drew a diagram at <https://docs.google.com/drawings/d/1keKN1__qoboMQ2vEmsY6bnnUpLktQmF1GtjJ8hrMSoI/edit?usp=sharing>.
This picture is the "thin registry" model.  I've called it Model II.
There are several features to note here.

First, notice that the registration path involves a convolution.  The registrant sends data to the registrar.  The registrar has a bunch of needs for data, some of which are necessary for the business processes of the registrar and some of which are necessary for the functioning of the domain name.  The registrar stores some or all of that data in a local registrar store.  Also, such data as is needed for the registry is either passed through to the registry (the dotted line) or else copied from the registrar store to the registry (the solid line), using whatever protocol the registry used (some sort of registry-registrar protocol, or else EPP).  The registry then also stores some data -- the data for registration (which is usually just the name, the registrar, and the name servers and necessary glue if provided).

Second, in order to support this mode of working, whois had to change.
The naïve use of a whois query (whois $domainname) required some adjustments.  A client needs to know which server to start querying for a given domain name.  (In general, even today, this is a static list compiled into the whois client, though many clients allow you to specify an alternate.  In the earliest days, this feature was not widely deployed, which is how so much stale data "leaked out" -- people would query the wrong server, and get its answer.)  When the registry whois server replied, it would provide a referral to the registrar's whois.  Then the client would additionally query the registrar's whois, and get data like the name and contact of the registrant and so on.  The client would assemble this somehow, and then present it to the user.  Note that this last step might not be using the whois protocol -- in particular, most of the "web whois"
systems are basically a user interface via http(s) to a port 43 whois client.
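
The lookup sequence in the paragraph above (pick a starting server from a static list, ask the registry, follow the referral to the registrar, assemble the result) can be sketched roughly as follows.  This is an illustration only: the two "servers" are in-memory dictionaries rather than real port 43 services, and every server name, field name, and record is invented for the example.

```python
# Hypothetical stand-ins for whois servers; no network traffic is involved.
# The static TLD-to-server list plays the role of the list compiled into a client.
TLD_SERVERS = {"example": "whois.registry.example"}

SERVERS = {
    "whois.registry.example": {
        "name.example": {
            "registrar": "Registrar A",
            "nameservers": ["ns1.name.example"],
            "referral": "whois.registrar-a.example",
        },
    },
    "whois.registrar-a.example": {
        "name.example": {
            "registrant": "Example Registrant",
            "contact": "admin@name.example",
        },
    },
}


def query(server, domain):
    """Stand-in for one whois round trip: return that server's record, if any."""
    return SERVERS.get(server, {}).get(domain)


def thin_lookup(domain):
    """Ask the registry first, then follow its referral and merge both answers."""
    tld = domain.rsplit(".", 1)[-1]
    registry_record = query(TLD_SERVERS[tld], domain)
    if registry_record is None:
        return None
    # Keep the registry's data, but not the referral instruction itself.
    result = {k: v for k, v in registry_record.items() if k != "referral"}
    referral = registry_record.get("referral")
    if referral:
        registrar_record = query(referral, domain)
        if registrar_record:
            result.update(registrar_record)
    return result
```

Calling thin_lookup("name.example") returns one dictionary that combines the registry's fields with the registrar's, which is the "assembled by the client" step; a real client would have to do the same with free-form text.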

As an aside, it's worth noting that the "referral" part of this system didn't work that well, mostly because of the whois protocol.  Whois was designed to be read by humans, so the protocol is dead stupid:
connect to port 43, send a string, and receive a bunch of strings back.  There is no data format whatsoever, and therefore when you wanted to communicate something machine-readable in the data that came back (like, for instance, "here's a referral"), there was no reliable way to do it.  Instead, the client had to parse the response and figure out which parts were supposed to be instructions to the client instead of something the human should see.  This is perhaps the least-reliable way to make a protocol ever, and it didn't work very well.  The many efforts to standardise whois output formats within ICANN have all been gross hacks on top of this basic problem: if you make the output format sufficiently consistent, machines get better at processing that output.  (This is why rwhois got better over time -- see below.)
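
To make the problem concrete, here is a rough sketch of a whois exchange and of the guesswork needed to find a referral in the reply.  The first function does what the paragraph describes: connect to port 43, send one line, read until the server closes the connection.  The second has to pattern-match free text; the label spellings in the pattern are assumptions about what some servers print, not part of any protocol, and the function names are made up for the example.

```python
import re
import socket


def whois_query(server, query, port=43, timeout=10.0):
    """Send one query line to a whois server and return all the text it sends back."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed label spellings. Servers differ, which is exactly the problem.
REFERRAL_PATTERN = re.compile(
    r"^\s*(?:Registrar WHOIS Server|Whois Server|ReferralServer)\s*:\s*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_referral(response_text):
    """Guess a referral host from free-form text; None if no known label matches."""
    match = REFERRAL_PATTERN.search(response_text)
    if match is None:
        return None
    # Some servers print a URL-like value (scheme://host:port); keep only the host.
    host = re.sub(r"^[a-z]+://", "", match.group(1), flags=re.IGNORECASE)
    return host.split(":")[0].rstrip("/")
```

Any server that spells its referral line differently falls through to None, and the client has no way to tell "no referral" from "referral I did not recognise".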

Third, because a client can initiate a query directly against a whois server, the client could ask a registrar who is no longer the actual registrar for a name about that name.  This can result in "wrong"
data, because the registrar might have lost the registration in the past and have old data hanging around.  This was at one time a major problem, and an awful lot of policy that exists today is quite obviously an attempt to solve this problem (which is as a matter of fact no longer really that interesting -- the technical reality has made some of that policy obsolete, but people continue to insist on solving a problem they had in 1999).

Note that this model is the beginning of what I was describing as a "distributed system".  Data comes from multiple sources, and it is assembled by the end client into a whole that answers the initial query.

Finally, note that I've drawn this with common data stores for registration and whois at the registry and registrar.  In implementation, the actual database systems that the whois servers use might be different than the one the registration servers use, and there might even be some transformation (one system I worked on, for instance, precompiled whois answers so that the whois server went faster).  But the point is that the data store is operated by the same operator as the registration database (i.e. the operator of the RDS is the operator of the server that accepts the incoming data).

MODEL III

Partly in response to the unreliability of rwhois, and partly I suspect because people didn't really trust registrars to do their job, many registries (some for contractual reasons) adopted a "thick whois"
model.  This is just a patch on top of Model II:
<https://docs.google.com/drawings/d/183F855BODDVIt0IriSGU35EyoaLYpLQMUxAk4RsNo-Y/edit?usp=sharing>.
The basic system works exactly the same way, but when the client queries the registry whois server it gets a complete response instead of having to ask the registrar for more data.  For reasons I never fully understood, registrars still had to maintain a whois server for these registries, so it remains possible to ask a registrar about a domain name (the orange lines in the diagram).  Just as in Model II, the registrar will respond with what it has, which might not be correct.  But you're way less likely to ask the registrar unless you're on purpose querying the wrong place.

This is the model that most contracted registries are using today.  It is slightly less distributed than Model II because the registries provide the data from a central point for that registry.  It's still distributed in that each registry answers for its area of authority.

MODEL IV

I've illustrated the basic approach that I think most of us involved in specifying RDAP had in mind in <https://docs.google.com/drawings/d/1HftBWzxA4DGwa0-PMCrpxXkK0Nwpm8ZS8Rsql6VtkJQ/edit?usp=sharing>.
This is fundamentally a modification of Model II, in that it is a distributed query and distributed store system.  RDAP does some things differently, however.

First, its output is JSON documents, which can be used reliably in multiple different ways (including being parsed by a machine or formatted for presentation to humans).
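
As a small illustration of that point, the sketch below reads one hand-written JSON document shaped like an RDAP domain object and uses it two ways: once as data for a program, once as text for a person.  The sample values are invented, and treating a "related" link as the pointer to the registrar's server is an assumption made for the example.

```python
import json

# Hand-written sample shaped like an RDAP domain object; all values are invented.
SAMPLE_RESPONSE = """
{
  "objectClassName": "domain",
  "ldhName": "name.example",
  "status": ["active"],
  "nameservers": [
    {"objectClassName": "nameserver", "ldhName": "ns1.name.example"},
    {"objectClassName": "nameserver", "ldhName": "ns2.name.example"}
  ],
  "links": [
    {"rel": "related", "type": "application/rdap+json",
     "href": "https://rdap.registrar-a.example/domain/name.example"}
  ]
}
"""


def summarize(rdap_text):
    """Machine use: pull out named fields without any text scraping."""
    doc = json.loads(rdap_text)
    related = [link["href"] for link in doc.get("links", [])
               if link.get("rel") == "related"]
    return {
        "name": doc.get("ldhName"),
        "status": doc.get("status", []),
        "nameservers": [ns["ldhName"] for ns in doc.get("nameservers", [])],
        "referral": related[0] if related else None,
    }


def render(summary):
    """Human use: format the same data as plain text."""
    lines = [
        "Domain: " + summary["name"],
        "Status: " + ", ".join(summary["status"]),
    ]
    lines += ["Name server: " + ns for ns in summary["nameservers"]]
    return "\n".join(lines)
```

The referral here is a field the client reads, not a line it has to guess at, which is the contrast with the port 43 case.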

Second, there's a bootstrap mechanism in which, to get started, you query IANA.  Note that some whois clients had already started to do this for Model II, but it was nowhere specified.  Now it's just part of how things work.
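
A rough sketch of how a client might use such a bootstrap list: the data below mimics the layout of the RDAP bootstrap registry described in RFC 7484 (a "services" array pairing lists of TLDs with lists of base URLs), but the entries and URLs are invented, and the function names are made up for the example.

```python
# Invented sample in the layout of an RDAP bootstrap registry (RFC 7484):
# each service entry is [list of TLDs, list of base URLs].
SAMPLE_BOOTSTRAP = {
    "version": "1.0",
    "services": [
        [["example"], ["https://rdap.registry.example/"]],
        [["test", "invalid"], ["https://rdap.other.example/"]],
    ],
}


def find_base_urls(bootstrap, domain):
    """Return the base URLs for the longest registry entry matching the domain."""
    entries = {}
    for names, urls in bootstrap["services"]:
        for name in names:
            entries[name.lower()] = urls
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in entries:
            return entries[suffix]
    return []


def domain_query_url(bootstrap, domain):
    """Build the URL a client would fetch for a domain lookup, or None."""
    urls = find_base_urls(bootstrap, domain)
    if not urls:
        return None
    base = urls[0] if urls[0].endswith("/") else urls[0] + "/"
    return base + "domain/" + domain
```

The point of the design is that the list of starting servers is fetched from one published place instead of being compiled into each client.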

Finally, RDAP has built in the idea that the client could be authenticated.  This automatically means that different responses can be returned depending on the credentials the client presents.
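
What "different responses depending on credentials" could look like on the server side is sketched below.  Nothing here comes from the RDAP specifications: the roles, the token table, and the choice of which fields each role may see are all hypothetical, and a real deployment would authenticate over https rather than look up a string in a dictionary.

```python
# Hypothetical full record held by the server; values are invented.
FULL_RECORD = {
    "name": "name.example",
    "registrar": "Registrar A",
    "nameservers": ["ns1.name.example"],
    "registrant": "Example Registrant",
    "contact": "admin@name.example",
}

# Hypothetical policy: which fields each class of client may see.
VISIBLE_FIELDS = {
    "anonymous": {"name", "registrar", "nameservers"},
    "accredited": {"name", "registrar", "nameservers", "registrant", "contact"},
}

# Hypothetical credential table mapping a presented token to a role.
TOKENS = {"token-abc": "accredited"}


def role_for(token):
    """Unknown or missing credentials are treated as anonymous."""
    return TOKENS.get(token, "anonymous")


def respond(record, token=None):
    """Return only the fields the client's role is allowed to see."""
    allowed = VISIBLE_FIELDS[role_for(token)]
    return {field: value for field, value in record.items() if field in allowed}
```

With no token the response carries only the name, registrar, and name servers; with the accredited token the same query also returns the registrant and contact.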

Note that the protocol for this is all https, so there's no more port
43 traffic arising from this.  Note also that you can do the same sort of thing with a "thick registry" model -- the registry doesn't send a referral in that case.  This model is exactly as distributed as Model II or III, depending on which approach we take.

MODEL V

Model V is a little harder to illustrate, because it's not clear (especially in our discussions) what protocols it'd use.
Nevertheless, I've sketched it at
<https://docs.google.com/drawings/d/1c3G3guMO7-IFtm-D1FIdEXXP9_ww8Op4Ul9N1ypTG2U/edit?usp=sharing>.
The key thing here is to notice that the model takes a bunch of data from disparate sources, federates it (somehow) into a single data store, and then offers the federated RDS to clients that query from the Internet.  There are some interesting consequences of this.

First, note that this is the only model in which the answers to an RDS query come from a party that is not directly responsible for collecting some data.  Model III went partway toward that by having registries hold data that is really only relevant to registrars.
Model V takes this all the way to its conclusion, in that the data store backing the RDS is operated by someone who doesn't collect any of that data from the original source.  

Second, in order to make this happen, a federation process of some kind needs to be designed, written, and tested.
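
To give a sense of what such a federation process has to decide, here is a deliberately simple sketch.  It assumes each source hands over records with a domain, an update date, and some fields, and it resolves conflicts by letting the most recently updated value win while remembering where each value came from.  The feed layout, the source names, and the newest-wins rule are all assumptions for illustration; the real rules would be a policy question.

```python
# Invented feeds: (source name, list of records). Dates are ISO strings, so
# plain string comparison orders them correctly.
FEEDS = [
    ("registry", [
        {"domain": "name.example", "updated": "2016-06-01",
         "data": {"registrar": "Registrar A", "nameservers": ["ns1.name.example"]}},
    ]),
    ("registrar-b", [  # a former registrar still holding old data
        {"domain": "name.example", "updated": "2014-01-15",
         "data": {"registrant": "Old Registrant"}},
    ]),
    ("registrar-a", [
        {"domain": "name.example", "updated": "2016-05-20",
         "data": {"registrant": "Example Registrant"}},
    ]),
]


def federate(feeds):
    """Merge all feeds into one store; per field, the newest update wins."""
    store = {}
    for source, records in feeds:
        for rec in records:
            entry = store.setdefault(rec["domain"], {})
            for field, value in rec["data"].items():
                current = entry.get(field)
                if current is None or rec["updated"] > current["updated"]:
                    entry[field] = {"value": value, "source": source,
                                    "updated": rec["updated"]}
    return store


def answer(store, domain):
    """Answer a query from the central store, dropping the provenance details."""
    entry = store.get(domain)
    if entry is None:
        return None
    return {field: info["value"] for field, info in entry.items()}
```

Even this toy version has to pick a conflict rule and track provenance, which hints at how much design and testing the real process would need.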

Presumably the RDS could use RDAP or whois for its protocols, or we could specify that we need something else.  I believe that the IETF will only support RDAP for this use, so if we decide to specify that some other protocol is needed then I guess we'll have to find someone to develop that specification before we can proceed.

Finally, this model is as monolithic as the system can get: it basically re-invents the Whois model of the pre-registrar period, when the data in the WHOIS was small enough that it could be put into a mimeographed booklet.  

I hope this is helpful.

A

--
Andrew Sullivan
ajs at anvilwalrusden.com