[gnso-rds-pdp-wg] "optional", who has data, and what "data collection" means (was Re: IMPORTANT: Invitation for Poll from 29 August Meeting)

Mon Sep 4 17:51:50 UTC 2017

Hi,

On Mon, Sep 04, 2017 at 12:38:18PM -0400, Stephanie Perrin wrote:
> the Registrars' client records, so at the risk of slowing us down, I do
> think it is useful to consider exactly how separate these various data
> collections are.  I think it is quite relevant to the discussion of whether
> an optional field has to be filled if data exists [somewhere].

I think that the above opens (or re-opens) the important topic of what
"data collection" means in the RDS.  Once again, I feel like imprecise
terminology is getting in our way, so I want to describe my current
model and see whether it matches what others think.

It is plain that the collection side of the RDS is not just the SRS,
or we would not be bothering to create a new term.  Therefore, I have
been working lately from the ostensive definition we came up with the
last time I got on this hobby horse.  It's something like this: the RDS
contains data about registrations of some objects related to domain
name registrations, where the data might be queried for some
legitimate purpose (under some terms and conditions) by someone via
the RDDS.  For gTLDs, all of this is to be governed by such ICANN
consensus policies as are in effect at the time.

Now, since every RDDS protocol we have except the original
(i.e. pre-separation of registries and registrars) whois can do
distributed queries (whois, admittedly, not very well), this means
that the RDS might be a distributed database.  As such, there is only
one meaningful definition of "collected" in this account, and that is
such data as is acquired from whoever made the connection to the
registrar.  In many cases, that is a registrant.  In other cases, of
course, it is a reseller, and it might be a reseller that itself does
not have a direct relationship to the registrant (because of reseller
chains and so on).

Once data is in the registrar's possession, the question is not
whether the data is "collected" in the RDS, but whether it is
accessible through the RDDS.  By definition, it _could_ be in the RDS,
but if it is never to be queried via the RDDS under any conditions,
then the data is not in the RDS.  Otherwise, it _is_ in the RDS.
Working out data access is not our task right now, so I'm going to set
that problem aside.
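
To make the distinction concrete, here is a rough sketch in Python of
the model just described.  The names (DataElement,
rdds_access_conditions, in_rds) and the two example elements are made
up for illustration; they do not correspond to any real system or
policy text.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """A piece of data the registrar holds after collection (hypothetical model)."""
    name: str
    # Conditions under which the element may be returned via the RDDS.
    # An empty set means it is never queryable under any conditions.
    rdds_access_conditions: set = field(default_factory=set)


def in_rds(element: DataElement) -> bool:
    """An element is in the RDS iff it can be queried via the RDDS
    under at least one condition."""
    return len(element.rdds_access_conditions) > 0


# Hypothetical examples: one element exposed under some condition,
# one held by the registrar but never exposed.
registrant_email = DataElement("registrant_email", {"authenticated-request"})
billing_card = DataElement("billing_card_number")

print(in_rds(registrant_email))  # True
print(in_rds(billing_card))      # False
```

Note that nothing in the sketch asks where the data physically lives;
only the access conditions decide membership, which is the point of
treating the RDS as a possibly distributed database.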

To me, this understanding of data collection means that any data that
is "optional" can't possibly have rules about what to do "if it is
available", because the registrant (or some downstream reseller, for
that matter) might just decline to provide it.  There is no way to
know whether the registrant decided not to provide it because s/he
didn't want to, or didn't provide it because it didn't exist.  ICANN
does not have a direct contractual relationship with the registrants,
and I do not think it would be a good idea for ICANN to start
mandating ways that registrars audit registrants' data submissions.
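
The point about optional data can be sketched in Python, assuming an
optional field is simply stored as a value or as nothing.  The "fax"
field, the submit() helper, and the sample number are hypothetical.

```python
from typing import Optional


def submit(fax_exists: bool, willing_to_share: bool) -> Optional[str]:
    """What the registrar receives for a hypothetical optional 'fax' field."""
    if fax_exists and willing_to_share:
        return "+1.5555550100"
    # Declined, or nothing to give: the observable result is the same.
    return None


declined = submit(fax_exists=True, willing_to_share=False)
nonexistent = submit(fax_exists=False, willing_to_share=True)

# The registrar cannot tell these cases apart from the submission alone.
print(declined == nonexistent)  # True
```

Any rule of the form "fill the field if the data is available" would
need the inputs to submit(), which nobody downstream of the registrant
can observe.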

Does this match what other people are thinking?

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com