[gnso-rds-pdp-wg] a suggestion for "purpose in detail"

Andrew Sullivan ajs at anvilwalrusden.com
Mon Mar 20 23:21:45 UTC 2017


Hi,

I left the meeting with data protection experts last week feeling
quite strongly the need for a specific and concrete purpose for each
datum we recommend to collect and to make available; and the need for
a definition of who the maximal (appropriate) audience is (given the
purpose).

At the same time, I think that a reasonably short and high-level
statement of purpose along the lines that we have been preparing can
provide a useful set of principles.

It strikes me that maybe we could take the high-level purpose
statement, and go through some potential data elements and link each
one concretely to at least one of the principles in our candidate
list.  In what follows I name these "purpose 1", "purpose 2", &c.  The
purposes are numbered according to the scheme in RDS PDP Phase 1: Key
Concepts Deliberation –Working Draft-7March2017 (on p7).  I'm aware
that the details in the candidate list are still in flux, but I think
the broad strokes are pretty close anyway, so I thought I'd try it
with the "thin" data we agreed to start with.  This mail is a little
long because I'm dealing with all the classes of elements in one
message.  I suppose we could break this into one-thread-per-element
(or class) if we don't converge quickly on each of them.  The outline
below is just my view, of course, though obviously I think that what I
say is true.  I use the "maximal audience" because I think that if
there is any "whole public" use then there's no point considering more
restrictive uses.  (For instance, if we need the domain name to be
published to everyone on the Internet because it won't work otherwise,
then it makes no difference if LEOs want that data under some sort of
authorized-access protocol, because they'll just get it under the
wide-open rules instead.  So we don't need to care about the LEO
purpose in that case.)  "Maximal audience" might not work for cases
where two different classes have different needs both of which require
some restrictions, but it's handy here because we're talking about
thin data.

I'm sorry this is long, but I hope it is a useful contribution to the
discussion.

Best regards,

A

---%<---cut here---

Here is a convenient example thin whois response, in case anyone wants
it for reference in what follows.  (Among other things, it reminds me
that something I started to do has never been completed, so thank you
to this WG for reminding me of that. :-) )

   Domain Name: ANVILWALRUSDEN.COM
   Registrar: TUCOWS DOMAINS INC.
   Sponsoring Registrar IANA ID: 69
   Whois Server: whois.tucows.com
   Referral URL: http://www.tucowsdomains.com
   Name Server: NS1.SYSTEMDNS.COM
   Name Server: NS2.SYSTEMDNS.COM
   Name Server: NS3.SYSTEMDNS.COM
   Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Updated Date: 17-jan-2017
   Creation Date: 30-jun-2010
   Expiration Date: 30-jun-2017
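
For anyone who wants to work with the example programmatically, here
is a minimal Python sketch that splits a response like the one above
into its elements.  The function name parse_thin_whois is made up for
illustration, and the "Key: value" line format is assumed from this one
sample only; real whois output differs from server to server.

```python
from collections import defaultdict

# The sample thin whois response from this message.
SAMPLE = """\
   Domain Name: ANVILWALRUSDEN.COM
   Registrar: TUCOWS DOMAINS INC.
   Sponsoring Registrar IANA ID: 69
   Whois Server: whois.tucows.com
   Referral URL: http://www.tucowsdomains.com
   Name Server: NS1.SYSTEMDNS.COM
   Name Server: NS2.SYSTEMDNS.COM
   Name Server: NS3.SYSTEMDNS.COM
   Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Updated Date: 17-jan-2017
   Creation Date: 30-jun-2010
   Expiration Date: 30-jun-2017
"""


def parse_thin_whois(text):
    """Return {field name: [values]}; hypothetical helper, not a real API.

    Repeated fields (Name Server, Status) keep every value in order.
    Lines without a "Key: value" shape are ignored.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first ": " only, so URLs in the value stay intact.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key].append(value.strip())
    return dict(fields)


fields = parse_thin_whois(SAMPLE)
```

Keeping a list per field matters because the name server and status
elements legitimately appear more than once.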


1. DOMAIN NAME
---------------

a. Collection

The domain name is required to be collected under purpose 1.  Without
this, there is no domain name, so it is literally impossible to have
anything to collect or publish.

b. Publication

The domain name is required to be published under purpose 1, because
it is a key by which data is accessed.  If you wish to look up the
current data about a particular name, you use the name as the key by
which you query.  (This is not the only possible key.  For instance,
in an EPP registry you could in principle use the ROID to look up a
particular name object.  But that does not give you the current data
for the thing so named; it just gives you the data about that
Repository Object.  Two successive registrations of the same name --
for example, if example.com is registered by Alice, then deleted, and
later registered by Bob -- have different ROIDs.)
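
The difference between the two keys can be shown with a toy model.
This is only a sketch under assumptions: ToyRegistry, its methods and
the ROID strings it generates are invented for illustration and do not
correspond to any real EPP server.

```python
from dataclasses import dataclass


@dataclass
class DomainObject:
    roid: str          # repository object identifier (made-up format)
    name: str
    registrant: str
    deleted: bool = False


class ToyRegistry:
    """Hypothetical in-memory registry, for illustration only."""

    def __init__(self):
        self._by_roid = {}   # every object ever created
        self._current = {}   # name -> ROID of the live registration
        self._counter = 0

    def register(self, name, registrant):
        name = name.lower()
        if name in self._current:
            raise ValueError(f"{name} is already registered")
        self._counter += 1
        obj = DomainObject(f"D{self._counter}-TOY", name, registrant)
        self._by_roid[obj.roid] = obj
        self._current[name] = obj.roid
        return obj

    def delete(self, name):
        roid = self._current.pop(name.lower())
        self._by_roid[roid].deleted = True

    def lookup_by_name(self, name):
        """Current data for the name, or None if it is not registered."""
        roid = self._current.get(name.lower())
        return self._by_roid[roid] if roid is not None else None

    def lookup_by_roid(self, roid):
        """Data about one repository object, live or not."""
        return self._by_roid.get(roid)


registry = ToyRegistry()
alice = registry.register("example.com", "Alice")
registry.delete("example.com")
bob = registry.register("example.com", "Bob")
```

Here lookup_by_name("example.com") returns Bob's object, while
lookup_by_roid(alice.roid) still returns Alice's deleted one, which is
why the name is the useful key for "current data".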

c. Maximal audience

The data audience is Internet-wide under purpose 1 or purpose 2 (or
both).  The domain name is by definition not private data, because
domain names registered in DNS domain name registries (i.e. every
registry possibly covered by ICANN policy -- the registries
subordinate to the IANA DNS name registries) are name registrations in
a public name space.  Note that it is not possible to keep the
existence of a name private, because even if a name were initially
undisclosed its existence would be disclosed whenever someone else
tried to register it.

2.  REGISTRAR IDENTITY
-------------------

There are four items here, but three classes of data.  The (i)
registrar ID provides data about the entity that created the entry in
the registry (formally, in EPP, "repository").  The (ii) Whois Server
and Referral URL both provide metadata necessary for the operation of
the distributed database that makes up the RDS.  (In systems other
than whois, approximately the same data with the same relation to
identity would be in place, though the details might differ; I think
we can treat this as a class anyway.)  Finally, IANA has a registry of
registrar IDs
(https://www.iana.org/assignments/registrar-ids/registrar-ids.xhtml#registrar-ids-1),
and that contains their (iii) names.  This is a protocol parameter
registry, but it appears to be managed by ICANN so it is probably
appropriate for this PDP to make the policy about how that is to be
managed.

a.  Collection

Data (i) and (ii) are both required to be collected under purposes 1
and 2.  Without this data it is not possible to know the source of the
data and it is not possible to trace it further in the system.  Data
(iii) needs to be collected in order to give (i) meaning, because it
is the only way to know whether two IANA IDs are bound to the same
organization or person.

b.  Publication

Data (i) are possibly required to be published under purpose 1.  This
largely depends on whether we think the identity of who is managing an
object in the registry is part of the "lifecycle of a domain name".
My feeling is "yes".  Also, this information is likely to be disclosed
anyway; see below.

Data (ii) are required to be published under purposes 1 and 2, as long
as there is at least one data element that is required under some
purpose and is not available from the registry.  (Since the actual
registration life cycle is controlled by the registrar and not the
registry, this appears likely.)  Owing to the way these work,
publication of these is likely to "leak" information about (i) or
(iii) also.

c.  Maximal audience

Given purposes 2 and 3 and probably 5, and since the source of contact
information is registrars, the maximal audience is probably everyone
on the Internet.  If we think that purposes 2, 3, or 5 are limited in
respect of who needs to make such contact or who needs to check
accuracy, then the maximal audience is the set of all those who have
such a need.  It's worth observing, however, that at least the
technical contact for a name ought to be contactable by anyone on the
Internet, since when we want to "facilitate communication with domain
contacts" at least part of the reason is as a fallback when a site
breaks in some way.  (This may suggest that we need to unpack the
details of purpose 3.)

3.  NAME SERVERS
---------------

a.  Collection

Without collecting the name servers, domain names cannot function on
the Internet, so this is required under purposes 1 and 2.  (Given that
the registration of the name itself and the collection of the name
servers are both required for the basic functioning of the Internet
Domain Name System, it strikes me that we may be missing a more
obvious purpose in our list, but I guess (1) and (2) will be enough
and we're already so late that I am loath to suggest something more.)

b.  Publication

Whenever a name is available on the Internet, the name server data is
already available in the DNS, so this data is necessarily published.
Under either purpose 1 or 2 (or both), the data about nameservers in
the RDS provides an avenue for troubleshooting issues in the DNS, and
so it is required for those purposes.

c.  Maximal audience

Anyone who wants to access a site must be able to find this data in
the DNS.  Potentially anyone who has a problem with resolution can use
the data in the RDS to aid in troubleshooting, so the audience under
purpose 1 or 2 (or both) is everyone on the Internet.

4.  STATUS VALUES
----------------

a.  Collection

The status values are not exactly "collected", but are at least in
part the result of various actions by the sponsoring registrar and
registry on the name.  (Some can be set directly.)  These govern the
disposition of the name in question, and are a necessary condition for
having a shared registration system, so they are required under
purpose 1.

b.  Publication

The status values govern the possible things that could be done to a
name, and therefore the data must be published under purpose 1.

c.  Maximal audience

At least some status values are required for doing some
troubleshooting of resolution failures, so the audience for at least
some values under purposes 1 or 2 is "everyone on the Internet".  It
is possible to argue that some of the status values are relevant only
to those people who wish to perform some actions on the domain (such
as transferring) or people in a position to do some kinds of activity
(such as updating contact information).  If we really think it
necessary, we could undertake the exercise of audience evaluation for
each EPP status.
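
As a starting point for such an exercise, here is a rough Python sketch
that reads the status lines of a thin response and reports which
operations each one blocks.  The helper name blocked_actions is
hypothetical.  It relies only on the EPP naming pattern in which a
"client" or "server" prefix is followed by an action and "Prohibited",
plus the "Hold" statuses, which keep the delegation out of the DNS.

```python
def blocked_actions(status_values):
    """Map whois status values to the operations they block.

    Hypothetical helper.  Each value may carry a trailing URL, as in the
    sample response; only the first token (the EPP status code) is used.
    """
    blocked = set()
    for value in status_values:
        tokens = value.split()
        if not tokens:
            continue
        code = tokens[0]
        for prefix in ("client", "server"):
            if not code.startswith(prefix):
                continue
            rest = code[len(prefix):]
            if rest.endswith("Prohibited"):
                # e.g. clientTransferProhibited -> "transfer"
                blocked.add(rest[: -len("Prohibited")].lower())
            elif rest == "Hold":
                # The name is not published in the DNS, which is the
                # case that matters for resolution troubleshooting.
                blocked.add("dns publication")
    return blocked


sample_status = [
    "clientTransferProhibited https://icann.org/epp#clientTransferProhibited",
    "clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited",
]
result = blocked_actions(sample_status)
```

For the sample response this yields "transfer" and "update", the two
values of interest mainly to people acting on the name; a hold status
would instead be of interest to anyone troubleshooting resolution.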

5.  DATES
---------

While the dates might appear to be of different kinds, for our
purposes they are not, since they all have at least one common utility
(see below).

a.  Collection

The dates, like status values, are not exactly "collected": they're a
consequence of certain activities.  They're necessary for the workings
of the shared registration systems using the current fee-for-term
model that (approximately?) all gTLD registries use today, so they're
required under purpose 1.

b.  Publication

The dates are required under purpose 1 or 2 in order to aid
troubleshooting of resolution.  (If a name worked yesterday and not
today, it is helpful to know that it was just created -- meaning the
old one was deleted -- or that it is expired, or that someone updated
the name only last night.)
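
A small Python sketch of that troubleshooting check follows.  The
function names, the hint strings and the two-day "recent" window are
all assumptions made for illustration; the date format is taken from
the sample response in this message.

```python
from datetime import date, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_whois_date(text):
    """Parse a date such as '17-jan-2017' (format of the sample above)."""
    day, month, year = text.strip().lower().split("-")
    return date(int(year), MONTHS.index(month) + 1, int(day))


def date_hints(created, updated, expires, today, recent=timedelta(days=2)):
    """Return hints for 'it worked yesterday and not today'.

    Hypothetical helper; the `recent` window is an arbitrary assumption.
    """
    hints = []
    if timedelta(0) <= today - created <= recent:
        hints.append("recently created (an older registration may have been deleted)")
    if timedelta(0) <= today - updated <= recent:
        hints.append("recently updated")
    if expires < today:
        hints.append("expired")
    return hints


created = parse_whois_date("30-jun-2010")
updated = parse_whois_date("17-jan-2017")
expires = parse_whois_date("30-jun-2017")

day_after_update = date_hints(created, updated, expires, today=date(2017, 1, 18))
after_expiry = date_hints(created, updated, expires, today=date(2017, 7, 1))
```

None of this needs anything beyond the three dates in the thin
response, which is the point: whoever is troubleshooting needs no other
access to make the check.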

c.  Maximal audience

Because of the troubleshooting aspects of these dates, the audience
under purpose 1 or 2 is everyone on the Internet.

-- 
Andrew Sullivan
ajs at anvilwalrusden.com

