[registrars] Draft Registrar Submission to TF1

Paul Stahura stahura at enom.com
Fri Apr 2 00:20:36 UTC 2004


I drafted this submission to whois TF1 on Restricting Access (whois data
mining).  I attempted to gather the input from registrars in Rome, on the
list, and in private emails and calls.  I tried to incorporate all registrar
viewpoints.  I did not weigh other constituencies' viewpoints highly.  If you
would like to make modifications, please let me know, or just go ahead and
make changes (with change tracking turned on) and send them to me or the list.

A plain-text version follows (see the attached Word doc).  Three other
TF1 constituency statements are also below.

Paul





RC Statement 		Vers1
On Whois TF1: Restricting Access/Data Mining

The registrars' policy recommendation for the Restricting Access/Data Mining
whois task force (TF1) depends heavily on the results of the "Data
Collected and Displayed" whois task force (TF2).  If, for example, TF2
determines that the data to be displayed, especially via port-43, is limited
to non-sensitive information ("non-sensitive information" defined as the
domain itself, name servers, organization-names, and the
registrar-of-record) and that information is not personally identifiable
information, then the information to be mined will be of less value to
miners and, hence, mining will be reduced.  On the other hand, if TF2
determines that sensitive information ("sensitive information" defined as,
but not limited to, person-names, street addresses, phone numbers, and email
addresses) is to be displayed, then there will be a great incentive to mine
the data because it will be more valuable.  There is also a dependency on
TF3, because if the data is 100% accurate and, at the same time, mandated to
be displayed, then it becomes even more valuable, which further increases
the motivation for mining.

The whois data is the registrant's data.  It should remain in the control of
the data subject as much as possible.  As the whois data moves away from the
registrants to the registrars and further, to fat registries, and to even
more distant 4th and 5th parties, it becomes less and less in the control
of the registrants.  The registrars should not be obligated to provide whois
data to any party that cannot guarantee that the data will be treated in a
manner consistent with the policies and legislation under which it was
collected.  Therefore, any data collected from registrants must remain as
close as possible to the registrants, at the registrar.  As the whois
information is passed to these other entities, more access policy-control
problems are created (because there are geometrically more locations at
which to mine the data). Because the registrars will always be closer to the
registrants, and in between the registry and the registrant, the utility of
a thick registry model should be evaluated.

If TF2 determines that sensitive information must be displayed, the
registrars support a policy whereby registrars may:
1)	Shut off port-43 access to the public, if not remove it completely
for all.
a.	If not completely removed:
i.	Who is "the public" and who is not would need to be defined.
ii.	Registrars must be granted access to port-43 whois, in a standardized
format, but only for the purposes of performing transfers and only for so
long as not all gTLD registries are EPP (thick or thin) or until another
inter-registrar transfer mechanism replaces it.
iii.	Port-43 query rate limiting must be allowed.
iv.	The identities of the non-public requestors must be known to the
registrars and may be recorded by the registrars so that they can be
communicated to the registrants.
v.	The requestor must have a defined, valid purpose for each request
and that purpose must be known to the registrars and may be recorded by the
registrars so that it can be communicated to the registrants.  Some
registrars believe a valid purpose exists currently and some do not.
vi.	The requestor cannot act as a proxy.
2)	Display the whois information on a publicly accessible web site, but
only in a manner such that the information cannot be easily mined, and
consistent with the policies and governmental laws under which it was
collected.  It is the registrars' real-world experience that CAPTCHA systems
(systems that perform checks for humans, such as requesting a person to type
in a number to access a single whois record) and other systems (such as
tracking the number of queries from a particular IP address), though
imperfect, do work to greatly reduce automated data mining of the whois via
the web.  Registrars must continue to be allowed to use such systems.
3)	Continue to provide "identity protection" products to registrants.
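
As an illustration of the query rate limiting and per-IP tracking mentioned
in items 1) and 2) above, here is a minimal sketch of a sliding-window
limiter.  It is not any registrar's actual implementation: the class name
QueryRateLimiter, its parameters, and the limits used in the example are all
assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class QueryRateLimiter:
    """Hypothetical per-IP sliding-window limiter for whois queries."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries        # assumed policy value
        self.window_seconds = window_seconds  # assumed policy value
        self._history = defaultdict(deque)    # ip -> timestamps of allowed queries

    def allow(self, ip, now=None):
        """Return True if this query from `ip` is within the limit."""
        now = time.monotonic() if now is None else now
        timestamps = self._history[ip]
        # Forget queries that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True
```

A server would call allow() with the requestor's address before answering
each query and refuse (or delay) the answer when it returns False.  As the
text notes, this kind of control is imperfect: a miner spreading queries
over many addresses is not caught by a per-IP count alone.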

Because the result is the same (obtaining the totality, or a large portion,
of the whois information), the registrars assert that the following are
identical:
1)	Mining of a registrar's port-43 output
2)	Mining of a fat registry's port-43 output
3)	Mining a 3rd party's port-43 that proxies access to any registrar's
or registry's port-43 output
4)	Mining a registrar's web-based display of whois information
5)	Mining a fat registry's web-based display of whois information
6)	Bulk access
Therefore, if the data elements displayed/disclosed are the same, whatever
access policies and controls are put in place for one must be in place for
the others.  For example, if the identity of the requestor (and purpose,
let's say) must be known for bulk access, then it also must be known for
mining (high query rate) of port-43.




Summary of Positions

I.	IPC
A.	IPC supports, in principle, the use of query volume limitations on
Port 43 access in order to discourage data mining.
B.	Being supportive of the debate, the IPC submits that any changes in
practice or regulation have to be designed in a manner that does not
inadvertently have detrimental effects on the legitimate use of Whois.
C.	Specifics:
a.	Any provision should maintain and ensure availability of unhampered
access to Port 43 for legitimate applications that require high volume
access to domain name Whois for use in creating value-added products and
services that are of great value to the intellectual property community and
to the business community in general.  
b.	Adequate provision must be made for intermediaries which aggregate
low-volume requests from end-users into a relatively high volume of queries
through Port 43.
c.	A solution must identify realistic volume break-points between
low-volume queries via Port 43 that should remain unrestricted, and a very
high volume of queries that could, in principle, require an efficient and
workable form of disclosure to registrars (or registries in the thick
registry model) of the uses to which query results would be put.  
d.	The solution should also preserve the unrestricted availability of
Whois queries through a web-based interface, and the status of Port 43 as a
service available free of charge. 
e.	The solution must be accompanied by proactive enforcement of the
obligation to make bulk access available. 



II.	ALAC
	
A.	Two-tiered system.
*	Tier 1:  Public Access.  Users who access a future WHOIS-like system
anonymously get access to non-sensitive information concerning a domain name
registration, to be defined in detail by task force 2.
*	Tier 2: Authenticated access.  Users who want to access a more
complete data set (to be defined in detail by task force 2) need to reliably
identify themselves, and indicate the purpose for which they want to access
the data.  The identity of the data user and their purpose are recorded
by registrars and registries, and made available to registrants when
requested.  This information could be withheld for a certain amount of time
if the data user is (1) a law enforcement authority that is (2) accessing
the data for law enforcement purposes.
B.	Implementation:  No specific implementation recommended; example:
SSL client certificates. [Prefer IRIS or other dedicated protocol over web
forms.]
C.	Rationale:
*	Find out purpose of use of Whois data.  Registrars would have to
verify purpose, but can't.  Resort to heuristics.
*	The best heuristic we know of is to hold data users accountable for
their activities, and to put enforcement of purpose limitations into the
hands of registrants.  This can be achieved by reliably identifying data
users and putting their identity, contact information, and purpose indication
in the hands of registrants.
*	At the same time, a tiered system -- if implemented reasonably --
could preserve the ability of data users to automatically access WHOIS data
in reasonable quantities. Registrars, on the other hand, would be enabled to
limit the amount of data any particular party can access in a given interval
of time.

D.	Discussion of other proposals
*	CAPTCHA:  There have been suggestions that "automated access" could
be used as a heuristic to determine illegitimate access.  In this scheme,
automated access is blocked by attempting to require human attention with
all queries.  One set of implementations of these kinds of tests is
known as CAPTCHA.
o	CAPTCHA blocks legitimate automated access
o	Easy to circumvent because of design problems (See
http://boingboing.net/2004_01_01_archive.html#107525288693964966 and
http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html)
o	Accessibility issues: http://www.w3.org/TR/turingtest/ 
o	In Sum:  Do not recommend.
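
The two-tiered model in item A of the ALAC position can be sketched roughly
as follows.  This is only an illustration under stated assumptions: the
field split in PUBLIC_FIELDS, the AccessLog class, and the lookup() function
are hypothetical names, the real definition of each tier is left to task
force 2, and authentication itself (e.g. SSL client certificates) and the
delayed disclosure for law enforcement are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical tier-1 field set; the actual split is for task force 2 to define.
PUBLIC_FIELDS = {"domain", "name_servers", "registrar"}


@dataclass
class AccessLog:
    """Records who accessed tier-2 data and why, for later disclosure."""
    entries: list = field(default_factory=list)

    def record(self, domain, requester, purpose):
        self.entries.append(
            {"domain": domain, "requester": requester, "purpose": purpose}
        )

    def for_registrant(self, domain):
        """Entries a registrant could be shown for their own domain."""
        return [e for e in self.entries if e["domain"] == domain]


def lookup(record, log, requester=None, purpose=None):
    """Return the tier-2 view if identity and purpose are given, else tier 1."""
    if requester and purpose:
        # Tier 2: assumes `requester` was already reliably authenticated.
        log.record(record["domain"], requester, purpose)
        return dict(record)
    # Tier 1: anonymous access sees only the non-sensitive fields.
    return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
```

The point of the design is that the registrar does not judge the stated
purpose; it only records it, so that the registrant can later see who looked
at the data and hold them to the purpose they declared.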




III.	Noncommercial Domain Name Holders
A.	ICANN must recognize that the purpose of Whois originally was
identification of domain owners for purposes of solving technical problems.
The purpose was not to provide law enforcement or other self-policing
interests with a means of circumventing normal due process requirements for
access to contact information. 
B.	NCUC does not believe it is possible to develop technical mechanisms
that can restrict port 43 or port 80 access only to a specific type of
purpose; e.g., "nonmarketing uses." Access restrictions imposed by TF1 will
inevitably apply to any whois user regardless of purpose. Moreover,
restricting Port 43 access while leaving Port 80 open will only drive the
automated processes to Port 80.
C.	Therefore we question whether TF1 can achieve anything of value.
Task force should refrain from making judgments about the legitimacy of,
justifications for, or "need" for any non-marketing uses. It is outside the
scope of TF1 to make any such determinations. 
D.	Automated scripts or programs using port 43 are effectively a
substitute for bulk access.  A policy determination on port 43 access is
best made in conjunction with a determination on bulk access.
E.	Fifth, the best way to stop abuse of ports 43 or 80 is to get data
that is valuable to spammers out of the public Whois database. [TASK FORCE
2]
F.	Changes to Port 43 are not a substitute for privacy protection.




-------------- next part --------------
A non-text attachment was scrubbed...
Name: TF1 statement v1.doc
Type: application/msword
Size: 32256 bytes
Desc: not available
URL: <https://mm.icann.org/pipermail/registrars/attachments/20040401/2d6f897c/TF1statementv1.doc>

