[registrars] TF1 draft

Paul Stahura stahura at enom.com
Fri Apr 16 20:16:26 UTC 2004


Here is the latest draft for TF1.


RC Statement, Version 4
On Whois TF1: Restricting Access/Data Mining

The registrars' policy recommendation for the Restricting Access/Data Mining
whois task force (TF1) depends heavily on the outcome of the whois task force
on data collected and displayed (TF2). If, for example, TF2 determines that
the data to be displayed, especially via port-43, is limited to
non-sensitive information ("non-sensitive information" defined as the domain
name itself, name servers, organization names, and the registrar of record)
and does not include personally identifiable information, then the data will
be of less value to miners, and mining will therefore decrease. On the other
hand, if TF2 determines that sensitive information ("sensitive information"
defined as, but not limited to, person names, street addresses, phone and fax
numbers, and email addresses) is to be displayed, then there will be a strong
incentive to mine the data because it will be more valuable. There is also a
dependency on TF3: if accuracy requirements are made more exacting and, at
the same time, this far more accurate and current data must be displayed,
the data becomes even more valuable, which further increases the motivation
for mining. The potential rate of mining is a concern not only to
registrants, whose sensitive data is taken by miners, but also to
registrars, for whom it has significant business implications.

Whois data is the registrant's information, and it should remain under the
control of the data subject as much as possible. As whois data is stored
farther away from the registrant, first with the registrar, then with
"thick" registries, and then with even more distant (and unidentified) 4th
and 5th parties, the registrant loses more and more control. As the public
has learned more about how its information is abused, customers have begun
to demand more privacy and to object to losing control of their data to
parties with which they have no relationship or contact. Customers are not
happy about their registrars publishing their sensitive whois data, because
registrars cannot guarantee that the "4th and 5th" parties will treat the
data in a manner consistent with the policies and laws under which it was
collected.

Requiring registrars to make data available to parties that they cannot
bind to any standards or restrictions flies in the face of registrars'
responsibilities to their customers. Registrars are in the untenable
position of having to comply with directly contradictory requirements: those
of ICANN on one side, and those of their customers and national privacy laws
on the other. Each time whois information is passed to another entity, the
access-control problem grows, because there are geometrically more locations
at which to mine the data. Because registrars are closer to the registrants
(their customers), registrars are in the best position to protect their
customers' data, per the permissions provided by the registrants. To protect
their customers, registrars strongly advocate for the ability to maintain
control of the data. This means the right to display only non-sensitive
information to the public while providing appropriate, limited access to
the sensitive information. It also means providing only non-sensitive
information at the registry level.

 
If TF2 determines that sensitive information must be displayed on the Web,
the registrars support a policy whereby registrars may:

1)	Shut off port-43 access to the public. This requires defining the
following:
a.	Who is "the public"?
b.	Who has access?
i.		Registrars must be granted access to port-43 whois, in a
standardized format, but only for the purpose of performing transfers, and
only until all gTLD registries use EPP (thick or thin) or another
inter-registrar transfer mechanism replaces port-43 lookups.
ii.		The identities of non-public requestors must be known to the
registrars and may be recorded by the registrars so that they can be
communicated to the registrants in appropriate circumstances.
iii.		The requestor must have a defined, valid purpose for each
request, and that purpose must be known to the registrars and may be
recorded by the registrars so that it can be communicated to the
registrants. Some registrars believe a valid purpose currently exists and
some do not.
iv.		The requestor may not act as a proxy for other parties.
c.	Port-43 query rate limiting must be allowed to protect against
mining, but the level of the limit remains to be determined.

2)	Display the whois information on a publicly accessible web site, but
only in a manner such that the information cannot be easily mined, and
consistent with the policies and laws under which it was collected. It is
the registrars' real-world experience that CAPTCHA systems (checks that a
human is making the request, such as requiring a person to type in a
displayed number before viewing a single whois record) and other systems
(such as tracking the number of queries from a particular IP address),
though imperfect, greatly reduce automated data mining of whois via the
web. Registrars must continue to be allowed to use such systems.

3)	Continue to provide "identity protection" products to registrants.
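Item 1(c) leaves the port-43 rate limit undetermined, and item 2 mentions tracking the number of queries from a particular IP address. As a rough illustration only, the Python sketch below shows one common way such tracking could work: a per-IP sliding-window counter. The class name `PerIpRateLimiter`, its parameters, and the limit of 3 queries per 60 seconds in the example are hypothetical placeholders, not values proposed by this statement.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpRateLimiter:
    """Hypothetical sliding-window limiter: at most `max_queries` accepted
    whois queries per client IP address within any `window_seconds` span."""

    def __init__(self, max_queries: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock
        # Timestamps of recently accepted queries, keyed by client IP address.
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        """Return True if this query may be answered, False if it is refused."""
        now = self.clock()
        recent = self._history[client_ip]
        # Drop timestamps that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return False  # over the limit; refused queries are not recorded
        recent.append(now)
        return True


# Example with a fake clock so the behaviour is deterministic.
# The limit of 3 queries per 60 seconds is an arbitrary placeholder.
fake_now = [0.0]
limiter = PerIpRateLimiter(max_queries=3, window_seconds=60,
                           clock=lambda: fake_now[0])
print([limiter.allow("192.0.2.1") for _ in range(4)])  # [True, True, True, False]
```

In this sketch, refused queries are not counted, so a client that keeps retrying is blocked only until its earlier accepted queries age out of the window. A stricter policy could count refusals as well. Per-IP tracking can also be evaded by spreading queries across many addresses, which is consistent with the statement's description of such systems as imperfect.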

The safeguards established for port-43 access must be put in place for all
analogous access points. Each of the following access points gives a miner
access to all, or a large portion, of the sensitive information of many
registrants in the whois database:
1)	Mining of a registrar's port-43 output
2)	Mining of a fat registry's port-43 output
3)	Mining of a 3rd party's port-43 service that proxies access to any
registrar's or registry's port-43 output
4)	Screen-scraping (mining) of a registrar's web-based display of whois
information
5)	Screen-scraping (mining) of a fat registry's web-based display of
whois information
6)	Bulk access
Because these access points are equivalent from the miner's point of view,
any safeguard policies and controls put in place for one access point must
also be in place for the others. For example, if the identity of the
requestor (and, let's say, the purpose) must be known for bulk access, then
it must also be known for mining (high-rate querying) of port-43.
