[gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

Wed Feb 3 17:40:37 UTC 2016

> -----Original Message-----
> From: John Levine [mailto:johnl at taugh.com]
> Sent: Wednesday, February 03, 2016 12:06 PM
> To: gtld-tech at icann.org
> Cc: Hollenbeck, Scott
> Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server
> Content
> 
> >That will only work if a crawler reads robots.txt and respects the
> >published directive(s). Not all do.
> 
> All of the search engines used by consumers do.

I've personally seen one operated by a company whose name starts with "G" GETting content on a site I operate in violation of the directives I publish in the site's robots.txt file. YMMV.
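
To make the dependence on crawler cooperation concrete, here is a minimal sketch of the check a well-behaved crawler performs before fetching a URL. It uses only Python's standard `urllib.robotparser`. The robots.txt content, the bot name "ExampleBot", and the `rdap.example` URLs are made up for illustration and do not describe any real deployment.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /rdap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rdap/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only; the host name is an assumption.
    for url in (
        "https://rdap.example/rdap/domain/example.com",
        "https://rdap.example/index.html",
    ):
        print(url, may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

The check is purely advisory and runs on the crawler's side. A crawler that never calls anything like `may_fetch` simply issues the GET, and the server cannot tell the difference from robots.txt alone.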

> I'm still having trouble understanding what the problem is here.  The
> specific set of records that Scott noticed are in fact just some
> examples linked from a public web page, and I see no reason to think
> that it'd be hard to keep RDAP info out of the usual search engines if
> that's what you want to do.  For a very long time, Domaintools and
> others have scraped WHOIS info and provide a little of it for free and
> more for pay.  RDAP doesn't change that.

RDAP *could* change that.

> If you want to redact information beyond what's in WHOIS, that's a
> reasonable discussion to have, but it's exactly the same for WHOIS or
> RDAP.

If "it" ("it's exactly the same") refers to the source of the data, yes, they are the same. If "it" refers to the tools we have available to control access to the data, I have to disagree. The example Gavin found is as you and others have noted. The problem (as I see it, anyway) would be more obvious if the indexed response contained PII.

As I've said before, I want to deploy RDAP in a way that addresses the issues we have with WHOIS. Functional equivalence provides no significant benefit.

Scott