[gtld-tech] [weirds] Search Engines Indexing RDAP Server Content
shollenbeck at verisign.com
Wed Feb 3 17:40:37 UTC 2016
> -----Original Message-----
> From: John Levine [mailto:johnl at taugh.com]
> Sent: Wednesday, February 03, 2016 12:06 PM
> To: gtld-tech at icann.org
> Cc: Hollenbeck, Scott
> Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content
> >That will only work if a crawler reads robots.txt and respects the
> >published directive(s). Not all do.
> All of the search engines used by consumers do.
I've personally seen one operated by a company whose name starts with "G" GETting content on a site I operate in violation of the directives I publish in the site's robots.txt file. YMMV.
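For readers less familiar with the mechanism: robots.txt is purely advisory, and the check happens on the crawler's side. A minimal sketch of what a well-behaved crawler does before issuing a GET, using Python's standard urllib.robotparser, is below. The robots.txt content, the /rdap/ path, the host rdap.example, and the ExampleBot user agent are all hypothetical and chosen only for illustration. A crawler that skips this step is not prevented from fetching anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an RDAP operator might publish.
# The /rdap/ path prefix is an assumption for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rdap/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url.

    This is the check a compliant crawler performs voluntarily; the server
    itself does not enforce it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a hypothetical host.
    print(may_fetch("ExampleBot", "https://rdap.example/rdap/domain/example.com"))
    print(may_fetch("ExampleBot", "https://rdap.example/help.html"))
```

Nothing in this check runs on the server, which is why publishing a directive and having it respected are two different things.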
> I'm still having trouble understanding what the problem is here. The
> specific set of records that Scott noticed are in fact just some
> examples linked from a public web page, and I see no reason to think
> that it'd be hard to keep RDAP info out of the usual search engines if
> that's what you want to do. For a very long time, Domaintools and
> others have scraped WHOIS info and provide a little of it for free and
> more for pay. RDAP doesn't change that.
RDAP *could* change that.
> If you want to redact information beyond what's in WHOIS, that's a
> reasonable discussion to have, but it's exactly the same for WHOIS or
If "it" ("it's exactly the same") refers to the source of the data, yes, they are the same. If "it" refers to the tools we have available to control access to the data, I have to disagree. The example Gavin found is as you and others have noted. The problem (as I see it anyway) would be more obvious if the indexed response contained PII.
As I've said before, I want to deploy RDAP in a way that addresses the issues we have with WHOIS. Functional equivalence provides no significant benefit.