[gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

Wed Feb 3 12:06:17 UTC 2016

> -----Original Message-----
> From: gtld-tech-bounces at icann.org [mailto:gtld-tech-bounces at icann.org]
> On Behalf Of Stephane Bortzmeyer
> Sent: Wednesday, February 03, 2016 5:05 AM
> To: Francisco Arias
> Cc: gtld-tech at icann.org
> Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server
> Content
> 
> On Wed, Feb 03, 2016 at 12:23:42AM +0000,
>  Francisco Arias <francisco.arias at icann.org> wrote
>  a message of 77 lines which said:
> 
> > The search page
> > (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to
> > be the result of crawling links from the first link that appears
> > there (http://rdg.afilias.info/rdap/help). The help page contains
> > links to search and lookup examples that return several objects with
> > their directly-related objects, which are in turn shown in the
> > search results. This could have happened in web-Whois if someone
> > were to publish a page containing example queries.
> 
> It seems to me that having a robots.txt at the root of the RDAP server
> would solve the problem (if you regard it as a problem).
> 
> User-agent: *
> Disallow: /

That will only work if a crawler reads robots.txt and respects the published directives. Not all do, and the file is purely advisory: it does nothing to stop a client that ignores it.
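
As a rough illustration of what "reads and respects" means on the crawler side, here is a minimal sketch using Python's standard urllib.robotparser. The helper name may_fetch and the user agent "ExampleBot" are made up for the example. The rules are parsed from a local string, so nothing is fetched from the network. The check is entirely voluntary: a crawler that never runs something like this is unaffected by the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested in the quoted message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration: a polite crawler would run a
    check like this before each request.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is an assumed user agent name, not a real crawler.
allowed = may_fetch(ROBOTS_TXT, "ExampleBot", "http://rdg.afilias.info/rdap/help")
print("fetch allowed:", allowed)
```

With the blanket Disallow above, a compliant crawler would skip the help page and everything linked from it. Keeping unwanted clients out regardless of their behavior would need something enforced by the server itself.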

Scott