[gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

Wed Feb 3 16:58:06 UTC 2016

On Wed, Feb 3, 2016 at 7:38 AM, Michele Neylon - Blacknight
<michele at blacknight.com> wrote:
> On 03/02/2016, 12:06, "Hollenbeck, Scott" <shollenbeck at verisign.com> wrote:
>
>>>-----Original Message-----
>>>From: gtld-tech-bounces at icann.org [mailto:gtld-tech-bounces at icann.org]
>>>On Behalf Of Stephane Bortzmeyer
>>>Sent: Wednesday, February 03, 2016 5:05 AM
>>>To: Francisco Arias
>>>Cc: gtld-tech at icann.org
>>>Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content
>>>On Wed, Feb 03, 2016 at 12:23:42AM +0000,
>>>  Francisco Arias <francisco.arias at icann.org> wrote
>>>  a message of 77 lines which said:
>>>> The search page
>>>> (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to
>>>> be the result of crawling links from the first link that appears
>>>> there (http://rdg.afilias.info/rdap/help). The help page contains
>>>> links to search and lookup examples that return several objects with
>>>> their directly-related objects, which are in turn shown in the
>>>> search results. This could have happened in web-Whois if someone
>>>> were to publish a page containing example queries.
>>>It seems to me that having a robots.txt at the root of the RDAP server
>>>would solve the problem (if you regard it as a problem).
>>>User-agent: *
>>>Disallow: /
>>
>>That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.
>>
>>Scott
>>
>
>
>
> The nastier bots ignore the robots.txt directives...
>
> As Scott and others have pointed out, unauthenticated access *is* a problem.
>
> Trying to draw parallels between current whois (web or otherwise) and RDAP might work with less technical types, but with this audience it simply won’t fly.
>
> RDAP’s entire “power” lies in the way that you can traverse the database in multiple ways.
>
> You cannot do that with “normal” whois, and this is both a security and a privacy issue.
>
> Repeatedly telling us you don’t think it is doesn’t change the fact that it is.
>
> Regards
>
> Michele
>

This is incidental crawling, though. Data miners have been targeting
Whois for years with great success.

-andy
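A minimal sketch of the robots.txt point debated in this thread. It uses Python's standard `urllib.robotparser` and assumes the two directives Stephane quoted (`User-agent: *` / `Disallow: /`). The function name `polite_fetch_allowed` and the user agent `ExampleBot` are hypothetical, chosen only for illustration. A crawler that honors robots.txt calls a check like this before each fetch and skips the URL. As Scott and Michele note, nothing obliges a crawler to run the check at all.

```python
from urllib.robotparser import RobotFileParser

# The directives quoted in the thread: block every path for every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def polite_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url.

    Hypothetical helper: it parses the robots.txt text supplied by the
    caller instead of downloading it from the server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "http://rdg.afilias.info/rdap/help"
    # A compliant crawler asks first and, under "Disallow: /", skips the URL.
    print(polite_fetch_allowed(ROBOTS_TXT, "ExampleBot", url))  # False
    # A non-compliant crawler never performs this check, so robots.txt
    # is purely advisory and cannot stop it.
```

The check runs entirely on the crawler's side. That is why robots.txt can reduce incidental indexing by well-behaved search engines but offers no protection against deliberate data mining.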