[CPWG] ME community statement about the ICANN Open Data Platform

Chokri Ben Romdhane chokribr at gmail.com
Fri Feb 4 23:08:44 UTC 2022


Just to add that the ICANN open data metadata vocabulary is based on the
Project Open Data Metadata Schema v1.1
<https://project-open-data.cio.gov/v1.1/schema/> with minor amendments.

Chokri


On Thu, 3 Feb 2022 at 7:55 PM, John McCormac via CPWG <cpwg at icann.org>
wrote:

> On 03/02/2022 15:26, Chokri Ben Romdhane via CPWG wrote:
> > Dear Friends,
> > During the ICANN72 ME space session
> > <https://72.schedule.icann.org/meetings/ir8CyynKdp3GwtbsY> , we
> > submitted a statement
> > <https://drive.google.com/file/d/1ZRqAXPrjcU1B9v_6ZwSdHoF-SgLjj_65/view>
> > to the Board about the ICANN Open Data Platform, and we received the
> > following answers
> > <https://drive.google.com/file/d/1OoiqWDS7pkT_EplN5J_izfzSQj7aBJY-/view?usp=sharing>
> > from the Board.
>
> In the presentation given, I think that Ashwin Rangan may have been
> unaware of the issues with the ODP when it came to the per-registrar
> data. The problems with the per-registrar transactions were mainly that
> the importation of the CSV files into the ODP was not a simple process
> due to missing data, corrupted data and differing formats in the CSV files.
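The three import problems named above (missing data, corrupted data and
differing formats) can be caught before a file is loaded. What follows is
only a minimal sketch, not the ODP's actual import process: the column names
in EXPECTED_HEADER are hypothetical placeholders, and the real per-registrar
report headers may differ.

```python
import csv
import io
import re

# Hypothetical column names, used for illustration only.
EXPECTED_HEADER = ["registrar-name", "iana-id", "total-domains", "net-adds-1-yr"]
NUMERIC_COLUMNS = ["iana-id", "total-domains", "net-adds-1-yr"]


def check_report(text):
    """Return a list of (line_number, problem) tuples for one CSV report."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return [(1, "empty file")]
    # Differing formats: compare headers case-insensitively, ignoring padding.
    header = [h.strip().lower() for h in header]
    if header != EXPECTED_HEADER:
        return [(1, "unexpected header: %r" % header)]
    problems = []
    for row in reader:
        line = reader.line_num
        # Corrupted data: a row with the wrong number of fields.
        if len(row) != len(header):
            problems.append(
                (line, "expected %d fields, found %d" % (len(header), len(row)))
            )
            continue
        record = dict(zip(header, row))
        for column in NUMERIC_COLUMNS:
            value = record[column].strip()
            if value == "":
                # Missing data: an empty cell in a numeric column.
                problems.append((line, "missing value in " + column))
            elif not re.fullmatch(r"-?[0-9]+", value):
                problems.append((line, "non-numeric value in " + column))
    return problems
```

Running check_report() over each monthly file and reviewing the reported
line numbers is one way to find the kind of errors described here before
they reach a database.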
>
> The limitation of the ODP in handling what are effectively trivial
> datasets is disturbing. With the expansion of the numbers of gTLDs and
> subsequent rounds, the ODP, with a limited dataset licence, would
> quickly be of limited value. That should have been immediately obvious
> to ICANN.
>
> The retention of CSVs in parallel with the ODP is the best strategy.
> This is because the CSV is a more robust format and errors are much
> easier to identify. This is how it was possible to identify the problems
> with the per-registrar data.
>
> There is a serious normalisation problem with the per-registrar data in
> that some registries have their own names for the registrars. The
> language of the column headers is a relatively simple issue to solve
> with a properly designed database schema, but I am not sure how the ODP
> could handle multiple languages. I tried subscribing to the ICANN ME
> mailing list after the presentation.
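One possible way to handle the registrar naming problem is to key every row
on a stable identifier and pick a single spelling per identifier. The sketch
below assumes that each row carries the registrar's IANA ID alongside the
name the registry used; the function name canonical_names and the choice of
"most frequent spelling wins" are illustrative assumptions, not a description
of how the ODP or any registry works.

```python
from collections import defaultdict


def canonical_names(rows):
    """Map each IANA ID to the name spelling that occurs most often.

    rows is an iterable of (iana_id, registrar_name) pairs gathered from
    the reports of several registries.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for iana_id, name in rows:
        # Collapse runs of whitespace so padding differences do not count
        # as separate spellings.
        counts[iana_id][" ".join(name.split())] += 1
    # Sorting first makes ties resolve alphabetically, so the result is stable.
    return {
        iana_id: max(sorted(names), key=names.get)
        for iana_id, names in counts.items()
    }
```

The resulting mapping can then be stored in its own table, so that reports
join on the IANA ID and the registry-specific spellings never enter the
transaction data.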
>
> Though the ODP is a useful tool, it is lacking historical depth. Some of
> this is due to data formats and data being in PDF format (which varied
> from registry to registry) rather than CSV. I successfully
> reverse-engineered and extracted the data from most of these PDFs back
> to 2006 for some gTLDs to build a database of historical per-registrar
> transactions. It was an interesting exercise.
>
> The formatting in the PDFs varied. Some of the data (deletion figures)
> for .COM and .NET was missing from the per-registrar reports until
> Verisign adopted the new reporting format. There were some other data
> quality issues that have persisted. The .AFRICA per-registrar reports
> have been missing the new-adds and renews data and have been so since
> the gTLD launched. The latest (October 2021) report for the gTLD is
> still missing this data.
>
> The ODP offers a useful interface for dealing with the data but the best
> application would be one in Python, Ruby or other programming language
> to download datasets to be processed locally. The database schema for
> the per-registrar reports is standardised so it is easy enough to load
> this data into a database with a single statement. The schema for the
> other datasets is also available on the ODP, I think.
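As a rough illustration of processing a downloaded report locally, the
sketch below loads one CSV into SQLite using Python's standard library. The
table layout, the function name load_report and the column names
(registrar-name, iana-id, total-domains) are assumptions made for the
example; the real reports have more columns, and other databases offer
their own bulk-load statements.

```python
import csv
import io
import sqlite3


def load_report(conn, tld, month, text):
    """Insert every row of one CSV report and return the number of rows."""
    # Hypothetical, trimmed-down schema: one row per TLD, month and registrar.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        " tld TEXT, month TEXT, iana_id INTEGER,"
        " registrar TEXT, total_domains INTEGER,"
        " PRIMARY KEY (tld, month, iana_id))"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (tld, month, int(r["iana-id"]), r["registrar-name"], int(r["total-domains"]))
        for r in reader
    ]
    # The connection as a context manager commits on success and rolls back
    # on error, so a bad file does not leave a half-loaded month behind.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO transactions VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Because the primary key covers TLD, month and registrar, reloading a
corrected report replaces the earlier rows instead of duplicating them.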
>
> Regards...jmcc
> --
> **********************************************************
> John McCormac  *  e-mail: jmcc at hosterstats.com
> MC2            *  web: http://www.hosterstats.com/
> 22 Viewmount   *  Domain Registrations Statistics
> Waterford      *  Domnomics - the business of domain names
> Ireland        *  https://amzn.to/2OPtEIO
> IE             *  Skype: hosterstats.com
> **********************************************************