[CPWG] ME community statement about the ICANN Open Data Platform

gopal at annauniv.edu gopal at annauniv.edu
Fri Feb 4 14:53:58 UTC 2022


Dear All,

Handling data in multiple file formats is a vexing problem. There is no
perfect file format; each has advantages and disadvantages. File format
choices are often bundled with, and determined by, the software used.

Creating a generic service that can convert between different file
formats is a good solution. There are many "ELECTRONIC DATA CAPTURE
SOFTWARE TOOLS".
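A minimal sketch of such a conversion service, assuming every format can be read into one common in-memory form (a list of rows, each a dict of column name to value) and written back out from it. Only CSV and JSON are wired up here; the function names are hypothetical and are not taken from any of the tools mentioned above.

```python
import csv
import io
import json


def read_csv(text: str) -> list[dict]:
    """Parse CSV text into a list of row dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(rows: list[dict]) -> str:
    """Serialise row dicts to CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def read_json(text: str) -> list[dict]:
    """Parse a JSON array of objects."""
    return json.loads(text)


def write_json(rows: list[dict]) -> str:
    """Serialise row dicts as an indented JSON array."""
    return json.dumps(rows, indent=2)


# Each format registers a reader and a writer; adding a format means adding
# one pair rather than one converter per pair of formats.
READERS = {"csv": read_csv, "json": read_json}
WRITERS = {"csv": write_csv, "json": write_json}


def convert(text: str, src: str, dst: str) -> str:
    """Convert `text` from format `src` to format `dst` via the common form."""
    try:
        reader, writer = READERS[src], WRITERS[dst]
    except KeyError as exc:
        raise ValueError(f"unsupported format: {exc.args[0]}") from None
    return writer(reader(text))
```

For example, `convert("registrar,domains\nExample Registrar,120\n", "csv", "json")` yields a JSON array with one object. Routing everything through a shared representation keeps the number of converters linear in the number of formats.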

This approach may be an alternative to evolving a consensus-based
standard format, all the more so as the multi-stakeholder context
evolves.

Hope this helps.

Sincerely,




Gopal T V
0 9840121302
https://vidwan.inflibnet.ac.in/profile/57545
https://www.facebook.com/gopal.tadepalli

PS: @ APRALO Ms. Justine Chew was with ICANN DAAR, but I do not
remember hearing her on this topic in the past few CPWG meetings.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dr. T V Gopal
Professor
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
       (Res) 24454753
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 2022-02-04 19:06, Chokri Ben Romdhane via CPWG wrote:
> Thank you, Hadia, for your continued support.
> 
> Thank you, John, for the great point. I totally agree that a consensus
> standard structure (format) for the reports (data) could be adopted by
> contracted parties to facilitate data exchange and/or integration
> between systems.
> Note that current trends favour JSON or XML over CSV for exchanging
> data.
> Note also that, with the REST API, datasets can be downloaded locally
> for use with any software development kit, or used remotely.
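A minimal sketch of the "download locally, then process" pattern, assuming the dataset is exposed as a JSON array of records. `DATASET_URL` is a hypothetical placeholder, not a confirmed ODP endpoint, and both function names are illustrative.

```python
import json
import pathlib
import urllib.request

# Hypothetical placeholder URL: substitute the export link the platform
# actually documents; this is NOT a confirmed ODP endpoint.
DATASET_URL = "https://example.org/api/datasets/per-registrar/export.json"


def fetch_dataset(url: str, cache: pathlib.Path) -> bytes:
    """Return the raw dataset, downloading it only if no local copy exists."""
    if cache.exists():
        return cache.read_bytes()
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw = resp.read()
    cache.write_bytes(raw)
    return raw


def parse_records(raw: bytes) -> list[dict]:
    """Decode a JSON array of records; numbers stay numbers, unlike in CSV."""
    records = json.loads(raw.decode("utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records
```

Usage would be `parse_records(fetch_dataset(DATASET_URL, pathlib.Path("per-registrar.json")))`. The local cache means later analysis runs work offline and against a fixed snapshot.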
> 
> Kind regards,
> Chokri
> 
> On Thu, 3 Feb 2022 at 19:55, John McCormac via CPWG
> <cpwg at icann.org> wrote:
> 
>> On 03/02/2022 15:26, Chokri Ben Romdhane via CPWG wrote:
>>> Dear Friends,
>>> During the ICANN72 ME space session
>>> <https://72.schedule.icann.org/meetings/ir8CyynKdp3GwtbsY>, we
>>> submitted a statement
>>> <https://drive.google.com/file/d/1ZRqAXPrjcU1B9v_6ZwSdHoF-SgLjj_65/view>
>>> to the board about the ICANN Open Data Platform, and we received the
>>> following answers
>>> <https://drive.google.com/file/d/1OoiqWDS7pkT_EplN5J_izfzSQj7aBJY-/view?usp=sharing>
>>> from the Board.
>> 
>> In the presentation given, I think that Ashwin Rangan may have been
>> unaware of the issues with the ODP when it came to the per-registrar
>> data. The problems with the per-registrar transaction data were mainly
>> that importing the CSV files into the ODP was not a simple process,
>> due to missing data, corrupted data and differing formats in the CSV
>> files.
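One of the problems named above, differing formats across CSV files, can be caught before import by comparing each file's header with a reference header. A minimal sketch, assuming the files are already read into strings; the column names in the usage example are illustrative and are only assumed to resemble the per-registrar reports.

```python
import csv
import io


def header_of(text: str) -> list[str]:
    """Return the header row of a CSV document, normalised for comparison."""
    first = next(csv.reader(io.StringIO(text)), [])
    # Drop a UTF-8 byte-order mark, stray spaces and case differences.
    return [col.replace("\ufeff", "").strip().lower() for col in first]


def compare_headers(files: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Compare each file's header with the first file's header.

    Returns, for every file that differs, the reference columns it lacks
    ("missing") and the columns it adds ("extra").
    """
    names = list(files)
    if not names:
        return {}
    reference = header_of(files[names[0]])
    report = {}
    for name in names[1:]:
        cols = header_of(files[name])
        missing = [c for c in reference if c not in cols]
        extra = [c for c in cols if c not in reference]
        if missing or extra:
            report[name] = {"missing": missing, "extra": extra}
    return report
```

Normalising case, whitespace and a leading byte-order mark first means only genuine structural differences are reported, not cosmetic ones.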
>> 
>> The limitation of the ODP in handling what are effectively trivial
>> datasets is disturbing. With the expansion in the number of gTLDs and
>> subsequent rounds, the ODP, with a limited dataset licence, would
>> quickly be of limited value. That should have been immediately obvious
>> to ICANN.
>> 
>> Retaining the CSVs in parallel with the ODP is the best strategy,
>> because CSV is a more robust format and errors are much easier to
>> identify. This is how it was possible to identify the problems with
>> the per-registrar data.
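To illustrate why errors are easy to pinpoint in CSV, here is a minimal row-level checker that reports the line number and the problem for each suspect row. It assumes, as a labelled simplification, that the first two columns are identifiers and every later column is a non-negative integer count; real reports may need a per-column rule.

```python
import csv
import io
import re

COUNT = re.compile(r"[0-9]+")  # a non-negative integer, ASCII digits only


def find_row_errors(text: str, text_columns: int = 2) -> list[tuple[int, str]]:
    """Report (line number, problem) for each suspect row of a CSV report.

    Assumes the first `text_columns` columns are identifiers (e.g. a
    registrar name and an IANA ID) and every later column is a count.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return [(1, "empty file")]
    errors = []
    for row in reader:
        # Lines read so far; equals the row's line unless a field spans lines.
        line = reader.line_num
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            errors.append((line, f"expected {len(header)} fields, found {len(row)}"))
            continue
        for col, value in zip(header[text_columns:], row[text_columns:]):
            value = value.strip()
            if not value:
                errors.append((line, f"missing value in '{col}'"))
            elif not COUNT.fullmatch(value):
                errors.append((line, f"non-numeric value {value!r} in '{col}'"))
    return errors
```

Because each record is one line of text, a report like this can be checked against the file in any text editor.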
>> 
>> There is a serious normalisation problem with the per-registrar data,
>> in that some registries use their own names for the registrars. The
>> column-header language issue is relatively simple to solve with a
>> properly designed database schema, but I am not sure how the ODP could
>> handle multiple languages. I tried subscribing to the ICANN ME mailing
>> list after the presentation.
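A minimal sketch of both fixes, under stated assumptions: header spellings (including translated ones) are mapped to canonical column names through an alias table, and registrar names are normalised by keying on the IANA ID and keeping the spelling seen most often. The alias entries, including the French one, are hypothetical examples, and "most common spelling wins" is one possible policy, not an established rule.

```python
from collections import Counter, defaultdict

# Hypothetical alias table: header spellings (including a translated one)
# mapped to one canonical column name. Entries are illustrative only.
HEADER_ALIASES = {
    "registrar-name": "registrar_name",
    "registrar name": "registrar_name",
    "nom du bureau d'enregistrement": "registrar_name",
    "iana-id": "iana_id",
    "iana id": "iana_id",
    "total-domains": "total_domains",
    "total domains": "total_domains",
}


def canonical_rows(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Rename every key to its canonical column name; unknown keys are kept."""
    return [
        {HEADER_ALIASES.get(key.strip().lower(), key): value for key, value in row.items()}
        for row in rows
    ]


def preferred_names(rows: list[dict[str, str]]) -> dict[str, str]:
    """Pick, per IANA ID, the registrar spelling seen most often.

    Ties go to the spelling encountered first.
    """
    seen: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        seen[row["iana_id"].strip()][row["registrar_name"].strip()] += 1
    return {iana_id: names.most_common(1)[0][0] for iana_id, names in seen.items()}


def normalise_names(rows: list[dict[str, str]], names: dict[str, str]) -> list[dict[str, str]]:
    """Replace each registry's own spelling with the preferred one, in place."""
    for row in rows:
        iana_id = row.get("iana_id", "").strip()
        if iana_id in names:
            row["registrar_name"] = names[iana_id]
    return rows
```

Keying on the IANA ID rather than the name is the design choice that matters: the ID is the same across registries even when the spelling is not.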
>> 
>> Though the ODP is a useful tool, it lacks historical depth. Some of
>> this is due to data formats, with data published as PDFs (whose layout
>> varied from registry to registry) rather than CSV. I successfully
>> reverse-engineered and extracted the data from most of these PDFs,
>> back to 2006 for some gTLDs, to build a database of historical
>> per-registrar transactions. It was an interesting exercise.
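One step of such an extraction can be sketched as follows. This is not a reconstruction of the method described above; it assumes the text has already been pulled out of the PDF by some PDF-to-text tool (not shown) and that each data row is a registrar name followed by a known number of count columns. The function name is hypothetical.

```python
import re

# One count: plain digits or digits with thousands separators ("1,234").
COUNT = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def parse_text_row(line: str, n_counts: int):
    """Split one line of text extracted from a PDF report into
    (registrar name, [counts]), or return None if it is not a data row.

    The last `n_counts` whitespace-separated tokens are taken as counts and
    everything before them as the name, so names containing spaces or
    commas survive. Page headers and footers fail the numeric check.
    """
    if n_counts < 1:
        raise ValueError("n_counts must be at least 1")
    tokens = line.split()
    if len(tokens) <= n_counts:
        return None
    tail = tokens[-n_counts:]
    if not all(COUNT.fullmatch(tok) for tok in tail):
        return None
    name = " ".join(tokens[:-n_counts])
    return name, [int(tok.replace(",", "")) for tok in tail]
```

Splitting from the right is what makes this workable: the number of count columns is fixed per report layout, while registrar names vary in length and punctuation.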
>> 
>> The formatting in the PDFs varied. Some of the data (deletion figures)
>> for .COM and .NET was missing from the per-registrar reports until
>> Verisign adopted the new reporting format. Some other data quality
>> issues have persisted: the .AFRICA per-registrar reports have been
>> missing the new-adds and renewals data since the gTLD launched, and
>> the latest (October 2021) report for the gTLD is still missing this
>> data.
>> 
>> The ODP offers a useful interface for dealing with the data, but the
>> best application would be one written in Python, Ruby or another
>> programming language that downloads datasets for local processing.
>> The database schema for the per-registrar reports is standardised, so
>> it is easy enough to load this data into a database with a single
>> statement. I think the schemas for the other datasets are also
>> available on the ODP.
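A minimal local-loading sketch using Python's built-in `sqlite3`. It assumes a column layout in which the first column is the registrar name and every later column is an integer count; that typing is an assumption for illustration, and a real loader should follow the documented schema. The table name in the usage example is hypothetical.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier; hyphenated column names need it."""
    return '"' + name.replace('"', '""') + '"'


def load_report(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from the CSV header and insert every data row.

    Assumed layout: the first column is the registrar name (TEXT) and all
    later columns are counts (INTEGER), so SQLite stores them as numbers.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV")
    types = ["TEXT"] + ["INTEGER"] * (len(header) - 1)
    columns = ", ".join(f"{quote_ident(c)} {t}" for c, t in zip(header, types))
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" * len(header))
    rows = [row for row in reader if row]  # ignore blank lines
    # One parameterised INSERT executed for all rows.
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)
```

For example, `load_report(sqlite3.connect("reports.db"), "per_registrar_2021_10", text)` loads one month's report. On a database server the same job is typically a single statement, such as PostgreSQL's `COPY ... FROM` or MySQL's `LOAD DATA INFILE`.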
>> 
>> Regards...jmcc
>> --
>> **********************************************************
>> John McCormac  *  e-mail: jmcc at hosterstats.com
>> MC2            *  web: http://www.hosterstats.com/
>> 22 Viewmount   *  Domain Registrations Statistics
>> Waterford      *  Domnomics - the business of domain names
>> Ireland        *  https://amzn.to/2OPtEIO
>> IE             *  Skype: hosterstats.com [1]
>> **********************************************************
>> 
>> _______________________________________________
>> CPWG mailing list
>> CPWG at icann.org
>> https://mm.icann.org/mailman/listinfo/cpwg
>> 
>> _______________________________________________
>> By submitting your personal data, you consent to the processing of
>> your personal data for purposes of subscribing to this mailing list
>> accordance with the ICANN Privacy Policy
>> (https://www.icann.org/privacy/policy) and the website Terms of
>> Service (https://www.icann.org/privacy/tos). You can visit the
>> Mailman link above to change your membership status or
>> configuration, including unsubscribing, setting digest-style
>> delivery or disabling delivery altogether (e.g., for a vacation),
>> and so on.
> 
> 
> Links:
> ------
> [1] http://hosterstats.com

