[CPWG] ME community statement about the ICANN Open Data Platform
gopal at annauniv.edu
Fri Feb 4 14:53:58 UTC 2022
Dear All,
Handling data in multiple file formats is a vexing problem. There is no
perfect file format; each has advantages and disadvantages. File format
choices are often bundled with, and determined by, the software used.

Creating a generic service that can convert between different file
formats is a good solution. There are many electronic data capture
software tools. This approach may be an alternative to evolving a
consensus-based standard format, more so as the multi-stakeholder
context evolves.
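
As an illustration only, here is a minimal sketch of such a conversion
between two of the formats mentioned in this thread (CSV and JSON). It
uses only the Python standard library. The function names and the column
names (registrar, total_domains) are hypothetical and are not taken from
any ICANN dataset.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects back into CSV text."""
    rows = json.loads(json_text)
    if not rows:
        return ""
    out = io.StringIO()
    # Column order is taken from the first object (an assumption of this sketch).
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data, for illustration only.
sample = "registrar,total_domains\nExample Registrar,120\nAnother Registrar,45\n"
as_json = csv_to_json(sample)
round_trip = json_to_csv(as_json)
print(as_json)
```

Note that every value comes back as a string: CSV carries no type
information, which is one of the trade-offs between the formats.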
Hope this helps.
Sincerely,
Gopal T V
0 9840121302
https://vidwan.inflibnet.ac.in/profile/57545
https://www.facebook.com/gopal.tadepalli
PS: @APRALO: Ms. Justine Chew was with ICANN DAAR, but I do not recall
hearing from her on this topic in the past few CPWG meetings.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dr. T V Gopal
Professor
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
(Res) 24454753
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 2022-02-04 19:06, Chokri Ben Romdhane via CPWG wrote:
> Thank you Hadia for your permanent support.
>
> Thank you John for the great point. I totally agree that a
> consensus-based standard structure (format) for report data could be
> adopted by contracted parties in order to facilitate data exchange
> and/or integration between systems.
> Note that the current trend is to use JSON or XML rather than CSV to
> exchange data.
> Note also that with the REST API, datasets can be downloaded locally
> for use with any software development kit, and/or used remotely.
>
> Friendly
> Chokri
>
> On Thu, 3 Feb 2022 at 19:55, John McCormac via CPWG
> <cpwg at icann.org> wrote:
>
>> On 03/02/2022 15:26, Chokri Ben Romdhane via CPWG wrote:
>>> Dear Friends,
>>> During the ICANN72 ME space session
>>> <https://72.schedule.icann.org/meetings/ir8CyynKdp3GwtbsY>, we
>>> submitted a statement
>>> <https://drive.google.com/file/d/1ZRqAXPrjcU1B9v_6ZwSdHoF-SgLjj_65/view>
>>> to the board about the ICANN Open Data Platform, and we received
>>> the following answers
>>> <https://drive.google.com/file/d/1OoiqWDS7pkT_EplN5J_izfzSQj7aBJY-/view?usp=sharing>
>>> from the Board.
>>
>> In the presentation given, I think that Ashwin Rangan may have been
>> unaware of the issues with the ODP when it came to the per-registrar
>> data. The problems with the per-registrar transactions were mainly
>> that the importation of the CSV files into the ODP was not a simple
>> process due to missing data, corrupted data and differing formats in
>> the CSV files.
>>
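
The kinds of CSV problems described above (missing data, differing
formats) can be detected with a simple check before import. The following
is a minimal sketch of that idea, not the process that ICANN or John
used. The function name and the column names (registrar, new_adds,
renews) are hypothetical.

```python
import csv
import io


def find_csv_problems(csv_text: str, required: list[str]) -> list[str]:
    """Return human-readable descriptions of problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    # Differing formats: a required column is absent from the header.
    for name in required:
        if name not in header:
            problems.append(f"missing column: {name}")
    # Data rows start on line 2 (assumes no multi-line quoted fields).
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(f"line {line_no}: empty value for {name}")
    return problems


# Hypothetical sample with one empty value and one short row.
sample = (
    "registrar,new_adds,renews\n"
    "Example Registrar,10,5\n"
    "Another Registrar,,7\n"
    "Broken Row,3\n"
)
problems = find_csv_problems(sample, ["registrar", "new_adds", "renews"])
for problem in problems:
    print(problem)
```
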
>> The limitation of the ODP in handling what are effectively trivial
>> datasets is disturbing. With the expansion of the numbers of gTLDs
>> and subsequent rounds, the ODP, with a limited dataset licence, would
>> quickly be of limited value. That should have been immediately
>> obvious to ICANN.
>>
>> The retention of CSVs in parallel with the ODP is the best strategy.
>> This is because the CSV is a more robust format and errors are much
>> easier to identify. This is how it was possible to identify the
>> problems with the per-registrar data.
>>
>> There is a serious normalisation problem with the per-registrar data
>> in that some registries have their own names for the registrars. The
>> language of the column headers is a relatively simple issue to solve
>> with a properly designed database schema, but I am not sure how the
>> ODP could handle multiple languages. I tried subscribing to the ICANN
>> ME mailing list after the presentation.
>>
>> Though the ODP is a useful tool, it is lacking historical depth.
>> Some of this is due to data formats and data being in PDF format
>> (which varied from registry to registry) rather than CSV. I
>> successfully reverse-engineered and extracted the data from most of
>> these PDFs back to 2006 for some gTLDs to build a database of
>> historical per-registrar transactions. It was an interesting exercise.
>>
>> The formatting in the PDFs varied. Some of the data (deletion
>> figures) for .COM and .NET was missing from the per-registrar reports
>> until Verisign adopted the new reporting format. There were some
>> other data quality issues that have persisted. The .AFRICA
>> per-registrar reports have been missing the new-adds and renews data
>> and have been so since the gTLD launched. The latest (October 2021)
>> report for the gTLD is still missing this data.
>>
>> The ODP offers a useful interface for dealing with the data, but the
>> best application would be one in Python, Ruby or another programming
>> language to download datasets to be processed locally. The database
>> schema for the per-registrar reports is standardised, so it is easy
>> enough to load this data into a database with a single statement. The
>> schema for the other datasets is also available on the ODP, I think.
>>
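
The local processing described above can be sketched in Python with the
standard library. This is only an illustration under stated assumptions:
the table layout and column names (tld, registrar, new_adds, renews) are
hypothetical and much smaller than the real per-registrar report schema,
and the download step is left out because the ODP API details are not
given in this thread.

```python
import csv
import io
import sqlite3


def load_transactions(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load per-registrar rows from CSV text into SQLite; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "tld TEXT, registrar TEXT, new_adds INTEGER, renews INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # int() raises ValueError on missing or corrupted numbers, so bad
    # rows are not loaded silently.
    rows = [
        (r["tld"], r["registrar"], int(r["new_adds"]), int(r["renews"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical sample standing in for a downloaded report.
sample = (
    "tld,registrar,new_adds,renews\n"
    "example,Example Registrar,10,5\n"
    "example,Another Registrar,3,7\n"
)
conn = sqlite3.connect(":memory:")
loaded = load_transactions(conn, sample)
total = conn.execute("SELECT SUM(new_adds) FROM transactions").fetchone()[0]
print(loaded, total)
```

Once the rows are in a local database, aggregate queries across months
or registrars are ordinary SQL.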
>> Regards...jmcc
>> --
>> **********************************************************
>> John McCormac * e-mail: jmcc at hosterstats.com
>> MC2 * web: http://www.hosterstats.com/
>> 22 Viewmount * Domain Registrations Statistics
>> Waterford * Domnomics - the business of domain names
>> Ireland * https://amzn.to/2OPtEIO
>> IE * Skype: hosterstats.com [1]
>> **********************************************************
>>
>> _______________________________________________
>> CPWG mailing list
>> CPWG at icann.org
>> https://mm.icann.org/mailman/listinfo/cpwg
>>
>> _______________________________________________
>> By submitting your personal data, you consent to the processing of
>> your personal data for purposes of subscribing to this mailing list
>> accordance with the ICANN Privacy Policy
>> (https://www.icann.org/privacy/policy) and the website Terms of
>> Service (https://www.icann.org/privacy/tos). You can visit the
>> Mailman link above to change your membership status or
>> configuration, including unsubscribing, setting digest-style
>> delivery or disabling delivery altogether (e.g., for a vacation),
>> and so on.
>
>
> Links:
> ------
> [1] http://hosterstats.com