[ChineseGP] [lgr] On drafting and LGR in the new XML format

Asmus Freytag asmusf at ix.netcom.com
Thu May 22 20:12:44 UTC 2014


LGR list members,

the requirement that the LGRs created by the Generation Panels must be 
submitted in the new XML Format For Representing Label Generation Rules 
<link> continues to be the cause of some apprehensions.

My intent with this message is to start a discussion and perhaps help 
dispel some of these apprehensions.

The MSR drafted by the Integration Panel contains an XML file that lists
the repertoire of the MSR in the XML format. If your Generation Panel
does NOT support variants and does NOT support script-specific WLE
rules, the simplest way to generate the required XML file might be to
delete from the MSR all entries that are not included in your LGR.
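
As a rough illustration of that deletion step, here is a minimal sketch
in Python (chosen only for illustration). It assumes that the MSR file
lists its repertoire as individual <char cp="..."/> elements inside
<data>, and that the namespace is the one shown below; both are
assumptions to check against the actual file. The function name
filter_msr is hypothetical, and <range> or other elements are left
untouched by this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Assumption: namespace and element names of the LGR XML format;
# adjust them to match the actual MSR file.
NS = "urn:ietf:params:xml:ns:lgr-1.0"


def filter_msr(msr_xml: str, keep: Iterable[int]) -> str:
    """Return the MSR XML with every <char> not covered by `keep` removed."""
    keep = set(keep)
    ET.register_namespace("", NS)  # serialize without an ns0: prefix
    root = ET.fromstring(msr_xml)
    for data in root.iter(f"{{{NS}}}data"):
        for char in list(data):  # iterate over a copy while removing
            if char.tag != f"{{{NS}}}char":
                continue  # ranges etc. are not handled in this sketch
            cps = char.get("cp")
            if cps is None or any(int(cp, 16) not in keep for cp in cps.split()):
                data.remove(char)
    return ET.tostring(root, encoding="unicode")
```

Calling filter_msr(msr_text, {0x0061, 0x0063}) would keep only the
entries for U+0061 and U+0063.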

A more generic approach would be to do all the editing in a plain text
file in a private format that lists the code points and variants. This
file could be in one of the several existing formats for IDN tables.

Because /generating/ XML is much easier than /editing/ XML, the approach
that we followed in creating the MSR is to do our editing in such a
private format, and then to use a short Perl script to generate the XML
from that.
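
To make the idea concrete, here is a minimal sketch of such a generator,
written in Python rather than Perl purely for illustration; it is not
the script used for the MSR. The private format is hypothetical: one
line per code point, optionally followed by ";" and a space-separated
list of variant code points, with "#" starting a comment. The element
and attribute names (<char>, <var>, cp) and the namespace are
assumptions about the XML format, and only single code points are
handled.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the LGR XML format.
NS = "urn:ietf:params:xml:ns:lgr-1.0"


def text_to_lgr(lines):
    """Build LGR-style XML from lines such as '0061; 0041 00E0  # note'."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}lgr")
    data = ET.SubElement(root, f"{{{NS}}}data")
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cp, _, variants = line.partition(";")
        cp_value = int(cp.strip(), 16)  # raises ValueError on a typo
        char = ET.SubElement(data, f"{{{NS}}}char", cp=f"{cp_value:04X}")
        for var in variants.split():
            ET.SubElement(char, f"{{{NS}}}var", cp=f"{int(var, 16):04X}")
    return ET.tostring(root, encoding="unicode")
```

Because the XML is always produced by the library, a typo in the text
file shows up as an error in the script rather than as broken XML.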

By doing this, we avoid introducing the kinds of XML errors that are the 
result of hand-editing an XML file.

For those scripts that use variants on the second level today, existing 
IDN tables have ways to express variants, and these can be converted to 
the new format by a Perl script as well. The effort required is quite 
moderate. If I may, I'd like to refer to my own experience. For a 
different ICANN project, I managed to write a functioning reader in a 
full-fledged programming language (not a scripting language) in about a
week, supporting not just one IDN table format, but as many formats as 
existed in the IANA registry. To write the method that outputs the same 
data in XML format took some additional time.

I assume that most GPs already have draft repertoire lists that could be 
converted to the same format as the MSR, and a simple diff could be used 
to find out whether the draft repertoire accidentally contains code 
points not allowed in the MSR. Alternatively, there are many tools that 
read XML files and can translate them into columnar tables. MS Excel
can do that, for example. Once saved in .CSV format, such tables are 
simple to transform into whatever private format your GP has decided to 
use for editing.
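
The check against the MSR can also be done with a few lines of script
instead of a textual diff. The sketch below is in Python and assumes a
hypothetical plain text format with one hexadecimal code point at the
start of each line (anything after ";" or "#" is ignored); the function
names are made up for this example.

```python
def read_code_points(lines):
    """Collect the code point at the start of each non-empty line."""
    cps = set()
    for line in lines:
        field = line.split("#", 1)[0].split(";", 1)[0].strip()
        if field:
            cps.add(int(field, 16))
    return cps


def not_in_msr(draft, msr):
    """Return draft code points missing from the MSR as U+XXXX strings."""
    return [f"U+{cp:04X}" for cp in sorted(set(draft) - set(msr))]
```

An empty result from not_in_msr means that the draft repertoire is a
subset of the MSR list it was compared against.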

Either way, you will have your draft data in a format that you are 
familiar with while you are editing, and a format that is easily 
compared to both your starting collection (the one you derived from the
MSR intersected with your script) and to later versions of your draft
repertoire.

Now, when it comes to creating whole label evaluation rules, no such
shortcuts exist. What I would propose is that any Generation Panel that
intends to define such rules create a detailed description with regular
expressions or pseudocode (as in RFC 5892, for example). Those of us who
are more familiar with the format would then be able to assist you in
translating these rules into the XML format.

The resulting XML for the rules can be appended by script to the part 
that is created from the plain text list of repertoire and variants. 
That way, your Generation Panel can continuously have an updated draft 
of its complete LGR in the XML format without doing any actual XML editing.
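
One possible way to script that step is sketched below in Python. It
assumes that the rules are kept in a separate file holding a single
<rules> element in the same namespace as the generated part, and that
this element belongs directly under the root element; the namespace and
the function name attach_rules are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the LGR XML format.
NS = "urn:ietf:params:xml:ns:lgr-1.0"


def attach_rules(lgr_xml, rules_xml):
    """Return the generated LGR XML with the <rules> element appended."""
    ET.register_namespace("", NS)
    root = ET.fromstring(lgr_xml)
    rules = ET.fromstring(rules_xml)
    if rules.tag != f"{{{NS}}}rules":
        raise ValueError("expected a <rules> element in the LGR namespace")
    for old in root.findall(f"{{{NS}}}rules"):
        root.remove(old)  # replace rules left over from an earlier run
    root.append(rules)
    return ET.tostring(root, encoding="unicode")
```

Run after every regeneration of the repertoire part, this keeps the
hand-written rules and the generated data in separate files.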

Finally, there is the matter of validation and correctness of the XML 
specification. If your GP produces XML drafts according to these 
suggestions, it would be a simple matter to assist the GPs in making 
sure that the XML is valid and satisfies any other restrictions that are 
specific to the Root Zone LGR work. It's not clear whether those tools 
that Integration Panel members wrote for our own purposes on the panel 
can be shared, or whether ICANN staff at this time has a tool that is 
shareable, but I know this is being looked into.

Until then, we can always run any early LGR drafts that are shared with 
us through our tools and report any issues. The main goal is to make 
sure that no LGR is rejected based on a simple typo or syntax error.
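
In the meantime, a basic syntax check is easy to run locally. The Python
sketch below only tests whether a file is well-formed XML; it does not
validate against the schema of the format or against any restrictions
specific to the Root Zone LGR work, and it is not one of the tools
mentioned above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text parses as XML, else a short error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return f"line {line}, column {column}: {err}"
    return None
```

A missing closing tag or a stray "<" is reported with its line and
column, which catches the most common hand-editing mistakes.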

A./
Asmus Freytag