[UA-discuss] DRAFT: UA103 - Programming Hacks

Tex Texin textexin at xencraft.com
Thu Jun 29 08:42:55 UTC 2017


Stuart makes a good point. I added a number of suggestions to the document as well.

tex

 

From: ua-discuss-bounces at icann.org [mailto:ua-discuss-bounces at icann.org] On Behalf Of Stuart Stuple via UA-discuss
Sent: Wednesday, June 28, 2017 7:35 AM
To: Don Hollander; ua-discuss at icann.org
Subject: Re: [UA-discuss] DRAFT: UA103 - Programming Hacks

 

There is a somewhat “in passing” statement about sorting such that Punycode and full representations are treated as equivalent. How is that expected to be done? It seems like a fair bit of additional code that would slow down a simple sort.

 

From: ua-discuss-bounces at icann.org [mailto:ua-discuss-bounces at icann.org] On Behalf Of Don Hollander
Sent: Tuesday, June 27, 2017 2:37 PM
To: ua-discuss at icann.org
Subject: [UA-discuss] DRAFT: UA103 - Programming Hacks

 

https://docs.google.com/document/d/1i4OAeojY5dj3ZAG8sHcrpSdaG1vN_R011MOuza4djRY/edit?usp=sharing

 

As part of our efforts to raise awareness, we’re producing editorials/guest blogs that we’re getting published in various professional body newsletters and such.

 

Our third editorial is focusing on Programming Hacks.

 

Editable copy is available at the link above.

 

Here’s the current draft.

 

Comments please, by the 10th of July.

 

Thanks.

 

Don

 

Programming language hacks

UA103

 

Computer programmers are nice people.  They are intelligent.  They are wise.  (Their mothers would say they are good looking :-).)  They know that if people have a chance to make a mistake, they will.  So, the wise, intelligent (and good looking) computer programmers build their systems to help prevent mistakes at the source.  Computer programmers are also very efficient (their siblings might say lazy) and will reuse code – either their own or someone else’s. For years computer programmers have been putting the same data validation of email addresses and domain names into their code to reduce the amount of Garbage In.  And this now turns out to be a mistake.

 

What’s happened is that while the validation code has been pretty static, email addresses and domain names have been changing quite radically.   

 

Since 2001, Top Level Domain names (TLDs) have been allowed to be longer than three characters.  Since 2010, TLDs have been available in non-ASCII characters.  And since 2013, the number of TLDs and the rate at which they enter the Root Zone have gone off the charts!

 

And today, mailbox names (the label to the left of the ‘@’) can also be in non-ASCII characters!

 

So, it’s time for computer programmers to update their code to accommodate these new domain name options.

 

So here are some tips and tricks for computer programmers to use when updating their code:

 

Input

Data fields that accept domain names or email addresses should be able to accept both ASCII and non-ASCII characters.  UTF-8 is the key here.  This affects programs that accept data from a keyboard or other data sources, as well as the database where the data is stored.   The good news is that most modern databases will have no problems with this.

 

Validation

The easiest way to deal with this is to only use syntactic validation against the specifications in the RFCs <https://uasg.tech/wp-content/uploads/2017/06/UA006-Relevant-RFCs.pdf> [1].

There are other ways of making sure the data entered is what the user meant, such as requiring entry of the field twice and doing a compare.

If you need to validate further, use a DNS lookup – that’s the most certain.   Or if you’re going to use a local table, make sure that it’s from an authoritative source and that it’s updated at least daily.  
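As a rough illustration of syntax-only validation, here is a minimal Python sketch. The helper name looks_like_domain is hypothetical and not part of the draft. It assumes that Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is acceptable for a first pass; an application that needs IDNA 2008 behaviour would use a dedicated library instead. Note that it checks no TLD list and no TLD length limit.

```python
import re

# One DNS label after conversion to ASCII: letters, digits and hyphens,
# 1 to 63 characters long, with no leading or trailing hyphen.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_domain(name: str) -> bool:
    """Syntax-only check (hypothetical helper, not from the draft)."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if "." not in name:
        return False  # assumption: require at least two labels
    try:
        # Assumption: the built-in codec (IDNA 2003) is good enough here.
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # empty label, over-long label, or prohibited character
    if len(ascii_name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in ascii_name.split("."))
```

A check like this accepts long and non-ASCII TLDs, for example looks_like_domain("example.photography") and looks_like_domain("bücher.example"), while still rejecting malformed input such as "-bad-.example". Whether the name actually exists is a separate question that only a DNS lookup can answer.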

 

Storage

The easiest way to deal with storage is to support UTF-8.   But for applications that can’t, there is an algorithm (Punycode)[2] that allows transformation of domain names between ASCII and non-ASCII strings. NB: the Punycode conversion may NOT work for the mailbox name in an email address.  Alternative encoding schemes exist and should be applied.
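The conversion between the two forms might look like the following Python sketch. The function names are illustrative only, and the sketch again assumes the built-in "idna" codec (IDNA 2003) is sufficient. Only the domain part of an email address is converted; the mailbox name is passed through untouched.

```python
def domain_to_ascii(domain: str) -> str:
    """Return the ASCII (Punycode-based) form of a domain name."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Return the native-script form of an ASCII domain name."""
    return domain.encode("ascii").decode("idna")


def address_for_storage(address: str) -> str:
    """Convert only the domain part of an email address (hypothetical helper).

    The mailbox name is returned unchanged because Punycode is not
    defined for it; a non-ASCII mailbox name still needs a field that
    can hold UTF-8.
    """
    local, at, domain = address.rpartition("@")
    return local + at + domain_to_ascii(domain)
```

For example, domain_to_ascii("bücher.example") gives "xn--bcher-kva.example", and domain_to_unicode reverses it.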

 

Processing

There are times when two different representations of a domain name are not the same but are equivalent.  For example, when a non-ASCII domain name has been converted using the Punycode algorithm.  When processing or sorting, it’s important that they are treated as equivalent.   This will require some policies for the application or indeed the organization as to how domain names and email addresses are being dealt with.
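One possible way to treat the two forms as equivalent when sorting is to reduce every name to a single canonical form and sort on that. The Python sketch below is an assumption about how this could be done, not a method taken from the draft; the function name domain_sort_key is hypothetical, and the built-in "idna" codec (IDNA 2003) stands in for whatever conversion the application has chosen.

```python
def domain_sort_key(domain: str) -> str:
    """Canonical key: lower-case native-script form of the name."""
    try:
        # Encoding normalises a native-script name to its ASCII form;
        # decoding then maps both spellings to the same Unicode string.
        ascii_form = domain.encode("idna")
        return ascii_form.decode("idna").lower()
    except UnicodeError:
        # Names the codec rejects fall back to a plain lower-case compare.
        return domain.lower()


names = [
    "zebra.example",
    "xn--bcher-kva.example",
    "apple.example",
    "Bücher.example",
]
ordered = sorted(names, key=domain_sort_key)
```

Because sorted() calls the key function once per item rather than once per comparison, the extra cost grows linearly with the number of names. An application that sorts often could also compute the key once and store it alongside the original value.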

 

Display

Because domain names in non-ASCII characters (and mailbox names too) are growing more popular, you’ll need to make sure that you’re able to display them in a way that works for your community.   Public facing applications should certainly display names in their native scripts and not as the ASCII string resulting from a Punycode transformation.

 

Check Libraries

A growing number of libraries, particularly Open Source programming language libraries, are correcting their validation routines, so becoming UA Ready may be as simple as rebuilding the code against the latest version of the library.   The UASG is encouraging remediation work in quite a number of libraries.

GitHub and SourceForge are also two good places to look to find working code.

The UASG also publishes some good reference material at www.uasg.tech/documents.

 

Most efforts to get applications UA Ready will fall into the ‘Bug Fix’ level of effort.  It’s time to get applications up to scratch.   

 

 

 

 

 




  _____  

[1] https://uasg.tech/wp-content/uploads/2017/06/UA006-Relevant-RFCs.pdf

[2] https://www.ietf.org/rfc/rfc3492.txt
