[UA-discuss] DRAFT: UA103 - Programming Hacks
don.hollander at icann.org
Tue Jun 27 21:37:27 UTC 2017
As part of our efforts to raise awareness, we’re producing editorials/guest blogs that we’re getting published in various professional body newsletters and such.
Our third editorial is focusing on Programming Hacks.
Editable copy is available at the link above.
Here’s the current draft.
Comments please, by the 10th of July.
Programming language hacks
Computer programmers are nice people. They are intelligent. They are wise. (Their mothers would say they are good looking :)). They know that if people have a chance to make a mistake, they will. So, the wise, intelligent (and good looking) computer programmers build their systems to help prevent mistakes at the source. Computer programmers are also very efficient (their siblings might say lazy) and will reuse code – either their own or someone else’s. For years computer programmers have been putting the same data validation of email addresses and domain names into their code to reduce the amount of Garbage In. And this now turns out to be a mistake.
What’s happened is that while the validation code has been pretty static, email addresses and domain names have been changing quite radically.
Since 2001, Top Level Domain names (TLDs) have been allowed to be longer than two or three characters. Since 2010, TLDs have been available in non-ASCII characters. And since 2013, the number of TLDs and the frequency of their entry into the Root Zone have gone off the charts!
And today mailbox names (the label to the left of the ‘@’) can be in non-ASCII characters too!
So, it’s time for computer programmers to update their code to accommodate these new domain name options.
So here are some tips and tricks for computer programmers to use when updating their code:
Data fields that accept domain names or email addresses should be able to accept ASCII and non-ASCII characters. UTF-8 is the key here. This will affect both the programs that accept data from a keyboard or other data sources and the database where it’s stored. The good news is that most modern databases will have no problems with this.
The easiest way to deal with this is to only use syntactic validation against the specifications in the RFCs.
There are other ways of making sure the data entered is what the user meant, such as requiring entry of the field twice and doing a compare.
If you need to validate further, use a DNS lookup – that’s the most certain. Or if you’re going to use a local table, make sure that it’s from an authoritative source and that it’s updated at least daily.
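As a rough illustration of syntax-only validation, here is a minimal Python sketch. The function name looks_like_email is made up for this example. It assumes Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is good enough to check the domain labels, and it is deliberately permissive: it is not a full implementation of the RFCs. Note that it contains no list of TLDs and no limit on TLD length.

```python
def looks_like_email(address: str) -> bool:
    """Loose, syntax-only check. Hypothetical helper, not a full RFC validator."""
    # Split on the LAST '@' so the mailbox name may itself contain '@' if quoted.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    # The mailbox name is limited to 64 octets; count UTF-8 bytes, not characters.
    if len(local.encode("utf-8")) > 64:
        return False
    try:
        # Assumption: the stdlib "idna" codec (IDNA 2003) is acceptable here.
        # It raises UnicodeError for empty labels or labels longer than 63 octets.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    # Overall limit on the length of a domain name in its ASCII form.
    return len(ascii_domain) <= 253


print(looks_like_email("user@example.photography"))
print(looks_like_email("用户@例子.广告"))
print(looks_like_email("no-at-sign"))
```

Long TLDs and non-ASCII addresses pass, while a string with no '@' does not. Anything stricter than this (does the domain actually exist?) is better answered by the DNS than by a pattern in the code.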
The easiest way to deal with storage is to support UTF-8. But for applications that can’t, there is an algorithm (Punycode) that allows transformation of domain names between ASCII and non-ASCII strings. NB: the Punycode conversion may NOT work for the mailbox name in an email address. Alternative encoding schemes exist and should be applied.
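To make the transformation concrete, here is a small Python sketch. The helper names are invented for this example, and it assumes the built-in "idna" codec, which follows IDNA 2003; applications that need the newer IDNA 2008 rules would have to use a separate library. Only the domain part of an email address is converted; the mailbox name is left exactly as entered.

```python
def domain_to_ascii(domain: str) -> str:
    """Convert a domain name to its ASCII ('xn--') form (hypothetical helper)."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Convert a domain name back to its native-script form (hypothetical helper)."""
    # Encoding first means the function accepts either form as input.
    return domain.encode("idna").decode("idna")


def email_domain_to_ascii(address: str) -> str:
    """Convert only the domain of an email address; keep the mailbox name as is."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: no '@' found")
    return local + "@" + domain_to_ascii(domain)


print(domain_to_ascii("bücher.example"))             # xn--bcher-kva.example
print(domain_to_unicode("xn--bcher-kva.example"))    # bücher.example
print(email_domain_to_ascii("jörg@bücher.example"))  # jörg@xn--bcher-kva.example
```

The last line shows why UTF-8 storage is still the easier path: the mailbox name stays non-ASCII even after the domain has been converted.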
There are times when two different representations of a domain name are not identical but are equivalent, for example when a non-ASCII domain name has been converted using the Punycode algorithm. When processing or sorting, it’s important that they are treated as equivalent. This will require some policies for the application, or indeed the organization, on how domain names and email addresses are handled.
Because domain names in non-ASCII characters (and mailbox names too) are growing more popular, you’ll need to make sure that you’re able to display them in a way that works for your community. Public-facing applications should certainly display names in their native scripts and not as the ASCII string resulting from a Punycode transformation.
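One possible policy, sketched in Python, is to reduce every domain name to a single canonical key before comparing, sorting or de-duplicating. The name domain_key is made up for this example, the choice of the lower-cased ASCII form as the key is an assumption rather than a requirement, and the built-in "idna" codec (IDNA 2003) is assumed to be acceptable.

```python
def domain_key(domain: str) -> str:
    """Return a canonical comparison key for a domain (hypothetical helper)."""
    # The ASCII form, lower-cased, is the same for both representations.
    return domain.encode("idna").decode("ascii").lower()


names = ["Bücher.Example", "xn--bcher-kva.example", "XN--BCHER-KVA.EXAMPLE"]

# All three spellings collapse to one key, so they compare as equal.
print({domain_key(n) for n in names})

# Sorting on the key keeps equivalent names together.
print(sorted(["zebra.example", "bücher.example", "apple.example"], key=domain_key))
```

What to do with the mailbox name is a separate policy decision, since it is not covered by this conversion.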
A growing number of libraries, particularly Open Source Programming Language Libraries, will be correcting their validation routines, so being able to be UA Ready may be as simple as recompiling the code using the latest versions of the library. The UASG is encouraging remediation work in quite a number of libraries.
GitHub and SourceForge are also two good places to look for working code.
The UASG also publishes some good reference material at www.uasg.tech/documents.
Most efforts to get applications UA Ready will fall into the ‘Bug Fix’ level of effort. It’s time to get applications up to scratch.