[UA-discuss] Regular Expression

Chaals McCathie Nevile chaals at yandex.ru
Tue Sep 12 22:01:07 UTC 2017

On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander
<don.hollander at icann.org> wrote:

> Thanks Rubens.  Which raises the question as to when the validation
> takes place.  Before or after a punycode transformation.

I would generally like validation to take place after punycode conversion.
First because there are strings that match the regex but not punycode
constraints. Likewise I agree with Rubens that assuming TLDs are not
domains and email must go to a subdomain seems less than prescient with

> And David, thanks for the article.   The UASG has long advocated turning
> validation off - but very few active practitioners seem willing to think
> outside that box.

I'm not entirely convinced by that approach either.

I think there is value in validation - first, to determine whether an
email address is real - if it isn't, you are probably better off getting a
warning than trying to send it.

Second, I find it very helpful, including as a protection against phishing
emails, to be told if an email is not recognised as a contact to whom I
have *sent* an email, which is a stricter validation check. Applications
that do that for me - especially for scripts I don't read fluently like
Chinese - are common, and I would be upset if they were to stop validating.

On the other hand, incorrect validation is clearly a bad idea - for
example, checking an address in a form with no punycode conversion run
first, and refusing an internationalised email address for no good reason.
It fails to actually validate whether something is a valid email address.
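
To make the ordering concrete, here is a minimal sketch of "convert first,
then validate", assuming a Node.js environment. The helper name checkAddress
and its return shape are hypothetical, the closing "/" on the quoted
expression is my assumption, and the length limits are the usual DNS ones
(63 octets per label, 253 overall). It is an illustration, not a complete
address validator.

```javascript
const { domainToASCII } = require('node:url');

// Expression quoted further down in this thread; the closing "/" is assumed.
const QUOTED_RE = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Hypothetical helper: convert the domain to its ASCII (punycode) form first,
// then apply DNS length limits to the converted labels.
function checkAddress(address) {
  const at = address.lastIndexOf('@');
  if (at < 1 || at === address.length - 1) {
    return { ok: false, reason: 'missing local part or domain' };
  }
  // domainToASCII returns an empty string when the domain is invalid.
  const ascii = domainToASCII(address.slice(at + 1));
  if (ascii === '') {
    return { ok: false, reason: 'domain does not convert to ASCII' };
  }
  const labels = ascii.split('.');
  const badLabel = labels.some((label) => label.length < 1 || label.length > 63);
  if (badLabel || ascii.length > 253) {
    return { ok: false, reason: 'label or domain length out of range' };
  }
  return { ok: true, asciiDomain: ascii };
}

// The regex alone accepts this string, but the conversion step rejects it.
console.log(QUOTED_RE.test('user@xn--iñvalid.com'));
console.log(checkAddress('user@xn--iñvalid.com'));

// An internationalised domain that converts cleanly.
console.log(checkAddress('user@español.com'));
```

Note that this sketch places no minimum on the number of labels or on TLD
length, so it does not assume that mail must go to a subdomain.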

A given application or toolchain may be incapable of handling some valid
email addresses, but I think a campaign to convince developers to produce
a statement like "this application is second-rate and obsolete" would face
significant challenges. Whether it is worth pushing for such applications  
to state that they do not yet support appropriate standards may be worth  



> D
>> On 13/09/2017, at 8:31 AM, Rubens Kuhl <rubensk at nic.br> wrote:
>>> On Sep 12, 2017, at 3:44 PM, Don Hollander <don.hollander at icann.org>  
>>> wrote:
>>> Please note that this is a Geeky post - so carry on if that’s not you.
>>> Email validation is an area where many websites fall short as we found
>>> in our study on Website UA Readiness (nearing publication)
>>> The technologies behind these websites generally use a Regular
>>> Expression as their first line of defence against rubbish data.
>>> The issue is that most of these RegExs are overly restrictive.
>>> As an appendix to the Website review, we looked at some of the
>>> technologies behind the websites to see if there were common
>>> denominators for good and bad experiences.
>>> One RegEx has stood out as being simple and correct.   I’d like the
>>> UASG to consider recommending this in our documentation.   Toward
>>> that end, this thread is for discussion.
>>> /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/
>>> Regular expression check in Javascript. This accepts any Unicode
>>> characters, only insisting that the domain must have more than one
>>> label and the TLD is 2 characters or longer.
>>> Your thoughts?
>> Single IDN TLDs for some scripts is something being considered for
>> subsequent procedures, so I would think of 1 or more and prevent the
>> same UA challenges previous rounds TLDs are suffering.
>> Rubens
> Don Hollander
> Universal Acceptance Steering Group
> Skype: don_hollander

Chaals is Charles McCathie Nevile
find more at http://yandex.com
