[UA-discuss] Regular Expression

Wed Sep 13 15:16:05 UTC 2017

... so, regex advice could be given as something like: "When there are suitable constraints, as determined by the service provider or the user or a combination of both, email validation can be readily achieved using regex. Suitable constraints include restriction to a single language/script."

André Schappo

PS. Just thinking about a web form where a user enters name, postal address, telephone number, email address ...etc... Email address validation could be performed in conjunction with the user. The form could interact with the user to determine the constraints, if any, for email validation. This could take the form of Q&A or presentation of radio buttons for choices ...etc... So, involve the user in the process.

On 13 Sep 2017, at 11:52, Andre Schappo <A.Schappo at lboro.ac.uk> wrote:

I have thought about this many times over the years and here are some of my thoughts -

A lot depends on when the validation is done. If it is done at registration time then I think it is possible to use regex to validate.

Take http://datamail.in, http://电邮.在线 (http://xn--wny099c.xn--3ds443g) and http://डाटामेल.भारत (http://xn--c2bd4bq1db8d.xn--h2brj9c). The user explicitly selects a language/script, which constrains the Unicode characters that can be used for the local part. For a particular language/script the IDN is fixed, so that leaves just the local part (mailbox name) to validate. One could use a simple regex with a modern Unicode-aware regex engine.

For a Chinese local part, something like: \p{Han}+

or, for a Hindi/Devanagari local part, something like: \p{Devanagari}+
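
As a rough sketch of the idea above, assuming a JavaScript engine with Unicode property escapes (ES2018 or later): JavaScript does not accept the short forms \p{Han} or \p{Devanagari}, so the script has to be named with Script= and the regex needs the u flag. The function name isLocalPartInScript is made up for illustration. The sketch checks script membership only and ignores digits, punctuation and any other rule a provider might want.

```javascript
// Hypothetical helper (not from any library): returns true if every
// character of localPart belongs to the named Unicode script,
// e.g. "Han" or "Devanagari".
function isLocalPartInScript(localPart, script) {
  // Accept only plain script names so nothing unexpected is spliced
  // into the pattern built below.
  if (!/^[A-Za-z_]+$/.test(script)) {
    throw new Error("unexpected script name: " + script);
  }
  // The "u" flag is required for \p{...} property escapes in JavaScript.
  const pattern = new RegExp("^\\p{Script=" + script + "}+$", "u");
  return pattern.test(localPart);
}

console.log(isLocalPartInScript("电邮", "Han"));            // true
console.log(isLocalPartInScript("डाटामेल", "Devanagari"));   // true
console.log(isLocalPartInScript("abc", "Han"));             // false
```

The anchors ^ and $ matter here: without them the pattern would accept any string that merely contains one character of the chosen script.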

André Schappo

On 12 Sep 2017, at 19:44, Don Hollander <don.hollander at icann.org> wrote:

Please note that this is a Geeky post - so carry on if that’s not you.

Email validation is an area where many websites fall short, as we found in our study on Website UA Readiness (nearing publication).

The technologies behind these websites generally use a Regular Expression as their first line of defence against rubbish data. The issue is that most of these RegExs are overly restrictive.

As an appendix to the Website review, we looked at some of the technologies behind the websites to see if there were common denominators for good and bad experiences.

One RegEx has stood out as being simple and correct. I’d like the UASG to consider recommending this in our documentation. Toward that end, this thread is for discussion.

/^.+@(?:[^.]+\.)+(?:[^.]{2,})$/

Regular expression check in JavaScript. This accepts any Unicode characters, insisting only that there is something before the @, that the domain has more than one label, and that the TLD is 2 characters or longer.
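
A minimal sketch of how this check behaves, assuming the pattern is written as a complete JavaScript literal with a closing slash. The function name looksLikeEmail and the sample addresses are illustrative only.

```javascript
// The pattern from this post, closed with a trailing slash so that it is
// a valid JavaScript regex literal.
const EMAIL_PATTERN = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Hypothetical wrapper name, used here only for illustration.
function looksLikeEmail(address) {
  return EMAIL_PATTERN.test(address);
}

console.log(looksLikeEmail("user@example.com")); // true
console.log(looksLikeEmail("用户@电邮.在线"));      // true
console.log(looksLikeEmail("user@localhost"));   // false: domain has only one label
console.log(looksLikeEmail("user@example.c"));   // false: TLD is a single character
```

Without the u flag, {2,} counts UTF-16 code units rather than characters, so a TLD consisting of one character outside the Basic Multilingual Plane would pass as "two characters". Whether that matters in practice is an open question for the discussion.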

Your thoughts?

Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander
