[UA-discuss] Regular Expression

Mark Svancarek marksv at microsoft.com
Wed Sep 13 16:20:43 UTC 2017


I believe that validation should be as light as possible. "Contains '@'" is about the extent of it, unless you are willing to look at bidi and IFS. Just capture the string and send a test message.
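The "send a test message" step usually means mailing a one-time token to the address and accepting the address only once that token comes back. A minimal Python sketch, assuming a hypothetical in-memory store and leaving actual mail delivery out; `start_confirmation` and `confirm` are illustrative names, not a library API:

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical in-memory store: address -> (SHA-256 of token, expiry timestamp).
_pending: dict[str, tuple[bytes, float]] = {}

def start_confirmation(address: str, ttl_seconds: float = 3600.0) -> str:
    """Record a pending confirmation and return the token to embed in the test message."""
    token = secrets.token_urlsafe(16)
    digest = hashlib.sha256(token.encode()).digest()
    _pending[address] = (digest, time.time() + ttl_seconds)
    return token

def confirm(address: str, token: str) -> bool:
    """True only if the token matches the one issued for this address and has not expired.

    A successful confirmation consumes the token; a wrong guess does not.
    """
    entry = _pending.get(address)
    if entry is None:
        return False
    digest, expires = entry
    if time.time() > expires:
        del _pending[address]
        return False
    ok = hmac.compare_digest(digest, hashlib.sha256(token.encode()).digest())
    if ok:
        del _pending[address]
    return ok
```

Storing only a hash of the token and comparing with `hmac.compare_digest` are common hardening choices, not requirements of the approach; the essential part is that the address is trusted only after the round trip succeeds.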

-----Original Message-----
From: ua-discuss-bounces at icann.org [mailto:ua-discuss-bounces at icann.org] On Behalf Of Vittorio Bertola
Sent: Wednesday, September 13, 2017 1:34 AM
To: Chaals McCathie Nevile <chaals at yandex.ru>; ua-discuss at icann.org
Subject: Re: [UA-discuss] Regular Expression

> On 13 September 2017 at 0:01, Chaals McCathie Nevile <chaals at yandex.ru> wrote:
> 
> 
> On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander <don.hollander at icann.org> wrote:
> 
> I think there is value in validation. First, to determine whether an 
> email address is real: if it isn't, you are probably better off 
> getting a warning than trying to send it.

But this is not something you can do with a regexp alone. A regexp can intercept blatant mistakes: for example, on some national keyboards typing "@" requires pressing Alt or another uncommon key combination, so it is easy to mistype, and you can easily warn users that the string they entered does not contain an "@". Anything beyond that is unnecessary, because if a user mistypes an email address in any other way, they will very likely still end up entering a syntactically valid address that no regexp can identify as non-existent, or one that does exist but belongs to someone else.
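The one check this paragraph endorses, warning when the "@" is missing, fits in a few lines. A minimal Python sketch; the function name `missing_at_warning` and the warning text are hypothetical:

```python
from typing import Optional

def missing_at_warning(address: str) -> Optional[str]:
    """Return a warning if the input contains no '@', otherwise None.

    Deliberately the only check: it catches the keyboard slip described
    above and makes no other judgement about the address.
    """
    if "@" not in address:
        return 'The address has no "@"; please check what you typed.'
    return None
```

A typo such as `mario@exmaple.com` returns `None`, which is exactly the limitation described above: the string is well-formed, so no syntax check can flag it.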

On the other hand, if you try to implement a complex regexp, and especially if you try to work it out on your own, you are almost certain to reject several valid addresses that are corner cases but should be accepted, as well as addresses that are invalid under today's standards but will become valid as the standards evolve.
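To make the risk concrete, here is a hypothetical but typical hand-rolled pattern of the kind often copied into web forms, checked in Python against addresses that are syntactically valid under the relevant standards (an RFC 5321 quoted local part, an RFC 6531 internationalized address, and an IDN top-level domain in its ASCII "xn--" form). The pattern is an illustration, not taken from any particular product:

```python
import re

# Hypothetical "typical" pattern: ASCII-only local part, ASCII-only domain,
# and a letters-only top-level domain of two or more characters.
NAIVE_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def naive_accepts(address: str) -> bool:
    return NAIVE_PATTERN.match(address) is not None

# Every address below is syntactically valid, yet only the first passes.
examples = [
    "user@example.com",         # plain ASCII address
    "user@example.xn--p1ai",    # IDN TLD (.рф) in ASCII form: TLD has digits and hyphens
    "用户@例子.广告",             # internationalized address (RFC 6531)
    '"john doe"@example.com',   # quoted local part containing a space (RFC 5321)
]

for address in examples:
    print(address, "->", "accepted" if naive_accepts(address) else "rejected")
```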

> Second, I find it very helpful, including as a protection against 
> phishing emails, to be told if an email is not recognised as a contact 
> to whom I have *sent* an email, which is a stricter validation check. 
> Applications that do that for me, especially for scripts I don't read 
> fluently such as Chinese, are common, and I would be upset if they were to stop validating.

But this, again, is a validation that cannot be done with a regexp (can you write a regexp representing your entire contact book?), and it imposes an additional, stricter condition than just "the email address is valid". We are only discussing how to check that an email address is syntactically valid; any other checks can still be implemented as appropriate.
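A "have I sent mail to this address before?" check is a set lookup over the user's history, not a pattern match. A minimal Python sketch, assuming a hypothetical `history` built from previously sent mail; it lower-cases only the domain, since RFC 5321 lets the receiving server treat the local part as case-sensitive, and it does not attempt full IDNA normalization:

```python
from typing import Iterable

def normalize(address: str) -> str:
    """Lower-case the domain part only; leave the local part untouched."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no '@' at all: compare the raw string
    return f"{local}@{domain.lower()}"

def build_history(sent_to: Iterable[str]) -> set[str]:
    """Hypothetical helper: normalized set of addresses the user has written to."""
    return {normalize(a) for a in sent_to}

def is_previous_recipient(address: str, history: set[str]) -> bool:
    return normalize(address) in history
```

An unknown sender then triggers a warning in the client, which is the stricter check described in the quoted message, independent of any syntax validation.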

The point here is that you should not try to determine whether an email address is valid by checking its syntax, beyond checking that it contains an "@" and possibly a "." to the right of it (and even this latter condition is already too strict, since the ideographic full stop "。" should be accepted in place of the ASCII dot if you check strings in IDN form). Anything beyond that will exclude some valid addresses without significantly increasing your chances of catching user input errors at this stage, and you will still catch any user error a few seconds later, when you send the validation/confirmation message.
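The minimal check described in this paragraph could look like the Python sketch below. The set of dot-like characters follows RFC 3490 (IDNA2003), which requires U+3002 and two fullwidth/halfwidth variants to be recognized as label separators; requiring non-empty text on both sides of the "@" is an added assumption, and the function name is hypothetical:

```python
# Characters IDNA2003 (RFC 3490, section 3.1) treats as label-separating dots:
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP, HALFWIDTH IDEOGRAPHIC FULL STOP.
DOTS = {"\u002e", "\u3002", "\uff0e", "\uff61"}

def passes_minimal_check(address: str, require_dot: bool = False) -> bool:
    """Accept any string with an '@'; optionally require a dot-like separator after it.

    The split is on the LAST '@', because a quoted local part may itself
    contain '@' while a domain cannot. Non-empty sides are an added assumption.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if require_dot:
        return any(ch in DOTS for ch in domain)
    return True
```

With `require_dot=False` (the default) only the "@" matters, which is the lighter option both messages in this thread lean toward.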

Regards,
-- 

Vittorio Bertola | Research & Innovation Engineer vittorio.bertola at open-xchange.com Open-Xchange Srl - Office @ Via Treviso 12, 10144 Torino, Italy

