[UA-discuss] Regular Expression

Vittorio Bertola vittorio.bertola at open-xchange.com
Wed Sep 13 08:33:50 UTC 2017


> On 13 September 2017 at 00:01, Chaals McCathie Nevile <chaals at yandex.ru> wrote:
> 
> 
> On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander
> <don.hollander at icann.org> wrote:
> 
> I think there is value in validation - first, to determine whether an
> email address is real - if it isn't, you are probably better off getting a
> warning than trying to send it.

But this is not something a regexp alone can do. A regexp can catch blatant mistakes: on some national keyboards, for example, typing "@" requires Alt or another uncommon key combination, so it is easy to mistype, and you can usefully warn users that the string they entered contains no "@". Anything beyond that is unnecessary, because if users mistype an email address in any other way, they will very likely still end up entering a syntactically valid address, one that no regexp can identify as non-existent, or that may even exist and belong to someone else.

On the other hand, if you implement a complex regexp, and especially if you devise it on your own, you will almost certainly mark as invalid several valid addresses that are corner cases and should be accepted, as well as addresses that are invalid today but that future developments of the standards will make valid.
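To illustrate the risk, here is a hypothetical hand-rolled pattern of the kind often written from scratch (`NAIVE` is an assumption made up for this illustration, not taken from any standard or library). It accepts a plain ASCII address but rejects several syntactically valid ones: a "+" sub-address, a multi-label domain, a TLD longer than three letters, and an internationalized address.

```python
import re

# Hypothetical hand-written pattern: ASCII letters/digits/dots/underscores in the
# local part, a single-label ASCII domain, then a 2-3 letter ASCII TLD.
NAIVE = re.compile(r"^[A-Za-z0-9._]+@[A-Za-z0-9-]+\.[A-Za-z]{2,3}$")

samples = [
    "alice@example.com",       # plain ASCII address
    "alice+news@example.com",  # "+" is permitted in the local part
    "bob@mail.example.com",    # domain with more than one label
    "carol@example.museum",    # TLD longer than three letters
    "用户@例子.广告",            # internationalized local part and domain
]

for addr in samples:
    verdict = "accepted" if NAIVE.fullmatch(addr) else "rejected"
    print(f"{addr}: {verdict}")
```

Only the first sample passes; every other address is valid but rejected, which is exactly the failure mode described above.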

> Second, I find it very helpful, including as a protection against phishing
> emails, to be told if an email is not recognised as a contact to whom I
> have *sent* an email, which is a stricter validation check. Applications
> that do that for me - especially for scripts I don't read fluently like
> Chinese - are common, and I would be upset if they were to stop validating.

But this, again, is a check that cannot be done with a regexp (can you write a regexp representing your entire contact book?), and it imposes an additional, stricter condition than just "the email address is valid". We are only discussing how to check that an email address is syntactically valid; any other checks can still be implemented as appropriate.
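A "have I written to this address before?" check is a lookup, not a pattern match. The sketch below assumes a hypothetical list of previously used recipients; `normalize`, `build_contact_index` and `is_known_recipient` are illustrative names, not an existing API. It lowercases only the domain (a simplification: full IDNA mapping is not attempted) and keeps the local part verbatim, since RFC 5321 lets the receiving server treat it as case-sensitive.

```python
from typing import Iterable


def normalize(addr: str) -> str:
    # Lowercase only the domain part; the local part is kept as typed.
    local, sep, domain = addr.strip().rpartition("@")
    if not sep:
        return addr.strip()
    return f"{local}@{domain.lower()}"


def build_contact_index(sent_to: Iterable[str]) -> set:
    # Hypothetical helper: the set of addresses the user has already written to.
    return {normalize(a) for a in sent_to}


def is_known_recipient(addr: str, index: set) -> bool:
    # A set membership test; no regexp can encode an arbitrary contact book.
    return normalize(addr) in index


index = build_contact_index(["alice@Example.com", "用户@例子.广告"])
print(is_known_recipient("alice@example.COM", index))
print(is_known_recipient("alice@examp1e.com", index))  # lookalike domain
```

The first call prints `True` and the second `False`, which is the kind of warning that helps against lookalike addresses in phishing mail.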

The point here is that you should not try to determine whether an email address is valid by checking its syntax, beyond checking that it contains a "@" and possibly a "." to the right of it. Even this latter condition is already too strict: if you check strings in IDN form, the ideographic full stop "。" should be accepted in place of the ASCII dot. Anything beyond that will exclude some valid addresses without significantly increasing your chances of catching user input errors at this stage, and you will catch any remaining user error a few seconds later anyway, when you send the validation/confirmation message.
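A minimal sketch of the permissive check described above, assuming Python and a hypothetical helper name `looks_like_email`. The set of dot-like separators follows IDNA2003 (RFC 3490, section 3.1), which requires U+3002, U+FF0E and U+FF61 to be recognized as label separators alongside the ASCII full stop. The dot requirement is off by default, since even that condition is arguably too strict.

```python
# Characters IDNA2003 (RFC 3490, section 3.1) requires to be recognized as dots:
# full stop, ideographic full stop, fullwidth full stop, halfwidth ideographic full stop.
DOTS = {".", "\u3002", "\uff0e", "\uff61"}


def looks_like_email(addr: str, require_dot: bool = False) -> bool:
    """Minimal syntax check: an "@" with something on both sides and,
    optionally, a dot-like separator to the right of the last "@"."""
    # rpartition splits at the last "@", since a quoted local part may itself contain "@".
    local, sep, domain = addr.strip().rpartition("@")
    if not sep or not local or not domain:
        return False  # no "@", or nothing before/after it
    if require_dot:
        return any(ch in DOTS for ch in domain)
    return True


print(looks_like_email("用户@例子。广告", require_dot=True))
print(looks_like_email("user.example.com"))
```

The first call prints `True` (the ideographic full stop counts as a dot) and the second prints `False` (no "@"). Anything that passes should then be confirmed the way the message suggests: by actually sending a confirmation email.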

Regards,
-- 

Vittorio Bertola | Research & Innovation Engineer
vittorio.bertola at open-xchange.com 
Open-Xchange Srl - Office @ Via Treviso 12, 10144 Torino, Italy

