[UA-discuss] Re : Re: Regular Expression

Asmus Freytag asmusf at ix.netcom.com
Wed Sep 13 22:16:55 UTC 2017


On 9/13/2017 10:32 AM, Dr. AJAY D A T A wrote:
> This is what Microsoft suggests for EAI Validation.
>
> https://blogs.msdn.microsoft.com/shawnste/2014/04/01/eai-email-address-internationalization-address-validation/
"^([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~\u00A0-\uD7FF\uE000-\uFFFF-]|([\uD800-\uDBFF][\uDC00-\uDFFF]))+$"

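This would allow most of printable ASCII and nearly everything beyond 
it: every code point from U+00A0 up, with supplementary characters 
matched as UTF-16 surrogate pairs (the C1 controls U+0080-U+009F are 
left out).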

It would have been cleaner/clearer to express the reverse, that is, all 
code points not allowed, such as @, ", controls, space, NBSP, etc.
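
As a rough sketch of what that reverse formulation could look like (in 
Python, where strings are sequences of code points, so no surrogate-pair 
branch is needed). The excluded set is my assumption, not something the 
blog post specifies: controls, space, NBSP, lone surrogates, and the 
ASCII specials that the class above leaves out. The names LOCALPART_DENY 
and localpart_ok are made up for illustration.

```python
import re

# Assumed deny-list: C0 controls and space, the ASCII specials missing from
# the quoted class, DEL, C1 controls, NBSP, and lone surrogates.
LOCALPART_DENY = re.compile(r'[\x00-\x20"(),:;<>@\[\\\]\x7f-\xa0\ud800-\udfff]')

def localpart_ok(localpart: str) -> bool:
    """True if the string is non-empty and contains no denied code point."""
    return bool(localpart) and LOCALPART_DENY.search(localpart) is None
```

Like the original pattern, this says nothing about where dots may 
appear; it only rules out individual code points.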

The blog post suggests splitting the address at the @ and separately 
using the validation regex on the localpart and translation to punycode 
for the host (using a validating converter).
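
A minimal sketch of that split-then-validate flow, again in Python and 
under stated assumptions: the local-part check is a deliberately loose 
stand-in (not the blog's regex), check_address is a hypothetical name, 
and Python's built-in "idna" codec (which implements IDNA 2003, not 
IDNA 2008) plays the role of the validating converter.

```python
import re

# Assumption: a loose local-part check (no controls, space, NBSP, '"' or "@").
LOCALPART_DENY = re.compile(r'[\x00-\x20\x7f-\xa0@"]')

def check_address(address: str):
    """Return (localpart, ascii_host), or None if either half is rejected."""
    # Split at the last "@"; any "@" left in the local part is rejected below.
    localpart, sep, host = address.rpartition("@")
    if not sep or not localpart or not host:
        return None
    if LOCALPART_DENY.search(localpart):
        return None
    try:
        # The codec raises UnicodeError for labels it cannot convert
        # (empty, too long, or containing prohibited characters).
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        return None
    return localpart, ascii_host
```

For example, check_address("user@bücher.example") gives 
("user", "xn--bcher-kva.example"), while a string with no "@" or with a 
space in the local part gives None.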

A./

> *Dr. Ajay DATA* *| Founder & CEO *
> Get email id like *अजय@डाटा.भारत* in your own language,
> visit www.xgenplus.com <http://www.xgenplus.com/>
> ------------------------------------------------------------------------
> *From:* Don Hollander <don.hollander at icann.org> MailId : [73397993]
> *To:* Mark Svancarek <marksv at microsoft.com>
> *Cc:* Universal Acceptance <ua-discuss at icann.org>
> *Subject: *Re: [UA-discuss] Regular Expression
> *Date:* 13 Sep 2017 10:37:07 PM
>
> Mark.
>
> What would such a RegEx look like?
>
> D
>
> >  On 14/09/2017, at 4:26 AM, Mark Svancarek via UA-discuss 
> <ua-discuss at icann.org> wrote:
> >
> >  Depending on your mail client, you may have experienced a 
> linkification error in my response. Weird.
> >
> >  -----Original Message-----
> >  From: Mark Svancarek
> >  Sent: Wednesday, September 13, 2017 9:21 AM
> >  To: 'Vittorio Bertola' <vittorio.bertola at open-xchange.com>; Chaals 
> McCathie Nevile <chaals at yandex.ru>; ua-discuss at icann.org
> >  Subject: RE: [UA-discuss] Regular Expression
> >
> >  I believe that validation should be as light as possible. "Contains 
> '@'" is about the extent of it unless you are willing to look at bidi 
> and IFS. Just capture the string and send a test message.
> >
> >  -----Original Message-----
> >  From: ua-discuss-bounces at icann.org 
> [mailto:ua-discuss-bounces at icann.org] On Behalf Of Vittorio Bertola
> >  Sent: Wednesday, September 13, 2017 1:34 AM
> >  To: Chaals McCathie Nevile <chaals at yandex.ru>; ua-discuss at icann.org
> >  Subject: Re: [UA-discuss] Regular Expression
> >
> >> Il 13 settembre 2017 alle 0.01 Chaals McCathie Nevile 
> <chaals at yandex.ru> ha scritto:
> >>
> >>
> >> On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander
> >> <don.hollander at icann.org> wrote:
> >>
> >> I think there is value in validation - first, to determine whether an
> >> email address is real - if it isn't, you are probably better off
> >> getting a warning than trying to send it.
> >
> >  But this is nothing you can do just with a regexp. The regexp could 
> allow you to intercept blatant mistakes - e.g., there are national 
> keyboards where typing "@" requires pressing Alt or some uncommon 
> combination of keys, so it's easy to mistype it and you can easily 
> warn the user that their entered string does not have a "@" - but 
> anything beyond that is unnecessary, because, if a user mistypes an 
> email address in any other way, it's very likely that he will still 
> end up entering a valid email address that no regexp will be able to 
> tell as non-existing, or that could even exist but belong to someone else.
> >
> >  On the other hand, if you try to implement a complex regexp, and 
> especially if you try to figure it out on your own, it's almost 
> certain that you will mark as invalid several valid email addresses 
> that are corner cases but should be accepted, as well as many future 
> developments of the standards which are invalid now but will be valid 
> in the future.
> >
> >> Second, I find it very helpful, including as a protection against
> >> phishing emails, to be told if an email is not recognised as a contact
> >> to whom I have *sent* an email, which is a stricter validation check.
> >> Applications that do that for me - especially for scripts I don't read
> >> fluently like Chinese - are common, and I would be upset if they 
> were to stop validating.
> >
> >  But this, again, is a validation that cannot be done via a regexp 
> (can you write a regexp representing your entire contact book?) and 
> that, on the other hand, poses an additional stricter condition than 
> just "the email address is valid". We are just discussing how to check 
> that the email address is syntactically valid, any other checks could 
> still be implemented however appropriate.
> >
> >  The point here is that you should not try to determine whether an 
> email address is valid by checking its syntax, other than checking 
> that it has a "@" and possibly a "." on the right of it (but even this 
> latter condition is already too strict, as the ideographic full stop 
> "。" should be accepted in place of the ASCII dot, if you check strings 
> in IDN form). Anything beyond that is going to exclude some valid 
> addresses while not increasing in any significant way your chances of 
> intercepting user input error at this stage - and you will still 
> intercept any user error a few seconds later, when you send the 
> validation/confirmation message.
> >
> >  Regards,
> >  --
> >
> >  Vittorio Bertola | Research & Innovation Engineer 
> vittorio.bertola at open-xchange.com Open-Xchange Srl - Office @ Via 
> Treviso 12, 10144 Torino, Italy
>
> Don Hollander
> Universal Acceptance Steering Group
> Skype: don_hollander

