[UA-discuss] Re : Re: Regular Expression

Mark Svancarek marksv at microsoft.com
Wed Sep 13 23:38:27 UTC 2017


I still feel this is too complex for the average web developer.  I would just look for <anytext> + @ + <anytext>, and send a test email.  Looking for dots in the domain part requires you to understand which is the domain part, which requires you to understand bidi rules.

Sending a test email pushes the complexity to your email program, of course, and non-UA behavior in the email ecosystem remains.  But it’s easier for UASG to measure and inform the relatively small set of email ecosystem players than it is to inform every website developer that uses regexes, let alone to influence them to consistently change.

Does that make sense?
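
As a rough illustration of the light check described above (<anytext> + @ + <anytext>), here is a minimal Python sketch. The function name looks_like_email is made up for this example. It assumes the real validation still happens when the test email is sent, so it deliberately does not inspect the domain part, dots, or bidi ordering.

```python
def looks_like_email(candidate: str) -> bool:
    """Light check only: some text, an '@', then some more text.

    Hypothetical helper for illustration. It does not try to work out
    which side is the domain part; the confirmation email is the real test.
    """
    # partition() splits at the first '@' and returns empty strings
    # for the separator and the tail when no '@' is present.
    before, at, after = candidate.partition("@")
    return bool(before) and bool(at) and bool(after)


if __name__ == "__main__":
    for sample in ["user@example.com", "अजय@डाटा.भारत", "no-at-sign", "@missing-left"]:
        print(sample, looks_like_email(sample))
```

Anything this accepts would then be captured as-is and sent a test message.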

From: ua-discuss-bounces at icann.org [mailto:ua-discuss-bounces at icann.org] On Behalf Of Asmus Freytag
Sent: Wednesday, September 13, 2017 3:17 PM
To: ua-discuss at icann.org
Subject: Re: [UA-discuss] Re : Re: Regular Expression

On 9/13/2017 10:32 AM, Dr. AJAY D A T A wrote:
This is what Microsoft suggests for EAI Validation.

https://blogs.msdn.microsoft.com/shawnste/2014/04/01/eai-email-address-internationalization-address-validation/

"^([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~\u00A0-\uD7FF\uE000-\uFFFF-]|([\uD800-\uDBFF][\uDC00\uDFFF]))+$"

This would allow most of the ASCII range and all of UTF-16 beyond ASCII.

It would have been cleaner and clearer to express the reverse, that is, the set of code points that are not allowed, such as {@, ", controls, Space, NBSP, etc.}.

The blog post suggests splitting the address at the @ and separately using the validation regex on the localpart and translation to punycode for the host (using a validating converter).
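
A hedged Python sketch of that split-and-validate approach follows; it is not the blog's code. Assumptions: Python's re matches code points rather than UTF-16 code units, so the surrogate-pair alternative is replaced here by the supplementary range U+10000 to U+10FFFF (the quoted pattern's low-surrogate class reads [\uDC00\uDFFF], which presumably was meant to be the range \uDC00-\uDFFF). The standard-library "idna" codec (IDNA 2003, fairly lenient for ASCII labels) stands in for the "validating converter". The names LOCAL_PART and validate_eai_address are made up for this example.

```python
import re

# Character class adapted from the quoted pattern: most printable ASCII
# (no '@', quotes, space, or controls) plus everything from U+00A0 upward,
# skipping the surrogate block U+D800-U+DFFF.
LOCAL_PART = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~\u00A0-\uD7FF\uE000-\uFFFF\U00010000-\U0010FFFF-]+"
)


def validate_eai_address(address: str) -> bool:
    """Split at the '@', regex-check the local part, IDNA-encode the host."""
    local, at, host = address.rpartition("@")
    if not at or not local or not host:
        return False
    # A second '@' would end up in the local part and fail the regex.
    if LOCAL_PART.fullmatch(local) is None:
        return False
    try:
        # Assumption: the built-in codec is an adequate stand-in for a
        # validating Punycode converter; it rejects empty or overlong labels.
        host.encode("idna")
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    for sample in ["अजय@डाटा.भारत", "user@example.com", "a b@example.com", "user@a..b"]:
        print(sample, validate_eai_address(sample))
```

Note that this is still stricter than a bare "contains @" check, so it can reject addresses that a mail system would accept.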

A./




Dr. Ajay DATA  | Founder & CEO
Get email id like अजय@डाटा.भारत in your own language,
visit www.xgenplus.com

________________________________
From: Don Hollander <don.hollander at icann.org>
To: Mark Svancarek <marksv at microsoft.com>
Cc: Universal Acceptance <ua-discuss at icann.org>
Subject: Re: [UA-discuss] Regular Expression
Date: 13 Sep 2017 10:37:07 PM

Mark.

What would such a RegEx look like?

D

>  On 14/09/2017, at 4:26 AM, Mark Svancarek via UA-discuss <ua-discuss at icann.org> wrote:
>
>  Depending on your mail client, you may have experienced a linkification error in my response. Weird.
>
>  -----Original Message-----
>  From: Mark Svancarek
>  Sent: Wednesday, September 13, 2017 9:21 AM
>  To: 'Vittorio Bertola' <vittorio.bertola at open-xchange.com>; Chaals McCathie Nevile <chaals at yandex.ru>; ua-discuss at icann.org
>  Subject: RE: [UA-discuss] Regular Expression
>
>  I believe that validation should be as light as possible. "Contains '@'" is about the extent of it unless you are willing to look at bidi and IFS. Just capture the string and send a test message.
>
>  -----Original Message-----
>  From: ua-discuss-bounces at icann.org [mailto:ua-discuss-bounces at icann.org] On Behalf Of Vittorio Bertola
>  Sent: Wednesday, September 13, 2017 1:34 AM
>  To: Chaals McCathie Nevile <chaals at yandex.ru>; ua-discuss at icann.org
>  Subject: Re: [UA-discuss] Regular Expression
>
>> On 13 September 2017 at 0.01, Chaals McCathie Nevile <chaals at yandex.ru> wrote:
>>
>>
>> On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander
>> <don.hollander at icann.org> wrote:
>>
>> I think there is value in validation - first, to determine whether an
>> email address is real - if it isn't, you are probably better off
>> getting a warning than trying to send it.
>
>  But this is nothing you can do just with a regexp. The regexp could allow you to intercept blatant mistakes - e.g., there are national keyboards where typing "@" requires pressing Alt or some uncommon combination of keys, so it's easy to mistype it, and you can easily warn the user that the string they entered does not have a "@" - but anything beyond that is unnecessary. If a user mistypes an email address in any other way, it's very likely that they will still end up entering a syntactically valid address that no regexp can identify as non-existent, or that could even exist but belong to someone else.
>
>  On the other hand, if you try to implement a complex regexp, and especially if you try to figure it out on your own, it's almost certain that you will mark as invalid several valid email addresses that are corner cases but should be accepted, as well as addresses that are invalid now but will become valid as the standards develop.
>
>> Second, I find it very helpful, including as a protection against
>> phishing emails, to be told if an email is not recognised as a contact
>> to whom I have *sent* an email, which is a stricter validation check.
>> Applications that do that for me - especially for scripts I don't read
>> fluently like Chinese - are common, and I would be upset if they were to stop validating.
>
>  But this, again, is a validation that cannot be done via a regexp (can you write a regexp representing your entire contact book?) and that, on the other hand, poses an additional, stricter condition than just "the email address is valid". We are only discussing how to check that the email address is syntactically valid; any other checks can still be implemented as appropriate.
>
>  The point here is that you should not try to determine whether an email address is valid by checking its syntax, other than checking that it has a "@" and possibly a "." on the right of it (but even this latter condition is already too strict, as the ideographic full stop "。" should be accepted in place of the ASCII dot, if you check strings in IDN form). Anything beyond that is going to exclude some valid addresses while not increasing in any significant way your chances of intercepting user input error at this stage - and you will still intercept any user error a few seconds later, when you send the validation/confirmation message.
>
>  Regards,
>  --
>
>  Vittorio Bertola | Research & Innovation Engineer vittorio.bertola at open-xchange.com Open-Xchange Srl - Office @ Via Treviso 12, 10144 Torino, Italy

Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander



