[UA-discuss] Regular Expression

Jordyn Buchanan jordyn at google.com
Thu Sep 14 17:38:54 UTC 2017


Also worth remembering that "works according to the universe at the moment
the RegExp was written" is how we got into a lot of today's UA mess in the
first place.  Just because dotless domains or some other rule is in place
today, I'd want to avoid encoding them into a regexp that we tell people to
use since the rules may change again and I don't want to have another group
following along in our wake 10 years from now trying to undo the code that
we told everyone to write.
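
To make the trade-off concrete, here is a rough sketch comparing the regex quoted further down with the looser [non-empty]@[non-empty] check Rubens suggests. The names proposedRegex and hasNonEmptySides are made up for illustration, the closing slash on the regex literal is added so it parses, and the sample addresses are invented. Treat it as an illustration under those assumptions, not a recommended implementation.

```javascript
// Regex quoted later in this thread (closing "/" added so the literal parses).
const proposedRegex = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Hypothetical helper: the "[non-empty]@[non-empty]" check.
// Splits on the last "@" and only requires something on both sides.
function hasNonEmptySides(input) {
  const at = input.lastIndexOf("@");
  return at > 0 && at < input.length - 1;
}

// Invented sample inputs covering the cases raised in the thread.
const samples = [
  "user@example.com",   // ordinary address
  "用户@例子.广告",       // non-ASCII mailbox and domain
  "user@example.x",     // 1 character TLD
  "user@localhost",     // dotless domain
  "user@example。com",  // ideographic full stop (U+3002) as label separator
  "user@",              // empty domain
];

for (const s of samples) {
  console.log(s, proposedRegex.test(s), hasNonEmptySides(s));
}
```

The regex rejects the 1 character TLD, dotless, and U+3002 cases because it hard-codes an ASCII dot and a two-character minimum; the looser check accepts them and leaves the real decision to whatever actually tries to deliver the mail.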

Jordyn

On Thu, Sep 14, 2017 at 1:27 PM, Rubens Kuhl <rubensk at nic.br> wrote:

>
> The BiDi issue suggests to me that even enforcing the non-dotless rule is
> too much for a simple regex, as shabaka.example at don is a valid Arabic EAI,
> while the same ASCII combination is not valid even if a .don TLD gets
> delegated.
> [non-empty]@[non-empty] looks better to me.
>
>
> Rubens
>
> > On 14 Sep 2017, at 13:58:000, Don Hollander <don.hollander at icann.org> wrote:
> >
> > Thanks Jim.
> >
> > The BiDi issue, with raw data input, is which side of the @ holds the domain.
> >
> > Usually you’ll encounter mailbox at domainname.tld
> >
> > But in Arabic or Hebrew you’ll encounter tld.domainname at mailbox
> >
> > Don
> >
> >
> >> On 15/09/2017, at 3:44 AM, Jim Hague <jim at sinodun.com> wrote:
> >>
> >> On 12/09/2017 19:44, Don Hollander wrote:
> >>> One RegEx has stood out as being simple and correct.  I’d like the UASG
> >>> to consider recommending this in our documentation.  Toward that end,
> >>> this thread is for discussion.
> >>>
> >>> /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/
> >>>
> >>> Regular expression check in Javascript. This accepts any Unicode
> >>> characters, only insisting that the domain must have more than one label
> >>> and the TLD is 2 characters or longer.
> >>
> >> Note that this is in the context of an in-browser check. I only examined a
> >> small random subset of the sites surveyed in the main evaluation, and
> >> obviously without access to server code could only examine client-side
> >> operations. In all the sites I examined, the only check performed was
> >> against one (or in one case two) regular expression(s). No decomposition
> >> of the email address was attempted, and certainly no translation of the
> >> domain to Punycode.
> >>
> >> It was in that context that I highlighted the above regex, on the basis
> >> that it's probably the only sensible option to suggest to organisations
> >> as a low-impact UA improvement (I won't say fix) at the moment. If a
> >> future evaluation exercise verifies that an existing Javascript module
> >> does the right thing, that would be a better alternative, but that would
> >> involve more substantial modifications to site code.
> >>
> >> I agree that modifying it to allow 1 character TLDs would be sensible.
> >>
> >> I also agree with the page referenced at the start of the thread (which
> >> I read before working on the report) that just checking for '@' is about
> >> all one should attempt, certainly client-side.
> >>
> >> Turning again to the above regex, of course, being a proposed regex for
> >> validating email addresses, it's got an obvious deficiency. It needs to
> >> add support for other label separators (e.g. open dot).
> >>
> >> Mark Svancarek raised the excellent point of bidi in the domain.
> >> Personally I'm not confident I understand the bidi rules. But if the
> >> regex requires at least one label separator character in the domain and
> >> non-empty labels, will that work, given that if the regex allows 1
> >> character TLDs then a valid TLD is simply a non-empty label?
> >> --
> >> Jim Hague - jim at sinodun.com          Never trust a computer you can't lift.
> >
> > Don Hollander
> > Universal Acceptance Steering Group
> > Skype: don_hollander
> >
> >
> >
>
>