[UA-discuss] Regular Expression

Thu Sep 14 17:41:59 UTC 2017

On 9/14/2017 10:27 AM, Rubens Kuhl wrote:
> The BiDi issue suggests to me that even enforcing the non-dotless rule is too much for a simple regex, as shabaka.example@don is a valid Arabic EAI, while the same ASCII combination is not valid even if a .don TLD gets delegated.
> [non-empty]@[non-empty] looks better to me.

Isn't the bidi issue limited to the display side? That is, in backing storage
there should be no alternative ordering of the host and local parts.

A./
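
A minimal sketch of the point raised in the question above, assuming the address reaches the check as an ordinary Unicode string. Strings are stored in logical order and bidi reordering is applied only when text is rendered, so splitting at the last "@" yields the local part first however the address looks on screen. The helper name splitAddress and the sample address are made up for illustration.

```javascript
// Hypothetical helper: split an address at its last "@" in logical (stored) order.
// The last "@" is used because a domain cannot contain "@".
function splitAddress(address) {
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) {
    return null; // no "@", or an empty local part or domain
  }
  return { local: address.slice(0, at), domain: address.slice(at + 1) };
}

// Made-up sample with an Arabic local part and Arabic labels, written with
// escapes so that a bidi-aware editor does not reorder the source line.
const sample =
  "\u0628\u0631\u064A\u062F@\u0634\u0628\u0643\u0629.\u0645\u062B\u0627\u0644";

const parts = splitAddress(sample);
// In storage the local part comes first, even if a display shows it on the right.
console.log(parts.local === "\u0628\u0631\u064A\u062F"); // true
console.log(parts.domain.split(".").length); // 2
```

A regex is matched against the same stored order, so it sees the local part before the "@" as well.
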
>
>
> Rubens
>
>> On 14 Sep 2017, at 13:58:000, Don Hollander <don.hollander at icann.org> wrote:
>>
>> Thanks Jim.
>>
>> The BiDi issue, with raw data input, is which side the domain part ends up on.
>>
>> Usually you’ll encounter mailbox@domainname.tld
>>
>> But in Arabic or Hebrew you’ll encounter tld.domainname@mailbox
>>
>> Don
>>
>>
>>> On 15/09/2017, at 3:44 AM, Jim Hague <jim at sinodun.com> wrote:
>>>
>>> On 12/09/2017 19:44, Don Hollander wrote:
>>>> One RegEx has stood out as being simple and correct.   I’d like the UASG
>>>> to consider recommending this in our documentation.   Toward that end,
>>>> this thread is for discussion.
>>>>
>>>> /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/
>>>>
>>>> Regular expression check in Javascript. This accepts any Unicode
>>>> characters, only insisting that the domain must have more than one label
>>>> and the TLD is 2 characters or longer.
>>> Note that this is in the context of an in-browser check. I only examined a
>>> small random subset of the sites surveyed in the main evaluation, and
>>> obviously without access to server code could only examine client-side
>>> operations. In all the sites I examined, the only check performed was
>>> against one (or in one case two) regular expression(s). No decomposition
>>> of the email address was attempted, and certainly no translation of the
>>> domain to Punycode.
>>>
>>> It was in that context that I highlighted the above regex, on the basis
>>> that it's probably the only sensible option to suggest to organisations
>>> as a low-impact UA improvement (I won't say fix) at the moment. If a
>>> future evaluation exercise verifies that an existing Javascript module
>>> does the right thing, that would be a better alternative, but that would
>>> involve more substantial modifications to site code.
>>>
>>> I agree that modifying it to allow 1 character TLDs would be sensible.
>>>
>>> I also agree with the page referenced at the start of the thread (which
>>> I read before working on the report) that just checking for '@' is about
>>> all one should attempt, certainly client-side.
>>>
>>> Turning again to the above regex, of course, being a proposed regex for
>>> validating email addresses, it's got an obvious deficiency. It needs to
>>> add support for other label separators (e.g. open dot).
>>>
>>> Mark Svancarek raised the excellent point of bidi in the domain.
>>> Personally I'm not confident I understand the bidi rules. But if the
>>> regex requires at least one label separator character in the domain and
>>> non-empty labels, will that work, given that if the regex allows 1
>>> character TLDs then a valid TLD is simply a non-empty label?
>>> -- 
>>> Jim Hague - jim at sinodun.com          Never trust a computer you can't lift.
>> Don Hollander
>> Universal Acceptance Steering Group
>> Skype: don_hollander
>>
>>
>>
>
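
For reference, a rough sketch of the three checks discussed in this thread, side by side. The names quoted, relaxed, minimal and SEP are illustrative only. The relaxed variant is an assumption about how the suggestions might be combined: it allows 1-character TLDs and treats U+3002, U+FF0E and U+FF61 (the additional label separators listed in RFC 3490) as equivalent to the ASCII dot. The thread itself only mentions "open dot", so the exact set is a guess.

```javascript
// The regex as quoted in the thread, written as a complete JavaScript literal.
const quoted = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Assumed label separators: ASCII dot plus U+3002, U+FF0E and U+FF61.
const SEP = ".\u3002\uFF0E\uFF61";

// Assumed variant: at least one separator in the domain, every label
// non-empty, and no minimum length for the final label.
const relaxed = new RegExp(`^.+@(?:[^${SEP}]+[${SEP}])+[^${SEP}]+$`, "u");

// The looser [non-empty]@[non-empty] check suggested in the thread.
const minimal = /^.+@.+$/;

const samples = [
  "user@example.com",
  "user@example.c", // 1-character TLD
  "user@example\u3002jp", // ideographic full stop as separator
  "user@localhost", // dotless domain
];

for (const s of samples) {
  console.log(s, quoted.test(s), relaxed.test(s), minimal.test(s));
}
```

All three patterns are matched against the string in stored order, and none of them decomposes the address or converts the domain to Punycode.
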