[UA-discuss] Regular Expression

Andre Schappo A.Schappo at lboro.ac.uk
Fri Sep 15 10:26:30 UTC 2017


Yes indeed. The directionality of the text affects only the display order, not the memory order. So the memory order will always be (or, according to current best practice, should be) mailbox@domainname.tld, and one processes and validates the address as usual, without regard for the display order.
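
A minimal JavaScript sketch of that point: the hypothetical helper `splitAddress` below works only on memory (logical) order and splits at the last '@'. The Arabic address is an illustrative example, not one taken from this thread. Because the string is stored in logical order, the local part comes first in memory, even though a bidi-aware renderer draws it on the right.

```javascript
// Hypothetical helper: split an email address at its last '@', working purely
// on memory (logical) order. Bidi only changes how the string is drawn,
// so no direction-dependent logic is needed here.
function splitAddress(address) {
  // The domain cannot contain '@', but a quoted local part can,
  // so the last '@' is the one that separates the two parts.
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) {
    return null; // no '@', or an empty local part or domain
  }
  return { local: address.slice(0, at), domain: address.slice(at + 1) };
}

// Illustrative Arabic address in logical order: local part, '@', domain.
const parts = splitAddress("مستخدم@مثال.شبكة");
console.log(parts.local);  // مستخدم
console.log(parts.domain); // مثال.شبكة
```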

When presenting the address to the user, one then uses display order. This reminds me of a blog article I wrote some time ago, because with appropriate HTML/CSS one can determine how the address is displayed 😀 http://schappo.blogspot.co.uk/2016/10/computer-science-internationalization.html Oh, and... http://schappo.blogspot.co.uk/2016/03/arabic-email-addresses.html

André Schappo

On 14 Sep 2017, at 18:41, Asmus Freytag <asmusf at ix.netcom.com> wrote:

On 9/14/2017 10:27 AM, Rubens Kuhl wrote:
The BiDi issue suggests to me that even enforcing the non-dotless rule (at least one dot in the domain) is too much for a simple regex, as shabaka.example@don is a valid Arabic EAI address, while the same ASCII combination is not valid even if a .don TLD were delegated.
[non-empty]@[non-empty] looks better to me.
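
As a minimal sketch, the `[non-empty]@[non-empty]` rule could be written as the JavaScript regex below; the names `NON_EMPTY_AT_NON_EMPTY` and `looksLikeEmail` are hypothetical. No dot is required anywhere, so a dotless-looking domain is accepted.

```javascript
// "[non-empty]@[non-empty]" as a JavaScript regex.
// The 'u' flag makes '.' match whole code points (including astral characters).
// '.' does not match line terminators, so embedded newlines are rejected.
const NON_EMPTY_AT_NON_EMPTY = /^.+@.+$/u;

function looksLikeEmail(value) {
  return NON_EMPTY_AT_NON_EMPTY.test(value);
}

console.log(looksLikeEmail("user@example"));     // true (no dot required)
console.log(looksLikeEmail("مستخدم@مثال.شبكة")); // true
console.log(looksLikeEmail("@example.com"));     // false (empty local part)
console.log(looksLikeEmail("user@"));            // false (empty domain)
```

Because `.+` also matches '@', an input such as `a@b@c` passes as well; the check is deliberately loose.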

Isn't bidi limited to the display side, so that in storage there is never an alternative ordering of the host and local parts?

A./


Rubens

On 14 Sep 2017, at 13:58, Don Hollander <don.hollander at icann.org> wrote:

Thanks Jim.

The BiDi issue, with raw data input, is which side the domain is on.

Usually you’ll encounter mailbox@domainname.tld

But in Arabic or Hebrew you’ll encounter tld.domainname@mailbox

Don


On 15/09/2017, at 3:44 AM, Jim Hague <jim at sinodun.com> wrote:

On 12/09/2017 19:44, Don Hollander wrote:
One RegEx has stood out as being simple and correct. I’d like the UASG
to consider recommending it in our documentation. Toward that end,
this thread is for discussion.

/^.+@(?:[^.]+\.)+(?:[^.]{2,})$/

Regular expression check in JavaScript. This accepts any Unicode
characters, only insisting that the domain have more than one label
and that the TLD be 2 characters or longer.
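
To make the behaviour of the proposed regex concrete, here is a small JavaScript sketch that runs it against a few sample addresses. The samples and the constant name `UASG_CANDIDATE` are illustrative assumptions, and the closing `/` of the literal is included so that it parses. Without the `u` flag, `[^.]{2,}` counts UTF-16 code units rather than code points, which makes no difference for the BMP characters used here.

```javascript
// The regex proposed in the thread (closing '/' included so the literal parses).
// Without the 'u' flag, [^.]{2,} counts UTF-16 code units, not code points.
const UASG_CANDIDATE = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Illustrative inputs paired with the result the pattern gives for each.
const samples = [
  ["user@example.com", true],
  ["用户@例子.广告", true],      // two labels; TLD is 2 characters
  ["user@localhost", false],    // only one label (no dot)
  ["user@example.c", false],    // 1-character TLD
  ["user@example..com", false], // empty label
];

for (const [address, expected] of samples) {
  console.log(address, UASG_CANDIDATE.test(address), "expected:", expected);
}
```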

Note that this is in the context of an in-browser check. I only examined a
small random subset of the sites surveyed in the main evaluation, and
obviously without access to server code could only examine client-side
operations. In all the sites I examined, the only check performed was
against one (or in one case two) regular expression(s). No decomposition
of the email address was attempted, and certainly no translation of the
domain to Punycode.

It was in that context that I highlighted the above regex, on the basis
that it's probably the only sensible option to suggest to organisations
as a low-impact UA improvement (I won't say fix) at the moment. If a
future evaluation exercise verifies that an existing JavaScript module
does the right thing, that would be a better alternative, but that would
involve more substantial modifications to site code.

I agree that modifying it to allow 1 character TLDs would be sensible.
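
One way to make that change, as a sketch: require only that the final label be non-empty, keeping the rest of the proposed pattern unchanged. The constant name `ALLOW_ONE_CHAR_TLD` is hypothetical.

```javascript
// Variant of the proposed regex that allows 1-character TLDs:
// the final label only has to be non-empty ([^.]+ instead of [^.]{2,}).
const ALLOW_ONE_CHAR_TLD = /^.+@(?:[^.]+\.)+[^.]+$/;

console.log(ALLOW_ONE_CHAR_TLD.test("user@example.c"));   // true
console.log(ALLOW_ONE_CHAR_TLD.test("user@example.com")); // true
console.log(ALLOW_ONE_CHAR_TLD.test("user@localhost"));   // false: still needs a dot
console.log(ALLOW_ONE_CHAR_TLD.test("user@example."));    // false: empty TLD
```

With this change, a valid TLD is simply a non-empty label.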

I also agree with the page referenced at the start of the thread (which
I read before working on the report) that just checking for '@' is about
all one should attempt, certainly client-side.

Turning again to the above regex: as a proposed regex for validating
email addresses, it of course has an obvious deficiency. It needs to
support other label separators (e.g. open dot).
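
Assuming the intent is the set of separators that IDNA recognizes, a sketch could accept the four characters that IDNA2003 (RFC 3490, section 3.1) requires to be treated as dots. The constant name `WITH_IDNA_DOTS` is hypothetical, and this version also allows 1-character TLDs. It only widens what a client-side check accepts; it does not normalize the separators.

```javascript
// Accept any of the four characters IDNA2003 (RFC 3490, section 3.1)
// requires to be recognized as label separators:
//   U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
//   U+FF0E FULLWIDTH FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP.
// Every label, including the TLD, just has to be non-empty.
const WITH_IDNA_DOTS =
  /^.+@(?:[^.\u3002\uFF0E\uFF61]+[.\u3002\uFF0E\uFF61])+[^.\u3002\uFF0E\uFF61]+$/u;

console.log(WITH_IDNA_DOTS.test("user@example.com"));       // true
console.log(WITH_IDNA_DOTS.test("user@example\u3002com"));  // true: U+3002
console.log(WITH_IDNA_DOTS.test("user@example\uFF0Ecom"));  // true: U+FF0E
console.log(WITH_IDNA_DOTS.test("user@localhost"));         // false: no separator
```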

Mark Svancarek raised the excellent point of bidi in the domain.
Personally I'm not confident I understand the bidi rules. But if the
regex requires at least one label separator character in the domain and
non-empty labels, will that work, given that if the regex allows
1-character TLDs, a valid TLD is simply a non-empty label?
--
Jim Hague - jim at sinodun.com          Never trust a computer you can't lift.
Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander