[UA-discuss] Regular Expression To Validate EAI Addresses

Marc Blanchet marc.blanchet at viagenie.ca
Fri Aug 14 16:08:12 UTC 2020


On 14 Aug 2020, at 11:51, Abdelmeniem Tharwat wrote:

> Could we have some examples so I can fix that?

For IDNA, find any code point that does not have the Unicode property
Letter but is PVALID in IDNA. Digits (\p{N}) come to mind immediately,
but that is not the end of the story: many PVALID code points have
neither the Unicode property L nor N. Again, see RFC5892 and the IANA
IDNA registry. As I wrote and discussed in the Java tutorial for UA, if
you want to handle IDNA fully and correctly in a regex, you end up
coding the full IDNA rules into the regex, which is, if not impossible,
very, very complicated and not worth the work.
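
To see concretely why \p{L} and IDNA diverge, the sketch below compares
the general categories matched by \p{L} (Lu, Ll, Lt, Lm, Lo) with the
LetterDigits set from RFC 5892, Section 2.1 (Ll, Lu, Lo, Nd, Lm, Mn, Mc).
It is a minimal illustration, not a PVALID check: LetterDigits is only
the first rule of the IDNA2008 derivation, and the exception, unstable,
ignorable and other rules are not applied. The helper names are
hypothetical, and Python's unicodedata reports categories for its own
Unicode version.

```python
import unicodedata

# General categories matched by the regex class \p{L} (any "Letter").
P_L = {"Lu", "Ll", "Lt", "Lm", "Lo"}

# RFC 5892, Section 2.1, LetterDigits (A). Only the first step of the
# IDNA2008 derivation; exceptions, unstable and ignorable code points
# are NOT handled here, so this is not a PVALID test.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def matches_p_l(ch: str) -> bool:
    r"""True if ch would be matched by \p{L}."""
    return unicodedata.category(ch) in P_L


def in_letter_digits(ch: str) -> bool:
    """True if ch falls in the RFC 5892 LetterDigits category set."""
    return unicodedata.category(ch) in LETTER_DIGITS


# An ASCII digit, the Devanagari vowel sign AA (as in "भारत"),
# and a titlecase letter.
for ch in ["7", "\u093e", "\u01c5"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"p_L={matches_p_l(ch)} LetterDigits={in_letter_digits(ch)}")
```

ASCII digits (Nd) and combining marks such as the vowel sign in "भारत"
(Mc) fall outside \p{L}, while titlecase letters (Lt) are matched by
\p{L} but are not in LetterDigits.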

For EAI, find any code point that does not have the Unicode property
Letter, and it won’t work with the regex below.
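
The same gap shows up on whole addresses, and it can be checked without
a Unicode-aware regex engine. The following is a minimal sketch using
only Python's standard library: matches_proposed_regex is a hypothetical
helper that emulates the regex quoted below by testing each character's
general category (a category starting with "L" is exactly what \p{L}
matches), with ^ and $ treated as whole-string anchors. The samples are
addresses that RFC 6531 (building on RFC 5321 atext for the local part)
and IDNA2008 permit, yet the regex rejects all but the first.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    r"""Same set as \p{L}: general category Lu, Ll, Lt, Lm or Lo."""
    return unicodedata.category(ch).startswith("L")


def matches_proposed_regex(addr: str) -> bool:
    r"""Emulate ^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$ (hypothetical helper)."""
    # Neither character class contains "@", so a match needs exactly one.
    if addr.count("@") != 1:
        return False
    local, domain = addr.split("@")
    # Local part: one or more of \p{L} . % + -
    if not local or not all(is_letter(c) or c in ".%+-" for c in local):
        return False
    # The final class has no ".", so the literal "\." is the last dot.
    head, dot, tld = domain.rpartition(".")
    if not dot or not head:
        return False
    if not all(is_letter(c) or c in ".-" for c in head):
        return False
    return len(tld) >= 2 and all(is_letter(c) for c in tld)


samples = [
    ("letters only", "j\u00f6rg@b\u00fccher.de"),
    ("digit in local part", "user1@example.com"),
    ("underscore (valid atext)", "first_last@example.com"),
    ("digit in domain label", "user@example2.com"),
    ("Mc mark in TLD label", "user@example.\u092d\u093e\u0930\u0924"),
]
for label, addr in samples:
    print(f"{label:26} {matches_proposed_regex(addr)}")
```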

The danger here, again, is promoting a regex that « kinda works but not
quite » and having it become the « standard » everybody uses; one then
essentially creates a fork of the RFCs with more limitations, from an
implementation point of view.

One may argue that we should have designed IDNA2008 so that it could be
implemented in a regex, but that did not happen.

The best way would be to modify the regex engine itself to embed the
IDNA protocol and define a new regex token for IDNA; then we would be in
business… Not an easy task.
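
On the engine side, even the \p{...} syntax is not universal, as noted
earlier in the thread. As a minimal sketch (engine_accepts is a
hypothetical helper), Python's built-in re module rejects the proposed
pattern at compile time with a "bad escape" error. The third-party regex
package on PyPI does support \p{L}, but the limits described above for
IDNA and EAI still apply.

```python
import re

# The pattern proposed earlier in the thread, verbatim.
PROPOSED = r"^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$"


def engine_accepts(pattern: str) -> bool:
    """Return True if the stdlib re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # re treats an unknown escape such as \p as an error.
        print(f"re.compile failed: {exc}")
        return False
    return True


print(engine_accepts(PROPOSED))
```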

Regards, Marc.


>> On Aug 14, 2020, at 5:49 PM, Marc Blanchet 
>> <marc.blanchet at viagenie.ca> wrote:
>>
>> On 14 Aug 2020, at 11:32, Abdelmeniem Tharwat wrote:
>>
>>> Dear colleagues,
>>> I hope you are doing well. For a long time I have tried to use a
>>> regex to validate EAI addresses for many UA-related projects. I used
>>> the tool here <https://rubular.com/> and this regex
>>> "^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$" to validate some EAI
>>> addresses, and it works well, as in the screenshot below.
>>
>> \p{L} is the Unicode property Letter. So:
>> - for IDNA, it is close (as the IDNA base is the Unicode Letter
>> property) but not quite; see RFC5892.
>> - for EAI, it is very restrictive, since the mailbox can be almost
>> any UTF-8 string; see RFC6531.
>>
>> So you may want to use that regex, but be aware of its side-effects, 
>> including not accepting some domains and mailboxes.
>>
>> Finally, not all regex engines support Unicode properties, so make
>> sure the one you use supports them.
>>
>> Regards, Marc.
>>
>>>
>>> [Screenshot (image not preserved in the archive)]
>>>
>>>
>>> Thanks a lot.
>>>
>>> All the Best,
>>> Abdalmonem Tharwat Galila
>>> Deputy Manager, Dot Masr Registry,
>>> Operation Sector.
>>>
>>> National Telecommunication Regulatory Authority
>>> Office Tel.: +2 02 35341582 - +2 02 35341300
>>> Mobile: +2 010 00049068
>>> Fax: +2 02 35370537
>>> Website: http://www.mcit.gov.eg
>>>          http://www.tra.gov.eg
>>> E-mail: agalila at mcit.gov.eg
>>>         atharwat at tra.gov.eg
>>> Skype: abdalmonem.galila
>>
>>
>>> _______________________________________________
>>> UA-discuss mailing list
>>> UA-discuss at icann.org
>>> https://mm.icann.org/mailman/listinfo/ua-discuss
>>> _______________________________________________