[UA-discuss] Mixing between RTL and LTR scripts

Edmon edmon at registry.asia
Thu May 10 10:48:30 UTC 2018


I think it would be a good idea:

========================================

Would it be useful if the UASG published a Good Practice guide to BiDi in Domain Names and Email Addresses?   (I think, Raed, that you’ve done the work)  We could then update our existing documents to reference it.   Or, if there’s someone else who’s got a good guide, we could reference that instead of building it afresh.

 

The UASG documents touch on BiDi, but don’t go into  any depth.

========================================

And perhaps updating the existing UASG docs where appropriate.

 

Edmon

 

 

 

From: UA-discuss [mailto:ua-discuss-bounces at icann.org] On Behalf Of Don Hollander
Sent: Thursday, May 10, 2018 6:44 PM
To: Raed AlFayez <rfayez at citc.gov.sa>; Roberto Gaetano <roberto_gaetano at hotmail.com>
Cc: Universal Acceptance <ua-discuss at icann.org>
Subject: Re: [UA-discuss] Mixing between RTL and LTR scripts

 

Raed, et al…

 

Thanks for the well documented discussion.  You have identified good practice.

 

Here are my thoughts:

 

1)      Only ICANN can regulate this issue at the top level.

2)      ICANN can only regulate this issue at the second level for new gTLDs

3)      Individual registries can regulate this issue at the second level (or the third, if they provide direct registration at those levels)

4)      Just because something is confusing doesn’t mean someone won’t do it if there are no restrictions against it.  And as we see with emojis, even RFCs don’t preclude activities that are contrary to published RFCs.

 

I’m not sure where this should be documented.   In the Unicode Consortium?   I don’t think the IETF, but I may be wrong.  Does it fit within the W3C?

 

Perhaps the TF-AIDN?

 

Would it be useful if the UASG published a Good Practice guide to BiDi in Domain Names and Email Addresses?   (I think, Raed, that you’ve done the work)  We could then update our existing documents to reference it.   Or, if there’s someone else who’s got a good guide, we could reference that instead of building it afresh.

 

The UASG documents touch on BiDi, but don’t go into  any depth.

 

Thoughts, please, from this group.

 

Don

 

 

From: UA-discuss <ua-discuss-bounces at icann.org> On Behalf Of Raed AlFayez
Sent: Thursday, 10 May 2018 8:47 PM
To: Roberto Gaetano <roberto_gaetano at hotmail.com>
Cc: Universal Acceptance <ua-discuss at icann.org>
Subject: Re: [UA-discuss] Mixing between RTL and LTR scripts

 

Dear Roberto & All,

 

We believe that mixing RTL and LTR labels or code points in a domain name and/or an email address (mailbox) would be confusing, illogical, unacceptable and inconvenient for Arabic user communities. It is also unsafe, since it will confuse users and may become a playground for domain and email phishing. The one exception is digits (LTR) in an Arabic label, provided the digits are in the middle or at the end of the RTL label.

 

We reached this conclusion after many studies of how users expect and understand a label that combines RTL and LTR in domain names and email addresses. See the results below.

 

 

Here is a summary of the findings and conclusions of our studies on mixing RTL and LTR in domain names and email addresses (mailboxes):

I.	Mixing RTL and LTR within a label of a domain name or across all the labels

a.	Every label (whether taken alone or across the whole domain name) should be formed from a single script and a single direction (RTL or LTR). The one exception is digits (LTR), which may appear in the middle or at the end of the label. In other words, Arabic (RTL) and ASCII (LTR) code points must not be mixed within a domain name label or across the labels of a domain name. Thus, the following examples are not acceptable:

*	givennameEMANRUS (Raedالفايز)  = givenname+surname
*	EMANRUSgivenname (الفايزRaed)  = surname+givenname
*	EMANRUS.givenname (Raed.الفايز) = givenname.surname
*	123EMANRUS (123الفايز) = digits+surname 
*	tld.EMANNIAMOD (sa.رسيل) = domainname.tld
*	DLT.domainname (raseel.السعودية) = domainname.tld
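
The rule in I.a can be sketched in code. This is only an illustrative sketch and not part of the study: it assumes Python's standard `unicodedata` module and uses Unicode bidirectional classes (L, R, AL) as a stand-in for "direction". The names `label_direction` and `domain_is_single_direction` are hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left letters (Hebrew-type, Arabic-type)


def label_direction(label: str) -> str:
    """Classify one label as 'LTR', 'RTL' or 'MIXED' (hypothetical helper).

    'MIXED' here means "not acceptable under rule I.a".
    """
    classes = [unicodedata.bidirectional(ch) for ch in label]
    has_rtl = any(c in RTL_CLASSES for c in classes)
    has_ltr = any(c == "L" for c in classes)
    if has_rtl and has_ltr:
        return "MIXED"  # Arabic and ASCII letters in the same label
    if has_rtl:
        # Digits are tolerated in the middle or at the end of an RTL label,
        # so the label has to start with an RTL letter.
        return "RTL" if classes[0] in RTL_CLASSES else "MIXED"
    return "LTR"


def domain_is_single_direction(domain: str) -> bool:
    """True if every label has the same, unmixed direction."""
    directions = {label_direction(label) for label in domain.split(".")}
    return len(directions) == 1 and "MIXED" not in directions


SURNAME = "الفايز"
DOMAIN_LABEL = "رسيل"

print(label_direction("Raed" + SURNAME))               # MIXED
print(label_direction("123" + SURNAME))                # MIXED (digits first)
print(label_direction(SURNAME + "123"))                # RTL (digits at the end)
print(domain_is_single_direction("raseel.sa"))         # True
print(domain_is_single_direction("sa." + DOMAIN_LABEL))  # False
```

The strings are built in logical (typing) order, so the result does not depend on how a mail client happens to display them.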

 

II.	Mixing RTL and LTR within the user part of an email address (EAI) 

a.	The same rule applies as in the previous point (mixing in domain labels): no mixing is allowed. Thus, the following examples are not acceptable:

*	givenname.EMANRUS (Raed.الفايز) = givenname.surname
*	EMANNEVIG.surname (رائد.alfayez)= givenname.surname

 

III.	Mixing RTL and LTR between domain and mailbox

a.	The entire domain name part (i.e., all labels, e.g., domainname.tld) and the entire user part (the mailbox name, e.g., Givenname.Surname@) should be formed from a single script, with the conditional exception of digits (which are LTR), i.e., no mixture of Arabic (RTL) and ASCII (LTR) code points at all.
b.	Thus, some of the following combinations are clear and understandable to Arabic users while others are not:


User direction   Domain direction   Display format                      Real example   Clear to Arabic users?
---------------  -----------------  ----------------------------------  -------------  ----------------------
LTR              LTR                givenname.surname@domainname.tld    (image)        Yes
RTL              RTL                DLT.EMANNIAMOD@EMANRUS.EMANNEVIG    (image)        Yes
RTL              LTR                EMANRUS.EMANNEVIG@domainname.tld    (image)        No
LTR              RTL                DLT.EMANNIAMOD@givenname.surname    (image)        No

Please note that the last two rows are hard to handle and to implement, and from the reader's point of view it is hard to tell the mailbox part from the domain part.

 

c.	It is desirable that this rule be enforced at the protocol level (i.e., IDNA, EAI) or at other levels (e.g., OS, applications, etc.). The rationale is that the mixture would be confusing, illogical, unacceptable and inconvenient for Arabic user communities.
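
The four user/domain combinations above can be condensed into a small check. This is a sketch only, under the assumption that "direction" can be approximated by Unicode strong bidirectional classes via Python's `unicodedata`; it deliberately ignores the digit-position condition, and `direction` and `address_is_clear` are hypothetical names.

```python
import unicodedata


def direction(text: str) -> str:
    """Return 'LTR', 'RTL' or 'MIXED', looking at strong bidi classes only."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr = "L" in classes
    if has_rtl and has_ltr:
        return "MIXED"
    return "RTL" if has_rtl else "LTR"


def address_is_clear(address: str) -> bool:
    """True when mailbox and domain share one unmixed direction."""
    user, at_sign, domain = address.rpartition("@")
    if not (user and at_sign and domain):
        return False
    user_dir, domain_dir = direction(user), direction(domain)
    return user_dir == domain_dir and user_dir != "MIXED"


LTR_USER, LTR_DOMAIN = "raed.alfayez", "raseel.sa"
# Built from parts so the logical order (first label first) is explicit.
RTL_USER = ".".join(["رائد", "الفايز"])
RTL_DOMAIN = ".".join(["رسيل", "السعودية"])

for user in (LTR_USER, RTL_USER):
    for domain in (LTR_DOMAIN, RTL_DOMAIN):
        print(direction(user), direction(domain),
              address_is_clear(f"{user}@{domain}"))
```

Only the LTR/LTR and RTL/RTL combinations pass, which matches the "Yes" rows of the table.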

IV.	Display issues when having an RTL domain or email in LTR context (e.g. inserting an Arabic domain/email in an English article or vice versa):

a.	RTL text should remain intact all the time regardless of the context.

i.	the RTL mailbox part should always appear as: EMANRUS.EMANNEVIG (example: رائد.الفايز)
ii.	the RTL domain name part should always appear as: DLT.EMANNIAMOD (example: رسيل.السعودية)

b.	LTR text should remain intact all the time regardless of the context.

i.	the LTR mailbox part should always appear as: givenname.surname (example: raed.alfayez)
ii.	the LTR domain name part should always appear as: domainname.tld (example: raseel.sa)
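
One possible way to keep an address intact inside text of the opposite direction is to wrap it in Unicode directional isolates. This technique is not proposed in the studies above; it is an assumption of this sketch, and it only helps where the displaying application implements the isolate characters of the Unicode Bidirectional Algorithm. The helper name `isolate` is hypothetical.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: ends the isolated run


def isolate(address: str) -> str:
    """Wrap an address so that surrounding text cannot reorder its parts."""
    return f"{FSI}{address}{PDI}"


rtl_address = ".".join(["رائد", "الفايز"]) + "@" + ".".join(["رسيل", "السعودية"])
ltr_address = "raed.alfayez@raseel.sa"

english_sentence = "Please write to " + isolate(rtl_address) + " for details."
arabic_sentence = "راسلنا على " + isolate(ltr_address)

print(english_sentence)
print(arabic_sentence)
```

Because the isolate takes its direction from the first strong character, the same helper serves both an RTL address in LTR text and an LTR address in RTL text, as long as each address is itself single-direction.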

  

I hope I have provided some insight into Arabic users' expectations when RTL and LTR are mixed in domain names and EAI.

 

Raed

 

From: Roberto Gaetano [mailto:roberto_gaetano at hotmail.com] 
Sent: Wednesday, May 09, 2018 9:06 PM
To: Raed AlFayez
Cc: Andre Schappo; Universal Acceptance
Subject: Re: [UA-discuss] Mixing between RTL and LTR scripts

 

Thanks Raed. 

Just a further comment.

There are situations in which an email address could have mixed scripts. For instance, some ASCII TLDs allow IDN at the second level. This can bring email addresses like this one (sorry, but I do not have the skills for making a graphic representation):

 

<Arabic script user>@<Arabic script SLD>.<Latin script TLD>

 

or maybe even:

 

<Latin script user>@<Arabic script SLD>.<Latin script TLD>

 

Can you please describe what would happen in this case?

Thanks,

Roberto

 

 

On 09.05.2018, at 10:55, Raed AlFayez <rfayez at citc.gov.sa> wrote:

 

Dear Andre & All,

 

Please allow me to resend my comments on the blog article that was shared by Andre in his last email:

 

 

 

Dear Andre,

 

With great interest and appreciation, I have read your posts on the UA-discuss mailing list as well as your recent blog article <http://schappo.blogspot.co.uk/2016/03/arabic-email-addresses.html> on how to handle Arabic email addresses in RTL and LTR contexts.

Your suggestion sounds good at first glance. However, it is confusing and puzzling from the point of view of an ordinary Arabic-speaking user. Hence, I have the following comments, which I would like to share with you first before posting them to the mailing list. I will use examples to illustrate my point of view; they are presented from a native Arabic speaker's perspective.

(Please note that I will use pictures for the text so that it is not garbled in transit by the email system.)

 

Consider my email address:

rfayez at citc.gov.sa

As a (normal) user, I can easily make the following (correct) assumptions regardless of the text direction:

1.     The user part is always to the left-side of the sign (@): rfayez

2.     The domain name is always to the right-side of the sign (@): citc.gov.sa

3.     A domain name is arranged in a well-defined label hierarchy where a TLD is always the rightmost label of the domain name: .sa

So I will use my email address as is (rfayez at citc.gov.sa) without changing its direction or swapping its parts. Consider the following examples, where my email address is used in different text writing directions:

 

<image001.jpg>

 

As you can see, regardless of the text direction (LTR or RTL), the email address maintains its form (i.e., user at domain.TLD). This allows the user to easily construct and deconstruct email addresses correctly without confusing or mixing up their parts. For example, the following addresses:

           care.sa at car.com

           car.com at care.sa

 

will be straightforwardly interpreted as follows (no confusion whatsoever):

 

<image002.jpg>

Now let us repeat the examples using an Arabic email address:

 

<image003.jpg>

 

Here a native Arabic-speaking user would make the following assumptions as well regardless of the text direction:

1.     The user part is always to the right-side of the sign (@): اندري

2.     The domain name is always to the left-side of the sign (@): رسيل.السعودية

3.     A domain name is arranged in a well-defined label hierarchy where an Arabic TLD is always the leftmost label of the domain name: .السعودية

 

Therefore, the given Arabic email address:

<image004.jpg>

should be used without changing its direction or swapping between its parts to maintain its form and hence remove any confusion or misinterpretation.

 

Consider the following examples where the previous email address

is used in different text writing directions:

<image005.jpg>

As you can see, regardless of the text direction (LTR or RTL), the email address maintains its form.

This allows the user to easily construct and deconstruct email addresses correctly without confusing or mixing up their parts. For example, the following addresses:

 

 

<image006.jpg>

will be straightforwardly interpreted as follows (no confusion whatsoever):

<image007.jpg>

However, if your suggestion is followed, then the above email addresses will be displayed as follows, depending on the text direction:

<image008.jpg>

and frankly this is absolutely confusing.

 

 

As Arabic speakers, we have been dealing with LTR and RTL together for a long time (long before computers were invented), because the Arabic alphabet is written RTL while Arabic numbers are written LTR.

 

So if I want to write the following sentence in Arabic, "My salary is 321 Pound", I will write it like this:

<image009.jpg>

Or

<image010.jpg>

And not like this:

<image011.jpg>

Nor

<image012.jpg>

Since to any Arabic user the last two images mean "My salary is 123 Pound"!
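
Unicode records this long-standing convention in its character data: Arabic letters and digits carry different bidirectional classes. The snippet below is just an illustration using Python's standard `unicodedata` module; the sample characters are chosen for this sketch.

```python
import unicodedata

samples = {
    "Latin letter a": "a",
    "Arabic letter reh": "\u0631",
    "ASCII digit 3": "3",
    "Arabic-Indic digit 3": "\u0663",
}

# L  = strong left-to-right letter
# AL = Arabic letter (strong right-to-left)
# EN = European number, AN = Arabic number (digits keep their LTR order)
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bidi_class in bidi_classes.items():
    print(f"{name}: {bidi_class}")
```

The digits come out as EN or AN rather than AL, which is why a number such as 321 keeps its left-to-right digit order inside right-to-left Arabic text.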

 

Later, when computers were introduced in our region (1980s), we wrote English names within Arabic text without changing their direction. In other words, if I want to write the following sentence in Arabic (without Arabizing the English names), "We welcome the interest of Mr. André Schappo in the Arabic language", then I will write it like this:

 

<image013.jpg>

not like this:

 

<image014.jpg>

 

And definitely not like this:

<image015.jpg>

 

Moreover, when the internet was introduced (1990s), we wrote domain names and email addresses in a similar manner, as I explained in my previous email.

 

I hope that I have clarified the view of an Arabic speaker regarding your thoughts on how to handle RTL in LTR context and vice versa.

 

BTW, the following are a small sample of the many examples from famous newspapers in the Arab world:

http://www.alriyadh.com/975687

<image016.jpg>

http://www.albayan.ae/economy/last-deal/2011-12-09-1.1551679

<image017.jpg>

http://aitmag.ahram.org.eg/News/38238.aspx

<image018.jpg>

 

With best regards,

Raed

 

 

From: UA-discuss [mailto:ua-discuss-bounces at icann.org] On Behalf Of Andre Schappo
Sent: Tuesday, May 08, 2018 6:49 PM
To: ua-discuss at icann.org
Subject: Re: [UA-discuss] Mixing between RTL and LTR scripts

 

 

Time to revive a blog article which I wrote in March 2016 😀 My blog article is about the presentation of Arabic email addresses ➜ http://schappo.blogspot.co.uk/2016/03/arabic-email-addresses.html Using this presentation method would make the components of an email address or domain name clearer, even when mixing LTR and RTL.

 

André Schappo

 

On 5 May 2018, at 13:03, Abdalmonem Tharwat Galila <agalila at mcit.gov.eg> wrote:

 

So how could any application process the domain name? Will it be RTL or LTR?

Ex: Abdo.عبدو.Ahmed

Where is the first label? Is it Abdo or Ahmed?

 

Consider a domain name that starts with RTL text, or has RTL text in the middle, or at the end!
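
For software, as opposed to a human reader, the first label is whatever precedes the first dot in the stored (logical) character order, however the name is displayed. The sketch below illustrates this, assuming the example was typed in the order Abdo, Arabic label, Ahmed; `labels_in_logical_order` is a hypothetical helper name.

```python
import unicodedata

# The Arabic label is written with escapes so its logical order is unambiguous.
ARABIC_LABEL = "\u0639\u0628\u062f\u0648"
name = ".".join(["Abdo", ARABIC_LABEL, "Ahmed"])


def labels_in_logical_order(domain: str) -> list[str]:
    """Split on dots in memory order; display order plays no role here."""
    return domain.split(".")


labels = labels_in_logical_order(name)
for index, label in enumerate(labels, start=1):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in label)
    first_class = unicodedata.bidirectional(label[0])
    print(index, code_points, first_class)
```

The ambiguity described above is therefore a display problem: the stored name has exactly one first label, but a reader looking at the rendered text cannot tell which one it is.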

 

Sent from my iPhone


On May 5, 2018, at 11:50 AM, Andrew Sullivan <ajs at anvilwalrusden.com> wrote:

Hi,

 

In the same label, it's mostly a bad idea (there's some discussion of this in the bidi document). But my point was about domain names, not individual labels. 

 

A

 

-- 

Please excuse my clumbsy thums

 


  _____  


On May 5, 2018 04:29:05 Abdalmonem Tharwat Galila <agalila at mcit.gov.eg> wrote:

Hi Andrew, thanks for your reply below. I spent a lot of time trying examples that mix RTL and LTR within the same label, and what I got was strange and unclear.

The resulting labels were, for example:

عبدالمنعم-Abdo@سجل.مصر

Abdoعبدالمنعم.مصر

… etc

There are many issues you cannot imagine. Another thing: using a dot in an RTL context or in an LTR context gives you different labels, although they must be the same.

The aim is to stay away from the display issues we get if we mix RTL and LTR code points in the same label.

 

Take a look here: https://tools.ietf.org/html/rfc5564

 

 

 

-----Original Message-----
From: UA-discuss [mailto:ua-discuss-bounces at icann.org] On Behalf Of Andrew Sullivan
Sent: Friday, May 04, 2018 2:57 PM
To: John Levine <john.levine at standcore.com>; Abdalmonem Tharwat Galila <agalila at mcit.gov.eg>
Cc: Ahmed Bakhat Masood (ahmedbakhat at pta.gov.pk) <ahmedbakhat at pta.gov.pk>; ua-discuss at icann.org; Ahmed Bakhat (ahmedbakhat at yahoo.com) <ahmedbakhat at yahoo.com>
Subject: Re: [UA-discuss] Mixing between RTL and LTR scripts

 

Also I don't know how you disallow script mixing for domain names. IDNA is label by label. The DNS is distributed, so there's no way to prevent mixing, is there?
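
The label-by-label point can be seen with Python's built-in `idna` codec. Note the assumption: that codec implements the older IDNA2003 rules (RFC 3490), not IDNA2008, so this is only an illustration of per-label processing, not of current registry policy. The helper name `to_ascii` is hypothetical.

```python
ARABIC_LABEL = "\u0639\u0628\u062f\u0648"  # Arabic label, written with escapes


def to_ascii(name: str) -> bytes:
    """Encode a domain name label by label with the built-in IDNA2003 codec."""
    return name.encode("idna")


# Directions mixed ACROSS labels: each label is checked on its own, so this passes.
mixed_across_labels = ".".join(["Abdo", ARABIC_LABEL, "Ahmed"])
encoded = to_ascii(mixed_across_labels)
print(encoded)

# Directions mixed WITHIN one label: the codec's per-label bidi check rejects it.
mixed_within_label = "Abdo" + ARABIC_LABEL
try:
    to_ascii(mixed_within_label)
    rejected = False
except UnicodeError as error:
    rejected = True
    print("rejected:", error)
```

Nothing in the per-label encoding step looks at the neighbouring labels, which is why a whole-name restriction would have to be enforced somewhere else, for example by registry policy.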

 

A

 

--

Please excuse my clumbsy thums

----------

On May 4, 2018 05:36:02 "John Levine" <john.levine at standcore.com> wrote:

>> I hope you all doing well, after back to TF-AIDN "Task Force Of Arabic
>> IDNs", I got the following regards to mixing LTR and RTL texts within the
>> same label.
>> - Mixing between different scripts is not allowed for domain names
>>   and email addresses
>> - Numbers at the middle or at the end of the RTL domain name is
>>   allowed.
>>
>> To be away from the display issues we get if we mix RTL and LTR code points
>> in the same labels.
>
> Thanks.  I think this clarifies the point that we have no advice on
> displaying e-mail addresses, since mailboxes are not domain names and are
> not labels and are not subject to IDNA2008.
>
> Regards,
> John Levine, john.levine at standcore.com
> Standcore LLC

 

 

 

 

🌏 🌍 🌎
André Schappo
小山@电邮.在线?Subject=你好小山😜
https://schappo.blogspot.co.uk/
https://twitter.com/andreschappo
https://weibo.com/andreschappo?is_all=1
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 2748 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/ua-discuss/attachments/20180510/4268b465/image001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 2630 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/ua-discuss/attachments/20180510/4268b465/image002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.jpg
Type: image/jpeg
Size: 2318 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/ua-discuss/attachments/20180510/4268b465/image003.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.jpg
Type: image/jpeg
Size: 2607 bytes
Desc: not available
URL: <http://mm.icann.org/pipermail/ua-discuss/attachments/20180510/4268b465/image004.jpg>

