[UA-discuss] Store domain in Punycode or Unicode?

Dr Ajay Data ajay at data.in
Wed Apr 4 11:08:57 UTC 2018


  
+1 Andre. In XgenPlus Email Server, we store all domains in Unicode only, and that's the way to go. There are many benefits. In this ever-changing, fast-paced world, who knows, you may even have xn-- domains allowed :)
 
thanks
 



Dr. Ajay DATA | Founder & CEO

Get an email ID like अजय@डाटा.भारत in your own language; visit www.xgenplus.com

 

From: Andre Schappo
To: "ua-discuss at icann.org"
Subject: Re: [UA-discuss] Store domain in Punycode or Unicode?
Date: 04 Apr 2018 02:39:51 PM
 
I think the Unicode form should be stored. My reasons for recommending this are a little different.
 
Mostly, I use MySQL and phpMyAdmin for my database work. Storing IDNs and/or EAI addresses in Unicode form has advantages:
 
① I can search by constructing an SQL query in phpMyAdmin, e.g. for all IDNs which contain 食品.
② I can learn a lot by visual inspection, e.g. I can readily identify text as being in Korean, Thai, Sinhala, Chinese, Arabic or Cyrillic script.
 
I could not do either of the above if only the Punycode form were stored.
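
To illustrate the search advantage, here is a minimal sketch. It assumes Python's sqlite3 module as a stand-in for MySQL, and the table name, column names and sample domains are hypothetical rather than taken from any real system. It stores each name in both forms and runs the same substring search against each column; the ASCII form is produced with Python's built-in "idna" codec, which implements IDNA2003.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
sample_names = ["食品.example", "中国", "bücher.example"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (unicode_form TEXT PRIMARY KEY, ascii_form TEXT)")
for name in sample_names:
    # Python's built-in "idna" codec (IDNA2003) yields the xn-- form.
    ascii_form = name.encode("idna").decode("ascii")
    conn.execute("INSERT INTO domains VALUES (?, ?)", (name, ascii_form))


def search(column: str, needle: str) -> list[str]:
    """Substring search on one of the two known columns."""
    if column not in ("unicode_form", "ascii_form"):
        raise ValueError("unknown column")
    rows = conn.execute(
        f"SELECT unicode_form FROM domains WHERE {column} LIKE ?",
        (f"%{needle}%",),
    )
    return [row[0] for row in rows]


unicode_hits = search("unicode_form", "食品")  # finds the stored Unicode name
ascii_hits = search("ascii_form", "食品")      # finds nothing in the xn-- form
print(unicode_hits, ascii_hits)
```

The search on the Unicode column matches directly, while the same search on the Punycode column cannot match, because the encoded form no longer contains the characters being searched for.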
 
Basically, people can relate to the Unicode form and not the Punycode form. So, if it involves people, store in the Unicode form.
 
Actually, there is one Punycode label I always recognise, which is .xn--fiqs8s 😀 (xn--fiqs8s = 中国 = China). I recognise it because I have seen it so many times, and I remember when it went live because I posted about it to IDNforums: idnforums.com/forums/26659-china-idn-cctlds-are-live.html
That is the only Punycode label I recognise.
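
For readers who want to check that mapping themselves, a minimal sketch follows. It assumes Python's built-in "idna" codec, which implements IDNA2003; for this particular label that makes no difference to the result.

```python
a_label = "xn--fiqs8s"

# Decode the A-label into its Unicode form, then encode it back again.
u_label = a_label.encode("ascii").decode("idna")
back = u_label.encode("idna").decode("ascii")

print(a_label, "=", u_label)  # xn--fiqs8s = 中国
```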
 
André Schappo


On 3 Apr 2018, at 20:51, Andrew Sullivan <ajs at anvilwalrusden.com> wrote:


Hi,

On Tue, Apr 03, 2018 at 06:36:03PM +0000, Carolyn Liu via UA-discuss wrote:
> Today we do not allow customers to enter IDNs (in Unicode) in our system (O365), so customers can only enter a domain in ASCII, or an IDN in Punycode form.
To be clear, this means that domain names with labels of the form xn--[punycode-goes-here] are allowed, but no non-LDH characters are allowed in any domain name label; but, after permitting EAI addresses, you will accept UTF-8 in the local-part?
> already brings a challenge for us since mail may come in as UTF8 form.
 Under EAI, it _will_ come in that form. 
> allow our customers to enter a Unicode domain in O365, and which form we shall store the domain – Unicode or Punycode?
If you attempt to support IDNA2003 or at least some of the compatibility modes of UTS#46, you effectively need to store both. IDNA2003 can lose information in a round trip between Punycode form and Unicode form, so you basically need to know the whole set.

This fundamental problem was actually one of the most urgent requirements for IDNA2008, and it's why some of us remain pretty annoyed with UTS#46 as a strategy, since one of its profiles breaks that plan without any suggestion of how it'll eventually wean people from it. (We didn't have a weaning suggestion either in the IDNABIS WG, which is why we decided to break the backward compatibility in the few cases, reasoning that pain early in deployment was less bad than pain later.)

If you're restricting your supported domains to IDNA2008, then you don't have to care: every actual U-label is also exactly one A-label, and conversely. So you can store U-labels or A-labels and get the same result. The usual recommendation is that you store U-labels, just because storing A-labels will result in a transformation for every user event, and that might have nasty performance effects.
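
The information loss described for IDNA2003 can be seen in a minimal sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 and therefore maps its input before encoding; the sample names are hypothetical. It shows only the IDNA2003 behaviour, not what an IDNA2008 implementation would do with the same input.

```python
def round_trip(name: str) -> str:
    """Unicode form -> ASCII (xn--) form -> Unicode form, per IDNA2003."""
    return name.encode("idna").decode("idna")


# Hypothetical sample names, chosen to exercise the IDNA2003 mapping step.
for original in ["Bücher.example", "straße.example", "bücher.example"]:
    ascii_form = original.encode("idna").decode("ascii")
    back = round_trip(original)
    print(f"{original!r} -> {ascii_form!r} -> {back!r} (lossless: {back == original})")
```

The first two names do not survive the round trip, because the mapping step folds case and rewrites "ß" as "ss" before encoding. That is why a system supporting IDNA2003 has to keep what the user typed as well as the encoded form.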
> 1. Domains in our system are unique, meaning the domain is a key. One domain shall only exist once and belong to one customer only.
This is true regardless of whether it's a U-label or A-label: since they're DNS names they _must_ be unique globally within the DNS.
> 3. At the gateway we need to know whether a domain is in our system. The match
>    logic will be as follows:
>    a. Is the domain in the system? If so, go ahead and accept.
>    b. If not, is it in UTF8 form? If so, convert to Punycode and search again.
 This sounds like a round trip plan.  Why not just run it through the relevant algorithm and check one time?  (LDH-only names will not undergo any transformation.  You may need a coalesce function or similar.) 
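
The single-check approach suggested here might look like the following minimal sketch. The function names and the stored set are hypothetical, and Python's built-in "idna" codec (IDNA2003) is assumed as a stand-in for whichever algorithm is actually relevant. Every incoming name, whether LDH-only, A-label or U-label, is reduced to one canonical form and looked up once.

```python
def canonical_u_form(domain: str) -> str:
    """Reduce an LDH, A-label or U-label name to a single Unicode form."""
    # Lower-case first: the codec passes ASCII-only labels through unchanged.
    ascii_form = domain.lower().encode("idna")
    return ascii_form.decode("idna")


# Hypothetical store of customer domains, kept in canonical Unicode form.
stored_domains = {canonical_u_form(d) for d in ["中国", "example.com"]}


def is_our_domain(domain: str) -> bool:
    """One normalisation, one lookup, whatever form the name arrived in."""
    return canonical_u_form(domain) in stored_domains


print(is_our_domain("xn--fiqs8s"), is_our_domain("中国"), is_our_domain("example.org"))
```

Because the stored keys and the incoming names go through the same function, there is no second "convert and search again" step, and LDH-only names pass through with nothing more than case folding.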
> 4. Every time we display, we will always convert the domain to Unicode.
 This is a reason to prefer U-label forms: no conversion on display, when the user is waiting. 
> 5. This is how DNS supports IDN. A uniform storage will make implementation a lot easier.
 This is true. 
> But if we allow storing domains in Unicode, then we have to understand those in Punycode and those in Unicode and convert back and forth. I understand we always need conversion, but if we store only one form we know we always need to convert to the other form, versus possibly needing to convert in both directions everywhere, which is very costly and very confusing.
It is _certainly_ true that you want to pick one, and if you already have A-labels in the system then you might have a migration problem. That might be a reason to use A-labels for storage.

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com



 