<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">


I think the Unicode form should be stored. My reasons for recommending this are a little different.

<div class=""><br class="">

</div>

<div class="">I mostly use MySQL and phpMyAdmin for my database work. Storing IDNs and/or EAI addresses in Unicode form has two advantages.</div>

<div class=""><br class="">

</div>

<div class="">① I can search by constructing an SQL query in phpMyAdmin, e.g. for all IDNs which contain 食品.</div>

<div class="">② I can learn a lot by visual inspection, e.g. I can readily identify text as being in Korean, Thai, Sinhala, Chinese, Arabic or Cyrillic script.</div>

<div class=""><br class="">

</div>

<div class="">I could not do either of the above if only the punycode form is stored.</div>
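<div class=""><br class=""></div>

<div class="">As a rough illustration of point ①, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL, and the table names and sample domains are made up for the example. The punycode rows are produced with Python's built-in "idna" codec, which follows the older IDNA2003 rules, so treat it as a demonstration only.</div>

```python
import sqlite3

# Hypothetical sample data: the same three domains in both forms.
u_labels = ["食品.example", "中国食品.example", "plain.example"]
# Python's built-in "idna" codec follows IDNA2003; good enough for a demo.
a_labels = [name.encode("idna").decode("ascii") for name in u_labels]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE unicode_form (name TEXT PRIMARY KEY)")
db.execute("CREATE TABLE punycode_form (name TEXT PRIMARY KEY)")
db.executemany("INSERT INTO unicode_form VALUES (?)", [(n,) for n in u_labels])
db.executemany("INSERT INTO punycode_form VALUES (?)", [(n,) for n in a_labels])

# The same substring search is run against both tables.
query = "SELECT name FROM {} WHERE name LIKE ?"
pattern = "%食品%"

unicode_hits = [r[0] for r in db.execute(query.format("unicode_form"), (pattern,))]
punycode_hits = [r[0] for r in db.execute(query.format("punycode_form"), (pattern,))]

print(unicode_hits)   # the two names containing 食品
print(punycode_hits)  # empty: the ASCII form cannot contain 食品
```

<div class="">The search against the punycode table finds nothing because the stored strings are pure ASCII, which is the limitation described above.</div>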

<div class=""><br class="">

</div>

<div class="">Basically, people can relate to the Unicode form and not the punycode form. So, if it involves people, store in the Unicode form.</div>

<div class=""><br class="">

</div>

<div class="">Actually, there is one punycode label I always recognise, which is .xn--fiqs8s 😀 xn--fiqs8s = 中国 = China. I recognise it because I have seen it so many times and I remember when it went live, as I posted to IDNforums: <a href="http://idnforums.com/forums/26659-china-idn-cctlds-are-live.html" class="">idnforums.com/forums/26659-china-idn-cctlds-are-live.html</a>. That is the only punycode label I recognise.</div>
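<div class=""><br class=""></div>

<div class="">For anyone who wants to check that equivalence, a small Python sketch follows. It uses only the standard library's raw "punycode" codec and adds the "xn--" prefix by hand; the variable names are just for illustration, and no IDNA validity checks are performed.</div>

```python
# "punycode" is the raw codec; the "xn--" prefix is what IDNA adds on top.
u_label = "中国"
a_label = "xn--" + u_label.encode("punycode").decode("ascii")
print(a_label)  # xn--fiqs8s

# Going back: strip the prefix and decode the remainder.
round_trip = a_label[len("xn--"):].encode("ascii").decode("punycode")
print(round_trip == u_label)  # True
```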

<div class=""><br class="">

</div>

<div class="">André Schappo<br class="">

<div><br class="">

<blockquote type="cite" class="">

<div class="">On 3 Apr 2018, at 20:51, Andrew Sullivan &lt;<a href="mailto:ajs@anvilwalrusden.com" class="">ajs@anvilwalrusden.com</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div class="">Hi,<br class="">

<br class="">

On Tue, Apr 03, 2018 at 06:36:03PM +0000, Carolyn Liu via UA-discuss wrote:<br class="">

<blockquote type="cite" class=""><br class="">

Today we do not allow customers to enter IDN (in Unicode) in our system (O365),<br class="">

so customers can only enter domain in ASCII, or an IDN in Punycode<br class="">

form.<br class="">

</blockquote>

<br class="">

To be clear, this means that domain names with labels of the form<br class="">

xn--[punycode-goes-here] are allowed, but no non-LDH characters are<br class="">

allowed in any domain name label; but, after permitting EAI addresses<br class="">

you will accept UTF-8 in the local-part?<br class="">

<br class="">

<blockquote type="cite" class="">already brings a challenge for us since mail may come in as UTF8<br class="">

form.<br class="">

</blockquote>

<br class="">

Under EAI, it _will_ come in that form.<br class="">

<br class="">

<blockquote type="cite" class="">allow our customers to enter a Unicode domain in O365, and which form we shall<br class="">

store the domain – Unicode or punycode?<br class="">

</blockquote>

<br class="">

If you attempt to support IDNA2003 or at least some of the<br class="">

compatibility modes of UTS#46, you effectively need to store both.<br class="">

IDNA2003 can lose information in a round trip between Punycode-form and<br class="">

Unicode-form, so you basically need to know the whole set.  This<br class="">

fundamental problem was actually one of the most urgent requirements<br class="">

for IDNA2008, and it's why some of us remain pretty annoyed with<br class="">

UTS#46 as a strategy since one of its profiles breaks that plan<br class="">

without any suggestion of how it'll eventually wean people from it.<br class="">

(We didn't have a weaning suggestion either in the IDNABIS WG, which<br class="">

is why we decided to break the backward compatibility in the few<br class="">

cases, reasoning that pain early in deployment was less bad than pain<br class="">

later.)<br class="">
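<br class="">

A minimal sketch of that information loss, assuming Python's built-in<br class="">

"idna" codec (which implements IDNA2003); the sample names are made up:<br class="">

```python
# Python's built-in "idna" codec implements IDNA2003 (RFC 3490 with Nameprep).
typed = ["Faß.example", "fass.example"]

for name in typed:
    a_form = name.encode("idna").decode("ascii")  # to the ASCII form
    back = a_form.encode("ascii").decode("idna")  # and back to Unicode
    print(name, a_form, back)

# Both inputs collapse to the same ASCII form, so the original
# spelling cannot be recovered from what was stored.
a_forms = {name.encode("idna").decode("ascii") for name in typed}
print(a_forms)
```

Both spellings end up as fass.example, so a system that keeps only that<br class="">

form cannot tell what the user typed. IDNA2008 treats ß as a valid<br class="">

character in its own right and does not fold it to "ss".<br class="">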

<br class="">

If you're restricting your supported domains to IDNA2008, then you<br class="">

don't have to care: every actual U-label is also exactly one A-label,<br class="">

and conversely.  So you can store U-labels or A-labels and get the<br class="">

same result.  The usual recommendation is that you store U-labels just<br class="">

because storing A-labels will result in transformation for every user<br class="">

event, and that might have nasty performance effects.<br class="">

<br class="">

<blockquote type="cite" class="">1. Domains in our system are unique, meaning the domain is a key. One domain shall<br class="">

   only exist once and belong to one customer only.<br class="">

</blockquote>

<br class="">

This is true regardless of whether it's a U-label or A-label: since<br class="">

they're DNS names they _must_ be unique globally within the DNS.<br class="">

<br class="">

<br class="">

<blockquote type="cite" class="">3. At gateway we need to know whether a domain is in our system. The match<br class="">

   logic will be as follows:<br class="">

    a. Is domain in system? If so go ahead and accept.<br class="">

    b. If not, is it UTF8 form? If so convert to Punycode and search again.<br class="">

</blockquote>

<br class="">

This sounds like a round trip plan.  Why not just run it through the<br class="">

relevant algorithm and check one time?  (LDH-only names will not<br class="">

undergo any transformation.  You may need a coalesce function or<br class="">

similar.)<br class="">
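<br class="">

A rough sketch of the "check one time" idea, under some assumptions:<br class="">

keys are stored as lower-case A-labels, canonical_key and is_known are<br class="">

hypothetical helper names, and Python's built-in "idna" codec (IDNA2003)<br class="">

stands in for whichever algorithm is actually relevant:<br class="">

```python
def canonical_key(domain):
    """Return one canonical ASCII key for a domain given in either form."""
    # Assumption: keys are stored as lower-case A-labels. The built-in
    # "idna" codec leaves LDH-only names and existing A-labels untouched
    # and converts Unicode labels to their xn-- form.
    return domain.encode("idna").decode("ascii").lower()


# Hypothetical set of domains known to the system, stored in canonical form.
known_domains = {canonical_key(d) for d in ["example.中国", "example.com"]}


def is_known(domain):
    # One transformation, one lookup; no "search, convert, search again".
    return canonical_key(domain) in known_domains


print(is_known("example.中国"))        # Unicode form
print(is_known("example.xn--fiqs8s"))  # Punycode form of the same name
print(is_known("other.example"))       # not in the system
```

The same single-pass shape works if the stored form is the U-label<br class="">

instead; only the direction of the one conversion changes.<br class="">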

<br class="">

<blockquote type="cite" class="">4. Every time when we display, we will always convert the domain to Unicode.<br class="">

</blockquote>

<br class="">

This is a reason to prefer U-label forms: no conversion on display,<br class="">

when the user is waiting.<br class="">

<br class="">

<blockquote type="cite" class="">5. This is how DNS supports IDN. A uniform storage will make implementation a<br class="">

   lot easier.<br class="">

</blockquote>

<br class="">

This is true.<br class="">

<br class="">

<blockquote type="cite" class="">But if we allow domains to be stored in Unicode, then we have to understand those in<br class="">

Punycode and those in Unicode and convert back and forth.  I understand we<br class="">

always need conversion, but if in only one form we know we always need to<br class="">

convert to the other form, vs we might need to convert both directions<br class="">

everywhere, very costly and very confusing.<br class="">

</blockquote>

<br class="">

It is _certainly_ true that you want to pick one, and if you already<br class="">

have A-labels in the system then you might have a migration problem.<br class="">

That might be a reason to use A-labels for storage.<br class="">

<br class="">

A<br class="">

<br class="">

-- <br class="">

Andrew Sullivan<br class="">

<a href="mailto:ajs@anvilwalrusden.com" class="">ajs@anvilwalrusden.com</a><br class="">

</div>

</div>

</blockquote>

</div>

<br class="">

<br class="">

</div>

</body>

</html>