[UA-discuss] [UA-International] IDN-as-punycode-encoded-label in Baidu search engine results

Tan Tanaka, Dennis dtantanaka at verisign.com
Wed Dec 2 15:02:06 UTC 2015


Jothan,

.com/.net supports IDNA2008.

I think you may be onto something wrt developers (i.e. implementers) being fixed on IDNA2003 + TR46.

For example, the domain name “faß.de” with U+00DF:
IDNA2003 maps it to “fass.de” (all ASCII)
IDNA2008 maps it to xn--fa-hia.de (Punycode encoded form of faß.de)

I just tested it in Chrome: “faß.de” takes me to “fass.de” (and then redirects to a landing page), whereas “xn--fa-hia.de” goes to a parked page. If Chrome supported IDNA2008 all the way, “faß.de” and “xn--fa-hia.de” would be equivalent, and they are not. IE does the same mapping from “faß.de” to “fass.de”. Interestingly, IE didn’t recognize the punycode-encoded form of the IDN.
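
The two mappings above can be reproduced with a minimal sketch using only Python's standard library. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with nameprep) and therefore folds ß to "ss". The raw "punycode" codec is used only to show what the label looks like when ß is preserved, as IDNA2008 would; it performs none of the IDNA2008 validity checks. The function names are made up for illustration.

```python
# Sketch: IDNA2003 mapping vs. keeping U+00DF, standard library only.
# Function names are illustrative, not part of any registry or browser API.

def idna2003_to_ascii(domain: str) -> str:
    """ToASCII via Python's built-in 'idna' codec (IDNA2003 + nameprep)."""
    return domain.encode("idna").decode("ascii")

def punycode_label_preserving(label: str) -> str:
    """Punycode-encode one label with no mapping step, adding the ACE prefix.

    This skips all IDNA2008 validity checks; it only shows the encoding.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def preserve_sharp_s(domain: str) -> str:
    """Apply punycode_label_preserving to every dot-separated label."""
    return ".".join(punycode_label_preserving(part) for part in domain.split("."))

print(idna2003_to_ascii("faß.de"))   # fass.de
print(preserve_sharp_s("faß.de"))    # xn--fa-hia.de
```

A client built on the first function would never look up xn--fa-hia.de when the user types faß.de, which matches the browser behaviour described above.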

Dennis

From: Jothan Frakes [mailto:jothan at jothan.com]
Sent: Tuesday, December 01, 2015 6:54 PM
To: Tan Tanaka, Dennis
Cc: Jin Wang; ua-international at icann.org; ua-discuss; Andre Schappo
Subject: Re: [UA-discuss] [UA-International] IDN-as-punycode-encoded-label in Baidu search engine results

@Andre I like hearing a Firefox win. We deal with U-Labels and A-Labels in the domain part of a URI.

@Dennis-

It is good to see the Punycode at least appearing within the results of the Baidu search.
With respect to the A-Label displaying instead of the U-Label, there's something distinct to .com and a few of the other early-innovator TLDs that supported IDNA2003 which might be at play.
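
As an illustration of what displaying the U-Label instead of the A-Label involves, here is a rough sketch using Python's built-in "idna" codec (which follows IDNA2003). The helper name `to_display_form` is hypothetical, and this is not a claim about how Baidu or any other search engine renders its results. The sample name is the xn--ebr05n.com example that started this thread.

```python
# Sketch: turn A-labels back into U-labels for display.
# to_display_form is a hypothetical helper, not an API of any search engine.

def to_display_form(hostname: str) -> str:
    """Return the Unicode form of a hostname, leaving it unchanged on failure."""
    try:
        return hostname.encode("ascii").decode("idna")
    except UnicodeError:
        # Already Unicode, or not a valid ACE-encoded name: show it as given.
        return hostname

print(to_display_form("xn--ebr05n.com"))   # 墨刀.com
print(to_display_form("example.com"))      # example.com
```

An application that wants to avoid picking and choosing which registrations are valid could still call something like this and fall back to the A-label only when decoding fails.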

Which IDNA is .COM / .NET supporting? I seem to recall Verisign being a strong contributor in the early days of IDN, and given the early innovation, importance, and support Verisign put behind advancing IDN, I seem to recall it was IDNA2003. I could see developers noticing that IDNA2008 might render a name like xn--fiqu8s.com restricted as a variant of xn--fiqs8s.com, where it was available under IDNA2003. With registrations that -might- be restricted cohabiting with those that are not, I could see those developers settling on a perfectly reasonable solution: falling back to punycode rather than attempting to pick and choose whose domain is or is not valid.

Just a hunch. I didn't speak with Baidu.

-Jothan


Jothan Frakes
Tel: +1.206-355-0230

On Thu, Nov 26, 2015 at 4:13 AM, Andre Schappo <A.Schappo at lboro.ac.uk> wrote:
Hi Jin

Thank you very much for the compliment, but I am not an expert in East Asian languages. I am, instead, a keen amateur :) I find the East Asian languages fascinating and beautiful.

Thank you very much for the link. This is really good news. Prompted by your email I found a CNNIC article http://cnnic.cn/gywm/xwzx/rdxw/2015/201511/t20151116_53029.htm

Yesterday I tested Weibo IDN links using Safari (OSX) and they failed. The CNNIC article states they should work. So, today, I tried with Firefox (OSX) instead and the Chinese IDNs work in Weibo with Firefox :) This is such a huge step forward. I tested with .中国 .网络 .公司 IDNs. When I get time I will try with other browsers.

So, why do they work with Firefox and not Safari? Here are my initial and brief findings.

I had a quick look at the HTTP headers and here is what appears to be happening: Weibo's t.cn service is returning the Unicode form and not the punycode form. The t.cn service is probably encoding it further (probably percent-encoding, and there may be more than one level of encoding at play). I think a backend service like t.cn should return the punycode form. Firefox copes with this but Safari fails.
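
To make the difference between the forms concrete, here is a small Python sketch. It assumes the redirect target carries the host either as raw Unicode or as percent-encoded UTF-8, possibly encoded more than once; what t.cn actually sends has not been confirmed. The function name `host_to_punycode` is hypothetical, and the built-in "idna" codec used here follows IDNA2003.

```python
# Sketch: normalise a redirect target so its host is in punycode (A-label) form.
# Assumption: the host may arrive as raw Unicode or percent-encoded UTF-8,
# possibly encoded more than once. This is NOT t.cn's confirmed behaviour.
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def host_to_punycode(url: str, max_rounds: int = 3) -> str:
    """Rewrite url so that its host is ASCII; any userinfo is dropped."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Undo up to max_rounds levels of percent-encoding on the host only.
    for _ in range(max_rounds):
        decoded = unquote(host)
        if decoded == host:
            break
        host = decoded
    ascii_host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        ascii_host = f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, ascii_host, parts.path, parts.query, parts.fragment))

unicode_url = "http://南昌大学.中国/"
percent_url = "http://" + quote("南昌大学.中国") + "/"

print(percent_url)                    # host as percent-encoded UTF-8
print(host_to_punycode(unicode_url))  # http://xn--6kr28kk1brxs.xn--fiqs8s/
print(host_to_punycode(percent_url))  # http://xn--6kr28kk1brxs.xn--fiqs8s/
```

A backend that emitted the last form in its Location header would not depend on each browser decoding the host the same way.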

When I get time I will investigate further

Those of you who have a Sina Weibo account: can you please test with other browsers and operating systems? Here are some test Chinese IDNs:
http://北京电影学院.中国<http://xn--1lq90imwijke8n8a9u5b.xn--fiqs8s>
http://废品回收.网络<http://xn--jvr02cd1l4kh.xn--io0a7i>
http://宿迁物流.公司<http://xn--oct602bexdr40b.xn--55qx5d>

André Schappo

On 26 Nov 2015, at 09:49, Jin Wang wrote:

> Hi Andre,
>
> From your personal blog and Sina Weibo I learnt you are an expert in East Asian languages, so I recommend a Chinese news clip from CNNIC's WeChat platform. I personally think it is a great achievement and do hope more IDNs and new gTLDs can join the effort & share the achievement in the near future.
>
> Apologies in advance for the ridiculously long URL.
>
> The topic is "Sina Weibo Now Fully Supports Chinese IDN"; it basically states that Sina Weibo now gives 4 mainstream Chinese IDNs equal treatment with ASCII TLDs.
> https://mp.weixin.qq.com/s?__biz=MjM5MzQ1ODQyMQ==&mid=400695189&idx=1&sn=700fffd846a532fbc750332702f4cb08&scene=2&srcid=1116dFUQTc6sB9oBsWxCtwc6&from=timeline&isappinstalled=0&key=ff7411024a07f3ebe7630197303ce39540fa79341032bd6201895e4003deb5480e1dae74dffb38e5da9211e324c07335&ascene=0&uin=OTI0NzMyMzgw&devicetype=iMac+MacBookPro12%2C1+OSX+OSX+10.10.5+build(14F27)&version=11020201&nettype=cmnet&pass_ticket=RrQ639TbIq%2FLcvSbltUj3ojz3mNDhBbWKujwHWQWK%2FZQndGgsSMXna9Mc7CcuEeR
>
> Maybe you can try to play around with some IDN urls in Weibo and see how China's biggest social media platform is doing on UA right now. Looking forward to your feedback and please do contact me if any assistance is needed.
>
> Best Regards,
> Jin
>
> On Thu, Nov 26, 2015 at 4:33 PM, Andre Schappo <A.Schappo at lboro.ac.uk> wrote:
> Hi Jin
>
> I am intrigued by your sentence "As for the Sina Weibo, I believe our beloved colleagues from CNNIC have just achieved some milestones with Sina and the rest of us from China ICANN Community would love to join them in the action."
>
> So, please do tell us more about this as I am really curious :)
>
> TIA
>
> André Schappo
>
> On 25 Nov 2015, at 09:18, Jin Wang wrote:
>
> > Hi Andre,
> >
> > Thank you very much for the suggestion. Yes, we are targeting all major social media platforms, search engines, input methods and browsers; they are helping more than 760 million internet users on a daily basis.
> >
> > We also want to reach all the .brand registry operators, which are mainly banks and telecommunication carriers. The good news is, as you have pointed out, most of the targeted stakeholders are new gTLD registry operators themselves.
> >
> > As for the Sina Weibo, I believe our beloved colleagues from CNNIC have just achieved some milestones with Sina and the rest of us from China ICANN Community would love to join them in the action.
> >
> > I will keep working with my colleagues in China and also keep you posted on our progress.
> >
> > Best Regards,
> > Jin
> >
> > On Wed, Nov 25, 2015 at 4:15 PM, Andre Schappo <A.Schappo at lboro.ac.uk> wrote:
> > Hi Jin
> >
> > Good to know of UA activities in China.
> >
> > It would be good if you could include Sina Weibo 新浪微博 in your outreach initiatives. I have been waiting years for Sina Weibo to correctly handle IDNs. I had more or less given up and have not tested IDNs on their system for a long long time. The situation for years has been that Sina could only handle the punycode form of IDNs. Prompted by this thread I tested again and some progress has been made :)
> >
> > So I tested by posting on Sina Weibo with http://xn--6kr28kk1brxs.xn--fiqs8s and http://南昌大学.中国<http://xn--6kr28kk1brxs.xn--fiqs8s>
> >
> > http://xn--6kr28kk1brxs.xn--fiqs8s works as it always did — linkifies ok ...but shows punycode form in popup when one hovers over the link
> >
> > http://南昌大学.中国<http://xn--6kr28kk1brxs.xn--fiqs8s> — linkifies ok (which it did not last time I tested) & shows unicode form in popup when one hovers over the link …but does not work when one clicks on the link
> >
> > …and… still waiting for the new gtld .微博 to be delegated to DNS Root
> >
> > André Schappo
> > http://weibo.com/andreschappo
> >
> > On 24 Nov 2015, at 16:21, Jin Wang wrote:
> >
> > > Hi Dennis,
> > >
> > > Sorry for the confusion caused, I did miss your point, and yes of course you are definitely right about the a-label showing on the search result.
> > >
> > > As for the plan for the China UA task, there is actually an existing and very strong 'ICANN community' in China. We are working together to form a combined effort in UA by:
> > >
> > > Firstly identify the major stakeholders (policy makers, IDN tld registries, top registrars, top search application/service providers, internet research organisations),
> > >
> > > Secondly gather the demands from registries' side,
> > >
> > > Thirdly we need to interview the search engines and input methods companies to analyse the potential technical obstacles-- with the interests of a stakeholder group rather than one tld at a time,
> > >
> > > Last but not least, we should raise the awareness among the CIOs and CMOs in China through as many channels/methods as possible.
> > >
> > > The current obstacles to this China group joining UA-discuss are their working language (which is mainly Chinese) & the time difference. Nonetheless, we still want to contribute more to ICANN UA by helping with the localization of any UA-related documents and sharing first-hand feedback from Chinese 'netizens' and internet service providers.
> > >
> > > Best Regards,
> > > Jin
> > >
> > > On Tue, Nov 24, 2015 at 11:19 PM, Tan Tanaka, Dennis <dtantanaka at verisign.com> wrote:
> > > Hi Jin,
> > >
> > > Perhaps I was not clear enough in my note. I was referring to the fifth search result in the picture. Note that the URL is “xn--ebr05n.com” instead of “墨刀.com”. The point is that the a-label should not be displayed to the end user. The application should have identified the label as an IDN and, as such, transformed it to Unicode. That didn’t happen here.
> > >
> > > On your other note, if you can elaborate a plan on how to engage these companies in China, that would be extremely welcome. Could we discuss your high-level plan two weeks from today?
> > >
> > > Thanks,
> > >
> > > Dennis
> > >
> > > From: Jin Wang [mailto:jin.wang at internetregistry.info]
> > > Sent: Tuesday, November 24, 2015 1:44 AM
> > > To: Tan Tanaka, Dennis
> > > Cc: ua-discuss; ua-international at icann.org; yaojk; Brent London; Don Hollander; Mark Švančárek
> > > Subject: Re: [UA-discuss] [UA-International] IDN-as-punycode-encoded-label in Baidu search engine results
> > >
> > > Hi Gentlemen,
> > >
> > > I reckon the term Dennis used is an IDN .com, but all the Baidu results just matched the Chinese keywords (the parts that were highlighted in red); this barely has anything to do with the domain name. It is really hard to conclude that Baidu is supporting IDNs in their search engine.
> > >
> > > At present, Baidu is the leading search engine in China, but there are still a handful of companies providing similar services in China. Therefore, instead of having one or two experts talking to the companies one by one, I suggest we make a combined effort to talk to a group of companies: search engines, mailbox service providers, input methods (for Chinese pinyin), and also some of the smartphone manufacturers. As most of them are competing locally rather than globally, it might work better to start within China, especially when some of the world's biggest players are absent from China.
> > >
> > > I can voluntarily embark on finding a viable communication channel so that we can have a dialog mechanism in the future.
> > >
> > > Best Regards,
> > >
> > > Jin
> > >
> > > On Tue, Nov 24, 2015 at 1:40 PM, Jiankang Yao <yaojk at cnnic.cn> wrote:
> > >
> > > I can help to talk to Baidu and forward your message to them.
> > >
> > > Jiankang Yao
> > >
> > > From: Tan Tanaka, Dennis
> > > Date: 2015-11-24 05:45
> > > To: UA-discuss at icann.org
> > > CC: ua-international at icann.org
> > > Subject: [UA-International] IDN-as-punycode-encoded-label in Baidu search engine results
> > >
> > > Oftentimes I hear that IDNs are not indexed by certain search engines. While I know this is not true, the example below doesn’t help my case either (at least not 100%). Here is an example where the IDN I’m looking for is showing up in the first 5 search results on Baidu (see picture below). However, the string is displayed as the punycode-encoded label (i.e. xn--ebr05n.com) instead of the corresponding Chinese IDN.
> > >
> > > Google and Yandex appear to work as expected. Bing didn’t display the domain name in the results (first two pages).
> > >
> > > Is there someone interested (and with the language skills) in taking the action item to reach out to Baidu? This might be in the form of opening a bug ticket to explain the problem (IDN is displayed as punycode-encoded label. Example: xn--ebr05n.com) and what the expected result should have been (IDN displayed as Chinese domain name. Example: 墨刀.com).
> > >
> > > <image001.png>
> > >
> > > Dennis Tan
> > > Sr. Product Manager
> > > Naming Services
> > > DTanTanaka at Verisign.com
> > > m: 571-246-7303 t: 703-948-4197
> > > 12061 Bluemont Way, Reston, VA 20190
> > > VerisignInc.com
> > >
> > > --
> > > 王瑨
> > > 中国区总经理
> > > Mr. Jin Wang
> > > China General Manager
> > > 域通联达
> > >
> > > 中文新顶级域名市场领航者
> > > The Market-leading New Chinese Domains
> > >
> > > 北京 | 香港 | 赫尔辛基 | 纽约 | 奥斯汀 | 奥斯陆
> > > Beijing | Hong Kong | Helsinki | New York | Austin | Oslo
> > >
> > > 移动电话: +86 159 0110 8743
> > > 电子邮件: jin.wang at internetregistry.info
