[UA-discuss] Fun with Unicode

Asmus Freytag asmusf at ix.netcom.com
Sat Nov 10 09:40:52 UTC 2018


On 11/10/2018 1:11 AM, Dr Ajay Data wrote:
> Is there any encoding/decoding method like punycode for these special 
> symbols, which browsers are following? What makes browsers map these 
> symbols to three different characters?

Unicode *compatibility* decomposition.

Probably the browsers are applying normalization form NF*K*C to the 
input data.

That normalization form is defined as applying compatibility 
decomposition followed by *canonical* composition. Data that has been 
through NFKC is therefore also in NFC.
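
As a rough illustration of what NFKC does, here is a minimal Python 
sketch using the standard unicodedata module. The sample characters 
(U+2121 TELEPHONE SIGN, U+FB01 LATIN SMALL LIGATURE FI, and "e" followed 
by a combining acute accent) are my own picks for illustration, not 
necessarily the symbols from the original question, and the helper name 
nfkc is hypothetical. Browsers may well do more than this one step.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", text)


# Illustrative samples only (assumed, not taken from the thread).
samples = {
    "\u2121": "TELEPHONE SIGN",
    "\ufb01": "LATIN SMALL LIGATURE FI",
    "e\u0301": "e + COMBINING ACUTE ACCENT",
}

for raw, label in samples.items():
    out = nfkc(raw)
    before = [f"U+{ord(c):04X}" for c in raw]
    after = [f"U+{ord(c):04X}" for c in out]
    print(f"{label}: {before} -> {after}")

# NFC alone leaves the compatibility character untouched; only the "K"
# forms replace it with its compatibility decomposition.
print(unicodedata.normalize("NFC", "\u2121") == "\u2121")
```

The single TELEPHONE SIGN code point becomes the three letters T, E, L, 
while the "e" plus combining accent is composed into one precomposed 
character, which is the canonical composition half of NFKC.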

Likewise, you will find that browsers accept uppercase strings for IDNs 
and case-fold them to lowercase before resolving. This allows users to 
enter IDNs in uppercase, even though IDNA 2008 itself permits only 
lowercase.
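
The mapping step described above can be sketched along the following 
lines. This is only an approximation under stated assumptions: 
to_lookup_form is a hypothetical helper, str.lower() stands in for the 
case mapping a browser really uses, and no validity checking of the 
resulting labels is attempted. It relies on Python's built-in 
"punycode" codec for the ASCII-compatible encoding.

```python
import unicodedata


def to_lookup_form(domain: str) -> str:
    """Lowercase, apply NFKC, then Punycode-encode non-ASCII labels.

    Simplified sketch: real browsers use a more detailed mapping table
    and validate each label, which this function does not do.
    """
    mapped = unicodedata.normalize("NFKC", domain.lower())
    labels = []
    for label in mapped.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # The "xn--" prefix marks an ASCII-compatible encoded label.
            encoded = label.encode("punycode").decode("ascii")
            labels.append("xn--" + encoded)
    return ".".join(labels)


print(to_lookup_form("B\u00dcCHER.EXAMPLE"))
print(to_lookup_form("EXAMPLE.COM"))
```

The uppercase input and its lowercase equivalent end up as the same 
string, so both resolve to the same name.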

A./
