[UA-discuss] Fun with Unicode
Asmus Freytag
asmusf at ix.netcom.com
Sat Nov 10 09:40:52 UTC 2018
On 11/10/2018 1:11 AM, Dr Ajay Data wrote:
> Is there any encoding/decoding method like punycode for these special
> symbols, which browsers are following? What makes browsers map these
> symbols to three different characters?
Unicode *compatibility* decomposition.
Probably the browsers are applying normalization form NF*K*C to the
input data.
That normalization form is defined as applying compatibility
decomposition followed by *canonical* composition. Consequently, any
data in NFKC is also in NFC.
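
To make the effect concrete, here is a minimal sketch using Python's standard unicodedata module. The sample characters are my own choice for illustration, not necessarily the symbols from the original question, and this only shows what NFKC does, not what any particular browser actually runs.

```python
import unicodedata

def nfkc(text: str) -> str:
    """Compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", text)

# Illustrative inputs (assumptions, not taken from the thread):
ligature = "\uFB03"        # LATIN SMALL LIGATURE FFI, a single code point
decomposed_e = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Compatibility decomposition turns the one-character ligature into
# three ordinary letters.
print(len(ligature), len(nfkc(ligature)))

# Plain NFC does not touch the ligature; only the "K" forms do.
print(unicodedata.normalize("NFC", ligature) == ligature)

# The canonical composition step recombines "e" + accent into U+00E9.
print(nfkc(decomposed_e) == "\u00E9")
```

Running NFC over the output of nfkc() changes nothing, which is the sense in which NFKC data is already in NFC.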
Likewise, you will find that browsers accept uppercase strings for IDNs
and apply case folding to lowercase before resolving. This allows users
to enter IDNs in uppercase, even though IDNA 2008 itself permits only
lowercase in IDN labels.
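
A rough sketch of that kind of preprocessing, again in Python. The helper name fold_label is hypothetical, and the normalize / case-fold / normalize sequence is my simplification for illustration; real browsers follow their own mapping tables, so treat this as an approximation only.

```python
import unicodedata

def fold_label(label: str) -> str:
    """Hypothetical helper: approximate the mapping applied to user input.

    Normalizes with NFKC, applies Unicode case folding, then normalizes
    again, since folding can produce text that is no longer normalized.
    """
    normalized = unicodedata.normalize("NFKC", label)
    folded = normalized.casefold()
    return unicodedata.normalize("NFKC", folded)

# Uppercase input and lowercase input end up as the same string.
print(fold_label("B\u00DCCHER.EXAMPLE") == fold_label("b\u00FCcher.example"))

# Fullwidth capital letters are both compatibility-mapped and folded.
print(fold_label("\uFF21\uFF22\uFF23"))
```

Note that str.casefold() is full Unicode case folding, so it maps a few characters (for example U+00DF) differently from a plain lowercase conversion.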
A./