[UA-discuss] SAC095 - SSAC Advisory on the Use of Emoji in Domain

Mon May 29 15:55:45 UTC 2017

Hi,

On Mon, May 29, 2017 at 03:39:36PM +0000, Stuart Stuple via UA-discuss wrote:
> 
> Given the conversations we’ve had here around phishing, would we say any aspect of Unicode meets the requirements of Finding 2:
> …are not required by design, standard, or convention to be visually uniform (one code point displayed the same way in all circumstances) or visually distinguishable (different code points displayed in ways that permit them to be disambiguated regardless of context).
> 

I think you're missing the point of this remark in the document.  The
non-specification of visual forms in Unicode for letters is not the
same as it is for emoji.  Unicode does not specify fonts in either
case, but the implementation of emojis is quite a bit less
constrained: it's actually expected to diverge not just by font, but
by OS and so on.  (The basic problem here is that, whereas "font" is
well-defined, emoji presentation is not yet.  This is why even things
like smiley face have quite large variations even in "the same" font.)

> Beyond even that fundamental consideration, as pointed out on another branch, the use of the ZWJ seems completely independently of any discussion of Emoji. Yet it’s necessary for some writing systems. Two other related areas would be worth evaluating for similar risk would be precomposed versus combining diacritics for symbols such as é and IVSes.
> 

Not exactly: ZWJ and ZWNJ are not subject to any predictable rules with
emojis because emojis are not letters.  We can therefore write
CONTEXTJ rules for ZW[N]J on letters in a way we can't on emoji.
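
To illustrate what a contextual rule on letters looks like, here is a
rough sketch in Python.  It assumes the ZWJ rule as I read it in
RFC 5892, Appendix A.2 (ZWJ is permitted only directly after a
character whose canonical combining class is Virama).  The function
name and the sample strings are made up for illustration, and this is
not a complete IDNA validator.

```python
import unicodedata

ZWJ = "\u200d"
VIRAMA_CCC = 9  # canonical combining class shared by virama characters


def zwj_allowed(label: str) -> bool:
    """Hypothetical helper: True if every ZWJ directly follows a virama."""
    for i, ch in enumerate(label):
        if ch != ZWJ:
            continue
        # A ZWJ at the start, or after a non-virama, fails the rule.
        if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
            return False
    return True


devanagari = "\u0915\u094d\u200d\u0937"  # KA, VIRAMA, ZWJ, SSA
family = "\U0001F468\u200d\U0001F469"    # MAN, ZWJ, WOMAN

print(zwj_allowed(devanagari))  # True
print(zwj_allowed(family))      # False
```

The rule works for the Devanagari case because the script gives ZWJ a
well-defined place to appear.  The emoji sequence has no comparable
property to key the rule on.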

The combining diacritics remark above is similarly wide of the mark,
because the problem with combining diacritics vs precomposed forms is
normally sorted out by normalization (and remember, all U-labels are
required to be in NFC).  But there is no normalization for emojis,
which is an important part of the reason that SSAC is pointing out
they are poorly suited for identifiers.  Again, "smiley face" is
instructive.  The differences in presentation among fixed-width
vs. variable-width font e-with-acute are solved by the code point:
it's the same one all the time.  The differences in
e-plus-combining-acute vs e-with-acute are solved by NFC.  But the
differences among all the various smiley faces actually come from
different code points.  Which one you get depends on your rendering
engine and so on, and there's no way to normalize them all to the
same "smiley face" thing.  That's not a problem for humans when
communicating casually.  It's a big deal when the same code points are
used as network identifiers.
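
A small sketch of the contrast, using Python's standard unicodedata
module.  The three smiley code points are just ones I picked as
examples, not a list taken from the SSAC document.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize a string to NFC, the form required for U-labels."""
    return unicodedata.normalize("NFC", s)


precomposed = "\u00e9"  # e-with-acute as a single code point
decomposed = "e\u0301"  # e followed by combining acute accent

print(precomposed == decomposed)            # False
print(nfc(precomposed) == nfc(decomposed))  # True

# Three different "smiley face" code points (illustrative choice):
# WHITE SMILING FACE, SLIGHTLY SMILING FACE,
# SMILING FACE WITH SMILING EYES
smileys = ["\u263a", "\U0001F642", "\U0001F60A"]
distinct_after_nfc = {nfc(s) for s in smileys}

print(len(distinct_after_nfc))  # 3
```

NFC collapses the two spellings of e-with-acute into one string, so
an exact-match comparison works afterwards.  The smileys come out of
normalization exactly as distinct as they went in.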

>  The point raised about the skin tone implementation and color-blind individuals is (pardon the pun) a red-herring. The emojis are designed to be distinguishable based on modern accessibility standards.
>

But we're not only talking to humans; we're talking to computers, and
they need exact match.

And none of what you say addresses the basic problem that emojis were
excluded from IDNA because of their Unicode properties: this is
_Unicode's_ advice we're following.  

> FWIW one opinion is worth, I disagree with the assertion that adding emoji will slow the move towards universal acceptance. Certainly within software products, we’re seeing emoji as one of the forces driving a more robust support of the full Unicode standard and rendering in ways that make emoji useful in content.
> 

But in the above, you are failing to distinguish between "content" and
"identifiers".  Domain names are the latter, and if you want to argue
that they aren't anymore, then you have a bigger problem than
universal acceptance.  You have a mismatch with the definition of the
thing you want to be universally accepted.

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com