[UA-discuss] Progress on HTML and email...

Mark Svancarek marksv at microsoft.com
Tue Nov 14 22:57:23 UTC 2017


Maybe this is semantics, but I don't think of it as "the browser" doing the conversions.  It would need to be some JS coded into the page.  The browser is just the sandbox the page's script would be running inside.  

-----Original Message-----
From: UA-discuss [mailto:ua-discuss-bounces at icann.org] On Behalf Of Andrew Sullivan
Sent: Tuesday, November 14, 2017 2:19 PM
To: ua-discuss at icann.org
Subject: Re: [UA-discuss] Progress on HTML and email...

On Tue, Nov 14, 2017 at 06:28:28PM +0000, Shawn Steele wrote:
> Um. Something's confused about that statement.  "After all, Windows doesn't even use UTF-8 input natively, so it would literally be impossible for a Windows user to input correct UTF-8 at all."
>

Not confused, just probably a little too glib.

> That's not how input works on any browser.  People type things on their keyboard (or soft keyboard or whatever) and those get translated into whatever characters the browser's using for their input boxes (hopefully Unicode).  On Windows, basically, the "input" from the user to the browser is UTF-16.  All of that's irrelevant as far as the HTML spec is concerned.
> 

This was my point.

> When the user submits the form, then it's up to the browser to send it to the server in the correctly negotiated encoding - hopefully that's UTF-8 for most sites and all browsers, though some negotiations could've stuck it in some really stupid limited codepage.
> 

Yes, but again my point was this: if the HTML spec now says that the server-part of an email address in input has to be LDH (which is what the draft spec says), and the user has put some Unicode characters in the server-part of an email address, then a browser running on Windows has to do the "something to UTF-8" transformation before it can even get around to doing the U-label to A-label transformation.  The user-supplied input isn't in UTF-8 normalized with NFC _anyway_, and that's the starting line for being a candidate U-label.  The same, actually, would be true of the local-part of the email address, since EAI addresses are required to be in UTF-8 but the user-supplied input won't necessarily be that (it won't be on macOS, either, since the native form on macOS is NFD, not NFC).  So there's nothing strange about a browser having to do transformations of user-supplied information here, I think.
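
As a rough illustration of the chain of transformations described above (normalize to NFC, turn the server-part from a U-label into an A-label, then serialize as UTF-8), here is a minimal Python sketch.  It is not taken from the HTML draft or from any browser.  The function name prepare_email_address is hypothetical, and Python's built-in "idna" codec implements the older IDNA 2003 rules rather than IDNA 2008, so treat the conversion step as a stand-in only.

```python
import unicodedata
from typing import Tuple


def prepare_email_address(raw: str) -> Tuple[str, bytes]:
    """Hypothetical helper: normalize an address and convert its domain.

    Returns the address as text plus its UTF-8 byte serialization.
    """
    # Split on the last "@" so the server-part is everything after it.
    local, sep, domain = raw.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@server-part")

    # Step 1: user input may arrive decomposed (NFD); compose it to NFC.
    local_nfc = unicodedata.normalize("NFC", local)
    domain_nfc = unicodedata.normalize("NFC", domain)

    # Step 2: U-label -> A-label for each label of the server-part.
    # Assumption: the stdlib "idna" codec (IDNA 2003) is good enough
    # for illustration; a real implementation would follow IDNA 2008.
    domain_ascii = domain_nfc.encode("idna").decode("ascii")

    # Step 3: the local-part stays Unicode, so the whole address is
    # serialized as UTF-8 for transmission.
    address = local_nfc + "@" + domain_ascii
    return address, address.encode("utf-8")


if __name__ == "__main__":
    # Decomposed input: "e" + combining acute, "u" + combining diaeresis.
    text, wire = prepare_email_address("jose\u0301@bu\u0308cher.example")
    print(text)
    print(wire)
```

With the decomposed input shown, the server-part comes out as xn--bcher-kva.example, while the local-part stays non-ASCII and is only composed to NFC.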

Best regards,

A

--
Andrew Sullivan
ajs at anvilwalrusden.com