Asmus, I can understand why forum operators would apply those restrictions, but as you indicate, they don’t apply to new links in other contexts.


However, coming back to linkification, the guidelines don’t address the query and fragment portions of a URL. Just as it is worth clarifying that the script-set rules apply within labels, it is worth pointing out in the guidelines that those rules do not apply to the portion after the “?”, unless perhaps the query portion is itself a URL.


Example 1: http://domain.com/?refer=http://newdomain.com


Example 2: http://domain.com/?title=script1&author=script2&quote=script1+script2+script3
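A minimal sketch of how a linkifier could keep the script-set rules away from the query, assuming Python's standard urllib.parse: only the host labels are returned for checking, and a query value is surfaced separately only when it is itself an http(s) URL, as in Example 1. The function name split_for_script_rules is hypothetical, and ignoring the fragment is an assumption.

```python
from urllib.parse import urlsplit, parse_qsl

def split_for_script_rules(url):
    """Return (host_labels, nested_urls) for a candidate link.

    Only the host labels are meant to go through the script-set rules.
    Query values are ignored unless they look like http(s) URLs themselves,
    in which case they are returned so the caller can check them separately.
    The fragment (after "#") is ignored entirely in this sketch.
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".") if parts.hostname else []
    nested = [
        value
        for _, value in parse_qsl(parts.query)
        if urlsplit(value).scheme in ("http", "https")
    ]
    return labels, nested
```

For Example 1 this yields the labels of domain.com plus the nested http://newdomain.com; for Example 2 it yields only the labels, so the script1/script2/script3 values never reach the script rules.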


The guidelines don’t address escapes (percent-encoding). Should linkification attempt to unescape escaped characters? That might make the process much more complex. However, ignoring escapes might also lead to very inconsistent results.
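One possible middle ground, sketched below under stated assumptions: percent-decode only the part the script rules apply to (the host, or an email address) and decline to linkify when the escapes do not decode as valid UTF-8. The function names are hypothetical, and leaving the query undecoded is an assumption, not something the guidelines say.

```python
from urllib.parse import urlsplit, unquote

def decode_for_checks(text):
    """Percent-decode text as UTF-8; return None if the escapes are malformed."""
    try:
        return unquote(text, errors="strict")
    except UnicodeDecodeError:
        return None  # malformed escapes: the conservative choice is not to linkify

def host_for_script_checks(url):
    """Return the decoded host only; the query string is deliberately left alone."""
    host = urlsplit(url).hostname  # lowercased, with port and userinfo removed
    return decode_for_checks(host) if host else None
```

Treating undecodable escapes as "do not linkify" keeps results consistent without guessing what a malformed sequence was meant to be.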





Thanks Mark and Asmus.


I agree about the distinction of script mixing within a label. The guide should clarify this.


I also support Asmus’s clarification regarding ASCII vs. all of Latin.


Regarding mixing digits within a label, I agree, although I would need to check whether I can easily tell which digits are widely used and which are of historical interest. I don’t believe that being a bit broad in linkification acceptance is a problem. The domain registry (and perhaps servers) should be more restrictive and not allow domains that could represent spoofing. (I know there are problems with relying on registries.) Being too restrictive in linkification could hurt users who need to enter a legitimate URL and can’t.

Digits come in sets that are specified as such in the Unicode Standard (although implicitly: the members of such sets have a property "decimal digit" and Unicode follows the convention of encoding these in complete sets from 0-9). Therefore, not linkifying something that contains a mixture of these sets can be implemented deterministically (although regex syntax leads to particularly grim expressions for specifying this constraint, it can be done).
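Because each decimal-digit set is encoded contiguously from 0 to 9, the set a digit belongs to can be identified by the code point of its zero, which avoids the grim regex altogether. A minimal sketch using Python's unicodedata module; the helper names are hypothetical.

```python
import unicodedata

def digit_set(ch):
    """Return the code point of the '0' in ch's decimal-digit set,
    or None if ch is not a decimal digit (Numeric_Type=Decimal)."""
    value = unicodedata.decimal(ch, None)
    return None if value is None else ord(ch) - value

def mixes_digit_sets(label):
    """True if the label uses digits from more than one decimal-digit set."""
    zeros = {z for ch in label if (z := digit_set(ch)) is not None}
    return len(zeros) > 1
```

For example, "1٢3" (ASCII digits around ARABIC-INDIC DIGIT TWO) is flagged, while "١٢٣" and "abc123" are not. The subtraction relies entirely on the complete-set encoding convention described above.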

Realistically, only the modern set of about 30 scripts is of practical importance, so a scheme that does not track the addition of future historic alphabets in Unicode would be adequate....

Where native digits are (largely) historic holdovers, we wouldn't need them at all, but linkification isn't a good place to filter those. 

Some reluctance to automatically convert "risky" URLs would be beneficial; it's along the same lines as not linkifying something not under the author's control: the risk of mischief is just too great.

Forum software that I have been a user of tended to implement three restrictions that are not related to new TLDs or IDN TLDs:

1) limit file names by extension (e.g. if the link entered was supposed to be for an image, do not allow it to link to something that doesn't have a common image file extension).

2) disallow any link with a "?" in it - rationale: it's not a static link and who knows what will be served later (including risky stuff)

3) require http://, etc., even in text spans that are marked as being URLs or in link attributes.
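The three restrictions are simple to express. The sketch below assumes a hypothetical forum_allows() helper and an illustrative list of image extensions; neither comes from any particular forum package.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative allow-list; a real forum would configure its own.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def forum_allows(link, expect_image=False):
    parts = urlsplit(link)
    # 3) require an explicit scheme such as http:// or https://
    if parts.scheme not in ("http", "https"):
        return False
    # 2) reject any link containing "?": it is not a static link
    if "?" in link:
        return False
    # 1) where an image is expected, require a common image file extension
    if expect_image:
        suffix = PurePosixPath(parts.path).suffix.lower()
        if suffix not in IMAGE_EXTENSIONS:
            return False
    return True
```

None of these checks look at the TLD or at whether the domain is an IDN, which matches the point that these restrictions are unrelated to new or IDN TLDs.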

In some cases these restrictions were deliberate decisions by forum operators, part of reining in certain kinds of forum spam.

I feel we need to be cognizant of the need to limit the risk profile of certain operations, in particular where the result winds up online in front of an open audience (as opposed to just sharing something in a private message).

The alternative for an operator is to simply blacklist specific TLDs and domains (and most of those will be any IDNs that are not local to the operator...). 


And it seems we need to clarify our implied intent for the guidance about the “implied intent of user’s entry”. :-) (I couldn’t resist any longer.)







Hmm, I don’t recall approving that principle (hopefully it was added while I was out on leave, and not just because I carelessly failed to notice it was being added).


I mention that because it seems the opposite of what we would recommend, i.e., we SHOULD allow use of Highly Restrictive and continue to discourage Moderately Restrictive. Do we need to revisit this? Sorry if I am just confused.
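For reference, UTS #39 defines Highly Restrictive as a string that is single-script, or covered by Latin + Han + Hiragana + Katakana, Latin + Han + Bopomofo, or Latin + Han + Hangul; Moderately Restrictive additionally allows Latin plus any one other Recommended script except Cyrillic and Greek. A minimal sketch of both checks follows, assuming a crude, hypothetical script lookup from character names; real code should use the Script_Extensions property (e.g. via ICU), and the Recommended-script condition is not checked here.

```python
import unicodedata

# Hypothetical, crude mapping from the first word of a character name to a script.
_FIRST_WORD_TO_SCRIPT = {
    "LATIN": "Latin", "CJK": "Han", "HIRAGANA": "Hiragana", "KATAKANA": "Katakana",
    "HANGUL": "Hangul", "BOPOMOFO": "Bopomofo", "CYRILLIC": "Cyrillic",
    "GREEK": "Greek", "ARABIC": "Arabic",
}

# UTS #39 Highly Restrictive: single script, or covered by one of these sets.
_HIGHLY_RESTRICTIVE_SETS = [
    {"Latin", "Han", "Hiragana", "Katakana"},
    {"Latin", "Han", "Bopomofo"},
    {"Latin", "Han", "Hangul"},
]

def scripts_in(label):
    """Scripts of the letters in label; digits, hyphens etc. are ignored."""
    result = set()
    for ch in label:
        if ch.isalpha():
            first = unicodedata.name(ch, "UNKNOWN").split()[0]
            result.add(_FIRST_WORD_TO_SCRIPT.get(first, first))
    return result

def is_highly_restrictive(label):
    scripts = scripts_in(label)
    return len(scripts) <= 1 or any(scripts <= s for s in _HIGHLY_RESTRICTIVE_SETS)

def is_moderately_restrictive(label):
    if is_highly_restrictive(label):
        return True
    scripts = scripts_in(label)
    others = scripts - {"Latin"}
    return "Latin" in scripts and len(others) == 1 and not others & {"Cyrillic", "Greek"}
```

With these definitions, a label mixing Latin with Cyrillic (the classic "pаypal" spoof) fails both levels, while Latin mixed with Arabic passes only Moderately Restrictive.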


Note that, as Asmus points out, our concern is about script mixing within a label, not the use of different scripts in different labels. Tex’s examples are all the latter, and should be linkified cleanly by UA-ready software.




Some questions:


1.	Do I understand correctly that the recommendation to not linkify highly restrictive strings means that tex@普遍接受-测试.世界 would not become a link? Or http://普遍接受-测试.世界.com?


Highly restrictive means that Latin cannot be mixed with Chinese or Japanese characters.

Some script mixing *within* a label should be restricted, as it is a security risk. Script mixing across an FQDN, or between the local part and the host, seems a rather likely scenario instead.

For certain scripts, ASCII admixture (just ASCII, not all of Latin) would be common practice in the writing system and it may be common enough/benign enough to allow it.
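The two points above, checking script mixing per label rather than per FQDN and tolerating plain ASCII (not all of Latin) alongside some scripts, can be combined in one check. This is only an illustrative sketch: script_key() approximates the script from the character name instead of using the Unicode Script property, and the set of scripts that tolerate ASCII admixture is a hypothetical choice, since the thread does not name them. It also treats Han and Kana as different keys, so a Japanese label mixing kanji and kana would be rejected; a real check would need script sets like those in UTS #39.

```python
import unicodedata

# Hypothetical: scripts for which plain ASCII letters count as normal admixture.
ASCII_ADMIXTURE_SCRIPTS = {"CJK", "HIRAGANA", "KATAKANA", "HANGUL"}

def script_key(ch):
    """Crude script key: first word of the character name ('LATIN', 'CJK', ...).
    Non-letters such as digits and '-' return None and are ignored."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def label_ok(label, allow_ascii_admixture=True):
    scripts = {script_key(ch) for ch in label} - {None}
    if len(scripts) <= 1:
        return True  # single-script label
    others = scripts - {"LATIN"}
    if (allow_ascii_admixture and "LATIN" in scripts
            and len(others) == 1 and others <= ASCII_ADMIXTURE_SCRIPTS):
        # Only ASCII letters may accompany the other script, not all of Latin.
        return all(ch.isascii() for ch in label if script_key(ch) == "LATIN")
    return False

def address_ok(address, allow_ascii_admixture=True):
    """Check each label on its own: different scripts in different labels,
    or in the local part vs. the host, are not treated as a problem."""
    local, _, host = address.rpartition("@")
    units = ([local] if local else []) + host.split(".")
    return all(label_ok(u, allow_ascii_admixture) for u in units)
```

Under this sketch tex@普遍接受-测试.世界 and 例子.com pass, "abc测试" passes only when ASCII admixture is allowed, and "café测试" or a Cyrillic-spoofed "pаypal" fail.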

However, you might also want to address European digits for those scripts where native digits exist and are widely or predominantly used, vs. scripts where the native digits are more of historic or cultural interest. (In Arabic you have both, depending on the region.)

Mixing digit sets in the same label should be a no-no and indicates that something is not well-formed.


2.	I do not understand “Linkification should be determined by the implied intent of the user's entry.” Is this intended to mean that the scheme (http, mailto, etc.) should be added to form the link? Or some other determination of intent? If the former, it should be stated more clearly.

My naive interpretation had to do with things like tables or data records where a particular field is meant to hold a URL.
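If the phrase is read as "add the scheme the entry implies", a deliberately conservative version could look like the sketch below: an explicit scheme is kept as-is, and a scheme is supplied only when the context (such as a field declared to hold a URL) signals intent. The helper name, the list of schemes, and the rule that "@" implies mailto: are all assumptions for illustration, not what UASG010 specifies.

```python
# Hypothetical: schemes accepted as an explicit statement of user intent.
KNOWN_SCHEME_PREFIXES = ("http://", "https://", "mailto:")

def infer_href(entry, field_is_url=False):
    """Return an href for the user's entry, or None to leave it as plain text."""
    entry = entry.strip()
    if entry.lower().startswith(KNOWN_SCHEME_PREFIXES):
        return entry  # the user stated the scheme explicitly
    if not field_is_url:
        return None  # no scheme and no contextual signal: do not guess
    # The field is declared to hold a link, so supply the scheme the entry implies.
    return ("mailto:" if "@" in entry else "http://") + entry
```

A less conservative reading would also autodetect bare domains in free text; the field_is_url flag is where such a policy decision would plug in.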





A quick update on Linkification


We have published an updated Quick Guide to Linkification, which builds on discussions we had after the UASG meeting in Seattle in April: https://uasg.tech/wp-content/uploads/2017/06/UASG010-Quick-Guide-to-Linkification.pdf


We are also working on an evaluation of Linkification in major Social Media Communication applications. (Here’s the link to the advertisement “Help Wanted: Linkification Evaluation”: <https://uasg.tech/wp-content/uploads/2016/11/Help-Wanted%E2%80%A6-Linkification-Evaluation-1.0.pdf>)


This evaluation is being built on a replicable testing platform so that we can readily repeat the process in the future. While it is early days, we expect to provide a preliminary report during the ICANN60 meeting. The testing is raising some additional questions about our Good Practice guide and expectations. We fully expect that once the evaluation is completed we’ll again review UASG010 based on real-world experience.






Don Hollander

Universal Acceptance Steering Group

Skype: don_hollander






