[UA-discuss] Fw: Re: IDN Implementation Guidelines [RE: Re : And now about phishing...]
Asmus Freytag (c)
asmusf at ix.netcom.com
Sun Apr 23 06:08:50 UTC 2017
On 4/22/2017 9:24 PM, ajay at data.in wrote:
> Take a look at this paragraph. Can you read what it says? All the
> letters have been jumbled (mixed). Only the first and last letter of
> ecah word is in the right place:
> I cnduo't bvleiee taht I culod aulaclty uesdtannrd waht I was rdnaieg.
> Unisg the icndeblire pweor of the hmuan mnid, aocdcrnig to rseecrah at
> Cmabrigde Uinervtisy, it dseno't mttaer in waht oderr the lterets in a
> wrod are, the olny irpoamtnt tihng is taht the frsit and lsat ltteer
> be in the rhgit pclae. The rset can be a taotl mses and you can sitll
> raed it whoutit a pboerlm. Tihs is bucseae the huamn mnid deos not
> raed ervey ltteer by istlef, but the wrod as a wlohe. Aaznmig, huh?
> Yaeh and I awlyas tghhuot slelinpg was ipmorantt!
> Try out with friends. If they can that too.
> Some clue from above ?
The clue from the above is that most people do not read
"letter-by-letter" most of the time, but based on word-shape - and the
latter is pretty resilient to alterations in sequences.
If we had limited identifiers to dictionary words, 90% of non-homograph
spoofing would disappear, because many of the spoofs that look like
words aren't in the dictionary.
If this weren't the case (and most of the jumbles were words
themselves), you couldn't read the scrambled text above, because it
would then look like a different text.
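
To make the jumbling concrete, here is a minimal Python sketch of the
scheme the quoted paragraph describes: keep the first and last letter
and rearrange only the interior. The function names are made up for
illustration, and the sketch assumes plain words without punctuation
(it does not try to handle apostrophes the way the quoted text does).

```python
import random
from itertools import permutations


def scramble_word(word, rng):
    """Shuffle the interior letters of a word; first and last stay put."""
    if len(word) <= 3:
        return word  # no interior letters to rearrange
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def alternative_jumbles(word):
    """All distinct jumbles of a word, excluding the word itself."""
    if len(word) <= 3:
        return set()
    jumbles = {
        word[0] + "".join(p) + word[-1]
        for p in permutations(word[1:-1])
    }
    jumbles.discard(word)
    return jumbles


rng = random.Random(2017)  # fixed seed, only so that runs are repeatable
sample = "the human mind reads whole words"
print(" ".join(scramble_word(w, rng) for w in sample.split()))

print(alternative_jumbles("the"))          # set()
print(alternative_jumbles("that"))         # {'taht'}
print(len(alternative_jumbles("letter")))  # 5
```

Checking each result of alternative_jumbles() against a word list would
be one way to see how rarely a jumble is itself a dictionary word. No
word list is included here, so that step is left out.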
We didn't adopt this, so we have to look at other means to defend
The interesting thing is that the letter shapes still matter. Note that
the example doesn't simply keep first and last and then substitutes
That means that the use of diacritics, for example, remains highly
distinctive, because the marks change the "outline" of the word. A
likely exception is populations accustomed to treating diacritics as
optional.
Note also that while you can figure out the intended content of the
above text quickly (that is, you can "read" it rather than having to
decrypt it letter-by-letter), it is still immediately detectable as being
(Also, the test may be skewed towards English, because there are so many
short words in English - all the one, two and three-letter words are
retained, and the four-letter words have precisely one possible
Anything that you see in the example that you shared with us?