Unicode Homoglyphs

August 14, 2019 Olivier van der Toorn & Ramin Yazdani

For the last couple of months, Ramin Yazdani has been looking into phishing
domains that use Unicode characters to resemble their target domain. In the
process he developed a new ‘confusables’ table of Unicode characters that can
easily be mistaken for their ASCII counterparts. The table is based on the
‘Unicode Confusables list’ and the ‘Unicode Similarity List’.

The proposed Unicode Confusables table can be found here.
The dataset is supplied as a CSV file. The first column holds the decimal code
point of a Unicode character; the remaining columns together form the homoglyph
for that character. If a character maps to a string rather than to a single
character, the homoglyph spans multiple columns; otherwise it occupies one.
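
To make the file layout concrete, here is a minimal Python sketch of how such a
table could be loaded. It is not Ramin's code. It assumes that the columns
after the first contain the literal characters of the homoglyph and that any
row whose first column is not a number (for example a header) can be skipped.
The function name `load_confusables` and the two sample rows are invented for
illustration and are not taken from the dataset.

```python
import csv
import io


def load_confusables(lines):
    """Build {unicode_char: ascii_homoglyph} from rows of a confusables CSV.

    Assumed layout: column 1 is a decimal code point, the remaining
    columns are the parts of the homoglyph (one part, or several when
    the character maps to a string).
    """
    table = {}
    for row in csv.reader(lines):
        if not row or not row[0].strip().isdigit():
            continue  # skip blank lines and non-numeric rows such as a header
        char = chr(int(row[0]))  # decimal code point -> character
        table[char] = "".join(part for part in row[1:] if part)
    return table


# Hypothetical sample rows, invented for illustration only.
sample = io.StringIO("1072,a\n230,a,e\n")
table = load_confusables(sample)
print(table)
```

For a file on disk, the same function could be fed an open file object, for
example `open(path, newline="", encoding="utf-8")`, where `path` points to the
downloaded CSV.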

Additionally, Ramin used the confusables table to find domains that have an
ASCII counterpart. The research is aimed at finding malicious Unicode homoglyph
domains. To this end, Ramin compared his findings with entries from the
following blacklists:
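
The matching step described above, finding the ASCII counterpart of a Unicode
domain and then checking the domain against a blacklist, could look roughly
like the Python sketch below. This is an illustration under assumptions, not
the method used in the research: the domains are assumed to be available in
Unicode form rather than Punycode, and the table contents, the domain names,
and the blacklist are all hypothetical.

```python
def ascii_counterpart(domain, table):
    """Return the ASCII look-alike of `domain`, or None if it has none.

    A domain has a counterpart only if it contains at least one non-ASCII
    character and every non-ASCII character appears in the table.
    """
    out = []
    replaced = False
    for ch in domain.lower():
        if ch.isascii():
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
            replaced = True
        else:
            return None  # character without a known ASCII homoglyph
    return "".join(out) if replaced else None


# Hypothetical table entries: Cyrillic 'a' and 'e' mapped to ASCII.
TABLE = {"\u0430": "a", "\u0435": "e"}

# Hypothetical observed domains and blacklist entries.
observed = ["ex\u0430mple.com", "example.com", "\u4e2d.com"]
blacklist = {"ex\u0430mple.com"}

# Keep only domains that imitate an ASCII domain.
candidates = {d: ascii_counterpart(d, TABLE) for d in observed}
homoglyph_domains = {d: a for d, a in candidates.items() if a is not None}

# Homoglyph domains that also appear on the (hypothetical) blacklist.
flagged = [d for d in homoglyph_domains if d in blacklist]
print(homoglyph_domains, flagged)
```

Returning `None` for plain ASCII domains keeps the result limited to domains
that actually rely on a look-alike character.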
