National Key Research and Development Program of China (2016YFB0801003); Dongguan Innovative Research Team Introduction Project of Guangdong Province (No. 201636000100038)
Prior work on detecting typosquatting abuse relies on computing the edit distance between domain names. Such methods do not fully exploit the context information of domain names and tend to produce many false positives for short domains. Actively crawling additional information about the given domains can improve the results, but it introduces heavy overhead. We therefore design a lightweight detection strategy based on domain names and introduce a bi-directional long short-term memory (LSTM) model to make full use of the domain context information. Furthermore, we propose a locality-sensitive hashing function for domain names to speed up typosquatting abuse detection over large-scale domain sets. Experimental results on a real data set show that the proposed method can overcome the shortcomings of edit-distance-based methods.
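
To make the short-domain weakness of edit-distance detection concrete, the following minimal sketch implements the standard Levenshtein distance and a fixed-threshold detector. It illustrates only the baseline family of methods criticized above, not the proposed LSTM or hashing method. The function names, the threshold of 1, and the example labels are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def flag_by_edit_distance(candidate: str, targets: list[str],
                          threshold: int = 1) -> list[str]:
    """Return the targets within `threshold` edits of `candidate`.

    Hypothetical baseline detector: any non-identical label that is
    close enough to a protected target is reported as a suspect.
    """
    return [t for t in targets
            if 0 < edit_distance(candidate, t) <= threshold]


# Illustrative second-level labels (assumed, not taken from the paper's data).
targets = ["google", "bbc"]

# A plausible typo of a long label is one edit away and is flagged.
print(flag_by_edit_distance("gooogle", targets))

# An unrelated short label is also one edit away, so it is flagged too.
print(flag_by_edit_distance("nbc", targets))
```

With a threshold of 1, a single edit is one sixth of "google" but one third of "bbc", so a fixed threshold admits far more unrelated neighbors for short labels. This is the false-positive behavior that a context-aware model is meant to reduce.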