ZHOU Zhou1, FU Wen-liang2, SONG Tian2, LIU Qing-yun1
1. National Engineering Laboratory for Information Security Technologies Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;
2. School of Computer Science, Beijing Institute Engineering, Chinese Acadenmy of Sciences, Beijing 100081, China
URL lookup is fundamental to numerous networking systems,including URL filters,web caches,etc.With the explosive development of the Internet,the main challenge in implementing URL lookup operation is to achieve fast lookup speed and accommodate large URL sets while keeping memory and power consumption low.This paper presents a new URL lookup scheme based on parallel Bloom Filters.It can adapt to set cardinality and perform fast longest prefix matching(LPM)with large URL sets in a highly parallel fashion.The theoretical analysis and experiments on real-life data traces show that the proposed approach leads to reduced false positive probability for up to an order of magnitude(or reduced memory and hardware logic resources with the same false positive probability guaranteed)compared with the existing methods.Moreover,the architecture can be easily mapped to the state-of-the-art FPGAs with moderate resource consumption to provide over 150M lookups per second.
周舟, 付文亮, 嵩天, 刘庆云. 一种基于并行Bloom Filter的高速URL查找算法[J]. 电子学报, 2015, 43(9): 1833-1840.
ZHOU Zhou, FU Wen-liang, SONG Tian, LIU Qing-yun. Fast URL Lookup Using Parallel Bloom Filters. Chinese Journal of Electronics, 2015, 43(9): 1833-1840.
[1] Broder A Z,et al.Efficient URL caching for world wide web crawling[A].Proc of WWW 2003[C].Budapest,Hungary:ACM,2003.679-689.
[2] Fan L,et al.Summary cache:a scalable wide-area web cache sharing protocol[J].IEEE/ACM Trans on Networking,2000,8(3):281-293.
[3] Huang N F,et al.A fast URL lookup engine for content-aware multi-gigabit switches[A].Proc of AINA 2005[C].Taipei,Taiwan:IEEE Computer Society,2005.641-646.
[4] Zhou Z,et al.A high-performance URL lookup engine for URL filtering systems[A].Proc of ICC 2010[C].Cape Town,South Africa:IEEE Computer Society,2010.1-5.
[5] Netcraft.September 2013 Web Server Survey[EB/OL].http://news.netcraft.com,2013-09-30.
[6] Michel B S,et al.URL forwarding and compression in adaptive web caching[A].Proc of INFOCOM 2000[C].Tel-Aviv,Israel:IEEE Computer Society,2000.670-678.
[7] Prodanoff Z G,et al.Managing routing tables for URL routers in content distribution networks[J].International Journal of Network Management,2004,14(3):177-192.
[8] Dharmapurikar S,et al.Longest prefix matching using bloom filters[J].IEEE/ACM Trans on Networking,2006,14(2):397-409.
[9] Yu H,et al.A memory-efficient hashing by multi-predicate bloom filters for packet classification[A].Proc of INFOCOM 2008[C].Phoenix,USA:IEEE Computer Society,2008.1795-1803.
[10] Chang F,et al.Bigtable:A distributed storage system for structured data[J].ACM Trans on Computer Systems,2008,26(2):1-26.
[11] Cai H,et al.Applications of bloom filters in peer-to-peer systems:issues and questions[A].Proc of NAS 2008[C].Chongqing,China:IEEE Computer Society,2008.97-103.
[12] Song H,et al.IPv6 lookups using distributed and load balanced bloom filters for 100Gbps core router line cards[A].Proc of INFOCOM 2009[C].Rio de Janeiro:IEEE Computer Society,2009.2518-2526.
[13] Lu Y,et al.Robust counting via counter braids:an error-resilient network measurement architecture[A].Proc of INFOCOM[C].Rio de Janeiro:IEEE Computer Society,2009.522-530.
[14] 王洪波,等.高速网络超连接主机检测中的流抽样算法研究[J].电子学报,2008,36(4):809-818. Wang Hong-bo,et al.On flow sampling for identifying super-connection hosts in high speed networks[J].Acta Electronica Sinica,2008,36(4):809-818.(in Chinese)
[15] 谢鲲,等.布鲁姆过滤器查询算法[J].软件学报,2009,20(1):96-108. Xie K,et al.Bloom filter query algorithm[J].Journal of Software,2009,20(1):96-108.(in Chinese)
[16] Guo D,et al.Theory and network applications of dynamic bloom filters[A].Proc of INFOCOM 2006[C].Barcelona,Spain:IEEE Computer Society,2006.1-12.
[17] Xiao B,et al.Using parallel bloom filters for multiattribute representation on network services[J].IEEE Trans on Parallel and Distributed Systems,2010,21(1):20-32.
[18] URLBlacklist.com.Thelatest blacklist download[EB/OL].http://urlblacklist.com,2013-09-20.
[19] Sogou.Sogou Labs User query log[EB/OL].http://www.sogou.com/labs/dl/q.html,2013-10-14.