一种基于并行Bloom Filter的高速URL查找算法

doi:10.3969/j.issn.0372-2112.2015.09.023

PDF(3356 KB)

电子学报 ›› 2015, Vol. 43 ›› Issue (9) : 1833-1840. DOI: 10.3969/j.issn.0372-2112.2015.09.023

学术论文

一种基于并行Bloom Filter的高速URL查找算法

周舟¹, 付文亮², 嵩天², 刘庆云¹

作者信息 +

Fast URL Lookup Using Parallel Bloom Filters

ZHOU Zhou¹, FU Wen-liang², SONG Tian², LIU Qing-yun¹

Author information +

文章历史 +

摘要

URL查找是众多网络系统中重要的组成部分,如URL过滤系统、Web缓存等.随着互联网的迅速发展,URL查找面临的主要挑战是实现大规模URL集合下的高速查找,同时保证低存储和低功耗.本文提出了一种基于并行Bloom Filter的URL查找算法,CaBF.该算法高度并行化,提供大规模URL集合下的高速最长前缀匹配,并很好地适应集合中不同数量的URL组件.理论分析和真实网络数据集上的实验表明,该算法相比现有算法可以降低假阳性概率达一个数量级(或者在满足相同假阳性概率的前提下降低存储和硬件逻辑资源消耗).此外,该方法的体系结构很容易映射到FPGA等硬件器件上,提供每秒超过150M次的URL查找速度.

Abstract

URL lookup is fundamental to numerous networking systems,including URL filters,web caches,etc.With the explosive development of the Internet,the main challenge in implementing URL lookup operation is to achieve fast lookup speed and accommodate large URL sets while keeping memory and power consumption low.This paper presents a new URL lookup scheme based on parallel Bloom Filters.It can adapt to set cardinality and perform fast longest prefix matching(LPM)with large URL sets in a highly parallel fashion.The theoretical analysis and experiments on real-life data traces show that the proposed approach leads to reduced false positive probability for up to an order of magnitude(or reduced memory and hardware logic resources with the same false positive probability guaranteed)compared with the existing methods.Moreover,the architecture can be easily mapped to the state-of-the-art FPGAs with moderate resource consumption to provide over 150M lookups per second.

导出引用

周舟, 付文亮, 嵩天, 刘庆云. 一种基于并行Bloom Filter的高速URL查找算法[J]. 电子学报, 2015, 43(9): 1833-1840. https://doi.org/10.3969/j.issn.0372-2112.2015.09.023

ZHOU Zhou, FU Wen-liang, SONG Tian, LIU Qing-yun. Fast URL Lookup Using Parallel Bloom Filters[J]. Acta Electronica Sinica, 2015, 43(9): 1833-1840. https://doi.org/10.3969/j.issn.0372-2112.2015.09.023

中图分类号： TP393

参考文献

[1] Broder A Z,et al.Efficient URL caching for world wide web crawling[A].Proc of WWW 2003[C].Budapest,Hungary:ACM,2003.679-689.
[2] Fan L,et al.Summary cache:a scalable wide-area web cache sharing protocol[J].IEEE/ACM Trans on Networking,2000,8(3):281-293.
[3] Huang N F,et al.A fast URL lookup engine for content-aware multi-gigabit switches[A].Proc of AINA 2005[C].Taipei,Taiwan:IEEE Computer Society,2005.641-646.
[4] Zhou Z,et al.A high-performance URL lookup engine for URL filtering systems[A].Proc of ICC 2010[C].Cape Town,South Africa:IEEE Computer Society,2010.1-5.
[5] Netcraft.September 2013 Web Server Survey[EB/OL].http://news.netcraft.com,2013-09-30.
[6] Michel B S,et al.URL forwarding and compression in adaptive web caching[A].Proc of INFOCOM 2000[C].Tel-Aviv,Israel:IEEE Computer Society,2000.670-678.
[7] Prodanoff Z G,et al.Managing routing tables for URL routers in content distribution networks[J].International Journal of Network Management,2004,14(3):177-192.
[8] Dharmapurikar S,et al.Longest prefix matching using bloom filters[J].IEEE/ACM Trans on Networking,2006,14(2):397-409.
[9] Yu H,et al.A memory-efficient hashing by multi-predicate bloom filters for packet classification[A].Proc of INFOCOM 2008[C].Phoenix,USA:IEEE Computer Society,2008.1795-1803.
[10] Chang F,et al.Bigtable:A distributed storage system for structured data[J].ACM Trans on Computer Systems,2008,26(2):1-26.
[11] Cai H,et al.Applications of bloom filters in peer-to-peer systems:issues and questions[A].Proc of NAS 2008[C].Chongqing,China:IEEE Computer Society,2008.97-103.
[12] Song H,et al.IPv6 lookups using distributed and load balanced bloom filters for 100Gbps core router line cards[A].Proc of INFOCOM 2009[C].Rio de Janeiro:IEEE Computer Society,2009.2518-2526.
[13] Lu Y,et al.Robust counting via counter braids:an error-resilient network measurement architecture[A].Proc of INFOCOM[C].Rio de Janeiro:IEEE Computer Society,2009.522-530.
[14] 王洪波,等.高速网络超连接主机检测中的流抽样算法研究[J].电子学报,2008,36(4):809-818. Wang Hong-bo,et al.On flow sampling for identifying super-connection hosts in high speed networks[J].Acta Electronica Sinica,2008,36(4):809-818.(in Chinese)
[15] 谢鲲,等.布鲁姆过滤器查询算法[J].软件学报,2009,20(1):96-108. Xie K,et al.Bloom filter query algorithm[J].Journal of Software,2009,20(1):96-108.(in Chinese)
[16] Guo D,et al.Theory and network applications of dynamic bloom filters[A].Proc of INFOCOM 2006[C].Barcelona,Spain:IEEE Computer Society,2006.1-12.
[17] Xiao B,et al.Using parallel bloom filters for multiattribute representation on network services[J].IEEE Trans on Parallel and Distributed Systems,2010,21(1):20-32.
[18] URLBlacklist.com.Thelatest blacklist download[EB/OL].http://urlblacklist.com,2013-09-20.
[19] Sogou.Sogou Labs User query log[EB/OL].http://www.sogou.com/labs/dl/q.html,2013-10-14.