YE Yun-ming, YU Shui, MA Fan-yuan, et al. On Distributed Web Crawler: Architecture, Algorithms and Strategy[J]. Acta Electronica Sinica, 2002, 30(S1): 2008-2011.
We describe a large-scale distributed Web crawler system, Igloo V1.2. Igloo's distributed architecture is based on our two-tiered hash mapping algorithm, so that it can partition crawling tasks efficiently while also providing dynamic scalability. Because the quality of crawled Web pages is an important factor in evaluating crawlers, Igloo uses the PageRank value as the evaluation metric of pages to improve its crawling efficiency. This paper also provides a detailed discussion of the performance bottlenecks in crawler systems and proposes a new URL repository access method, based on a delayed merging strategy, to enable high-speed crawling. The experiments show that Igloo can quickly crawl high-quality Web pages while delivering high performance.
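The abstract does not spell out the two-tiered hash mapping, so the following is only an illustrative sketch of one way a two-level mapping can combine task partitioning with dynamic scalability; it is not Igloo's actual algorithm. It assumes that tier 1 hashes a URL's host to a fixed set of logical buckets and that tier 2 is a small table assigning buckets to crawler nodes. The names `NUM_BUCKETS`, `bucket_of`, and `BucketTable`, the bucket count, and the rebalancing rule are all hypothetical.

```python
import zlib
from urllib.parse import urlsplit

NUM_BUCKETS = 1024  # assumed number of logical buckets; not from the paper


def bucket_of(url: str) -> int:
    """Tier 1 (assumed): hash the URL's host to a fixed logical bucket."""
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % NUM_BUCKETS


class BucketTable:
    """Tier 2 (assumed): a mutable table mapping buckets to crawler nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Initial assignment: spread buckets round-robin over the nodes.
        self.owner = [self.nodes[b % len(self.nodes)] for b in range(NUM_BUCKETS)]

    def node_for(self, url: str) -> str:
        """Return the node responsible for crawling this URL."""
        return self.owner[bucket_of(url)]

    def add_node(self, node: str) -> int:
        """Give a new node its fair share of buckets; return how many moved."""
        self.nodes.append(node)
        share = NUM_BUCKETS // len(self.nodes)
        counts = {n: 0 for n in self.nodes}
        for n in self.owner:
            counts[n] += 1
        moved = 0
        for b in range(NUM_BUCKETS):
            if moved == share:
                break
            current = self.owner[b]
            # Take buckets only from nodes that hold more than the fair share.
            if counts[current] > share:
                counts[current] -= 1
                self.owner[b] = node
                moved += 1
        return moved


if __name__ == "__main__":
    table = BucketTable(["node0", "node1", "node2"])
    print(table.node_for("http://example.com/index.html"))
    print(table.add_node("node3"), "buckets reassigned")
```

Under these assumptions, all URLs of one host go to the same node, and adding a node changes only the tier-2 table entries for the reassigned buckets, so URLs in untouched buckets keep their owner.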