A Model for Evaluating the Information Coverage of Search Engines and Its Validation

MENG Tao; YAN Hong-fei; LI Xiao-ming

Paper | Updated: 2025-07-16

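- An Evaluation Model on Information Coverage of Search Engines
- Acta Electronica Sinica, 2003, Vol. 31, No. 8, pp. 1168-1172
- Affiliation:
  
  Department of Computer Science and Technology, Peking University, Beijing 100871
- Funding:
  
  National Key Basic Research Development Program (973 Program), No. G1999032706; Peking University 985 Project
- CLC number: TP393
- Print publication: 2003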
MENG Tao, YAN Hong-fei, LI Xiao-ming. An Evaluation Model on Information Coverage of Search Engines[J]. Acta Electronica Sinica, 2003, 31(8): 1168-1172.
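Abstract (translated from the Chinese)

The page-gathering subsystem of a search engine usually relies on the directed graph formed by the pages of the WWW, following the links between pages to gather them and thereby widen its information coverage. This paper builds a quantitative model of that coverage capability and examines, from several angles, how well a gathering system covers the information resources of the WWW. The paper first analyzes several factors that make page gathering incomplete and, after pointing out why information coverage is worth studying, proposes three important notions of information coverage; the work then concentrates on two of them, quantity coverage and quality coverage. After establishing a "sampling, weight computation, verification" coverage evaluation model, we take Peking University's "燕穹" web page information museum as the system under study, obtain its page data, and sample the Chinese Web in different ways. We then use two page-weighting algorithms, PageRank and HITS, to pick out the important pages as samples and examine the information coverage of the "燕穹" system in terms of both quantity and quality. The resulting quantity and quality coverage values are reasonable, which supports both the soundness of the coverage conclusions for the "燕穹" system and the reliability of the evaluation model.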
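
To illustrate the final verification step, coverage can be read as the share of sampled pages that the gathering system actually holds. The sketch below is a minimal Python rendering under that assumption. The function names, the `top_k` cutoff, and the toy URLs and weights are all hypothetical and are not taken from the paper, whose exact definitions are not given in this abstract.

```python
from typing import Iterable, Mapping, Set


def quantity_coverage(sample_urls: Iterable[str], crawled_urls: Set[str]) -> float:
    """Fraction of the sampled pages that also appear in the crawled collection."""
    sample = set(sample_urls)
    if not sample:
        raise ValueError("sample must not be empty")
    return len(sample & crawled_urls) / len(sample)


def quality_coverage(weights: Mapping[str, float],
                     crawled_urls: Set[str],
                     top_k: int) -> float:
    """Fraction of the top_k highest-weighted sampled pages found in the crawl.

    `weights` maps each sampled URL to an importance score, for example one
    produced by PageRank or HITS (assumed to be computed elsewhere).
    """
    # Sort by descending weight; break ties by URL so the result is deterministic.
    ranked = sorted(weights, key=lambda url: (-weights[url], url))
    important = ranked[:top_k]
    if not important:
        raise ValueError("no important pages selected")
    return sum(1 for url in important if url in crawled_urls) / len(important)


# Toy data (hypothetical): four sampled pages, three of which were crawled.
sample_weights = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
crawled = {"a", "b", "d", "x"}

print(quantity_coverage(sample_weights, crawled))    # 0.75
print(quality_coverage(sample_weights, crawled, 2))  # 1.0
```

Separating the two functions mirrors the distinction the abstract draws: quantity coverage looks at every sampled page, while quality coverage looks only at the pages a weighting algorithm marks as important.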

Abstract

Search engines usually gather web pages by following the links between them. Because the number of web pages is already massive and still growing, they can crawl and index only a portion of the whole web. This paper presents a model for evaluating the information coverage of a search engine. We analyze the main reasons why crawlers cannot cover all web information and propose three kinds of benchmarks for measuring the coverage of a search engine. The paper gives an evaluation model for two of the three benchmarks, as follows. First, sample the WWW, either by generating random IP addresses or by breadth-first search, to obtain a set of web pages used to check quantity coverage. Second, select high-quality pages as a sample of important pages using the HITS or PageRank algorithm. Finally, submit the samples to the page database of the search engine and obtain the coverage percentage. In our experiments, we obtained data from the WebInfoMall system of Peking University and computed its quantity and quality coverage percentages. Different sampling approaches and weighting algorithms gave the same results, which indicates that the model is sound and the results are accurate.
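
The second step, selecting important pages by link analysis, can be sketched with a plain power-iteration PageRank. This is a generic textbook version, not the authors' implementation. The damping factor of 0.85, the iteration count, the helper names `pagerank` and `top_pages`, and the four-page toy graph are all assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph given as adjacency lists."""
    # Collect every page that appears as a source or as a link target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Each page passes an equal share of its damped rank along every out-link.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def top_pages(rank: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-ranked pages, ties broken by name."""
    return sorted(rank, key=lambda page: (-rank[page], page))[:k]


# Toy link graph (hypothetical): page "c" receives the most in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
print(top_pages(scores, 2))
```

The pages returned by `top_pages` would then serve as the sample of important pages that is checked against the search engine's page database. HITS could be substituted for the scoring function without changing the rest of the procedure.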
