Acta Electronica Sinica, 2014, Vol. 42, Issue (11): 2174-2183. DOI: 10.3969/j.issn.0372-2112.2014.11.008

• Research Paper •

Identifying the Key Packages Using Weighted PageRank Algorithm

PAN Wei-feng1,2, LI Bing2,3, MA Yu-tao2,3, JIANG Bo1

  1. School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China;
  2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, Hubei 430072, China;
  3. School of Computer, Wuhan University, Wuhan, Hubei 430072, China
  • Received: 2013-08-09 Revised: 2014-02-08 Online: 2014-11-25 Published: 2014-11-25
  • Corresponding author: PAN Wei-feng
  • About the author: LI Bing, male, born in March 1969 in Wuhan, Hubei Province, is a professor and doctoral supervisor at Wuhan University. His main research interests are software engineering, cloud computing, artificial intelligence, and complex networks. E-mail: bingli@whu.edu.cn
  • Supported by: National Program on Key Basic Research Project of China (973 Program) (No.2014CB340401); National Natural Science Foundation of China (No.61202048, No.61273216, No.61272111); Natural Science Foundation of Zhejiang Province, China (No.LQ12F02011, No.LY13F020010); Open Fund of State Key Laboratory of Software Engineering (No.SKLSE-2012-09-21)

Abstract:

Identifying the key entities in a software system helps developers understand the software and control and reduce maintenance costs. However, existing work focuses almost exclusively on identifying key classes; little has been done at other levels of granularity, such as packages and methods/attributes. Existing work also fails to reveal the relationship between key classes and external quality attributes. In this paper, we introduce IDEEP (IDEntifying kEy Packages using weighted PageRank algorithm), a novel method for identifying key packages. IDEEP models a software system at the package level as a weighted, directed software network of packages and their dependencies, proposes a new metric, PR (PackageRank), to quantify the importance of a package from a structural perspective, and uses a weighted PageRank algorithm to compute PR values. Our experiments are carried out on six open-source Java software systems. First, we analyze the correlation between PR values and commonly used complex-network centrality metrics such as betweenness, closeness, and degree centrality. Second, we use a weighted version of the SIR (Susceptible-Infectious-Recovered) model to examine the spreading influence of the key packages identified by PR and compare the results with those of six other methods; the results show that our method performs better. Finally, taking two of the systems as examples, we analyze the relationship between the PR values of packages and their understandability, which further validates the method and shows that the key packages it identifies are more meaningful from a software engineering perspective.
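To make the idea of a weighted PageRank over a package dependency network concrete, here is a minimal sketch. It is not the paper's IDEEP implementation: the abstract does not give the exact weighting scheme, so the sketch assumes the common variant in which a node passes its score to its successors in proportion to edge weight. The function name `weighted_pagerank`, the damping factor 0.85, the uniform handling of nodes without outgoing edges, and the four package names are all illustrative assumptions.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for a weighted, directed graph.

    edges: iterable of (src, dst, weight) with weight > 0, read as
    "package src depends on package dst with strength weight".
    Returns a dict mapping each node to a score; scores sum to 1.
    """
    edges = list(edges)
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    out_weight = defaultdict(float)   # total outgoing weight per node
    incoming = defaultdict(list)      # dst -> [(src, weight), ...]
    for s, d, w in edges:
        out_weight[s] += w
        incoming[d].append((s, w))

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: score held by nodes with no outgoing edges is
        # redistributed uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {}
        for v in nodes:
            flow = sum(rank[s] * w / out_weight[s] for s, w in incoming[v])
            new_rank[v] = (1.0 - damping) / n + damping * (flow + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical package-level dependency network (names are made up).
example_edges = [
    ("app", "core", 3.0),
    ("app", "util", 1.0),
    ("core", "util", 2.0),
    ("ui", "core", 1.0),
]
example_ranks = weighted_pagerank(example_edges)
for package, score in sorted(example_ranks.items(), key=lambda kv: -kv[1]):
    print(f"{package}: {score:.4f}")
```

In this toy network, score flows from dependent packages to the packages they rely on, so a package that many others depend on heavily ends up near the top of the ranking.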

Key words: key package, PageRank algorithm, software network, program comprehension
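The spreading-influence evaluation mentioned in the abstract can be illustrated with a small simulation of a weighted SIR process. This is only a sketch under stated assumptions, since the abstract does not say how edge weights enter the model: here an infected node infects a susceptible neighbor over an edge of weight w with probability 1 - (1 - beta)^w, and then recovers with probability gamma. The names `sir_spread` and `spreading_influence`, the parameter defaults, and the example network are hypothetical.

```python
import random


def sir_spread(adj, seed_node, beta=0.2, gamma=1.0, rng=None):
    """Run one weighted SIR outbreak and return the final number of
    recovered nodes (the seed included).

    adj: dict mapping node -> {neighbor: weight}; infection travels
    from a node to the neighbors listed under it.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1] so the outbreak ends")
    rng = rng or random.Random()
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v, w in adj.get(u, {}).items():
                if v in infected or v in recovered or v in newly_infected:
                    continue
                # Assumption: heavier edges transmit more easily.
                if rng.random() < 1.0 - (1.0 - beta) ** w:
                    newly_infected.add(v)
        healed = {u for u in infected if rng.random() < gamma}
        recovered |= healed
        infected = (infected - healed) | newly_infected
    return len(recovered)


def spreading_influence(adj, seed_node, runs=200, beta=0.2, gamma=1.0, seed=0):
    """Average outbreak size over several runs started at seed_node."""
    rng = random.Random(seed)
    total = sum(sir_spread(adj, seed_node, beta, gamma, rng) for _ in range(runs))
    return total / runs


# Hypothetical network: an entry "util": {"core": 2} means a change in
# util may propagate to core, which depends on it with weight 2.
example_adj = {
    "util": {"core": 2.0, "app": 1.0},
    "core": {"app": 3.0, "ui": 1.0},
}
for package in ("util", "core", "app", "ui"):
    print(package, spreading_influence(example_adj, package))
```

A node whose average outbreak size is large has a large spreading influence, so comparing these averages for the top-ranked nodes of different ranking methods gives one way to compare the methods.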

CLC number: