

浏览全部资源
扫码关注微信
绍兴文理学院计算机科学与工程系,浙江绍兴 312000
Received:12 August 2019,
Revised:2021-07-26,
Published:25 October 2021
移动端阅览
柳菁,李琪.DisHAP:基于层次亲和聚类的分布式大图划分算法[J].电子学报,2021,49(10):2002-2011.
LIU Jing,LI Qi.DisHAP: A Distributed Partition Algorithm for Large Scale Graphs Based on Hierarchical Affinity Clustering[J].ACTA ELECTRONICA SINICA,2021,49(10):2002-2011.
柳菁,李琪.DisHAP:基于层次亲和聚类的分布式大图划分算法[J].电子学报,2021,49(10):2002-2011. DOI: 10.12263/DZXB.20190919.
LIU Jing,LI Qi.DisHAP: A Distributed Partition Algorithm for Large Scale Graphs Based on Hierarchical Affinity Clustering[J].ACTA ELECTRONICA SINICA,2021,49(10):2002-2011. DOI: 10.12263/DZXB.20190919.
平衡图划分是改善并行图计算性能的关键.一个良好的划分算法应保证划分后的子图在负载均衡的前提下,减少子图之间的交互边(切割边)规模,从而减少网络通信.对此,本文设计一种基于层次亲和聚类的分布式大图划分算法(DisHAP).该算法采用亲和聚类的思想,将图初始划分为规模相等的
k
个子图;再将结果映射成顶点序列,以线性嵌入顺序处理节点,通过局部交换策略优化割边率;最后将DisHAP应用在MapReduce框架中,使用多种真实及理论图数据,与现有的大图划分算法做比较分析.以Twitter图为例,划分2
4
8
16
32个子区,相较于现有的大图划分算法(LDG
BLP
Spinner
Fennel
ParMetis及PSA-MIR算法),割边率减少1.7%~30.2%,说明了该算法的优越性.同时该算法具有良好的可扩展性,划分的子区数量及图的规模对划分时间具有较低的影响.
It is the key to improve the performance of graph calculation by improving the efficiency of the graph partitioning algorithm and reducing the communication edge scale between the subgraphs. Due to the limited memory capacity of single computing node
it is difficult to meet the partitioning requirements of large-scale graphs in terms of partitioning efficiency and partitioning quality. In this paper
a distributed graph partitioning algorithm based on hierarchical affinity clustering for massive scale graphs is designed
called DisHAP. It uses the Boruvka minimum spanning tree algorithm to balance cluster the graph under the constraint condition of the input graph according to the vertex similarity
and partitions the graph into
k
subgraph (partitions) with equal size. In order to optimize the fractiom of edges cut between these large subgraphs
we map the initial partition results to the vertex sequence and cut into a large number of subsections
randomly select the sub slice pairs in the neighbor subgraph sequence
and migrate the vertices according to the mutual exchange and the single vertex positive profit. Thus
the optimization problem of large data volume is converted to a large number of small problems to solve. The algorithm is applied to the MapReduce framework to effectively improve the efficiency of the algorithm. Finally
we use various actual and theoretical graph data to compare with existing graph partitioning algorithms to verify the effectiveness of the DisHAP algorithm. Taking the graph Twitter as an example
it is partitioned into 2
4
8
16
32 partitions. Compared to LDG
BLP
Spinner
Fennel
ParMetis and PSA-MIR algorithms
the fraction of edges cut is reduced by 30.2%
29.4%
10.2%
7.8%
1.7%
and 3.3% respectively.
Basak A , Li S C , Hu X , et al . Analysis and optimization of the memory hierarchy for graph processing workloads [A]. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) [C]. Washington, DC, USA : IEEE , 2019 . 373 - 386 .
Miller G L , Teng S H , Vavasis S A . A unified geometric approach to graph separators [A]. 1991 Proceedings 32nd Annual Symposium of Foundations of Computer Science [C]. San Juan, PR, USA : IEEE , 1991 . 538 - 547 .
Stanton I , Kliot G . Streaming graph partitioning for large distributed graphs [A]. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 12 [C]. New York, USA : ACM Press , 2012 . 1222 - 1230 .
Stanton I . Streaming balanced graph partitioning algorithms for random graphs [A]. Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms [C]. Philadelphia, PA, USA : Society for Industrial and Applied Mathematics , 2014 . 1287 - 1301 .
Andreev K , Räcke H . Balanced graph partitioning [A]. Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures - SPAA04 [C]. New York, USA : ACM Press , 2004 : 120 - 124 .
Liu N , Li D S , Zhang Y M , et al . Large-scale graph processing systems: A survey [J]. Frontiers of Information Technology & Electronic Engineering , 2020 , 21 ( 3 ): 384 - 404 .
Benlic U , Hao J K . An effective multilevel tabu search approach for balanced graph partitioning [J]. Computers & Operations Research , 2011 , 38 ( 7 ): 1066 - 1075 .
Kokosiński Z , Bała M . Solving graph partitioning problems with parallel metaheuristics [A]. Recent Advances in Computational Optimization [C]. Cham, Germany : Springer , 2018 . 89 - 105 .
Awadelkarim A , Ugander J . Prioritized restreaming algorithms for balanced graph partitioning [A]. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining [C]. New York, NY, USA : ACM , 2020 . 1877 - 1887 .
Nishimura J , Ugander J . Restreaming graph partitioning: Simple versatile algorithms for advanced balancing [A]. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [C]. New York, NY, USA : ACM , 2013 . 1106 - 1114 .
Srinuvasu M A . Analysis of large graph partitioning and frequent subgraph mining on graph data [J]. International Journal of Advanced Computer Research , 2015 , 6 ( 7 ): 12 .
Moon B , Jagadish H V , Faloutsos C , et al . Analysis of the clustering properties of the Hilbert space-filling curve [J]. IEEE Transactions on Knowledge and Data Engineering , 2001 , 13 ( 1 ): 124 - 141 .
Tsourakakis C , Gkantsidis C , Radunovic B , et al . FENNEL: streaming graph partitioning for massive scale graphs [A]. Proceedings of the 7th ACM International Conference on Web Search and Data Mining [C]. New York, NY, USA : ACM , 2014 . 333 - 342 .
Battaglino C , Pienta P , Vuduc R . GraSP: distributed streaming graph partitioning [A]. The 1st High Performance Graph Mining Workshop [C]. Arcelona, SPAIN : Barcelona Supercomputing Center , 2015 . DOI: 10.5821/hpgm15.3 http://dx.doi.org/10.5821/hpgm15.3 .
Martella C , Logothetis D , Loukas A , et al . Spinner: scalable graph partitioning in the cloud [EB/OL]. https://arxiv.org/abs/1404.3861 https://arxiv.org/abs/1404.3861 , 2019 .
Pope A S , Tauritz D R , Kent A D . Evolving multi-level graph partitioning algorithms [A]. IEEE Symposium Series on Computational Intelligence (SSCI) [C]. Athens, Greece : IEEE , 2016 . 1 - 8 .
Tsourakakis C E , Kolountzakis M N , Miller G L . Approximate triangle counting [EB/OL]. https://www.researchgate.net/publication/24356900_Approximate_Triangle_Counting https://www.researchgate.net/publication/24356900_Approximate_Triangle_Counting , 2019 .
Qian X H . Graph processing and machine learning architectures with emerging memory technologies: A survey [J]. Science China Information Sciences , 2021 , 64 ( 6 ): 1 - 25 .
Borůvka , Otakar . O jistém problému minimálním [EB/OL]. https://dml.cz/bitstream/handle/10338.dmlcz/500114/Boruvka_01-0000-6_1.pdf https://dml.cz/bitstream/handle/10338.dmlcz/500114/Boruvka_01-0000-6_1.pdf , 2019 .
Han M Y , Daudjee K , Ammar K , et al . An experimental comparison of pregel-like graph processing systems [J]. Proceedings of the VLDB Endowment , 2014 , 7 ( 12 ): 1047 - 1058 .
Murtagh F , Contreras P . Algorithms for hierarchical clustering: An overview [J]. WIREs Data Mining and Knowledge Discovery , 2012 , 2 ( 1 ): 86 - 97 .
Tang C F , Rao Y , Yu H L , et al . Improving knowledge graph completion using soft rules and adversarial learning [J]. Chinese Journal of Electronics , 2021 , 30 ( 4 ): 623 - 633 .
Khayyat Z , Awara K , Alonazi A , et al . Mizan: A system for dynamic load balancing in large-scale graph processing [A]. Proceedings of the 8th ACM European Conference on Computer Systems - EuroSys 13 [C]. New York, USA : ACM Press , 2013 . 169 - 182 .
Fiduccia C M , Mattheyses R M . A linear-time heuristic for improving network partitions [A]. 19th Design Automation Conference [C]. Las Vegas, NV, USA : IEEE , 1982 . 175 - 181 .
0
Views
13
下载量
1
CSCD
Publicity Resources
Related Articles
Related Author
Related Institution
京公网安备11010802024621