Acta Electronica Sinica, 2015, Vol. 43, Issue (1): 36-44. DOI: 10.3969/j.issn.0372-2112.2015.01.007


Efficient Parallel Computing and Performance Tuning for Multi-Block Structured Grid CFD Applications on Tianhe Supercomputer

WANG Yong-xian1,2, ZHANG Li-lun1,2, CHE Yong-gang1,2, XU Chuan-fu1, LIU Wei1, CHENG Xing-hua1

  1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China;
  2. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2013-03-29 Revised: 2014-07-12 Online: 2015-01-25 Published: 2015-01-25
  • About the authors: WANG Yong-xian, male, born in July 1975 in Anyang, Henan; associate research fellow and master's supervisor at the College of Computer, National University of Defense Technology; research interests: high-performance computing and its applications, parallel algorithms. E-mail: yxwang@nudt.edu.cn. ZHANG Li-lun, male, born in May 1975 in Nanyang, Henan; research fellow and master's supervisor at the College of Computer, National University of Defense Technology; research interests: high-performance computing and its applications.
  • Funding:

    National Natural Science Foundation of China (No.61379056, No.11272352); Foundation of the State Key Laboratory of Aerodynamics (No.SKLA20130105)

Abstract:

This paper studies efficient parallel methods for large-scale CFD flow-field simulation on multi-block structured grids. Using the homogeneous CPU environment (the CPU sub-platform of Tianhe-1A) and the CPU+MIC heterogeneous co-processor environment of Tianhe-2 as examples, it discusses in detail performance optimization strategies that match the characteristics of CFD applications to the architecture and runtime environment of the supercomputer, and develops a set of multi-level parallelization and performance optimization methods. Numerical simulations of several test cases on Tianhe-2 verify the effectiveness of these optimizations. The largest CFD problem simulated on the CPU+MIC heterogeneous platform contains 6.8×10¹¹ (680 billion) grid cells and uses a total of 1.376×10⁶ CPU and MIC processor cores. The results show that, after porting and optimization, the code runs about 2.6 times faster on the CPU+MIC heterogeneous platform than on the CPU-only platform, and that it scales well.

Key words: computational fluid dynamics, multi-block structured grid, parallel computing, Tianhe supercomputer, CPU+MIC heterogeneous computing
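To put the reported scale in perspective, the short Python sketch below works through the arithmetic implied by the abstract's figures: the average grid load per core in the largest run, and how much of the CPU-only wall time remains after a 2.6× speedup. It assumes, purely for illustration, that the 6.8×10¹¹ cells are divided evenly among all 1.376×10⁶ cores; in practice CPU cores and MIC cores differ in throughput, so the real per-core load would not be uniform. The constant and function names are hypothetical and not taken from the paper.

```python
# Back-of-envelope arithmetic using the figures quoted in the abstract.
# Assumption (illustrative only): cells are spread evenly over all cores,
# ignoring the different throughput of CPU cores and MIC cores.

TOTAL_CELLS = 6.8e11     # largest grid size reported (cells)
TOTAL_CORES = 1.376e6    # CPU + MIC cores used in that run
HYBRID_SPEEDUP = 2.6     # CPU+MIC vs. CPU-only, as reported


def cells_per_core(cells: float, cores: float) -> float:
    """Average number of grid cells each core is responsible for."""
    return cells / cores


def runtime_fraction(speedup: float) -> float:
    """Fraction of the CPU-only wall time left after a given speedup."""
    return 1.0 / speedup


if __name__ == "__main__":
    print(f"cells per core   ~ {cells_per_core(TOTAL_CELLS, TOTAL_CORES):,.0f}")
    print(f"runtime fraction ~ {runtime_fraction(HYBRID_SPEEDUP):.1%}")
```

Run as a script, it prints an average of about 494,186 cells per core and shows that a 2.6× speedup leaves roughly 38.5% of the CPU-only wall time.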
