Acta Electronica Sinica (电子学报) ›› 2015, Vol. 43 ›› Issue (8): 1642-1650. DOI: 10.3969/j.issn.0372-2112.2015.08.026

• Research Communications •

基于可重构计算系统的矩阵三角化分解硬件并行结构研究

刘书勇, 吴艳霞, 张博为, 张国印, 戴葵   

  1. 哈尔滨工程大学计算机科学与技术学院, 黑龙江哈尔滨 150001
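  • Received: 2014-01-07 Revised: 2014-10-15 Published: 2015-08-25
    • Corresponding author:
    • WU Yan-xia
    • About the author:
    • LIU Shu-yong, male, born in 1978 in Changyuan County, Henan Province, is a Ph.D. candidate. His main research interests are embedded systems and reconfigurable compilation systems. E-mail: liushuyong@hrbeu.edu.cn
    • Funding:
    • National Natural Science Foundation of China (No.61003036); Open Project of the State Key Laboratory of Computer Architecture (No.CARCH201301); Postdoctoral Research Start-up Fund (No.LBH-Q12134); Special Fund of the Fundamental Research Funds for the Central Universities (No.HEUCF100606)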

Research of Parallel Hardware Architecture for Matrix Triangularization Decomposition Based on Reconfigurable Computing System

LIU Shu-yong, WU Yan-xia, ZHANG Bo-wei, ZHANG Guo-yin, DAI Kui   

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China
  • Received: 2014-01-07 Revised: 2014-10-15 Online: 2015-08-25 Published: 2014-07-28
    • Supported by:
    • National Natural Science Foundation of China (No.61003036); Open Project of State Key Laboratory of Computer Architecture, ICT, CAS (No.CARCH201301); Postdoctoral Research Start-up Fund (No.LBH-Q12134); Special Fund of Fundamental Research Funds for the Central Universities (No.HEUCF100606)

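Abstract (translated from the Chinese):

Reconfigurable computing systems have become one of the important options for accelerating compute-intensive applications. Among the many compute-intensive problems that have attracted attention, matrix triangularization decomposition, as a typical fundamental application, has always been at the core of research, and it has significant research value in scientific and engineering problems such as solving systems of linear equations and computing matrix eigenvalues. Targeting the triangularization process common to matrix triangularization decompositions, this paper analyzes the linear computation pattern of that process and proposes a unified sub-matrix update algorithm suited to parallel hardware implementation, together with an FPGA (Field Programmable Gate Array) parallel structure for matrix triangularization. It then studies the high-performance implementation and optimization of LU matrix triangularization decomposition on this parallel structure template. Theoretical analysis shows that the algorithm offers higher data parallelism and pipeline parallelism for the matrix triangularization process; experimental results show that, compared with a software implementation on a general-purpose processor, the FPGA parallel structure for matrix triangularization decomposition built on this algorithm achieves a speedup of more than 10x in key computing performance.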

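Keywords (translated from the Chinese): matrix triangularization decomposition, triangularization process, parallel algorithm, LU decomposition, field-programmable gate array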

Abstract:

Reconfigurable computing systems have become an important option for accelerating compute-intensive applications. Among compute-intensive applications, matrix triangularization decomposition has long held a central position in research and is of great value for solving systems of linear equations and matrix eigenvalue problems in science and engineering. This paper analyzes the linear computation pattern of the triangularization process that matrix triangularization decompositions share, and proposes a unified sub-matrix update algorithm suited to parallel hardware, together with a high-performance parallel structure template for matrix triangularization on FPGA (Field Programmable Gate Array). The research focuses on the high-performance FPGA parallel implementation of LU matrix triangularization decomposition and on methods for optimizing it. Theoretical analysis shows that the proposed algorithm offers better pipeline parallelism and data parallelism during matrix triangularization. Experimental results show that the proposed structure achieves a speedup of more than 10x in key computing performance compared with general-purpose processors and previous work.

Key words: matrix triangularization decomposition, triangularization process, parallel algorithm, LU decomposition, field programmable gate array
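
For readers unfamiliar with the triangularization process the abstract refers to, the following is a minimal software sketch of textbook LU decomposition (Doolittle form, no pivoting). It is not the paper's sub-matrix update algorithm or its FPGA structure; the function names `lu_in_place`, `split_lu`, and `matmul` are hypothetical and chosen only for illustration. The sketch shows why the trailing sub-matrix update is a natural target for data parallelism: at elimination step k, every element of the trailing block is updated independently of the others.

```python
def lu_in_place(a):
    """Textbook Doolittle LU without pivoting (illustrative only).

    Returns one n x n matrix that stores U on and above the diagonal
    and the multipliers of L (unit diagonal implied) below it.
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy; the input is left untouched
    for k in range(n - 1):
        pivot = m[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Step 1: scale the column below the pivot to get the multipliers.
        for i in range(k + 1, n):
            m[i][k] /= pivot
        # Step 2: trailing sub-matrix update. Each (i, j) entry depends only
        # on m[i][k] and m[k][j], so all entries could be updated in parallel.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                m[i][j] -= m[i][k] * m[k][j]
    return m


def split_lu(m):
    """Unpack the combined matrix into explicit L and U factors."""
    n = len(m)
    lower = [[1.0 if i == j else (m[i][j] if i > j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[m[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Plain dense matrix product, used here only to check L * U == A."""
    n, p, q = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(p)) for j in range(q)]
            for i in range(n)]


if __name__ == "__main__":
    a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    lower, upper = split_lu(lu_in_place(a))
    print(lower)
    print(upper)
    print(matmul(lower, upper) == a)
```

The doubly nested loop in step 2 is a rank-1 update of the trailing block; in software it dominates the O(n^3) cost, which is why hardware designs typically concentrate their parallel resources there.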

CLC number: