Reconfigurable computing systems have become an important option for accelerating compute-intensive applications. Among such applications, matrix triangularization decomposition has long been a central research topic, and it is of great value for solving systems of linear equations and matrix eigenvalue problems in science and engineering. This paper analyzes the linear computing process of triangularization and, based on the computing process common to matrix triangularization decompositions, proposes a hardware-adaptive parallel sub-matrix identity updating algorithm and a high-performance parallel hardware structure template for matrix triangularization on FPGA (Field Programmable Gate Array). The research focuses on the implementation and optimization of a high-performance parallel FPGA structure for LU matrix triangularization decomposition. Theoretical analysis indicates that the proposed algorithm offers better pipeline parallelism and data parallelism during the triangularization process. Experimental results show that the proposed structure achieves a speedup of more than ten times in key performance measures over general-purpose processors and previous work.
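For readers unfamiliar with the computation being accelerated, the following is a minimal software sketch of textbook right-looking LU decomposition (Doolittle form, no pivoting). It is not the paper's sub-matrix updating algorithm or its FPGA structure; the function name `lu_decompose` and the list-of-lists matrix layout are assumptions made only for illustration. It shows the trailing sub-matrix update step, in which every element update within one elimination step is independent of the others, the kind of data parallelism a hardware design can exploit.

```python
def lu_decompose(a):
    """Textbook right-looking LU factorization without pivoting.

    Illustrative sketch only: returns (l, u) with l unit lower triangular
    and u upper triangular such that l * u equals a.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy, becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for k in range(n):
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: this sketch does no pivoting")

        # Column step: compute the multipliers below the pivot.
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / pivot
            u[i][k] = 0.0

        # Trailing sub-matrix update: each (i, j) update is independent,
        # so in hardware these could be computed in parallel or pipelined.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                u[i][j] -= l[i][k] * u[k][j]

    return l, u


def matmul(x, y):
    """Plain matrix product, used only to check that l * u reproduces a."""
    n, m, p = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(m)) for j in range(p)]
            for i in range(n)]


if __name__ == "__main__":
    a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    l, u = lu_decompose(a)
    print(l)
    print(u)
    print(matmul(l, u) == a)
```

A production solver would add partial pivoting for numerical stability; it is omitted here to keep the update structure visible.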
LIU Shu-yong, WU Yan-xia, ZHANG Bo-wei, ZHANG Guo-yin, DAI Kui. Research of Parallel Hardware Architecture for Matrix Triangularization Decomposition Based on Reconfigurable Computing System[J]. Acta Electronica Sinica, 2015, 43(8): 1642-1650. (in Chinese)