An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation

CHEN Hai-yan; YANG Chao; LIU Sheng; LIU Zhong

doi:10.3969/j.issn.0372-2112.2016.02.001

Research article | Updated: 2025-07-16

- An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation
- Acta Electronica Sinica, 2016, Vol. 44, No. 2, pp. 241-246
- Author affiliation: College of Computer, National University of Defense Technology, Changsha, Hunan 410073
- Funding: National Natural Science Foundation of China (No. 61472432)
- DOI: 10.3969/j.issn.0372-2112.2016.02.001
- CLC number: TP303
- Published online: 2016-02-25; print publication: 2016
CHEN Hai-yan, YANG Chao, LIU Sheng, et al. An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation[J]. Acta Electronica Sinica, 2016, 44(2): 241-246. DOI: 10.3969/j.issn.0372-2112.2016.02.001.

Abstract

As more and more processing units are integrated on chip in digital signal processors (DSPs) with a single instruction, multiple data (SIMD) architecture, the flexibility and bandwidth efficiency of parallel memory access have an increasingly large effect on practical computing performance. This paper analyzes in detail the memory access problems that the parallel radix-2 fast Fourier transform (FFT) algorithm faces in a general SIMD DSP. It uses simple XOR logic on part of the address bits to perform the address translation for SIMD parallel memory access, which makes SIMD parallel memory access for FFT computation conflict-free. It also proposes several vector memory access instructions with special shuffle modes, which completely eliminate the extra shuffle instructions that radix-2 FFT computation otherwise requires on a SIMD architecture. Finally, the approach is applied to optimize the design of the vector memory (VM) in YHFT-Matrix2, a 16-way SIMD digital signal processor. Test results show that the VM optimized with this SIMD parallel memory structure achieves fully pipelined, conflict-free parallel memory access for FFT computation and 100% utilization of the parallel memory access bandwidth, at the cost of an 18% increase in hardware (area) overhead. Compared with the design before optimization, radix-2 FFTs of different sizes achieve speedups of 1.32 to 2.66.
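The abstract does not give the exact address translation, so the following is only a minimal sketch of the general idea of XOR-based bank mapping, not the scheme used in YHFT-Matrix2. It assumes 16 memory banks (one per SIMD lane), a 16-bit word address, and a hypothetical mapping that XORs all 4-bit address fields together; the names `bank_plain`, `bank_xor`, `banks_touched`, and `conflict_free_strides` are illustrative. It compares this mapping with conventional low-order interleaving for the power-of-two strides that radix-2 FFT stages typically generate.

```python
NUM_BANKS = 16   # assumption: one bank per SIMD lane
BANK_BITS = 4    # log2(NUM_BANKS)
ADDR_BITS = 16   # assumption: word-address width used in this sketch


def bank_plain(addr):
    """Conventional low-order interleaving: bank = low address bits."""
    return addr & (NUM_BANKS - 1)


def bank_xor(addr):
    """Hypothetical XOR mapping: fold every 4-bit address field together."""
    bank = 0
    for shift in range(0, ADDR_BITS, BANK_BITS):
        bank ^= (addr >> shift) & (NUM_BANKS - 1)
    return bank


def banks_touched(mapping, base, stride):
    """Number of distinct banks hit by one 16-lane strided access."""
    return len({mapping(base + lane * stride) for lane in range(NUM_BANKS)})


def conflict_free_strides(mapping, max_log2_stride=10):
    """Power-of-two strides whose 16-lane access from address 0 hits 16 banks."""
    return [1 << k for k in range(max_log2_stride + 1)
            if banks_touched(mapping, 0, 1 << k) == NUM_BANKS]
```

Under these assumptions, `conflict_free_strides(bank_plain)` contains only stride 1, while `conflict_free_strides(bank_xor)` contains every power-of-two stride from 1 to 1024. The sketch checks accesses starting at address 0 only; unaligned base addresses can still collide under this simple mapping, and a real design would also have to handle them.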
