CHEN Hai-yan, YANG Chao, LIU Sheng, et al. An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation[J]. Acta Electronica Sinica, 2016, 44(2): 241-246. DOI: 10.3969/j.issn.0372-2112.2016.02.001.
An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation
As more execution units are integrated into digital signal processors (DSPs) with single instruction, multiple data (SIMD) extensions, the flexibility and bandwidth efficiency of parallel memory access have a significant effect on overall practical performance. Based on a detailed analysis of the memory access problems of the radix-2 fast Fourier transform (FFT) algorithm on a general SIMD DSP, this paper uses XOR logic on a subset of the address bits to translate memory access addresses, achieving conflict-free parallel SIMD memory accesses for FFT computation. Several memory access instructions with special shuffle modes are then proposed, which completely eliminate the extra shuffle instructions required by the radix-2 FFT algorithm on the SIMD architecture. Finally, the vector memory (VM) of the 16-way SIMD DSP YHFT-Matrix2 is optimized using these methods. Test results show that the optimized VM achieves fully pipelined, conflict-free memory accesses and 100% parallel memory access bandwidth utilization at the cost of an 18% increase in area. Compared with the design before optimization, radix-2 FFTs of various sizes achieve speedups ranging from 1.32 to 2.66.
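
The idea of XOR-based address translation can be illustrated with a small sketch. The abstract does not give the exact mapping used in YHFT-Matrix2, so the code below is only an assumed, generic scheme: 16 banks (one per SIMD lane), with the bank index formed by XOR-folding every 4-bit field of the address. The names `bank_modulo`, `bank_xor`, and `is_conflict_free` are hypothetical helpers introduced for this illustration. It compares the folded mapping with plain low-order interleaving for the power-of-two strides that arise in radix-2 FFT stages.

```python
NUM_BANKS = 16   # assumption: one memory bank per SIMD lane
BANK_BITS = 4    # log2(NUM_BANKS)


def bank_modulo(addr):
    """Plain low-order interleaving: bank = addr mod 16."""
    return addr & (NUM_BANKS - 1)


def bank_xor(addr):
    """Illustrative XOR mapping (not the paper's exact logic):
    XOR together every 4-bit field of the address."""
    bank = 0
    while addr:
        bank ^= addr & (NUM_BANKS - 1)
        addr >>= BANK_BITS
    return bank


def is_conflict_free(bank_fn, base, stride):
    """True if 16 lanes accessing base, base + stride, ... use 16 distinct banks."""
    banks = {bank_fn(base + lane * stride) for lane in range(NUM_BANKS)}
    return len(banks) == NUM_BANKS


# For each power-of-two stride, record (stride, modulo result, XOR result).
results = [
    (1 << j,
     is_conflict_free(bank_modulo, 0, 1 << j),
     is_conflict_free(bank_xor, 0, 1 << j))
    for j in range(9)
]

for stride, plain_ok, xor_ok in results:
    print(f"stride {stride:3d}: modulo conflict-free={plain_ok}, xor conflict-free={xor_ok}")
```

In this sketch the row inside a bank is still `addr >> 4`, so the mapping stays one-to-one: the low four address bits can be recovered from the bank index and the row. The conflict-free property shown here holds for base addresses aligned to 16 times the stride; other alignments and the actual bit selection are design choices the abstract does not specify.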