WANG Xiang-qian (1), HONG Yi (1, 2), WANG Hao (2), ZHENG Qi-long (3)
1. School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China;
2. No.38 Research Institute, China Electronics Technology Group Corporation, Hefei, Anhui 230088, China;
3. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
BWDSP is a word-addressed VLIW DSP that supports clustering and SIMD. Building on the open-source compiler infrastructure Open64, the key technologies of a compiler for BWDSP are developed: optimized handling of address registers, instruction clustering that combines multiple heuristic factors, and register allocation and instruction scheduling for the clustered architecture. The key optimizations that target the BWDSP hardware architecture are vectorization based on dependence analysis, use of efficient instructions, and recognition of zero-overhead loops. Finally, the experience gained in developing the BWDSP compiler is summarized, and some general points to note when developing a compiler on an open-source compiler infrastructure are presented.
WANG Xiang-qian, HONG Yi, WANG Hao, ZHENG Qi-long. Compiler Design and Optimization for BWDSP[J]. Acta Electronica Sinica (电子学报), 2015, 43(8): 1656-1661. (in Chinese)