信息工程大学,河南郑州450001
[ "别梦妮 女,1997年2月出生于湖北省荆州市.现为信息工程大学计算机科学与技术专业博士研究生.主要研究方向为后量子密码处理器设计.E-mail: raspberry0213@126.com" ]
[ "李 伟 男,1983年11月出生于天津市.现为信息工程大学教授.主要研究方向为体系结构、安全芯片设计、集成电路技术. E-mail: try_1118@163.com" ]
收稿:2023-10-09,
修回:2024-04-22,
纸质出版:2024-12-25
移动端阅览
别梦妮, 李伟, 陈韬, 等. 高能效混合基多项式乘法算法及可重构硬件结构研究与设计[J]. 电子学报, 2024, 52(12): 3957-3966.
BIE Meng-ni, LI Wei, CHEN Tao, et al. Research and Design of High Energy Efficient Hybrid Base Polynomial Multiplication Algorithm and Reconfigurable Hardware Structure[J]. Acta Electronica Sinica, 2024, 52(12): 3957-3966.
别梦妮, 李伟, 陈韬, 等. 高能效混合基多项式乘法算法及可重构硬件结构研究与设计[J]. 电子学报, 2024, 52(12): 3957-3966. DOI:10.12263/DZXB.20230945
BIE Meng-ni, LI Wei, CHEN Tao, et al. Research and Design of High Energy Efficient Hybrid Base Polynomial Multiplication Algorithm and Reconfigurable Hardware Structure[J]. Acta Electronica Sinica, 2024, 52(12): 3957-3966. DOI:10.12263/DZXB.20230945
本文针对快速多项式乘法算法与可重构单元的高能效设计问题展开研究,首先对现有的格基后量子密码算法展开研究,提出了一种基于数论变换(Number Theoretic Transform,NTT)的快速多项式乘法算法,并针对其中的核心运算过程,提出了高能效混合基的NTT和INTT(Inverse Number Theoretic Transform)算法,该算法可以利用NTT变换高效实现所有基于有限域的格基后量子密码算法中的多项式乘法.在此基础上,对快速多项式乘法算法运算结构进行研究,在不增加额外运算部件的前提下,通过优化网络连接关系,提出了一种高能效可重构的混合基多项式乘法加速网络,在可灵活实现基2、基3、基4的NTT/INTT算法的同时,将基3与基4的NTT运算效率提升了一倍.本文针对混合基NTT运算过程中的访存冲突问题展开研究,从理论上分析了冲突产生的原因,在此基础上分析提出了一种高能效混合基的内存管理方案,设计了相应的地址生成逻辑.本文提出的内存访问方案是原地内存访问的一种,硬件固化后仍可实现不同的多项式乘法算法的内存管理.实验结果表明,在55 nm CMOS工艺下,完成维度为256,模数小于2
16
的多项式乘法运算仅需0.785
<math id="M1"><mi mathvariant="normal">μ</mi><mi mathvariant="normal">s</mi></math>
https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=73219266&type=
2.96333337
https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=73219255&type=
2.96333337
,最高工作频率可达到476 MHz,功耗为83.6 mW,面积时间积(Area Time Product,ATP)为152.604 kGE·μs.与当前现有研究相比,本文提出的结构的ATP值降低了40%以上.
In this paper
we present a fast polynomial multiplication algorit
hm
the hybrid-basis number theoretic transform (NTT) and inverse NTT (INTT) algorithms. These algorithms can efficiently implement polynomial multiplication based on finite domain using NTT conversion. On this basis
the paper explores the computational structure of fast polynomial multiplication algorithms. Without adding extra computational components
it optimizes network connectivity and proposes an energy-efficient reconfigurable hybrid-basis polynomial multiplication acceleration network. This network can flexibly implement base-2
base-3
and base-4 NTT/INTT algorithms
while doubling the operational efficiency of base-3 and base-4 NTT. This paper studies the issue of memory access conflicts in the computation process of hybrid-basis NTT. It theoretically analyzes the causes of these conflicts and
based on this analysis
proposes an energy-efficient hybrid-basis memory management scheme
designing the corresponding address generation logic. The proposed memory access scheme is a form of in-place memory access
and once implemented in hardware
it can still manage memory for different polynomial multiplication algorithms. Experimental results show that
under the 55 nm CMOS process
completing polynomial multiplication with a dimension of 256 and modulus less than 2
16
requires only 0.785 μs. The maximum operating frequency can reach 476 MHz
with a power consumption of 83.6 mW and an area time product (ATP) of 152.604 kGE·μs. Compared to the existing research
the ATP value of the proposed structure in this paper is reduced by more than 40%.
LYUBASHEVSKY V , MICCIANCIO D , PEIKERT C , et al . SWIFFT: A modest proposal for FFT hashing [C ] // International Workshop on Fast Software Encryption . Berlin : Springer , 2008 : 54 - 72 .
GÖTTERT N , FELLER T , SCHNEIDER M , et al . On the design of hardware building blocks for modern lattice-based encryption schemes [C ] // International Workshop on Cryptographic Hardware and Embedded Systems . Berlin : Springer , 2012 : 512 - 529 .
FRITZMANN T , SEPULVEDA J . Efficient and flexible low-power NTT for lattice-based cryptography [C ] // IEEE International Symposium on Hardware Oriented Security and Trust . Piscataway : IEEE , 2019 : 141 - 150 .
DU C H , BAI G Q . Towards efficient polynomial multiplication for lattice-based cryptography [C ] // IEEE International Symposium on Circuits and Systems . Piscataway : IEEE , 2016 : 1178 - 1181 .
SHOR P W . Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer [J ] . SIAM Journal on Computing , 1997 , 26 ( 5 ): 1484 - 1509 .
GÜNEYSU T , LYUBASHEVSKY V , PÖPPELMANN T . Practical lattice-based cryptography: A signature scheme for embedded systems [C ] // International Workshop on Cryptographic Hardware and Embedded Systems . Berlin : Springer , 2012 : 530 - 547 .
LIU W Q , FAN S L , KHALID A , et al . Optimized schoolbook polynomial multiplication for compact lattice-based cryptography on FPGA [J ] . IEEE Transactions on Very Large Scale Integration (VLSI) Systems , 2019 , 27 ( 10 ): 2459 - 2463 .
CHUNG C M M , HWANG V , KANNWISCHER M J , et al . NTT multiplication for NTT-unfriendly rings [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2021 , 2021 : 159 - 188 .
BERNSTEIN D J , SORENSON J P . Modular exponentiation via the explicit Chinese remainder theorem [J ] . Mathematics of Computation , 2007 , 76 ( 257 ): 443 - 455 .
CHEN X R , YANG B H , YIN S Y , et al . CFNTT: Scalable radix-2/4 NTT multiplication architecture with an efficient conflict-free memory mapping scheme [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2021 ( 1 ): 94 - 126 .
LI A B , LIU D S , LI X , et al . A flexible instruction-based post-quantum cryptographic processor with modulus reconfigurable arithmetic unit for module LWR&E [C ] // 2022 IEEE Asian Solid-State Circuits Conference (A-SSCC) . Piscataway : IEEE , 2022 : 1 - 3 .
XING Y , LI S . A compact hardware implementation of CCA-secure key exchange mechanism CRYSTALS-KYBER on FPGA [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2021 ( 2 ): 328 - 356 .
YAMAN F , MERT A C , ÖZTÜRK E , et al . A hardware accelerator for polynomial multiplication operation of CRYSTALS-KYBER PQC scheme [C ] // 2021 Design , Automation & Test in Europe Conference & Exhibition . Piscataway : IEEE , 2021 : 1020 - 1025 .
ZHAO Y , XIE R , XIN G , et al . A high-performance domain-specific processor with matrix extension of RISC-V for module-LWE applications [J ] . IEEE Transactions on Circuits and Systems I: Regular Papers , 2022 , 69 ( 7 ): 2871 - 2884 .
ZHOU Z , HE D , LIU Z , et al . A software/hardware co-design of crystals-dilithium signature scheme [J ] . ACM Transactions on Reconfigurable Technology and Systems , 2021 , 14 ( 2 ): 1 - 21 .
HU X , TIAN J , LI M H , et al . AC-PM: An area-efficient and configurable polynomial multiplier for lattice based cryptography [J ] . IEEE Transactions on Circuits and Systems I: Regular Papers , 2023 , 70 ( 2 ): 719 - 732 .
刘冬生 , 赵文定 , 刘子龙 , 等 . 应用于格密码的可重构多通道数论变换硬件设计 [J ] . 电子与信息学报 , 2022 , 44 ( 2 ): 566 - 572 .
LIU D S , ZHAO W D , LIU Z L , et al . Reconfigurable hardware design of multi-lanes number theoretic transform for lattice-based cryptography [J ] . Journal of Electronics & Information Technology , 2022 , 44 ( 2 ): 566 - 572 . (in Chinese)
SU Y , YANG B L , YANG C , et al . A highly unified reconfigurable multicore architecture to speed up NTT/INTT for homomorphic polynomial multiplication [J ] . IEEE Transactions on Very Large Scale Integration (VLSI) Systems , 2022 , 30 ( 8 ): 993 - 1006 .
DANG V B , MOHAJERANI K , GAJ K . High-speed hardware architectures and FPGA benchmarking of CRYSTALS-kyber, NTRU, and saber [J ] . IEEE Transactions on Computers , 2023 , 72 ( 2 ): 306 - 320 .
BANERJEE U , UKYAB T S , CHANDRAKASAN A P . Sapphire: A configurable crypto-processor for post-quantum lattice-based protocols [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2019 2019 : 17 - 61 .
XIN G Z , HAN J , YIN T Y , et al . VPQC: A domain-specific vector processor for post-quantum cryptography based on RISC-V architecture [J ] . IEEE Transactions on Circuits and Systems I: Regular Papers , 2020 , 67 ( 8 ): 2672 - 2684 .
LI D , PAKALA A , YANG K Y . MeNTT: A compact and efficient processing-in-memory number theoretic transform (NTT) accelerator [J ] . IEEE Transactions on Very Large Scale Integration (VLSI) Systems , 2022 , 30 ( 5 ): 579 - 588 .
0
浏览量
13
下载量
0
CSCD
关联资源
相关文章
相关作者
相关机构
京公网安备11010802024621