

PLA Strategic Support Force Information Engineering University, Zhengzhou, Henan 450000, China
Received: 31 May 2022
Revised: 13 September 2022
Published: 25 February 2024
CHEN Tao, LI Hui-qin, WU Ai-qing, et al. A Polynomial Multiplier Design Based on 2KNTT[J]. Acta Electronica Sinica, 2024, 52(02): 455-467. DOI:10.12263/DZXB.20220629
Polynomial multiplication is one of the most time-consuming basic operations in hardware implementations of lattice-based post-quantum public-key cryptography. To improve practical performance, this paper analyzes fast number theoretic transform (NTT) algorithms for polynomial multiplication and proposes a hardware-oriented fast NTT architecture for CRYSTALS-Kyber based on preprocessing with 2n-th roots of unity. The architecture reduces computing time through parallel processing of small bit-width NTTs and low-complexity forms of computation. Taking the special properties of the algorithm into account, the overall architecture adopts a 32-way parallel design model. On this basis, we design a unified computing unit that matches the architecture, together with a storage unit whose reads and writes are conflict-free and whose address assignment is optimal. Experimental results show that, in a 65 nm complementary metal-oxide-semiconductor (CMOS) process, one multiplication of polynomials with 256 terms and modulus 3 329 completes in 108 cycles within 97 ns. The maximum operating frequency reaches 1.1 GHz, and the area-time product is 20.7 kGE·μs.
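As a software illustration of the idea summarized in the abstract, the following is a minimal Python sketch of NTT-based multiplication in Z_q[x]/(x^256 + 1) with q = 3329. It is not the paper's 32-way hardware architecture, and it may differ from the paper's exact 2KNTT algorithm. It assumes the standard Kyber constant 17 as a primitive 256-th root of unity modulo 3329, splits each polynomial into its even-indexed and odd-indexed halves, and applies a 128-point NTT to each half after pre-multiplying by powers of that root. All function names are hypothetical.

```python
Q = 3329                 # Kyber modulus
N = 256                  # number of polynomial coefficients
H = N // 2               # size of each small NTT (128 points)
PSI = 17                 # assumed primitive 256-th root of unity mod Q (2n-th root for n = 128)
PSI_INV = pow(PSI, Q - 2, Q)
OMEGA = PSI * PSI % Q    # primitive 128-th root of unity
OMEGA_INV = pow(OMEGA, Q - 2, Q)
assert pow(PSI, H, Q) == Q - 1   # PSI^128 = -1 mod Q


def ntt_cyclic(a, omega):
    """Recursive radix-2 NTT: out[i] = sum_j a[j] * omega^(i*j) mod Q."""
    n = len(a)
    if n == 1:
        return a[:]
    sq = omega * omega % Q
    even, odd = ntt_cyclic(a[0::2], sq), ntt_cyclic(a[1::2], sq)
    out, w = [0] * n, 1
    for i in range(n // 2):
        t = w * odd[i] % Q
        out[i] = (even[i] + t) % Q
        out[i + n // 2] = (even[i] - t) % Q
        w = w * omega % Q
    return out


def ntt_negacyclic(a):
    """Pre-multiply by PSI^j, then cyclic NTT: evaluates a at PSI^(2i+1)."""
    pre = [a[j] * pow(PSI, j, Q) % Q for j in range(H)]
    return ntt_cyclic(pre, OMEGA)


def intt_negacyclic(A):
    """Inverse cyclic NTT, then scale by 1/H and post-multiply by PSI^(-j)."""
    h_inv = pow(H, Q - 2, Q)
    c = ntt_cyclic(A, OMEGA_INV)
    return [c[j] * h_inv * pow(PSI_INV, j, Q) % Q for j in range(H)]


def polymul(f, g):
    """Multiply f and g modulo (x^256 + 1, Q) using four 128-point NTTs."""
    # f(x) = fe(y) + x * fo(y) with y = x^2, and likewise for g
    fe, fo = ntt_negacyclic(f[0::2]), ntt_negacyclic(f[1::2])
    ge, go = ntt_negacyclic(g[0::2]), ntt_negacyclic(g[1::2])
    # In the NTT domain, multiplying by y is pointwise multiplication by PSI^(2i+1)
    y = [pow(PSI, 2 * i + 1, Q) for i in range(H)]
    he = [(fe[i] * ge[i] + y[i] * fo[i] * go[i]) % Q for i in range(H)]
    ho = [(fe[i] * go[i] + fo[i] * ge[i]) % Q for i in range(H)]
    h = [0] * N
    h[0::2] = intt_negacyclic(he)   # even-indexed coefficients of the product
    h[1::2] = intt_negacyclic(ho)   # odd-indexed coefficients of the product
    return h
```

In this sketch the four forward transforms and the two inverse transforms are independent 128-point NTTs, which is the kind of small, regular workload that parallel butterfly units can share. The recursion is written for clarity, not speed.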