Information Engineering University, Zhengzhou, Henan 450001, China
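BIE Meng-ni, female, was born in Jingzhou, Hubei Province, in February 1997. She is currently a Ph.D. candidate in computer science and technology at Information Engineering University. Her main research interest is post-quantum cryptography processor design. E-mail: raspberry0213@126.com
LI Wei, male, was born in Tianjin in November 1983. He is currently a professor at Information Engineering University. His main research interests are computer architecture, security chip design, and integrated circuit technology. E-mail: try_1118@163.com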
Received: 2024-11-15
Revised: 2025-02-19
Published in print: 2025-02-25
BIE Meng-ni, LI Wei, FU Qiu-xing, et al. Research on Energy-Efficient Parallel Sampling Algorithm and Hardware Architecture for Lattice-Based Post-Quantum Ciphers[J]. Acta Electronica Sinica, 2025, 53(02): 420-430. DOI: 10.12263/DZXB.20241036
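Amid the rapid evolution of post-quantum cryptography, and to balance flexibility with efficiency, this paper proposes a parallel reconfigurable sampling accelerator for multiple lattice-based post-quantum cryptographic algorithms. Using mathematical derivation, it proposes an efficient parallel implementation model for each of seven kinds of sampling and extracts four common operational logics from them. With these four common operational logics at its core, it introduces data rearrangement to limit the effective bit width of the operands, which raises the acceptance rate of rejection sampling and simplifies the arithmetic logic, and it proposes an energy-efficient reconfigurable parallel sampling algorithm. To improve the hardware efficiency of the sampling algorithm, it uses a butterfly transform network to perform parallel splitting, merging, and lookup of data of any effective bit width within a single clock cycle, which efficiently parallelizes the algorithm's pre- and post-processing. It then builds a parameterized architecture model of the parallel reconfigurable sampling accelerator and, through experimental exploration, proposes a parallel reconfigurable sampling accelerator with a data bandwidth of 1024 bit. Experimental results show that, with a 40 nm CMOS process library and post-layout simulation at the ss, 125 ℃ process corner, the circuit reaches a maximum operating frequency of 667 MHz with an average power consumption of 0.54 W. A 256-point uniform sampling takes 6 ns; a 256-point rejection sampling with a rejection value below $2^{16}$ takes only 22.5 ns on average; a 256-point binomial sampling within 8 bit takes 18 ns; a 509-point simple ternary sampling takes 36 ns; a 701-point non-negatively correlated ternary sampling takes 124.5 ns; a 509-point fixed-weight ternary sampling takes 11.18 μs; and one discrete Gaussian sampling in the Falcon algorithm takes 3 ns. Compared with existing work, the proposed sampler reduces the energy of one uniform-rejection sampling by about 30.23% and the energy of one binomial sampling by about 31.6%.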
During the rapid evolution of post-quantum cryptography, considering the needs for flexibility and efficiency, we proposed a parallel reconfigurable sampling accelerator for various lattice-based post-quantum cryptographic algorithms. We analyzed seven sampling processes involved in lattice-based post-quantum cryptography and, based on mathematical derivations, proposed an efficient parallel implementation model for each of them. We then extracted four common operational logics from these models. Using these four common operational logics as the core, we introduced data rearrangement to limit the effective bit width of the operation data, which improved the acceptance rate of rejection sampling and eliminated the complex modular reduction operations in finite field arithmetic. On this basis we proposed an energy-efficient reconfigurable parallel sampling algorithm. To enhance the hardware implementation efficiency of the sampling algorithm, we adopted a butterfly transform network to complete the parallel splitting, merging, and lookup of data with any effective bit width within a single clock cycle, efficiently parallelizing the algorithm's pre- and post-processing, and constructed a parameterized architecture model of the parallel reconfigurable sampling accelerator. Aiming for high energy efficiency, and guided by logic synthesis results, we determined the optimal parallelism parameters of the architecture model and proposed a parallel reconfigurable sampling accelerator with a data bandwidth of 1024 bits. Experimental results showed that, using a 40 nm CMOS process library and post-layout simulation under the ss, 125 ℃ process corner, the circuit's maximum operating frequency reaches 667 MHz with an average power consumption of 0.54 W. A 256-point uniform sampling requires 6 ns; a 256-point rejection sampling with a rejection value less than $2^{16}$ takes only 22.5 ns on average; a 256-point binomial sampling within 8 bits requires 18 ns; a 509-point simple ternary sampling requires 36 ns; a 701-point non-negatively correlated ternary sampling requires 124.5 ns; a 509-point fixed-weight ternary sampling requires 11.18 μs; and one discrete Gaussian sampling in the Falcon algorithm requires 3 ns. Compared with existing research, the proposed sampler reduces the energy consumption of one uniform-rejection sampling by about 30.23% and that of one binomial sampling by about 31.6%.
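The abstract states that limiting the effective bit width of the data raises the acceptance rate of rejection sampling. The Python sketch below illustrates that general effect in software only; it is not the paper's algorithm or hardware. It assumes that "limiting the effective bit width" can be modeled as cutting the random stream into candidates of exactly ceil(log2 q) bits instead of a wider fixed width. The function names, the SHAKE-128 bit source, and the example modulus are illustrative assumptions.

```python
import hashlib
from typing import List, Optional, Tuple


def acceptance_rate(q: int, width: int) -> float:
    """Probability that a uniform `width`-bit candidate is below q."""
    return q / (1 << width)


def rejection_sample(seed: bytes, q: int, count: int,
                     width: Optional[int] = None) -> Tuple[List[int], int]:
    """Return `count` values uniform in [0, q) and the number of candidates drawn.

    Hypothetical software model: candidates are consecutive `width`-bit
    slices of a SHAKE-128 output stream. By default `width` is the
    smallest bit width that can represent every value below q.
    """
    k = (q - 1).bit_length() if width is None else width
    if (1 << k) < q:
        raise ValueError("width too small to represent values below q")
    mask = (1 << k) - 1
    nbytes = 64
    while True:
        # SHAKE-128 is an extendable-output function: a longer digest extends
        # the shorter one, so restarting with more bytes replays the same
        # candidates and then continues with fresh ones.
        stream = int.from_bytes(hashlib.shake_128(seed).digest(nbytes), "little")
        accepted: List[int] = []
        drawn = 0
        for pos in range(0, nbytes * 8 - k + 1, k):
            candidate = (stream >> pos) & mask
            drawn += 1
            if candidate < q:  # accept; otherwise reject and move on
                accepted.append(candidate)
                if len(accepted) == count:
                    return accepted, drawn
        nbytes *= 2  # stream exhausted before `count` acceptances


if __name__ == "__main__":
    q = 3329  # modulus of Kyber/ML-KEM, used here only as an example
    _, n_tight = rejection_sample(b"demo", q, 256)            # 12-bit candidates
    _, n_loose = rejection_sample(b"demo", q, 256, width=16)  # 16-bit candidates
    print("12-bit acceptance rate:", acceptance_rate(q, 12), "candidates:", n_tight)
    print("16-bit acceptance rate:", acceptance_rate(q, 16), "candidates:", n_loose)
```

With the narrower candidates, far fewer draws are rejected, so a 256-point sample consumes much less of the random stream. The accepted values already lie in [0, q), so this model needs no modular reduction afterward.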