高能效混合基多项式乘法算法及可重构硬件结构研究与设计

别梦妮; 李伟; 陈韬; 李慧琴; 杜怡然; 南龙梅

doi:10.12263/DZXB.20230945

您当前的位置：

首页 >

文章列表页 >

高能效混合基多项式乘法算法及可重构硬件结构研究与设计

学术论文 | 更新时间：2025-12-11

- 高能效混合基多项式乘法算法及可重构硬件结构研究与设计
- Research and Design of High Energy Efficient Hybrid Base Polynomial Multiplication Algorithm and Reconfigurable Hardware Structure
- 电子学报 2024年52卷第12期页码：3957-3966
- 作者机构：
  
  信息工程大学，河南郑州450001
- 作者简介：
  
  [ "别梦妮女，1997年2月出生于湖北省荆州市.现为信息工程大学计算机科学与技术专业博士研究生.主要研究方向为后量子密码处理器设计.E-mail: raspberry0213@126.com" ]
  [ "李伟男，1983年11月出生于天津市.现为信息工程大学教授.主要研究方向为体系结构、安全芯片设计、集成电路技术. E-mail: try_1118@163.com" ]
- 基金信息：
  
  国防预研项目(2019-JCJQ-JJ-123)
- DOI：10.12263/DZXB.20230945
  中图分类号： TN402;TP309
- 收稿：2023-10-09，
  
  修回：2024-04-22，
  
  纸质出版：2024-12-25
- 稿件说明：
移动端阅览
别梦妮, 李伟, 陈韬, 等. 高能效混合基多项式乘法算法及可重构硬件结构研究与设计[J]. 电子学报, 2024, 52(12): 3957-3966.

BIE Meng-ni, LI Wei, CHEN Tao, et al. Research and Design of High Energy Efficient Hybrid Base Polynomial Multiplication Algorithm and Reconfigurable Hardware Structure[J]. Acta Electronica Sinica, 2024, 52(12): 3957-3966.
别梦妮, 李伟, 陈韬, 等. 高能效混合基多项式乘法算法及可重构硬件结构研究与设计[J]. 电子学报, 2024, 52(12): 3957-3966. DOI：10.12263/DZXB.20230945

BIE Meng-ni, LI Wei, CHEN Tao, et al. Research and Design of High Energy Efficient Hybrid Base Polynomial Multiplication Algorithm and Reconfigurable Hardware Structure[J]. Acta Electronica Sinica, 2024, 52(12): 3957-3966. DOI：10.12263/DZXB.20230945

摘要

本文针对快速多项式乘法算法与可重构单元的高能效设计问题展开研究，首先对现有的格基后量子密码算法展开研究，提出了一种基于数论变换（Number Theoretic Transform，NTT）的快速多项式乘法算法，并针对其中的核心运算过程，提出了高能效混合基的NTT和INTT（Inverse Number Theoretic Transform）算法，该算法可以利用NTT变换高效实现所有基于有限域的格基后量子密码算法中的多项式乘法.在此基础上，对快速多项式乘法算法运算结构进行研究，在不增加额外运算部件的前提下，通过优化网络连接关系，提出了一种高能效可重构的混合基多项式乘法加速网络，在可灵活实现基2、基3、基4的NTT/INTT算法的同时，将基3与基4的NTT运算效率提升了一倍.本文针对混合基NTT运算过程中的访存冲突问题展开研究，从理论上分析了冲突产生的原因，在此基础上分析提出了一种高能效混合基的内存管理方案，设计了相应的地址生成逻辑.本文提出的内存访问方案是原地内存访问的一种，硬件固化后仍可实现不同的多项式乘法算法的内存管理.实验结果表明，在55 nm CMOS工艺下，完成维度为256，模数小于2

的多项式乘法运算仅需0.785

https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=73219266&type=

2.96333337

https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=73219255&type=

2.96333337

，最高工作频率可达到476 MHz，功耗为83.6 mW，面积时间积（Area Time Product，ATP）为152.604 kGE·μs.与当前现有研究相比，本文提出的结构的ATP值降低了40%以上.

Abstract

In this paper

we present a fast polynomial multiplication algorit

the hybrid-basis number theoretic transform (NTT) and inverse NTT (INTT) algorithms. These algorithms can efficiently implement polynomial multiplication based on finite domain using NTT conversion. On this basis

the paper explores the computational structure of fast polynomial multiplication algorithms. Without adding extra computational components

it optimizes network connectivity and proposes an energy-efficient reconfigurable hybrid-basis polynomial multiplication acceleration network. This network can flexibly implement base-2

base-3

and base-4 NTT/INTT algorithms

while doubling the operational efficiency of base-3 and base-4 NTT. This paper studies the issue of memory access conflicts in the computation process of hybrid-basis NTT. It theoretically analyzes the causes of these conflicts and

based on this analysis

proposes an energy-efficient hybrid-basis memory management scheme

designing the corresponding address generation logic. The proposed memory access scheme is a form of in-place memory access

and once implemented in hardware

it can still manage memory for different polynomial multiplication algorithms. Experimental results show that

under the 55 nm CMOS process

completing polynomial multiplication with a dimension of 256 and modulus less than 2

requires only 0.785 μs. The maximum operating frequency can reach 476 MHz

with a power consumption of 83.6 mW and an area time product (ATP) of 152.604 kGE·μs. Compared to the existing research

the ATP value of the proposed structure in this paper is reduced by more than 40%.

关键词

Keywords

references

LYUBASHEVSKY V , MICCIANCIO D , PEIKERT C , et al . SWIFFT: A modest proposal for FFT hashing [C ] // International Workshop on Fast Software Encryption . Berlin : Springer , 2008 : 54 - 72 .

GÖTTERT N , FELLER T , SCHNEIDER M , et al . On the design of hardware building blocks for modern lattice-based encryption schemes [C ] // International Workshop on Cryptographic Hardware and Embedded Systems . Berlin : Springer , 2012 : 512 - 529 .

FRITZMANN T , SEPULVEDA J . Efficient and flexible low-power NTT for lattice-based cryptography [C ] // IEEE International Symposium on Hardware Oriented Security and Trust . Piscataway : IEEE , 2019 : 141 - 150 .

DU C H , BAI G Q . Towards efficient polynomial multiplication for lattice-based cryptography [C ] // IEEE International Symposium on Circuits and Systems . Piscataway : IEEE , 2016 : 1178 - 1181 .

SHOR P W . Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer [J ] . SIAM Journal on Computing , 1997 , 26 ( 5 ): 1484 - 1509 .

GÜNEYSU T , LYUBASHEVSKY V , PÖPPELMANN T . Practical lattice-based cryptography: A signature scheme for embedded systems [C ] // International Workshop on Cryptographic Hardware and Embedded Systems . Berlin : Springer , 2012 : 530 - 547 .

LIU W Q , FAN S L , KHALID A , et al . Optimized schoolbook polynomial multiplication for compact lattice-based cryptography on FPGA [J ] . IEEE Transactions on Very Large Scale Integration (VLSI) Systems , 2019 , 27 ( 10 ): 2459 - 2463 .

CHUNG C M M , HWANG V , KANNWISCHER M J , et al . NTT multiplication for NTT-unfriendly rings [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2021 , 2021 : 159 - 188 .

BERNSTEIN D J , SORENSON J P . Modular exponentiation via the explicit Chinese remainder theorem [J ] . Mathematics of Computation , 2007 , 76 ( 257 ): 443 - 455 .

CHEN X R , YANG B H , YIN S Y , et al . CFNTT: Scalable radix-2/4 NTT multiplication architecture with an efficient conflict-free memory mapping scheme [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2021 ( 1 ): 94 - 126 .

LI A B , LIU D S , LI X , et al . A flexible instruction-based post-quantum cryptographic processor with modulus reconfigurable arithmetic unit for module LWR&E [C ] // 2022 IEEE Asian Solid-State Circuits Conference (A-SSCC) . Piscataway : IEEE , 2022 : 1 - 3 .

XING Y , LI S . A compact hardware implementation of CCA-secure key exchange mechanism CRYSTALS-KYBER on FPGA [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2021 ( 2 ): 328 - 356 .

YAMAN F , MERT A C , ÖZTÜRK E , et al . A hardware accelerator for polynomial multiplication operation of CRYSTALS-KYBER PQC scheme [C ] // 2021 Design , Automation & Test in Europe Conference & Exhibition . Piscataway : IEEE , 2021 : 1020 - 1025 .

ZHAO Y , XIE R , XIN G , et al . A high-performance domain-specific processor with matrix extension of RISC-V for module-LWE applications [J ] . IEEE Transactions on Circuits and Systems I: Regular Papers , 2022 , 69 ( 7 ): 2871 - 2884 .

ZHOU Z , HE D , LIU Z , et al . A software/hardware co-design of crystals-dilithium signature scheme [J ] . ACM Transactions on Reconfigurable Technology and Systems , 2021 , 14 ( 2 ): 1 - 21 .

HU X , TIAN J , LI M H , et al . AC-PM: An area-efficient and configurable polynomial multiplier for lattice based cryptography [J ] . IEEE Transactions on Circuits and Systems I: Regular Papers , 2023 , 70 ( 2 ): 719 - 732 .

刘冬生 , 赵文定 , 刘子龙 , 等 . 应用于格密码的可重构多通道数论变换硬件设计 [J ] . 电子与信息学报 , 2022 , 44 ( 2 ): 566 - 572 .

LIU D S , ZHAO W D , LIU Z L , et al . Reconfigurable hardware design of multi-lanes number theoretic transform for lattice-based cryptography [J ] . Journal of Electronics & Information Technology , 2022 , 44 ( 2 ): 566 - 572 . (in Chinese)

SU Y , YANG B L , YANG C , et al . A highly unified reconfigurable multicore architecture to speed up NTT/INTT for homomorphic polynomial multiplication [J ] . IEEE Transactions on Very Large Scale Integration (VLSI) Systems , 2022 , 30 ( 8 ): 993 - 1006 .

DANG V B , MOHAJERANI K , GAJ K . High-speed hardware architectures and FPGA benchmarking of CRYSTALS-kyber, NTRU, and saber [J ] . IEEE Transactions on Computers , 2023 , 72 ( 2 ): 306 - 320 .

BANERJEE U , UKYAB T S , CHANDRAKASAN A P . Sapphire: A configurable crypto-processor for post-quantum lattice-based protocols [J ] . IACR Transactions on Cryptographic Hardware and Embedded Systems , 2019 2019 : 17 - 61 .

XIN G Z , HAN J , YIN T Y , et al . VPQC: A domain-specific vector processor for post-quantum cryptography based on RISC-V architecture [J ] . IEEE Transactions on Circuits and Systems I: Regular Papers , 2020 , 67 ( 8 ): 2672 - 2684 .

LI D , PAKALA A , YANG K Y . MeNTT: A compact and efficient processing-in-memory number theoretic transform (NTT) accelerator [J ] . IEEE Transactions on Very Large Scale Integration (VLSI) Systems , 2022 , 30 ( 5 ): 579 - 588 .

浏览量

下载量

CSCD

文章被引用时，请邮件提醒。

提交

工具集

关联资源

Crystal-Kyber算法的FPGA高效并行优化

基于2KNTT的多项式乘法单元设计