CN116820397B - Rapid number theory conversion circuit based on CRYSTALS-Kyber - Google Patents

Info

Publication number
CN116820397B
Authority
CN
China
Prior art keywords
butterfly
data
bram
unit
memories
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310594853.1A
Other languages
Chinese (zh)
Other versions
CN116820397A (en)
Inventor
张卓尧
崔益军
刘伟强
王成华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310594853.1A
Publication of CN116820397A
Application granted
Publication of CN116820397B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575 Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1408 Protection against unauthorised use of memory or access to memory by using cryptography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1416 Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber. A control unit supplies mode control signals to two butterfly units and four BRAM memories and, depending on the working mode, supplies read and write addresses to the four BRAM memories. Data are fed from the four BRAM memories into the butterfly units, whose mode is selected by the mode control signal from the control unit. A Barrett reduction circuit inside the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range, and the butterfly results are written back to the four BRAM memories in the order required by the NTT algorithm. The butterfly unit saves resources and can run at a high clock frequency, and the memory access scheme keeps the butterfly units fully utilized so that the transform takes few clock cycles.

Description

Rapid number theory conversion circuit based on CRYSTALS-Kyber
Technical Field
The invention belongs to the technical field of information security, and in particular relates to a fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber.
Background
The global economy is increasingly integrated, and Internet and information technologies are developing rapidly. Information is now exchanged constantly, and because it must be both timely and secure, it has become a key element of national security. Information security refers to the technical and administrative safeguards established for data processing systems so that computer hardware, software and data are not destroyed, altered or disclosed, whether by accident or by a malicious attacker in an unsafe environment.
Research in information security covers two broad areas, cryptography and cryptanalysis. Cryptography studies how to transmit information covertly; in modern times it refers in particular to the mathematical study of information and its transmission, is usually regarded as a branch of mathematics and computer science, and is closely related to information theory. Because it underlies almost all existing security mechanisms, cryptography is the foundation of information security. Cryptanalysis studies a cryptosystem in depth, analyzes its characteristics and looks for vulnerabilities that can be attacked; the same analysis also informs the design of corresponding defenses. The two disciplines complement each other.
The information theory pioneered by Shannon laid the theoretical foundation of modern cryptography. After decades of research, modern cryptosystems are divided into symmetric and asymmetric cryptosystems. The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES), which came into common use earlier, are symmetric cryptosystems in which encryption and decryption share the same key. Algorithms such as RSA and ECC, which experts have trusted in recent years and whose underlying mathematical problems can be cast as non-deterministic polynomial (NP) problems, are asymmetric cryptosystems. An asymmetric cryptosystem encrypts and decrypts with different keys (a public key and a private key), so the algorithm runs more slowly and consumes more power than a symmetric one, but security is better guaranteed. The root cause is that such NP problems are hard to solve, or take exponential time to solve, on a conventional computer.
Although existing cryptosystems are secure for the time being, Shor's algorithm and the rapid development of quantum computing pose a serious threat to them. A cryptographic chip is the carrier on which a cryptographic algorithm is implemented, and a hardware architecture is the most reliable and efficient way to implement a complete cryptographic scheme, so it plays an important role in evaluating the performance of that scheme. Compared with a software implementation, a hardware implementation offers high parallelism, strong flexibility and low cost, and is key to advancing the development and application of cryptosystems. Hardware implementations of traditional encryption schemes are mature, whereas research on hardware implementations of quantum-resistant post-quantum cryptography schemes has only just begun. Post-quantum cryptography schemes have therefore become a major research focus in cryptography.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber.
The circuit comprises two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories organized in two groups, the butterfly units being connected to the control unit;
the control unit supplies mode control signals to the two butterfly units and the four BRAM memories and, depending on the working mode, supplies read and write addresses to the four BRAM memories; data are fed from the four BRAM memories into the butterfly units, the butterfly mode is selected by the mode control signal from the control unit, a Barrett reduction circuit in the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range, and the butterfly results are written back to the four BRAM memories in the order required by the NTT algorithm.
Further, the two butterfly units, the control unit and the four BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients.
Further, the four BRAM memories temporarily store the intermediate data of the NTT; new data are fed to the butterfly units in every cycle, and each butterfly result is stored as soon as it is output; the memory access unit controls reads and writes separately so that read data and write data never conflict, and it uses a ping-pong access mode to sustain the throughput needed to read and write data at the same time.
Further, the butterfly unit is designed as a closed loop that exploits the characteristics of the CT and GS butterfly operations, with two sets of input and output ports so that it supports both; the parts of the butterfly unit can also be used separately to support point-wise multiplication.
Further, in the ping-pong access mode, each butterfly result is stored at a preset location ready to be read in the next round; after each round of butterfly operations, what was originally one set of data is split into two sets that are stored separately.
Further, the multiplication in the butterfly operation expands the original 12-bit data to 24 bits, and a Barrett reduction module that uses approximate computation reduces the 24-bit data back to the 12-bit range.
The fast NTT circuit based on CRYSTALS-Kyber uses the number theoretic transform (NTT) as the multiplication algorithm for polynomials over a ring. Butterfly units in CT mode compute the forward NTT and butterfly units in GS mode compute the inverse NTT, so the ring polynomial multiplication of lattice-based cryptography is implemented efficiently; the chosen NTT algorithm lowers the computational complexity and raises the clock frequency and computing speed of the overall design;
a mode-switchable NTT butterfly unit integrates two circuit functions in one computing unit: the mode control signal con and an input address selection signal select the mode, and the CT and GS butterflies share the same module, which reduces hardware resource consumption;
in the NTT circuit, one group of two dual-port BRAMs stores the 256 coefficients, and each BRAM word is wide enough to hold 2 coefficients; 4 coefficients can be read from the BRAMs in each cycle, which raises data throughput enough to feed the two butterfly units;
the Barrett reduction with approximate computation and the closed-loop butterfly unit greatly simplify the computation flow and the circuit, save resources and make retiming easier;
the memory access scheme is designed to keep the butterfly units at their maximum throughput, using ping-pong storage and separate read and write control, so that the number of cycles taken by one NTT is very close to the theoretical limit.
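The storage figures above can be illustrated with a small sketch. It assumes that each coefficient occupies 12 bits (q = 3329 is below 2^12), that two coefficients share one 24-bit word, and that one word is read from each BRAM of a group per cycle. The word layout and the names `pack_word`, `unpack_word` and `coefficients_per_cycle` are hypothetical and are not taken from the patent.

```python
Q = 3329            # CRYSTALS-Kyber modulus, from the text
COEFF_BITS = 12     # assumption: one coefficient fits in 12 bits because Q < 2**12
MASK = (1 << COEFF_BITS) - 1


def pack_word(lo: int, hi: int) -> int:
    """Pack two coefficients into one word (hypothetical layout: lo in the low 12 bits)."""
    assert 0 <= lo < Q and 0 <= hi < Q
    return (hi << COEFF_BITS) | lo


def unpack_word(word: int) -> tuple[int, int]:
    """Recover the two coefficients stored in one word."""
    return word & MASK, (word >> COEFF_BITS) & MASK


def coefficients_per_cycle(brams_read: int = 2, coeffs_per_word: int = 2) -> int:
    """Coefficients delivered per cycle if one word is read from each BRAM of a group."""
    return brams_read * coeffs_per_word
```

Under these assumptions the group delivers four coefficients per cycle, which is exactly the two operand pairs that two butterfly units consume.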
Drawings
In order to illustrate the technical solution of the invention more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a block diagram of the CRYSTALS-Kyber-based fast NTT circuit provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of data storage under the memory access scheme according to an embodiment of the invention;
FIG. 3 is a circuit diagram of the Barrett reduction unit with approximate computation according to an embodiment of the invention;
FIG. 4 is a circuit diagram of the closed-loop multifunctional butterfly unit according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
As shown in FIG. 1, an embodiment of the invention provides a fast NTT circuit based on CRYSTALS-Kyber, comprising two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories organized in two groups.
The control unit supplies mode control signals to the two butterfly units and the four BRAM memories and, depending on the working mode, supplies read and write addresses to the four BRAM memories. Data are fed from the four BRAM memories into the butterfly units, and the butterfly mode is selected by the mode control signal from the control unit. A Barrett reduction circuit in the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range, and the butterfly results are written back to the four BRAM memories in the order required by the NTT algorithm.
Illustratively, the two butterfly units, the control unit and the four BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients.
The four BRAM memories temporarily store the intermediate data of the NTT (seven stages of butterfly operations). To keep the butterfly units fully utilized, new data must be fed to them in every cycle, and each result must be stored as soon as it is output. The memory access unit therefore controls reads and writes separately so that read data and write data never conflict, and uses a ping-pong access mode to sustain the high throughput needed to read and write data at the same time. With two butterfly units, one NTT needs at least (7×128)/2 = 448 cycles. This design completes one NTT in 459 cycles; the additional 11 cycles are needed for data selection and for waiting on the butterfly pipeline, so the total is as close as possible to the theoretical limit.
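The cycle figures in this paragraph follow from simple arithmetic, sketched below. The function name `ideal_ntt_cycles` and the utilization figure are illustrative additions; only the values 7, 128, 2, 448, 459 and 11 come from the text.

```python
def ideal_ntt_cycles(n: int = 256, stages: int = 7, butterfly_units: int = 2) -> int:
    """Lower bound on cycles: every stage performs n/2 butterflies, shared across the units."""
    butterflies = stages * (n // 2)
    return butterflies // butterfly_units


REPORTED_CYCLES = 459                     # figure stated in the text
ideal = ideal_ntt_cycles()                # (7 * 128) / 2
overhead = REPORTED_CYCLES - ideal        # cycles spent on data selection and pipeline waiting
utilization = ideal / REPORTED_CYCLES     # fraction of cycles in which the butterflies do useful work
print(ideal, overhead, round(utilization, 3))
```

Read this way, the butterfly units would be busy in roughly 97.6% of the cycles of one transform.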
The butterfly unit is designed as a closed loop that exploits the characteristics of the CT and GS butterfly operations, with two sets of input and output ports so that it supports both; its parts can also be used separately to support point-wise multiplication.
As shown in FIG. 2, in the ping-pong access mode the operands of each round of the NTT differ from those of the previous round. So that reads and writes can run continuously, each butterfly result must be stored at a preset location ready to be read in the next round. In this embodiment, after each round of butterfly operations, what was originally one set of data is split into two sets that are stored separately.
A Barrett reduction unit is introduced because the multiplication in the butterfly operation expands the original 12-bit data to 24 bits, which must be reduced back to the 12-bit range. Among the widely used reduction algorithms, Barrett reduction is the most suitable, but a conventional Barrett reduction consumes considerable resources in a hardware circuit. The Barrett reduction used in this embodiment, guided by a precise error analysis, introduces approximate computation: it lowers the precision of the earlier part of the calculation to save resources and recovers the exact value with a simple compensation step before output. The circuit is shown in FIG. 3; the first two stages perform the approximate calculation and the third stage performs the compensation.
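The estimate-then-compensate structure described here can be modelled in software as follows. This is a minimal sketch of textbook Barrett reduction, not the circuit of FIG. 3: the shift amount K = 24 and the constant M = floor(2^24 / q) are assumptions, and the patent's circuit replaces the multiplications with additions and subtractions using constants that are not reproduced in this text.

```python
Q = 3329                  # modulus from the text
K = 24                    # assumption: shift amount matching the 24-bit product width
M = (1 << K) // Q         # assumption: textbook Barrett constant floor(2**K / Q)


def barrett_reduce(a: int) -> int:
    """Reduce a 24-bit product a (0 <= a < 2**24) to the range [0, Q)."""
    assert 0 <= a < (1 << K)
    t_approx = (a * M) >> K      # stage 1: approximate quotient, never above floor(a / Q)
    r = a - t_approx * Q         # stage 2: coarse remainder, lies in [0, 2Q)
    if r >= Q:                   # stage 3: compensation by a single conditional subtraction
        r -= Q
    return r
```

With these constants the approximate quotient is at most one below the true quotient, which is why a single conditional subtraction is enough to reach the exact result.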
To support both kinds of butterfly operation, the butterfly unit is designed as a closed loop. CT and GS butterflies are typically used for the forward and inverse NTT respectively, but the essential difference between them is the order of operations. In this embodiment the circuit forms a closed loop in which data enter and results leave at different circuit nodes, so both computation orders are realized in a single data loop; this simplifies the circuit, saves resources and allows a higher operating frequency. As shown in FIG. 4, the node before the multiplier is the input node for the CT butterfly, and the node before the modular adder and modular subtractor is the input node for the GS butterfly.
The GS butterfly reverses the CT butterfly but does not by itself fully recover the data, so an additional post-processing step is normally required after the inverse NTT. This embodiment introduces a DIV2 unit that lets the GS operation recover the data completely, eliminating that post-processing step.
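The relationship between the two butterflies can be sketched in software. The sketch assumes that the DIV2 unit halves a value modulo q, which is one plausible reading of the text rather than a confirmed detail of FIG. 4; the function names are hypothetical.

```python
Q = 3329  # modulus from the text


def div2(x: int) -> int:
    """Halve x modulo the odd modulus Q (assumed behaviour of the DIV2 unit)."""
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1


def ct_butterfly(a: int, b: int, w: int) -> tuple[int, int]:
    """Cooley-Tukey order: multiply first, then modular add and subtract."""
    t = (b * w) % Q
    return (a + t) % Q, (a - t) % Q


def gs_butterfly_div2(u: int, v: int, w_inv: int) -> tuple[int, int]:
    """Gentleman-Sande order with halving: add and subtract first, then multiply."""
    s = div2((u + v) % Q)
    d = div2((u - v) % Q)
    return s, (d * w_inv) % Q
```

For example, with w = 17 and `w_inv = pow(17, -1, Q)`, calling `gs_butterfly_div2(*ct_butterfly(a, b, 17), w_inv)` returns `(a, b)` again. Without the halving, each GS stage would leave a factor of 2 on both outputs, which would have to be removed by a scaling step after the inverse transform.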
The operation of the circuit is described below function by function, taking the forward NTT phase as the main example:
1) Forward NTT phase:
As shown in FIG. 4, the control unit sets the control signal con = 0 and the two butterfly units operate in parallel in CT mode; addresses are generated as shown in FIG. 2. Of a set of 256 data, the first 128 are initially placed in RAM0 and the last 128 in RAM1, and in the first round a_0 to a_127 are paired with a_128 to a_255 respectively for CT butterfly operations. In cycle 1 the control unit generates a read address. In cycle 2 the data (a_0, a_1) and (a_128, a_129) are fetched from the BRAMs. In cycle 3 data selection takes place: choosing the memory group that supplies the data (here RAM0 and RAM1) and the nodes to which that group's data are connected (here RAM0 feeds the low node and RAM1 feeds the high node). In cycle 4 the high-node data are multiplied by the twiddle factor ω supplied by the ROM, and over the next 3 cycles Barrett reduction brings the product back to the 12-bit range. In cycles 8 and 9 the Barrett output and the low-node input undergo modular addition and modular subtraction. In cycle 10 the two butterfly units output the results for the first set of data and the control unit supplies a write address. In cycle 11 the first set of butterfly results is stored. New data are written in every subsequent cycle, and since there are (7×128)/2 = 448 sets of data in total, the last set is stored in cycle 459, completing the NTT.
Note, taking the data storage of the first round as an example, that the second round performs CT butterfly operations between a_0 to a_63 and a_64 to a_127, and between a_128 to a_191 and a_192 to a_255. The original a_0 to a_127 and a_128 to a_255 must therefore be split and stored separately for the next read. As shown in FIG. 2, after the first round a_0 to a_63 and a_64 to a_127 are stored in RAM2 and RAM3 respectively. Every subsequent round reads and writes data according to the same rule.
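The pairing rule described for the first two rounds can be written out as a short sketch. The function `butterfly_pairs` is a hypothetical helper that lists which coefficient indices meet in each round; it models only the index pattern, not the BRAM addresses of FIG. 2.

```python
def butterfly_pairs(round_index: int, n: int = 256) -> list[tuple[int, int]]:
    """Index pairs (low, high) combined in the given round (1-based) of an n-point transform."""
    half = n >> round_index              # distance between paired coefficients: 128, 64, ...
    pairs = []
    for start in range(0, n, 2 * half):  # each block of 2*half coefficients is independent
        for j in range(start, start + half):
            pairs.append((j, j + half))
    return pairs
```

Round 1 yields the pairs (0, 128) through (127, 255), and round 2 yields (0, 64) through (63, 127) followed by (128, 192) through (191, 255), which is why each half written in one round has to be split again before the next.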
2) Inverse NTT phase:
The control unit sets the control signal con = 1, and the two butterfly units operate in parallel in GS mode. The butterfly operations run as the exact reverse of the forward process, and data are read and written in the reverse order as well, following the layout of FIG. 2 from right to left.
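The statement that the inverse phase exactly reverses the forward phase can be checked with a software model. The sketch below assumes the root of unity 17 and bit-reversed twiddle ordering used in common Kyber software, and assumes that halving is applied in every GS round; none of these details is confirmed by the patent text, and the code does not model the BRAM schedule.

```python
Q, N, ZETA = 3329, 256, 17   # Q and N from the text; ZETA = 17 is an assumption


def bitrev7(i: int) -> int:
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


# Twiddle factors in bit-reversed order (assumed ordering, as in common Kyber software).
ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def div2(x: int) -> int:
    """Halve x modulo the odd modulus Q."""
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1


def forward_ntt(coeffs: list[int]) -> list[int]:
    """Seven rounds of CT butterflies (distance 128, 64, ..., 2)."""
    r = list(coeffs)
    for length in (128, 64, 32, 16, 8, 4, 2):
        for start in range(0, N, 2 * length):
            w = ZETAS[N // (2 * length) + start // (2 * length)]
            for j in range(start, start + length):
                t = (w * r[j + length]) % Q
                r[j], r[j + length] = (r[j] + t) % Q, (r[j] - t) % Q
    return r


def inverse_ntt(values: list[int]) -> list[int]:
    """Undo forward_ntt with GS butterflies, halving in every round instead of a final scaling."""
    r = list(values)
    for length in (2, 4, 8, 16, 32, 64, 128):
        for start in range(0, N, 2 * length):
            w_inv = pow(ZETAS[N // (2 * length) + start // (2 * length)], -1, Q)
            for j in range(start, start + length):
                u, v = r[j], r[j + length]
                r[j] = div2((u + v) % Q)
                r[j + length] = (div2((u - v) % Q) * w_inv) % Q
    return r
```

In this model `inverse_ntt(forward_ntt(x))` returns `x` for any list of 256 values in the range 0 to q − 1, with the rounds visited in the opposite order and no scaling step at the end.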
The following appendix presents an error analysis of the Barrett reduction algorithm with approximate computation:
In Algorithm 1, an estimated quotient t is first computed, then a coarse reduced value r is obtained, and finally the exact reduced value is reached quickly with a single comparison. In a circuit, however, DSP multipliers are costly in both resources and time, so this embodiment introduces approximate computation: the original multiplications are implemented with additions and subtractions, and once an estimate has been obtained, the exact value is recovered by compensating for the difference.
According to Algorithm 1, the quotient t and the approximate quotient are given by the following formulas: [formulas not reproduced]
By a property that holds for any numbers A and B [inequality not reproduced], the following error analysis can be carried out: [equation not reproduced]
where [symbol not reproduced] is an error function of the argument k. Since the error e must be an integer, [equation not reproduced].
Particular attention must be paid to the choice of m. If [expression not reproduced], the error analysis gives e ∈ {0, 1}. To simplify the circuit, however, this design selects [expression not reproduced] on the basis of the analysis, and verification shows e ∈ {−1, 0}. As the analysis above indicates, changing the selected value of m still allows the approximate quotient to be corrected to the exact quotient with a single addition or subtraction.
In addition, the first and second steps of Algorithm 2 replace the multiplications of Algorithm 1 with additions and subtractions. The first step also applies approximate computation by discarding the fractional bits, which reduces the operand width by 20 bits but introduces an additional error.
It is easy to see that e_1 falls within the range −1 to 1; combined with the analysis above, the final error is e_f ∈ [−2, 1]. In the second step of the algorithm, therefore, only 2 additional bits need to be computed to determine the compensation value q_mux, and the exact reduced value res is then obtained with a single addition.
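How the choice of m shifts the quotient error can be explored numerically. The sketch below is an illustration under stated assumptions: the shift K = 24 and the two constants 5039 (2^24 / q rounded down) and 5040 (rounded up) are chosen for the example and are not the values given in the patent, whose formulas are not reproduced in this text. The error is taken as the true quotient minus the approximate quotient.

```python
Q, K = 3329, 24   # Q from the text; K = 24 is an assumed shift amount


def quotient_errors(m: int, values) -> set[int]:
    """Set of differences e = floor(a / Q) - ((a * m) >> K) over the given inputs."""
    return {a // Q - ((a * m) >> K) for a in values}


MAX_PRODUCT = (Q - 1) ** 2                            # largest product of two reduced operands
samples = list(range(0, MAX_PRODUCT + 1, 997)) + [Q, 6 * Q - 1, MAX_PRODUCT]

m_floor = (1 << K) // Q          # 5039, rounds 2**K / Q down
m_ceil = m_floor + 1             # 5040 = 2**12 + 2**10 - 2**6 - 2**4, a shift-and-add friendly value

print(quotient_errors(m_floor, samples))   # estimate never exceeds the true quotient
print(quotient_errors(m_ceil, samples))    # estimate never falls below the true quotient
```

On these samples the rounded-down constant gives errors in {0, 1} and the rounded-up constant gives errors in {−1, 0}, which mirrors the pattern described in the text: either way the approximate quotient is within one of the exact quotient and can be corrected with a single addition or subtraction.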
The invention has been described in detail in connection with the specific embodiments and exemplary examples thereof, but such description is not to be construed as limiting the invention. It will be understood by those skilled in the art that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, and these fall within the scope of the present invention. The scope of the invention is defined by the appended claims.

Claims (1)

1. A fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber, characterized by comprising two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories organized in two groups;
the control unit supplies mode control signals to the two butterfly units and the four BRAM memories and, depending on the working mode, supplies read and write addresses to the four BRAM memories; data are fed from the four BRAM memories into the butterfly units, the butterfly mode is selected by the mode control signal from the control unit, a Barrett reduction circuit in the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range, and the butterfly results are written back to the four BRAM memories in the order required by the NTT algorithm;
wherein the two butterfly units, the control unit and the four BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients;
the four BRAM memories temporarily store the intermediate data of the NTT; new data are fed to the butterfly units in every cycle, and each butterfly result is stored as soon as it is output; the memory access unit controls reads and writes separately so that read data and write data never conflict, and uses a ping-pong access mode to sustain the throughput needed to read and write data at the same time;
the butterfly unit is designed as a closed loop that exploits the characteristics of the CT and GS butterfly operations, with two sets of input and output ports so that it supports both; the parts of the butterfly unit can be used separately to support point-wise multiplication;
in the ping-pong access mode, each butterfly result is stored at a preset location ready to be read in the next round; after each round of butterfly operations, what was originally one set of data is split into two sets that are stored separately;
the multiplication in the butterfly operation expands the original 12-bit data to 24 bits, and a Barrett reduction module that uses approximate computation reduces the 24-bit data back to the 12-bit range.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310594853.1A CN116820397B (en) 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310594853.1A CN116820397B (en) 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber

Publications (2)

Publication Number Publication Date
CN116820397A (en) 2023-09-29
CN116820397B (en) 2024-02-02

Family

ID=88119387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310594853.1A Active CN116820397B (en) 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber

Country Status (1)

Country Link
CN (1) CN116820397B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115756386A (en) * 2022-10-26 2023-03-07 Nanjing University of Aeronautics and Astronautics Efficient Lightweight NTT Multiplier Circuit Based on Lattice Cipher
CN115756387A (en) * 2022-09-20 2023-03-07 Hangzhou Dianzi University NTT hardware realization method of R2-MDC architecture based on folding transformation
CN115801226A (en) * 2022-11-02 2023-03-14 Wuhan Yixin Microelectronics Co., Ltd. CRYSTALS-KYBER safety processor adopting post-quantum cryptography algorithm
WO2023060809A1 (en) * 2021-10-11 2023-04-20 Suzhou Inspur Intelligent Technology Co., Ltd. Number theoretic transforms computation circuit and method, and computer device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11416638B2 (en) * 2019-02-19 2022-08-16 Massachusetts Institute Of Technology Configurable lattice cryptography processor for the quantum-secure internet of things and related techniques
US11614945B2 (en) * 2019-11-27 2023-03-28 EpiSys Science, Inc. Apparatus and method of a scalable and reconfigurable fast fourier transform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
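Design and FPGA implementation of a 256-point radix-4 IFFT module in OFDM systems; Liu Zhen; Lu Yan; Ma Yu; Wan Jun; Guangdong Communication Technology (No. 01); full text *
Research on FPGA-based number theoretic transform algorithms and their applications; Yu Hancheng; Wang Chenghua; Shao Jie; Xia Yongjun; Microcomputer Information (No. 32); full text *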

Also Published As

Publication number Publication date
CN116820397A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
Zhang et al. Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT
US11416638B2 (en) Configurable lattice cryptography processor for the quantum-secure internet of things and related techniques
Fritzmann et al. Efficient and flexible low-power NTT for lattice-based cryptography
Land et al. A hard crystal-implementing dilithium on reconfigurable hardware
Alrimeih et al. Fast and flexible hardware support for ECC over multiple standard prime fields
CN103226461B (en) A kind of Montgomery modular multiplication method for circuit and circuit thereof
KR20230141045A (en) Crypto-processor Device and Data Processing Apparatus Employing the Same
CN106685663A (en) Encryption method and circuit for error learning problem on ring domain
CN108959168B (en) SHA512 full pipeline circuit based on on-chip memory and its realization method
CN113467750A (en) Large integer bit width division circuit and method for SRT algorithm with radix of 4
Xu et al. : A Memory-Efficient Tri-Stage Polynomial Multiplication Accelerator Using 2D Coupled-BFUs
Elkhatib et al. Accelerated RISC-V for post-quantum SIKE
CN113342310A (en) Serial parameter configurable fast number theory transformation hardware accelerator applied to lattice password
CN116432765A (en) RISC-V-based special processor for post quantum cryptography algorithm
JP2002229445A (en) Modulator exponent device
CN111079934B (en) Number Theoretical Transformation Unit and Method Applied to Error Learning Encryption Algorithm in Ring Domain
CN116820397B (en) Rapid number theory conversion circuit based on CRYSTALS-Kyber
CN114594925B (en) Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof
CN1696894B (en) Large number modular multiplication calculation multiplier
CN116886274B (en) High-efficiency application type polynomial operation circuit applied to CRYSTALS-Kyber
Gauri et al. Design and Implementation of a Fully Pipelined and Parameterizable Hardware Accelerator for BLAKE2 Cryptographic Hash Function in FPGA
Huynh et al. An efficient cryptographic accelerators for IoT system based on elliptic curve digital signature
CN114172629A (en) High-performance fully-homomorphic encryption processor circuit based on RLWE encryption scheme
WO2023000577A1 (en) Data compression method and apparatus, electronic device, and storage medium
Duan et al. SMBHA: A System-Level Multicore BGV Hardware Accelerator Based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant