CN116820397B - Rapid number theory conversion circuit based on CRYSTALS-Kyber - Google Patents
- Publication number
- CN116820397B
- Authority
- CN
- China
- Prior art keywords
- data
- butterfly
- bram
- unit
- memories
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/57—Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
- G06F7/575—Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1408—Protection against unauthorised use of memory or access to memory by using cryptography
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1416—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides a fast number-theoretic transform (NTT) circuit based on CRYSTALS-Kyber. A control unit supplies mode control signals to two butterfly units and four BRAM memories, and generates read and write addresses for the four BRAM memories according to the working mode. Data are fed from the four BRAM memories into the butterfly units, and the butterfly mode is selected by the mode control signal from the control unit. A Barrett reduction circuit inside the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range. Once a butterfly result is available, it is written back into the four BRAM memories in the order required by the NTT algorithm. The butterfly unit saves resources and can run at a high clock frequency, and the memory access scheme exploits the computing power of the butterfly units to the greatest extent, so the transform occupies few cycles.
Description
Technical Field
The invention belongs to the technical field of information security, and in particular relates to a fast number-theoretic transform (NTT) circuit based on CRYSTALS-Kyber.
Background
The global economy is increasingly integrated, and Internet and information technologies are developing rapidly. Information is now exchanged constantly, and because it must be both timely and secure, it has become a key element of national security. Information security refers to the technical and administrative safeguards established for data processing systems in order to protect computer hardware, software, and data from being destroyed, altered, or disclosed, whether by accident or by a malicious attacker in an unsafe environment.
The two main research areas in information security are cryptography and cryptanalysis. Cryptography studies how to transmit information covertly; in modern usage it refers to the mathematical study of information and its transmission, is usually regarded as a branch of mathematics and computer science, and is closely related to information theory. Because it underlies almost all existing security mechanisms, cryptography is the foundation of information security. Cryptanalysis studies a cryptosystem in depth, analyzes its characteristics, and searches for vulnerabilities that can be attacked; the same analysis also guides the design of the corresponding defenses. The two disciplines complement each other.
Information theory, pioneered by Shannon, laid the theoretical foundation of modern cryptography. After decades of research and development, modern cryptosystems fall into two classes: symmetric and asymmetric. The long-used Data Encryption Standard (DES) and Advanced Encryption Standard (AES) are symmetric cryptosystems, in which encryption and decryption share the same key. Algorithms such as RSA and ECC, which experts have trusted in recent years and whose underlying mathematical problems are characterized here as non-deterministic polynomial (NP) problems, are asymmetric cryptosystems. An asymmetric cryptosystem encrypts and decrypts with different keys (a public key and a private key); compared with a symmetric cryptosystem it runs more slowly and consumes more power, but it offers stronger security guarantees. The underlying reason is that such problems are hard to solve on a classical computer, or take exponential time to break.
Although existing cryptosystems are secure for the time being, Shor's algorithm and the rapid development of quantum computing pose a serious threat to them. A cryptographic chip is the carrier that implements a cryptographic algorithm, and a hardware architecture is the most reliable and efficient way to implement a complete cryptographic scheme, so it plays an important role in evaluating the performance of a scheme. Compared with a software implementation, a hardware implementation offers high parallelism, strong flexibility, and low cost, and is key to advancing the development and adoption of a cryptosystem. Hardware implementations of traditional encryption schemes are mature, whereas research on hardware implementations of quantum-resistant post-quantum cryptography schemes has only just begun. Post-quantum cryptography schemes have therefore become a major research focus in cryptography.
Disclosure of Invention
In view of the deficiencies in the prior art, the invention provides a CRYSTALS-Kyber-based fast number-theoretic transform (NTT) circuit.
The circuit comprises two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories arranged in two groups;
the control unit supplies mode control signals to the two butterfly units and the four BRAM memories, and generates read and write addresses for the four BRAM memories according to the working mode; data are fed from the four BRAM memories into the butterfly units, the butterfly mode is selected by the mode control signal from the control unit, a Barrett reduction circuit inside the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range, and once a butterfly result is available it is written back into the four BRAM memories in the order required by the NTT algorithm.
Further, the butterfly units, the control unit, and the four BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients.
Further, the four dual-port BRAM memories temporarily store the intermediate data of the NTT; new data are fed to the butterfly units in every cycle, and every butterfly result is stored as soon as it is output; the memory access unit controls reads and writes separately, so that reads and writes do not conflict, and uses ping-pong memory access to sustain the throughput needed to read and write data simultaneously.
Further, based on the characteristics of the Cooley-Tukey (CT) and Gentleman-Sande (GS) butterfly operations, the butterfly unit is built as a closed loop with two sets of input and output ports, so that it supports both CT and GS butterflies; parts of the butterfly unit can also be used on their own to support point-wise multiplication.
Further, when ping-pong memory access is used, butterfly results are stored at preset locations so that they are ready to be read in the next round; after each round of butterfly operations, each original group of data is split into two groups that are stored separately.
Further, the multiplication in the butterfly operation expands 12-bit operands into a 24-bit product, which is reduced back to the 12-bit range by a Barrett reduction module that uses approximate computation.
The invention provides a CRYSTALS-Kyber-based fast number-theoretic transform circuit. It uses the number-theoretic transform (NTT) as the algorithm for polynomial multiplication over a ring, computing the forward NTT with butterfly units in CT mode and the inverse NTT with butterfly units in GS mode, so that ring polynomial multiplication for lattice-based cryptography is implemented efficiently; the chosen NTT algorithm reduces computational complexity and raises the clock frequency and computation speed of the overall design;
a mode-switchable NTT butterfly unit integrates two circuit functions in one computation unit; the mode is selected by the mode control signal con and an input address selection signal, and integrating the CT and GS butterflies in the same module reduces hardware resource consumption;
in the NTT circuit, each group of two dual-port BRAMs stores 256 data items, and each BRAM word is wide enough to hold 2 data items; 4 data items can therefore be read out of the BRAMs in every cycle, which raises data throughput enough to feed both butterfly units;
the Barrett reduction with approximate computation and the closed-loop butterfly unit greatly simplify the computation flow and the circuit complexity, save resources, and make retiming easier;
the memory access scheme is designed around keeping the butterfly units at full utilization; by using ping-pong storage and separate read/write control, the number of cycles occupied by one NTT comes very close to the theoretical limit.
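To make the forward (CT) and inverse (GS) transforms named above concrete, the following Python sketch models a seven-layer NTT for q = 3329 and n = 256 in software. It is a behavioral model only, not the circuit described here. The root of unity 17, the bit-reversed twiddle order, and the final scaling by the inverse of 128 follow the public Kyber specification and are assumptions that this document does not state; all function names are hypothetical.

```python
Q = 3329      # Kyber modulus (from the text)
N = 256       # number of coefficients (from the text)
ZETA = 17     # assumption: primitive 256th root of unity used by Kyber


def bitrev7(x):
    """Reverse the 7-bit binary representation of x."""
    return int(format(x, "07b")[::-1], 2)


# Assumption: twiddle factors stored in bit-reversed order, as in Kyber.
ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def ntt(coeffs):
    """Forward transform: seven layers of Cooley-Tukey butterflies."""
    a = list(coeffs)
    k, length = 1, 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            z = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                t = z * a[j + length] % Q
                a[j + length] = (a[j] - t) % Q
                a[j] = (a[j] + t) % Q
        length >>= 1
    return a


def intt(coeffs):
    """Inverse transform: seven layers of Gentleman-Sande butterflies."""
    a = list(coeffs)
    k, length = 127, 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            z = ZETAS[k]
            k -= 1
            for j in range(start, start + length):
                t = a[j]
                a[j] = (t + a[j + length]) % Q
                a[j + length] = z * (a[j + length] - t) % Q
        length <<= 1
    scale = pow(128, -1, Q)   # each of the seven layers doubled the values
    return [x * scale % Q for x in a]
```

In this model the factor of 128 is removed in one final scaling pass; a design that halves inside every GS butterfly, as the DIV2 unit described later does, would not need that pass.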
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a block diagram of a CRYSTALS-Kyber based fast number theory conversion circuit provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of memory access data storage according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a Barrett reduction unit circuit with approximate computation according to an embodiment of the present invention;
FIG. 4 is a circuit diagram of a closed-loop multifunctional butterfly operation unit according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, an embodiment of the invention provides a CRYSTALS-Kyber-based fast number-theoretic transform (NTT) circuit comprising two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories arranged in two groups.
The control unit supplies mode control signals to the two butterfly units and the four BRAM memories, and generates read and write addresses for the four BRAM memories according to the working mode. Data are fed from the four BRAM memories into the butterfly units, and the butterfly mode is selected by the mode control signal from the control unit. A Barrett reduction circuit inside the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range. Once a butterfly result is available, it is written back into the four BRAM memories in the order required by the NTT algorithm.
Illustratively, the butterfly units, the control unit, and the four BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients.
The four dual-port BRAM memories temporarily store the intermediate data of the NTT (seven stages of butterfly operations). To keep the butterfly units fully utilized, new data must be fed to them in every cycle, and each butterfly result must be stored as soon as it is output. The memory access unit therefore controls reads and writes separately, so that reads and writes do not conflict, and uses ping-pong memory access to sustain the higher throughput needed to read and write data at the same time. With two butterfly units, one NTT requires at least (7×128)/2 = 448 cycles. This design completes one NTT in 459 cycles; the additional 11 cycles are spent on the necessary data selection and on waiting for the butterfly units to compute, which brings the cycle count as close as possible to the theoretical limit.
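The cycle figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (seven stages, 256 coefficients, two butterfly units, 459 reported cycles); the function name and the derived utilization ratio are illustrative additions.

```python
def ntt_cycle_budget(n=256, stages=7, butterfly_units=2, reported_cycles=459):
    """Compare the reported cycle count with the lower bound from the text."""
    butterflies_per_stage = n // 2                    # 128 butterflies per stage
    min_cycles = stages * butterflies_per_stage // butterfly_units
    overhead = reported_cycles - min_cycles           # pipeline fill and data selection
    utilization = min_cycles / reported_cycles        # fraction of cycles doing butterflies
    return min_cycles, overhead, utilization


min_cycles, overhead, utilization = ntt_cycle_budget()
print(min_cycles, overhead, round(utilization, 3))
```

Under these figures the butterfly units are busy for roughly 97.6% of the transform.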
Based on the characteristics of the CT and GS butterfly operations, the butterfly unit is built as a closed loop with two sets of input and output ports, so that it supports both CT and GS butterflies; parts of the butterfly unit can also be used on their own to support point-wise multiplication.
As shown in FIG. 2, the operands of each round of the NTT differ from those of the previous round. For reading and writing to proceed continuously under ping-pong memory access, the butterfly results must therefore be stored at preset locations, ready to be read in the next round. In this embodiment, after each round of butterfly operations each original group of data is split into two groups that are stored separately.
A Barrett reduction unit is introduced. The multiplication in the butterfly operation expands 12-bit operands into a 24-bit product, which must be reduced back to the 12-bit range. Among the widely used reduction algorithms, Barrett reduction is the most suitable, but the conventional Barrett algorithm consumes many resources in a hardware design. The Barrett reduction used in this embodiment, guided by a precise error analysis, introduces approximate computation: it lowers the precision of the early stages to save resources and recovers the exact value with a simple compensation step before output. The circuit is shown in FIG. 3; the first two stages perform the approximate computation, and the third stage performs the compensation.
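For reference, the conventional Barrett reduction that the embodiment starts from can be sketched as follows. This is the textbook multiply-based form, not the shift-and-add approximate variant of FIG. 3. The shift width k = 24 and the constant m = floor(2^24 / q) are assumptions chosen to match the 24-bit product, and the function name is hypothetical.

```python
Q = 3329               # Kyber modulus (from the text)
K = 24                 # assumption: shift width equal to the product width
M = (1 << K) // Q      # precomputed floor(2^24 / Q) = 5039


def barrett_reduce(a):
    """Reduce a 24-bit value a to the range [0, Q) without a division."""
    if not 0 <= a < (1 << K):
        raise ValueError("input must fit in 24 bits")
    t = (a * M) >> K   # quotient estimate, never larger than a // Q
    r = a - t * Q      # rough remainder, in [0, 2Q) for 24-bit inputs
    if r >= Q:         # a single conditional subtraction finishes the job
        r -= Q
    return r
```

With these constants the quotient estimate is low by at most one for every 24-bit input, so one conditional subtraction is enough; that is the behavior the approximate variant has to reproduce with cheaper arithmetic.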
To accommodate both kinds of butterfly operation, the butterfly unit is designed as a closed loop. CT and GS butterflies are typically used for the forward and inverse NTT respectively, but the essential difference between them is the order of operations. In this embodiment the circuit forms a closed loop, and data are injected and results extracted at different circuit nodes, so that both operation orders are realized in a single data loop; this simplifies the circuit, saves resources, and allows a higher operating frequency. As shown in FIG. 4, the node before the multiplication is the input node for the CT butterfly, and the node before the modular addition and subtraction is the input node for the GS butterfly.
The GS butterfly is the inverse of the CT butterfly, but on its own it does not fully recover the data, so an additional post-processing step is normally required after the inverse NTT. This embodiment introduces a DIV2 unit that lets the GS operation recover the data completely, eliminating that post-processing step.
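The difference in operation order between the two butterflies, and the effect of halving inside the GS butterfly, can be sketched in software as follows. This is a behavioral illustration under stated assumptions: the function names are hypothetical, and the way halving is done (adding q to odd values before shifting) is one common way to divide by 2 modulo an odd q, not necessarily how the DIV2 unit in FIG. 4 is wired.

```python
Q = 3329  # Kyber modulus (from the text)


def ct_butterfly(a, b, w):
    """Cooley-Tukey order: multiply first, then add and subtract."""
    t = (w * b) % Q
    return (a + t) % Q, (a - t) % Q


def div2(x):
    """Halve x modulo the odd modulus Q, for x in [0, Q)."""
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1


def gs_butterfly(u, v, w_inv, halve=True):
    """Gentleman-Sande order: add and subtract first, then multiply."""
    s = (u + v) % Q
    d = ((u - v) * w_inv) % Q
    if halve:                  # with halving, GS exactly undoes CT
        s, d = div2(s), div2(d)
    return s, d


w = 17                         # example twiddle factor (assumption)
w_inv = pow(w, -1, Q)          # its modular inverse
u, v = ct_butterfly(1234, 567, w)
print(gs_butterfly(u, v, w_inv))
```

Without the halving step the GS butterfly returns twice each original value modulo q, which is the factor that otherwise has to be removed by a scaling pass after the inverse transform.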
The operation of the circuit is described below phase by phase, beginning with the forward NTT phase:
1) Forward NTT phase:
As shown in FIG. 4, the control unit drives the control signal con = 0 and the two butterfly units operate in parallel in CT mode; the address generation scheme is shown in FIG. 2. Of a group of 256 data items, the first 128 are initially placed in RAM0 and the last 128 in RAM1, and in the first round CT butterflies are performed between a_0 to a_127 and a_128 to a_255 respectively. In cycle 1 the control unit generates the read address. In cycle 2 the data (a_0, a_1) and (a_128, a_129) are fetched from the BRAMs. Cycle 3 performs data selection, which consists of selecting the memory group that supplies the data (RAM0 and RAM1 in this round) and selecting the node to which each memory is connected (in this round RAM0 feeds the low node and RAM1 feeds the high node). In cycle 4 the high-node data are multiplied by the twiddle factor ω supplied by the ROM, and over the next 3 cycles the product is reduced back to the 12-bit range by Barrett reduction. In cycles 8 and 9 the Barrett output and the low-node input undergo modular addition and modular subtraction. The two butterfly units output the results for the first group of data in cycle 10, and the control unit supplies the write address in the same cycle. Cycle 11 completes the storage of the first group of butterfly results, and new data are written in every subsequent cycle. Since there are (7×128)/2 = 448 groups of data in total, the last group is stored in cycle 459, which completes the NTT.
Taking the storage of the first-round data as an example: the second round performs CT butterflies between a_0 to a_63 and a_64 to a_127, and between a_128 to a_191 and a_192 to a_255. The original a_0 to a_127 and a_128 to a_255 must therefore be split when they are stored, ready for the next read. As shown in FIG. 2, after the first round a_0 to a_63 and a_64 to a_127 are stored in RAM2 and RAM3 respectively. Every subsequent round reads and writes data according to the same rule.
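The operand pairing and the split storage described above can be modeled with simple index arithmetic. The sketch below is an interpretation, not the address generator of FIG. 2: it assumes that the two memory groups alternate every round (ping-pong), that the low operand of every butterfly sits in the first RAM of a group and the high operand in the second, and that a_128 to a_255 follow the same split rule as a_0 to a_127. The function names are hypothetical.

```python
N = 256       # number of coefficients (from the text)
ROUNDS = 7    # seven stages of butterflies (from the text)


def operand_pairs(rnd):
    """Index pairs (low, high) combined by butterflies in round rnd (0-based)."""
    dist = (N // 2) >> rnd                 # 128, 64, 32, ..., 2
    return [(i, i + dist) for i in range(N) if (i // dist) % 2 == 0]


def ram_of(index, rnd):
    """RAM number (0..3) assumed to hold coefficient `index` when round rnd starts."""
    dist = (N // 2) >> rnd
    group = rnd % 2                        # assumption: groups alternate each round
    bank = (index // dist) % 2             # low operands in bank 0, high in bank 1
    return 2 * group + bank


# Round 0 reads from RAM0/RAM1; round 1 reads what was split into RAM2/RAM3.
print(operand_pairs(0)[0], ram_of(0, 0), ram_of(128, 0))
print(operand_pairs(1)[0], ram_of(0, 1), ram_of(64, 1))
```

Because the two operands of every butterfly always land in different RAMs of the group being read, both can be fetched in the same cycle while the other group is being written.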
2) Reverse NTT phase:
The control unit drives the control signal con = 1, and the two butterfly units operate in parallel in GS mode. The butterfly operation is the exact reverse of the forward process, and data are likewise read and written in reverse, following the layout in FIG. 2 from right to left.
The error analysis of the Barrett reduction algorithm with approximate computation follows:
In Algorithm 1, an estimated quotient t is first computed, from which a rough reduced value r is obtained; the exact reduced value then follows quickly from a single comparison. In a circuit design, however, a DSP multiplier is costly in both resources and time, so this embodiment introduces approximate computation: the original multiplications are realized with additions and subtractions, and once an approximate estimate has been obtained, the exact value is recovered by compensating for the difference.
According to Algorithm 1, the quotient t and the approximate quotient are given by the following formula:
Because the following property holds for any numbers A and B, the error can be analyzed as follows:
where the error term is a function of the argument k; and since the error e must be an integer,
Note in particular the choice of m: for one choice of m, the error analysis gives e ∈ {0, 1}; to simplify the circuit, however, this design selects a different value by analysis, for which verification shows e ∈ {−1, 0}. As the analysis above indicates, changing the selected value of m still allows the approximate quotient to be turned into the exact quotient with a single addition or subtraction.
In addition, the first and second steps of Algorithm 2 replace the multiplications of Algorithm 1 with additions and subtractions. The first step also introduces approximate computation by discarding the fractional bits, which reduces the operating bit width by 20 bits but introduces an additional error.
It is easy to show that this additional error e_1 falls within the range −1 to 1; combined with the analysis above, the final error satisfies e_f ∈ [−2, 1]. In the second step of the algorithm, therefore, only an additional 2-bit value needs to be computed to determine the compensation value q_mux, and the exact reduced value res is then obtained with a single addition.
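The compensation stage can be illustrated in isolation. The sketch below assumes a sign convention the text does not spell out: the final error e_f is taken as the true quotient minus the approximate quotient, so the rough remainder equals the true remainder plus e_f·q. The four candidate corrections and the function name are likewise an interpretation of the 2-bit selection of q_mux, not the circuit of FIG. 3.

```python
Q = 3329  # Kyber modulus (from the text)


def compensate(r_approx):
    """Recover r in [0, Q) from r_approx = r + e_f * Q with e_f in {-2, -1, 0, 1}."""
    # Four possible corrections, so a 2-bit select is enough to choose q_mux.
    for q_mux in (2 * Q, Q, 0, -Q):
        res = r_approx + q_mux      # a single addition yields the exact value
        if 0 <= res < Q:
            return res
    raise ValueError("quotient error outside the assumed range [-2, 1]")


a = 3000 * 3001                     # example 24-bit product
t_approx = a // Q + 2               # quotient overestimated by 2, i.e. e_f = -2
print(compensate(a - t_approx * Q), a % Q)
```

Here the loop only enumerates the candidates; in hardware the choice would be made directly from a couple of bits of the rough remainder.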
The invention has been described in detail in connection with the specific embodiments and exemplary examples thereof, but such description is not to be construed as limiting the invention. It will be understood by those skilled in the art that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, and these fall within the scope of the present invention. The scope of the invention is defined by the appended claims.
Claims (1)
1. A fast number-theoretic transform (NTT) circuit based on CRYSTALS-Kyber, characterized by comprising two butterfly units each having two sets of input/output ports, a control unit, and four dual-port BRAM memories arranged in two groups;
the control unit supplies mode control signals to the two butterfly units and the four BRAM memories, and generates read and write addresses for the four BRAM memories according to the working mode; data are fed from the four BRAM memories into the butterfly units, the butterfly mode is selected by the mode control signal from the control unit, a Barrett reduction circuit inside the butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range, and once a butterfly result is available it is written back into the four BRAM memories in the order required by the NTT algorithm;
the butterfly units, the control unit, and the four BRAM memories use the CRYSTALS-Kyber parameters, namely modulus q = 3329 and n = 256 polynomial coefficients;
the four dual-port BRAM memories temporarily store the intermediate data of the NTT; new data are fed to the butterfly units in every cycle, and every butterfly result is stored as soon as it is output; the memory access unit controls reads and writes separately, so that reads and writes do not conflict, and uses ping-pong memory access to sustain the throughput needed to read and write data simultaneously;
based on the characteristics of the CT and GS butterfly operations, the butterfly unit is built as a closed loop with two sets of input and output ports, so that it supports both CT and GS butterflies; parts of the butterfly unit can also be used on their own to support point-wise multiplication;
when ping-pong memory access is used, butterfly results are stored at preset locations so that they are ready to be read in the next round; after each round of butterfly operations, each original group of data is split into two groups that are stored separately;
the multiplication in the butterfly operation expands 12-bit operands into a 24-bit product, which is reduced back to the 12-bit range by a Barrett reduction module that uses approximate computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310594853.1A CN116820397B (en) | 2023-05-25 | 2023-05-25 | Rapid number theory conversion circuit based on CRYSTALS-Kyber |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310594853.1A CN116820397B (en) | 2023-05-25 | 2023-05-25 | Rapid number theory conversion circuit based on CRYSTALS-Kyber |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116820397A (en) | 2023-09-29 |
CN116820397B (en) | 2024-02-02 |
Family
ID=88119387
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310594853.1A Active CN116820397B (en) | 2023-05-25 | 2023-05-25 | Rapid number theory conversion circuit based on CRYSTALS-Kyber |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116820397B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115756387A (en) * | 2022-09-20 | 2023-03-07 | 杭州电子科技大学 | NTT hardware realization method of R2-MDC architecture based on folding transformation |
CN115756386A (en) * | 2022-10-26 | 2023-03-07 | 南京航空航天大学 | Efficient lightweight NTT multiplier circuit based on lattice code |
CN115801226A (en) * | 2022-11-02 | 2023-03-14 | 武汉亦芯微电子有限公司 | CRYSTALS-KYBER safety processor adopting post-quantum cryptography algorithm |
WO2023060809A1 (en) * | 2021-10-11 | 2023-04-20 | 苏州浪潮智能科技有限公司 | Number theoretic transforms computation circuit and method, and computer device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3903300A4 (en) * | 2019-02-19 | 2022-09-07 | Massachusetts Institute Of Technology | Configurable lattice cryptography processor for the quantum-secure internet of things and related techniques |
US11614945B2 (en) * | 2019-11-27 | 2023-03-28 | EpiSys Science, Inc. | Apparatus and method of a scalable and reconfigurable fast fourier transform |
Non-Patent Citations (2)
Title |
---|
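Design and FPGA implementation of a 256-point radix-4 IFFT module in OFDM systems; Liu Zhen; Lu Yan; Ma Yu; Wan Jun; Guangdong Communication Technology (No. 01); full text * |
Research on FPGA-based number theoretic transform algorithms and their applications; Yu Hancheng; Wang Chenghua; Shao Jie; Xia Yongjun; Microcomputer Information (No. 32); full text * |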
Also Published As
Publication number | Publication date |
---|---|
CN116820397A (en) | 2023-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11416638B2 (en) | Configurable lattice cryptography processor for the quantum-secure internet of things and related techniques | |
Zhang et al. | Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT | |
Fritzmann et al. | Efficient and flexible low-power NTT for lattice-based cryptography | |
Land et al. | A hard crystal-implementing dilithium on reconfigurable hardware | |
Ma | An effective memory addressing scheme for FFT processors | |
Aikata et al. | KaLi: A crystal for post-quantum security using Kyber and Dilithium | |
KR100442218B1 (en) | Power-residue calculating unit using montgomery algorithm | |
US20230318829A1 (en) | Cryptographic processor device and data processing apparatus employing the same | |
CN113467750A (en) | Large integer bit width division circuit and method for SRT algorithm with radix of 4 | |
CN111079934B (en) | Number theory transformation unit and method applied to error learning encryption algorithm on ring domain | |
CN110224829B (en) | Matrix-based post-quantum encryption method and device | |
CN114826560B (en) | Lightweight block cipher CREF implementation method and system | |
CN114594925A (en) | Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof | |
JP2010107947A (en) | Sha-based message schedule operation method, message compression operation method and cryptographic device performing the same | |
CN116886274B (en) | High-efficiency application type polynomial operation circuit applied to CRYSTALS-Kyber | |
Peng et al. | A Hardware/Software Collaborative SM4 Implementation Resistant to Side-channel Attacks on ARM-FPGA Embedded SoC | |
Liu et al. | Multiprecision multiplication on armv8 | |
KR100974624B1 (en) | Method and Apparatus of elliptic curve cryptography processing in sensor mote and Recording medium using it | |
Praveena et al. | Bus encoded LUT multiplier for portable biomedical therapeutic devices | |
CN118233081B (en) | NEON instruction set-based national cipher SM2 bottom modular multiplication optimization method | |
CN112487448B (en) | Encryption information processing device, method and computer equipment | |
CN117785128A (en) | Computing system capable of being used for elliptic curve of arbitrary prime number domain | |
Xu et al. | Bandwidth Efficient Homomorphic Encrypted Discrete Fourier Transform Acceleration on FPGA | |
Han et al. | Algorithm-Based Countermeasures against Power Analysis Attacks for Public-Key Cryptography SM2 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||