CN116820397B - Rapid number theory conversion circuit based on CRYSTALS-Kyber - Google Patents


Info

Publication number
CN116820397B
CN116820397B CN202310594853.1A CN202310594853A CN116820397B CN 116820397 B CN116820397 B CN 116820397B CN 202310594853 A CN202310594853 A CN 202310594853A CN 116820397 B CN116820397 B CN 116820397B
Authority
CN
China
Prior art keywords
data
butterfly
bram
unit
memories
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310594853.1A
Other languages
Chinese (zh)
Other versions
CN116820397A (en)
Inventor
张卓尧
崔益军
刘伟强
王成华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310594853.1A
Publication of CN116820397A
Application granted
Publication of CN116820397B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575 Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1408 Protection against unauthorised use of memory or access to memory by using cryptography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1416 Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber. A control unit supplies mode control signals to two butterfly units and four BRAM memories, and supplies read and write addresses to the four BRAM memories according to the working mode. Data are fed to the butterfly units from the four BRAM memories, and the butterfly unit mode is selected by the mode control signal from the control unit. A Barrett reduction circuit in each butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range. Once a butterfly result is available, it is written back to the four BRAM memories in the order required by the NTT algorithm. The butterfly unit saves resources and can run at high frequency, and the memory access scheme makes the fullest possible use of the butterfly units' computing capacity, so the transform takes fewer cycles.

Description

Rapid number theory conversion circuit based on CRYSTALS-Kyber
Technical Field
The invention belongs to the technical field of information security, and in particular relates to a fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber.
Background
The global economy is increasingly integrated, driven by rapid progress in internet and information technology. Information is now exchanged constantly, and because it must be timely and secure, it has become a key element of national security. Information security refers to the technical and administrative safeguards established for data processing systems to protect computer hardware, software and data from being destroyed, altered or disclosed, whether by accident or by malicious action, in an unsafe environment where attackers are present.
Information security research covers two broad areas: cryptography and cryptanalysis. Cryptography studies how to transmit information covertly; in modern times it refers in particular to the mathematical study of information and its transmission, is usually regarded as a branch of mathematics and computer science, and is closely related to information theory. Because it underlies almost all existing security mechanisms, cryptography is the foundation of information security. Cryptanalysis studies a cryptosystem in depth, analyses its characteristics and searches for vulnerabilities that can be attacked; the same analysis also guides the design of corresponding defences. The two disciplines complement each other.
Information theory, pioneered by Shannon, laid the theoretical foundation of modern cryptography. After decades of research, modern cryptosystems fall into two classes: symmetric and asymmetric. The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES), which were widely used early on, are symmetric cryptosystems: encryption and decryption share the same key. Algorithms such as RSA and ECC, whose underlying mathematical problems can be converted into non-deterministic polynomial (NP) problems and which experts have trusted in recent years, are asymmetric cryptosystems. An asymmetric cryptosystem encrypts and decrypts with different keys (a public key and a private key), so compared with a symmetric cryptosystem the overall algorithm runs more slowly and consumes more power, but offers better security guarantees. The root cause is that these NP problems are hard to solve on a classical computer, or take exponential time to break.
Although existing cryptosystems are secure for the time being, Shor's algorithm and the rapid development of quantum computing pose a serious threat to them. A cryptographic chip is the carrier on which a cryptographic algorithm is implemented; its hardware architecture is the most reliable and efficient way to realize a complete cryptographic scheme, and it therefore plays an important role in evaluating the scheme's performance. Compared with software, a hardware implementation offers high parallelism, strong flexibility and low cost, and is key to advancing the development and application of cryptosystems. Hardware implementations of traditional encryption schemes are mature, whereas research on hardware implementations of quantum-resistant post-quantum cryptography schemes has only just begun. Post-quantum cryptography schemes have therefore become a major research focus in cryptography.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber.
The invention provides a CRYSTALS-Kyber-based fast NTT circuit comprising two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories arranged in two groups, the butterfly units being connected to the control unit;
the control unit supplies mode control signals to the two butterfly units and the four BRAM memories, and supplies read and write addresses to the four BRAM memories according to the working mode; data are fed to the butterfly units from the four BRAM memories, and the butterfly unit mode is selected by the mode control signal from the control unit; a Barrett reduction circuit in each butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range; once a butterfly result is available, it is written back to the four BRAM memories in the order required by the NTT algorithm.
Further, the butterfly units, the control unit and the BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients.
Further, the four BRAM memories temporarily store the intermediate data of the NTT; new data are fed to the butterfly units every cycle, and every butterfly result is stored as soon as it is output; the memory access unit controls reads and writes separately, so that reads and writes never conflict, and uses ping-pong access to sustain the throughput needed to read and write data simultaneously.
Further, the butterfly unit is designed as a closed-loop circuit that exploits the structure of the Cooley-Tukey (CT) and Gentleman-Sande (GS) butterfly operations, with two sets of input/output ports so that both are supported; parts of the butterfly unit can also be used on their own to support point-wise multiplication.
Further, in ping-pong access mode, each butterfly result is stored at a predetermined location, ready to be read in the next round; after each round of butterfly operations, each original group of data is split into two groups that are stored separately.
Further, the multiplication in the butterfly operation expands the 12-bit operands into a 24-bit product, which a Barrett reduction module using approximate computation reduces back to the 12-bit range.
The invention provides a fast NTT circuit based on CRYSTALS-Kyber that uses the number theoretic transform (NTT) for multiplication in the polynomial ring. The forward NTT is computed with the butterfly unit in CT mode and the inverse NTT with the butterfly unit in GS mode, so that polynomial ring multiplication for lattice-based cryptography is implemented efficiently; the chosen NTT algorithm reduces computational complexity and raises the clock frequency and computation speed of the overall design;
a mode-switchable NTT butterfly unit integrates two circuit functions in one computation unit: the mode control signal con and the input address selection signal select the mode, and the CT and GS butterflies share the same module, which reduces hardware resource consumption;
in the NTT circuit, a group of two dual-port BRAMs stores the 256 data, and each BRAM word is wide enough to hold 2 data; 4 data can be read from the BRAMs every cycle, which raises data throughput enough to feed the two butterfly units;
the Barrett reduction with approximate computation and the closed-loop butterfly unit greatly simplify the computation flow and the circuit, save resources, and make retiming easier;
the memory access scheme is designed around keeping the butterfly units fully utilized; by using ping-pong storage and separate read/write control, the number of cycles taken by the NTT comes very close to the theoretical limit.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a block diagram of the CRYSTALS-Kyber-based fast NTT circuit provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of memory access data storage according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Barrett reduction unit circuit with approximate computation according to an embodiment of the present invention;
FIG. 4 is a circuit diagram of the closed-loop multifunctional butterfly unit according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, the embodiment of the invention provides a CRYSTALS-Kyber-based fast NTT circuit comprising two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories arranged in two groups.
The control unit supplies mode control signals to the two butterfly units and the four BRAM memories, and supplies read and write addresses to the four BRAM memories according to the working mode. Data are fed to the butterfly units from the four BRAM memories, and the butterfly unit mode is selected by the mode control signal from the control unit. A Barrett reduction circuit in each butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range. Once a butterfly result is available, it is written back to the four BRAM memories in the order required by the NTT algorithm.
Illustratively, the butterfly units, the control unit and the BRAM memories use the CRYSTALS-Kyber parameters: modulus q = 3329 and n = 256 polynomial coefficients.
The four BRAM memories temporarily store the intermediate data of the NTT (seven stages of butterfly operations). To make full use of the butterfly units, new data must be fed to them every cycle, and every result must be stored as soon as it is output. The memory access unit therefore controls reads and writes separately, so that reads and writes never conflict, and uses ping-pong access to sustain the high throughput needed to read and write simultaneously. With two butterfly units, one NTT needs at least (7×128)/2 = 448 cycles. This design completes one NTT in 459 cycles; the additional 11 cycles are spent on the necessary data selection and on waiting for the butterfly units to compute, so the cycle count is as close as possible to the theoretical limit.
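The cycle budget quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant names are not from the patent, and the 459-cycle figure is simply the value reported in the text.

```python
# Cycle-count arithmetic for the NTT described above.
# Constant names are illustrative; the figures come from the text.
STAGES = 7                    # butterfly stages in the Kyber NTT
BUTTERFLIES_PER_STAGE = 128   # 256 coefficients form 128 pairs per stage
BUTTERFLY_UNITS = 2           # butterfly units working in parallel
REPORTED_CYCLES = 459         # cycle count reported for this design

# Lower bound: every unit issues one butterfly per cycle with no stalls.
ideal_cycles = STAGES * BUTTERFLIES_PER_STAGE // BUTTERFLY_UNITS
# Cycles spent on data selection and waiting for the butterfly units.
overhead_cycles = REPORTED_CYCLES - ideal_cycles
# Fraction of the reported cycles that match the lower bound.
utilization = ideal_cycles / REPORTED_CYCLES

print(ideal_cycles, overhead_cycles, f"{utilization:.1%}")
```

Under these figures the overhead is 11 cycles on top of the 448-cycle lower bound, so roughly 97.6% of the reported cycles correspond to the ideal schedule.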
The butterfly unit is designed as a closed-loop circuit that exploits the structure of the Cooley-Tukey (CT) and Gentleman-Sande (GS) butterfly operations, with two sets of input/output ports so that both are supported; parts of the butterfly unit can also be used on their own to support point-wise multiplication.
As shown in FIG. 2, in ping-pong access mode the operands of each round differ from those of the previous round, as the NTT requires. For reads and writes to proceed without interruption, each butterfly result must be stored at a predetermined location, ready to be read in the next round. In this embodiment, after each round of butterfly operations, each original group of data is split into two groups that are stored separately.
A Barrett reduction unit is introduced because the multiplication in the butterfly operation expands the 12-bit operands into a 24-bit product, which must be reduced back to the 12-bit range. Among the widely used reduction algorithms, Barrett reduction is the most suitable, but the conventional Barrett algorithm is expensive to implement in hardware. Based on a careful analysis of the data, the Barrett reduction used in this embodiment introduces approximate computation: the earlier part of the computation is carried out at reduced precision to save resources, and a simple compensation step before the output restores the exact value. The circuit is shown in FIG. 3, where the first two stages perform the approximate computation and the third stage performs the compensation.
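For orientation, a conventional Barrett reduction for q = 3329 can be sketched in software as follows. This is the textbook algorithm with a true multiplication, not the shift-and-add approximate circuit of FIG. 3. The shift K = 24, the constant M = floor(2^K / q) and the function name barrett_reduce are assumptions chosen for illustration; the patent text does not state its own constants.

```python
Q = 3329            # CRYSTALS-Kyber modulus (from the text)
K = 24              # assumed shift: inputs are 24-bit products
M = (1 << K) // Q   # assumed Barrett constant, floor(2**K / Q)

def barrett_reduce(a):
    """Reduce 0 <= a < 2**K modulo Q with one conditional subtraction."""
    t = (a * M) >> K   # estimated quotient; never exceeds the true quotient
    r = a - t * Q      # rough remainder, in the range [0, 2*Q)
    if r >= Q:         # compensation step: at most one subtraction is needed
        r -= Q
    return r

# Spot-check against Python's own modulo on a sample of 24-bit inputs.
mismatches = [a for a in range(0, 1 << K, 101) if barrett_reduce(a) != a % Q]
print(len(mismatches))
print(barrett_reduce(3328 * 3328))  # largest product of two reduced operands
```

With these assumed constants the estimated quotient is either exact or one too small, which is why a single conditional subtraction suffices as the compensation step.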
To support both kinds of butterfly operation, the butterfly unit is designed as a closed loop. CT and GS butterfly operations are mostly used for the forward and inverse NTT, respectively, and the essential difference between them is the order of operations. In this embodiment the circuit forms a closed loop, and data are injected and results extracted at different circuit nodes, so both computation orders are realized in a single data loop. This simplifies the circuit, saves resources, and allows a higher operating frequency. As shown in FIG. 4, the node before the multiplier is the input node for the CT butterfly operation, and the node before the modular adder and modular subtractor is the input node for the GS butterfly operation.
The GS butterfly operation is the inverse of the CT butterfly operation, but on its own it does not fully recover the data, so an additional post-processing operation is normally required after the inverse NTT. The embodiment introduces a DIV2 unit that lets the GS operation recover the data completely, eliminating that post-processing step.
The circuit operation is described below function by function, starting with the forward NTT phase:
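The relationship between the two butterflies and the role of a halving step can be illustrated in software. The sketch below is a behavioural model under stated assumptions, not the circuit of FIG. 4: the function names, the placement of the halving inside the GS butterfly, and the example twiddle factor 17 are illustrative choices.

```python
Q = 3329  # CRYSTALS-Kyber modulus (from the text)

def div2(x):
    """Halve x modulo Q: even values shift right, odd values add Q first."""
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1

def ct_butterfly(a, b, w):
    """Cooley-Tukey butterfly: multiply first, then add and subtract."""
    t = w * b % Q
    return (a + t) % Q, (a - t) % Q

def gs_butterfly_div2(x, y, w_inv):
    """Gentleman-Sande butterfly with halving: add/subtract, then multiply."""
    a = div2((x + y) % Q)                # (x + y) equals 2a, halving gives a
    b = div2((x - y) % Q * w_inv % Q)    # (x - y) / w equals 2b, halving gives b
    return a, b

w = 17                      # example twiddle factor (assumed for illustration)
w_inv = pow(w, Q - 2, Q)    # modular inverse via Fermat, valid since Q is prime

x, y = ct_butterfly(1234, 567, w)
print(gs_butterfly_div2(x, y, w_inv))  # (1234, 567)
```

Without the halving, each GS butterfly would return twice the original operands, and the accumulated factor would have to be removed by a separate scaling pass after the last stage.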
1) Forward NTT phase:
As shown in FIG. 4, the control unit sets the control signal con = 0, and the two butterfly units operate in parallel in CT mode; addresses are generated as shown in FIG. 2. For a set of 256 data, the first 128 are initially placed in RAM0 and the last 128 in RAM1, and in the first round a_0 ~ a_127 are paired with a_128 ~ a_255 for the CT butterfly operation. In cycle 1 the control unit generates the read address. In cycle 2 the data (a_0, a_1) and (a_128, a_129) are fetched from the BRAMs. In cycle 3 data selection takes place: selecting the memory group that supplies the data (RAM0 and RAM1 here) and the node each memory connects to (here RAM0 feeds the low node and RAM1 the high node). In cycle 4 the high-node data are multiplied by the twiddle factor ω supplied by the ROM, and over the next 3 cycles Barrett reduction brings the product back to the 12-bit range. In cycles 8 and 9 the Barrett reduction output and the low-node input undergo modular addition and modular subtraction. In cycle 10 the two butterfly units output the results for the first group of data and the control unit supplies the write address. In cycle 11 the first group of butterfly results is written. New data are written in every subsequent cycle; since there are (7×128)/2 = 448 groups of data in total, the last group is stored in cycle 459, which completes the NTT.
Taking the data storage of the first round as an example: the second round pairs a_0 ~ a_63 with a_64 ~ a_127 and a_128 ~ a_191 with a_192 ~ a_255 for the CT butterfly operation, so the original a_0 ~ a_127 and a_128 ~ a_255 must each be split for storage, ready for the next read. As shown in FIG. 2, after the first round a_0 ~ a_63 and a_64 ~ a_127 are stored in RAM2 and RAM3, respectively. Every subsequent round reads and writes data according to the same rule.
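The pairing rule that drives this split can be written down compactly. The sketch below lists which coefficient indices meet in each round and which memory group is read and written; the function names are hypothetical, and the rule that odd rounds read group 0 (RAM0, RAM1) and write group 1 (RAM2, RAM3) is an assumption inferred from the first-round description, not a statement of the patent's address generator.

```python
N = 256      # number of polynomial coefficients (from the text)
ROUNDS = 7   # butterfly stages in the NTT (from the text)

def round_pairs(rnd, n=N):
    """Return the (low, high) index pairs combined in round rnd (1-based)."""
    dist = n >> rnd   # 128 in round 1, 64 in round 2, ..., 2 in round 7
    return [(j, j + dist)
            for start in range(0, n, 2 * dist)
            for j in range(start, start + dist)]

def memory_groups(rnd):
    """Assumed ping-pong rule: return (group read, group written) for a round."""
    read = (rnd - 1) % 2
    return read, 1 - read

for rnd in (1, 2, ROUNDS):
    pairs = round_pairs(rnd)
    print(rnd, pairs[0], pairs[-1], len(pairs), memory_groups(rnd))
```

Round 1 pairs index 0 with 128, while round 2 pairs 0 with 64 and 128 with 192, which is why each half written in round 1 has to be split again before round 2 can read both operands of a butterfly in the same cycle.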
2) Reverse NTT phase:
The control unit sets the control signal con = 1, and the two butterfly units operate in parallel in GS mode. The butterfly operation is the exact reverse of the forward process, and data reads and writes are also reversed, following the layout of FIG. 2 from right to left.
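The forward and inverse phases can be modelled end to end in software. The sketch below is a reference model under stated assumptions rather than a description of the circuit: it uses the root of unity 17 and the bit-reversed twiddle order of the public Kyber specification, undoes each CT stage with a GS stage that halves its outputs, and uses hypothetical function names throughout.

```python
Q, N = 3329, 256   # modulus and coefficient count (from the text)
ROOT = 17          # primitive 256th root of unity mod Q (Kyber specification)

def bitrev7(k):
    """Reverse the 7-bit binary representation of k."""
    return int(f"{k:07b}"[::-1], 2)

# Twiddle factors in bit-reversed order, one per butterfly block.
ZETAS = [pow(ROOT, bitrev7(k), Q) for k in range(128)]

def div2(x):
    """Halve x modulo Q."""
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1

def ntt(coeffs):
    """Forward transform: seven stages of CT butterflies."""
    r, length = list(coeffs), 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            z = ZETAS[128 // length + start // (2 * length)]
            for j in range(start, start + length):
                t = z * r[j + length] % Q
                r[j], r[j + length] = (r[j] + t) % Q, (r[j] - t) % Q
        length >>= 1
    return r

def intt(values):
    """Inverse transform: seven stages of GS butterflies with halving."""
    r, length = list(values), 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            z = ZETAS[128 // length + start // (2 * length)]
            z_inv = pow(z, Q - 2, Q)   # inverse twiddle, Q is prime
            for j in range(start, start + length):
                x, y = r[j], r[j + length]
                r[j] = div2((x + y) % Q)
                r[j + length] = div2((x - y) % Q * z_inv % Q)
        length <<= 1
    return r

poly = [(7 * i + 3) % Q for i in range(N)]
print(intt(ntt(poly)) == poly)  # True
```

Because every stage halves its own outputs, the seven stages together apply the factor 1/128 that would otherwise be a separate scaling pass after the inverse transform.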
An error analysis of the Barrett reduction algorithm with approximate computation follows:
In Algorithm 1, a preliminary calculation yields the estimated quotient t, from which a rough reduced value r is obtained; a single comparison then quickly gives the exact reduced value. In a circuit, however, DSP multipliers cost considerable resources and time, so the embodiment introduces approximate computation: the original multiplications are implemented with additions and subtractions, and once a coarse estimate has been obtained, the exact value is recovered by compensating for the difference.
According to Algorithm 1, the quotient t and the approximate quotient are given by the following formula:
[formula missing from the source text]
A property that holds for any numbers A and B [formula missing from the source text] allows the following error analysis:
[formula missing from the source text]
where the error term is a function of the argument k. Since the error e must be an integer, its range follows directly [formula missing from the source text].
The choice of m deserves particular attention. For one choice of m [formula missing from the source text], the error analysis gives e ∈ {0, 1}. To simplify the circuit, however, the design selects a different value of m [formula missing from the source text], for which verification shows e ∈ {-1, 0}. As the analysis above indicates, whichever value of m is selected, the approximate quotient can be corrected to the exact quotient with a single addition or subtraction.
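The effect of the choice of m on the quotient error can be checked numerically. The source text does not preserve the patent's values of k and m, so the sketch below assumes k = 24 and compares the two natural candidates, m = floor(2^k / q) and m = ceil(2^k / q), with the error defined as the exact quotient minus the estimated quotient; all names and the sampling step are illustrative.

```python
Q = 3329   # CRYSTALS-Kyber modulus (from the text)
K = 24     # assumed shift amount; the patent's own k is not preserved

M_FLOOR = (1 << K) // Q   # floor(2**K / Q)
M_CEIL = M_FLOOR + 1      # ceil(2**K / Q), since Q does not divide 2**K

def quotient_errors(m, step=97):
    """Collect e = floor(a / Q) - floor(a * m / 2**K) over sampled inputs."""
    errors = set()
    for a in range(0, 1 << K, step):
        exact = a // Q
        estimate = (a * m) >> K
        errors.add(exact - estimate)
    return errors

print(sorted(quotient_errors(M_FLOOR)))
print(sorted(quotient_errors(M_CEIL)))
```

Under these assumptions the floor constant never overestimates the quotient and the ceiling constant never underestimates it, so in both cases one correction step recovers the exact quotient.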
In addition, the first and second steps of Algorithm 2 replace the multiplications of Algorithm 1 with additions and subtractions. The first step also applies approximate computation: the fractional bits are discarded, which saves 20 bits of operand width but introduces an additional error.
It is easy to show that this additional error e_1 falls within the range -1 to 1; combined with the analysis above, the final error satisfies e_f ∈ [-2, 1]. In the second step of the algorithm, therefore, only an additional 2-bit value needs to be computed to determine the compensation value q_mux, after which a single addition yields the exact reduced value res.
The invention has been described in detail in connection with the specific embodiments and exemplary examples thereof, but such description is not to be construed as limiting the invention. It will be understood by those skilled in the art that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, and these fall within the scope of the present invention. The scope of the invention is defined by the appended claims.

Claims (1)

1. A fast number theoretic transform (NTT) circuit based on CRYSTALS-Kyber, characterized by comprising two butterfly units, each with two sets of input/output ports, a control unit, and four dual-port BRAM memories arranged in two groups;
the control unit supplies mode control signals to the two butterfly units and the four BRAM memories, and supplies read and write addresses to the four BRAM memories according to the working mode; data are fed to the butterfly units from the four BRAM memories, and the butterfly unit mode is selected by the mode control signal from the control unit; a Barrett reduction circuit in each butterfly unit reduces the 24-bit product of a 12-bit × 12-bit multiplication back to the 12-bit range; once a butterfly result is available, it is written back to the four BRAM memories in the order required by the NTT algorithm;
the butterfly units, the control unit and the BRAM memories use the CRYSTALS-Kyber modulus q = 3329 and n = 256 polynomial coefficients;
the four BRAM memories temporarily store the intermediate data of the NTT; new data are fed to the butterfly units every cycle, and every butterfly result is stored as soon as it is output; the memory access unit controls reads and writes separately, so that reads and writes never conflict, and uses ping-pong access to sustain the throughput needed to read and write data simultaneously;
the butterfly unit is designed as a closed-loop circuit that exploits the structure of the CT and GS butterfly operations, with two sets of input/output ports so that both are supported; parts of the butterfly unit can also be used on their own to support point-wise multiplication;
in ping-pong access mode, each butterfly result is stored at a predetermined location, ready to be read in the next round; after each round of butterfly operations, each original group of data is split into two groups that are stored separately;
the multiplication in the butterfly operation expands the 12-bit operands into a 24-bit product, which a Barrett reduction module using approximate computation reduces back to the 12-bit range.
CN202310594853.1A 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber Active CN116820397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310594853.1A CN116820397B (en) 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310594853.1A CN116820397B (en) 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber

Publications (2)

Publication Number Publication Date
CN116820397A (en) 2023-09-29
CN116820397B (en) 2024-02-02

Family

ID=88119387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310594853.1A Active CN116820397B (en) 2023-05-25 2023-05-25 Rapid number theory conversion circuit based on CRYSTALS-Kyber

Country Status (1)

Country Link
CN (1) CN116820397B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115756387A (en) * 2022-09-20 2023-03-07 杭州电子科技大学 NTT hardware realization method of R2-MDC architecture based on folding transformation
CN115756386A (en) * 2022-10-26 2023-03-07 南京航空航天大学 Efficient lightweight NTT multiplier circuit based on lattice code
CN115801226A (en) * 2022-11-02 2023-03-14 武汉亦芯微电子有限公司 CRYSTALS-KYBER safety processor adopting post-quantum cryptography algorithm
WO2023060809A1 (en) * 2021-10-11 2023-04-20 苏州浪潮智能科技有限公司 Number theoretic transforms computation circuit and method, and computer device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11416638B2 (en) * 2019-02-19 2022-08-16 Massachusetts Institute Of Technology Configurable lattice cryptography processor for the quantum-secure internet of things and related techniques
US11614945B2 (en) * 2019-11-27 2023-03-28 EpiSys Science, Inc. Apparatus and method of a scalable and reconfigurable fast fourier transform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
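Design and FPGA implementation of a 256-point radix-4 IFFT module in an OFDM system; 刘真; 鲁艳; 马宇; 万俊; Guangdong Communication Technology (No. 01); full text *
Research on FPGA-based number theoretic transform algorithms and their applications; 余汉成; 王成华; 邵杰; 夏永君; Microcomputer Information (No. 32); full text *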

Also Published As

Publication number Publication date
CN116820397A (en) 2023-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant