CN114172629A - High-performance fully-homomorphic encryption processor circuit based on RLWE encryption scheme - Google Patents
- Publication number
- CN114172629A (application CN202111499003.0A)
- Authority
- CN
- China
- Prior art keywords
- module
- butterfly
- storage module
- ntt
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1668—Details of memory controller
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F5/06—Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
- G06F5/16—Multiplexed systems, i.e. using two or more similar devices which are alternately accessed for enqueue and dequeue operations, e.g. ping-pong buffers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0894—Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/30—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
- H04L9/3006—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy underlying computational problems or public-key parameters
- H04L9/3026—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy underlying computational problems or public-key parameters details relating to polynomials generation, e.g. generation of irreducible polynomials
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/12—Details relating to cryptographic hardware or logic circuitry
- H04L2209/122—Hardware reduction or efficient architectures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/12—Details relating to cryptographic hardware or logic circuitry
- H04L2209/125—Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/34—Encoding or coding, e.g. Huffman coding or error correction
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Discrete Mathematics (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, which comprises a storage module, an NTT module and a control module. The storage module comprises 6 dual-port RAMs, 2 dual-port ROMs and a single-port ROM and stores the intermediate data and input coefficients of the operation process. The NTT module performs the NTT calculation, and its internal multiplier is also used for the pre-calculation and the post-calculation. The control module controls the whole system: it controls the address generation module to generate addresses, the NTT module to perform NTT and INTT operations, and the pre-calculation and post-calculation modules to perform their calculations, and it controls the encryption and decryption operations. The invention balances hardware area against throughput and reduces hardware resource consumption while maintaining a high throughput rate.
Description
Technical Field
The invention belongs to the field of encryption hardware circuit design, and particularly relates to a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme.
Background
Once practical quantum computers become available, the RSA cryptosystem and elliptic curve cryptography (ECC) will no longer be secure. The development of quantum communication and quantum computers poses unprecedented challenges to traditional cryptography, and quantum-resistant cryptographic algorithms have become a research hotspot in the cryptology community. Lattice-based cryptosystems are good candidates for replacing traditional cryptosystems, and many lattice-based schemes, such as identification schemes and digital signature schemes, have been proposed. The security of many of these schemes rests on RLWE. In 2009, Regev et al. presented the LWE problem and showed that its security can be reduced to worst-case hard lattice problems (SVP or CVP). In 2010, Lyubashevsky et al. proposed RLWE, which introduces ideal lattices into LWE and reduces complexity while providing the same level of security. RLWE and its variants have lower complexity than previous public key encryption schemes. Because it is secure and easy to implement, the RLWE public key encryption system has broad application prospects in areas such as cloud computing, 5G communication, data aggregation, personal health data management, and neural network training on encrypted data. In recent years, the RLWE fully homomorphic encryption scheme has been studied extensively in both software and hardware. Clercq et al. proposed a software implementation of an RLWE encryption system, and Z. Liu et al. implemented the RLWE encryption scheme on the ARM NEON and MSP430 architectures. Tan et al. proposed a high-security fingerprint authentication system based on the RLWE cryptographic scheme. Tuy Nguyen Tan et al. proposed a GPU-based method for RLWE video face encryption and decryption; their experimental results report a speed-up of about 100 times for the GPU implementation of the face encryption and decryption operations.
Polynomial multiplication is one of the most critical and time-consuming operations in RLWE public key cryptosystems, and the efficiency of the polynomial multiplier determines the performance of the RLWE crypto processor. Earlier work demonstrated that LWE encryption is feasible in software and proposed the first FPGA-based hardware design of an LWE encryption scheme. Owing to its fully parallel architecture, its encryption throughput is 316 times that of the software implementation, but it consumes a large amount of logic resources. A later design is an efficient, compact RLWE encryption hardware architecture that performs 2 fast Fourier transforms (FFTs) and 3 inverse fast Fourier transforms (IFFTs) and uses a single butterfly unit to calculate both. This reduces hardware area, but with only one butterfly unit the parallelism of the FFT cannot be exploited. Roy et al. proposed an efficient and compact RLWE encryption processor that optimizes the RLWE encryption scheme, reduces the number of NTT operations from 5 to 4, and merges the NTT algorithm with the negative wrapped convolution, avoiding the pre-calculation that the negative wrapped convolution otherwise requires. Liu et al. designed a universal modular unit and proposed an RLWE encryption processor that is resource efficient and resists side-channel attacks, but this architecture performs RLWE encryption and decryption sequentially, with low throughput rates of 0.056 Mbps and 0.28 Mbps, respectively. Velasco Medina et al. proposed a high-throughput RLWE encryption hardware architecture that adopts radix-2 and radix-8 multi-path delay NTT algorithms, with encryption and decryption throughput reaching the megabit-per-second and gigabit-per-second range, respectively. However, it consumes a large amount of hardware resources and is therefore not suitable for FPGA development boards with limited resources.
Disclosure of Invention
The invention aims to solve the defects in the prior art, and provides a high-performance fully-homomorphic encryption processor circuit based on an RLWE encryption scheme, so that the balance between the area and the throughput rate can be realized, and the consumption of hardware resources is reduced on the premise of ensuring the high throughput rate.
In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
The invention relates to a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, characterized by comprising: a storage module, two NTT modules, a control module, an encoder and a decoder;
the storage module stores polynomial coefficients in reverse order through a data selector, the stored data comprising: three input noises $e_1$, $e_2$ and $e_3$; the message $m$; the intermediate coefficients used during the NTT operation; and the two public keys $A$, $P$ and the private key $R_2$ after the NTT operation; the intermediate coefficients include: the forward scaling factor $\psi^i$, the inverse scaling factor $\psi^{-i}$ and the twiddle factor $\omega$;
the first NTT module comprises a first butterfly module, a second butterfly module, a first modulo module and a first reverse-order module;
the second NTT module comprises a third butterfly module, a fourth butterfly module, a second modulo module and a second reverse-order module;
under the control of the control module, the encoder reads the message $m$ from the storage module, encodes it to obtain the encoded message $m_e$, and stores $m_e$ in the storage module;
the four butterfly modules simultaneously read the encoded message $m_e$ and the third noise $e_3$ from the storage module and add them to obtain the processed third noise $e_{3m}$;
the first butterfly module reads the low-order part of the input noise $e_1$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the noise $e_{1\psi}$, and stores it at the original address in the storage module;
the second butterfly module reads the high-order part of the input noise $e_1$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the noise $e_{1\psi}$, and stores it at the original address in the storage module;
at the same time, the third butterfly module reads the low-order part of the input noise $e_2$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the noise $e_{2\psi}$, and stores it at the original address in the storage module;
the fourth butterfly module reads the high-order part of the input noise $e_2$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the noise $e_{2\psi}$, and stores it at the original address in the storage module;
the first NTT module reads the noise $e_{1\psi}$ and the twiddle factor $\omega$ from the storage module, performs the butterfly operation to obtain the operation result $E_1$, and stores it at the original address in the storage module; at the same time, the second NTT module reads the noise $e_{2\psi}$ and the twiddle factor $\omega$ from the storage module, performs the butterfly operation to obtain the operation result $E_2$, and stores it at the original address in the storage module;
the first butterfly module reads the low-order part of the third noise $e_{3m}$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the calculation result $e_{3m\psi}$, and stores it at the original address in the storage module; at the same time, the second butterfly module reads the high-order part of the third noise $e_{3m}$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the calculation result $e_{3m\psi}$, and stores it at the original address in the storage module;
the first NTT module reads the calculation result $e_{3m\psi}$ and the twiddle factor $\omega$ from the storage module and performs an in-place NTT operation to obtain the operation result $E_{3M}$; meanwhile, the third butterfly module reads the public key $A$ and the NTT operation results $E_1$ and $E_2$ from the storage module, calculates the ciphertext $C_1 = A \odot E_1 + E_2$, sends it to the second modulo module for modular reduction, and stores the result at the original address in the storage module; meanwhile, the fourth butterfly module reads the public key $P$ and the NTT operation result $E_1$ from the storage module, calculates the first parameter $P_{E1} = P \odot E_1$, and writes it to the original address in the storage module;
the first butterfly module and the second butterfly module perform the last stage of the butterfly operation on the calculation result $e_{3m\psi}$; at the same time, the third butterfly module and the fourth butterfly module read the parameter $P_{E1}$ and the operation result $E_{3M}$ from the storage module, add them to obtain the ciphertext $C_2 = P_{E1} + E_{3M}$, and store it at the original address in the storage module, thereby completing the encryption of the message $m$;
the four butterfly modules each read the ciphertexts $C_1$, $C_2$ and the private key $R_2$ from the storage module, perform pointwise multiplication and addition to obtain the second parameter $M_D = C_1 \odot R_2 + C_2$, send it to the corresponding reverse-order module for reverse ordering, and store it at the original address in the storage module;
the first butterfly module and the second butterfly module read the second parameter $M_D$ and the twiddle factor $\omega$ from the storage module and perform the last stage of the butterfly operation to obtain two butterfly operation results; at the same time, the third butterfly module and the fourth butterfly module read the scaling factor $\psi^{-i}$ from the storage module and multiply it pointwise with the two butterfly operation results to obtain the INTT calculation result $m_d$, which is stored at the original address in the storage module;
after the INTT operation is finished, the four butterfly modules read the INTT calculation result $m_d$ and the scaling factor $\psi^{-i}$ from the storage module and multiply them in parallel to obtain the post-calculation result $m_{d\psi}$, which is written to the original address in the storage module;
under the control of the control module, the decoder reads the post-calculation result $m_{d\psi}$ from the storage module and decodes it to obtain the message $m$, thereby completing the recovery of the message $m$.
The high-performance fully homomorphic encryption processor circuit based on the RLWE encryption scheme is further characterized in that the two NTT modules perform an eight-level butterfly operation according to the following process:
the second butterfly module reads the high-order part of the intermediate data of the (L-1)-th level butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the high-order part of the intermediate result of the L-th level butterfly operation;
the third butterfly module reads the low-order part of the intermediate data of the (L-1)-th level butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the low-order part of the intermediate result of the L-th level butterfly operation;
the fourth butterfly module reads the high-order part of the intermediate data of the (L-1)-th level butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the high-order part of the intermediate result of the L-th level butterfly operation;
when the butterfly level L equals 1, the intermediate data of the (L-1)-th level butterfly operation in the two NTT modules are taken to be the noise $e_{1\psi}$ and the noise $e_{2\psi}$, respectively.
Step 3, the first modulo module obtains the intermediate result of the L-th level butterfly operation from the first butterfly module and the second butterfly module and performs the modular operation on it:
step 3.1, the high-order part and the low-order part of the intermediate result are fed into two subtracters, and the output of one of the subtracters is added to the modulus q to obtain an addition result;
step 3.2, the output of the other subtracter serves as the select signal of the data selector: if the high-order part is greater than the low-order part, the data selector outputs the addition result; otherwise, the data selector outputs the subtracter result;
step 3.3, the output of the data selector is then added to the remaining term of the intermediate result, giving the modulo-reduced intermediate result of the L-th level butterfly operation.
Similarly, the second modulo module obtains the intermediate result of the L-th level butterfly operation from the third butterfly module and the fourth butterfly module and performs the modular operation on it to obtain the intermediate data of the L-th level butterfly operation.
Step 4, assign L + 1 to L and test whether L > 8; if so, the butterfly operation is finished and the final intermediate data of the two NTT modules are the operation results $E_1$ and $E_2$; otherwise, return to step 2 and continue in sequence.
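The level loop and the subtract-and-select reduction described above can be illustrated in software. The following is only a minimal sketch, assuming q = 65537 = 2^16 + 1 and n = 256 from the embodiment, a textbook decimation-in-time ordering with bit-reversed input, and a three-way split of each value into bits 15..0, bits 31..16 and the remaining top bits. The function names `mod_fermat`, `bit_reverse` and `ntt_inplace` are hypothetical, the generator 3 is a choice made here, and the sequential loop does not model the four parallel butterfly modules.

```python
Q = 65537   # modulus q = 2**16 + 1, as in the embodiment
N = 256     # polynomial length n, giving log2(256) = 8 butterfly levels


def mod_fermat(x: int) -> int:
    """Reduce 0 <= x < 2**34 modulo 2**16 + 1 with add/subtract only.

    Assumed split: 2**16 = -1 and 2**32 = +1 (mod q), so
    x = low - high + top (mod q).
    """
    low = x & 0xFFFF            # bits 15..0
    high = (x >> 16) & 0xFFFF   # bits 31..16
    top = x >> 32               # remaining bits
    diff = low - high           # subtracter output
    if high > low:              # selector takes the "+ q" branch
        diff += Q
    r = diff + top
    return r - Q if r >= Q else r


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def ntt_inplace(a: list, omega: int) -> list:
    """In-place radix-2 NTT: bit-reversed input, natural-order output."""
    n = len(a)
    levels = n.bit_length() - 1
    for level in range(levels):
        gap = 1 << level                         # distance between butterfly inputs
        w_step = pow(omega, n // (2 * gap), Q)   # twiddle increment for this level
        for start in range(0, n, 2 * gap):
            w = 1
            for j in range(start, start + gap):
                t = mod_fermat(w * a[j + gap])
                u = a[j]
                a[j] = mod_fermat(u + t)
                a[j + gap] = mod_fermat(u - t + Q)   # + Q keeps the operand non-negative
                w = mod_fermat(w * w_step)
    return a


# 3 generates the multiplicative group mod 65537, so this has order exactly N.
OMEGA = pow(3, (Q - 1) // N, Q)
```

To use it, load the coefficients in bit-reversed order first, for example `ntt_inplace([a[bit_reverse(i, 8)] for i in range(N)], OMEGA)`; the result then matches the direct sum of `a[i] * OMEGA**(i*k)` modulo q.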
Compared with the prior art, the beneficial technical effects of the invention are as follows:
1. The invention processes the NTT operation and the ciphertext calculation of the encryption process in parallel, and applies ping-pong operation to the data read/write process and the calculation process during the NTT and INTT operations. This hides the data read/write cycles, reduces the latency of the RLWE encryption processor, and improves the throughput of the hardware architecture.
2. The invention designs a resource-multiplexing hardware architecture: the multiplier and adder in the butterfly module are reused across encryption and decryption, the INTT reuses the NTT circuit structure, and the storage modules share the same control circuit. This reduces the hardware resource consumption of the RLWE encryption processor and improves resource efficiency.
3. The invention designs an RLWE encryption processor with a medium security level (n = 256, q = 65537) and completes circuit implementation and hardware testing on a Spartan-6 FPGA development platform. The results show that encryption takes only 2.38k clock cycles and decryption only 1.69k clock cycles, with throughput rates of 21.01 Mbps and 29.60 Mbps, respectively, a significant improvement in the performance of the RLWE encryption processor.
Drawings
FIG. 1 is a hardware architecture of an RLWE crypto processor of the present invention;
FIG. 2 is a schematic diagram of an RLWE encryption scheme of the present invention;
FIG. 3 is a timing diagram of conventional data read/write and calculation serial processing;
FIG. 4 is a timing diagram illustrating the data read/write and calculation ping-pong operations of the present invention;
FIG. 5 is a block diagram of a dual port RAM address control circuit according to the present invention;
FIG. 6 is a block diagram of the butterfly module of the present invention;
FIG. 7 is a block diagram of the modulo module of the present invention;
FIG. 8 is a timing diagram of the multiplier time-division multiplexing of the present invention;
FIG. 9 is a block diagram of the multiplier time-division multiplexing module of the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme comprises: a storage module, two NTT modules, a control module, an encoder and a decoder.
The RLWE encryption scheme is shown in FIG. 2; its algorithm flow comprises three steps: key generation, encryption and decryption.
The RLWE key generation algorithm generates the public key $(A, P)$ and the private key $R_2$, as follows:
(1) Select a polynomial $a$ from a uniform random distribution, and select two n-dimensional vectors $r_1$, $r_2$ from a discrete Gaussian distribution.
(2) Multiply the n-dimensional vectors $a$, $r_1$, $r_2$ pointwise by the scaling factor vector $\psi$ to obtain the n-dimensional vectors $a_\psi$, $r_{1\psi}$, $r_{2\psi}$.
(3) Perform the NTT operation on the n-dimensional vectors $a_\psi$, $r_{1\psi}$, $r_{2\psi}$ to convert them to the NTT domain. The results of the NTT operation are denoted $A$, $R_1$, $R_2$.
(4) Calculate $P = R_1 - A \odot R_2$. The n-dimensional vector pair $(A, P)$ is the public key of the RLWE encryption scheme, and the n-dimensional vector $R_2$ is its private key.
The RLWE encryption algorithm uses the public key $(A, P)$ to generate the ciphertext $(C_1, C_2)$, as follows:
(1) Select three n-dimensional vectors $e_1$, $e_2$, $e_3$ from a discrete Gaussian distribution.
(2) Encode the message $m$ with the encoding function and add the result to the n-dimensional vector $e_3$ to obtain the n-dimensional vector $e_{3m}$.
(3) Multiply the n-dimensional vectors $e_1$, $e_2$, $e_{3m}$ pointwise by the scaling factor vector $\psi$ to obtain the vectors $e_{1\psi}$, $e_{2\psi}$, $e_{3m\psi}$.
(4) Perform the NTT operation on the n-dimensional vectors $e_{1\psi}$, $e_{2\psi}$, $e_{3m\psi}$ to convert them to the NTT domain. The results of the NTT operation are denoted $E_1$, $E_2$, $E_{3M}$.
(5) Calculate $C_1 = A \odot E_1 + E_2$ and $C_2 = P \odot E_1 + E_{3M}$. The n-dimensional vector pair $(C_1, C_2)$ is the ciphertext.
The RLWE decryption algorithm uses the ciphertext $(C_1, C_2)$ and the private key $R_2$ to recover the message $m$, as follows:
(1) Calculate $M_D = C_1 \odot R_2 + C_2$.
(2) Perform the INTT operation on the n-dimensional vector $M_D$ to convert it back out of the NTT domain. The result of the INTT operation is denoted $m_d$.
(3) Multiply the n-dimensional vector $m_d$ pointwise by the scaling factor vector $\psi^{-1}$ to obtain the n-dimensional vector $m_{d\psi}$.
(4) Recover the message $m$ from $m_{d\psi}$ with the decoding function.
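Decryption works because substituting the definitions gives $M_D = C_1 \odot R_2 + C_2 = E_1 \odot R_1 + E_2 \odot R_2 + E_{3M}$: the two $A \odot E_1 \odot R_2$ terms cancel, leaving the encoded message plus small noise. The three algorithms can be sketched end to end as follows. This is an illustrative sketch, not the patented circuit: the noise sampler uses small bounded integers as a stand-in for the discrete Gaussian, the encoding maps each bit to 0 or floor(q/2) (the source does not specify the encoding function), the factor 1/n is folded into the inverse transform, the transform is a direct O(n^2) sum rather than a butterfly network, and all function names are hypothetical.

```python
import random

N, Q = 256, 65537
PSI = pow(3, (Q - 1) // (2 * N), Q)   # primitive 2n-th root of unity (3 generates Z_q^*)
OMEGA = PSI * PSI % Q                 # primitive n-th root of unity
PSI_INV = pow(PSI, -1, Q)
OMEGA_INV = pow(OMEGA, -1, Q)
N_INV = pow(N, -1, Q)


def transform(v, root):
    """Direct O(n^2) number-theoretic transform (for clarity, not speed)."""
    pw = [pow(root, j, Q) for j in range(N)]
    return [sum(x * pw[i * k % N] for i, x in enumerate(v)) % Q for k in range(N)]


def to_ntt(v):
    """Pre-calculation (scale by psi^i) followed by the NTT."""
    return transform([x * pow(PSI, i, Q) % Q for i, x in enumerate(v)], OMEGA)


def from_ntt(V):
    """INTT followed by the post-calculation (scale by psi^-i and 1/n)."""
    v = transform(V, OMEGA_INV)
    return [x * N_INV * pow(PSI_INV, i, Q) % Q for i, x in enumerate(v)]


def small_noise(rng):
    """Assumption: bounded integers in [-2, 2] stand in for discrete Gaussian samples."""
    return [rng.randint(-2, 2) for _ in range(N)]


def keygen(rng):
    a = [rng.randrange(Q) for _ in range(N)]
    A, R1, R2 = to_ntt(a), to_ntt(small_noise(rng)), to_ntt(small_noise(rng))
    P = [(r1 - x * r2) % Q for r1, x, r2 in zip(R1, A, R2)]   # P = R1 - A (.) R2
    return (A, P), R2


def encrypt(public_key, bits, rng):
    A, P = public_key
    e1, e2, e3 = small_noise(rng), small_noise(rng), small_noise(rng)
    e3m = [e + b * (Q // 2) for e, b in zip(e3, bits)]        # assumed encoding
    E1, E2, E3M = to_ntt(e1), to_ntt(e2), to_ntt(e3m)
    C1 = [(x * y + z) % Q for x, y, z in zip(A, E1, E2)]      # C1 = A (.) E1 + E2
    C2 = [(x * y + z) % Q for x, y, z in zip(P, E1, E3M)]     # C2 = P (.) E1 + E3M
    return C1, C2


def decrypt(R2, ciphertext):
    C1, C2 = ciphertext
    MD = [(x * y + z) % Q for x, y, z in zip(C1, R2, C2)]     # MD = C1 (.) R2 + C2
    # Assumed decoding: values near q/2 decode to 1, values near 0 or q decode to 0.
    return [1 if Q // 4 < x < 3 * Q // 4 else 0 for x in from_ntt(MD)]
```

With noise bounded by 2 in absolute value, the accumulated error per coefficient stays far below q/4, so a round trip such as `decrypt(R2, encrypt(pub, bits, rng))` returns the original 256 bits; with a true discrete Gaussian the same holds only with high probability.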
The storage module stores polynomial coefficients in reverse order through a data selector, the stored data comprising: three input noises $e_1$, $e_2$ and $e_3$; the message $m$; the intermediate coefficients used during the NTT operation; and the two public keys $A$, $P$ and the private key $R_2$ after the NTT operation. The intermediate coefficients include: the forward scaling factor $\psi^i$, the inverse scaling factor $\psi^{-i}$ and the twiddle factor $\omega$.
The storage module comprises 6 dual-port RAMs, 2 dual-port ROMs and a single-port ROM.
(1) RAM0 has a depth of 256 and a width of 17 bits. The 128 coefficients of the polynomial coefficient vector $e_1$ with index values 0-127 after reverse ordering are stored at addresses 0-127 of RAM0, and the 128 coefficients of the polynomial coefficient vector $R_2$ with index values 0-127 after reverse ordering are stored at addresses 128-255 of RAM0.
(2) RAM1 has a depth of 256 and a width of 17 bits. The 128 coefficients of $e_1$ with index values 128-255 after reverse ordering are stored at addresses 0-127 of RAM1, and the 128 coefficients of $R_2$ with index values 128-255 after reverse ordering are stored at addresses 128-255 of RAM1.
(3) RAM2 has a depth of 512 and a width of 17 bits. The 128 coefficients of the polynomial coefficient vector $e_2$ with index values 0-127 after reverse ordering are stored at addresses 0-127 of RAM2. The 256 coefficients of the polynomial coefficient vector $A$ after reverse ordering are stored at addresses 128-383 of RAM2. Addresses 384-511 of RAM2 hold the reverse-ordered data of $M_D$ during decryption.
(4) RAM3 has a depth of 512 and a width of 17 bits. The 128 coefficients of $e_2$ with index values 128-255 after reverse ordering are stored at addresses 0-127 of RAM3. The 256 coefficients of the polynomial coefficient vector $P$ after reverse ordering are stored at addresses 128-383 of RAM3. Addresses 384-511 of RAM3 hold the reverse-ordered data of $M_D$ during decryption.
(5) RAM4 has a depth of 128 and a width of 17 bits. The 128 coefficients of $e_{3m}$ with index values 0-127 after reverse ordering are stored at addresses 0-127 of RAM4.
(6) RAM5 has a depth of 128 and a width of 17 bits. The 128 coefficients of $e_{3m}$ with index values 128-255 after reverse ordering are stored at addresses 0-127 of RAM5.
(7) ROM0 is a dual-port ROM with a depth of 256 and a width of 16 bits. The 128 entries of the scaling factor vector $\psi$ with index values 0-127 after reverse ordering are stored at addresses 0-127 of ROM0, and the 128 entries of the scaling factor vector $\psi^{-1}$ with index values 0-127 after reverse ordering are stored at addresses 128-255 of ROM0.
(8) ROM1 is a dual-port ROM with a depth of 256 and a width of 16 bits. The 128 entries of the vector $\psi$ with index values 128-255 after reverse ordering are stored at addresses 0-127 of ROM1, and the 128 entries of the vector $\psi^{-1}$ with index values 128-255 after reverse ordering are stored at addresses 128-255 of ROM1.
(9) ROM2 is a single-port ROM with a depth of 60 and a width of 5 bits that stores twiddle factors for the NTT and INTT operations. The twiddle factors required for the NTT are stored at addresses 0-29 of ROM2, and those required for the INTT at addresses 30-59.
(10) ROM3 is a single-port ROM with a depth of 448 and a width of 16 bits that stores twiddle factors for the NTT and INTT operations. The twiddle factors required for the NTT are stored at addresses 0-223 of ROM3, and those required for the INTT at addresses 224-447.
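The memory layout listed above can be captured as a small table and checked mechanically. The sketch below records only the depths, widths and address ranges given in items (1) to (10); the dictionary name, the region labels and the helper functions are hypothetical names chosen for illustration. It checks that the regions of each memory tile its address space without gaps or overlaps and adds up the capacity.

```python
# name: (depth, width_bits, [(label, first_address, last_address), ...])
MEMORY_MAP = {
    "RAM0": (256, 17, [("e1[0:128]", 0, 127), ("R2[0:128]", 128, 255)]),
    "RAM1": (256, 17, [("e1[128:256]", 0, 127), ("R2[128:256]", 128, 255)]),
    "RAM2": (512, 17, [("e2[0:128]", 0, 127), ("A", 128, 383), ("MD", 384, 511)]),
    "RAM3": (512, 17, [("e2[128:256]", 0, 127), ("P", 128, 383), ("MD", 384, 511)]),
    "RAM4": (128, 17, [("e3m[0:128]", 0, 127)]),
    "RAM5": (128, 17, [("e3m[128:256]", 0, 127)]),
    "ROM0": (256, 16, [("psi[0:128]", 0, 127), ("psi_inv[0:128]", 128, 255)]),
    "ROM1": (256, 16, [("psi[128:256]", 0, 127), ("psi_inv[128:256]", 128, 255)]),
    "ROM2": (60, 5, [("ntt_twiddle", 0, 29), ("intt_twiddle", 30, 59)]),
    "ROM3": (448, 16, [("ntt_twiddle", 0, 223), ("intt_twiddle", 224, 447)]),
}


def regions_tile_exactly(mem_map) -> bool:
    """True if every memory's regions cover addresses 0..depth-1 exactly once."""
    for depth, _width, regions in mem_map.values():
        next_free = 0
        for _label, first, last in regions:
            if first != next_free or last < first:
                return False
            next_free = last + 1
        if next_free != depth:
            return False
    return True


def total_bits(mem_map, prefix: str) -> int:
    """Sum depth * width over all memories whose name starts with `prefix`."""
    return sum(d * w for name, (d, w, _r) in mem_map.items() if name.startswith(prefix))
```

Under these figures the six RAMs hold 1792 words of 17 bits, and `total_bits(MEMORY_MAP, "RAM")` and `total_bits(MEMORY_MAP, "ROM")` give the RAM and ROM capacities separately.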
Regarding the generation of read/write addresses and index values for the storage module during NTT calculation, FIG. 3 shows the timing of 16-point NTT data read/write and calculation under serial processing: data is read from the RAM, passed through the butterfly operation and the modular operation, and written back to the RAM, which takes 3 + 1 + 2 = 6 clock cycles in total. Completing one n-point NTT operation in this way requires $\frac{3n}{2}\log_2 n$ clock cycles, so one 16-point NTT requires 96 clock cycles.
The control module controls the whole system: it controls the address generation module to generate addresses, the NTT module to perform NTT and INTT operations, and the pre-calculation and post-calculation modules to perform their calculations, and it controls the encryption and decryption operations. To speed up NTT processing, the invention applies ping-pong operation to the data read/write process and the calculation process. FIG. 4 shows the timing of 16-point NTT data read/write and calculation under ping-pong operation: reads and writes alternate, the read/write process is hidden behind the calculation process, and by controlling the RAM addresses the input and output data streams are kept continuous, giving seamless data processing. Since the butterfly operation and the modular operation take 4 clock cycles, the first butterfly result is available only after a wait of 4 clock cycles. Completing one n-point NTT operation requires $\frac{n}{2}\log_2 n + 4$ clock cycles, so one 16-point NTT requires 36 clock cycles. Compared with serial processing, ping-pong operation reduces the number of clock cycles by 62.5%. The invention uses a radix-2 algorithm and processes polynomials with n = 256 terms, so the NTT module operates on n = 256 points. The interval (gap) between the two input data of a radix-2 butterfly unit is related to the NTT level L by gap = $2^L$.
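The cycle counts quoted for the two schedules follow directly from the two formulas, as this small worked calculation shows. The function names are hypothetical, and the formulas are taken as stated in the text for a single stream of butterflies; evaluating them at other sizes, such as n = 256, is only an extrapolation and does not model the four parallel butterfly modules.

```python
def serial_cycles(n: int) -> int:
    """Serial read/compute/write schedule: (3n/2) * log2(n) clock cycles."""
    levels = n.bit_length() - 1        # log2(n) for a power of two
    return 3 * n // 2 * levels


def pingpong_cycles(n: int, pipeline_latency: int = 4) -> int:
    """Ping-pong schedule: (n/2) * log2(n) cycles plus the initial latency."""
    levels = n.bit_length() - 1
    return n // 2 * levels + pipeline_latency


def cycle_reduction(n: int) -> float:
    """Fraction of clock cycles saved by the ping-pong schedule."""
    return 1 - pingpong_cycles(n) / serial_cycles(n)


if __name__ == "__main__":
    # 16-point example from the text: 96 serial cycles, 36 ping-pong cycles.
    print(serial_cycles(16), pingpong_cycles(16), f"{cycle_reduction(16):.1%}")
```

For n = 16 this reproduces the 96 and 36 clock cycles and the 62.5% reduction stated above.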
As shown in Fig. 5, the RAM address control circuit mainly consists of two counters, three data selectors, and 10 registers. The select input sel0 of the data selector controls reading and writing: when sel0 is 0, the read address is selected; when sel0 is 1, the write address is selected. The select input sel1 tests whether the NTT stage number L equals 7; if so, the selector outputs 1, otherwise it outputs $2^L$. The butterfly module needs 4 clock cycles to perform one butterfly operation and one modulo operation, and the result is written back to the original address after the 5th clock cycle. The write address therefore lags the read address by 5 clock cycles, so the write address is obtained by delaying the read address through 5 registers.
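The relationship between read and write addresses described above (the write address is the read address delayed by 5 registers) can be modelled in software as a 5-deep shift register. The sketch below is a behavioural illustration only, not the circuit of Fig. 5; the function name and the use of `None` for "no valid write yet" are assumptions made for the example.

```python
from collections import deque


def delayed_write_addresses(read_addresses, delay=5):
    """Model a chain of `delay` registers between read and write address.

    Returns a list of (cycle, read_address, write_address) tuples, where
    write_address is None until the register chain has filled.
    """
    # The deque plays the role of the register chain; index 0 is the oldest entry.
    chain = deque([None] * delay, maxlen=delay)
    trace = []
    for cycle, read_address in enumerate(read_addresses):
        write_address = chain[0]       # value leaving the last register
        trace.append((cycle, read_address, write_address))
        chain.append(read_address)     # shift in the new read address
    return trace


if __name__ == "__main__":
    for row in delayed_write_addresses(range(8)):
        print(row)
```

In this model the first valid write address appears in cycle 5 and equals the address that was read in cycle 0, which is the in-place write-back behaviour the text describes.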
The first NTT module comprises a first butterfly module, a second butterfly module, a first modulo module, and a first reverse-order module;
the second NTT module comprises a third butterfly module, a fourth butterfly module, a second modulo module, and a second reverse-order module;
under the control of the control module, the encoder reads the message m from the storage module, encodes it to obtain the encoded message $m_e$, and stores $m_e$ in the storage module;
the four butterfly modules simultaneously read the encoded message $m_e$ and the third noise $e_3$ from the storage module and add them to obtain the processed third noise $e_{3m}$;
the first butterfly module reads the low-order part of the input noise $e_1$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the noise $e_{1\psi}$, and stores it at the original address of the storage module;
the second butterfly module reads the high-order part of the input noise $e_1$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the noise $e_{1\psi}$, and stores it at the original address of the storage module;
at the same time, the third butterfly module reads the low-order part of the input noise $e_2$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the noise $e_{2\psi}$, and stores it at the original address of the storage module;
the fourth butterfly module reads the high-order part of the input noise $e_2$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the noise $e_{2\psi}$, and stores it at the original address of the storage module;
the first NTT module reads the noise $e_{1\psi}$ and the twiddle factor $\omega$ from the storage module and performs butterfly operations to obtain the result $E_1$, which is stored at the original address of the storage module; at the same time, the second NTT module reads the noise $e_{2\psi}$ and the twiddle factor $\omega$ from the storage module and performs butterfly operations to obtain the result $E_2$, which is stored at the original address of the storage module. The butterfly module is the core of the NTT module. As shown in Fig. 6, it consists of two parts, the channel selector in Fig. 6(a) and the radix-2 butterfly unit in Fig. 6(b): the channel selector selects the input data, and the radix-2 butterfly unit performs the computation. The eight-stage butterfly operation proceeds as follows:
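Step 1, define the butterfly stage number as L and initialize L = 1;
Step 2, the first butterfly module reads the low-order part of the intermediate data $e_{1\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the low-order part of the intermediate result $e'^{L}_{1\psi}$ of the L-th stage butterfly operation;
the second butterfly module reads the high-order part of the intermediate data $e_{1\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the high-order part of the intermediate result $e'^{L}_{1\psi}$ of the L-th stage butterfly operation;
the third butterfly module reads the low-order part of the intermediate data $e_{2\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the low-order part of the intermediate result $e'^{L}_{2\psi}$ of the L-th stage butterfly operation;
the fourth butterfly module reads the high-order part of the intermediate data $e_{2\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the high-order part of the intermediate result $e'^{L}_{2\psi}$ of the L-th stage butterfly operation;
when the butterfly stage number L equals 1, the intermediate data $e_{1\psi}^{L-1}$ and $e_{2\psi}^{L-1}$ of the (L-1)-th stage butterfly operation are taken to be the noise $e_{1\psi}$ and the noise $e_{2\psi}$, respectively;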
Step 3, the first modulo module receives the intermediate result $e'^{L}_{1\psi}$ of the L-th stage butterfly operation from the first butterfly module and the second butterfly module and performs the modulo operation; the modulo module circuit is shown in Fig. 7:
step 3.1, bits [31:16] and bits [15:0] of the intermediate result $e'^{L}_{1\psi}$ are each input to two subtractors, and the output of one of the subtractors is added to the modulus q to obtain an addition result;
step 3.2, the output of the other subtractor serves as the select signal of the data selector: if bits [31:16] of $e'^{L}_{1\psi}$ are greater than bits [15:0] of $e'^{L}_{1\psi}$, the data selector outputs the addition result; if bits [31:16] of $e'^{L}_{1\psi}$ are less than bits [15:0] of $e'^{L}_{1\psi}$, the data selector outputs the subtractor result;
step 3.3, the output of the data selector is added to bit [32] of $e'^{L}_{1\psi}$ to obtain the intermediate data $e^{L}_{1\psi}$ of the L-th stage butterfly operation;
Similarly, the second modulo module receives the intermediate result $e'^{L}_{2\psi}$ of the L-th stage butterfly operation from the third butterfly module and the fourth butterfly module and performs the modulo operation to obtain the intermediate data $e^{L}_{2\psi}$ of the L-th stage butterfly operation;
Step 4, assign L + 1 to L and check whether L > 8; if so, the butterfly operation is complete and the final intermediate data $e^{L}_{1\psi}$ and $e^{L}_{2\psi}$ are the results $E_1$ and $E_2$; otherwise, return to step 2 and continue.
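The staged butterfly and modulo steps above can be sketched in software as follows. This is an illustrative model under stated assumptions, not the patented circuit: the modulus q = 2^16 + 1 = 65537 is an assumption (it is consistent with splitting a 33-bit value into bits [32], [31:16], and [15:0], because 2^16 ≡ -1 and 2^32 ≡ 1 modulo 65537), the butterfly is written in the Cooley-Tukey form, the stage index is 0-based so that the input gap is 2^stage, and all function names are hypothetical.

```python
Q = (1 << 16) + 1  # assumed modulus 65537; not stated in this section of the text


def reduce_mod_q(x: int) -> int:
    """Reduce a product 0 <= x <= 2**32 modulo Q using the split of steps 3.1 to 3.3."""
    low = x & 0xFFFF            # bits [15:0]
    high = (x >> 16) & 0xFFFF   # bits [31:16]
    top = (x >> 32) & 1         # bit [32]
    diff = low - high           # subtractor output (2**16 is congruent to -1 mod Q)
    if high > low:              # selector picks the "+ q" branch when the difference is negative
        diff += Q
    return diff + top           # 2**32 is congruent to 1 mod Q


def bit_reverse_permute(values):
    """Return the values in bit-reversed index order (reverse-order storage)."""
    n = len(values)
    bits = n.bit_length() - 1
    return [values[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]


def ntt_in_place(a, omega):
    """In-place radix-2 NTT on bit-reversed input; returns a in natural order."""
    n = len(a)
    stages = n.bit_length() - 1          # 8 stages for n = 256
    for stage in range(stages):
        gap = 1 << stage                 # distance between the two butterfly inputs
        w_step = pow(omega, n // (2 * gap), Q)
        for start in range(0, n, 2 * gap):
            w = 1
            for j in range(start, start + gap):
                t = reduce_mod_q(w * a[j + gap])   # twiddle multiplication
                u = a[j]
                a[j] = (u + t) % Q                 # upper butterfly output
                a[j + gap] = (u - t) % Q           # lower butterfly output
                w = reduce_mod_q(w * w_step)
    return a


if __name__ == "__main__":
    n = 256
    omega = pow(3, (Q - 1) // n, Q)      # 3 is a primitive root modulo 65537
    data = bit_reverse_permute(list(range(n)))
    print(ntt_in_place(data, omega)[:4])
```

The reduction helper is only valid for products of two already-reduced operands; a wider input would need an extra correction step.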
The first butterfly module reads the low-order part of the third noise $e_{3m}$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the calculation result $e_{3m\psi}$, and stores it at the original address of the storage module; at the same time, the second butterfly module reads the high-order part of the third noise $e_{3m}$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the calculation result $e_{3m\psi}$, and stores it at the original address of the storage module;
the first NTT module reads the calculation result $e_{3m\psi}$ and the twiddle factor $\omega$ from the storage module and performs an in-place NTT operation to obtain the result $E_{3M}$; meanwhile, the third butterfly module reads the public key A and the NTT results $E_1$ and $E_2$ from the storage module, computes the ciphertext $C_1 = A \odot E_1 + E_2$, and sends it to the second modulo module for modular reduction, and the result is stored at the original address of the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT result $E_1$ from the storage module, computes the first parameter $P_{E1} = P \odot E_1$, and writes it to the original address of the storage module;
the first butterfly module and the second butterfly module perform the last-stage butterfly operation on the calculation result $e_{3m\psi}$; at the same time, the third butterfly module and the fourth butterfly module read the parameter $P_{E1}$ and the result $E_{3M}$ from the storage module and add them to obtain the ciphertext $C_2 = P_{E1} + E_{3M}$, which is stored at the original address of the storage module, thereby completing the encryption of the message m;
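The two ciphertext formulas above are coefficient-wise operations on vectors that are already in the NTT domain. A minimal sketch, assuming a modulus of 65537 (not stated in this section) and using hypothetical function names, could look like this:

```python
Q = 65537  # assumed modulus; chosen only for illustration


def pointwise_mul(a, b):
    """Coefficient-wise product modulo Q (the operation written as ⊙ in the text)."""
    return [(x * y) % Q for x, y in zip(a, b)]


def pointwise_add(a, b):
    """Coefficient-wise sum modulo Q."""
    return [(x + y) % Q for x, y in zip(a, b)]


def encrypt_ntt_domain(A, P, E1, E2, E3M):
    """Compute C1 = A ⊙ E1 + E2 and C2 = P ⊙ E1 + E3M from NTT-domain inputs."""
    C1 = pointwise_add(pointwise_mul(A, E1), E2)
    P_E1 = pointwise_mul(P, E1)          # first parameter P_E1
    C2 = pointwise_add(P_E1, E3M)
    return C1, C2


if __name__ == "__main__":
    print(encrypt_ntt_domain([1, 2], [7, 8], [3, 4], [5, 6], [9, 10]))
```

Because no transform is involved at this point, the four multipliers can work on independent coefficients, which is what allows the hardware to overlap this step with the NTT of the third noise.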
the four butterfly modules each read the ciphertexts $C_1$, $C_2$ and the private key $R_2$ from the storage module and perform point-wise multiplication and addition to obtain the second parameter $M_D = C_1 \odot R_2 + C_2$, which is sent to the corresponding reverse-order module for reordering and then stored at the original address of the storage module;
the first butterfly module and the second butterfly module read the second parameter $M_D$ and the twiddle factor $\omega$ from the storage module and perform the last-stage butterfly operation to obtain two butterfly results; meanwhile, the third butterfly module and the fourth butterfly module read the scaling factor $\psi^{-i}$ from the storage module and multiply it point-wise with the two butterfly results to obtain the INTT result $m_d$, which is stored at the original address of the storage module;
after the INTT operation is finished, the four butterfly modules read the INTT result $m_d$ and the scaling factor $\psi^{-i}$ from the storage module and multiply them in parallel to obtain the post-calculation result $m_{d\psi}$, which is written to the original address of the storage module;
under the control of the control module, the decoder reads the post-calculation result $m_{d\psi}$ from the storage module and decodes it to obtain the message m, thereby completing the recovery of the message m.
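The purpose of the pre-calculation with $\psi^i$ and the post-calculation with $\psi^{-i}$ is to turn the cyclic convolution computed by the NTT into multiplication modulo $x^n + 1$ (a negative wrapped convolution). The sketch below checks this on small polynomials. It is an illustration under assumptions: the modulus 65537 and the choice of $\psi$ as a power of the primitive root 3 are not taken from the text, the transforms are written as naive O(n²) sums for clarity rather than as butterflies, the factor $n^{-1}$ is folded into the inverse transform, and all function names are hypothetical. It requires Python 3.8 or later for `pow(x, -1, Q)`.

```python
Q = 65537  # assumed modulus; 3 is a primitive root modulo 65537


def naive_ntt(x, omega):
    """Forward transform X[k] = sum_j x[j] * omega**(j*k) mod Q."""
    n = len(x)
    return [sum(x[j] * pow(omega, j * k, Q) for j in range(n)) % Q for k in range(n)]


def naive_intt(X, omega):
    """Inverse transform, including the 1/n scaling."""
    n = len(X)
    omega_inv = pow(omega, -1, Q)
    n_inv = pow(n, -1, Q)
    return [n_inv * sum(X[k] * pow(omega_inv, j * k, Q) for k in range(n)) % Q
            for j in range(n)]


def negacyclic_mul(a, b):
    """Multiply a and b modulo (x**n + 1, Q) via pre-scaling, NTT, and post-scaling."""
    n = len(a)
    psi = pow(3, (Q - 1) // (2 * n), Q)   # psi**n = -1 mod Q, psi**2 = omega
    omega = psi * psi % Q
    a_scaled = [a[i] * pow(psi, i, Q) % Q for i in range(n)]   # pre-calculation
    b_scaled = [b[i] * pow(psi, i, Q) % Q for i in range(n)]
    product = [x * y % Q for x, y in zip(naive_ntt(a_scaled, omega),
                                         naive_ntt(b_scaled, omega))]
    c_scaled = naive_intt(product, omega)
    psi_inv = pow(psi, -1, Q)
    return [c_scaled[i] * pow(psi_inv, i, Q) % Q for i in range(n)]  # post-calculation


def schoolbook_negacyclic(a, b):
    """Reference product modulo (x**n + 1, Q) computed term by term."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]   # x**n wraps around with a sign change
    return [v % Q for v in c]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(negacyclic_mul(a, b) == schoolbook_negacyclic(a, b))
```

In the decryption flow described above, the same idea applies to $M_D$: the inverse transform followed by multiplication with $\psi^{-i}$ returns the coefficients of the polynomial that the decoder then maps back to the message.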
The multiplexing timing of the multipliers is shown in Fig. 8, where $e_{1\psi}$, $e_{2\psi}$, and $e_{3m\psi}$ denote the pre-calculation of the polynomial coefficient vectors $e_1$, $e_2$, and $e_{3m}$; $E_1$, $E_2$, and $E_{3M}$ denote the NTT computation of $e_{1\psi}$, $e_{2\psi}$, and $e_{3m\psi}$; $C_1$ denotes the computation of the ciphertext $C_1$, i.e. $A \odot E_1 + E_2$; $P_{E1}$ denotes the computation of $P \odot E_1$; $C_1R_2$ denotes the computation of $C_1 \odot R_2$; $m_d$ denotes the INTT of $M_D$; and $m_{d\psi}$ denotes the post-calculation of $m_d$. Fig. 9 shows the control block of the multipliers, which controls multiplier timing through the select inputs so that different data are processed at different times. As can be seen from Fig. 8, the pre-calculation of $e_1$ and $e_2$ is merged into the main NTT algorithm: at stage 0 of the NTT algorithm, two multipliers each are used to pre-calculate the two butterfly inputs in parallel. The pre-calculation of $e_{3m}$ uses four multipliers in parallel. The NTTs of $e_{1\psi}$ and $e_{2\psi}$ each use two multipliers for parallel butterfly operations. The pre-calculation for $e_{3m\psi}$ is merged into the main NTT algorithm: at stage 0 of the NTT algorithm, two multipliers pre-calculate the two butterfly inputs in parallel, and during the NTT operation the other two multipliers compute $C_1$ and $P_{E1}$, so that this multiplication time is hidden within the NTT operation. The $C_1R_2$ computation uses four multipliers in parallel. In the last stage of the INTT of $M_D$, two multipliers perform the butterfly operation and the other two perform the post-calculation on $m_d$. After the INTT operation is finished, four multipliers perform the post-calculation on $m_d$.
The multipliers are fully utilized for parallel computation, which reduces the number of clock cycles needed for encryption and decryption and improves the throughput of the RLWE encryption processor. In addition, the multipliers in the NTT modules are reused for the pre-calculation, the post-calculation, and the computation of the encryption results, and the NTT modules are reused for the INTT operation, which reduces hardware resource consumption.
Claims (2)
1. A high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, comprising: a storage module, two NTT modules, a control module, an encoder, and a decoder;
the storage module stores polynomial coefficients in reverse order through a data selector, the stored data comprising: three input noises $e_1$, $e_2$, and $e_3$, the message m, the intermediate coefficients used in the NTT operation, and the two public keys A, P and the private key $R_2$ after NTT operation; the intermediate coefficients comprise: the forward scaling factor $\psi^i$, the inverse scaling factor $\psi^{-i}$, and the twiddle factor $\omega$;
the first NTT module comprises a first butterfly module, a second butterfly module, a first modulo module, and a first reverse-order module;
the second NTT module comprises a third butterfly module, a fourth butterfly module, a second modulo module, and a second reverse-order module;
under the control of the control module, the encoder reads the message m from the storage module, encodes it to obtain the encoded message $m_e$, and stores $m_e$ in the storage module;
the four butterfly modules simultaneously read the encoded message $m_e$ and the third noise $e_3$ from the storage module and add them to obtain the processed third noise $e_{3m}$;
the first butterfly module reads the low-order part of the input noise $e_1$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the noise $e_{1\psi}$, and stores it at the original address of the storage module;
the second butterfly module reads the high-order part of the input noise $e_1$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the noise $e_{1\psi}$, and stores it at the original address of the storage module;
at the same time, the third butterfly module reads the low-order part of the input noise $e_2$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the noise $e_{2\psi}$, and stores it at the original address of the storage module;
the fourth butterfly module reads the high-order part of the input noise $e_2$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the noise $e_{2\psi}$, and stores it at the original address of the storage module;
the first NTT module reads the noise $e_{1\psi}$ and the twiddle factor $\omega$ from the storage module and performs butterfly operations to obtain the result $E_1$, which is stored at the original address of the storage module; at the same time, the second NTT module reads the noise $e_{2\psi}$ and the twiddle factor $\omega$ from the storage module and performs butterfly operations to obtain the result $E_2$, which is stored at the original address of the storage module;
the first butterfly module reads the low-order part of the third noise $e_{3m}$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the low-order part of the calculation result $e_{3m\psi}$, and stores it at the original address of the storage module; at the same time, the second butterfly module reads the high-order part of the third noise $e_{3m}$ and the scaling factor $\psi^i$ from the storage module, performs the pre-calculation to obtain the high-order part of the calculation result $e_{3m\psi}$, and stores it at the original address of the storage module;
the first NTT module reads the calculation result $e_{3m\psi}$ and the twiddle factor $\omega$ from the storage module and performs an in-place NTT operation to obtain the result $E_{3M}$; meanwhile, the third butterfly module reads the public key A and the NTT results $E_1$ and $E_2$ from the storage module, computes the ciphertext $C_1 = A \odot E_1 + E_2$, and sends it to the second modulo module for modular reduction, and the result is stored at the original address of the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT result $E_1$ from the storage module, computes the first parameter $P_{E1} = P \odot E_1$, and writes it to the original address of the storage module;
the first butterfly module and the second butterfly module perform the last-stage butterfly operation on the calculation result $e_{3m\psi}$; at the same time, the third butterfly module and the fourth butterfly module read the parameter $P_{E1}$ and the result $E_{3M}$ from the storage module and add them to obtain the ciphertext $C_2 = P_{E1} + E_{3M}$, which is stored at the original address of the storage module, thereby completing the encryption of the message m;
the four butterfly modules each read the ciphertexts $C_1$, $C_2$ and the private key $R_2$ from the storage module and perform point-wise multiplication and addition to obtain the second parameter $M_D = C_1 \odot R_2 + C_2$, which is sent to the corresponding reverse-order module for reordering and then stored at the original address of the storage module;
the first butterfly module and the second butterfly module read the second parameter $M_D$ and the twiddle factor $\omega$ from the storage module and perform the last-stage butterfly operation to obtain two butterfly results; meanwhile, the third butterfly module and the fourth butterfly module read the scaling factor $\psi^{-i}$ from the storage module and multiply it point-wise with the two butterfly results to obtain the INTT result $m_d$, which is stored at the original address of the storage module;
after the INTT operation is finished, the four butterfly modules read the INTT result $m_d$ and the scaling factor $\psi^{-i}$ from the storage module and multiply them in parallel to obtain the post-calculation result $m_{d\psi}$, which is written to the original address of the storage module;
under the control of the control module, the decoder reads the post-calculation result $m_{d\psi}$ from the storage module and decodes it to obtain the message m, thereby completing the recovery of the message m.
2. The high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme of claim 1, wherein the two NTT modules perform the eight-stage butterfly operation as follows:
step 1, define the butterfly stage number as L and initialize L = 1;
step 2, the first butterfly module reads the low-order part of the intermediate data $e_{1\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the low-order part of the intermediate result $e'^{L}_{1\psi}$ of the L-th stage butterfly operation;
the second butterfly module reads the high-order part of the intermediate data $e_{1\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the high-order part of the intermediate result $e'^{L}_{1\psi}$ of the L-th stage butterfly operation;
the third butterfly module reads the low-order part of the intermediate data $e_{2\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the low-order part of the intermediate result $e'^{L}_{2\psi}$ of the L-th stage butterfly operation;
the fourth butterfly module reads the high-order part of the intermediate data $e_{2\psi}^{L-1}$ of the (L-1)-th stage butterfly operation and the twiddle factor $\omega$ from the storage module and performs the butterfly operation to obtain the high-order part of the intermediate result $e'^{L}_{2\psi}$ of the L-th stage butterfly operation;
when the butterfly stage number L equals 1, the intermediate data $e_{1\psi}^{L-1}$ and $e_{2\psi}^{L-1}$ of the (L-1)-th stage butterfly operation are taken to be the noise $e_{1\psi}$ and the noise $e_{2\psi}$, respectively;
Step 3, the first modulo module receives the intermediate result $e'^{L}_{1\psi}$ of the L-th stage butterfly operation from the first butterfly module and the second butterfly module and performs the modulo operation:
step 3.1, the high-order and low-order parts of the intermediate result $e'^{L}_{1\psi}$ are each input to two subtractors, and the output of one of the subtractors is added to the modulus q to obtain an addition result;
step 3.2, the output of the other subtractor serves as the select signal of the data selector: if the high-order part of $e'^{L}_{1\psi}$ is greater than the low-order part of $e'^{L}_{1\psi}$, the data selector outputs the addition result; if the high-order part of $e'^{L}_{1\psi}$ is less than the low-order part of $e'^{L}_{1\psi}$, the data selector outputs the subtractor result;
step 3.3, the output of the data selector is added to the highest bit of $e'^{L}_{1\psi}$ to obtain the intermediate data $e^{L}_{1\psi}$ of the L-th stage butterfly operation;
Similarly, the second modulo module receives the intermediate result $e'^{L}_{2\psi}$ of the L-th stage butterfly operation from the third butterfly module and the fourth butterfly module and performs the modulo operation to obtain the intermediate data $e^{L}_{2\psi}$ of the L-th stage butterfly operation;
Step 4, assign L + 1 to L and check whether L > 8; if so, the butterfly operation is complete and the final intermediate data $e^{L}_{1\psi}$ and $e^{L}_{2\psi}$ are the results $E_1$ and $E_2$; otherwise, return to step 2 and continue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111499003.0A CN114172629B (en) | 2021-12-09 | 2021-12-09 | High-performance fully homomorphic encryption processor circuit based on RLWE encryption scheme |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114172629A (en) | 2022-03-11 |
CN114172629B (en) | 2023-06-27 |
Family
ID=80484791
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111499003.0A Active CN114172629B (en) | 2021-12-09 | 2021-12-09 | High-performance fully homomorphic encryption processor circuit based on RLWE encryption scheme |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114172629B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170155628A1 (en) * | 2015-12-01 | 2017-06-01 | Encrypted Dynamics LLC | Device, system and method for fast and secure proxy re-encryption |
CN110363030A (en) * | 2018-04-09 | 2019-10-22 | 英飞凌科技股份有限公司 | For executing the method and processing equipment of the Password Operations based on lattice |
Non-Patent Citations (1)
Title |
---|
陈克非;蒋林智;: "同态加密专栏序言", 密码学报, no. 06 * |
Also Published As
Publication number | Publication date |
---|---|
CN114172629B (en) | 2023-06-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||