CN114172629B

CN114172629B - High-performance fully homomorphic encryption processor circuit based on RLWE encryption scheme

Info

Publication number: CN114172629B
Application number: CN202111499003.0A
Authority: CN
Inventors: 杜高明; 郭文杰; 廖秋竹; 宋宇鲲; 尹勇生
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2023-06-27
Anticipated expiration: 2041-12-09
Also published as: CN114172629A

Abstract

The invention discloses a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, comprising a storage module, an NTT module and a control module. The storage module comprises 6 dual-port RAMs, 2 dual-port ROMs and one single-port ROM, and stores the intermediate data and input coefficients used during operation. The NTT module performs the NTT calculation, and its internal multipliers are also used for pre-calculation and post-calculation. The control module controls the whole system: it controls the address generation module to generate addresses, the NTT module to carry out NTT and INTT operations, the pre-calculation and post-calculation modules to carry out their calculations, and the encryption and decryption operations. The invention balances hardware area against throughput and reduces hardware resource consumption while maintaining a high throughput rate.

Description

High-performance fully homomorphic encryption processor circuit based on RLWE encryption scheme

Technical Field

The invention belongs to the field of encryption hardware circuit design, and particularly relates to a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme.

Background

With the advent of quantum computers, RSA cryptosystems and elliptic curve cryptosystems (ECC) will no longer be secure. The development of quantum communication and quantum computers poses unprecedented challenges to traditional cryptography, and quantum-resistant cryptographic algorithms have become a research hotspot in cryptography. Lattice-based cryptosystems are good candidates for replacing traditional cryptosystems, and many lattice-based schemes have been proposed, such as identification schemes and digital signature schemes. Many of these schemes base their security on RLWE. In 2009, Regev et al. introduced the LWE problem and showed that its security can be reduced to hard lattice problems (SVP or CVP). In 2010, Lyubashevsky et al. proposed RLWE, which introduces ideal lattices into LWE and reduces complexity while providing the same level of security. RLWE and its variants have lower complexity than previous public key encryption schemes. Because it is secure and easy to implement, the RLWE public key cryptosystem has broad prospects in applications such as cloud computing, 5G communication, data aggregation, personal health data management and the training of neural networks on encrypted data. In recent years, RLWE homomorphic encryption schemes have been widely studied in both software and hardware. Clercq et al. proposed a software implementation of the RLWE cryptosystem, and Z. Liu et al. implemented the RLWE encryption scheme on the ARM NEON and MSP430 architectures. Tan et al. proposed a high-security fingerprint authentication system based on the RLWE cryptographic scheme. Tuy Nguyen Tan et al. proposed an RLWE-based video face encryption and decryption method on a GPU. Experimental results show that the implementation speed of face encryption and decryption operation based on RLWE is about 100 times that of the implementation speed of the GPU.

Polynomial multiplication is one of the most critical, time-consuming operations in an RLWE public key encryption system, and the efficiency of the polynomial multiplier determines the performance of the RLWE encryption processor.

LWE encryption has proven viable in software, and a first FPGA-based hardware design for LWE encryption schemes has been proposed. Because it adopts a fully parallel architecture, its encryption throughput is 316 times that of the software implementation, but it consumes a large amount of logic resources.

The et al propose an efficient compact RLWE encryption hardware architecture. The architecture comprises 2 fast Fourier transforms (Fast Fourier Transform, FFT) and 3 inverse fast Fourier transforms (Inverse Fast Fourier Transform, IFFT). A single butterfly unit is designed to calculate both the FFT and the IFFT, which reduces hardware area, but because only one butterfly unit is used, the parallelism of the FFT cannot be exploited. Roy et al. proposed an efficient compact RLWE encryption processor that optimizes the RLWE encryption scheme, reduces the number of NTT operations from 5 to 4, and combines the NTT algorithm with the negative wrapped convolution so that the pre-computation of the negative wrapped convolution is avoided. Liu et al. designed a universal modular unit and proposed a resource-efficient RLWE encryption processor that resists side-channel attacks, but the architecture performs RLWE encryption and decryption sequentially, with low throughput rates of 0.056 Mbps and 0.28 Mbps, respectively. Velasco Medina et al. proposed a high-throughput RLWE encryption hardware architecture that employs radix-2 and radix-8 multipath delay NTT algorithms; its encryption and decryption throughput rates reach megabits per second and gigabits per second, respectively, but it consumes significant hardware resources and is not suitable for resource-constrained FPGA development boards.

Disclosure of Invention

The invention aims to solve the defects in the prior art, and provides a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, so that the balance between the area and the throughput rate can be realized, and the consumption of hardware resources is reduced on the premise of ensuring the high throughput rate.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

the invention discloses a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, characterized by comprising: a storage module, two NTT modules, a control module, an encoder and a decoder;

the storage module stores polynomial coefficients in reverse order through a data selector, the stored data comprising: three input noises e₁, e₂ and e₃, a message m, the intermediate coefficients of the NTT operation process, and the two public keys A, P and the private key R₂ after NTT operation; the intermediate coefficients include: the forward scaling factor ψ^i, the inverse scaling factor ψ^-i and the twiddle factor ω;

the first NTT module comprises a first butterfly module, a second butterfly module, a first modulo module and a first reverse order module;

the second NTT module comprises a third butterfly module, a fourth butterfly module, a second modulo module and a second reverse order module;

under the control of the control module, the encoder reads the message m from the storage module and encodes it to obtain the encoded message m_e, which is stored in the storage module;

the four butterfly modules simultaneously read the encoded message m_e and the third noise e₃ from the storage module and add them to obtain the processed third noise e_3m;

the first butterfly module reads the low-order part of the input noise e₁ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the low-order part of the noise e_1ψ, which is stored at the original address of the storage module;

the second butterfly module reads the high-order part of the input noise e₁ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the high-order part of the noise e_1ψ, which is stored at the original address of the storage module;

at the same time, the third butterfly module reads the low-order part of the input noise e₂ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the low-order part of the noise e_2ψ, which is stored at the original address of the storage module;

the fourth butterfly module reads the high-order part of the input noise e₂ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the high-order part of the noise e_2ψ, which is stored at the original address of the storage module;

the first NTT module reads the noise e_1ψ and the twiddle factor ω from the storage module and performs butterfly operations to obtain the operation result E₁, which is then stored at the original address of the storage module; at the same time, the second NTT module reads the noise e_2ψ and the twiddle factor ω from the storage module and performs butterfly operations to obtain the operation result E₂, which is then stored at the original address of the storage module;

the first butterfly module reads the low-order part of the third noise e_3m and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the low-order part of the calculation result e_3mψ, which is stored at the original address of the storage module; at the same time, the second butterfly module reads the high-order part of the third noise e_3m and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the high-order part of the calculation result e_3mψ, which is stored at the original address of the storage module;

the first NTT module reads the calculation result e_3mψ and the twiddle factor ω from the storage module and performs an in-place NTT operation to obtain the operation result E_3M; meanwhile, the third butterfly module reads the public key A and the NTT operation results E₁ and E₂ from the storage module, calculates the ciphertext C₁ = A⊙E₁ + E₂, and sends it to the second modulo module for the modulo operation, the result being stored at the original address of the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT operation result E₁ from the storage module, calculates the first parameter P_E1 = P⊙E₁ and writes it to the original address of the storage module;

the first butterfly module and the second butterfly module perform the last-stage butterfly operation on the calculation result e_3mψ; at the same time, the third butterfly module and the fourth butterfly module read the parameter P_E1 and the operation result E_3M from the storage module and add them to obtain the ciphertext C₂ = P_E1 + E_3M, which is then stored at the original address of the storage module, thereby completing the encryption of the message m;

the four butterfly modules respectively read the ciphertexts C₁, C₂ and the private key R₂ from the storage module and perform point-wise multiplication and addition to obtain the second parameter M_D = C₁⊙R₂ + C₂, which is sent to the corresponding reverse order modules for reverse-order arrangement and then stored at the original address of the storage module;

the first butterfly module and the second butterfly module read the second parameter M_D and the twiddle factor ω from the storage module and perform the last-stage butterfly operation to obtain two butterfly operation results; at the same time, the third butterfly module and the fourth butterfly module read the scaling factor ψ^-i from the storage module and point-multiply it with the two butterfly operation results to obtain the INTT calculation result m_d, which is stored at the original address of the storage module;

after the INTT operation is finished, the four butterfly modules read the INTT calculation result m_d and the scaling factor ψ^-i from the storage module and multiply them in parallel to obtain the post-calculation result m_dψ, which is written to the original address of the storage module;

under the control of the control module, the decoder reads the post-calculation result m_dψ from the storage module and decodes it to obtain the message m, thereby completing the recovery of the message m.

The high-performance fully homomorphic encryption processor circuit based on the RLWE encryption scheme is further characterized in that the two NTT modules perform an eight-stage butterfly operation according to the following process:

step 1, define the stage number of the butterfly operation as L and initialize L = 1;

step 2, the first butterfly module reads the low-order part of the stage-(L-1) intermediate data of the first NTT module and the twiddle factor ω from the storage module and performs a butterfly operation to obtain the low-order part of the stage-L intermediate result of the first NTT module;

the second butterfly module reads the high-order part of the stage-(L-1) intermediate data of the first NTT module and the twiddle factor ω from the storage module and performs a butterfly operation to obtain the high-order part of the stage-L intermediate result of the first NTT module;

the third butterfly module reads the low-order part of the stage-(L-1) intermediate data of the second NTT module and the twiddle factor ω from the storage module and performs a butterfly operation to obtain the low-order part of the stage-L intermediate result of the second NTT module;

the fourth butterfly module reads the high-order part of the stage-(L-1) intermediate data of the second NTT module and the twiddle factor ω from the storage module and performs a butterfly operation to obtain the high-order part of the stage-L intermediate result of the second NTT module;

when the butterfly stage L = 1, the stage-(L-1) intermediate data of the first NTT module and of the second NTT module are the noise e_1ψ and the noise e_2ψ, respectively;

step 3, the first modulo module obtains the stage-L intermediate result from the first butterfly module and the second butterfly module and performs the modulo operation:

step 3.1, the high-order part and the low-order part of the intermediate result are respectively input into two subtractors, and the output of one subtractor is added to the modulus q to obtain an addition result;

step 3.2, the output of the other subtractor is used as the strobe signal of the data selector: if the high-order part of the intermediate result is greater than its low-order part, the data selector outputs the addition result; if the high-order part is less than or equal to the low-order part, the data selector outputs the output of the subtractor;

step 3.3, the output of the data selector is added to the highest bit of the intermediate result to obtain the stage-L intermediate data of the first NTT module;

similarly, the second modulo module obtains the stage-L intermediate result from the third butterfly module and the fourth butterfly module and performs the modulo operation to obtain the stage-L intermediate data of the second NTT module;

step 4, assign L+1 to L and judge whether L > 8 holds; if so, the butterfly operation ends, and the intermediate data finally obtained by the first NTT module and the second NTT module are the operation results E₁ and E₂, respectively; otherwise, return to step 2 and continue in sequence.
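The modulo steps 3.1 to 3.3 and the eight-stage loop can be illustrated in software. The following is a minimal sketch, not the circuit itself. It assumes q = 65537 = 2^16 + 1 (the modulus reported for the embodiment), that the high-order and low-order parts are bits [31:16] and [15:0] of a product that is at most 2^32, and that reverse order means bit-reversed order. The function names `mod_q`, `bit_reverse` and `ntt_radix2` are hypothetical.

```python
Q = 65537  # 2**16 + 1, so 2**16 ≡ -1 and 2**32 ≡ 1 (mod Q)

def mod_q(x):
    """Reduce 0 <= x <= 2**32 modulo Q following steps 3.1 to 3.3."""
    hi = (x >> 16) & 0xFFFF            # high-order part, bits [31:16]
    lo = x & 0xFFFF                    # low-order part, bits [15:0]
    top = x >> 32                      # highest bit, bit [32]
    diff = lo - hi                     # subtractor output
    selected = diff + Q if hi > lo else diff   # data selector (step 3.2)
    return selected + top              # step 3.3

def bit_reverse(a):
    """Return a copy of `a` in bit-reversed index order (length is a power of 2)."""
    bits = len(a).bit_length() - 1
    return [a[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(len(a))]

def ntt_radix2(a, omega):
    """Radix-2 NTT on bit-reversed input; log2(n) stages, 8 stages for n = 256."""
    a = bit_reverse(a)
    n = len(a)
    half = 1                           # distance between the two butterfly inputs
    while half < n:
        step = pow(omega, n // (2 * half), Q)
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                u, v = a[j], mod_q(a[j + half] * w)
                a[j] = (u + v) % Q
                a[j + half] = (u - v) % Q
                w = mod_q(w * step)
        half *= 2
    return a
```

With n = 256 the `while` loop runs eight times, matching the eight butterfly stages; the distance `half` doubles at every stage.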

Compared with the prior art, the invention has the beneficial technical effects that:

1. The invention processes the NTT operation and the ciphertext calculation of the encryption process in parallel. In addition, during the NTT and INTT operations, data reading and writing and data calculation are carried out as a ping-pong operation, which hides the data read-write cycles, reduces the latency of the RLWE encryption processor and improves the throughput rate of the hardware architecture.

2. The invention designs a hardware architecture for multiplexing resources, wherein the encryption and decryption processes multiplex multipliers and adders in the butterfly module, the INTT multiplexes the NTT circuit structure, and the memory module multiplexes the same control circuit, thereby reducing the hardware resource consumption of the RLWE encryption processor and improving the resource efficiency.

3. The invention designs an RLWE encryption processor with a medium security level (n=256, q=65537), and circuit implementation and hardware testing were completed on a Spartan-6 FPGA development platform. The results show that encryption takes only 2.38k clock cycles and decryption only 1.69k clock cycles. The encryption and decryption throughput rates reach 21.01 Mbps and 29.60 Mbps, respectively, a clear improvement in the performance of the RLWE encryption processor.

Drawings

FIG. 1 is a hardware architecture of an RLWE encryption processor of the present invention;

FIG. 2 is a schematic diagram of an RLWE encryption scheme of the present invention;

FIG. 3 is a timing diagram of conventional data read-write and calculation serial processing;

FIG. 4 is a timing diagram of the ping-pong operation of data reading and writing and calculation according to the present invention;

FIG. 5 is a block diagram of a dual port RAM address control circuit according to the present invention;

FIG. 6 is a block diagram of the butterfly module of the present invention;

FIG. 7 is a block diagram of the modulo module of the present invention;

FIG. 8 is a time division multiplexing timing diagram of the multiplier according to the present invention;

FIG. 9 is a block diagram of a multiplier time division multiplexing module according to the present invention.

Detailed Description

In this embodiment, as shown in FIG. 1, a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme includes: a storage module, two NTT modules, a control module, an encoder and a decoder.

The RLWE encryption scheme is shown in FIG. 2; its algorithm flow includes three steps: key generation, encryption and decryption.

The RLWE key generation algorithm generates the public key (A, P) and the private key R₂. The description is as follows:

(1) A polynomial a is sampled from a uniform random distribution, and two n-dimensional vectors r₁, r₂ are sampled from a discrete Gaussian distribution.

(2) The n-dimensional vectors a, r₁, r₂ are each point-multiplied with the scaling factor vector ψ to obtain the n-dimensional vectors a_ψ, r_1ψ, r_2ψ.

(3) The NTT operation is performed on the n-dimensional vectors a_ψ, r_1ψ, r_2ψ to convert them to the NTT domain. The NTT operation results are denoted A, R₁, R₂.

(4) Calculate P = R₁ - A⊙R₂. The n-dimensional vectors (A, P) are the public key of the RLWE encryption scheme, and the n-dimensional vector R₂ is the private key of the RLWE encryption scheme.

The RLWE encryption algorithm generates the ciphertext (C₁, C₂) using the public key (A, P). The description is as follows:

(1) Three n-dimensional vectors e₁, e₂, e₃ are sampled from a discrete Gaussian distribution.

(2) The message m is encoded using an encoding function, and the encoding result is added to the n-dimensional vector e₃ to obtain the n-dimensional vector e_3m.

(3) The n-dimensional vectors e₁, e₂, e_3m are each point-multiplied with the scaling factor vector ψ to obtain the vectors e_1ψ, e_2ψ, e_3mψ.

(4) The NTT operation is performed on the n-dimensional vectors e_1ψ, e_2ψ, e_3mψ to convert them to the NTT domain. The NTT operation results are denoted E₁, E₂, E_3M.

(5) Calculate C₁ = A⊙E₁ + E₂ and C₂ = P⊙E₁ + E_3M. The n-dimensional vectors (C₁, C₂) are the ciphertext.

The RLWE decryption algorithm recovers the message m using the ciphertext (C₁, C₂) and the private key R₂. The description is as follows:

(1) Calculate M_D = C₁⊙R₂ + C₂.

(2) The INTT operation is performed on the n-dimensional vector M_D to convert it out of the NTT domain. The INTT operation result is denoted m_d.

(3) The n-dimensional vector m_d is point-multiplied with the scaling factor vector ψ^-1 to obtain the n-dimensional vector m_dψ.

(4) The message m is obtained from m_dψ by a decoding function.
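The three algorithms above can be checked end to end in a few lines of software. The following is a minimal sketch under stated assumptions, not the patented circuit: it uses n = 256 and q = 65537 from the embodiment, takes ψ = 3^128 mod q as a primitive 512th root of unity, uses a naive O(n²) transform for brevity, replaces the discrete Gaussian sampler with small noise drawn from {-1, 0, 1}, and assumes a threshold encoding (bit·⌊q/2⌋) because the text does not specify the encoding and decoding functions. All function names are hypothetical.

```python
import random

N, Q = 256, 65537
PSI = pow(3, (Q - 1) // (2 * N), Q)               # assumed 2N-th root of unity
PSI_POW = [pow(PSI, i, Q) for i in range(2 * N)]  # PSI_POW[2j] equals omega^j

def ntt(a):
    """Pre-calculation (multiply by psi^i), then a naive forward transform."""
    s = [x * PSI_POW[i] % Q for i, x in enumerate(a)]
    return [sum(s[i] * PSI_POW[(2 * i * k) % (2 * N)] for i in range(N)) % Q
            for k in range(N)]

def intt(A):
    """Naive inverse transform, then post-calculation (multiply by psi^-i)."""
    n_inv = pow(N, Q - 2, Q)
    out = []
    for i in range(N):
        acc = sum(A[k] * PSI_POW[(-2 * i * k) % (2 * N)] for k in range(N)) % Q
        out.append(acc * n_inv % Q * PSI_POW[(-i) % (2 * N)] % Q)
    return out

def small_noise(rng):
    """Stand-in for the discrete Gaussian sampler (assumption)."""
    return [rng.randint(-1, 1) % Q for _ in range(N)]

def keygen(rng):
    a = [rng.randrange(Q) for _ in range(N)]
    A, R1, R2 = ntt(a), ntt(small_noise(rng)), ntt(small_noise(rng))
    P = [(r1 - x * r2) % Q for x, r1, r2 in zip(A, R1, R2)]   # P = R1 - A⊙R2
    return (A, P), R2

def encrypt(pub, bits, rng):
    A, P = pub
    e1, e2, e3 = (small_noise(rng) for _ in range(3))
    e3m = [(e + b * (Q // 2)) % Q for e, b in zip(e3, bits)]  # assumed encoding
    E1, E2, E3M = ntt(e1), ntt(e2), ntt(e3m)
    C1 = [(x * y + z) % Q for x, y, z in zip(A, E1, E2)]      # C1 = A⊙E1 + E2
    C2 = [(x * y + z) % Q for x, y, z in zip(P, E1, E3M)]     # C2 = P⊙E1 + E_3M
    return C1, C2

def decrypt(R2, ct):
    C1, C2 = ct
    m_d = intt([(c1 * r + c2) % Q for c1, r, c2 in zip(C1, R2, C2)])
    return [1 if Q // 4 < c < 3 * Q // 4 else 0 for c in m_d]  # assumed decoding
```

Decryption works because C₁⊙R₂ + C₂ = E₂⊙R₂ + R₁⊙E₁ + E_3M: the terms containing A cancel, leaving the encoded message plus a small noise term that the threshold decoder removes.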

The storage module includes 6 dual-port RAMs, 2 dual-port ROMs and one single-port ROM.

(1) RAM0 has a depth of 256 and a width of 17 bits. After the polynomial coefficient vector e₁ is arranged in reverse order, the 128 numbers with index values 0-127 are stored at RAM0 addresses 0-127; after the polynomial coefficient vector R₂ is arranged in reverse order, the 128 numbers with index values 0-127 are stored at RAM0 addresses 128-255.

(2) RAM1 has a depth of 256 and a width of 17 bits. After the polynomial coefficient vector e₁ is arranged in reverse order, the 128 numbers with index values 128-255 are stored at RAM1 addresses 0-127; after the polynomial coefficient vector R₂ is arranged in reverse order, the 128 numbers with index values 128-255 are stored at RAM1 addresses 128-255.

(3) RAM2 has a depth of 512 and a width of 17 bits. After the polynomial coefficient vector e₂ is arranged in reverse order, the 128 numbers with index values 0-127 are stored at RAM2 addresses 0-127. The 256 numbers of the polynomial coefficient vector A arranged in reverse order are stored at RAM2 addresses 128-383. RAM2 addresses 384-511 store the reverse-ordered data of M_D during decryption.

(4) RAM3 has a depth of 512 and a width of 17 bits. After the polynomial coefficient vector e₂ is arranged in reverse order, the 128 numbers with index values 128-255 are stored at RAM3 addresses 0-127. The 256 numbers of the polynomial coefficient vector P arranged in reverse order are stored at RAM3 addresses 128-383. RAM3 addresses 384-511 store the reverse-ordered data of M_D during decryption.

(5) RAM4 has a depth of 128 and a width of 17 bits. After the polynomial coefficient vector e_3m is arranged in reverse order, the 128 numbers with index values 0-127 are stored at RAM4 addresses 0-127.

(6) RAM5 has a depth of 128 and a width of 17 bits. After the polynomial coefficient vector e_3m is arranged in reverse order, the 128 numbers with index values 128-255 are stored at RAM5 addresses 0-127.

(7) ROM0 is a dual-port ROM with a depth of 256 and a width of 16 bits. After the scaling factor vector ψ is arranged in reverse order, the 128 numbers with index values 0-127 are stored at ROM0 addresses 0-127; after the scaling factor vector ψ^-1 is arranged in reverse order, the 128 numbers with index values 0-127 are stored at ROM0 addresses 128-255.

(8) ROM1 is a dual-port ROM with a depth of 256 and a width of 16 bits. After the scaling factor vector ψ is arranged in reverse order, the 128 numbers with index values 128-255 are stored at ROM1 addresses 0-127; after the scaling factor vector ψ^-1 is arranged in reverse order, the 128 numbers with index values 128-255 are stored at ROM1 addresses 128-255.

(9) ROM2 is a single-port ROM with a depth of 60 and a width of 5 bits, and stores twiddle factors for the NTT and INTT operations. The twiddle factors required for the NTT are stored at ROM2 addresses 0-29, and the twiddle factors required for the INTT are stored at ROM2 addresses 30-59.

(10) ROM3 is a single-port ROM with a depth of 448 and a width of 16 bits, and stores twiddle factors for the NTT and INTT operations. The twiddle factors required for the NTT are stored at ROM3 addresses 0-223, and the twiddle factors required for the INTT are stored at ROM3 addresses 224-447.
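The RAM0 and RAM1 layout in items (1) and (2) can be sketched as follows. This is an illustration only: it assumes that reverse order means bit-reversed index order (the usual convention for an in-place radix-2 NTT) and that the vector is reordered first and then split by index. The names `reverse_order` and `load_ram0_ram1` are hypothetical.

```python
def reverse_order(vec):
    """Return `vec` in bit-reversed index order (assumed meaning of reverse order)."""
    bits = len(vec).bit_length() - 1
    return [vec[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(len(vec))]

def load_ram0_ram1(e1, r2):
    """Build the 256-entry contents of RAM0 and RAM1 from e1 and R2."""
    e1_rev, r2_rev = reverse_order(e1), reverse_order(r2)
    ram0 = e1_rev[:128] + r2_rev[:128]   # addresses 0-127: e1, 128-255: R2
    ram1 = e1_rev[128:] + r2_rev[128:]   # same split for index values 128-255
    return ram0, ram1
```

Splitting each vector across two dual-port RAMs lets the two butterfly modules of one NTT module fetch their operands from different memories in the same cycle.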

Regarding the generation of the read-write addresses and index values of the storage module during the NTT calculation, the timing diagram of serial data read-write and calculation for a 16-point NTT is shown in FIG. 3: reading the data from the RAM, performing the butterfly and modulo operations, and writing the result back to the RAM requires 3+1+2=6 clock cycles in total. Completing an n-point NTT operation requires (3n/2)·log₂n clock cycles in total, so a 16-point NTT requires 96 clock cycles.

The control module controls the whole system: it controls the address generation module to generate addresses, the NTT module to carry out NTT and INTT operations, the pre-calculation and post-calculation modules to carry out their calculations, and the encryption and decryption operations. To accelerate NTT processing, the invention carries out data reading and writing and data calculation as a ping-pong operation. The timing diagram of the ping-pong operation for a 16-point NTT is shown in FIG. 4. As the diagram shows, data reads and writes alternate and are hidden within the data calculation, and by controlling the RAM addresses the input and output data streams are kept continuous, so that the data is processed seamlessly. Since the butterfly operation and the modulo operation require 4 clock cycles in total, the first butterfly operation result is available after waiting 4 clock cycles. Completing an n-point NTT operation requires (n/2)·log₂n + 4 clock cycles in total, so a 16-point NTT requires 36 clock cycles. Compared with the serial processing mode, the ping-pong mode reduces the number of clock cycles by 62.5%. The invention adopts a radix-2 algorithm to process a polynomial with n=256 terms, so the number of points calculated by the NTT module is n=256. The interval (gap) between the two input data of the radix-2 butterfly unit is related to the NTT stage L by gap = 2^L.
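The cycle counts quoted above follow directly from the two formulas. The following short sketch reproduces the arithmetic, assuming n is a power of two; the function names are hypothetical.

```python
def serial_cycles(n):
    """Serial read / compute / write-back: (3n/2) * log2(n) clock cycles."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    return 3 * n // 2 * stages

def pingpong_cycles(n, latency=4):
    """Ping-pong operation: (n/2) * log2(n) cycles plus the 4-cycle pipeline fill."""
    stages = n.bit_length() - 1
    return n // 2 * stages + latency

def reduction(n):
    """Fraction of clock cycles saved by the ping-pong mode."""
    return 1 - pingpong_cycles(n) / serial_cycles(n)
```

For n = 16 this gives 96 and 36 cycles and a reduction of 0.625, matching the figures in the text.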

The RAM address control circuit mainly consists of 2 counters, 3 data selectors and 10 registers, as shown in FIG. 5. The select terminal sel0 of the data selector is the read-write control: when sel0=0, the read address is selected; when sel0=1, the write address is selected. The select terminal sel1 indicates whether the NTT stage L is equal to 7: if so, the selector outputs 1; otherwise it outputs 2^L. Because the butterfly module needs 4 clock cycles for one butterfly operation and one modulo operation, the result is written back to the original address after the 5th clock cycle. The write address therefore lags the read address by 5 clock cycles and is obtained by delaying the read address through 5 registers.
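The relationship between the read and write addresses can be modeled as a 5-stage shift register. The following is a behavioural sketch only, assuming one address per clock cycle; the class and function names are hypothetical and the counters and sel0 multiplexing of FIG. 5 are not modeled.

```python
from collections import deque

def gap_for_stage(level):
    """Output of the sel1 selector as described: 1 when L == 7, otherwise 2**L."""
    return 1 if level == 7 else 2 ** level

class WriteAddressDelay:
    """Delay the read address through `depth` registers to form the write address."""

    def __init__(self, depth=5):
        self.regs = deque([None] * depth, maxlen=depth)

    def clock(self, read_addr):
        write_addr = self.regs[0]      # oldest register drives the write address
        self.regs.append(read_addr)    # shift in the new read address
        return write_addr
```

Calling `clock` once per cycle returns `None` for the first five cycles and then replays the read addresses five cycles late, so each result lands at the address it was read from.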

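The four butterfly modules simultaneously read the encoded message m_e and the third noise e₃ from the storage module and add them to obtain the processed third noise e_3m;

the first butterfly module reads the low-order part of the input noise e₁ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the low-order part of the noise e_1ψ, which is stored at the original address of the storage module;

the second butterfly module reads the high-order part of the input noise e₁ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the high-order part of the noise e_1ψ, which is stored at the original address of the storage module;

meanwhile, the third butterfly module reads the low-order part of the input noise e₂ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the low-order part of the noise e_2ψ, which is stored at the original address of the storage module;

the fourth butterfly module reads the high-order part of the input noise e₂ and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the high-order part of the noise e_2ψ, which is stored at the original address of the storage module;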

the first NTT module reads the noise e_1ψ and the twiddle factor ω from the storage module and performs butterfly operations to obtain the operation result E₁, which is then stored at the original address of the storage module; at the same time, the second NTT module reads the noise e_2ψ and the twiddle factor ω from the storage module and performs butterfly operations to obtain the operation result E₂, which is then stored at the original address of the storage module. The butterfly module is the core module of the NTT module. As shown in FIG. 6, it consists of the channel selector of FIG. 6(a) and the radix-2 butterfly unit of FIG. 6(b); the channel selector selects the input data and the radix-2 butterfly unit performs the calculation. The eight-stage butterfly operation flow is as follows:

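step 1, define the stage number of the butterfly operation as L and initialize L = 1;

step 2, the first butterfly module reads the low-order part of the stage-(L-1) intermediate data of the first NTT module and the twiddle factor ω from the storage module and performs a butterfly operation to obtain the low-order part of the stage-L intermediate result of the first NTT module;

the second butterfly module likewise processes the high-order part of the stage-(L-1) intermediate data of the first NTT module to obtain the high-order part of the stage-L intermediate result of the first NTT module;

the third butterfly module likewise processes the low-order part of the stage-(L-1) intermediate data of the second NTT module to obtain the low-order part of the stage-L intermediate result of the second NTT module;

the fourth butterfly module likewise processes the high-order part of the stage-(L-1) intermediate data of the second NTT module to obtain the high-order part of the stage-L intermediate result of the second NTT module;

when the butterfly stage L = 1, the stage-(L-1) intermediate data of the first NTT module and of the second NTT module are the noise e_1ψ and the noise e_2ψ, respectively;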

step 3, the first modulo module obtains the stage-L intermediate result from the first butterfly module and the second butterfly module and performs the modulo operation; the modulo module circuit is shown in FIG. 7:

step 3.1, bits [31:16] and bits [15:0] of the intermediate result are respectively input into two subtractors, and the output of one subtractor is added to the modulus q to obtain an addition result;

step 3.2, the output of the other subtractor is used as the strobe signal of the data selector: if bits [31:16] of the intermediate result are greater than bits [15:0], the data selector outputs the addition result; if bits [31:16] are less than or equal to bits [15:0], the data selector outputs the output of the subtractor;

step 3.3, the output of the data selector is added to bit [32] of the intermediate result to obtain the stage-L intermediate data of the first NTT module;

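similarly, the second modulo module obtains the stage-L intermediate result from the third butterfly module and the fourth butterfly module and performs the modulo operation to obtain the stage-L intermediate data of the second NTT module;

step 4, assign L+1 to L and judge whether L > 8 holds; if so, the butterfly operation ends, and the intermediate data finally obtained by the first NTT module and the second NTT module are the operation results E₁ and E₂, respectively; otherwise, return to step 2 and continue in sequence.

The first butterfly module reads the low-order part of the third noise e_3m and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the low-order part of the calculation result e_3mψ, which is stored at the original address of the storage module; at the same time, the second butterfly module reads the high-order part of the third noise e_3m and the scaling factor ψ^i from the storage module and performs pre-calculation to obtain the high-order part of the calculation result e_3mψ, which is stored at the original address of the storage module;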

the first NTT module reads the calculation result e_3mψ and the twiddle factor ω from the storage module and performs an in-place NTT operation to obtain the operation result E_3M; meanwhile, the third butterfly module reads the public key A and the NTT operation results E₁ and E₂ from the storage module, calculates the ciphertext C₁ = A⊙E₁ + E₂, and sends it to the second modulo module for the modulo operation, the result being stored at the original address of the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT operation result E₁ from the storage module, calculates the first parameter P_E1 = P⊙E₁ and writes it to the original address of the storage module;

the first butterfly module and the second butterfly module perform the last-stage butterfly operation on the calculation result e_3mψ; at the same time, the third butterfly module and the fourth butterfly module read the parameter P_E1 and the operation result E_3M from the storage module and add them to obtain the ciphertext C₂ = P_E1 + E_3M, which is then stored at the original address of the storage module, thereby completing the encryption of the message m;

the first butterfly module and the second butterfly module read the second parameter M from the memory module _D And rotating the factor omega and performing the butterfly operation of the last stage to obtain two butterfly operation results; meanwhile, the third butterfly module and the fourth butterfly module read the scaling factor psi from the memory module ^-i And twoPerforming point multiplication operation on the butterfly operation result to obtain an INTT calculation result m _d Storing the original address of the storage module;

after the INTT operation is finished, the four butterfly modules read the INTT calculation result m_d and the scaling factor ψ⁻ⁱ from the storage module and perform the multiplications in parallel to obtain the post-calculation result m_dψ, which is written to the original address of the storage module;
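The role of the pre-calculation (multiplication by ψⁱ) and the post-calculation (multiplication by ψ⁻ⁱ) around the NTT and INTT can be illustrated with a minimal Python sketch. It is not the circuit described here: it uses a naive O(n²) transform instead of in-place butterflies, and the toy parameters n = 8, q = 17, ψ = 3, as well as all function names, are assumptions chosen only so the example runs quickly. Where the n⁻¹ factor of the inverse transform is applied is also an assumption. Requires Python 3.8 or later for modular inverses via pow.

```python
# Toy parameters (assumptions, not taken from the patent).
N = 8                   # polynomial length
Q = 17                  # prime modulus with 2N dividing Q - 1
PSI = 3                 # primitive 2N-th root of unity mod Q (3**8 % 17 == 16)
OMEGA = PSI * PSI % Q   # primitive N-th root of unity mod Q

def naive_transform(a, root):
    """Naive O(n^2) number-theoretic transform with the given root."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, Q) for j in range(n)) % Q
            for i in range(n)]

def forward_with_pre_scale(a):
    """Pre-calculation (multiply a[i] by psi^i), then the NTT."""
    scaled = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    return naive_transform(scaled, OMEGA)

def inverse_with_post_scale(A):
    """INTT (including the 1/N factor), then post-calculation by psi^-i."""
    omega_inv = pow(OMEGA, -1, Q)
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    m_d = [x * n_inv % Q for x in naive_transform(A, omega_inv)]
    return [m_d[i] * pow(psi_inv, i, Q) % Q for i in range(N)]

def negacyclic_mul(a, b):
    """Multiply a and b modulo x^N + 1 via pointwise products in the NTT domain."""
    A, B = forward_with_pre_scale(a), forward_with_pre_scale(b)
    return inverse_with_post_scale([x * y % Q for x, y in zip(A, B)])

def schoolbook_negacyclic(a, b):
    """Reference multiplication modulo x^N + 1, used only for checking."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c
```

With these scalings the pointwise product in the NTT domain corresponds to multiplication modulo xⁿ + 1, which is why no zero padding of the operands is needed.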

The multiplexing timing of the multipliers is shown in FIG. 8, where e_1ψ, e_2ψ and e_3mψ denote the pre-calculation of the polynomial coefficient vectors e₁, e₂ and e_3m, respectively; E₁, E₂ and E_3M denote the NTT operations on e_1ψ, e_2ψ and e_3mψ, respectively; C₁ denotes the calculation of the ciphertext C₁ = A ⊙ E₁ + E₂; P_E1 denotes the calculation of P ⊙ E₁; C₁R₂ denotes the calculation of C₁ ⊙ R₂; m_d denotes the INTT operation on M_D; and m_dψ denotes the post-calculation of m_d.

FIG. 9 shows the multiplier control module, which controls the timing of the multipliers through their select inputs so that they process different data at different times. As can be seen from FIG. 8, the pre-calculation of e₁ and e₂ is merged into the main NTT algorithm, that is, in stage 0 of the NTT algorithm two multipliers each are used to pre-calculate the two butterfly inputs in parallel; the pre-calculation of e_3m uses four multipliers in parallel; the NTT operations on e_1ψ and e_2ψ each use two multipliers to perform butterfly operations in parallel; the pre-calculation of e_3mψ is likewise merged into the main NTT algorithm, that is, two multipliers pre-calculate the two butterfly inputs in parallel in stage 0 of the NTT algorithm, while the other two multipliers calculate C₁ and P_E1 during the NTT operation, so that this multiplication time is hidden inside the NTT operation; the C₁R₂ calculation uses four multipliers in parallel; in the last stage of the INTT operation on M_D, two multipliers perform the butterfly operations and the other two perform post-calculation on m_d; after the INTT operation is finished, four multipliers perform post-calculation on m_d. The multipliers are thus fully used for parallel computation, which reduces the number of clock cycles needed for encryption and decryption and improves the throughput of the RLWE encryption processor. In addition, the multipliers in the NTT module are reused for the pre-calculation, the post-calculation and the calculation of the encryption results, and the NTT module is reused for the INTT operation, which reduces hardware resource consumption.
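The idea of one physical multiplier serving several tasks under a select signal can be modelled, very roughly, in Python. This is a behavioural sketch only and not the control module of FIG. 9: the mode names, the function signature and the toy modulus q = 17 are all hypothetical and were chosen for illustration.

```python
from enum import Enum

class Mode(Enum):
    """Hypothetical select values for the shared multiplier."""
    PRE = 0        # coefficient * psi^i   (pre-calculation)
    BUTTERFLY = 1  # coefficient * omega   (NTT / INTT butterfly)
    POINTWISE = 2  # coefficient * key     (e.g. A (.) E1 or P (.) E1)
    POST = 3       # coefficient * psi^-i  (post-calculation)

def shared_multiplier(mode, coeff, psi_pow, omega_pow, key_coeff, psi_inv_pow, q):
    """One modular multiplier whose second operand is chosen by `mode`.

    The dictionary lookup plays the role of the operand multiplexer:
    the same multiplier is reused for all four kinds of product.
    """
    second_operand = {
        Mode.PRE: psi_pow,
        Mode.BUTTERFLY: omega_pow,
        Mode.POINTWISE: key_coeff,
        Mode.POST: psi_inv_pow,
    }[mode]
    return coeff * second_operand % q
```

Because only the operand selection changes between modes, the pre-calculation, butterfly, pointwise and post-calculation products need no multipliers beyond those already present in the butterfly modules.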

Claims

1. A high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, comprising: a storage module, two NTT modules, a control module, an encoder and a decoder;

The first butterfly module reads the low-order bits of the input noise e₁ and the scaling factor ψⁱ from the storage module and performs pre-calculation to obtain the low-order bits of the noise e_1ψ, which are stored at the original address of the storage module;

the second butterfly module reads the high-order bits of the input noise e₁ and the scaling factor ψⁱ from the storage module and performs pre-calculation to obtain the high-order bits of the noise e_1ψ, which are stored at the original address of the storage module;

at the same time, the third butterfly module reads the low-order bits of the input noise e₂ and the scaling factor ψⁱ from the storage module and performs pre-calculation to obtain the low-order bits of the noise e_2ψ, which are stored at the original address of the storage module;

the fourth butterfly module reads the high-order bits of the input noise e₂ and the scaling factor ψⁱ from the storage module and performs pre-calculation to obtain the high-order bits of the noise e_2ψ, which are stored at the original address of the storage module;

the first NTT module reads the noise e_1ψ and the twiddle factor ω from the storage module and performs butterfly operations to obtain the operation result E₁, which is then stored at the original address of the storage module; at the same time, the second NTT module reads the noise e_2ψ and the twiddle factor ω from the storage module and performs butterfly operations to obtain the operation result E₂, which is then stored at the original address of the storage module;

the first butterfly module reads the low-order bits of the third noise e_3m and the scaling factor ψⁱ from the storage module and performs pre-calculation to obtain the calculation result e_3mψ, the low-order bits of which are stored at the original address of the storage module; meanwhile, the second butterfly module reads the high-order bits of the third noise e_3m and the scaling factor ψⁱ from the storage module and performs pre-calculation to obtain the calculation result e_3mψ, the high-order bits of which are stored at the original address of the storage module;

the first NTT module reads the calculation result e_3mψ and the twiddle factor ω from the storage module and performs an in-place NTT operation to obtain the operation result E_3M; meanwhile, the third butterfly module reads the public key A and the NTT operation results E₁ and E₂ from the storage module and calculates the ciphertext C₁ = A ⊙ E₁ + E₂, which is then sent to the second modulo module for modular reduction, and the reduced result is stored at the original address of the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT operation result E₁ from the storage module, calculates the first parameter P_E1 = P ⊙ E₁, and writes it to the original address of the storage module;

the first butterfly module and the second butterfly module perform the last-stage butterfly operation on the calculation result e_3mψ; meanwhile, the third butterfly module and the fourth butterfly module read the parameter P_E1 and the operation result E_3M from the storage module and add them to obtain the ciphertext C₂ = P_E1 + E_3M, which is then stored at the original address of the storage module, thereby completing the encryption of the message m;

after the INTT operation is finished, the four butterfly modules read the INTT calculation result m_d and the scaling factor ψ⁻ⁱ from the storage module and perform the multiplications in parallel to obtain the post-calculation result m_dψ, which is written to the original address of the storage module;

under the control of the control module, the decoder obtains the post-calculation result m_dψ from the storage module and decodes it to obtain the message m, thereby completing the recovery of the message m.
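The ciphertext computations recited above, C₁ = A ⊙ E₁ + E₂ and C₂ = P ⊙ E₁ + E_3M, are coefficient-wise operations on NTT-domain vectors. A minimal Python sketch follows, assuming a toy modulus q = 17 and short example vectors; the function names are hypothetical, and the reduction is done with the `%` operator rather than with the modulo module of the circuit.

```python
Q = 17  # toy modulus, an assumption for illustration only

def pointwise_mul(x, y, q=Q):
    """Coefficient-wise product (the operator written as (.) in the text)."""
    return [(a * b) % q for a, b in zip(x, y)]

def pointwise_add(x, y, q=Q):
    """Coefficient-wise sum modulo q."""
    return [(a + b) % q for a, b in zip(x, y)]

def encrypt_ntt_domain(A, P, E1, E2, E3M, q=Q):
    """Form the two ciphertext components from NTT-domain inputs.

    A, P : public-key vectors in the NTT domain
    E1, E2, E3M : NTT results of the noise terms (E3M carries the message)
    """
    C1 = pointwise_add(pointwise_mul(A, E1, q), E2, q)  # C1 = A (.) E1 + E2
    P_E1 = pointwise_mul(P, E1, q)                      # first parameter
    C2 = pointwise_add(P_E1, E3M, q)                    # C2 = P_E1 + E3M
    return C1, C2
```

For example, `encrypt_ntt_domain([1, 2, 3, 4], [5, 6, 7, 8], [1, 1, 2, 2], [3, 3, 3, 3], [16, 0, 1, 2])` returns `([4, 5, 9, 11], [4, 6, 15, 1])`.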

2. The high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme of claim 1, wherein the two NTT modules perform eight stages of butterfly operations as follows:

step 2, the first butterfly module reads the low-order bits of the intermediate data e_1ψ^(L-1) of the (L-1)-th stage butterfly operation and the low-order bits of the twiddle factor ω from the storage module and performs a butterfly operation to obtain the low-order bits of the intermediate result e'_1ψ^L of the L-th stage butterfly operation;

the second butterfly module reads the high-order bits of the intermediate data e_1ψ^(L-1) of the (L-1)-th stage butterfly operation and the high-order bits of the twiddle factor ω from the storage module and performs a butterfly operation to obtain the high-order bits of the intermediate result e'_1ψ^L of the L-th stage butterfly operation;

the third butterfly module reads the low-order bits of the intermediate data e_2ψ^(L-1) of the (L-1)-th stage butterfly operation and the low-order bits of the twiddle factor ω from the storage module and performs a butterfly operation to obtain the low-order bits of the intermediate result e'_2ψ^L of the L-th stage butterfly operation;

the fourth butterfly module reads the high-order bits of the intermediate data e_2ψ^(L-1) of the (L-1)-th stage butterfly operation and the high-order bits of the twiddle factor ω from the storage module and performs a butterfly operation to obtain the high-order bits of the intermediate result e'_2ψ^L of the L-th stage butterfly operation;

when the butterfly stage L = 1, the intermediate data e_1ψ^(L-1) and e_2ψ^(L-1) of the (L-1)-th stage butterfly operation are the noise e_1ψ and the noise e_2ψ, respectively;

Step 3, the first modulo module obtains the intermediate result e'_1ψ^L of the L-th stage butterfly operation from the first butterfly module and the second butterfly module and performs a modulo operation:

step 3.1, the high-order bits and the low-order bits of the intermediate result e'_1ψ^L are input into two subtractors, and the output of one subtractor is added to the modulus q to obtain an addition result;

step 3.2, the output of the other subtractor is used as the select signal of the data selector: if the high-order bits of e'_1ψ^L are greater than its low-order bits, the data selector outputs the addition result; if the high-order bits of e'_1ψ^L are less than its low-order bits, the data selector outputs the output of the subtractor;

step 3.3, the output of the data selector is added to the highest bit of e'_1ψ^L to obtain the intermediate data e_1ψ^L of the L-th stage butterfly operation;

Similarly, the second modulo module obtains the intermediate result e'_2ψ^L of the L-th stage butterfly operation from the third butterfly module and the fourth butterfly module and performs a modulo operation to obtain the intermediate data e_2ψ^L of the L-th stage butterfly operation;

Step 4, L+1 is assigned to L, and it is judged whether L > 8; if so, the butterfly operation is finished, and the finally obtained intermediate data e_1ψ^L and e_2ψ^L are the operation results E₁ and E₂; otherwise, the process returns to step 2 and is executed in sequence.
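The modulo operation of steps 3.1 to 3.3 (subtract, conditionally add q, then add the highest bit) matches the usual shortcut for a modulus of the form q = 2^k + 1, where 2^k ≡ -1 and 2^(2k) ≡ 1 modulo q. The modulus is not stated in this passage, so that form is an assumption made for the following Python sketch; the function name and the handling of the case where the high-order and low-order bits are equal are also assumptions.

```python
def reduce_mod_2k_plus_1(x, k):
    """Reduce x modulo q = 2**k + 1 without a division.

    Assumes 0 <= x <= 2**(2*k), i.e. x is a product of two residues mod q.
    """
    q = (1 << k) + 1
    mask = (1 << k) - 1
    low = x & mask             # low-order k bits
    high = (x >> k) & mask     # next k bits
    top = x >> (2 * k)         # highest bit (0 or 1 in the assumed range)

    diff = low - high          # output of the first subtractor
    diff_plus_q = diff + q     # step 3.1: adder output
    # Step 3.2: the second subtractor acts as a comparator for the selector.
    # When high == low the plain difference (zero) is selected (assumption).
    selected = diff_plus_q if high > low else diff
    # Step 3.3: add the highest bit, since 2**(2*k) is congruent to 1 mod q.
    return selected + top
```

For instance, with k = 4 (q = 17) the input 16 × 16 = 256 has a zero low part, a zero high part and highest bit 1, so the function returns 1, which equals 256 mod 17.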