CN114172629A

CN114172629A - High-performance fully-homomorphic encryption processor circuit based on RLWE encryption scheme

Info

Publication number: CN114172629A
Application number: CN202111499003.0A
Authority: CN
Inventors: 杜高明; 郭文杰; 廖秋竹; 宋宇鲲; 尹勇生
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2022-03-11
Anticipated expiration: 2041-12-09
Also published as: CN114172629B

Abstract

The invention discloses a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, comprising a storage module, an NTT module and a control module. The storage module comprises six dual-port RAMs, two dual-port ROMs and a single-port ROM, and stores the input coefficients and the intermediate data produced during computation. The NTT module performs the NTT computation, and its internal multipliers also perform the pre-calculation and post-calculation. The control module controls the whole system: it directs the address generation module to generate addresses, directs the NTT module to perform NTT and INTT operations, directs the pre-calculation and post-calculation modules, and controls the encryption and decryption operations. The invention balances hardware area against throughput, reducing hardware resource consumption while maintaining high throughput.

Description

High-performance fully-homomorphic encryption processor circuit based on RLWE encryption scheme

Technical Field

The invention belongs to the field of encryption hardware circuit design, and particularly relates to a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme.

Background

Once practical quantum computers become available, the RSA cryptosystem and elliptic curve cryptography (ECC) will no longer be secure. The development of quantum communication and quantum computing poses unprecedented challenges to traditional cryptography, and quantum-resistant cryptographic algorithms have become a research hotspot in the cryptology community. Lattice-based cryptosystems are good candidates to replace traditional cryptosystems, and many lattice-based schemes, such as identity-based schemes and digital signature schemes, have been proposed. Many of these schemes base their security on RLWE. In 2009, Regev presented the LWE problem and showed that its security can be reduced to worst-case hard lattice problems (SVP or CVP). In 2010, Lyubashevsky et al. proposed RLWE, which introduces ideal lattices into LWE and reduces complexity while providing the same level of security. RLWE and its variants have lower complexity than earlier public key encryption schemes. Because of its security and ease of implementation, RLWE public key encryption has broad application prospects in cloud computing, 5G communication, data aggregation, personal health data management, neural network training on encrypted data and other areas. In recent years, RLWE fully homomorphic encryption schemes have been studied extensively in both software and hardware. De Clercq et al. proposed a software implementation of an RLWE encryption system, and Z. Liu et al. implemented the RLWE encryption scheme on the ARM NEON and MSP430 architectures. Tan et al. proposed a fingerprint authentication system with a high security level based on an RLWE cryptographic scheme. Tuy Nguyen Tan et al. proposed a method for RLWE-based encryption and decryption of faces in video on a GPU; their experimental results report a speedup of about 100 times for RLWE-based face encryption and decryption on the GPU.

Polynomial multiplication is one of the most critical and time-consuming operations in RLWE public key cryptosystems, and the efficiency of the polynomial multiplier determines the performance of the RLWE crypto processor.

Prior work has shown that LWE encryption is practical in software and has proposed the first FPGA-based hardware design of an LWE encryption scheme. Because that design uses a fully parallel architecture, its encryption throughput is 316 times that of the software implementation, but it consumes a large amount of logic resources.

Another group proposed an efficient, compact RLWE encryption hardware architecture that requires two fast Fourier transforms (FFTs) and three inverse fast Fourier transforms (IFFTs) and uses a single butterfly unit to compute both. This reduces hardware area, but because only one butterfly unit performs the computation, the parallelism of the FFT cannot be exploited. Roy et al. proposed an efficient and compact RLWE encryption processor that optimizes the RLWE encryption scheme, reducing the number of NTT operations from 5 to 4, and merges the NTT algorithm with the negative wrapped ("negative folding") convolution, avoiding the separate pre-calculation of that convolution. Liu et al. designed a universal modular unit and proposed a resource-efficient RLWE encryption processor that resists side-channel attacks, but this architecture performs RLWE encryption and decryption sequentially, with low throughputs of 0.056 Mbps and 0.28 Mbps, respectively. Velasco-Medina et al. proposed a high-throughput RLWE encryption hardware architecture based on radix-2 and radix-8 multi-path delay NTT algorithms, whose encryption and decryption throughputs reach the order of megabits per second and gigabits per second, respectively; however, it consumes a large amount of hardware resources, which makes it unsuitable for FPGA development boards with limited resources.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme that balances area against throughput and reduces hardware resource consumption while maintaining high throughput.

In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:

The invention relates to a high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, characterized by comprising a storage module, two NTT modules, a control module, an encoder and a decoder;

the storage module stores polynomial coefficients in bit-reversed order through a data selector, including: the three input noise vectors e₁, e₂ and e₃; the message m; the intermediate coefficients used in the NTT operation; and the two public keys A and P and the private key R₂, which are already in the NTT domain. The intermediate coefficients include the forward scaling factors ψⁱ, the inverse scaling factors ψ⁻ⁱ and the twiddle factors ω;

the first NTT module comprises a first butterfly module, a second butterfly module, a first modular reduction module and a first bit-reversal module;

the second NTT module comprises a third butterfly module, a fourth butterfly module, a second modular reduction module and a second bit-reversal module;

under the control of the control module, the encoder reads the message m from the storage module, encodes it to obtain the encoded message m_e, and stores m_e in the storage module;

the four butterfly modules simultaneously read the encoded message m_e and the third noise e₃ from the storage module and add them to obtain the processed third noise e_3m;

the first butterfly module reads the lower half of the input noise e₁ and the scaling factors ψⁱ from the storage module, pre-calculates the lower half of the scaled noise e_1ψ = e₁ ⊙ ψ, and stores it at its original address in the storage module;

the second butterfly module reads the upper half of the input noise e₁ and the scaling factors ψⁱ from the storage module, pre-calculates the upper half of e_1ψ, and stores it at its original address in the storage module;

at the same time, the third butterfly module reads the lower half of the input noise e₂ and the scaling factors ψⁱ from the storage module and pre-calculates the lower half of the scaled noise e_2ψ;

the fourth butterfly module reads the upper half of the input noise e₂ and the scaling factors ψⁱ from the storage module and pre-calculates the upper half of e_2ψ;

the first NTT module reads the noise e_1ψ and the twiddle factors ω from the storage module and performs the butterfly operations to obtain the result E₁, which is stored at its original addresses in the storage module; at the same time, the second NTT module reads the noise e_2ψ and the twiddle factors ω from the storage module and performs the butterfly operations to obtain the result E₂, which is likewise stored at its original addresses;

the first butterfly module reads the lower half of the third noise e_3m and the scaling factors ψⁱ from the storage module, pre-calculates the lower half of e_3mψ, and stores it at its original address in the storage module; meanwhile, the second butterfly module reads the upper half of e_3m and the scaling factors ψⁱ, pre-calculates the upper half of e_3mψ, and stores it at its original address in the storage module;

the first NTT module reads the pre-calculation result e_3mψ and the twiddle factors ω from the storage module and performs an in-place NTT to obtain the result E_3M; meanwhile, the third butterfly module reads the public key A and the NTT results E₁ and E₂ from the storage module, computes the ciphertext C₁ = A⊙E₁ + E₂, passes it to the second modular reduction module for reduction, and stores the result at its original addresses in the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT result E₁ from the storage module, computes the first parameter P_E1 = P⊙E₁, and writes it to its original addresses in the storage module;

the first and second butterfly modules perform the last-stage butterfly operations on the intermediate result of this NTT, while the third and fourth butterfly modules read the parameter P_E1 and the result E_3M from the storage module and add them to obtain the ciphertext C₂ = P_E1 + E_3M, which is stored at its original addresses in the storage module, completing the encryption of the message m;

the four butterfly modules read the ciphertexts C₁ and C₂ and the private key R₂ from the storage module and perform element-wise multiplication and addition to obtain the second parameter M_D = C₁⊙R₂ + C₂, which is sent to the corresponding bit-reversal module for bit-reversed reordering and then stored at its original addresses in the storage module;

the first and second butterfly modules read the second parameter M_D and the twiddle factors ω from the storage module and perform the last-stage butterfly operations to obtain two butterfly results; simultaneously, the third and fourth butterfly modules read the inverse scaling factors ψ⁻ⁱ from the storage module and multiply the two butterfly results element-wise by them to obtain the INTT result m_d, which is stored at its original addresses in the storage module;

after the INTT operation finishes, the four butterfly modules read the INTT result m_d and the inverse scaling factors ψ⁻ⁱ from the storage module and multiply them in parallel to obtain the post-calculation result m_dψ, which is written to its original addresses in the storage module;

under the control of the control module, the decoder reads the post-calculation result m_dψ from the storage module and decodes it to obtain the message m, completing the recovery of the message m.

The high-performance fully homomorphic encryption processor circuit based on the RLWE encryption scheme is further characterized in that the two NTT modules perform the eight-stage butterfly operation as follows:

step 1, define the butterfly stage number as L and initialize L to 1;

step 2, the first butterfly module reads the lower half of the (L-1)-th stage intermediate data and the twiddle factors ω from the storage module and performs the butterfly operation to obtain the lower half of the L-th stage intermediate result;

the second butterfly module reads the upper half of the (L-1)-th stage intermediate data and the twiddle factors ω from the storage module and performs the butterfly operation to obtain the upper half of the L-th stage intermediate result;

the third butterfly module reads the lower half of the (L-1)-th stage intermediate data of the second NTT module from the storage module, and the fourth butterfly module reads its upper half, and they perform the corresponding butterfly operations;

when L = 1, the (L-1)-th stage intermediate data of the first and second NTT modules are the pre-calculated noises e_1ψ and e_2ψ, respectively;

Step 3, the first modular reduction module takes the L-th stage intermediate result from the first and second butterfly modules and reduces it modulo q:

step 3.1, the high half and the low half of the intermediate result are fed into two subtractors, and the output of one subtractor (low half minus high half) is added to the modulus q to obtain a sum;

step 3.2, the output of the other subtractor serves as the select signal of the data selector: if the high half of the intermediate result is greater than its low half, the data selector outputs the sum; if the high half is less than or equal to the low half, the data selector outputs the difference from the first subtractor directly;

step 3.3, the output of the data selector is added to the most significant bit (bit [32]) of the intermediate result to obtain the reduced L-th stage intermediate result;

similarly, the second modular reduction module takes the L-th stage intermediate result from the third and fourth butterfly modules and reduces it modulo q to obtain the L-th stage intermediate data of the second NTT module;

Step 4, assign L + 1 to L and check whether L > 8; if so, the butterfly operation is finished and the final intermediate data of the two NTT modules are the operation results E₁ and E₂; otherwise, return to step 2.
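The reduction in step 3 relies on q = 65537 = 2^16 + 1, for which 2^16 ≡ -1 and 2^32 ≡ 1 (mod q). A product x of two residues is at most (q - 1)^2 = 2^32, so x ≡ x[15:0] - x[31:16] + x[32] (mod q), which is exactly the subtract, conditionally add q, then add bit [32] sequence above. The Python sketch below models that datapath and pairs it with a butterfly. The Cooley-Tukey form (x + ωy, x - ωy), the function names and the use of unbounded Python integers in place of fixed-width signals are assumptions for illustration, not the patent's circuit.

```python
Q = 65537  # q = 2**16 + 1, the modulus of the medium-security parameter set


def mod_q65537(x: int) -> int:
    """Reduce 0 <= x <= 2**32 modulo q = 2**16 + 1, mirroring steps 3.1-3.3.

    Because 2**16 = -1 and 2**32 = 1 (mod q):
        x = x[32]*2**32 + x[31:16]*2**16 + x[15:0]
          = x[15:0] - x[31:16] + x[32]  (mod q)
    """
    if not 0 <= x <= 2**32:
        raise ValueError("expected a product of two residues modulo q")
    lo = x & 0xFFFF          # bits [15:0]
    hi = (x >> 16) & 0xFFFF  # bits [31:16]
    bit32 = x >> 32          # bit [32]; set only when x == 2**32
    diff = lo - hi           # step 3.1: subtractor output
    # step 3.2: comparing hi with lo drives the data selector
    selected = diff + Q if hi > lo else diff
    return selected + bit32  # step 3.3: add bit [32]; result lies in [0, q - 1]


def butterfly(x: int, y: int, w: int) -> tuple[int, int]:
    """Assumed Cooley-Tukey butterfly: returns (x + w*y, x - w*y) mod q."""
    t = mod_q65537(y * w)    # y, w < q, so y*w <= 2**32 fits the reducer
    return (x + t) % Q, (x - t) % Q


print(mod_q65537(65536 * 65536))  # 2**32 = 1 (mod q), prints 1
print(butterfly(5, 7, 3))         # prints (26, 65521)
```

No wide multiplier-based reduction is needed for this modulus: two 16-bit subtractions, one conditional addition of q and a 1-bit addition suffice, which is consistent with the small modular reduction circuit the text describes.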

Compared with the prior art, the beneficial technical effects of the invention are as follows:

1. The invention processes the NTT operation and the ciphertext computation of the encryption process in parallel, and applies ping-pong operation to data read/write and computation during NTT and INTT processing, which hides the data read/write cycles, reduces the latency of the RLWE encryption processor and improves the throughput of the hardware architecture.

2. The invention uses a resource-multiplexing hardware architecture: the multipliers and adders in the butterfly modules are shared by encryption and decryption, the INTT reuses the NTT circuit structure, and the storage module shares the same control circuit, which reduces the hardware resource consumption of the RLWE encryption processor and improves resource efficiency.

3. The invention implements an RLWE encryption processor at a medium security level (n = 256, q = 65537) and completes its circuit implementation and hardware test on a Spartan-6 FPGA development platform. The results show that encryption takes only 2.38k clock cycles and decryption only 1.69k clock cycles, with throughputs of 21.01 Mbps for encryption and 29.60 Mbps for decryption, a significant performance improvement for the RLWE encryption processor.

Drawings

FIG. 1 is a hardware architecture of an RLWE crypto processor of the present invention;

FIG. 2 is a schematic diagram of an RLWE encryption scheme of the present invention;

FIG. 3 is a timing diagram of conventional data read/write and calculation serial processing;

FIG. 4 is a timing diagram illustrating the data read/write and calculation ping-pong operations of the present invention;

FIG. 5 is a block diagram of a dual port RAM address control circuit according to the present invention;

FIG. 6 is a block diagram of the butterfly module of the present invention;

FIG. 7 is a block diagram of the modular reduction module of the present invention;

FIG. 8 is a timing diagram of the multiplier time division multiplexing of the present invention;

FIG. 9 is a block diagram of a multiplier TDM module according to the present invention.

Detailed Description

In this embodiment, as shown in FIG. 1, the high-performance fully homomorphic encryption processor circuit based on the RLWE encryption scheme comprises a storage module, two NTT modules, a control module, an encoder and a decoder.

The RLWE encryption scheme is shown in FIG. 2; its algorithm consists of three steps: key generation, encryption and decryption.

The RLWE key generation algorithm produces the public key (A, P) and the private key R₂ as follows:

(1) Select a polynomial a from a uniform random distribution, and select two n-dimensional vectors r₁ and r₂ from a discrete Gaussian distribution.

(2) Multiply the n-dimensional vectors a, r₁ and r₂ element-wise by the scaling factor vector ψ to obtain the n-dimensional vectors a_ψ, r_1ψ and r_2ψ.

(3) Apply the NTT to a_ψ, r_1ψ and r_2ψ to convert them to the NTT domain; denote the results A, R₁ and R₂.

(4) Compute P = R₁ - A⊙R₂. The n-dimensional vector pair (A, P) is the public key of the RLWE encryption scheme, and the n-dimensional vector R₂ is its private key.
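Step (2) scales by ψ before the NTT because RLWE multiplies polynomials in the ring Z_q[x]/(x^n + 1), where x^n = -1 and a plain cyclic NTT product would wrap around with the wrong sign. The Python sketch below, an illustration rather than the patent's datapath, checks that pre-scaling by ψⁱ, a cyclic radix-2 NTT with ω = ψ², an element-wise product, an inverse NTT and post-scaling by ψ⁻ⁱ reproduce schoolbook negacyclic multiplication for n = 256 and q = 65537. Deriving ψ from the generator 3, folding the 1/n factor of the inverse NTT into the post-scaling step, and all function names are assumptions; the ψ and ω values stored in the patent's ROMs may come from a different root.

```python
import random

N, Q = 256, 65537  # ring Z_q[x]/(x^N + 1) with n = 256, q = 65537

# Assumption: 3 generates the multiplicative group mod 65537, so PSI is a
# primitive 2N-th root of unity (PSI**N = -1) and OMEGA = PSI**2 a primitive N-th root.
PSI = pow(3, (Q - 1) // (2 * N), Q)
OMEGA = PSI * PSI % Q
PSI_INV = pow(PSI, Q - 2, Q)  # inverses via Fermat's little theorem
OMEGA_INV = pow(OMEGA, Q - 2, Q)
N_INV = pow(N, Q - 2, Q)


def ntt(values, omega):
    """Cyclic radix-2 NTT: X[k] = sum_j x[j] * omega**(j*k) mod Q.

    Operands are first put in bit-reversed order, then log2(n) in-place
    butterfly stages run with an operand gap that doubles every stage.
    """
    n = len(values)
    bits = n.bit_length() - 1
    a = [values[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
    half = 1
    while half < n:
        step = pow(omega, n // (2 * half), Q)
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                u, v = a[j], a[j + half] * w % Q
                a[j], a[j + half] = (u + v) % Q, (u - v) % Q
                w = w * step % Q
        half *= 2
    return a


def negacyclic_mul_ntt(x, y):
    """x * y mod (X^N + 1): pre-scale, NTT, element-wise product, INTT, post-scale."""
    xs = ntt([c * pow(PSI, i, Q) % Q for i, c in enumerate(x)], OMEGA)
    ys = ntt([c * pow(PSI, i, Q) % Q for i, c in enumerate(y)], OMEGA)
    prod = ntt([p * r % Q for p, r in zip(xs, ys)], OMEGA_INV)
    # Post-calculation: multiply by PSI**-i and by 1/N (the INTT scale factor).
    return [c * pow(PSI_INV, i, Q) * N_INV % Q for i, c in enumerate(prod)]


def negacyclic_mul_schoolbook(x, y):
    """Reference product in which X**N wraps around to -1."""
    out = [0] * N
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            k = i + j
            if k < N:
                out[k] = (out[k] + xi * yj) % Q
            else:
                out[k - N] = (out[k - N] - xi * yj) % Q
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.randrange(Q) for _ in range(N)]
    b = [rng.randrange(Q) for _ in range(N)]
    print(negacyclic_mul_ntt(a, b) == negacyclic_mul_schoolbook(a, b))  # expected: True
```

Because the NTT-domain product is element-wise, A⊙R₂ in step (4) costs n modular multiplications instead of the n² of the schoolbook loop, which is why the scheme keeps A, P and R₂ in the NTT domain.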

The RLWE encryption algorithm uses the public key (A, P) to generate the ciphertext (C₁, C₂) as follows:

(1) Select three n-dimensional vectors e₁, e₂ and e₃ from a discrete Gaussian distribution.

(2) Encode the message m with an encoding function and add the result to the n-dimensional vector e₃ to obtain the n-dimensional vector e_3m.

(3) Multiply the n-dimensional vectors e₁, e₂ and e_3m element-wise by the scaling factor vector ψ to obtain the vectors e_1ψ, e_2ψ and e_3mψ.

(4) Apply the NTT to e_1ψ, e_2ψ and e_3mψ to convert them to the NTT domain; denote the results E₁, E₂ and E_3M.

(5) Compute C₁ = A⊙E₁ + E₂ and C₂ = P⊙E₁ + E_3M; the n-dimensional vector pair (C₁, C₂) is the ciphertext.

The RLWE decryption algorithm uses the ciphertext (C₁, C₂) and the private key R₂ to recover the message m as follows:

(1) Compute M_D = C₁⊙R₂ + C₂.

(2) Apply the INTT to the n-dimensional vector M_D to convert it back from the NTT domain; denote the result m_d.

(3) Multiply the n-dimensional vector m_d element-wise by the inverse scaling factor vector ψ⁻¹ to obtain the n-dimensional vector m_dψ.

(4) Recover the message m from m_dψ with the decoding function.
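The key generation, encryption and decryption steps above can be exercised end to end in a toy Python model. This is a sketch under stated assumptions, not the patent's implementation: noise is drawn uniformly from {-2, -1, 0, 1, 2} instead of a discrete Gaussian, a message bit is encoded as 0 or floor(q/2) and decoded by thresholding at q/4 and 3q/4, ψ is derived from the generator 3, and to_ntt/from_ntt are hypothetical names that bundle the pre-calculation with the NTT and the INTT with the post-calculation.

```python
import random

N, Q = 256, 65537
# Assumption: 3 generates Z_65537^*, so PSI has order 2N; the patent's ROM constants may differ.
PSI = pow(3, (Q - 1) // (2 * N), Q)
OMEGA = PSI * PSI % Q
PSI_INV, OMEGA_INV, N_INV = (pow(v, Q - 2, Q) for v in (PSI, OMEGA, N))


def ntt(values, omega):
    """Cyclic radix-2 NTT with bit-reversed input order and natural-order output."""
    n = len(values)
    bits = n.bit_length() - 1
    a = [values[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
    half = 1
    while half < n:
        step = pow(omega, n // (2 * half), Q)
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                u, v = a[j], a[j + half] * w % Q
                a[j], a[j + half] = (u + v) % Q, (u - v) % Q
                w = w * step % Q
        half *= 2
    return a


def to_ntt(poly):
    """Pre-calculation (multiply by psi**i) followed by the forward NTT."""
    return ntt([c * pow(PSI, i, Q) % Q for i, c in enumerate(poly)], OMEGA)


def from_ntt(vec):
    """INTT followed by post-calculation (multiply by psi**-i, with 1/N folded in)."""
    return [c * pow(PSI_INV, i, Q) * N_INV % Q for i, c in enumerate(ntt(vec, OMEGA_INV))]


def pmul(x, y):
    return [a * b % Q for a, b in zip(x, y)]  # element-wise product (⊙)


def padd(x, y):
    return [(a + b) % Q for a, b in zip(x, y)]


def small(rng):
    """Assumption: uniform noise in {-2, ..., 2} stands in for the discrete Gaussian."""
    return [rng.randint(-2, 2) % Q for _ in range(N)]


def keygen(rng):
    a = [rng.randrange(Q) for _ in range(N)]              # uniform polynomial a
    A, R1, R2 = to_ntt(a), to_ntt(small(rng)), to_ntt(small(rng))
    P = [(r - ar) % Q for r, ar in zip(R1, pmul(A, R2))]  # P = R1 - A ⊙ R2
    return (A, P), R2                                     # public key, private key


def encrypt(public_key, bits, rng):
    A, P = public_key
    e1, e2, e3 = small(rng), small(rng), small(rng)
    e3m = [(e + b * (Q // 2)) % Q for e, b in zip(e3, bits)]  # encode bit as 0 or q//2
    E1, E2, E3M = to_ntt(e1), to_ntt(e2), to_ntt(e3m)
    return padd(pmul(A, E1), E2), padd(pmul(P, E1), E3M)      # (C1, C2)


def decrypt(R2, ciphertext):
    C1, C2 = ciphertext
    m_dpsi = from_ntt(padd(pmul(C1, R2), C2))  # M_D = C1 ⊙ R2 + C2, then INTT and post-scaling
    return [1 if Q // 4 < c < 3 * Q // 4 else 0 for c in m_dpsi]  # threshold decoding


if __name__ == "__main__":
    rng = random.Random(2021)
    public_key, R2 = keygen(rng)
    message = [rng.randint(0, 1) for _ in range(N)]
    print(decrypt(R2, encrypt(public_key, message, rng)) == message)  # expected: True
```

Expanding M_D shows why decryption works: the A⊙E₁⊙R₂ terms cancel, leaving the encoded message plus e₁r₁ + e₂r₂ + e₃. With this small noise every coefficient of that residual stays within ±2050, well inside the ±q/4 decoding margin, so the round trip always succeeds; with a real discrete Gaussian, correctness holds only with high probability and depends on the chosen standard deviation.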

The storage module comprises six dual-port RAMs, two dual-port ROMs and a single-port ROM.

(1) RAM0 is 256 words deep and 17 bits wide. The 128 coefficients at indices 0-127 of the bit-reversed polynomial coefficient vector e₁ are stored at RAM0 addresses 0-127, and the 128 coefficients at indices 0-127 of the bit-reversed polynomial coefficient vector R₂ are stored at RAM0 addresses 128-255.

(2) RAM1 is 256 words deep and 17 bits wide. The 128 coefficients at indices 128-255 of the bit-reversed vector e₁ are stored at RAM1 addresses 0-127, and the 128 coefficients at indices 128-255 of the bit-reversed vector R₂ are stored at RAM1 addresses 128-255.

(3) RAM2 is 512 words deep and 17 bits wide. The 128 coefficients at indices 0-127 of the bit-reversed vector e₂ are stored at RAM2 addresses 0-127, and all 256 coefficients of the bit-reversed polynomial coefficient vector A are stored at RAM2 addresses 128-383. RAM2 addresses 384-511 hold the bit-reversed M_D data during decryption.

(4) RAM3 is 512 words deep and 17 bits wide. The 128 coefficients at indices 128-255 of the bit-reversed vector e₂ are stored at RAM3 addresses 0-127, and all 256 coefficients of the bit-reversed polynomial coefficient vector P are stored at RAM3 addresses 128-383. RAM3 addresses 384-511 hold the bit-reversed M_D data during decryption.

(5) RAM4 is 128 words deep and 17 bits wide. The 128 coefficients at indices 0-127 of the bit-reversed vector e_3m are stored at RAM4 addresses 0-127.

(6) RAM5 is 128 words deep and 17 bits wide. The 128 coefficients at indices 128-255 of the bit-reversed vector e_3m are stored at RAM5 addresses 0-127.

(7) ROM0 is a dual-port ROM, 256 words deep and 16 bits wide. The 128 values at indices 0-127 of the bit-reversed scaling factor vector ψ are stored at ROM0 addresses 0-127, and the 128 values at indices 0-127 of the bit-reversed inverse scaling factor vector ψ⁻¹ are stored at ROM0 addresses 128-255.

(8) ROM1 is a dual-port ROM, 256 words deep and 16 bits wide. The 128 values at indices 128-255 of the bit-reversed vector ψ are stored at ROM1 addresses 0-127, and the 128 values at indices 128-255 of the bit-reversed vector ψ⁻¹ are stored at ROM1 addresses 128-255.

(9) ROM2 is a single-port ROM, 60 words deep and 5 bits wide, that stores twiddle factors for the NTT and INTT operations: those required by the NTT at ROM2 addresses 0-29 and those required by the INTT at ROM2 addresses 30-59.

(10) ROM3 is a single-port ROM, 448 words deep and 16 bits wide, that stores twiddle factors for the NTT and INTT operations: those required by the NTT at ROM3 addresses 0-223 and those required by the INTT at ROM3 addresses 224-447.
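The RAM0/RAM1 entries above can be sketched as a loader that bit-reverses a 256-coefficient vector and splits it in half, one half per RAM. The function names are hypothetical, and treating each RAM as a Python list indexed by address is a simplification of the dual-port block RAMs.

```python
N = 256  # polynomial length; each vector is split across a pair of RAMs


def bit_reverse_order(coeffs):
    """Return the vector reordered so that position i holds entry bitrev(i)."""
    n = len(coeffs)
    bits = n.bit_length() - 1
    return [coeffs[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]


def load_ram0_ram1(e1, r2):
    """Hypothetical loader for the RAM0/RAM1 map: e1 fills addresses 0-127 and
    R2 fills addresses 128-255; positions 0-127 of each bit-reversed vector go to
    RAM0 and positions 128-255 go to RAM1."""
    half = N // 2
    e1_rev, r2_rev = bit_reverse_order(e1), bit_reverse_order(r2)
    ram0 = e1_rev[:half] + r2_rev[:half]
    ram1 = e1_rev[half:] + r2_rev[half:]
    return ram0, ram1


ram0, ram1 = load_ram0_ram1(list(range(N)), list(range(1000, 1000 + N)))
print(ram0[:4], ram1[:4])  # [0, 128, 64, 192] [1, 129, 65, 193]
```

Every position in the first half of the bit-reversed order has a 0 in its top bit, which becomes the bottom bit of the original index, so under this reading RAM0 holds the even-indexed coefficients of e₁ and R₂ and RAM1 the odd-indexed ones.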

Regarding the generation of the read/write addresses and index values of the storage module during the NTT, FIG. 3 shows the timing of serial data read/write and computation for a 16-point NTT: data are read from RAM, pass through the butterfly and modular reduction operations, and are written back to RAM, which takes 3 + 1 + 2 = 6 clock cycles in total. In this serial mode, one n-point NTT takes (3n/2)·log2(n) clock cycles, i.e., 96 clock cycles for a 16-point NTT.

The control module controls the whole system: it directs the address generation module to generate addresses, directs the NTT module to perform NTT and INTT operations, directs the pre-calculation and post-calculation modules, and controls the encryption and decryption operations. To speed up NTT processing, the invention applies ping-pong operation to the data read/write process and the computation process. FIG. 4 shows the timing of ping-pong data read/write and computation for a 16-point NTT: reads and writes alternate, the read/write process is hidden within the computation, and the RAM addresses are controlled so that the input and output data streams are continuous, giving seamless data processing. Because the butterfly and modular reduction operations take 4 clock cycles, the first butterfly result appears only after a 4-cycle wait. One n-point NTT then takes (n/2)·log2(n) + 4 clock cycles, i.e., 36 clock cycles for a 16-point NTT; compared with serial processing, the ping-pong scheme reduces the number of clock cycles by 62.5%. The invention uses a radix-2 algorithm and processes polynomials with n = 256 terms, so the NTT module operates on n = 256 points. The gap between the two input operands of the radix-2 butterfly unit depends on the NTT stage number L: gap = 2^L.
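The two cycle-count formulas above can be evaluated directly. This small Python sketch simply encodes them as stated; the function names are illustrative, and the n = 256 figures it prints are the formulas evaluated at that size, not measurements reported in the text.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two, computed without floating point."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def serial_cycles(n: int) -> int:
    """Serial read-compute-write mode: (3n/2) * log2(n) clock cycles."""
    return 3 * n // 2 * log2_exact(n)


def ping_pong_cycles(n: int, pipeline_latency: int = 4) -> int:
    """Ping-pong mode: (n/2) * log2(n) cycles plus the 4-cycle pipeline fill."""
    return n // 2 * log2_exact(n) + pipeline_latency


for n in (16, 256):
    s, p = serial_cycles(n), ping_pong_cycles(n)
    print(f"n={n}: serial={s}, ping-pong={p}, saving={1 - p / s:.1%}")
# n=16: serial=96, ping-pong=36, saving=62.5%
# n=256: serial=3072, ping-pong=1028, saving=66.5%
```

The fixed 4-cycle pipeline fill matters less as n grows, so by these formulas the relative saving approaches two thirds for large transforms.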

As shown in FIG. 5, the RAM address control circuit mainly consists of two counters, three data selectors and ten registers. The select input sel0 of the data selector controls reading and writing: when sel0 = 0 the read address is selected, and when sel0 = 1 the write address is selected. The select input sel1 checks whether the NTT stage number L equals 7; if so, the selector outputs 1, otherwise it outputs 2^L. One butterfly operation plus one modular reduction takes 4 clock cycles in the butterfly module, and the result is written back to its original address in the 5th clock cycle, so the write address lags the read address by 5 clock cycles; the write address is therefore obtained by delaying the read address through 5 registers.
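A cycle-level software model can make this address timing concrete. The Python sketch assumes one read address is issued per clock cycle, numbers stages from 0 so the operand gap is 2^stage (the text's stage numbering may be offset by one), and does not model the sel1 special case for L = 7; the function names are hypothetical.

```python
from collections import deque

WRITE_LAG = 5  # the write address lags the read address by 5 clock cycles


def write_address_stream(read_addrs, lag=WRITE_LAG):
    """Model `lag` cascaded registers: the address read at cycle t becomes the
    write address at cycle t + lag (None means no write in that cycle)."""
    regs = deque([None] * lag, maxlen=lag)
    writes = []
    for addr in list(read_addrs) + [None] * lag:  # extra cycles drain the chain
        writes.append(regs[0])  # oldest register drives the write port
        regs.append(addr)       # maxlen drops regs[0], shifting the chain
    return writes


def butterfly_pairs(n, stage):
    """Operand address pairs for one radix-2 stage (counted from 0); gap = 2**stage."""
    gap = 1 << stage
    return [(base + k, base + k + gap) for base in range(0, n, 2 * gap) for k in range(gap)]


reads = [addr for pair in butterfly_pairs(16, 0) for addr in pair]
print(write_address_stream(reads)[:7])  # [None, None, None, None, None, 0, 1]
```

Deriving the write address from the delayed read address means that, in this model, a single pair of counters drives both ports and results always return to the address they were read from, which is what makes the in-place NTT possible.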

the four butterfly modules simultaneously read the encoded message m_e and the third noise e₃ from the storage module and add them to obtain the processed third noise e_3m;

the first butterfly module pre-calculates the lower half of e_1ψ = e₁ ⊙ ψ and stores it at its original address in the storage module; the second butterfly module pre-calculates the upper half of e_1ψ and stores it at its original address in the storage module; the third and fourth butterfly modules likewise pre-calculate the lower and upper halves of e_2ψ;

the first NTT module reads the noise e_1ψ and the twiddle factors ω from the storage module and performs the butterfly operations to obtain the result E₁, which is stored at its original addresses in the storage module; at the same time, the second NTT module reads the noise e_2ψ and the twiddle factors ω from the storage module and performs the butterfly operations to obtain the result E₂, which is stored at its original addresses in the storage module. The butterfly module is the core of the NTT module. As shown in FIG. 6, it consists of two parts, the channel selector in FIG. 6(a) and the radix-2 butterfly unit in FIG. 6(b): the channel selector selects the input data, and the radix-2 butterfly unit performs the computation. The eight-stage butterfly operation proceeds as follows:

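step 1, define the butterfly stage number as L and initialize L to 1;

step 2, the first butterfly module reads the lower half of the (L-1)-th stage intermediate data and the twiddle factors ω from the storage module and performs the butterfly operation to obtain the lower half of the L-th stage intermediate result;

the second butterfly module reads the upper half of the (L-1)-th stage intermediate data from the storage module and performs the butterfly operation to obtain the upper half of the L-th stage intermediate result;

the third butterfly module reads the lower half of the (L-1)-th stage intermediate data of the second NTT module from the storage module;

the fourth butterfly module reads the upper half of the (L-1)-th stage intermediate data of the second NTT module from the storage module;

when L = 1, the (L-1)-th stage intermediate data of the first and second NTT modules are the noises e_1ψ and e_2ψ, respectively;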

Step 3, the first modular reduction module takes the L-th stage intermediate result from the first and second butterfly modules and reduces it modulo q; the modular reduction circuit is shown in FIG. 7:

step 3.1, bits [31:16] and bits [15:0] of the intermediate result are fed into two subtractors, and the output of one subtractor is added to the modulus q to obtain a sum;

step 3.2, the output of the other subtractor serves as the select signal of the data selector: if bits [31:16] of the intermediate result are greater than bits [15:0], the data selector outputs the sum; otherwise, it outputs the output of the first subtractor;

step 3.3, the output of the data selector is added to bit [32] of the intermediate result to obtain the reduced L-th stage intermediate result;

similarly, the second modular reduction module takes the L-th stage intermediate result from the third and fourth butterfly modules and reduces it modulo q to obtain the L-th stage intermediate data of the second NTT module;

Step 4, assign L + 1 to L; if L > 8, the butterfly operation is finished and the final intermediate data of the two NTT modules are the results E₁ and E₂; otherwise, return to step 2;

the first butterfly module reads the lower half of the third noise e_3m and the scaling factors ψⁱ from the storage module, pre-calculates the lower half of e_3mψ, and stores it at its original address in the storage module; meanwhile, the second butterfly module reads the upper half of e_3m and the scaling factors ψⁱ and pre-calculates the upper half of e_3mψ;

the first NTT module reads e_3mψ and the twiddle factors ω from the storage module and performs an in-place NTT to obtain the result E_3M; meanwhile, the third butterfly module reads the public key A and the NTT results E₁ and E₂ from the storage module, computes the ciphertext C₁ = A⊙E₁ + E₂, passes it to the second modular reduction module for reduction, and stores the result at its original addresses in the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT result E₁ from the storage module, computes the first parameter P_E1 = P⊙E₁, and writes it to its original addresses in the storage module;

the first and second butterfly modules perform the last-stage butterfly operations of this NTT, while the third and fourth butterfly modules read the parameter P_E1 and the result E_3M from the storage module and add them to obtain the ciphertext C₂ = P_E1 + E_3M, which is stored at its original addresses in the storage module, completing the encryption of the message m;

the four butterfly modules read the ciphertexts C₁ and C₂ and the private key R₂ from the storage module and perform element-wise multiplication and addition to obtain the second parameter M_D = C₁⊙R₂ + C₂, which is sent to the corresponding bit-reversal module for bit-reversed reordering and stored at its original addresses in the storage module;

the first and second butterfly modules read the second parameter M_D and the twiddle factors ω from the storage module and perform the last-stage butterfly operations to obtain two butterfly results; meanwhile, the third and fourth butterfly modules read the inverse scaling factors ψ⁻ⁱ from the storage module and multiply the two butterfly results element-wise by them to obtain the INTT result m_d, which is stored at its original addresses in the storage module;

after the INTT finishes, the four butterfly modules read m_d and ψ⁻ⁱ from the storage module, multiply them in parallel to obtain the post-calculation result m_dψ, and write it to its original addresses in the storage module;

under the control of the control module, the decoder reads the post-calculation result m_dψ from the storage module and decodes it to recover the message m.

The multiplexing timing of the multipliers is shown in FIG. 8. In FIG. 8, e₁, e₂ and e_3m denote the pre-calculation processes of the coefficient vectors e₁, e₂ and e_3m of the corresponding polynomials; E₁, E₂ and E_3M denote the corresponding NTT calculation processes; C₁ denotes the calculation of the ciphertext C₁, namely A ⊙ E₁ + E₂; P_E1 denotes the calculation P ⊙ E₁; C₁R₂ denotes the calculation C₁ ⊙ R₂; m_d denotes the INTT operation on M_D; and a further label denotes the post-calculation process of m_d. FIG. 9 shows the control structure of the multipliers, which schedules each multiplier through its select input so that it processes different data at different times.

As can be seen from FIG. 8, the pre-calculation of e₁ and e₂ is merged into the main NTT algorithm: at stage 0 of the NTT, two multipliers are used for each polynomial to pre-calculate its two butterfly inputs in parallel. The pre-calculation of e_3m uses four multipliers in parallel. The NTT operations of E₁ and E₂ each use two multipliers to perform butterfly operations in parallel. The pre-calculation process is likewise merged into the main NTT algorithm, with two multipliers pre-calculating the two butterfly inputs in parallel at stage 0, while during the NTT the other two multipliers compute C₁ and P_E1, so that this multiplication time is hidden inside the NTT. The C₁R₂ process uses four multipliers in parallel. In the last stage of the INTT of M_D, two multipliers perform the butterfly operation while the other two perform the post-calculation on m_d; after the INTT is finished, all four multipliers perform the post-calculation on m_d. Making full use of the multipliers for parallel computation reduces the number of clock cycles needed for encryption and decryption and raises the throughput of the RLWE encryption processor. In addition, the multipliers in the NTT module are reused for the pre-calculation, the post-calculation and the computation of the encryption results, and the NTT module itself is reused for the INTT operation, which reduces hardware resource consumption.
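The multiplier assignments described for FIG. 8 can be tabulated and checked against the four available multipliers. The sketch below is a hypothetical model built only from the text: the phase names are paraphrases, the split of the two spare multipliers as one for C₁ and one for P_E1 is an assumption, and cycle counts are not modeled.

```python
NUM_MULTIPLIERS = 4

# Hypothetical encoding of the FIG. 8 phases: each phase maps a task to the
# number of multipliers the text assigns to it.
SCHEDULE = [
    ("pre-calculation of e1 and e2 (NTT stage 0)", {"pre-calc e1": 2, "pre-calc e2": 2}),
    ("NTT of E1 and E2", {"butterflies E1": 2, "butterflies E2": 2}),
    ("pre-calculation of e_3m", {"pre-calc e_3m": 4}),
    ("NTT of e_3m_psi with C1 and P_E1 hidden", {"butterflies E_3M": 2, "C1 = A*E1 + E2": 1, "P_E1 = P*E1": 1}),
    ("decryption product C1R2", {"C1 * R2": 4}),
    ("last INTT stage of M_D", {"butterflies m_d": 2, "post-calc m_d": 2}),
    ("after the INTT", {"post-calc m_d": 4}),
]


def check_schedule(schedule, budget=NUM_MULTIPLIERS):
    """Raise if any phase needs more multipliers than exist; return the average utilization."""
    utilization = []
    for phase, tasks in schedule:
        busy = sum(tasks.values())
        if busy > budget:
            raise ValueError(f"{phase}: needs {busy} multipliers, only {budget} available")
        utilization.append(busy / budget)
    return sum(utilization) / len(utilization)
```

In this model every phase keeps all four multipliers busy, so check_schedule(SCHEDULE) reports full utilization; that is the property the text relies on when it hides the C₁ and P_E1 multiplications inside the NTT.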

Claims

1. A high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme, comprising: a storage module, two NTT modules, a control module, an encoder and a decoder;

the first butterfly module obtains the low-order part of the input noise e₁ and the scaling factor ψⁱ from the storage module, pre-calculates the low-order part of the noise e_1ψ, and stores it at its original address in the storage module;

the second butterfly module obtains the high-order part of the input noise e₁ and the scaling factor ψⁱ from the storage module, pre-calculates the high-order part of the noise e_1ψ, and stores it at its original address in the storage module;

at the same time, the third butterfly module obtains the low-order part of the input noise e₂ and the scaling factor ψⁱ from the storage module, pre-calculates the low-order part of the noise e_2ψ, and stores it at its original address in the storage module;

the fourth butterfly module obtains the high-order part of the input noise e₂ and the scaling factor ψⁱ from the storage module, pre-calculates the high-order part of the noise e_2ψ, and stores it at its original address in the storage module;

the first NTT module reads the noise e from the storage module_1ψAnd the rotation factor omega is subjected to butterfly operation to obtain an operation result E₁Then storing the address into the original address of the storage module; at the same time, the second NTT module reads the noise e from the storage module_2ψAnd the rotation factor omega is subjected to butterfly operation to obtain an operation result E₂Then storing the address into the original address of the storage module;

the first butterfly module reads the low-order part of the third noise e_3m and the scaling factor ψⁱ from the storage module, pre-calculates the low-order part of the result e_3mψ, and stores it at its original address in the storage module; meanwhile, the second butterfly module reads the high-order part of the third noise e_3m and the scaling factor ψⁱ from the storage module, pre-calculates the high-order part of the result e_3mψ, and stores it at its original address in the storage module;

the first NTT module reads the calculation result e_3mψ and the twiddle factor ω from the storage module and performs an in-place NTT to obtain the result E_3M; meanwhile, the third butterfly module reads the public key A and the NTT results E₁ and E₂ from the storage module, computes the ciphertext C₁ = A ⊙ E₁ + E₂, sends it to the second modulo-reduction module, and stores the reduced result at its original address in the storage module; meanwhile, the fourth butterfly module reads the public key P and the NTT result E₁ from the storage module, computes the first parameter P_E1 = P ⊙ E₁, and writes it back to its original address in the storage module;

the first and second butterfly modules perform the last-stage butterfly operation on the calculation result e_3mψ, while the third and fourth butterfly modules read the parameter P_E1 and the result E_3M from the storage module and add them to obtain the ciphertext C₂ = P_E1 + E_3M, which is stored at its original address in the storage module, thereby completing the encryption of the message m;

after the INTT operation is finished, the four butterfly modules read the INTT result m_d and the scaling factor ψ^-i from the storage module and multiply them in parallel to obtain the post-calculation result m_dψ, which is then written back to its original address in the storage module;

under the control of the control module, the decoder reads the post-calculation result m_dψ from the storage module and decodes it to obtain the message m, thereby completing the recovery of the message m.
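The pre-calculation with ψⁱ, the NTT with twiddle factor ω, the point-wise product, the INTT and the post-calculation with ψ^-i together implement multiplication modulo xⁿ + 1 (negacyclic convolution), which is why inputs are scaled by ψⁱ before the NTT and by ψ^-i afterwards. The sketch below demonstrates this with an O(n²) transform standing in for the NTT module. The parameters q = 7681 and n = 256, the choice ω = ψ², and folding the 1/n INTT factor into the post-calculation are assumptions; the split of the work into low-order and high-order parts across butterfly modules is not modeled, and the function names are hypothetical.

```python
# Assumed parameters: q = 7681, n = 256; psi is a primitive 2n-th root of unity mod q.
Q, N = 7681, 256


def find_psi():
    """Return some psi with psi^N == -1 (mod Q), i.e. a primitive 2N-th root of unity."""
    for g in range(2, Q):
        psi = pow(g, (Q - 1) // (2 * N), Q)
        if pow(psi, N, Q) == Q - 1:
            return psi
    raise ValueError("Q has no primitive 2N-th root of unity")


PSI = find_psi()
PSI_INV = pow(PSI, Q - 2, Q)
OMEGA = PSI * PSI % Q          # twiddle factor: a primitive N-th root of unity
N_INV = pow(N, Q - 2, Q)


def transform(a, w):
    """Textbook O(N^2) number-theoretic transform with root w (stands in for the NTT module)."""
    powers = [pow(w, k, Q) for k in range(N)]
    return [sum(a[j] * powers[(i * j) % N] for j in range(N)) % Q for i in range(N)]


def pre_calc(e):
    """Pre-calculation: e_psi[i] = e[i] * psi^i mod Q."""
    return [x * pow(PSI, i, Q) % Q for i, x in enumerate(e)]


def post_calc(m):
    """Post-calculation: m[i] * psi^-i * N^-1 mod Q (folding N^-1 in here is an assumption)."""
    return [x * pow(PSI_INV, i, Q) * N_INV % Q for i, x in enumerate(m)]


def negacyclic_mul(a, b):
    """a * b in Z_Q[x]/(x^N + 1) via pre-calculation, NTT, point-wise product, INTT, post-calculation."""
    fa = transform(pre_calc(a), OMEGA)
    fb = transform(pre_calc(b), OMEGA)
    prod = [x * y % Q for x, y in zip(fa, fb)]            # point-wise product, as in A ⊙ E1
    return post_calc(transform(prod, pow(OMEGA, Q - 2, Q)))  # INTT reuses the same transform
```

Because ψᴺ = −1, the ψⁱ weighting turns the wrap-around terms of the cyclic convolution computed by the NTT into subtracted terms, which is exactly reduction modulo xⁿ + 1; reusing one transform with ω^-1 for the INTT mirrors the way the circuit reuses its NTT module.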

2. The high-performance fully homomorphic encryption processor circuit based on an RLWE encryption scheme according to claim 1, wherein the two NTT modules perform the eight-stage butterfly operation as follows:

step 2, the first butterfly module reads, from the storage module, the intermediate data e_1ψ^(L-1) of the (L-1)-th stage butterfly operation and the low-order part of the twiddle factor ω, and performs the butterfly operation to obtain the low-order part of the intermediate result e'_1ψ^L of the L-th stage butterfly operation;

the second butterfly module reads, from the storage module, the intermediate data e_1ψ^(L-1) of the (L-1)-th stage butterfly operation and the high-order part of the twiddle factor ω, and performs the butterfly operation to obtain the high-order part of the intermediate result e'_1ψ^L of the L-th stage butterfly operation;

the third butterfly module reads, from the storage module, the intermediate data e_2ψ^(L-1) of the (L-1)-th stage butterfly operation and the low-order part of the twiddle factor ω, and performs the butterfly operation to obtain the low-order part of the intermediate result e'_2ψ^L of the L-th stage butterfly operation;

the fourth butterfly module reads, from the storage module, the intermediate data e_2ψ^(L-1) of the (L-1)-th stage butterfly operation and the high-order part of the twiddle factor ω, and performs the butterfly operation to obtain the high-order part of the intermediate result e'_2ψ^L of the L-th stage butterfly operation;

when the butterfly stage number L equals 1, the intermediate data e_1ψ^(L-1) and e_2ψ^(L-1) of the (L-1)-th stage butterfly operation are taken to be the noise e_1ψ and the noise e_2ψ, respectively;

Step 3, the first modulo-reduction module obtains the intermediate result e'_1ψ^L of the L-th stage butterfly operation from the first and second butterfly modules and reduces it modulo q:

step 3.1, feed the high-order and low-order parts of the intermediate result e'_1ψ^L into two subtractors, and add the modulus q to the output of one of the subtractors to obtain an addition result;

step 3.2, use the output of the other subtractor as the select signal of the data selector: if the high-order part of e'_1ψ^L is greater than its low-order part, the data selector outputs the addition result; if the high-order part is less than the low-order part, the data selector outputs the result of the subtractor from step 3.1 directly, without adding q;

step 3.3, add the output of the data selector to e'_1ψ^L to obtain the intermediate data e_1ψ^L of the L-th stage butterfly operation;

Similarly, the second modulo-reduction module obtains the intermediate result e'_2ψ^L of the L-th stage butterfly operation from the third and fourth butterfly modules and reduces it modulo q to obtain the intermediate data e_2ψ^L of the L-th stage butterfly operation;

Step 4, assign L + 1 to L and check whether L > 8; if so, the butterfly operation is finished and the final intermediate data e_1ψ^L and e_2ψ^L are the operation results E₁ and E₂; otherwise, return to step 2 and continue.
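Steps 2 to 4 describe an eight-stage radix-2 NTT, so each transform has n = 2⁸ = 256 points. Steps 3.1 and 3.2 can be read as the reduction x mod q = L − H, plus q when H > L, which is exact for a modulus of the form q = 2ᵏ + 1 when a product is split as x = H·2ᵏ + L, because 2ᵏ ≡ −1 (mod q). The sketch below assumes q = 2¹⁶ + 1 = 65537 (the patent text does not give q) and uses that reduction for the twiddle multiplications inside a standard iterative Cooley-Tukey loop. It does not model step 3.3, the circuit's data ordering, or the split of each butterfly between two modules; all names are hypothetical.

```python
# Assumed modulus q = 2^16 + 1 (a Fermat prime); the patent does not state q.
K = 16
Q = (1 << K) + 1
N = 256                            # 2^8 points -> eight butterfly stages (L = 1 .. 8)
STAGES = 8
OMEGA = pow(3, (Q - 1) // N, Q)    # 3 is a primitive root mod 65537, so OMEGA has order N


def mod_mul(a, b):
    """a*b mod Q for a, b in [0, Q), using the subtract-and-correct reading of steps 3.1-3.2."""
    x = a * b
    hi, lo = x >> K, x & ((1 << K) - 1)
    # x = hi * 2^K + lo and 2^K == -1 (mod Q), so x == lo - hi (mod Q).
    return lo - hi + Q if hi > lo else lo - hi


def bit_reverse(a):
    """Reorder a list of length 2^b into bit-reversed index order."""
    bits = len(a).bit_length() - 1
    return [a[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(len(a))]


def ntt_eight_stage(a):
    """Iterative radix-2 Cooley-Tukey NTT: one pass of the outer loop per butterfly stage L."""
    a = bit_reverse(a)
    for level in range(1, STAGES + 1):           # step 4: L runs from 1 up to 8
        span = 1 << level
        half = span // 2
        w_step = pow(OMEGA, N // span, Q)
        for start in range(0, N, span):
            w = 1
            for j in range(half):
                u = a[start + j]
                t = mod_mul(w, a[start + j + half])   # twiddle multiplication, then reduction
                a[start + j] = (u + t) % Q
                a[start + j + half] = (u - t) % Q
                w = mod_mul(w, w_step)
    return a


def ntt_reference(a):
    """O(N^2) definition, used only to check the staged version."""
    powers = [pow(OMEGA, k, Q) for k in range(N)]
    return [sum(a[j] * powers[(i * j) % N] for j in range(N)) % Q for i in range(N)]
```

The loop variable level plays the role of L in steps 2 to 4, and comparing ntt_eight_stage against ntt_reference confirms that the staged transform and the reduction agree with the definition. For operands below q the high part never exceeds 2¹⁶, so a single conditional addition of q is enough to bring the result back into [0, q).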