CN115758412A - Data homomorphic encryption reasoning system and method - Google Patents


Info

Publication number
CN115758412A
CN115758412A (application CN202211464251.6A)
Authority
CN
China
Prior art keywords
data
model
encryption
party
inference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211464251.6A
Other languages
Chinese (zh)
Inventor
李昊
束柬
王金钖
陈剑波
戈力
徐生
姜佳华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202211464251.6A
Publication of CN115758412A
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The application discloses a data homomorphic encryption reasoning system and method. Because the data is encrypted before being sent to the evaluation party, other parties cannot obtain the private data of the data party and the model party, so data privacy security is ensured. By means of the properties of homomorphic encryption, the evaluation party can use the encrypted model parameters to perform encrypted inference on the encrypted data; the encrypted inference result, once decrypted, is the same as the plaintext inference result, and it is fed back to the data party for decryption to obtain the plaintext inference result. Furthermore, because the encrypted inference calculation is carried out by an independent evaluation party, the high-computing-power service of the evaluation party can be leveraged, ensuring the reliability of the encrypted inference calculation.

Description

Data homomorphic encryption reasoning system and method
Technical Field
The application relates to the technical field of federated learning, and in particular to a data homomorphic encryption reasoning system and method.
Background
With the increasing maturity of artificial intelligence technologies, researchers and practitioners often implement inference tasks using neural networks. However, this approach is usually built on feeding models large-scale data, and as the public, laws, and regulations attach more importance to the protection of private data, how to break down data silos while ensuring privacy security has become an important issue.
Existing solutions for privacy security include differential privacy (DP) and secure multi-party computation (MPC). Differential privacy generally perturbs a user's input and output data through random sampling or noise addition, so it can resist privacy analysis to a certain extent; although easy to implement, it significantly degrades model performance. MPC, by contrast, does not perturb the computation process and can therefore ensure data accuracy and security, but it demands high computation and network bandwidth throughput.
Disclosure of Invention
In view of the above problems, the present application provides a data homomorphic encryption reasoning system and method, so as to solve the problems that the traditional neural network inference process does not consider data privacy security and is easily attacked by dishonest users, and that existing privacy security solutions either reduce model performance or require large amounts of computation and network bandwidth throughput. The specific scheme is as follows:
in a first aspect, a data homomorphic encryption reasoning system is provided, including: a data party, a model party and an evaluation party;
the data side is used for generating homomorphic encrypted public key and private key pairs, homomorphic encryption is carried out on the data to be inferred by adopting the public key to obtain encrypted data, the public key is sent to the model side, and the encrypted data is sent to the evaluation side;
the model party is used for homomorphic encryption of model parameters of a pre-trained data inference model by adopting the public key to obtain encryption model parameters and sending the encryption model parameters to the evaluation party, wherein the data inference model is a neural network model used for inferring the data to be inferred;
the evaluator is used for carrying out encryption reasoning on the encrypted data by adopting the encryption model parameters to obtain an encryption reasoning result and feeding the result back to the data side;
and the data side is also used for decrypting the encrypted reasoning result by adopting the private key to obtain a plaintext reasoning result.
In a second aspect, a data homomorphic encryption reasoning method is provided, which is applied to a model side, and the method includes:
receiving a public key sent by a data party;
homomorphic encryption is carried out on model parameters of a pre-trained data inference model by adopting the public key to obtain encryption model parameters, wherein the data inference model is a neural network model used for inferring data to be inferred of a data party;
and sending the encryption model parameters to an evaluator, so that the evaluator adopts the encryption model parameters to perform encryption reasoning on the encrypted data provided by the data party and feeds back an encryption reasoning result to the data party, wherein the encrypted data is generated after the data party adopts the public key to perform homomorphic encryption on the data to be reasoned.
By means of the technical scheme, the homomorphic encryption technology is applied to the neural network reasoning task, homomorphic encryption supports mathematical operation on the ciphertext, and the calculation result is identical to the plaintext calculation result after being decrypted. The system provided by the application relates to three participants, namely a data party for providing data to be reasoned, a model party for providing a data inference model and an evaluation party for realizing an encrypted inference task. After a homomorphic encrypted public and private key pair is generated by a data party, homomorphic encryption is carried out on data to be reasoned by adopting a public key and then the data to be reasoned is sent to an evaluating party, the public key is sent to a model party, and the model party carries out homomorphic encryption on model parameters of a pre-trained data inference model by adopting the public key and sends the encrypted model parameters to the evaluating party. Because the data party and the model party encrypt respective private data through homomorphic encryption and then send the data to the evaluation party, other parties except the data party and the model party cannot acquire the private data of the data party, and the data privacy safety is ensured. And by means of the characteristic of homomorphic encryption technology, the evaluating party can utilize the encryption model parameters to carry out encryption reasoning on the encrypted data, the obtained encryption reasoning result is the same as the plaintext reasoning result, the accurate implementation of the reasoning task is ensured, on the basis, the encryption reasoning result is fed back to the data party, and the data party carries out decryption by utilizing a private key to obtain the plaintext reasoning result. 
By adopting the scheme, the problems that data privacy safety is not considered in the traditional neural network reasoning process and the traditional neural network reasoning process is easily attacked by dishonest users are solved, and meanwhile, the performance of a data reasoning model is ensured.
Further, in consideration of the fact that certain calculation overhead is consumed in the process of carrying out encryption reasoning on the encrypted data by using the encryption model parameters, the process of realizing the encryption reasoning calculation is independently carried out by an evaluator, and the evaluator can be a cloud or other servers with high calculation power, so that the reliability of the encryption reasoning calculation is ensured.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a diagram of data homomorphic encryption inference signaling interaction provided in an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a process of converting genotype vector data into a feature map;
fig. 3 is another data homomorphic encryption inference signaling interaction diagram provided in an embodiment of the present application;
FIG. 4 illustrates a data side encryption data process diagram;
FIG. 5 illustrates a model side cryptographic model parameter process diagram;
FIG. 6 illustrates a block diagram of a data side decryption process;
fig. 7 is a schematic flow chart of a data homomorphic encryption method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application provides a data homomorphic encryption reasoning system which can be suitable for various reasoning tasks related to data privacy safety, and can guarantee the safety of privacy data and the accuracy of reasoning results by homomorphic encryption technology to homomorphic encryption of the privacy data of each participant. Meanwhile, the reliability of the encryption reasoning calculation result can be ensured by using the powerful calculation of the evaluator through the encryption reasoning calculation of the independently arranged evaluator.
The scheme of the application can be applied to scenarios in which encrypted inference is performed on private biological, medical, financial, and similar data. Taking the genotype-phenotype data prediction task as an example, it refers to predicting phenotype data, such as phenotype information and disease risk, from high-throughput genotype data. Both the genotype data and the phenotype data are sensitive private data.
Next, with reference to the data homomorphic encryption inference signaling interaction diagram shown in fig. 1, the functions of each party in the data homomorphic encryption inference system of the present application are described:
the data homomorphic encryption reasoning system comprises a data party A, a model party B and an evaluator C.
The data side A is used for generating homomorphic encrypted public key and private key pairs, homomorphic encryption is carried out on the data to be inferred by adopting the public key to obtain encrypted data, the public key is sent to the model side, and the encrypted data is sent to the evaluation side.
The public key is used for encrypting plaintext data to be reasoned by a data party and encrypting model parameters of a pre-trained data inference model by a model party. The private key is only stored locally in the data side and is used for decrypting the encrypted reasoning result.
The data party A can locally generate a public key and a private key by using a secret key generation algorithm, and only sends the public key to the model party B, so that the model party B and the evaluator C cannot steal the private data of the data party A.
The time sequence of sending the public key to the model party B and sending the encrypted data to the evaluator by the data party A is not strictly required, and the public key and the encrypted data can be executed synchronously or in any sequence.
And the model party B is used for homomorphically encrypting the model parameters of the pre-trained data inference model by adopting the public key to obtain encryption model parameters and sending the encryption model parameters to the evaluator C.
Specifically, the model party B may train the data inference model in plaintext form in advance. The data reasoning model is a neural network model used for reasoning data to be reasoned of a data side.
The data inference model trained by the model party B may be different according to different inference task requirements of the data party a, for example, when the data party a needs to perform a genotype phenotype data inference task, the data inference model trained by the model party B may be a neural network model for processing genotype phenotype data inference, such as a classification neural network or a prediction neural network.
After the data inference model is obtained through training, the model party B can adopt the public key provided by the data party A to homomorphically encrypt the model parameters of the data inference model to obtain encryption model parameters, and sends the encryption model parameters to the evaluator C.
It should be noted here that, the process of sending the encrypted data from the data party a to the evaluator C and the process of sending the encrypted model parameters from the model party B to the evaluator C are not strictly required in terms of time sequence, and may be executed synchronously or in any order.
And the evaluator C is used for carrying out encryption reasoning on the encrypted data by adopting the encryption model parameters to obtain an encryption reasoning result and feeding the result back to the data side A.
Specifically, after receiving the encrypted data provided by the data party a and the encryption model parameters provided by the model party B, the evaluator C may perform homomorphic encryption inference on the encrypted data by using the encryption model parameters to obtain an encryption inference result, and feed back the encryption inference result to the data party a.
The evaluator C may be a terminal dedicated to performing homomorphic encryption inference calculation, and has a strong calculation capability and a capability of performing complex encryption inference calculation, for example, the evaluator C may be a cloud, a server, or the like.
And the data party A is also used for decrypting the encrypted reasoning result fed back by the assessment party C by adopting a private key to obtain a plaintext reasoning result.
In view of the characteristics of homomorphic encryption technology, the plaintext reasoning result obtained after the encrypted reasoning result is decrypted is the same as the reasoning result obtained by directly adopting a plaintext reasoning mode, and the accuracy of the finally obtained plaintext reasoning result is ensured.
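The homomorphic property described above — that decrypting the result of a ciphertext computation yields the same answer as computing on plaintexts — can be illustrated with a toy example. The sketch below uses textbook Paillier encryption (additively homomorphic) rather than the CKKS scheme the embodiment actually adopts, and the small fixed primes make it insecure; it exists only to show that Dec(Enc(a)·Enc(b)) = a + b.

```python
import math
import random

def keygen(p=293, q=433):
    # Toy Paillier keypair from small fixed primes (insecure; illustration only)
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid inverse because g = n + 1
    return (n, g), (lam, mu, n)

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    lam, mu, n = sk
    x = pow(c, lam, n * n)
    L = (x - 1) // n          # the Paillier L-function L(x) = (x - 1) / n
    return (L * mu) % n

pk, sk = keygen()
ca, cb = encrypt(pk, 17), encrypt(pk, 25)
# Multiplying ciphertexts adds the underlying plaintexts: 17 + 25 = 42
assert decrypt(sk, (ca * cb) % (pk[0] ** 2)) == 42
```

CKKS additionally supports approximate multiplication of ciphertexts, which is what lets the evaluation party run whole network layers; the principle that decryption recovers the plaintext computation result is the same.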
The data homomorphic encryption reasoning system provided by this embodiment applies a homomorphic encryption technology to a neural network reasoning task, homomorphic encryption supports mathematical operation on a ciphertext, and a calculation result after decryption is the same as a plaintext calculation result. The system provided by the application relates to three participants, namely a data party for providing data to be reasoned, a model party for providing a data inference model and an evaluation party for realizing an encrypted inference task. After a homomorphic encrypted public and private key pair is generated by a data party, homomorphic encryption is carried out on data to be reasoned by adopting a public key and then the data to be reasoned is sent to an evaluating party, the public key is sent to a model party, and the model party carries out homomorphic encryption on model parameters of a pre-trained data inference model by adopting the public key and sends the encrypted model parameters to the evaluating party. Because the data party and the model party encrypt respective private data through homomorphic encryption and then send the data to the evaluation party, other parties except the data party and the model party cannot acquire the private data of the data party, and the data privacy safety is ensured. And by means of the characteristics of homomorphic encryption technology, the evaluation party can utilize the parameters of the encryption model to carry out encryption reasoning on the encrypted data, the obtained encryption reasoning result is the same as the plaintext reasoning result, the accurate implementation of the reasoning task is ensured, on the basis, the encryption reasoning result is fed back to the data party, and the data party carries out decryption by utilizing a private key to obtain the plaintext reasoning result. 
By adopting the scheme, the problems that data privacy safety is not considered in the traditional neural network reasoning process and the traditional neural network reasoning process is easily attacked by dishonest users are solved, and meanwhile, the performance of a data reasoning model is ensured.
Further, in consideration of the fact that certain calculation overhead is consumed in the process of carrying out encryption reasoning on the encrypted data by using the encryption model parameters, the process of realizing the encryption reasoning calculation is independently carried out by an evaluator, and the evaluator can be a cloud or other servers with high calculation power, so that the reliability of the encryption reasoning calculation is ensured.
In some embodiments of the present application, the data inference system of the present application is further described, taking genotypic phenotype data inference tasks as an example.
The data side can be a user needing to carry out phenotypic data inference prediction, and the data to be inferred provided by the data side can be genotype vector data of corresponding phenotypic data to be inferred.
The model party may be a gene company whose pre-trained data inference model may be a neural network model for processing genotype phenotype data inferences. The model party can train the data inference model based on the self-owned gene library or the gene library called from the third party. Wherein, the gene library comprises genotype vector data and corresponding phenotype data.
Considering that genotype vector data is highly discrete and sparse, if the model party directly uses it as training samples for the data inference model, the model is prone to overfitting, which reduces its prediction accuracy. For this reason, the present embodiment provides a solution: the model party first converts the genotype vector data serving as training samples into feature maps, and then performs plaintext training of the data inference model on the feature maps. In this way, a convolutional neural network (CNN) can extract local and global features of the feature maps, improving the prediction accuracy of the trained data inference model.
Referring to fig. 2, the process of converting the genotype vector data as the training sample into the feature map by the model side may include:
each genotype vector data serving as a training sample is segmented according to rows, and the segmented rows are arranged into a characteristic rectangular chart from top to bottom.
In FIG. 2, a piece of genotype vector data is defined as: { x 0 ,x 1 ,...,x d+1 ,...,x 2d ,x 2d+1 ,...,x dim-1 }
If the characteristic histogram is square, calculating the width d of the characteristic histogram:
Figure BDA0003956805410000071
wherein, dim i Represents the dimension size of the ith genotype vector data.
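A minimal sketch of this vector-to-feature-map conversion follows. The width formula d = ⌈√dim⌉ is reconstructed from the square-map description, and padding short final rows with zeros is an assumption of this sketch (the patent does not specify how a non-square dimensionality is handled); the function name is hypothetical.

```python
import math

def to_feature_map(vec, pad_value=0):
    # Width of the (approximately) square feature map: d = ceil(sqrt(dim))
    d = math.ceil(math.sqrt(len(vec)))
    # Pad so the vector fills d full rows (assumption: zero padding)
    padded = list(vec) + [pad_value] * (d * d - len(vec))
    # Cut into rows of length d, stacked from top to bottom
    return [padded[i * d:(i + 1) * d] for i in range(d)]

fmap = to_feature_map(list(range(10)))  # dim = 10 -> a 4x4 map with padding
assert len(fmap) == 4 and all(len(row) == 4 for row in fmap)
assert fmap[0] == [0, 1, 2, 3]
```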
Further, in connection with the signaling interaction diagram illustrated in fig. 3:
Because the gene library of the model party contains many kinds of genes, while the genotype vector data the data party wants to infer from may cover only one or a few kinds, the data party can send the model party the gene label of the genotype vector data whose phenotype data is to be inferred, so that the model party can search the gene library for genotype vector data with the corresponding label and use it as training data.
Further, considering that genotype vector data has very high dimensionality, often up to tens of thousands of dimensions, some of which are sparse and useless, the method can add a dimension-screening step to reduce the dimensionality of the high-dimensional genotype vector data.
In an optional implementation manner, a base model with a penalty term may be used to screen each dimension feature in genotype vector data, so as to obtain the genotype vector data after dimension reduction, and the genotype vector data is used as a model training sample.
The penalty term can be an L1 or an L2 penalty term. The base model processes the genotype vector data to obtain a weight matrix; after L1 regularization, features with small contributions in the weight matrix are set to 0, so the resulting L1 weight matrix is a sparse matrix containing a large number of zeros. For example, if the weight matrix of the base model is [0,0,0,0,0,0.42,0,0, -0.18], the L1 weight matrix obtained after L1 regularization is: [0,0,0,0,0,0.42,0,0, -0.18,0,0,0,0,0,0.42,0,0, -0.18].
L2 regularization, by contrast, yields weights distributed in [0,1], and for sparse data the number of features retained after L1 screening is smaller than the number retained after L2 screening. Continuing with the same base model, the L2 weight matrix obtained after L2 regularization is: [0.12,0.24,0.11,0.28,0.12,0.23,0.11,0.22,0.12,0.21,0.11,0.18].
In this embodiment, the feature may be screened by using a rule that L1 is combined with L2, specifically:
s1, firstly, determining an index idx of which the weight coefficient is not 0 in the L1 weight matrix, and then amplifying the index idx according to a set amplification value m to obtain an amplified index range [ idx-m, idx + m ].
S2, searching a weight coefficient q1 with an index of idx in the L2 weight matrix, searching the difference value between each weight coefficient and q1 in an index range [ idx-m, idx + m ] in the L2 weight matrix, finding the index of the weight coefficient with the difference value smaller than a set threshold value, selecting a target index with a corresponding weight coefficient of 0 in the L1 weight matrix from the indexes, and counting the number of the target indexes.
And S3, calculating the ratio of the weight coefficient corresponding to the index idx in the L1 weight matrix to the number of the target indexes, and taking the ratio as an average value.
And S4, modifying all the weight coefficients with the indexes as the target indexes into the average value in the L1 weight matrix.
In the embodiment, the combination of the L1 regularization and the L2 regularization is used as a penalty term, so that the characteristic dimension after the L1 regularization can be enlarged, and the generalization of the model is enhanced. In the experiment, the dimension of the genotype vector data is 20390, and when the threshold value is set to be 0.01, the dimension after L1 regularization is 1357, the dimension after L2 regularization is 20254, and the dimension after L1 and L2 are combined is 7632.
Examples are as follows:
the weight matrix for the basis model is [0,0,0,0,0,0.42,0,0, -0.18].
The L1 weight matrix obtained after L1 regularization is: [0,0,0,0,0,0.42,0,0, -0.18,0,0,0,0,0,0.42,0,0, -0.18].
The L2 weight matrix obtained after L2 regularization is: [0.12,0.24,0.11,0.28,0.12,0.23,0.11,0.22,0.12,0.21,0.11,0.18].
The amplification value m takes the value 4, and the threshold value is set to be 0.02.
Take the first weighting factor 0.42 in the L1 weighting matrix, which is not 0, as an example, and its index is 5.
The range of the indexes after enlargement is 1-9.
The weight coefficient corresponding to index 5 in L2 is 0.23, the difference between each weight coefficient in the index range of 1 to 9 in L2 and 0.23 is calculated, and the target index which is less than 0.02 and corresponds to the weight coefficient of 0 in L1 includes [1,5,7].
Calculated mean =0.42/3=0.14
Modifying each weight coefficient with index [1,5,7] in the L1 weight matrix to 0.14 to obtain:
[0,0.14,0,0,0,0.14,0,0.14,-0.18,0,0,0,0,0,0.42,0,0,-0.18]。
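Steps S1 to S4 above can be sketched in code. This is one possible reading of the rule: including the index idx itself in the target set (so the mean 0.42/3 is spread over indexes 1, 5, and 7) is inferred from the worked example rather than stated explicitly, and the function name is hypothetical.

```python
def combine_l1_l2(l1, l2, m, threshold):
    out = list(l1)
    for idx, w in enumerate(l1):
        if w == 0 or idx >= len(l2):
            continue
        # S1: amplify the non-zero index idx into the range [idx - m, idx + m]
        lo, hi = max(0, idx - m), min(len(l2) - 1, idx + m)
        q1 = l2[idx]
        # S2: targets are idx itself plus in-range indexes whose L2 weight is
        # within `threshold` of q1 and whose L1 weight is 0 (inferred reading)
        targets = [idx] + [j for j in range(lo, hi + 1)
                           if j != idx and l1[j] == 0 and abs(l2[j] - q1) < threshold]
        # S3: mean = weight at idx divided by the number of targets
        mean = w / len(targets)
        # S4: write the mean into every target position
        for j in targets:
            out[j] = mean
    return out

l1 = [0, 0, 0, 0, 0, 0.42, 0, 0, -0.18, 0, 0, 0, 0, 0, 0.42, 0, 0, -0.18]
l2 = [0.12, 0.24, 0.11, 0.28, 0.12, 0.23, 0.11, 0.22, 0.12, 0.21, 0.11, 0.18]
res = combine_l1_l2(l1, l2, m=4, threshold=0.02)
# For idx = 5 the targets are [1, 5, 7] and the mean is 0.42 / 3 = 0.14,
# matching the worked example above
assert abs(res[5] - 0.14) < 1e-9 and res[1] == res[5] == res[7]
```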
It should be noted that, since homomorphic encryption does not support nonlinear activation functions such as sigmoid and ReLU, the model party may adopt the square activation function as the nonlinear activation function when training the data inference model in accordance with the present application.
Further as shown in connection with fig. 3:
in order to reduce the parameter size of the data inference model and reduce the calculation times of the ciphertext, in this embodiment, before homomorphic encryption is performed on the model parameters of the data inference model, the convolution layer and the BN layer in the model may be fused, and the last sigmoid layer or softmax layer of the model is removed.
The general convolution operation is:
y_1 = w·x + b
The BN layer is calculated as:
y_2 = γ·(y_1 − μ)/√(σ² + ε) + β
where μ and σ² are the mean and variance of the current batch, ε is a small constant for numerical stability, and the normalized result is affine-transformed using γ and β.
After fusing the convolutional layer and the BN layer, the calculation becomes:
y_2 = (γ·w/√(σ² + ε))·x + γ·(b − μ)/√(σ² + ε) + β
for classification prediction tasks, such as genotype phenotype classification tasks, the model side removes the last sigmoid layer or softmax layer of the data inference model, namely, does not need the model to perform probability calculation. After the data side receives the reasoning result, the data side can directly compare the output result of the full connection layer, and the reasoning result is the larger numerical value.
In addition, the parameters stored in the data inference model trained by the model party are usually full-precision, and ciphertext sizes grow rapidly after encryption, which is likely to make the evaluation party's encrypted inference calculation inefficient and time-consuming. Therefore, in the present application, before the model party homomorphically encrypts the model parameters with the public key, a static quantization step may be added for the model parameters, as shown in fig. 3.
Specifically, the model parameters of the data inference model are subjected to static quantization to obtain model parameters subjected to static quantization, and quantization factors used for the static quantization are sent to the evaluator.
Wherein the quantization factors include a scaling factor scale and a zero point.
The quantization factor may be sent to the evaluator together with the cryptographic model parameters, the sending of the quantization factor not being shown in fig. 3.
Taking model parameters of full-precision FP32 type as an example, static quantization can convert them to INT8 type, which reduces the overhead of ciphertext calculation and further improves the inference performance of the data inference model:
v_int8 = round(v_fp32 / scale) + zero_point
where v_fp32 represents an FP32 model parameter before quantization and v_int8 represents the quantized INT8 model parameter.
The scaling factor scale and zero point zero_point used in quantization may be calculated from the extreme values of the original data space and the quantized data space. Assuming the original data space is N and the quantized data space is N′, scale can be calculated as:
scale = (N_max − N_min) / (N′_max − N′_min)
where N_max and N_min represent the maximum and minimum values in the N space, and N′_max and N′_min represent the maximum and minimum values in the N′ space, respectively.
By performing static quantization on the model parameters of the data inference model, the model party can homomorphically encrypt the statically quantized model parameters with the public key, thereby reducing the overhead of ciphertext calculation.
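A minimal sketch of this static quantization step follows. The scale formula is the one given above; choosing zero_point so that N_min maps to the bottom of the INT8 range is a common convention and an assumption of this sketch, as are the function names.

```python
def quant_factors(n_max, n_min, q_max=127, q_min=-128):
    # scale = (N_max - N_min) / (N'_max - N'_min); zero_point maps n_min -> q_min
    scale = (n_max - n_min) / (q_max - q_min)
    zero_point = round(q_min - n_min / scale)
    return scale, zero_point

def quantize(v, scale, zero_point):
    # v_int8 = round(v_fp32 / scale) + zero_point, clamped to the INT8 range
    q = round(v / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

params = [0.42, -0.18, 0.0, 0.31]
scale, zp = quant_factors(max(params), min(params))
q = [quantize(v, scale, zp) for v in params]
assert all(-128 <= x <= 127 for x in q)
# Round-trip error is bounded by one quantization step
assert all(abs(dequantize(x, scale, zp) - v) <= scale for x, v in zip(q, params))
```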
Next, the process by which the evaluation party performs encrypted inference on the encrypted data using the encryption model parameters to obtain the encrypted inference result is introduced.
If the encryption model parameters sent by the model party are the encryption model parameters subjected to static quantization processing, the evaluator party can perform quantitative encryption reasoning, specifically:
the assessment party obtains the quantized scaling factor scale and the zero _ point, and scales the encrypted data sent by the data party to obtain the scaled encrypted data:
Figure BDA0003956805410000111
where q _ cx denotes the encrypted data after scaling, and cx denotes the encrypted data.
And further, calculating the scaled encrypted data and the encryption model parameters to obtain an encryption reasoning result.
Specifically, the scaled encrypted data and the weights of the models of the layers are linearly calculated:
y_i = cw_i · q_cx + pb_i
where cw_i and pb_i represent the encrypted weights and the packed bias term of the layer-i network, respectively, and y_i denotes the output of the layer-i network before the activation function. When the model party homomorphically encrypts the model parameters, it may encrypt only the network weights and send the bias terms directly to the evaluation party in packed form.
Of course, the weights and bias terms may also be homomorphically encrypted at the same time.
In this embodiment, if the square activation function is used as the nonlinear activation function between two adjacent network layers, the output of the layer-i network after the activation function can be denoted y_i′:
y_i′ = y_i · y_i
Since ciphertext multiplication increases both the noise and the size of the ciphertext, which raises the overhead of subsequent operations, this embodiment may also relinearize each layer output y_i′ before it is operated on with the next layer's network parameters.
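The per-layer computation above can be simulated in plaintext. In the real system the same operations run on CKKS ciphertexts, with relinearization after each squaring; here plain floats stand in for ciphertext slots, and the function names and example weights are hypothetical.

```python
def linear(x, w, b):
    # y_i = w_i . x + b_i, one dot product per output unit of the layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def square_activation(y):
    # HE-friendly nonlinearity y' = y * y (relinearized when done on ciphertexts)
    return [v * v for v in y]

x = [1.0, 2.0]
w1, b1 = [[0.5, -0.25], [1.0, 0.0]], [0.1, -0.2]
w2, b2 = [[1.0, 1.0]], [0.0]
h = square_activation(linear(x, w1, b1))  # layer 1: [0.1, 0.8] -> [0.01, 0.64]
out = linear(h, w2, b2)
assert abs(out[0] - 0.65) < 1e-12
```

Because the final sigmoid/softmax layer is removed, `out` would be returned (encrypted) as-is, and the data party compares the decrypted values directly.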
Next, a process of performing homomorphic encryption on each of the data side and the model side will be described.
The data party's encryption of the data to be inferred and the model party's encryption of the model parameters of the data inference model can adopt the same encryption algorithm, such as the CKKS homomorphic encryption scheme for approximate arithmetic. The parameters of the encryption scheme mainly refer to the polynomial modulus, the ciphertext modulus and the plaintext modulus.
In this embodiment, the encryption process on the data party side is described taking genotype vector data to be inferred as the data to be inferred.

To improve throughput, N pieces of genotype vector data to be inferred may be encrypted and computed at the same time in this embodiment.
In connection with the encryption flow shown in fig. 4:
firstly, the N pieces of genotype vector data to be inferred are converted into matrix data.

In the example of fig. 4, each piece of genotype vector data to be inferred has dimension dim, and the matrix formed by sorting the N pieces from top to bottom has dimension N×dim.

This matrix is then transposed to obtain the transposed matrix data.

Further, the matrix data is packed, and the packed data is homomorphically encrypted with the public key to obtain the encrypted data.
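The stacking, transposition and packing steps above can be sketched as follows. Row-major slot packing is an assumption for illustration; the patent does not specify the slot layout, and encryption itself is omitted.

```python
# Sketch of the data party's packing step: N genotype vectors of dimension
# `dim` are stacked into an N x dim matrix, transposed to dim x N, and
# flattened into the slot vector that would then be CKKS-encrypted.
# Function name and row-major layout are illustrative assumptions.

def pack_genotype_vectors(vectors):
    n, dim = len(vectors), len(vectors[0])
    # transpose: row j now holds feature j of all N samples
    transposed = [[vectors[i][j] for i in range(n)] for j in range(dim)]
    # flatten row-major into one slot vector for packed encryption
    return [v for row in transposed for v in row]

slots = pack_genotype_vectors([[1, 2, 3], [4, 5, 6]])  # N=2, dim=3
```

After this layout, each feature of all N samples sits in adjacent slots, which is what lets one homomorphic multiply apply a weight to the same feature of all N samples at once.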
Correspondingly, the process by which the model party homomorphically encrypts the model parameters of the data inference model with the public key to obtain the encryption model parameters can be shown in fig. 5.

To adapt to the encrypted data obtained by the data party packing and encrypting the N pieces of genotype vector data to be inferred, the model party can copy the weight parameters of each network layer of the data inference model N times to form a weight matrix. The weight matrix is then packed, and the packed data is homomorphically encrypted with the public key to obtain the encryption model parameters.

If the model party fuses the convolutional layers with the BN layers and removes the final sigmoid or softmax layer, only the fused convolutional layers and the fully-connected layers remain.
Taking the process of encrypting the convolutional layer as an example:
assuming that the convolution kernel has size k×k with bias term b, and that the encryption process encrypts only the weight parameters of the convolutional layer, the bias term can be packed directly.

The weight parameters of a convolutional layer can be expressed as a vector of dimension k^2, namely:

w = (w_0, w_1, …, w_{k^2−1})

Copying the weight parameters N times forms the weight matrix:

W = (w, w, …, w)

The weight matrix has dimension k^2×N; every column vector of the matrix is identical, being the k^2-dimensional vector formed by the weight parameters.

The weight matrix is then packed, and the packed data is homomorphically encrypted with the public key to obtain the encryption model parameters.
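The weight replication and flattening described above can be sketched as follows. The row-major slot layout and function name are illustrative assumptions; encryption is omitted.

```python
# Sketch of the model party's weight replication: a k x k convolution
# kernel is flattened to a k^2-dimensional vector, copied N times to
# form a k^2 x N matrix (all columns identical), and flattened for
# packed homomorphic encryption.

def replicate_kernel(kernel, n):
    flat = [w for row in kernel for w in row]   # k x k -> k^2 vector
    matrix = [[w] * n for w in flat]            # k^2 x N, identical columns
    return [v for row in matrix for v in row]   # flatten row-major for packing

packed = replicate_kernel([[0.1, 0.2], [0.3, 0.4]], n=3)
```

The layout matches the packed input from the data party: weight w_j is repeated N times so that one slot-wise homomorphic multiply applies it to feature j of all N samples.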
The process by which the data party decrypts the result to obtain the plaintext reasoning result is described with reference to fig. 6.

Given the encryption schemes of the data party and the model party, the evaluator can infer the results of N pieces of genotype vector data to be inferred at once, so the encryption reasoning result fed back by the evaluator to the data party may comprise N encrypted packed inference results, denoted cout:

cout = (cout_0, cout_1, …, cout_{N−1})

The data party decrypts the N encrypted packed inference results to obtain the N decrypted packed inference results pout:

pout = (pout_0, pout_1, …, pout_{N−1})

The N packed inference results are then decoded to obtain the decoded inference result set out:

out = (out_0, out_1, …, out_{N−1})

where the inference results of the 0th to (N−1)th pieces of data to be inferred are denoted out_0 to out_{N−1}, respectively. The genotype-to-phenotype classification task considered here is a binary classification task, so each inference result comprises the probabilities that the genotype data to be inferred is classified to the phenotype label 0 or 1. Taking the inference result out_0 of the 0th piece of genotype data as an example, out_{0,0} denotes the probability that the 0th piece of genotype data does not belong to the phenotype, and out_{0,1} denotes the probability that it does.
If the model party removed the model's final sigmoid or softmax layer, used for probability computation, before encrypting the parameters of the data inference model, the data party needs to further post-process each inference result. If the sigmoid layer was removed, the final inference result y is determined according to the following formula:

y = 1 if out_i > 0, otherwise y = 0

If the softmax layer was removed, the final inference result y is determined according to the following formula:

y = 1 if out_{i,1} > out_{i,0}, otherwise y = 0
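The decryption-side decision rules above can be sketched as follows. These rules are valid because sigmoid and softmax are monotone, so thresholding or taking the argmax of the raw logits gives the same label as applying the removed layer first; function names are illustrative.

```python
# Sketch of the data party's post-processing after decrypting and
# decoding the packed results: recover the class label from raw logits.

def decide_sigmoid_removed(logit):
    """y = 1 iff the raw logit is positive, i.e. sigmoid(logit) > 0.5."""
    return 1 if logit > 0 else 0

def decide_softmax_removed(logits):
    """y = argmax over the two raw logits (softmax preserves the argmax)."""
    return 0 if logits[0] >= logits[1] else 1

labels = [decide_softmax_removed(p) for p in [(2.1, -0.3), (-1.0, 0.5)]]
```

Removing the final probability layer before encryption saves ciphertext operations on the evaluator, since the label can be recovered in plaintext after decryption.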
With the data homomorphic encryption reasoning system provided by the application, encryption reasoning is performed on the data to be inferred; experiments verify that the accuracy is essentially equivalent to that of plaintext reasoning, demonstrating the reliability of the data homomorphic encryption reasoning scheme of the application. Meanwhile, the reasoning speed depends strongly on the computing power of the evaluator performing the encryption reasoning, and can be significantly improved by deploying an evaluator with strong computing power.
In some embodiments of the present application, there is further provided a data homomorphic encryption inference method, which can be specifically applied to the model party in the foregoing data homomorphic encryption inference system. As shown in fig. 7, the data homomorphic encryption inference method may include:
and step S100, receiving the public key sent by the data party.
And step S110, homomorphic encryption is carried out on the model parameters of the pre-trained data inference model by adopting the public key to obtain encryption model parameters.
The data reasoning model is a neural network model used for reasoning the data to be inferred of the data side.
And step S120, sending the encryption model parameters to an evaluator, so that the evaluator performs encryption reasoning on the encrypted data provided by the data party by using the encryption model parameters, and feeding back an encryption reasoning result to the data party.
The encrypted data is generated by the data party after homomorphic encryption is carried out on the data to be inferred by adopting the public key.
Optionally, a square activation function may be used as the nonlinear activation function in the data inference model, so that the reasoning computation over the homomorphically encrypted model parameters can be carried out entirely with homomorphic operations.
The method of the present application may also include a process of training a data inference model.
When the data to be inferred of the data side is genotype vector data of the corresponding phenotype data to be inferred, the data inference model trained by the model side can be a neural network model for processing genotype phenotype data inference. The training process of the data inference model can comprise the following steps:
s1, converting genotype vector data serving as a training sample into a characteristic diagram;
and S2, performing plaintext training on the data inference model by adopting the characteristic diagram to obtain the trained data inference model.
In this embodiment, considering that genotype vector data is highly discrete and sparse, if the model party directly trains the data inference model with the genotype vector data as training samples, the model is likely to overfit, reducing its prediction accuracy. This embodiment therefore first converts the genotype vector data serving as training samples into feature maps, and then trains the data inference model in plaintext with the feature maps. In this way, a convolutional network (CNN) can extract local and global features of the feature maps, improving the prediction accuracy of the trained data inference model.
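The vector-to-feature-map conversion can be sketched as follows. The row width is an assumption chosen for illustration; the patent only specifies splitting the vector into rows stacked top to bottom.

```python
# Sketch of converting a genotype vector into a rectangular feature map
# by splitting it into rows of equal width and stacking them top to
# bottom, so a CNN can extract local and global features.

def vector_to_feature_map(vec, width):
    assert len(vec) % width == 0, "vector length must be divisible by row width"
    return [vec[i:i + width] for i in range(0, len(vec), width)]

fmap = vector_to_feature_map([0, 1, 2, 0, 1, 2], width=3)  # a 2 x 3 map
```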
Optionally, in the step S1, the process of converting the genotype vector data as the training sample into the feature map may include:
s11, dividing each piece of genotype vector data serving as a training sample into rows, and arranging the divided rows from top to bottom in sequence into a rectangular feature map.
Optionally, before the step S1, the training process of the data inference model provided by the present application may further include the following steps:
and S3, screening each dimension characteristic in the genotype vector data serving as the training sample by adopting the basic model with the punishment item, and obtaining the genotype vector data subjected to dimension reduction as the training sample.
In this embodiment, further considering that genotype vector data has a very high dimensionality, commonly reaching tens of thousands of dimensions, some of which are sparse and uninformative, a dimension-screening step can be added to reduce the dimensionality of the high-dimensional genotype vector data.
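The penalty-based screening can be sketched as follows. An L1-style penalty drives many coefficients of the base model to exactly zero, and only dimensions with nonzero coefficients are kept; the coefficient vector below is a stand-in for one fitted by such a model, since the patent does not specify the exact base model or penalty.

```python
# Hedged sketch of penalty-based dimension screening: keep only the
# dimensions whose (hypothetical) penalized-model coefficient is nonzero.

def screen_dimensions(samples, coefs, eps=1e-8):
    keep = [j for j, c in enumerate(coefs) if abs(c) > eps]
    reduced = [[row[j] for j in keep] for row in samples]
    return reduced, keep

# coefs stands in for coefficients fitted by an L1-penalized base model
reduced, kept = screen_dimensions([[5, 0, 7, 1]], coefs=[0.9, 0.0, 0.0, -0.4])
```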
Optionally, in step S110, before performing homomorphic encryption on the model parameters of the pre-trained data inference model by using the public key, the following processing steps may be further added:
step S130, performing static quantization on the model parameters of the data inference model to obtain model parameters after the static quantization, and sending quantization factors used for the static quantization to an evaluator, wherein the quantization factors comprise scaling factors and zero points.
On this basis, the process of step S110 may specifically include:
and carrying out homomorphic encryption on the model parameters after the static quantization by adopting the public key.
In this embodiment, considering that the parameters stored in the data inference model are usually full-precision, and that the ciphertext grows by orders of magnitude after encryption, the evaluator's encryption reasoning computation is liable to become inefficient and time-consuming. The application therefore adds a static quantization step on the model parameters before they are homomorphically encrypted with the public key, so as to reduce the cost of the subsequent ciphertext computation.
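The static quantization step can be sketched as follows. This is a minimal illustration of one common recipe (asymmetric uint8 quantization with a scale and zero point computed from the observed weight range), not necessarily the exact procedure of the patent.

```python
# Sketch of static quantization of full-precision weights: compute
# (scale, zero_point) from the observed range, map weights to integers,
# and keep the quantization factors to send to the evaluator.

def quantize_static(weights, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / (qmax - qmin)
    zero_point = round(qmin - wmin / scale)
    # quantize and clamp to the representable integer range
    q = [min(qmax, max(qmin, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point
```

The returned scale and zero_point are exactly the quantization factors that the model party sends to the evaluator so the evaluator can scale the encrypted input to match.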
Optionally, before performing static quantization on the model parameters of the data inference model in step S130, the following processing steps may be added:
and S140, fusing the convolution layer and the BN layer in the data inference model, and removing the last sigmoid layer or softmax layer of the data inference model.
In this embodiment, in order to reduce the parameter scale of the data inference model and the number of ciphertext computations, before the model parameters are homomorphically encrypted, the convolutional layers and BN layers in the model are fused and the model's final sigmoid or softmax layer is removed.
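The conv-BN fusion relies on the standard folding identity BN(y) = gamma·(y − mean)/sqrt(var + eps) + beta, which can be absorbed into the convolution weights and bias. The sketch below uses per-channel scalars for brevity; names are illustrative.

```python
# Sketch of folding a BatchNorm layer into the preceding convolution:
# the BN affine transform is absorbed into the conv weights and bias,
# leaving a single fused layer (fewer ciphertext operations).
import math

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    s = gamma / math.sqrt(var + eps)        # BN per-channel scale
    w_fused = [wi * s for wi in w]          # scale the conv weights
    b_fused = (b - mean) * s + beta         # fold mean/beta into the bias
    return w_fused, b_fused
```

Applying the fused layer to an input gives the same result as convolution followed by BN, so the evaluator pays for one encrypted linear layer instead of two.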
Optionally, in step S110 of the present application, the process of performing homomorphic encryption on the model parameters of the pre-trained data inference model by using the public key to obtain the encrypted model parameters may include:
s1, copying N parts of weight parameters of each network layer of the pre-trained data inference model to form a weight matrix.
And S2, packaging the weight matrix, and performing homomorphic encryption on the packaged data by adopting the public key to obtain an encryption model parameter.
In this embodiment, to improve throughput, the data party may pack and encrypt N pieces of genotype vector data to be inferred at a time. To adapt to the encrypted data thus obtained, the model party in this embodiment copies the weight parameters of each network layer of the data inference model N times to form a weight matrix, then packs and encrypts the weight matrix and sends it to the evaluator.
Of course, in some embodiments of the present application, two other data homomorphic encryption inference methods may also be provided from the perspective of the data party or the evaluator, where the method steps performed by the data party and the evaluator may refer to the related descriptions in the data homomorphic encryption inference system, and are not further described herein.
The data homomorphic encryption reasoning system and method provided by the application are applicable to various reasoning tasks involving data privacy; by homomorphically encrypting the private data of each participant, they ensure both the security of the private data and the accuracy of the reasoning result. Meanwhile, performing the encryption reasoning computation on an independently deployed evaluator allows the evaluator's strong computing power to guarantee the reliability of the encryption reasoning computation.
In the above description of the embodiments, the homomorphic encryption process is described taking a genotype-to-phenotype classification task as an example. In this task scenario, the data party may be a user client that wishes to submit its own sensitive genotype data for phenotype classification prediction. The model party may be a terminal, such as a gene company, that can train a data inference model (e.g., a phenotype classification model) on its own gene library; because the trained data inference model contains sensitive model parameters, it cannot be shared in plaintext, so the parameters are issued to the evaluator in homomorphically encrypted form. The evaluator may be a server that obtains the encrypted genotype data and the encryption model parameters and executes the encryption reasoning computation; its strong computing capability enables fast and accurate encrypted reasoning, and because all the data it obtains is encrypted, the private data of the data party and the model party cannot be stolen, ensuring data security. The evaluator feeds the encryption reasoning result back to the data party, and the data party decrypts it with the private key to obtain the plaintext reasoning result.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, the embodiments may be combined as needed, and the same and similar parts may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A data homomorphic cryptographic inference system, comprising: a data party, a model party and an evaluation party;
the data side is used for generating homomorphic encrypted public key and private key pairs, homomorphic encryption is carried out on the data to be inferred by adopting the public key to obtain encrypted data, the public key is sent to the model side, and the encrypted data is sent to the evaluation side;
the model party is used for homomorphic encryption of model parameters of a pre-trained data inference model by adopting the public key to obtain encryption model parameters and sending the encryption model parameters to the evaluation party, wherein the data inference model is a neural network model used for inferring the data to be inferred;
the evaluator is used for carrying out encryption reasoning on the encrypted data by adopting the encryption model parameters to obtain an encryption reasoning result and feeding the result back to the data side;
and the data side is also used for decrypting the encrypted reasoning result by adopting the private key to obtain a plaintext reasoning result.
2. The system according to claim 1, wherein the data to be inferred is genotype vector data corresponding to the phenotype data to be inferred, and the data inference model is a neural network model for processing genotype phenotype data inference;
the process of the model side for training the data inference model in advance comprises the following steps:
converting genotype vector data serving as a training sample into a characteristic diagram;
and performing plaintext training on the data inference model by using the characteristic diagram to obtain the trained data inference model.
3. The system of claim 2, wherein the process of the model side converting the genotype vector data as the training sample into the feature map comprises:
dividing each piece of genotype vector data serving as a training sample into rows, and arranging the divided rows from top to bottom in sequence into a rectangular feature map.
4. The system of claim 2, wherein the modeling party is further configured to, prior to converting the genotype vector data as the training sample into the feature map:
and screening each dimension characteristic in the genotype vector data serving as a training sample by adopting a basic model with a penalty term to obtain the genotype vector data subjected to dimension reduction serving as the training sample.
5. The system according to claim 1, wherein a square activation function is adopted as a nonlinear activation function in the model-side pre-trained data inference model.
6. The system of claim 1, wherein the model side, prior to homomorphically encrypting the model parameters of the data inference model using the public key, is further configured to:
performing static quantization on the model parameters of the data inference model to obtain model parameters after the static quantization, and sending quantization factors used for the static quantization to the evaluator, wherein the quantization factors comprise scaling factors and zero points;
the process of homomorphic encryption of the model parameters of the data inference model by the model party by using the public key includes:
and carrying out homomorphic encryption on the model parameters after the static quantization by adopting the public key.
7. The system of claim 6, wherein the modeling side, prior to statically quantizing the model parameters of the data inference model, is further configured to:
and fusing a convolution layer and a BN layer in the data inference model, and removing the last sigmoid layer or softmax layer of the data inference model.
8. The system according to claim 6, wherein the process of the evaluator performing the cryptographic inference on the encrypted data using the cryptographic model parameters to obtain the cryptographic inference result comprises:
scaling the encrypted data by using the quantization factor to obtain scaled encrypted data;
and calculating the scaled encrypted data and the encryption model parameters to obtain an encryption reasoning result.
9. The system according to claim 2, wherein the process of homomorphically encrypting the data to be inferred by the data party by using the public key to obtain encrypted data comprises:
converting N pieces of genotype vector data to be inferred into matrix data;
and packing the matrix data, and performing homomorphic encryption on the packed data by adopting the public key to obtain encrypted data.
10. The system according to claim 9, wherein the process of the model party using the public key to homomorphically encrypt the model parameters of the pre-trained data inference model to obtain encrypted model parameters comprises:
copying N parts of weight parameters of each network layer of the pre-trained data inference model to form a weight matrix;
and packaging the weight matrix, and performing homomorphic encryption on the packaged data by adopting the public key to obtain an encryption model parameter.
11. The system of claim 9, wherein the encrypted inference results are encrypted N packed inference results;
the process that the data side decrypts the encrypted reasoning result by adopting the private key to obtain a plaintext reasoning result comprises the following steps:
decrypting the encrypted N packed inference results by using the private key to obtain decrypted N packed inference results;
and decoding each decrypted packed reasoning result respectively to obtain a reasoning result set of each genotype vector data.
12. A data homomorphic encryption reasoning method is applied to a model side, and comprises the following steps:
receiving a public key sent by a data party;
carrying out homomorphic encryption on model parameters of a pre-trained data inference model by adopting the public key to obtain encryption model parameters, wherein the data inference model is a neural network model used for reasoning data to be inferred of a data party;
and sending the encryption model parameters to an evaluator, so that the evaluator adopts the encryption model parameters to perform encryption reasoning on the encrypted data provided by the data party and feeds back an encryption reasoning result to the data party, wherein the encrypted data is generated after the data party adopts the public key to perform homomorphic encryption on the data to be reasoned.
CN202211464251.6A 2022-11-22 2022-11-22 Data homomorphic encryption reasoning system and method Pending CN115758412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211464251.6A CN115758412A (en) 2022-11-22 2022-11-22 Data homomorphic encryption reasoning system and method


Publications (1)

Publication Number Publication Date
CN115758412A true CN115758412A (en) 2023-03-07

Family

ID=85334880


Country Status (1)

Country Link
CN (1) CN115758412A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116011552A (en) * 2023-03-24 2023-04-25 摩尔线程智能科技(北京)有限责任公司 Model training method, reasoning method, device, system, equipment and storage medium
CN117592089A (en) * 2024-01-18 2024-02-23 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN117592089B (en) * 2024-01-18 2024-05-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination