CN115242444B

CN115242444B - Verifiable privacy protection linear regression method and system

Info

Publication number: CN115242444B
Application number: CN202210710116.9A
Authority: CN
Inventors: 赖俊祚; 吴鹏辉; 李燕玲; 张蓉; 宋贝贝
Original assignee: Jinan University
Current assignee: Jinan University
Priority date: 2022-06-22
Filing date: 2022-06-22
Publication date: 2023-08-01
Anticipated expiration: 2042-06-22
Also published as: CN115242444A

Abstract

The invention relates to the field of machine learning, and in particular to a verifiable privacy-preserving linear regression method and system. The method comprises the following steps: the model owner discloses verifiable parameters related to the linear regression model and deploys the linear regression model on a cloud server; the user encrypts the data to obtain a ciphertext and uploads the ciphertext to the cloud server where the linear regression model is located; the user requests the cloud server to compute on the ciphertext, the cloud server computes a prediction on the uploaded ciphertext with the linear regression model, and the computed prediction result is returned to the user; and the user verifies the correctness of the plaintext of the prediction result using the verifiable parameters that the model owner disclosed in advance. The invention prevents the cloud server from maliciously returning wrong results, ensures the security of user data and of the model, and protects the privacy of the model owner's model information, the user's data and the prediction result.

Description

Verifiable privacy protection linear regression method and system

Technical Field

The invention relates to the field of machine learning, in particular to a verifiable privacy protection linear regression method and system.

Background

In the era of artificial intelligence, personal privacy protection receives increasing attention in China and abroad. Machine learning is one of the ways to realize artificial intelligence; it focuses on mining potential, valid and understandable knowledge from massive data, building data-driven reasoning and decision models, and achieving the goal of "obtaining data and putting it to use". With the rapid development of cloud computing, more and more cloud servers provide machine learning platforms to users. However, as modern society places ever more importance on privacy, ensuring the privacy and accuracy of models and the privacy of user data during computation has become a major difficulty in machine learning. To address this problem, various privacy-preserving machine learning methods have been proposed, including protection mechanisms based on differential privacy, on homomorphic encryption, and on secure multi-party computation.

Nikolaenko et al., in "Privacy-Preserving Ridge Regression on Hundreds of Millions of Records", propose using both homomorphic encryption and secure two-party computation to protect the privacy of data while training a regression model. The scheme divides training into two stages, data aggregation and model parameter computation: the first stage uses a homomorphic encryption algorithm and the second uses a secure two-party computation protocol. However, this scheme only considers model training. It considers neither the use of the model, that is, predicting results for data with the trained model, nor the verifiability of those predictions, both of which must be addressed in practical applications. Many current schemes do consider prediction with trained models, but each suffers from one of two problems: either the privacy of user data during prediction is not protected, or the correctness of the prediction results cannot be verified.

Disclosure of Invention

To solve the technical problems in the prior art, the invention provides a verifiable privacy-preserving linear regression method and system. By disclosing verifiable model parameters while keeping the model private, the invention makes the result obtained by the user verifiable and prevents the cloud server from maliciously returning wrong results. It ensures the security of user data and of the model, protects the privacy of the model owner's model information, the user's data and the prediction result, and guarantees the correctness of the prediction result, which better suits practical application scenarios.

It is a first object of the present invention to provide a verifiable privacy preserving linear regression method.

It is a second object of the present invention to provide a verifiable privacy preserving linear regression system.

The first object of the present invention can be achieved by adopting the following technical scheme:

A verifiable privacy-preserving linear regression method, the method comprising:

disclosure of model verifiable parameters: the model owner discloses verifiable parameters related to the linear regression model;

data processing: the user encrypts data with the Paillier encryption algorithm to obtain a ciphertext and uploads the ciphertext to the cloud server where the linear regression model is located;

data calculation: the user requests the cloud server to compute on the ciphertext, and the cloud server computes on the uploaded ciphertext with the linear regression model and returns a prediction result to the user;

result verification: the prediction result is decrypted to obtain its plaintext, and the correctness of that plaintext is verified using the verifiable parameters related to the linear regression model disclosed in advance by the model owner.

In a preferred technical solution, the model owner discloses verifiable parameters related to a linear regression model, including:

A large prime t, a group $G_t$ of order t and a generator $h \in G_t$ of the group are randomly selected. Let the model parameters already trained by the model owner be the vector w and the real number b, where $w = (w_1, w_2, \ldots, w_n)$ is the trained parameter vector and b is the bias (offset) of the model. A parameter vector w′ and a group element b′ related to w and b are computed:

$$w' = (h^{w_1}, h^{w_2}, \ldots, h^{w_n})$$

$$b' = h^{b}$$

The large prime t, the group $G_t$, the generator h, the parameter vector w′ and the group element b′ are disclosed.

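In a preferred technical solution, the cloud server computes on the ciphertext requested by the user as follows:

$$\mathrm{Enc}_{pk}(b) \cdot \prod_{i=1}^{n} c_i^{w_i} \bmod N^2 = \mathrm{Enc}_{pk}(w^T x + b)$$

where $w = (w_1, w_2, \ldots, w_n)$ is the model parameter vector, the vector $x = (x_1, x_2, \ldots, x_n)$ is the user's initial data, and $c_i = \mathrm{Enc}_{pk}(x_i)$.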

The second object of the invention can be achieved by adopting the following technical scheme:

A verifiable privacy-preserving linear regression system, the system comprising:

a verifiable parameter disclosure module, used by the model owner to disclose verifiable parameters related to the linear regression model;

a data calculation module, used for the user to request the cloud server to compute on the ciphertext, the cloud server computing on the user's uploaded ciphertext with the linear regression model and returning a prediction result to the user; and

a result verification module, used for decrypting the prediction result to obtain its plaintext and verifying the correctness of that plaintext using the verifiable parameters related to the linear regression model disclosed in advance by the model owner.

Compared with the prior art, the invention has the following advantages and beneficial effects:

According to the invention, verifiable parameters related to the model are disclosed while the model itself remains private, so the result obtained by the user is verifiable and the cloud server is prevented from maliciously returning wrong results. The security of user data and of the model is ensured, the privacy of the model owner's model information, the user's data and the prediction result is protected, and the correctness of the prediction result is guaranteed, making the method and system better suited to practical application scenarios.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a method implementation scenario in an embodiment of the present invention;

FIG. 2 is a flow chart of a verifiable privacy preserving linear regression method in an embodiment of the present invention.

Detailed Description

The technical solution of the present invention will be described in further detail below with reference to the accompanying drawings and examples, it being apparent that the described examples are some, but not all, examples of the present invention, and embodiments of the present invention are not limited thereto. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1:

With the rapid development of the Internet and the Internet of Things, data has grown explosively and is characterized by huge volume, great variety and high authenticity; machine learning is used to extract the maximum value from this data and meet people's practical needs. Many research institutions have trained highly practical models, but these models are proprietary assets of their owners, are not disclosed to users, and are typically deployed on cloud servers. When a user needs a model to make predictions on data, the data can be uploaded to the cloud server for computation; however, user data usually contains sensitive information, and the cloud server is not fully trusted. Machine learning based on cloud computing therefore inevitably faces security problems. First, uploading data directly in plaintext would leak the user's privacy. Second, the cloud server may be malicious, that is, it may predict on the user's data with an incorrect model, so the correctness of the prediction result cannot be guaranteed. Hence, when a machine learning model is used to predict results for data, the correctness of the prediction result must be ensured while the privacy of both the model owner and the user is protected, which has important theoretical significance and application value.

In the invention, the linear regression model that the model owner has deployed on the cloud server computes on the ciphertext of the data uploaded by the user and returns the result to the user; the user decrypts it to obtain the prediction result in plaintext and verifies it against the model information disclosed in advance by the model owner.

A linear regression model generally takes the form:

$$y(x) = w^T x + b$$

where $x = (x_1, x_2, \ldots, x_n)$ is the feature vector representing the data, $w = (w_1, w_2, \ldots, w_n)$ is the trained model parameter vector, and b is the bias (offset) of the model.

Fig. 1 schematically shows an implementation scenario of an embodiment disclosed in this specification, involving two participants, Server and Client. Server is the model owner, and Client is the data owner, also called the user. The data owner Client holds sample feature data to be processed; the samples may be house price data, patient data, audio data, text data, etc. to be analyzed, and the sample features may correspondingly include house features (e.g., area, location, floor), patient health features (e.g., age, gender, heart rate, blood pressure), audio spectrum features, text encoding features, etc.

The model owner Server holds a linear regression model used for business processing of the sample feature data. For example, when the samples are house price data, the business processing may be house price prediction; when the samples are patient data, it may be disease prediction for the patient, and so on.

As shown in fig. 2, the verifiable privacy preserving linear regression method of the present invention includes the following steps:

S1, disclosure of model verifiable parameters: the model owner discloses verifiable parameters related to the linear regression model.

Relying on the hardness of the discrete logarithm problem, the model owner (Server) discloses verifiable parameters related to the linear regression model, namely a large prime t, a group $G_t$, a generator h, a parameter vector w′ and a group element b′, as follows:

S11, a large prime t, a group $G_t$ of order t and a generator $h \in G_t$ of the group are randomly selected. Let the model parameters already trained by the model owner be the vector w and the real number b, where $w = (w_1, w_2, \ldots, w_n)$ is the trained parameter vector and b is the bias (offset) of the model. A public parameter vector w′ and a group element b′ related to w and b are computed:

$$w' = (h^{w_1}, h^{w_2}, \ldots, h^{w_n})$$

$$b' = h^{b}$$

S12, the large prime t, the group $G_t$, the generator h, the parameter vector w′ and the group element b′ are disclosed.
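The disclosure in S11 and S12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: $w'_i = h^{w_i}$ (consistent with $b' = h^b$ and with the verification in S43); $G_t$ is instantiated, hypothetically, as the order-t subgroup of quadratic residues modulo the safe prime P = 2t + 1; the toy sizes are far too small to be secure; and w and b are non-negative integers (for example, fixed-point encodings of real weights), because the source does not specify how real numbers are encoded. The names `T`, `P`, `H` and `publish_verifiable_params` are illustrative.

```python
# Toy instantiation (assumption): G_t is the subgroup of quadratic residues
# modulo the safe prime P = 2t + 1, which has prime order t.
T = 1019            # large prime t (toy size, insecure)
P = 2 * T + 1       # 2039, a safe prime
H = 4               # 4 = 2^2 is a quadratic residue != 1, so it generates G_t


def publish_verifiable_params(w, b):
    """Return (w', b') with w'_i = h^{w_i} and b' = h^b in G_t.

    Exponents are reduced modulo t, which is valid because h has order t.
    """
    w_prime = [pow(H, wi % T, P) for wi in w]
    b_prime = pow(H, b % T, P)
    return w_prime, b_prime


# Hypothetical trained model with integer weights and bias.
w = [3, 5, 2]
b = 7
w_prime, b_prime = publish_verifiable_params(w, b)
print("public:", T, H, w_prime, b_prime)
```

Only t, the group description, h, w′ and b′ leave the model owner; w and b stay private.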

Let p be a prime and a a primitive root of p, i.e., $a^1, a^2, \ldots, a^{p-1}$ take all values from 1 to p − 1 modulo p. Hence for every $b \in \{1, \ldots, p-1\}$ there is a unique $i \in \{1, \ldots, p-1\}$ such that $b \equiv a^{i} \bmod p$; i is called the discrete logarithm of b to the base a modulo p, denoted $i = \log_a(b)$ (modulo p).

Given a, p and i, b can be computed relatively easily with a fast exponentiation algorithm, but given a, b and p it is very difficult to find i: when p is large, even the fastest discrete logarithm algorithms known to date are infeasible.
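The asymmetry described above can be illustrated with a short Python sketch: square-and-multiply computes $a^i \bmod p$ with a number of multiplications logarithmic in i, while the naive way to recover i tries up to p − 1 exponents. The toy prime p = 2039 and the primitive root a = 7 are illustrative choices only; at cryptographic sizes the linear search below is hopeless.

```python
def fast_pow(a, i, p):
    """Square-and-multiply: a^i mod p using O(log i) multiplications."""
    result, base = 1, a % p
    while i > 0:
        if i & 1:                  # current bit of the exponent is set
            result = result * base % p
        base = base * base % p     # square for the next bit
        i >>= 1
    return result


def brute_force_log(a, b, p):
    """Return the i in {1, ..., p-1} with a^i = b (mod p), trying each i in turn."""
    x = 1
    for i in range(1, p):
        x = x * a % p
        if x == b:
            return i
    return None


p, a = 2039, 7        # toy prime; 7 is a primitive root modulo 2039
i = 1234
b = fast_pow(a, i, p)                 # easy direction
print(b, brute_force_log(a, b, p))    # hard direction: up to p - 1 steps
```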

For privacy protection of the model, the model owner Server cannot directly disclose the original model parameters to all users; instead, it discloses parameters derived from them by the above computation. Owing to the hardness of the discrete logarithm problem, a malicious adversary who obtains the public model-related parameters $G_t$, t, h, w′ and b′ cannot feasibly solve for the model parameters $w = (w_1, w_2, \ldots, w_n)$ and b. When users need to verify the correctness of the final result, they can do so using only the disclosed model-related parameters, without knowing the model parameters themselves. This keeps the disclosed model information secure while providing a way to verify the correctness of the prediction result: the invention relies on the hardness of the discrete logarithm problem to protect the privacy of the model information disclosed by the model owner.

S2, data processing, wherein a user encrypts the data by using a Paillier encryption algorithm to obtain a ciphertext, and the ciphertext is uploaded to a cloud server where the linear regression model is located.

Firstly, a user encrypts data by using a Paillier encryption algorithm to obtain a ciphertext c, and the process is as follows:

Let the user's initial data be the vector $x = (x_1, x_2, \ldots, x_n)$. The Setup(k) algorithm is run as follows:

input the security parameter k, select two large primes p and q, compute the natural number $N = p \cdot q$, and randomly select a group element $g \in \mathbb{Z}_{N^2}^*$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where gcd(·,·) denotes the greatest common divisor of two numbers and $L(u) = \frac{u-1}{N}$. The public key is $pk = (g, N)$ and the private key is $sk = \lambda = \mathrm{lcm}(p-1, q-1)$, where lcm(·,·) denotes the least common multiple of two numbers.

The Enc(pk, m) algorithm is run: input a plaintext $m \in \mathbb{Z}_N$ (where $\mathbb{Z}_N = \{0, 1, \ldots, N-1\}$), select a random number $r \in \mathbb{Z}_N^*$, and compute the ciphertext c:

$$c = \mathrm{Enc}(pk, m) = g^{m} r^{N} \bmod N^2$$

where mod is the modulo operation and the ciphertext c is a group element of $\mathbb{Z}_{N^2}^*$.

In the subsequent verification process, the Dec(sk, c) algorithm is run: input the private key sk and a ciphertext c, and compute the plaintext m as

$$m = \mathrm{Dec}(sk, c) = \frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N$$

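Specifically, taking $x_i$ as an example, the Enc(pk, $x_i$) algorithm is run: input the plaintext $x_i \in \mathbb{Z}_N$, select a random number $r_i \in \mathbb{Z}_N^*$, and compute the ciphertext $c_i$:

$$c_i = \mathrm{Enc}(pk, x_i) = g^{x_i} r_i^{N} \bmod N^2$$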
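Encrypting the user's vector component by component can be sketched as below. To keep the sketch self-contained it hard-codes a toy key and, as an assumption, uses the common choice g = N + 1, which satisfies the gcd condition of Setup; `enc` is an illustrative name and the data values are made up.

```python
import math
import random

P1, Q1 = 1009, 1013          # toy primes (insecure sizes)
N = P1 * Q1
N2 = N * N
G = N + 1                    # assumed generator choice g = N + 1


def enc(m):
    """Enc(pk, m) = g^m * r^N mod N^2 with a fresh random r in Z_N^*."""
    assert 0 <= m < N, "plaintext must lie in Z_N"
    while True:
        r = random.randrange(1, N)   # toy randomness; use `secrets` in practice
        if math.gcd(r, N) == 1:
            break
    return pow(G, m, N2) * pow(r, N, N2) % N2


x = [12, 40, 7]                  # hypothetical initial data of the user
c = [enc(xi) for xi in x]        # c = (Enc_pk(x_1), ..., Enc_pk(x_n))
print(c)                         # randomized: differs on every run
```

Because r is fresh for every component, equal plaintexts produce different ciphertexts, so the server cannot tell which features coincide.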

For notational convenience, denote the vector obtained after the user encrypts the data as:

$$c = (c_1, c_2, \ldots, c_n) = (\mathrm{Enc}_{pk}(x_1), \mathrm{Enc}_{pk}(x_2), \ldots, \mathrm{Enc}_{pk}(x_n))$$

The user then uploads the encrypted ciphertext c to the cloud server where the linear regression model is located.

The Paillier encryption algorithm is additively homomorphic, specifically:

multiplying ciphertexts corresponds to adding the corresponding plaintexts $m_1$ and $m_2$:

$$\mathrm{Enc}(m_1) \cdot \mathrm{Enc}(m_2) \bmod N^2 = \mathrm{Enc}(m_1 + m_2)$$

raising a ciphertext to a constant power corresponds to multiplying the plaintext m by that constant:

$$\mathrm{Enc}(m)^{a} \bmod N^2 = \mathrm{Enc}(a \cdot m)$$

where a is a constant.
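Both homomorphic properties can be checked numerically with a small sketch. It assumes a toy key with g = N + 1, for which $L(g^{\lambda} \bmod N^2) = \lambda \bmod N$, so decryption reduces to multiplying by $\lambda^{-1} \bmod N$; `enc`, `dec` and `MU` are illustrative names.

```python
import math
import random

P1, Q1 = 1009, 1013                      # toy primes (insecure sizes)
N, N2 = P1 * Q1, (P1 * Q1) ** 2
G = N + 1                                # assumed choice g = N + 1
LAM = (P1 - 1) * (Q1 - 1) // math.gcd(P1 - 1, Q1 - 1)
MU = pow(LAM, -1, N)                     # 1 / L(g^lambda mod N^2) when g = N + 1


def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(G, m, N2) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


m1, m2, a = 25, 17, 6
sum_ct = enc(m1) * enc(m2) % N2          # ciphertext product -> plaintext sum
scaled_ct = pow(enc(m1), a, N2)          # ciphertext power   -> plaintext product
print(dec(sum_ct), dec(scaled_ct))       # 42 150
```

These two operations are exactly what the cloud server needs to evaluate $w^T x + b$ without the private key.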

For privacy protection of its data, the data owner Client cannot send its sample data directly to the model owner Server, lest the sample feature values be revealed. Instead, it encrypts the data with the Paillier encryption algorithm and sends the ciphertext to the model owner Server; since Server does not hold the private key sk, it cannot compute the plaintext corresponding to the ciphertext. This protects the privacy of the user Client's data.

S3, data calculation: the user requests the cloud server to compute on the ciphertext; the cloud server computes a prediction on the ciphertext data uploaded by the user with the linear regression model and returns the computed prediction result to the user.

The linear regression model deployed on the cloud server operates on the ciphertext of the data uploaded by the user and returns the computation result to the user, thereby performing result prediction for the user.

After receiving the user's ciphertext data c, the cloud server computes on c and returns the result to the user. Let the model parameters already trained by the model owner be the vector w and the real number b; the computation on the ciphertext data c is:

$$\mathrm{Enc}_{pk}(b) \cdot \prod_{i=1}^{n} c_i^{w_i} \bmod N^2 = \mathrm{Enc}_{pk}(w^T x + b)$$

where $w = (w_1, w_2, \ldots, w_n)$ is the model parameter vector, $x = (x_1, x_2, \ldots, x_n)$ is the user's initial data, and b is the bias of the model.
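The server-side computation can be sketched as follows. It assumes the server forms $\mathrm{Enc}_{pk}(b) \cdot \prod_i c_i^{w_i} \bmod N^2$, that the weights and bias are non-negative integers (the source calls b a real number but does not specify an encoding), and that $w^T x + b$ stays below N so decryption returns it exactly. The function `server_predict` is a hypothetical name; it uses only the public key and the plaintext model parameters, never the private key.

```python
import math
import random

P1, Q1 = 1009, 1013                      # toy primes (insecure sizes)
N, N2 = P1 * Q1, (P1 * Q1) ** 2
G = N + 1                                # assumed choice g = N + 1
LAM = (P1 - 1) * (Q1 - 1) // math.gcd(P1 - 1, Q1 - 1)


def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(G, m, N2) * pow(r, N, N2) % N2


def server_predict(c, w, b):
    """Compute Enc(b) * prod_i c_i^{w_i} mod N^2, an encryption of w^T x + b."""
    result = enc(b)                              # public-key encryption of the bias
    for ci, wi in zip(c, w):
        result = result * pow(ci, wi, N2) % N2   # adds w_i * x_i under encryption
    return result


def dec(c):                                      # run by the user, who holds LAM
    return (pow(c, LAM, N2) - 1) // N * pow(LAM, -1, N) % N


x, w, b = [12, 40, 7], [3, 5, 2], 7              # hypothetical data and model
c = [enc(xi) for xi in x]                        # uploaded by the user
print(dec(server_predict(c, w, b)))              # 3*12 + 5*40 + 2*7 + 7 = 257
```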

A homomorphic encryption algorithm is an encryption function for which encrypting the result of an operation on plaintexts is equivalent to performing the corresponding operation on the ciphertexts. Since the user data received by the model owner Server is in ciphertext form, Server cannot apply addition and multiplication to it directly. The homomorphism of the Paillier encryption function lets Server operate on the ciphertext, and the result is equivalent to performing the specified computation on the plaintext and then encrypting it. Because Server has no private key sk, it cannot recover the plaintext of the final result either, so the privacy of the user's computation result is ensured.

S4, result verification: the computed prediction result is decrypted to obtain its plaintext, and the correctness of that plaintext is verified using the verifiable parameters of the linear regression model disclosed in advance by the model owner.

The user verifying the correctness of the prediction result comprises the following steps:

S41, after obtaining the prediction result in ciphertext form returned by the cloud server, denoted $c_y$, the user decrypts it to obtain the plaintext prediction result y. Specifically, the Dec(sk, $c_y$) algorithm is run with the private key sk and the ciphertext $c_y$, and y is computed as:

$$y = \mathrm{Dec}(sk, c_y) = \frac{L(c_y^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N$$

S42, from the plaintext prediction result y and the verifiable parameters $G_t$, t, h, w′ and b′ related to the linear regression model disclosed by the model owner, the first result $y_1$ is computed:

$$y_1 = h^{y}$$

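S43, the second result $y_2$ is computed from the disclosed parameters w′ and b′, which encode the original model parameters w and b of the linear regression model, and from the user's initial data:

$$y_2 = b' \cdot \prod_{i=1}^{n} (w'_i)^{x_i} = h^{b} \cdot \prod_{i=1}^{n} h^{w_i x_i} = h^{w^T x + b}$$

where $w = (w_1, w_2, \ldots, w_n)$, the vector $x = (x_1, x_2, \ldots, x_n)$ is the user's initial data, and b is the bias of the model.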

S44, check whether the first result $y_1$ equals the second result $y_2$. If $y_1 = y_2$, the prediction result returned by the cloud server is correct; otherwise, it is incorrect.
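Steps S42 to S44 reduce to comparing two group elements, which the following sketch illustrates. It assumes a hypothetical toy group (the order-t subgroup modulo the safe prime P = 2t + 1 with generator h = 4), published parameters of the form $w'_i = h^{w_i}$ and $b' = h^b$, and a y equal to the integer $w^T x + b$ recovered exactly by decryption; `verify` is an illustrative name. With such a small t, a wrong y that is congruent to the correct one modulo t would also pass, so real parameters must be large.

```python
T = 1019                 # toy prime order t of G_t (insecure size)
P = 2 * T + 1            # safe prime 2039; G_t = squares modulo P
H = 4                    # generator h of G_t

# Parameters published by the model owner (w'_i = h^{w_i}, b' = h^b).
w_prime = [pow(H, wi, P) for wi in [3, 5, 2]]
b_prime = pow(H, 7, P)


def verify(y, x, w_prime, b_prime):
    """Return True iff h^y == b' * prod_i (w'_i)^{x_i} in G_t."""
    y1 = pow(H, y, P)                        # S42: first result
    y2 = b_prime
    for wpi, xi in zip(w_prime, x):          # S43: second result
        y2 = y2 * pow(wpi, xi, P) % P
    return y1 == y2                          # S44: accept or reject


x = [12, 40, 7]                              # user's initial data
print(verify(257, x, w_prime, b_prime))      # correct prediction -> True
print(verify(258, x, w_prime, b_prime))      # tampered prediction -> False
```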

By verifying the correctness of the result, the user Client can detect erroneous results caused by the model owner maliciously deploying wrong model parameters on the cloud server, or by other faults. The user Client may then discard the computation result and either have the cloud server recompute it or terminate cooperation with the model owner.
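An end-to-end sketch under the same toy assumptions (a small Paillier key with g = N + 1, an order-t subgroup modulo a safe prime for the public parameters, integer model parameters) shows how the check catches a server that deploys wrong model parameters. All names and values are hypothetical and the sizes are insecure.

```python
import math
import random

# Toy Paillier key of the user (insecure sizes; g = N + 1 assumed).
N = 1009 * 1013
N2 = N * N
LAM = 1008 * 1012 // math.gcd(1008, 1012)

# Toy group G_t used by the model owner: order T, modulus P, generator H.
T, P, H = 1019, 2039, 4


def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * pow(LAM, -1, N) % N


def server_predict(c, w, b):
    """Homomorphic evaluation of w^T x + b with whatever weights the server holds."""
    result = enc(b)
    for ci, wi in zip(c, w):
        result = result * pow(ci, wi, N2) % N2
    return result


def verify(y, x, w_prime, b_prime):
    y2 = b_prime
    for wpi, xi in zip(w_prime, x):
        y2 = y2 * pow(wpi, xi, P) % P
    return pow(H, y, P) == y2


# The model owner publishes w' and b' for the true model.
w_true, b_true = [3, 5, 2], 7
w_prime = [pow(H, wi, P) for wi in w_true]
b_prime = pow(H, b_true, P)

x = [12, 40, 7]
c = [enc(xi) for xi in x]

honest = dec(server_predict(c, w_true, b_true))
cheating = dec(server_predict(c, [3, 4, 2], b_true))   # wrong weight deployed
print(honest, verify(honest, x, w_prime, b_prime))       # 257 True
print(cheating, verify(cheating, x, w_prime, b_prime))   # 217 False
```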

Example 2:

This embodiment provides a verifiable privacy protection linear regression system comprising a verifiable parameter disclosure module, a data processing module, a data calculation module and a result verification module, whose specific functions are as follows:

The data calculation module is used for the user to request the cloud server to compute on the ciphertext; the cloud server computes a prediction on the user's uploaded ciphertext with the linear regression model and returns the computed prediction result to the user;

the result verification module is used for decrypting the computed prediction result to obtain its plaintext and verifying the correctness of that plaintext using the verifiable parameters related to the linear regression model disclosed in advance by the model owner.

The model owner discloses verifiable parameters related to a linear regression model, including:

a large prime t, a group $G_t$ of order t and a generator $h \in G_t$ of the group are randomly selected. Let the model parameters already trained by the model owner be the vector w and the real number b, where $w = (w_1, w_2, \ldots, w_n)$ is the trained parameter vector and b is the bias (offset) of the model. A parameter vector w′ and a group element b′ related to w and b are computed:

$$w' = (h^{w_1}, h^{w_2}, \ldots, h^{w_n})$$

$$b' = h^{b}$$

The user performs Paillier encryption on the data to obtain ciphertext, which comprises the following steps:

the user's initial data is preset as the vector $x = (x_1, x_2, \ldots, x_n)$. The Setup(k) algorithm is run: input the security parameter k, select two large primes p and q, compute the natural number $N = p \cdot q$, and randomly select a group element $g \in \mathbb{Z}_{N^2}^*$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where gcd(·,·) denotes the greatest common divisor of two numbers and $L(u) = \frac{u-1}{N}$. The public key is $pk = (g, N)$ and the private key is $sk = \lambda = \mathrm{lcm}(p-1, q-1)$, where lcm(·,·) denotes the least common multiple of two numbers;

the Enc(pk, m) algorithm is run: input a plaintext $m \in \mathbb{Z}_N$, where $\mathbb{Z}_N = \{0, 1, \ldots, N-1\}$, select a random number $r \in \mathbb{Z}_N^*$, and compute the ciphertext c:

$$c = \mathrm{Enc}(pk, m) = g^{m} r^{N} \bmod N^2$$

where mod is the modulo operation and the ciphertext c is a group element of $\mathbb{Z}_{N^2}^*$.

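The cloud server computes on the ciphertext requested by the user as follows:

$$\mathrm{Enc}_{pk}(b) \cdot \prod_{i=1}^{n} c_i^{w_i} \bmod N^2 = \mathrm{Enc}_{pk}(w^T x + b)$$

where $w = (w_1, w_2, \ldots, w_n)$, the vector $x = (x_1, x_2, \ldots, x_n)$ is the user's initial data, and $c_i = \mathrm{Enc}_{pk}(x_i)$.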

The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and is included within the protection scope of the present invention.

Claims

1. A verifiable privacy preserving linear regression method, the method comprising the steps of:

data calculation, wherein a user requests a cloud server to calculate ciphertext, the cloud server calculates and predicts the ciphertext uploaded by the user through a linear regression model, and a calculation prediction result is returned to the user;

result verification: decrypting the computed prediction result to obtain its plaintext, and verifying the correctness of that plaintext using verifiable parameters related to the linear regression model disclosed in advance by the model owner;

$b' = h^{b}$

2. The verifiable privacy preserving linear regression method of claim 1, wherein the user performs Paillier encryption on the data to obtain ciphertext, comprising:

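presetting the user's initial data as the vector $x = (x_1, x_2, \ldots, x_n)$; running the Setup(k) algorithm: inputting the security parameter k, selecting two large primes p and q, computing the natural number $N = p \cdot q$, and randomly selecting a group element $g \in \mathbb{Z}_{N^2}^*$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where gcd(·,·) denotes the greatest common divisor of two numbers and $L(u) = \frac{u-1}{N}$; the public key is $pk = (g, N)$ and the private key is $sk = \lambda = \mathrm{lcm}(p-1, q-1)$, where lcm(·,·) denotes the least common multiple of two numbers;

running the Enc algorithm: inputting a plaintext $m \in \mathbb{Z}_N$, selecting a random number $r \in \mathbb{Z}_N^*$, and computing the ciphertext c:

$$c = \mathrm{Enc}_{pk}(m) = g^{m} r^{N} \bmod N^2$$

where mod is the modulo operation and the ciphertext c is a group element of $\mathbb{Z}_{N^2}^*$.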

3. The verifiable privacy preserving linear regression method of claim 2, wherein the calculation formula for the user to request the cloud server to calculate the ciphertext is:

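$$\mathrm{Enc}_{pk}(b) \cdot \prod_{i=1}^{n} c_i^{w_i} \bmod N^2 = \mathrm{Enc}_{pk}(w^T x + b)$$

wherein $w = (w_1, w_2, \ldots, w_n)$, the vector $x = (x_1, x_2, \ldots, x_n)$ is the user's initial data, and $c_i = \mathrm{Enc}_{pk}(x_i)$.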

4. The verifiable privacy preserving linear regression method of claim 2, wherein decrypting the computed prediction results to obtain plaintext of the computed prediction results, verifying the plaintext correctness of the computed prediction results using verifiable parameters associated with a linear regression model previously disclosed by the model owner, comprises:

after the user takes the prediction result in the ciphertext form returned by the cloud server, decrypting to obtain a prediction result plaintext y;

calculating a first result $y_1$ from the prediction result plaintext y and the verifiable parameters related to the linear regression model;

calculating a second result $y_2$ based on the original model parameters of the linear regression model and the user's initial data;

verifying whether the first result $y_1$ equals the second result $y_2$: if $y_1 = y_2$, the prediction result returned by the cloud server is correct; otherwise, the prediction result returned by the cloud server is wrong.

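5. The verifiable privacy preserving linear regression method of claim 4, wherein the prediction result plaintext y is calculated as:

$$y = \frac{L(c_y^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N$$

where $c_y$ is the prediction result in ciphertext form returned by the cloud server;

the first result $y_1$ is calculated as:

$$y_1 = h^{y}$$

the second result $y_2$ is calculated as:

$$y_2 = b' \cdot \prod_{i=1}^{n} (w'_i)^{x_i} = h^{w^T x + b}$$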

6. A verifiable privacy preserving linear regression system, the system comprising:

the result verification module is used for decrypting the calculation prediction result to obtain a plaintext of the calculation prediction result, and verifying the plaintext correctness of the calculation prediction result by using verifiable parameters related to a linear regression model which is disclosed by a model owner in advance;

$b' = h^{b}$

7. The verifiable privacy preserving linear regression system of claim 6 wherein the calculation formula for the user requesting the cloud server to calculate the ciphertext is: