CN111275202A - Machine learning prediction method and system for data privacy protection - Google Patents

Machine learning prediction method and system for data privacy protection

Info

Publication number
CN111275202A
CN111275202A (application CN202010105981.1A)
Authority
CN
China
Prior art keywords
data
model
predicted
main server
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010105981.1A
Other languages
Chinese (zh)
Other versions
CN111275202B (en
Inventor
赵川
赵埼
荆山
张波
陈贞翔
王吉伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan
Priority to CN202010105981.1A
Publication of CN111275202A
Application granted
Publication of CN111275202B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/606 Protecting data by securing the transmission between two devices or processes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6209 Protecting access to data via a platform, e.g. using keys or access control rules, to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Storage Device Security (AREA)

Abstract

The disclosure provides a machine learning prediction method and system for data privacy protection. The method comprises the following steps: acquiring encrypted data; the main server creates a trusted zone and decrypts the acquired data to be predicted and the prediction model inside it; the main server secret-shares the decrypted data to be predicted and the prediction model, obtaining data shares and model shares, and distributes the shares between itself and a non-colluding auxiliary server; the auxiliary server and the main server each perform the prediction computation to obtain prediction result shares; the main server reconstructs the secret from all prediction result shares, forwards the reconstructed result to the trusted zone for integration and encryption, and sends it to the data providing terminal, which decrypts it to obtain the prediction produced by the model. By combining secure multiparty computation with the SGX technology, the privacy of both parties is protected and the security problem in the process of providing prediction services is solved.

Description

Machine learning prediction method and system for data privacy protection
Technical Field
The disclosure relates to the technical field of machine learning, in particular to a machine learning prediction method and system for data privacy protection.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, artificial intelligence techniques such as machine learning have been widely applied in fields such as image recognition and text processing. Training a model requires large amounts of data, substantial computational resources, and specialized expertise, which are clearly out of reach for most individuals and businesses. To address this, major companies have begun to offer machine learning as a service: a user can obtain a prediction by simply uploading data and selecting a suitable model, without mastering complex machine learning algorithms. Amazon's machine learning service platform, for example, helps generate billions of real-time predictions each day. The inventors have found that while prediction services provide convenience to users, they also threaten personal privacy. On one hand, the data supplied by the user carries a risk of information leakage: when sensitive personal information such as medical and pathological data is submitted for prediction, the service platform can directly access it; the information is uploaded and stored on the server, and if it is maliciously collected or the server is attacked from outside, personal data can leak. On the other hand, the data used by the model provider to train the prediction model is also at risk: in recent years more and more attacks on machine learning have been proposed, such as the model inversion attack and the membership inference attack, with which an adversary can infer attributes of the original sensitive data through the model alone, without ever touching the original data. If the model is trained on private data, an adversary can pose as an honest user and attack through malicious queries, which poses a hidden danger to the use of machine learning as a service.
In summary, providing machine learning prediction services over private data raises a two-way privacy leakage problem: data uploaded by users may be stolen by the service provider, and the prediction model supplied by the institution may be attacked by malicious users. Realizing a secure and reliable prediction service therefore has important value in practical applications.
Disclosure of Invention
To solve these problems, the disclosure provides a machine learning prediction method and system for data privacy protection, which combine secure multiparty computation with the SGX technology to protect the privacy of both parties and solve the security problem in the process of providing prediction services.
To achieve this purpose, the disclosure adopts the following technical scheme:
One or more embodiments provide a machine learning prediction method for data privacy protection, comprising the following steps:
Acquiring data: the main server obtains the encrypted data to be predicted and the encrypted prediction model;
the main server creates a trusted zone and, inside it, decrypts the acquired data to be predicted and prediction model; the main server secret-shares the decrypted data to be predicted and prediction model, obtaining data shares and model shares, and distributes them between the non-colluding auxiliary server and itself;
the auxiliary server and the main server each perform the prediction computation on the data shares and model shares they hold, obtaining prediction result shares; the auxiliary server encrypts its prediction result shares and sends them to the main server;
the main server obtains the encrypted prediction result shares sent by the auxiliary server, reconstructs the secret from all prediction result shares, forwards the reconstructed result to the trusted zone for integration and encryption, and sends it to the data providing terminal, which decrypts it to obtain the prediction produced by the model.
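By way of illustration, the secret sharing in the steps above can be sketched with a simple additive scheme over a prime field. The modulus, the two-party setting, and the use of Python are assumptions for concreteness; the patent does not fix a particular sharing scheme or implementation:

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (illustrative choice)

def share(value, n_parties=2):
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about value."""
    return sum(shares) % PRIME

# the main server splits a model parameter and a data value between itself
# and the non-colluding auxiliary server, keeping one share of each
model_shares = share(12345)
data_shares = share(678)
assert reconstruct(model_shares) == 12345
assert reconstruct(data_shares) == 678
```

Because the shares are uniformly random on their own, neither server alone learns anything about the model parameters or the data to be predicted, matching the non-collusion assumption of the scheme.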
One or more embodiments provide a machine learning prediction method for data privacy protection, comprising the following steps:
Acquiring data: the main server obtains the encrypted data to be predicted and the encrypted prediction model;
the main server creates a trusted zone and, inside it, decrypts the acquired data to be predicted and prediction model; the main server secret-shares the decrypted data to be predicted and prediction model, obtaining data shares and model shares, and distributes them between the non-colluding auxiliary server and itself;
the main server obtains the encrypted prediction result shares sent by the auxiliary server, reconstructs the secret from all prediction result shares, forwards the reconstructed result to the trusted zone for integration and encryption, and sends it to the data providing terminal.
One or more embodiments provide a machine learning prediction method for data privacy protection, comprising the following steps:
the auxiliary server obtains its data shares and model shares;
the auxiliary server uses its local private key sk_s to decrypt and obtain the key K_e of the main server's Enclave, and with K_e decrypts its shares to obtain the prediction model parameters and the data to be predicted that they carry;
the auxiliary server performs the prediction on the data shares and model shares, computing the nonlinear activation function by approximating the activation function with a Chebyshev polynomial, and obtains its prediction result shares;
the prediction result shares are encrypted with a homomorphic encryption algorithm: each auxiliary server encrypts its prediction result shares with the homomorphic public key pk_ep distributed by the Enclave and sends them to the main server.
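The Chebyshev approximation of the activation function mentioned above can be sketched as follows. The degree, the interval [-1, 1], and the choice of the sigmoid as the activation function are illustrative assumptions (the patent figures mention two activation functions without naming them here); the coefficients are obtained by Gauss-Chebyshev quadrature and evaluated with the Clenshaw recurrence:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chebyshev_coeffs(f, degree, n_nodes=64):
    """Chebyshev coefficients of f on [-1, 1] via Gauss-Chebyshev quadrature."""
    thetas = [math.pi * (j + 0.5) / n_nodes for j in range(n_nodes)]
    coeffs = []
    for k in range(degree + 1):
        c = 2.0 / n_nodes * sum(f(math.cos(t)) * math.cos(k * t) for t in thetas)
        coeffs.append(c / 2.0 if k == 0 else c)  # halve c_0 per the usual convention
    return coeffs

def chebyshev_eval(coeffs, x):
    """Evaluate sum_k c_k T_k(x) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]

# degree-7 polynomial surrogate for sigmoid on [-1, 1]: the servers can then
# evaluate the activation using only additions and multiplications on shares
coeffs = chebyshev_coeffs(sigmoid, 7)
worst = max(abs(chebyshev_eval(coeffs, t / 50.0) - sigmoid(t / 50.0))
            for t in range(-50, 51))
assert worst < 1e-4
```

Replacing the nonlinear activation by a low-degree polynomial is what makes the secret-shared evaluation possible, since secure multiparty protocols handle additions and multiplications natively but not transcendental functions.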
One or more embodiments provide a machine learning prediction system for data privacy protection, comprising a model providing terminal, a data providing terminal, and a non-colluding auxiliary server and main server;
the model providing terminal: provides the trained machine learning model;
the data providing terminal: provides the data to be predicted to the model;
the main server: executes the main-server-side machine learning prediction method for data privacy protection described above;
the auxiliary server: executes the auxiliary-server-side machine learning prediction method for data privacy protection described above.
Compared with the prior art, the beneficial effects of the disclosure are:
(1) The disclosed machine learning prediction method provides reliable bidirectional security: the user's private data and the prediction result cannot be stolen by the model provider or the servers, and the details of the model uploaded by the prediction service institution are not leaked to the main server or the user. Throughout the computation, the private data of the user (the data providing terminal) and the prediction model of the model provider are uploaded in encrypted form; only the trusted Enclave can operate on the data in plaintext, and the processed data is stored on the untrusted servers only as shared values, preventing theft by the main server.
The security of the prediction result is ensured by homomorphic encryption, preventing privacy leakage during result reconstruction. In a conventional cloud environment the key is usually stored in plaintext on an untrusted node, making application security hard to guarantee; in this disclosure the key is stored in the trusted Enclave, preventing leakage to internal administrators or privileged software.
(2) The technical scheme of the disclosure reduces user overhead: secret sharing is moved to the server side, a trusted Enclave is established through the SGX technology to operate on the data to be protected, and a large amount of computational overhead is shifted to the cloud server.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.
FIG. 1 is a diagram of the overall architecture of the system of embodiment 4 of the present disclosure;
FIG. 2 is a flow chart of the method of embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of shared-value addition in embodiment 1 of the present disclosure;
FIG. 4 is a schematic diagram of shared-value multiplication in embodiment 1 of the present disclosure;
FIG. 5 is a schematic diagram of approximating a first activation function of a neural network with a Chebyshev polynomial in embodiment 1 of the present disclosure;
FIG. 6 is a schematic diagram of approximating a second activation function of a neural network with a Chebyshev polynomial in embodiment 1 of the present disclosure;
FIG. 7 is a flowchart of remote attestation in embodiment 1 of the present disclosure;
FIG. 8 is a flowchart of homomorphic encryption between the main server and the auxiliary server in embodiment 1 of the present disclosure;
FIG. 9 is a flowchart of bidirectional encryption between the main server and the user or the model providing terminal in embodiment 1 of the present disclosure.
Detailed Description
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof. It should be noted that, where no conflict arises, the embodiments and the features within the embodiments of the present disclosure may be combined with each other. The embodiments will be described in detail below with reference to the accompanying drawings.
SGX (Intel Software Guard Extensions) is a processor technology developed by Intel that provides a trusted space on a computing platform to ensure the confidentiality and integrity of key user code and data. Data needing protection can be securely encapsulated in an environment called an Enclave, which resists attacks from external malicious software or privileged software (such as the operating system).
Non-colluding servers: cloud servers that are mutually independent, meaning that the two cloud servers cannot conspire with each other.
In the formulas of this embodiment, the encryption algorithm is denoted Enc(·) and the decryption algorithm is denoted Dec(·).
Homomorphic encryption: a special form of encryption that allows computation on ciphertexts such that the result is still an encryption. Homomorphic encryption divides into fully homomorphic and semi-homomorphic encryption: a fully homomorphic scheme satisfies both the additive and the multiplicative homomorphic property and supports any number of additions and multiplications, whereas a semi-homomorphic scheme satisfies only the additive or only the multiplicative property.
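Since the disclosure later uses the Paillier scheme (an additively semi-homomorphic cipher) between the Enclave and the servers, a minimal textbook Paillier sketch may help. The toy prime sizes and the common g = n + 1 simplification are illustrative assumptions, not the patent's parameters:

```python
import math
import secrets

def paillier_keygen(p, q):
    """Paillier key generation from primes p, q, using the common g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid precisely because g = n + 1
    return (n, n + 1), (lam, mu, n)

def paillier_encrypt(pk, m):
    n, g = pk
    while True:  # pick a random r coprime to n
        r = secrets.randbelow(n)
        if r > 0 and math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def paillier_decrypt(sk, c):
    lam, mu, n = sk
    L = (pow(c, lam, n * n) - 1) // n  # the L function L(u) = (u - 1) / n
    return (L * mu) % n

# toy primes (a real deployment would use primes of 1024 bits or more)
pk, sk = paillier_keygen(104729, 1299709)
c1, c2 = paillier_encrypt(pk, 15), paillier_encrypt(pk, 27)
# additive homomorphism: multiplying ciphertexts adds the plaintexts
assert paillier_decrypt(sk, (c1 * c2) % (pk[0] ** 2)) == 15 + 27
```

The additive property is exactly what the main server needs later: it can aggregate the encrypted prediction result shares without ever seeing them in plaintext.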
In the technical solutions disclosed in one or more embodiments, as shown in FIGS. 1 and 2, a machine learning prediction method for data privacy protection feeds the user's data to be predicted into the model supplied by the model providing terminal and returns the prediction result directly to the user. In this process the user cannot obtain the model of the model providing terminal, and the model providing terminal cannot obtain the user's data, so both the data to be predicted and the model's training data are protected. The method comprises the following steps:
Step 1, acquiring data: the main server obtains the encrypted data to be predicted and the encrypted prediction model;
Step 2, the main server creates a trusted zone and, inside it, decrypts the acquired data to be predicted and prediction model; the main server secret-shares the decrypted data to be predicted and prediction model, obtaining data shares and model shares, and distributes them between the non-colluding auxiliary server and itself;
Step 3, the auxiliary server and the main server each perform the prediction computation on the data shares and model shares they hold, obtaining prediction result shares; the auxiliary server encrypts its prediction result shares and sends them to the main server;
Step 4, the main server obtains the encrypted prediction result shares sent by the auxiliary server, reconstructs the secret from all prediction result shares, forwards the reconstructed result to the trusted zone for integration and encryption, and sends it to the data providing terminal, which decrypts it to obtain the prediction produced by the model.
In the above steps the main server acts as a relay, distributing the received data shares and model shares to the auxiliary server while retaining one share itself; the main server and the auxiliary server are two non-colluding servers. Each server feeds its data share into its model share to perform the prediction computation and obtain a prediction result share; the main server integrates the prediction result shares and sends the result to the provider of the data to be predicted, yielding the prediction for that data.
This embodiment achieves bidirectional protection of the prediction data and the prediction model through the two servers, the main server and the auxiliary server, and solves the problems without increasing the computational overhead. The data to be predicted is never given to the model provider; it is forwarded and processed by the servers, and what the auxiliary server or the main server holds is only one share of the data rather than the complete data, so even a leak at the auxiliary server cannot expose the complete data. This improves the confidentiality of the data to be predicted, while encryption during transmission improves the security of data transfer.
The number of servers here is only an example; the number of participants in the multiparty computation can be set according to the specific situation, and using more than two participants increases the system's cost and computation accordingly.
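FIGS. 3 and 4 refer to addition and multiplication over shared values. Addition of additive shares is purely local, while multiplication between two shared values commonly uses a Beaver triple; the triple-based protocol below is an assumption for illustration, since this excerpt does not name the multiplication protocol, and the central "opening" of the masked values stands in for the broadcast each party would perform:

```python
import secrets

P = 2**61 - 1  # prime field modulus (illustrative)

def share(v, n=2):
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((v - sum(s)) % P)
    return s

def reconstruct(s):
    return sum(s) % P

def beaver_multiply(x_sh, y_sh):
    """Multiply two additively shared values with a Beaver triple.
    A trusted dealer (the Enclave could play this role) samples a, b and
    c = a*b and shares them; parties only ever open masked values."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)
    # each party would broadcast its share of x - a and y - b;
    # here the masked values are opened centrally for brevity
    d = (reconstruct(x_sh) - a) % P
    e = (reconstruct(y_sh) - b) % P
    # local share of x*y: c + d*b + e*a, plus the public term d*e added once
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % P for i in range(2)]
    z_sh[0] = (z_sh[0] + d * e) % P
    return z_sh

assert reconstruct(beaver_multiply(share(7), share(6))) == 42
```

Opening d = x - a and e = y - b reveals nothing about x and y because a and b are uniformly random masks, which is why the two non-colluding servers can run the polynomial prediction circuit share by share.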
The following is a detailed description:
In step 1, the model providing terminal supplies the machine learning model, and the data providing terminal (the user) supplies the data to be predicted; an appropriate model is selected according to the data submitted by the user and the prediction requirement and made available for the user's prediction. The models may be uploaded to the server and stored in advance. The model and the data to be predicted need not be provided at the same time, and in no fixed order: the model can be prepared and transmitted to the main server beforehand, and the data to be predicted received whenever the user side needs a prediction.
As a further improvement, in step 2 the main server may dynamically apply to build the trusted zone Enclave in the Intel SGX trusted mode. The trusted execution environment Enclave created by the main server is a protected content container used to store the sensitive data and code of the computation and protect them from access and attack by external malicious software. The user, the model provider, and the auxiliary server need to perform remote attestation with the Enclave to ensure that the main server is really running a protected Enclave.
SGX: Intel SGX is an extension of the Intel architecture, adding a new instruction set and memory access mechanism to the original architecture. These extensions allow an application to implement a container called an Enclave, which partitions a protected area within the application's address space and protects the code and data inside from malicious software with special privileges. SGX does not identify and isolate all malware on the platform; instead it encapsulates the security-critical operations of legitimate software in the trusted zone Enclave to protect them from malware.
The keys used for encryption by the main server, the data providing terminal, the model providing terminal, and the auxiliary server during the above steps are shared in advance.
Further, to ensure that the server is really running the necessary components containing the Enclave, before step 1 the data providing terminal and the model providing terminal perform remote attestation with the server, and keys are shared between the main server's trusted zone Enclave and the data providing terminal, the model providing terminal, and the auxiliary server.
The remote attestation process, shown in FIG. 7, comprises the following steps:
1) First a communication channel is established between the challenger and the platform application, and the challenger issues a challenge to the application.
2) The application sends the identity of the Quoting Enclave together with the challenge to the application's Enclave.
3) The Enclave generates a manifest that includes the response to the challenge and the ephemeral public key to be used by the challenger, computes a hash digest of the manifest, generates a REPORT bound to the manifest, and sends the REPORT to the application.
4) The application sends the REPORT to the Quoting Enclave for signing.
5) The Quoting Enclave first verifies the REPORT on the platform; if verification succeeds, it replaces the MAC in the REPORT with a signature made with its private key, producing a QUOTE, and returns the QUOTE to the application.
6) The application sends the QUOTE and the associated supporting data manifest to the challenger.
7) The challenger uses the EPID public key certificate and the attestation verification service to verify the signature on the QUOTE and the integrity of the manifest.
The data providing terminal and the model providing terminal perform remote attestation with the main server, and keys are shared between the main server's Enclave and the data providing terminal, the model providing terminal, and the auxiliary server: the main server runs the Enclave, and each terminal (the data providing terminal and the model providing terminal) acts as a challenger performing remote attestation with the main server, ensuring that the main server is really running a trusted Enclave.
Optionally, the main server's trusted zone Enclave exchanges data with the data providing terminal and the model providing terminal using hybrid encryption combining RSA and AES, while a Paillier homomorphic encryption algorithm is used between the main server's trusted zone Enclave and the auxiliary server to encrypt and decrypt the transmitted data.
RSA was the first relatively complete public key cryptosystem; its security rests on the difficulty of factoring large integers. The RSA cryptosystem is as follows:
1) choose two large primes p and q;
2) compute n = p * q and φ(n) = (p - 1)(q - 1);
3) randomly choose e with 1 < e < φ(n) and gcd(e, φ(n)) = 1;
4) compute d = e^(-1) mod φ(n);
the public key is pk = (n, e) and the private key is sk = (p, q, d);
5) encryption: c = m^e mod n;
6) decryption: m = c^d mod n.
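The textbook scheme above can be sketched directly. The primes are toy-sized and no padding is used, which is an illustrative simplification only (real deployments use moduli of 2048 bits or more with OAEP-style padding):

```python
import math

def rsa_keygen(p, q, e=65537):
    """Textbook RSA key generation from two primes (toy sizes, no padding)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert math.gcd(e, phi) == 1  # e must be invertible modulo phi(n)
    return (n, e), (n, pow(e, -1, phi))

def rsa_encrypt(pk, m):
    n, e = pk
    return pow(m, e, n)  # c = m^e mod n

def rsa_decrypt(sk, c):
    n, d = sk
    return pow(c, d, n)  # m = c^d mod n

pk, sk = rsa_keygen(104729, 1299709)
assert rsa_decrypt(sk, rsa_encrypt(pk, 42)) == 42
```

Note that encryption uses the public exponent e and decryption the private exponent d, matching steps 5) and 6) above.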
The Advanced Encryption Standard (AES) is the most common symmetric encryption algorithm: the same key is used for encryption and decryption. AES encryption involves four operations, namely byte substitution, row shifting, column mixing, and round key addition; decryption applies the corresponding inverse operations. Since every step is invertible, decrypting in reverse order recovers the plaintext.
The main server's trusted zone Enclave, the data providing terminal (the user), the model provider, and the auxiliary server each generate key material: the RSA key pairs (sk_e, pk_e), (sk_u, pk_u), (sk_s, pk_s), the Paillier key pair (sk_ep, pk_ep), and the AES keys K_e, K_u, K_MP. The user, the auxiliary server, and the Enclave share their respective RSA public keys, and the Enclave additionally sends its Paillier public key to the two servers.
(sk_e, pk_e), RSA: generated by the Enclave; the public key pk_e is sent to the model providing terminal and the data providing terminal, and the private key sk_e is kept locally for decrypting the AES keys K_u and K_MP coming from those terminals.
(sk_ep, pk_ep), Paillier: generated by the Enclave and sent to the auxiliary server and the main server for encrypting and decrypting the prediction result shares computed by the auxiliary server.
(sk_u, pk_u), RSA: generated by the user; the public key pk_u is sent to the main server's Enclave, and the private key sk_u is kept locally. It is used to encrypt and decrypt the prediction result reconstructed by the Enclave: the encrypted result is sent to the user, who decrypts it upon receipt.
(sk_s, pk_s), RSA: generated by the auxiliary server; the public key pk_s is sent to the main server's Enclave, and the private key sk_s is kept locally for encrypting and decrypting the AES key K_e generated by the Enclave.
K_u: generated by the data providing terminal to encrypt the data to be predicted that the terminal (the user) uploads.
K_MP: an AES key generated locally by the model providing terminal to encrypt the uploaded model.
K_e: generated by the Enclave to encrypt and decrypt the data that the main server sends to the auxiliary server and the user.
The prediction model in step 1 may be a prediction model that is trained locally in advance by the model providing terminal based on local data, and the prediction model may be established by using any machine learning method.
As shown in fig. 9, the method for transmitting data between the trusted zone Enclave of the main server and the model providing terminal by using a hybrid encryption method combining RSA encryption and AES encryption includes a method for implementing the encryption step in step 1 and the decryption step in step 2, and specifically includes:
Encryption step of the model providing terminal (the encryption of the prediction model in step 1): the model providing terminal locally generates its AES key k_MP and encrypts the prediction model parameters ω_I to obtain the encrypted model parameter ciphertext
C_{ω_I} = Enc_{k_MP}(ω_I), I = 1, 2,
where I is the participant's number.
The AES key k_MP of the model providing terminal is then encrypted with the RSA public key pk_e shared by the Enclave of the main server, i.e.
c_MP = Enc_{pk_e}(k_MP).
The encrypted prediction model parameters and the ciphertext of the AES key of the model providing terminal are sent to the main server as a mixed ciphertext, and the main server forwards the ciphertext to the Enclave;
Decryption of the prediction model by the trusted zone Enclave of the main server: after receiving the mixed ciphertext, the Enclave uses its local RSA private key sk_e to decrypt the ciphertext of the AES key and obtain the model providing terminal's AES key k_MP, i.e.
k_MP = Dec_{sk_e}(c_MP),
and then decrypts the encrypted model parameter ciphertext C_{ω_I} with k_MP to obtain the model parameters ω_I, i.e.
ω_I = Dec_{k_MP}(C_{ω_I}).
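As an illustration of the hybrid pattern above — a symmetric cipher for the bulk data and RSA only for the short key — the sketch below uses deliberately toy stand-ins: textbook RSA with tiny primes and a SHA-256 keystream in place of AES. All names (k_mp, model_params) are illustrative, and nothing here is cryptographically secure:

```python
import hashlib

def stream_encrypt(key: bytes, data: bytes) -> bytes:
    # SHA-256 keystream XOR cipher standing in for AES; the same call
    # decrypts, since XOR with the keystream is its own inverse.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

# Textbook RSA key pair (pk_e = (n, e), sk_e = d) for the Enclave;
# tiny primes, purely illustrative.
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

# --- Model providing terminal: encrypt the model, then the key ---
k_mp = b"model-terminal-key"            # AES key k_MP (stand-in)
model_params = b"w = [0.5, -1.2, 3.3]"  # model parameters omega_I (toy)
c_model = stream_encrypt(k_mp, model_params)  # C = Enc_{k_MP}(omega_I)
c_key = [pow(b, e, n) for b in k_mp]          # c_MP = Enc_{pk_e}(k_MP)

# --- Enclave of the main server: decrypt the key, then the model ---
k_rec = bytes(pow(c, d, n) for c in c_key)    # Dec_{sk_e}(c_MP) -> k_MP
model_rec = stream_encrypt(k_rec, c_model)    # Dec_{k_MP}(C) -> omega_I
assert model_rec == model_params
```

A real deployment would use AES-GCM and RSA-OAEP (or an authenticated KEM) rather than these stand-ins; only the key-wrapping structure carries over.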
The user uploads private data as the data to be predicted, and the trusted zone Enclave of the main server and the data-to-be-predicted providing terminal (e.g., the user) likewise transmit data in the hybrid encryption mode combining RSA encryption and AES encryption, as shown in fig. 9. This covers the encryption of the data to be predicted in step 1 and its decryption in step 2, specifically:
(1) The data-to-be-predicted providing terminal (e.g., the user) locally encrypts the prediction data x with its AES key k_u to obtain the encrypted prediction data ciphertext C_x, i.e.
C_x = Enc_{k_u}(x).
It then encrypts k_u with the RSA public key pk_e of the trusted zone Enclave of the main server to obtain the ciphertext c_u of its AES key, i.e.
c_u = Enc_{pk_e}(k_u),
and sends the prediction data ciphertext C_x and the AES key ciphertext c_u to the main server.
(2) The main server forwards the ciphertexts to the trusted zone Enclave. After receiving the mixed ciphertext, the Enclave uses its RSA private key sk_e to decrypt and obtain the terminal's AES key k_u, i.e.
k_u = Dec_{sk_e}(c_u),
and then decrypts with k_u to obtain the prediction data x, i.e.
x = Dec_{k_u}(C_x).
In step 2, the main server secret-shares the decrypted prediction data and model, obtains data secret shares and model shares, and distributes them to the non-colluding auxiliary server, as follows:
Step 21: the prediction model decrypted inside the trusted zone Enclave of the main server is secret-shared with encryption: one model share is sent to the main server, and the other model shares are encrypted and sent to the auxiliary server; the trusted zone Enclave then deletes the original model (i.e., the original model parameter data), so that the model is stored only in share form on the servers. Specifically:
The trusted zone Enclave of the main server protects the model parameters ω_I by additive secret sharing, splitting ω_I into two shares share_i(ω_I), i = 0, 1, i.e. ω_I = (share_0(ω_I) + share_1(ω_I)) mod Q, where each model share is a shared value and both the shared values and Q belong to a finite field.
One of the two model shares is encrypted with the Enclave key k_e:
Enc_{k_e}(share_1(ω_I)).
The key k_e is in turn encrypted with the RSA public key pk_s of the auxiliary server held by the trusted zone Enclave, i.e.
c_e = Enc_{pk_s}(k_e),
and the ciphertexts are forwarded to the main server. The encrypted model share is forwarded by the main server to the auxiliary server, and the other share remains at the main server. After the secret distribution is completed, the original model parameter data ω_I is deleted by the trusted zone Enclave of the main server.
In this way, the Enclave splits the private data into two secret shares, share_0(ω_I) and share_1(ω_I), by encrypted secret sharing. The encrypted secret share is forwarded by the main server to the auxiliary server, where it is decrypted and stored; the other secret share is stored directly at the main server in plaintext. The main server finally owns share_0(ω_I) and the auxiliary server owns share_1(ω_I), so the original private data are stored as plaintext secret shares on two separate servers.
Encrypting one of the secret shares prevents the main server from stealing the secret: without this encryption, the main server would obtain both plaintext secret shares and could recover the user data or the model, leaking the privacy. The unencrypted plaintext share is stored directly at the main server and participates in the subsequent prediction computation; the encrypted secret share is forwarded to the auxiliary server for decrypted storage. Each of the two servers then holds one secret share, and since the servers do not collude, the secret cannot be recovered.
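The additive two-party sharing described here can be sketched in a few lines of Python; the modulus Q below is an illustrative choice, not one fixed by the disclosure:

```python
import secrets

Q = 2**61 - 1  # public prime modulus of the finite field (illustrative)

def share(secret: int):
    # Split a field element into two additive shares:
    # secret = (s0 + s1) mod Q, with s0 uniformly random.
    s0 = secrets.randbelow(Q)
    s1 = (secret - s0) % Q
    return s0, s1

def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % Q

omega = 123456789             # a model parameter, encoded as a field element
s_main, s_aux = share(omega)  # main server keeps s_main, auxiliary gets s_aux
assert reconstruct(s_main, s_aux) == omega
# Either share alone is a uniformly random field element, so neither server
# learns omega unless the two (non-colluding) servers combine their shares.
```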
Step 22: after the data to be predicted are received and decrypted inside the Enclave of the main server, the decrypted data are secret-shared with encryption to obtain data secret shares; one data secret share is sent to the main server, and the other data secret shares are encrypted and sent to the auxiliary server.
The Enclave protects the privacy of the input data x to be predicted by additive secret sharing, following the same steps as for the model above: x is split into 2 data secret shares share_i(x), i = 0, 1, and one data share is encrypted,
Enc_{k_e}(share_1(x)),
and forwarded to the main server, which sends the encrypted data share to the auxiliary server and retains the plaintext data share. After the secret sharing operation is completed, the original data is destroyed by the Enclave.
The number of the auxiliary servers can be set according to needs, and the embodiment is described by taking one auxiliary server as an example.
Step 3 performs the computation of the chosen model prediction. Because model prediction may involve multiplication, and direct multiplication on shared values is difficult, this embodiment, as a further improvement to reduce the computation overhead of the servers during prediction, completes multiplications by means of multiplication triples. Step 2 therefore further includes the following: Beaver triples (u, v, z) are generated in the trusted zone Enclave of the main server, then distributed to and stored at the auxiliary server and the main server. The prediction in step 3 uses the Beaver triples (u, v, z) directly to complete the required multiplications, reducing the computation cost of the auxiliary server and improving the efficiency of data processing.
In step 3, the auxiliary server and the main server each perform the prediction computation on the data secret shares and model shares they have acquired to obtain prediction result shares, and the auxiliary server encrypts its prediction result share and sends it to the main server, as follows:
(3-1) Decrypting the prediction model share and the data share: the auxiliary server keeps its respective model and data shares; specifically, it decrypts with its local private key sk_s to obtain the Enclave key k_e,
k_e = Dec_{sk_s}(c_e),
and then decrypts with k_e to obtain its prediction model share and data share in plaintext.
(3-2) Prediction computation: the main server and the auxiliary server each perform the prediction computation on their data secret shares and model shares, using a Chebyshev polynomial to approximate the activation function for the nonlinear activation function computation, and obtain prediction result shares.
The two servers each perform the prediction on the data secret shares and model shares they respectively own, obtaining prediction result shares share_i(y), i = 0, 1.
The prediction computation mainly involves addition and multiplication of shared values. For the nonlinear activation function, polynomial approximation is adopted: the activation function is fitted by a high-order Chebyshev polynomial. Compared with an ordinary polynomial, the Chebyshev polynomial has better fitting performance and accuracy while keeping the computation efficiency within an acceptable range. As shown in figs. 5 and 6, the activation function of the neural network is fitted by the polynomial, converting the nonlinear activation function into polynomial arithmetic that can be computed on shared values.
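As a sketch of this approximation (assuming NumPy; the degree 7 and the interval [-5, 5] are illustrative choices, not values fixed by the disclosure), a Chebyshev least-squares fit of the sigmoid activation looks like:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Sample the sigmoid activation on [-5, 5] and fit a degree-7 Chebyshev
# polynomial by least squares.
xs = np.linspace(-5.0, 5.0, 401)
sigmoid = 1.0 / (1.0 + np.exp(-xs))

coeffs = C.chebfit(xs, sigmoid, deg=7)  # coefficients in the Chebyshev basis
approx = C.chebval(xs, coeffs)          # evaluate the fitted polynomial

max_err = float(np.max(np.abs(approx - sigmoid)))
# The fitted polynomial uses only additions and multiplications, so it can
# be evaluated on additive secret shares (multiplications via Beaver triples).
```

In the actual protocol the polynomial would be evaluated over the finite field on fixed-point-encoded shares; the fit itself can be done once, in the clear.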
The principle of shared-value addition, as shown in fig. 3, is as follows: given two secrets a, b, the two servers S_i hold the respective shared values a_i, b_i, i = 0, 1, with a_i, b_i ∈ F, F a finite field, where a = (a_0 + a_1) mod Q, b = (b_0 + b_1) mod Q, Q ∈ F. The two servers compute the sum of the two secrets c = a + b from the shared values: S_i directly computes the sum of its own two shared values, c_i = (a_i + b_i) mod Q, and sends it to S_{1-i}; the two servers then run the reconstruction algorithm to reconstruct the secret, i.e. c = Rec(c_0, c_1) = c_0 + c_1.
Shared-value multiplication: multiplication on shared values is more complex and requires the assistance of a multiplication triple (u, v, z) satisfying z = u·v mod Q, generated by the trusted Enclave and distributed to the two servers, so that S_i holds the respective shares u_i, v_i, z_i, i = 0, 1.
Given two secrets a, b, the two servers S_i hold the respective shared values a_i, b_i, i = 0, 1, a_i, b_i ∈ F, where a = (a_0 + a_1) mod Q, b = (b_0 + b_1) mod Q. The two servers compute the product c = a·b from the shared values, as shown in fig. 4: each server S_i first computes e_i = a_i − u_i and f_i = b_i − v_i to hide its local shared values, then exchanges the hidden values e_i, f_i. After obtaining the hidden values, S_i locally reconstructs e = Rec(e_0, e_1), f = Rec(f_0, f_1) and computes c_i = −i·e·f + f·a_i + e·b_i + z_i, sending the result to S_{1-i}; the two servers then locally reconstruct c = Rec(c_0, c_1) = c_0 + c_1.
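A minimal sketch of this Beaver-triple multiplication in pure Python (the modulus and the secrets a, b are illustrative; both servers' local steps are shown in one script):

```python
import secrets

Q = 2**61 - 1  # public prime modulus (illustrative)

def share(x: int):
    # Additive two-party sharing: x = (x0 + x1) mod Q.
    x0 = secrets.randbelow(Q)
    return x0, (x - x0) % Q

# Beaver triple (u, v, z) with z = u*v mod Q, generated by the trusted
# Enclave and distributed to the two servers as shares.
u, v = secrets.randbelow(Q), secrets.randbelow(Q)
z = (u * v) % Q
u0, u1 = share(u); v0, v1 = share(v); z0, z1 = share(z)

# Secrets a and b, held only as shares by servers S0 and S1.
a, b = 123456, 789012
a0, a1 = share(a); b0, b1 = share(b)

# Each server masks its shares (e_i = a_i - u_i, f_i = b_i - v_i);
# the masked values are exchanged and reconstructed: e = a - u, f = b - v.
e = ((a0 - u0) + (a1 - u1)) % Q
f = ((b0 - v0) + (b1 - v1)) % Q

# Local product shares: c_i = -i*e*f + f*a_i + e*b_i + z_i.
c0 = (f * a0 + e * b0 + z0) % Q            # i = 0: the -i*e*f term vanishes
c1 = (-e * f + f * a1 + e * b1 + z1) % Q   # i = 1

# Reconstruction recovers the product a*b mod Q.
assert (c0 + c1) % Q == (a * b) % Q
```

Expanding c_0 + c_1 gives −ef + f(a_0+a_1) + e(b_0+b_1) + uv = ab, which is why the triple makes the multiplication exact while e and f reveal only uniformly masked values.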
(3-3) Encrypting the prediction result share with a homomorphic encryption algorithm: the auxiliary server encrypts its prediction result share with the homomorphic encryption public key pk_ep distributed by the Enclave, i.e.
c_i = Enc_{pk_ep}(share_i(y)).
The homomorphic encryption flow between the main server and the auxiliary server is shown in fig. 8.
In this embodiment, Paillier homomorphic encryption is adopted; the algorithm satisfies additive homomorphism, and its security is based on the decisional composite residuosity problem. The Paillier algorithm proceeds as follows:
1) select two large primes p and q;
2) compute N = p × q, and choose g ∈ Z*_{N²} such that gcd(L(g^λ mod N²), N) = 1, where L(x) = (x − 1)/N;
3) the public key is pk = (N, g) and the private key is sk = λ = lcm(p − 1, q − 1), the least common multiple of p − 1 and q − 1;
4) choose a random number r with r < N, and encrypt: c = E_pk(m) = g^m · r^N mod N²;
5) decrypt: m = D_sk(c) = L(c^λ mod N²) / L(g^λ mod N²) mod N.
In step 4, the main server obtains the encrypted prediction result shares sent by the auxiliary server and performs secret reconstruction over all prediction result shares; specifically, the prediction result is reconstructed under the ciphertext using the additive homomorphism, i.e.
Enc_{pk_ep}(y) = Enc_{pk_ep}(share_0(y)) · Enc_{pk_ep}(share_1(y)) = Enc_{pk_ep}(share_0(y) + share_1(y)).
Additive homomorphism means the encryption algorithm f satisfies f(A) ⊕ f(B) = f(A + B), where for Paillier the operation ⊕ on ciphertexts is multiplication modulo N². In this embodiment, the private data (user data and model) are split into two secret shares stored on two servers for computation, yielding two prediction shares, which the main server restores into a complete ciphertext prediction result.
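A minimal sketch of the Paillier steps above, together with the ciphertext-level reconstruction of two prediction shares (toy primes and the standard choice g = N + 1; requires Python ≥ 3.9 for math.lcm):

```python
import math
import secrets

# Toy Paillier parameters; real deployments use primes of >= 1024 bits.
p, q = 293, 433
N = p * q
N2 = N * N
g = N + 1                       # a standard valid choice of g
lam = math.lcm(p - 1, q - 1)    # private key lambda = lcm(p-1, q-1)

def L(x: int) -> int:
    return (x - 1) // N

mu = pow(L(pow(g, lam, N2)), -1, N)  # precomputed 1/L(g^lam mod N^2) mod N

def enc(m: int) -> int:
    r = secrets.randbelow(N - 1) + 1       # random r < N, coprime to N
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (pow(g, m, N2) * pow(r, N, N2)) % N2   # c = g^m * r^N mod N^2

def dec(c: int) -> int:
    return (L(pow(c, lam, N2)) * mu) % N   # m = L(c^lam mod N^2) * mu mod N

# Additive homomorphism: multiplying ciphertexts adds plaintexts, so the
# main server can reconstruct the prediction result under encryption
# without ever seeing it.
share0, share1 = 1234, 5678            # prediction result shares (toy values)
c = (enc(share0) * enc(share1)) % N2   # Enc(share0 + share1)
assert dec(c) == (share0 + share1) % N
```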
Because the main server does not hold the Enclave private key, it cannot decrypt the prediction result, but it can reconstruct the encrypted prediction shares; this avoids leaking the prediction result when the server reconstructs it. At the same time, considering that the actual Enclave memory is small and cannot support a large amount of computation, the server reconstructs the prediction result under homomorphic encryption protection and then forwards it to the Enclave of the main server, instead of reconstructing directly inside the Enclave; this reduces the Enclave computation cost and improves overall efficiency.
In step 4, the reconstructed prediction result is forwarded to the trusted zone, where it is integrated and encrypted; this may include the following steps:
4-1, decryption step: the main server sends the reconstructed encrypted prediction result to its Enclave for decryption, obtaining the prediction result in plaintext. The decryption key of the Enclave of the main server is sk_ep, and the decryption formula is
y = Dec_{sk_ep}(Enc_{pk_ep}(y)).
4-2, integrating the prediction results: a voting method is adopted, selecting the prediction result with the most votes as the final prediction result.
The Enclave integrates the prediction results by voting to obtain the final prediction result y_vote(x); for a classification problem, the class with the most votes among the prediction results is selected as the final prediction class. First the number of identical prediction results, i.e. the number of votes for each class, is counted,
N_j = |{i : y_i(x) = j}|, j = 1, 2, 3,
and the prediction result with the most votes is selected,
y_vote(x) = argmax_j N_j,
as the final prediction result.
By voting over the prediction results of multiple models and outputting the voting result as the final result, overfitting can be avoided on the one hand; on the other hand, the prediction of a single model is not released on its own, since the class predicted by a single model can leak private information contained in the data to be predicted. Combining the prediction results of several models also avoids the final result depending too heavily on a single model, which would make it vulnerable to attacks such as membership inference.
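The voting integration can be sketched directly (the class labels below are hypothetical examples, not taken from the disclosure):

```python
from collections import Counter

def vote(predictions):
    # Count identical prediction results and return the class with the
    # most votes; ties break toward the label seen first (Counter order).
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Hypothetical class labels predicted by three models for one input:
final = vote(["benign", "malignant", "benign"])
assert final == "benign"
```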
4-3, encrypting the final prediction result y_vote(x): the AES key k_e of the trusted zone Enclave is used to encrypt the final prediction result, obtaining the final prediction result ciphertext C_vote:
C_vote = Enc_{k_e}(y_vote(x)).
The AES key k_e is encrypted with the RSA public key pk_u:
c_k = Enc_{pk_u}(k_e).
The final prediction result ciphertext and the ciphertext of the encrypted AES key k_e are transmitted to the main server, and the final prediction result y_vote(x) is deleted.
Decryption step at the data-to-be-predicted providing terminal, i.e. the user: the main server sends the final prediction result ciphertext and the ciphertext of the encrypted AES key k_e to the user. The user decrypts the encrypted AES key with the local private key sk_u to obtain the AES key k_e of the trusted zone Enclave, i.e.
k_e = Dec_{sk_u}(c_k),
and then decrypts the final prediction result ciphertext with k_e: Dec_{k_e}(C_vote) → y_vote(x).
The prediction method has the following advantages:
(1) The disclosed machine learning prediction method provides reliable bidirectional security: the prediction result on the user's private data cannot be stolen by the model provider or the main server, and the details of the model uploaded by the prediction service organization are not leaked to the main server or the user. Throughout the computation, the private data of the user (the data-to-be-predicted providing terminal) and the prediction model of the model provider are uploaded in encrypted form; only the trusted Enclave can operate on the data in plaintext, and the processed data are stored as shared values on the non-colluding auxiliary server, preventing the data from being stolen.
The security of the prediction result is achieved by homomorphic encryption, preventing privacy leakage when the result is reconstructed. In a conventional cloud environment, keys are usually stored in plaintext on untrusted nodes, making the security of the application hard to guarantee; in this disclosure, the keys are stored in the trusted Enclave, preventing leakage through access by internal administrators or privileged software.
(2) The technical scheme of the present disclosure reduces user overhead: secret sharing is moved to the server side, a trusted Enclave is established through the SGX technology to operate on the data to be protected, and a large amount of computation overhead is transferred to the cloud server.
(3) Secret sharing is adopted: a secret is divided in a suitable way, each share is managed by a different participant, a single participant cannot recover the secret information, and only several participants cooperating together can recover it. More importantly, the secret can still be fully recovered when any participant within the allowed range fails. Since this scheme involves addition and multiplication of shared values, computation on shared values differs from direct computation in plaintext.
Example 2
The embodiment provides a machine learning prediction method facing data privacy protection, which is implemented in a main server and comprises the following steps:
acquiring data: the main server obtains encrypted data to be predicted and an encrypted prediction model;
the main server creates a trusted area, and decrypts the acquired data to be predicted and the prediction model in the trusted area; the main server carries out secret sharing on the decrypted data to be predicted and the prediction model, obtains data secret shares and model shares respectively, and distributes them to the non-colluding auxiliary server and main server;
the main server obtains the encrypted prediction result shares sent by the auxiliary server, secretly reconstructs all the prediction result shares, forwards the reconstructed prediction result to the trusted area for integration and encryption, and sends it to the data-to-be-predicted providing terminal.
Example 3
The embodiment provides a machine learning prediction method facing data privacy protection, which is implemented in an auxiliary server and comprises the following steps:
the auxiliary server respectively obtains the secret share and the model share of the data;
the auxiliary server, for its respective model prediction share, decrypts with the local private key sk_s to obtain the key k_e of the main server, and with the key k_e decrypts to respectively obtain the prediction model parameters and the data to be predicted in share form;
the auxiliary server predicts according to the secret share and the model share of the data and adopts a Chebyshev polynomial to approximate an activation function to carry out nonlinear activation function calculation so as to obtain a prediction result share;
and encrypting the prediction result share with a homomorphic encryption algorithm: each auxiliary server encrypts its prediction result share with the homomorphic encryption public key pk_ep distributed by the Enclave and forwards it to the main server.
Example 4
The embodiment provides a machine learning prediction system facing data privacy protection, which is characterized in that: the prediction system comprises a model providing terminal, a data-to-be-predicted providing terminal, and a non-colluding auxiliary server and main server;
the model providing terminal: for providing a machine learning predictive model;
the data-to-be-predicted providing terminal: for providing the data to be predicted for the prediction model;
the main server: configured to execute the machine learning prediction method for data privacy protection described in embodiment 2;
the auxiliary server: configured to execute the machine learning prediction method for data privacy protection described in embodiment 3.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A machine learning prediction method facing data privacy protection is characterized by comprising the following steps:
acquiring data: the main server obtains encrypted data to be predicted and an encrypted prediction model;
the main server creates a trusted area, and decrypts the acquired data to be predicted and the prediction model in the trusted area; the main server carries out secret sharing on the decrypted data to be predicted and the prediction model, obtains data secret shares and model shares respectively, and distributes them to the non-colluding auxiliary server and main server;
the auxiliary server and the main server respectively carry out prediction calculation according to the obtained secret share and the model share of the data to obtain a prediction result share, and the auxiliary server encrypts and sends the obtained prediction result share to the main server;
the main server obtains the encrypted prediction result shares sent by the auxiliary server, secretly reconstructs all the prediction result shares, forwards the reconstructed prediction result shares to the trusted area for integration and encryption, and sends the reconstructed prediction result shares to the data providing terminal to be predicted, and the data providing terminal decrypts to obtain the prediction result predicted according to the model.
2. The method for predicting machine learning oriented to data privacy protection as claimed in claim 1, wherein: the main server creating a trusted area specifically comprises dynamically applying to build a trusted zone Enclave in the Intel SGX trusted mode.
3. The method for predicting machine learning oriented to data privacy protection as claimed in claim 1, wherein: before the data obtaining step, the data providing terminal to be predicted and the model providing terminal are remotely authenticated with the server, and key sharing is carried out between an Enclave of the main server and the data providing terminal to be predicted as well as between the model providing terminal and the auxiliary server;
or, the trusted zone Enclave of the main server transmits data with the data providing terminal to be predicted and the model providing terminal by respectively using a mixed encryption mode combining RSA encryption and AES encryption;
or, a Paillier homomorphic encryption algorithm is adopted between the trusted zone Enclave of the main server and the auxiliary server to encrypt and decrypt the transmission data.
4. The method of claim 3, wherein the machine learning prediction method for data privacy protection comprises:
the method for transmitting data between the trusted zone Enclave of the main server and the model providing terminal by using a mixed encryption mode combining RSA encryption and AES encryption specifically comprises the following steps:
encryption step of the model providing terminal: the model providing terminal encrypts training model parameters by adopting an AES (advanced encryption standard) key of the local model providing terminal to obtain an encrypted model parameter ciphertext;
providing an AES key of a terminal according to an RSA public key encryption model shared by the Enclave of the main server, sending the encrypted training model parameters and the encrypted AES key of the model providing terminal to the main server as a mixed ciphertext, and forwarding the ciphertext to the Enclave by the main server;
the method for decrypting the training model by the trusted zone Enclave of the main server comprises the following steps: after the Enclave receives the mixed ciphertext, the Enclave decrypts the AES key by adopting a local RSA private key to obtain the AES key of the model providing terminal, and decrypts the encrypted training model parameter ciphertext according to the AES key of the model providing terminal to obtain the model parameter; or
The method for transmitting data between the trusted zone Enclave of the main server and the data providing terminal to be predicted respectively by using a mixed encryption mode combining RSA encryption and AES encryption comprises the following specific steps:
the data to be predicted providing terminal encrypts the data to be predicted through an AES key of the data to be predicted providing terminal to obtain an encrypted data ciphertext to be predicted; encrypting an AES key of the data providing terminal to be predicted by using an RSA public key of Enclave in a trusted area of the main server to obtain a ciphertext of the AES key of the data providing terminal to be predicted; sending a data cipher text to be predicted and a cipher text of an AES key of a data providing terminal to be predicted to a main server;
the main server forwards the ciphertexts to the trusted zone Enclave; the trusted zone Enclave decrypts with its local RSA private key sk_e to obtain the AES key of the data-to-be-predicted providing terminal, and the data to be predicted are obtained by decrypting with that AES key.
5. The method for predicting machine learning oriented to data privacy protection as claimed in claim 1, wherein:
the method for the main server to carry out secret sharing on the decrypted data to be predicted and the prediction model, respectively obtaining the secret share and the model share of the data, and distributing the secret share and the model share of the data to the auxiliary server and the main server which are not colluded specifically comprises the following steps:
decrypting the obtained prediction model in an Enclave of a trusted area of a main server, carrying out encryption secret sharing on model parameters, sending one model share to the main server, and sending other model shares to an auxiliary server after encrypting;
after receiving the data to be predicted, decrypting the obtained data to be predicted in an Enclave of a trusted area of a main server, carrying out encryption secret sharing on the decrypted data to be predicted to obtain data secret shares, sending one of the data secret shares to the main server, and sending the other data secret shares to an auxiliary server after being encrypted.
6. The method for predicting machine learning oriented to data privacy protection as claimed in claim 1, wherein: the method also comprises the following steps before the auxiliary server performs the prediction calculation: generating a Beaver triple in an Enclave of a main server, and distributing the Beaver triple to an auxiliary server and the main server;
or
The method comprises the following steps that the auxiliary server and the main server respectively carry out prediction calculation according to the obtained secret share and the model share of the data to obtain a prediction result share, the auxiliary server encrypts the obtained prediction result share and sends the encrypted prediction result share to the main server, and the method comprises the following steps:
decrypting the training model and the data to be predicted: the auxiliary server keeps its respective model prediction share and decrypts with the local private key sk_s to obtain the key k_e of the main server; with the key k_e it decrypts to respectively obtain the prediction model parameters and the data to be predicted in share form;
the auxiliary server carries out training prediction according to the secret share and the model share of the data, and adopts a Chebyshev polynomial approximation activation function to carry out nonlinear activation function calculation to obtain a prediction result share;
and encrypting the prediction result share by adopting a homomorphic encryption algorithm: the secondary server encrypts the predicted share result using the homomorphic encrypted public key distributed by Enclave.
7. The method for predicting machine learning oriented to data privacy protection as claimed in claim 1, wherein: the method comprises the steps that the main server obtains the encrypted prediction result shares sent by the auxiliary server, and secret reconstruction is carried out on all the prediction result shares, specifically, the prediction results under the ciphertext are reconstructed according to the addition homomorphism;
or
The reconstructed prediction result share is forwarded to a trusted area, and the reconstructed prediction result share is integrated and encrypted in the trusted area, which comprises the following steps:
and (3) decryption: the main server sends the reconstructed encrypted prediction result to the envelope of the main server for decryption to obtain a prediction result in a plaintext;
integrating the predicted results: selecting the prediction result with the maximum vote number as a final prediction result by adopting a voting method;
encrypting the final prediction result with the AES key k_e of the trusted zone Enclave to obtain the final prediction result ciphertext;
or
The decryption step of the data to be predicted providing terminal comprises the following steps:
the main server sends the final prediction result ciphertext and the ciphertext of the encrypted AES key k_e to the data-to-be-predicted providing terminal;
the data-to-be-predicted providing terminal decrypts the ciphertext of the encrypted AES key k_e with its local private key sk_u to obtain the AES key k_e of the trusted zone Enclave, and decrypts the final prediction result ciphertext with k_e to obtain the prediction result.
8. A machine learning prediction method oriented to data privacy protection, characterized by comprising the following steps:
acquiring data: the main server obtains the encrypted data to be predicted and the encrypted training model;
the main server establishes a trusted area and decrypts the acquired data to be predicted and the training model inside the trusted area; the main server performs secret sharing on the decrypted data to be predicted and the training model, respectively obtaining multiple data secret shares and model shares, which are distributed to multiple non-colluding auxiliary servers;
the main server obtains the encrypted prediction result shares sent by the multiple auxiliary servers and performs secret reconstruction on each; the reconstructed prediction results are forwarded to the trusted area to be integrated and encrypted; the integrated and encrypted result is sent to the data-to-be-predicted providing terminal, which decrypts it to obtain the prediction result of the model.
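The secret-sharing step of the method above can be illustrated with additive sharing over a prime field, a common construction for non-colluding servers: any n-1 shares are uniformly random and reveal nothing, while all n shares sum back to the secret. The modulus and function names are assumptions for illustration, not taken from the patent (real systems share fixed-point encodings of model weights and features):

```python
import secrets

P = 2**61 - 1  # prime modulus of the sharing field (illustrative choice)

def share(secret, n):
    """Split `secret` into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)  # last share fixes the sum
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod P."""
    return sum(shares) % P

s = share(12345, 3)   # one share per non-colluding auxiliary server
print(reconstruct(s))  # 12345
```

Addition of two shared values works share-wise, which is why the auxiliary servers can evaluate the linear layers of the model on shares alone.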
9. A machine learning prediction method oriented to data privacy protection, characterized by comprising the following steps:
the auxiliary servers respectively obtain the data secret shares and the model shares;
each auxiliary server decrypts with its local private key sks to obtain the key ks of the main server, and uses the key ks to decrypt and respectively obtain the original prediction model parameters and the data to be predicted;
each auxiliary server performs prediction according to its data secret share and model share, computing the non-linear activation function by approximating the activation function with a Chebyshev polynomial, to obtain a prediction result share;
and encrypting the prediction result share by adopting a homomorphic encryption algorithm: each auxiliary server encrypts its prediction result share using the homomorphic encryption public key pkep distributed by the Enclave and sends it to the main server.
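The Chebyshev approximation of the non-linear activation in the claim above can be sketched as follows. The sigmoid activation, polynomial degree, and domain are illustrative assumptions (the patent fixes none of them); the point is that a polynomial of the input can be evaluated with only additions and multiplications, which compose with secret-shared arithmetic:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Fit a degree-9 Chebyshev polynomial to sigmoid on [-6, 6] by sampling at
# Chebyshev nodes (near-optimal interpolation points, avoiding Runge wiggle).
deg, lo, hi = 9, -6.0, 6.0
k = np.arange(deg + 1)
nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))  # nodes on [-1, 1]
x = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)          # mapped to [lo, hi]
coeffs = np.polynomial.chebyshev.chebfit(nodes, sigmoid(x), deg)

def cheb_sigmoid(t):
    """Evaluate the polynomial approximation at points t in [lo, hi]."""
    u = (2 * t - (hi + lo)) / (hi - lo)                # rescale to [-1, 1]
    return np.polynomial.chebyshev.chebval(u, coeffs)

t = np.linspace(lo, hi, 1000)
print(np.max(np.abs(cheb_sigmoid(t) - sigmoid(t))))    # small max error
```

Each auxiliary server would evaluate only the polynomial terms on its shares, so no server ever computes the exact non-linear function on plaintext data.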
10. A machine learning prediction system oriented to data privacy protection, characterized in that: the system comprises a model providing terminal, a data-to-be-predicted providing terminal, and a main server and auxiliary servers which do not collude;
the model providing terminal: for providing the machine learning training model;
the data-to-be-predicted providing terminal: for providing the data to be predicted;
the main server: for performing the machine learning prediction method oriented to data privacy protection of claim 8;
the auxiliary server: for performing the machine learning prediction method oriented to data privacy protection of claim 9.
CN202010105981.1A 2020-02-20 2020-02-20 Machine learning prediction method and system for data privacy protection Active CN111275202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010105981.1A CN111275202B (en) 2020-02-20 2020-02-20 Machine learning prediction method and system for data privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010105981.1A CN111275202B (en) 2020-02-20 2020-02-20 Machine learning prediction method and system for data privacy protection

Publications (2)

Publication Number Publication Date
CN111275202A true CN111275202A (en) 2020-06-12
CN111275202B CN111275202B (en) 2023-08-11

Family

ID=71002217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010105981.1A Active CN111275202B (en) 2020-02-20 2020-02-20 Machine learning prediction method and system for data privacy protection

Country Status (1)

Country Link
CN (1) CN111275202B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737756A (en) * 2020-07-31 2020-10-02 支付宝(杭州)信息技术有限公司 XGB model prediction method, device and system performed through two data owners
CN111832074A (en) * 2020-07-14 2020-10-27 西安电子科技大学 Safety verification collaborative learning method and system based on SPDZ safety multi-party calculation
CN112016120A (en) * 2020-08-26 2020-12-01 支付宝(杭州)信息技术有限公司 Event prediction method and device based on user privacy protection
CN112039653A (en) * 2020-08-28 2020-12-04 西安电子科技大学 Cloud outsourcing data encryption and decryption method based on neural network activation unit
CN112163227A (en) * 2020-09-02 2021-01-01 湖北工业大学 Multidimensional statistical analysis system and method for user sales of e-commerce platform seller
CN112347473A (en) * 2020-11-06 2021-02-09 济南大学 Machine learning security aggregation prediction method and system supporting bidirectional privacy protection
CN112668748A (en) * 2020-09-16 2021-04-16 华控清交信息科技(北京)有限公司 Prediction method and device and electronic equipment
CN112765662A (en) * 2021-01-22 2021-05-07 电子科技大学 Method for supporting privacy protection of training integrator under deep learning
CN113691565A (en) * 2021-10-25 2021-11-23 中电云数智科技有限公司 Data security processing platform and method
CN113794693A (en) * 2021-08-25 2021-12-14 浪潮云信息技术股份公司 Distributed SM9 key secure distribution method for preventing server number expansion
CN113870964A (en) * 2021-09-14 2021-12-31 西南交通大学 Medical data sharing encryption method based on block chain
CN114091653A (en) * 2021-11-06 2022-02-25 支付宝(杭州)信息技术有限公司 Model operation method and device
CN115186831A (en) * 2022-09-09 2022-10-14 之江实验室 Deep learning method with efficient privacy protection

Citations (9)

Publication number Priority date Publication date Assignee Title
CN105187425A (en) * 2015-09-02 2015-12-23 南京理工大学紫金学院 Certificate-free threshold decryption method for security of cloud calculation communication system
CN108259158A (en) * 2018-01-11 2018-07-06 西安电子科技大学 Efficient and secret protection individual layer perceptron learning method under a kind of cloud computing environment
CN109194507A (en) * 2018-08-24 2019-01-11 曲阜师范大学 The protection privacy neural net prediction method of non-interactive type
CN109977694A (en) * 2019-03-11 2019-07-05 暨南大学 A kind of data sharing method based on cooperation deep learning
CN110059501A (en) * 2019-04-16 2019-07-26 广州大学 A kind of safely outsourced machine learning method based on difference privacy
CN110190945A (en) * 2019-05-28 2019-08-30 暨南大学 Based on adding close linear regression method for secret protection and system
CN110213231A (en) * 2019-04-26 2019-09-06 西安电子科技大学 A kind of the outsourcing data access control method and control system of the lightweight towards SGX
US20190327088A1 (en) * 2018-04-23 2019-10-24 International Business Machines Corporation Method for Leakage-Resilient Distributed Function Evaluation with CPU-Enclaves
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
CN105187425A (en) * 2015-09-02 2015-12-23 南京理工大学紫金学院 Certificate-free threshold decryption method for security of cloud calculation communication system
CN108259158A (en) * 2018-01-11 2018-07-06 西安电子科技大学 Efficient and secret protection individual layer perceptron learning method under a kind of cloud computing environment
US20190327088A1 (en) * 2018-04-23 2019-10-24 International Business Machines Corporation Method for Leakage-Resilient Distributed Function Evaluation with CPU-Enclaves
CN109194507A (en) * 2018-08-24 2019-01-11 曲阜师范大学 The protection privacy neural net prediction method of non-interactive type
CN109977694A (en) * 2019-03-11 2019-07-05 暨南大学 A kind of data sharing method based on cooperation deep learning
CN110059501A (en) * 2019-04-16 2019-07-26 广州大学 A kind of safely outsourced machine learning method based on difference privacy
CN110213231A (en) * 2019-04-26 2019-09-06 西安电子科技大学 A kind of the outsourcing data access control method and control system of the lightweight towards SGX
CN110190945A (en) * 2019-05-28 2019-08-30 暨南大学 Based on adding close linear regression method for secret protection and system
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data

Non-Patent Citations (3)

Title
TANG Chunming; WEI Weiming: "Privacy-preserving regression algorithm based on secure two-party computation", Netinfo Security, no. 10 *
WANG Guan; LIANG Shihao: "SGX-based security enhancement scheme for Hadoop KMS", Journal of Information Security Research, no. 06 *
ZOU Xuxi; WANG Lei; SHI Zhaopeng: "(m+1, t+1) threshold secret sharing scheme based on special difference equations in cloud computing", Computer Engineering, no. 01 *

Cited By (20)

Publication number Priority date Publication date Assignee Title
CN111832074A (en) * 2020-07-14 2020-10-27 西安电子科技大学 Safety verification collaborative learning method and system based on SPDZ safety multi-party calculation
CN111832074B (en) * 2020-07-14 2023-04-07 西安电子科技大学 Safety verification collaborative learning method and system based on SPDZ safety multi-party calculation
CN111737756B (en) * 2020-07-31 2020-11-24 支付宝(杭州)信息技术有限公司 XGB model prediction method, device and system performed through two data owners
CN111737756A (en) * 2020-07-31 2020-10-02 支付宝(杭州)信息技术有限公司 XGB model prediction method, device and system performed through two data owners
CN112016120A (en) * 2020-08-26 2020-12-01 支付宝(杭州)信息技术有限公司 Event prediction method and device based on user privacy protection
CN112016120B (en) * 2020-08-26 2024-03-26 支付宝(杭州)信息技术有限公司 Event prediction method and device based on user privacy protection
CN112039653A (en) * 2020-08-28 2020-12-04 西安电子科技大学 Cloud outsourcing data encryption and decryption method based on neural network activation unit
WO2022048107A1 (en) * 2020-09-02 2022-03-10 湖北工业大学 Multi-dimensional statistical analysis system and method for sales amounts of seller users on e-commerce platform
CN112163227A (en) * 2020-09-02 2021-01-01 湖北工业大学 Multidimensional statistical analysis system and method for user sales of e-commerce platform seller
CN112668748B (en) * 2020-09-16 2024-05-10 华控清交信息科技(北京)有限公司 Prediction method and device and electronic equipment
CN112668748A (en) * 2020-09-16 2021-04-16 华控清交信息科技(北京)有限公司 Prediction method and device and electronic equipment
CN112347473A (en) * 2020-11-06 2021-02-09 济南大学 Machine learning security aggregation prediction method and system supporting bidirectional privacy protection
CN112765662A (en) * 2021-01-22 2021-05-07 电子科技大学 Method for supporting privacy protection of training integrator under deep learning
CN113794693A (en) * 2021-08-25 2021-12-14 浪潮云信息技术股份公司 Distributed SM9 key secure distribution method for preventing server number expansion
CN113870964B (en) * 2021-09-14 2023-04-07 西南交通大学 Medical data sharing encryption method based on block chain
CN113870964A (en) * 2021-09-14 2021-12-31 西南交通大学 Medical data sharing encryption method based on block chain
CN113691565A (en) * 2021-10-25 2021-11-23 中电云数智科技有限公司 Data security processing platform and method
CN114091653A (en) * 2021-11-06 2022-02-25 支付宝(杭州)信息技术有限公司 Model operation method and device
CN115186831A (en) * 2022-09-09 2022-10-14 之江实验室 Deep learning method with efficient privacy protection
CN115186831B (en) * 2022-09-09 2022-12-13 之江实验室 Efficient privacy protection deep learning method

Also Published As

Publication number Publication date
CN111275202B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN111275202B (en) Machine learning prediction method and system for data privacy protection
CN111475796B (en) Anti-quantum computation identity authentication method and system based on secret sharing and quantum communication service station
Zhao et al. Trusted data sharing over untrusted cloud storage providers
CN111586000B (en) Full-proxy homomorphic re-encryption transmission system and operation mechanism thereof
CN112926051B (en) Multi-party security computing method and device
CN108040056B (en) Safe medical big data system based on Internet of things
CN104168108B (en) It is a kind of to reveal the traceable attribute base mixed encryption method of key
CN110932851B (en) PKI-based multi-party cooperative operation key protection method
WO2016210347A1 (en) System, method, and apparatus for electronic prescription
JP2023500570A (en) Digital signature generation using cold wallet
CN111277412B (en) Data security sharing system and method based on block chain key distribution
CN109951513B (en) Quantum-resistant computing smart home quantum cloud storage method and system based on quantum key card
CN111797427A (en) Block chain user identity supervision method and system considering privacy protection
CN109361510A (en) A kind of information processing method that supporting overflow checking and big integer arithmetic and application
KR20210139344A (en) Methods and devices for performing data-driven activities
CN111639345B (en) Method and system for secure multi-party cloud computing based on homomorphic encryption
CN113961959A (en) Proxy re-encryption method and system for data sharing community
CN114697042A (en) Block chain-based Internet of things security data sharing proxy re-encryption method
CN116132025A (en) Key negotiation method, device and communication system based on preset key group
Almuzaini et al. Key Aggregation Cryptosystem and Double Encryption Method for Cloud‐Based Intelligent Machine Learning Techniques‐Based Health Monitoring Systems
CN114301677A (en) Key negotiation method, device, electronic equipment and storage medium
CN106850584A (en) Anonymous authentication method facing client/server network
Peng et al. Efficient distributed decryption scheme for IoT gateway-based applications
CN114362912A (en) Identification password generation method based on distributed key center, electronic device and medium
CN116886340A (en) Identity-based matching function encryption system based on cloud-assisted edge calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant