CN108521326B - Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption - Google Patents
- Publication number
- CN108521326B (application CN201810317657.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- H04L9/008: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications involving homomorphic encryption
- G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06V10/96: Management of image or video recognition tasks
Abstract
The invention discloses a privacy protection linear SVM model training method based on vector homomorphic encryption, belonging to the field of information technology security, which comprises the following steps: step 1, a user encrypts a training data set with the vector-based homomorphic encryption scheme VHE and sends the encryption result to a server; step 2, the server computes on the encryption result to obtain a ciphertext linear kernel function matrix and returns it to the user; step 3, the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends it to the server; step 4, the server trains on the plaintext linear kernel function matrix with a ciphertext SMO algorithm and returns the training result to the user.
Description
Technical Field
The invention belongs to the field of information technology security, and particularly relates to a privacy protection linear SVM model training algorithm based on vector homomorphic encryption.
Background
A Support Vector Machine (SVM) is an important model for classification and regression analysis in machine learning. It establishes a quadratic programming model, solves it with the available training data to find an optimal decision boundary, and then uses that boundary to predict the category of new data. The core idea of the SVM is that, given a set of training data with corresponding labels, the training data are regarded as points in a space and a separating interface is sought that divides the training set into two parts, with data of the same category on the same side of the interface and the different categories separated by it. Subject to all training data being correctly classified, the interface is then kept as far from the training data points as possible, which ensures reliability at prediction time. When the SVM model makes a prediction, the trained interface is used to check on which side of the interface the prediction data lies, thereby determining its category.
Compared with other classification algorithms, the SVM has the following advantages:
(1) By establishing a mathematical programming model, the SVM finds an optimal interface, making the classification result as reliable as possible.
(2) The SVM is not only suitable for linear classification but, via the kernel trick, can also be used for nonlinear classification. The SVM also has good robustness: the trained interface is determined only by the support vectors, so adding or deleting training data has little influence on the training result.
(3) The SVM supports small-sample learning: training an SVM model does not require massive data, and a classification model with good performance can be trained from only a small data set.
Because of these advantages, the SVM is commonly used in image recognition, text analysis, and the medical and financial fields, and plays an important role especially in the artificial intelligence applications that have emerged in recent years.
There are many training algorithms for SVM models, of which the most common is the SMO algorithm, proposed specifically for the characteristics of the SVM model; compared with general SVM training algorithms it trains faster and needs less space. When the training data set is very large, training the SVM can take a great deal of time, and a user generally chooses to train the model on a cloud platform. However, the cloud platform itself is not necessarily trusted. Currently many service providers use cloud servers offered by cloud platforms such as Alibaba Cloud, Tencent Cloud and Amazon Cloud, and these cloud platform providers can monitor the servers they provide, thereby obtaining the private data of enterprises leasing the cloud services. On the other hand, from the user's perspective, to use services on the Internet such as image recognition and text analysis, local personal data must be uploaded to the cloud; once uploaded, the data are visible to the service provider, which can easily put them to other uses, such as buying and selling user information, without the user being able to prevent it. The cloud is therefore not fully trusted, and the user cannot know what is actually done with the data. In particular, existing SVM training algorithms do not consider privacy: during training the data set is stored in plaintext on the computer, so an untrusted cloud can easily obtain the training data set during training and leak the user's privacy.
Disclosure of Invention
The invention aims to solve the problem that, when an SVM model is trained on a cloud platform, the training data are not kept private and the user's privacy is therefore leaked; to this end a privacy protection linear SVM model training algorithm based on vector homomorphic encryption is provided.
The technical scheme adopted by the invention is as follows:
a privacy-protecting linear SVM model training algorithm based on vector homomorphic encryption comprises the following steps:
step 1: a user encrypts a training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to a server;
step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user;
step 3: the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server;
step 4: the server trains the plaintext linear kernel function matrix by adopting a ciphertext SMO algorithm and returns a training result to the user.
Further, the step 1 comprises the following steps:
S1.1. Initializing a key matrix S = [S_11, …, S_wv];
S1.2. Encrypting the training data set X = [x_1, …, x_t] with the key matrix S to obtain a ciphertext data set matrix X_e;
S1.3. Calculating a ciphertext conversion matrix M according to the transpose matrix G of the training data set X;
S1.4. Sending the data label vector y formed by the labels of each piece of data in the training data set X, the ciphertext data set matrix X_e and the ciphertext conversion matrix M to the server.
Further, the encryption in step S1.2 proceeds as follows:
Each piece of data x_i (1 ≤ i ≤ t) in the training data set X is encrypted with the key matrix S, giving an encrypted ciphertext vector c_i ∈ Z_q^n that satisfies S·c_i = w·x_i + e, where i denotes the index, t the number of vectors in the training data set X, n the length of the ciphertext vector c, p a prime, m the length of x_i, x_i ∈ Z_p^m (the set of vectors of length m over the finite field Z_p), w an integer parameter, e an error vector with |e| < w/2, S ∈ Z_q^{m×n} the key matrix and q a prime; Z_q^n denotes the set of vectors of length n over the finite field Z_q and Z_q^{m×n} the set of m×n matrices over Z_q.
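For concreteness, the relation S·c_i = w·x_i + e and the matching decryption can be sketched in code. The sketch below uses a toy key of the form S = [I, T] (the shape of the new key S′ later in the scheme, which makes a valid ciphertext easy to construct); the modulus q, the parameter w and all function names are illustrative assumptions, not part of the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
q, w = 999983, 1000          # toy prime modulus q and integer parameter w

def keygen(m, extra):
    # toy key of the form S = [I | T], so that S @ c = c1 + T @ a
    T = rng.integers(0, q, size=(m, extra), dtype=np.int64)
    return np.hstack([np.eye(m, dtype=np.int64), T])

def encrypt(S, x):
    m = S.shape[0]
    T = S[:, m:]
    a = rng.integers(0, q, size=S.shape[1] - m, dtype=np.int64)
    e = rng.integers(-1, 2, size=m)            # small error, |e| << w/2
    c1 = (w * x + e - T @ a) % q               # ensures S @ c = w*x + e (mod q)
    return np.concatenate([c1, a])

def decrypt(S, c):
    v = (S @ c) % q
    v = np.where(v > q // 2, v - q, v)         # centered representative mod q
    return np.rint(v / w).astype(np.int64)     # nearest integer of (S c)/w

S = keygen(3, 2)
x = np.array([3, -2, 5])
c = encrypt(S, x)
```

Because the error is additive and small, the same code also exhibits the homomorphic-addition property S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j) described later in the scheme.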
Further, the calculation of the ciphertext conversion matrix M in step S1.3 comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matching integer l and convert each entry c_i (1 ≤ i ≤ n) of the ciphertext vector c into its binary representation, obtaining the bit-decomposed ciphertext vector ĉ = [ĉ_1, …, ĉ_n], where ĉ_i = [ĉ_{i(l−1)}, …, ĉ_{i1}, ĉ_{i0}] and ĉ_{ij} ∈ {−1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* = ĉ, with c* ∈ Z^{nl}, where Z^{nl} denotes the set of vectors of length nl over the integers Z;
S1.3.3. Convert each entry S_ij (1 ≤ i ≤ w, 1 ≤ j ≤ v) of the key matrix S into the vector S*_{ij} = [2^{l−1}S_ij, 2^{l−2}S_ij, …, 2S_ij, S_ij], obtaining the intermediate key matrix S* ∈ Z^{w×vl}, where l denotes the adapted integer, w the number of rows of the key matrix S and v its number of columns;
S1.3.4. Let the key conversion matrix be M, satisfying S′M = S* + E (mod q); this yields the key conversion matrix M ∈ Z^{n′×nl} and the new ciphertext vector c′ = M·c*. Here S′ = [I, T] denotes the new key matrix, with I an identity matrix and T a matrix of randomly chosen entries of suitable dimension; S* is the intermediate key matrix; E ∈ Z^{m×nl} is a random noise matrix; q is a prime; A ∈ Z^{(n′−m)×nl} is a random matrix used in forming M; n′ denotes the length of the new ciphertext vector c′; Z^{n′×nl} and Z^{(n′−m)×nl} denote the sets of matrices of the indicated dimensions over Z.
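Steps S1.3.1 to S1.3.3 rest on the exact identity S*·c* = S·c: bit-decomposing the ciphertext and expanding the key by powers of two leave the product unchanged. A minimal sketch of that identity follows; the function names and the signed-digit convention are assumptions:

```python
import numpy as np

def bit_decompose(c, l):
    # S1.3.1/S1.3.2: expand each entry of c into l signed digits in {-1, 0, 1},
    # most significant first, so that sum_k 2^k * digit_k rebuilds the entry
    digits = []
    for ci in np.asarray(c, dtype=np.int64):
        sign = -1 if ci < 0 else 1
        digits.extend(sign * ((abs(int(ci)) >> k) & 1)
                      for k in range(l - 1, -1, -1))
    return np.array(digits, dtype=np.int64)

def expand_key(S, l):
    # S1.3.3: replace each entry S_ij by [2^(l-1)*S_ij, ..., 2*S_ij, S_ij]
    cols = []
    for j in range(S.shape[1]):
        for k in range(l - 1, -1, -1):
            cols.append((1 << k) * S[:, j])
    return np.stack(cols, axis=1)

rng = np.random.default_rng(1)
l = 8
S = rng.integers(-50, 50, size=(3, 4)).astype(np.int64)   # toy key matrix
c = rng.integers(-(2**l) + 1, 2**l, size=4)               # |c| < 2^l as in S1.3.1
```

The key conversion matrix M of step S1.3.4 is then built so that decrypting M·c* under the new key S′ reproduces S*·c* up to the small noise E·c*.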
Further, the specific content of step 2 is: the server calculates the ciphertext linear kernel function matrix K_e from the ciphertext data set matrix X_e and the ciphertext conversion matrix M, and sends the ciphertext linear kernel function matrix K_e to the user.
Further, the specific content of step 3 is: the user decrypts the ciphertext linear kernel function matrix K_e with the key matrix S to obtain the plaintext linear kernel function matrix K = [K_11, …, K_mn], and sends the plaintext linear kernel function matrix K to the server.
Further, the step 4 comprises the following steps:
S4.1. Initializing the Lagrange coefficient vector α = (α_1, …, α_t), the offset constant b and the penalty coefficient C;
S4.2. Selecting two sample points (x_ei, y_i) and (x_ej, y_j) from the ciphertext data set matrix X_e as adjustment points and calculating the corresponding error values E_i = Σ_k α_k y_k K_ik + b − y_i and E_j = Σ_k α_k y_k K_jk + b − y_j, where α_k denotes the k-th element of the Lagrange coefficient vector α, y_k the k-th element of the data label vector y, b the offset constant, K_ik the entry in row i and column k of the plaintext linear kernel function matrix K and K_jk the entry in row j and column k of K;
S4.3. Let η = K_ii + K_jj − 2K_ij and compute the updated element α_j^new = α_j + y_j(E_i − E_j)/η, clipped to the feasible interval [L, H]: if y_i ≠ y_j then L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i); if y_i = y_j then L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j). Here α_i and α_j denote the i-th and j-th elements of the Lagrange coefficient vector α, y_i and y_j the i-th and j-th elements of the data label vector y, C the penalty coefficient, and E_i and E_j the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. Compute α_i^new = α_i + y_i y_j(α_j − α_j^new) and replace the i-th and j-th elements of the vector α with α_i^new and α_j^new respectively;
S4.5. Let the new offset constant be b^new; calculate the first pending value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second pending value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj. If 0 < α_i^new < C then b^new = b_i; if 0 < α_j^new < C then b^new = b_j; otherwise b^new = (b_i + b_j)/2. Replace the value of the offset constant b with b^new; here K_ii, K_ij and K_jj denote the corresponding entries of the plaintext linear kernel function matrix K;
S4.6. Judging whether all data sample points of the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.2.
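Since the pair update in steps S4.2 to S4.5 touches only the plaintext kernel matrix K, the label vector y and the multipliers, it can be sketched as ordinary plaintext SMO code. The function name, the clipping box [L, H] and the degenerate-pair handling below are illustrative assumptions drawn from standard SMO, not the patent's exact text:

```python
import numpy as np

def smo_pair_update(K, y, alpha, b, i, j, C):
    # One working-pair update, the plaintext form of steps S4.2-S4.5
    f = K @ (alpha * y) + b                    # decision values f(x_k)
    Ei, Ej = f[i] - y[i], f[j] - y[j]          # error values at the two points
    eta = K[i, i] + K[j, j] - 2 * K[i, j]
    if eta <= 0:
        return alpha, b                        # degenerate pair: skip update
    # feasible box [L, H] keeps 0 <= alpha <= C and sum(alpha * y) constant
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    aj = float(np.clip(alpha[j] + y[j] * (Ei - Ej) / eta, L, H))
    ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
    # two pending bias values; keep the one whose multiplier stays strictly
    # inside (0, C), otherwise use their average
    bi = b - Ei - y[i] * (ai - alpha[i]) * K[i, i] - y[j] * (aj - alpha[j]) * K[i, j]
    bj = b - Ej - y[i] * (ai - alpha[i]) * K[i, j] - y[j] * (aj - alpha[j]) * K[j, j]
    if 0 < ai < C:
        b_new = bi
    elif 0 < aj < C:
        b_new = bj
    else:
        b_new = (bi + bj) / 2
    out = alpha.copy()
    out[i], out[j] = ai, aj
    return out, b_new

X = np.array([[1.0], [2.0], [-1.0], [-2.0]])   # toy 1-D training data
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                    # plaintext linear kernel matrix
alpha, b = smo_pair_update(K, y, np.zeros(4), 0.0, 0, 2, C=1.0)
```

One call on this toy data set moves both selected multipliers off zero while preserving the constraint Σ_k α_k y_k = 0 and the box 0 ≤ α_k ≤ C.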
Further, the KKT conditions that each sample point (x_ei, y_i) of the ciphertext data set matrix X_e should meet include: 1) α_i = 0 ⇒ y_i f(x_ei) ≥ 1; 2) 0 < α_i < C ⇒ y_i f(x_ei) = 1; 3) α_i = C ⇒ y_i f(x_ei) ≤ 1, where f(x_ei) = Σ_k α_k y_k K_ik + b and α_i denotes the i-th element of the Lagrange coefficient vector α.
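A per-sample check of the three conditions can be sketched as follows, writing f_i for the decision value Σ_k α_k y_k K_ik + b; the function name and the tolerance `tol` are assumptions added for floating-point comparison:

```python
def kkt_satisfied(alpha_i, y_i, f_i, C, tol=1e-3):
    # f_i = sum_k alpha_k * y_k * K_ik + b is the decision value at sample i
    m = y_i * f_i
    if alpha_i <= tol:           # condition 1): alpha_i == 0
        return m >= 1 - tol
    if alpha_i >= C - tol:       # condition 3): alpha_i == C
        return m <= 1 + tol
    return abs(m - 1) <= tol     # condition 2): 0 < alpha_i < C
```

Training stops once every sample point passes this check; otherwise another pair of adjustment points is selected.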
Further, if the training data set X contains decimals, before step S1.2 is executed the training data set X is first amplified (scaled by an integer factor) so that all its data are integers; correspondingly, in step 3 the decryption result is reduced by the same factor to obtain the plaintext linear kernel function matrix K.
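A minimal sketch of the amplification and reduction, assuming a factor of 100 and using the fact that each entry of the linear kernel matrix is an inner product of two scaled vectors (hence is reduced by the squared factor):

```python
import numpy as np

scale = 100                                   # assumed amplification factor
X = np.array([[0.25, 1.5],
              [2.0, -0.75]])
X_int = np.rint(X * scale).astype(np.int64)   # integer data suitable for VHE
K_scaled = X_int @ X_int.T                    # linear kernel on the scaled data
K = K_scaled / scale**2                       # reduce: each entry was scaled twice
```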
Further, in the case of possession of the key matrix S, the decryption operation on the ciphertext vector c is the calculation x = ⌈⌈S·c⌋_q / w⌋, where ⌈a⌋_q denotes the representative of a modulo q nearest to zero and ⌈·⌋ rounding to the nearest integer.
Further, the vector-based homomorphic encryption scheme satisfies homomorphic addition and linear transformation.
The specific content of homomorphic addition is as follows: for two plaintext-ciphertext pairs {x_i, c_i} and {x_j, c_j} formed from elements of the training data set X and the corresponding ciphertext vectors, under the same key matrix S it holds that S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j); the new ciphertext vector is c′ = c_i + c_j and the new error vector is e′ = e_i + e_j, where e_i and e_j denote the error vectors of the i-th and j-th elements of the training data set X.
The specific content of linear transformation is as follows: applying an arbitrary matrix B ∈ Z^{m′×m} to data x_i, i.e. computing B·x_i, gives (BS)c = wBx_i + Be_i, where m denotes the length of each piece of data in the training data set X, m′ the length of the linearly transformed data Bx_i, S the key matrix, c the ciphertext vector and e_i the error vector of the i-th element of X. The ciphertext vector c can then be regarded as an encryption of B·x_i under the key matrix BS, and the key conversion matrix M ∈ Z^{(m′+1)×m′l} can be calculated to convert the key matrix BS into S′ ∈ Z^{m′×(m′+1)}, giving the new ciphertext vector c′ = M·c*; here Z^{(m′+1)×m′l} and Z^{m′×(m′+1)} denote the sets of matrices of the indicated dimensions over Z.
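The property (BS)c = wBx_i + Be_i can be checked numerically with a toy key of the [I, T] form (the form of the new key S′ in the scheme): a ciphertext c with S·c = w·x + e (mod q) decrypts to B·x under the key B·S, as long as |B·e| stays below w/2. All parameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
q, w, m = 999983, 1000, 3
T = rng.integers(0, q, size=(m, 2), dtype=np.int64)
S = np.hstack([np.eye(m, dtype=np.int64), T])           # key of the form [I | T]
x = np.array([2, -1, 4])
a = rng.integers(0, q, size=2, dtype=np.int64)
e = rng.integers(-1, 2, size=m)
c = np.concatenate([(w * x + e - T @ a) % q, a])        # S c = w x + e (mod q)

B = np.array([[1, 2, 0],
              [0, -1, 3]])                              # arbitrary integer matrix
v = (B @ S @ c) % q                                     # (B S) c = w B x + B e
v = np.where(v > q // 2, v - q, v)                      # centered representative
Bx = np.rint(v / w).astype(np.int64)                    # decrypt under key B S
```

With G = X^T as the matrix B, this is exactly how the linear kernel computation is pushed under the ciphertext in steps 2 and 3.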
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. In the invention, under the VHE homomorphic encryption scheme, the training data set is converted into a ciphertext data set, and the linear kernel function calculation of the original SMO algorithm is converted into the ciphertext linear transformation operation of VHE, so that the kernel function is computed under ciphertext. All operations in the SMO algorithm that involve the training data set are expressed through the kernel function, so only the confidentiality of the linear kernel function needs to be ensured; the other operations in SMO cannot leak information about the training data set. Therefore all information in the training data set remains encrypted throughout training, and nobody except the encrypting party who owns the key, i.e. the data owner, can obtain any information about the original data. During the SMO calculation the server side only obtains the ciphertext data set, so the cloud cannot deduce any valuable information about the original data from the ciphertexts it holds, and the privacy of the data owner is fully protected.
2. In the invention, the VHE homomorphic encryption algorithm only encrypts integer vectors. To make the training data set meet this encryption requirement, when the training data set contains decimal data the data are uniformly amplified before encryption so that values with fractional parts become integers, after which the encryption and ciphertext linear transformation operations are carried out. After the linear transformation, the decrypted plaintext is reduced according to the previous amplification factor, restoring the data to the correct values, i.e. the plaintext linear kernel function matrix. Provided the calculation precision is ensured, the result of the ciphertext kernel function calculation is consistent with that of the plaintext kernel function calculation, i.e. the training result of ciphertext SMO is the same as that of plaintext SMO.
3. In the invention, all operations involving the training data set can be converted into kernel function calculations, and in the ciphertext SMO algorithm all kernel-related calculations are gathered into one matrix product, so that all kernel function values are computed at once to obtain a kernel function matrix whose entries are then used directly; moreover, the vector calculation efficiency of the VHE homomorphic encryption algorithm is higher than that of other homomorphic encryption algorithms. Since the kernel functions are computed in one batch, encryption, ciphertext calculation and decryption are each performed only once, which keeps the time overhead of the ciphertext SMO algorithm small.
4. In the invention, the user and the server interact three times. First, the user sends the data label vector, the ciphertext data set matrix and the ciphertext conversion matrix to the server, with communication space complexity approximately O(mn). Second, the server returns the ciphertext linear kernel function matrix to the user, who decrypts it and uploads the decrypted plaintext linear kernel function matrix to the server, with communication space complexity approximately O(m²). Third, the server sends the trained Lagrange coefficient vector and the offset constant to the user, with communication space complexity approximately O(m). In the case of more data dimensions, i.e. n > m, the total communication space complexity is O(n); in the case of a large amount of data, i.e. m > n, it is O(m²); if the data dimension and the amount of data are comparable, it is O(m² + mn). The communication space complexity is polynomial, which improves the efficiency of training.
5. In the invention, by setting reasonable VHE system parameters, the conversion of the kernel function calculation into the VHE ciphertext linear transformation by the ciphertext SMO algorithm guarantees the correctness of the kernel function calculation, while computing the kernel function matrix under ciphertext guarantees the confidentiality of the training data, so the training data can be classified correctly. Compared with other homomorphic encryption schemes, under optimized parameters the algorithm's running time grows more slowly and it is faster.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A privacy protection linear SVM model training algorithm based on vector homomorphic encryption, in which the training data set X is a matrix formed by t z-dimensional vectors; each piece of data of the training data set X has a label value indicating its category, and these label values, arranged in the order of the data in X, form the data label vector y = [y_1, …, y_t]. The linear SVM model training algorithm comprises the following steps:
step 1: and the user encrypts the training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to the server. The method comprises the following steps:
S1.1. Initializing a key matrix S = [S_11, …, S_wv].
The user first determines the number of rows and columns of the key matrix S and then randomly generates the values in the matrix.
S1.2. Encrypting the training data set X with the key matrix S to obtain the ciphertext data set matrix X_e.
If the data in the training data set X contain decimals, X is amplified by a certain factor l to convert the decimals into integers. Then each column of the training data set X is regarded as an integer vector and the columns are encrypted one by one, obtaining the ciphertext vector set X_e.
The training data set X is encrypted with the vector-based homomorphic encryption scheme VHE: each piece of data x_i (1 ≤ i ≤ t) in X is encrypted with the key matrix S, giving an encrypted ciphertext vector c_i ∈ Z_q^n that satisfies S·c_i = w·x_i + e, where i denotes the index, t the number of vectors in the training data set X, n the length of the ciphertext vector c, p a prime, m the length of x_i, x_i ∈ Z_p^m (the set of vectors of length m over the finite field Z_p), w a large integer parameter, e an error vector with |e| < w/2, S ∈ Z_q^{m×n} the key matrix and q a prime. In possession of the key matrix S, the decryption operation on the ciphertext vector c is the calculation x = ⌈⌈S·c⌋_q / w⌋, where ⌈a⌋_q denotes the representative of a modulo q nearest to zero and ⌈·⌋ rounding to the nearest integer.
S1.3. Calculating the ciphertext conversion matrix M according to the transpose matrix G of the training data set X.
The transpose of the training data set X is denoted G; G is regarded as a linear transformation matrix in VHE, and the key conversion matrix M for the transpose matrix G of the training data set X is calculated.
The calculation of the ciphertext transformation matrix M comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matching adapted integer l and convert each entry c_i (1 ≤ i ≤ n) of the ciphertext vector c into its binary representation, obtaining the bit-decomposed ciphertext vector ĉ = [ĉ_1, …, ĉ_n], where ĉ_i = [ĉ_{i(l−1)}, …, ĉ_{i1}, ĉ_{i0}] and ĉ_{ij} ∈ {−1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* = ĉ, with c* ∈ Z^{nl}, where Z^{nl} denotes the set of vectors of length nl over the integers Z;
S1.3.3. Convert each entry S_ij (1 ≤ i ≤ w, 1 ≤ j ≤ v) of the key matrix S into the vector S*_{ij} = [2^{l−1}S_ij, 2^{l−2}S_ij, …, 2S_ij, S_ij], obtaining the intermediate key matrix S* ∈ Z^{w×vl}, where l denotes the adapted integer, w the number of rows of the key matrix S and v its number of columns;
S1.3.4. Let the key conversion matrix be M, satisfying S′M = S* + E (mod q); this yields the key conversion matrix M ∈ Z^{n′×nl} and the new ciphertext vector c′ = M·c*. Here S′ = [I, T] denotes the new key matrix, with I an identity matrix and T a matrix of randomly chosen entries of suitable dimension; S* is the intermediate key matrix; E ∈ Z^{m×nl} is a random noise matrix; q is a prime; A ∈ Z^{(n′−m)×nl} is a random matrix used in forming M; n′ denotes the length of the new ciphertext vector c′; Z^{n′×nl} and Z^{(n′−m)×nl} denote the sets of matrices of the indicated dimensions over Z.
If we define D = M·c*, then S′D = S*c* + Ec*. Since the entries of c* lie in {−1, 0, 1} and the random noise matrix E is small, the new error vector e′ = Ec* is also small.
The vector-based homomorphic encryption scheme VHE satisfies homomorphic addition and linear transformation.
The specific content of homomorphic addition is as follows: for two plaintext-ciphertext pairs {x_i, c_i} and {x_j, c_j} formed from elements of the training data set X and the corresponding ciphertext vectors, under the same key matrix S it holds that S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j); the new ciphertext vector is c′ = c_i + c_j and the new error vector is e′ = e_i + e_j, where e_i and e_j denote the error vectors of the i-th and j-th elements of the training data set X.
The specific content of linear transformation is as follows: applying an arbitrary matrix B ∈ Z^{m′×m} to data x_i, i.e. computing B·x_i, gives (BS)c = wBx_i + Be_i, where m denotes the length of each piece of data in the training data set X, m′ the length of the linearly transformed data Bx_i, S the key matrix, c the ciphertext vector and e_i the error vector of the i-th element of X. The ciphertext vector c can then be regarded as an encryption of B·x_i under the key matrix BS, and the key conversion matrix M ∈ Z^{(m′+1)×m′l} can be calculated to convert the key matrix BS into S′ ∈ Z^{m′×(m′+1)}, giving the new ciphertext vector c′ = M·c*; here Z^{(m′+1)×m′l} and Z^{m′×(m′+1)} denote the sets of matrices of the indicated dimensions over Z.
S1.4. Sending the data label vector y formed by the labels of each piece of data in the training data set X, the ciphertext data set matrix X_e and the ciphertext conversion matrix M to the server.
Step 2: and the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user.
After receiving the ciphertext data set matrix X_e, the ciphertext conversion matrix M and the data label vector y, the server calculates the linear kernel function, i.e. computes M·X_e, takes the calculation result as the ciphertext linear kernel function matrix K_e, and sends the ciphertext linear kernel function matrix K_e back to the user.
Step 3: The user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server.
After receiving the ciphertext linear kernel function matrix K_e, the user decrypts it with the key matrix S to obtain the plaintext k = Dec(K_e, S). If the training data set X was scaled up by a factor l before encryption, the plaintext k is scaled back down according to the factor l, the result is taken as the plaintext linear kernel function matrix K, and K is sent to the server. If the training data set X was not scaled, the decrypted plaintext k is sent to the server directly as the plaintext linear kernel function matrix K.
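The user-side rescaling bookkeeping can be illustrated in plaintext (the VHE decryption itself is omitted here). Note one reading of "scaled back down according to the factor l": because the linear kernel is bilinear, scaling the data by l scales each kernel entry by l^2, so the division below uses l**2. The factor and data are illustrative assumptions:

```python
import numpy as np

# Fractional training data is scaled to integers by a factor l before
# encryption, so each linear-kernel entry <l*x_i, l*x_j> = l^2 * <x_i, x_j>
# and must be divided by l**2 after decryption.
l = 100
X = np.array([[0.12, 0.34],
              [0.56, 0.78]])
X_int = np.rint(l * X).astype(np.int64)   # integer data that gets encrypted

K_int = X_int @ X_int.T                   # what the server's kernel step yields
K = K_int / l**2                          # user-side reduction
print(np.allclose(K, X @ X.T))            # -> True
```

With l chosen so that l·X is exactly integral, the reduced matrix K equals the kernel of the original fractional data without rounding error.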
Step 4: The server trains on the plaintext linear kernel function matrix using the ciphertext SMO algorithm and returns the training result to the user.
According to the data label vector y and the plaintext linear kernel function matrix K, the server trains on the ciphertext data set matrix X_e; if every piece of data in the training data set X satisfies the KKT conditions, training stops; otherwise, step 4 is repeated.
Training the ciphertext data set matrix X_e comprises the following steps:
S4.1. After the server receives the plaintext linear kernel function matrix K, it initializes a Lagrange coefficient vector α = (α_1, ..., α_t), an offset constant b, and a penalty coefficient C.
S4.2. Select two sample points (x_ei, y_i) and (x_ej, y_j) in the ciphertext data set matrix X_e as adjustment points and compute the corresponding error values E_i = Σ_{k=1..t} α_k y_k K_ik + b − y_i and E_j = Σ_{k=1..t} α_k y_k K_jk + b − y_j, where α_k denotes the value of the kth element in the Lagrange coefficient vector α, y_k denotes the value of the kth element in the data label vector y, b denotes the offset constant, K_ik denotes the value in the ith row and kth column of the plaintext linear kernel matrix K, and K_jk denotes the value in the jth row and kth column of the plaintext linear kernel matrix K.
S4.3. Let the updated elements be α_i^new and α_j^new. Compute η = K_ii + K_jj − 2K_ij and the unclipped update α_j^new = α_j + y_j(E_i − E_j)/η, together with the clipping bounds L and H, where L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i) if y_i ≠ y_j, and L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j) otherwise. If α_j^new > H, then α_j^new = H; if α_j^new < L, then α_j^new = L. Then compute α_i^new = α_i + y_i y_j(α_j − α_j^new). Here α_i and α_j respectively denote the values of the ith and jth elements of the Lagrange coefficient vector α, y_i and y_j respectively denote the values of the ith and jth elements of the data label vector y, C denotes the penalty coefficient, and E_i and E_j respectively denote the error values of the sample points (x_ei, y_i) and (x_ej, y_j).
S4.4. Replace the ith and jth elements of the vector α with α_i^new and α_j^new, respectively.
S4.5. Let the updated offset constant be b_new. Compute the first candidate value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second candidate value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj. If 0 < α_i^new < C, then b_new = b_i; if 0 < α_j^new < C, then b_new = b_j; otherwise b_new = (b_i + b_j)/2. Then replace the value of the offset constant b with b_new. Here K_ii denotes the value in the ith row and ith column of the plaintext linear kernel matrix K, K_ij denotes the value in the ith row and jth column of the plaintext linear kernel matrix K, and K_jj denotes the value in the jth row and jth column of the plaintext linear kernel function matrix K.
S4.6. Determine whether all data sample points in the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.1.
The KKT conditions that each sample point (x_ei, y_i) in the ciphertext data set matrix X_e should satisfy include: 1) α_i = 0 ⇒ y_i f(x_i) ≥ 1; 2) 0 < α_i < C ⇒ y_i f(x_i) = 1; 3) α_i = C ⇒ y_i f(x_i) ≤ 1, where f(x_i) = Σ_{k=1..t} α_k y_k K_ik + b and α_i denotes the value of the ith element in the Lagrange coefficient vector α. If f(x_i) > 0, the data is classified as positive; otherwise it is classified as negative.
After training is finished, the Lagrange coefficient vector α and the updated offset constant b are sent to the user; once the user has received them, the ciphertext SMO algorithm ends.
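The SMO loop of steps S4.1-S4.6 operates only on the plaintext kernel matrix K and the labels y, never on the raw training vectors. The sketch below is a simplified SMO over a precomputed kernel matrix; the random choice of the second adjustment point, the tolerance, and the stopping rule are our assumptions for illustration, not the patent's exact schedule:

```python
import numpy as np

def smo_train(K, y, C=1.0, tol=1e-4, max_passes=20, seed=0):
    """Simplified SMO over a precomputed kernel matrix K (steps S4.1-S4.6)."""
    rng = np.random.default_rng(seed)
    t = len(y)
    alpha, b = np.zeros(t), 0.0                              # S4.1
    passes, iters = 0, 0
    while passes < max_passes and iters < 1000:
        iters += 1
        changed = 0
        for i in range(t):
            Ei = (alpha * y) @ K[i] + b - y[i]               # S4.2 error value
            if (y[i]*Ei < -tol and alpha[i] < C) or (y[i]*Ei > tol and alpha[i] > 0):
                j = int(rng.integers(t - 1)); j += (j >= i)  # second point, j != i
                Ej = (alpha * y) @ K[j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:                             # S4.3 clip bounds
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = K[i, i] + K[j, j] - 2.0*K[i, j]
                if H <= L or eta <= 0:
                    continue
                alpha[j] = np.clip(aj + y[j]*(Ei - Ej)/eta, L, H)
                alpha[i] = ai + y[i]*y[j]*(aj - alpha[j])    # S4.4 replace
                bi = b - Ei - y[i]*(alpha[i]-ai)*K[i, i] - y[j]*(alpha[j]-aj)*K[i, j]
                bj = b - Ej - y[i]*(alpha[i]-ai)*K[i, j] - y[j]*(alpha[j]-aj)*K[j, j]
                if 0 < alpha[i] < C:                         # S4.5 offset update
                    b = bi
                elif 0 < alpha[j] < C:
                    b = bj
                else:
                    b = (bi + bj)/2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0           # S4.6-style stop
    return alpha, b

# Toy linearly separable set; labels follow the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                    # plaintext linear kernel matrix
alpha, b = smo_train(K, y)
print(np.sign((alpha * y) @ K + b))
```

Each update preserves the constraints 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0, which is why only the pair (α_i, α_j) moves in a single step.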
The mathematical form of the SVM model can be written as the dual programming problem:
max_α Σ_{i=1..t} α_i − (1/2) Σ_{i=1..t} Σ_{j=1..t} α_i α_j y_i y_j K(x_i, x_j)
s.t. Σ_{i=1..t} α_i y_i = 0, 0 ≤ α_i ≤ C, i = 1, ..., t.
The SVM can be trained simply by solving this mathematical programming model. Each component of the Lagrange coefficient vector α is the coefficient corresponding to one piece of training data, and the penalty coefficient C limits the range of values that α may take.
Example 2
On the basis of the first embodiment, when processing a polynomial kernel function that contains a linear kernel function, the polynomial kernel function is first split into two parts: the linear kernel function and the nonlinear part. The linear kernel function is computed under ciphertext to obtain a plaintext kernel function table; adding 1 to the values in the table and raising them to the required power under plaintext then completes the computation of the polynomial kernel function. Because the linear kernel function part is computed under ciphertext, the training data set remains secret, and after the add-1 and power operations the server still cannot obtain information about the training data set, so the SVM model with a polynomial kernel function can also be trained under ciphertext.
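A sketch of this split for a polynomial kernel of the form (x_i·x_j + 1)^d, where the degree d and the data are illustrative assumptions; only the linear kernel table would be produced under ciphertext, and the add-1 and power step is plaintext post-processing:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, -1.0]])
d = 3
K_lin = X @ X.T                      # computed under ciphertext in the scheme
K_poly = (K_lin + 1.0) ** d          # plaintext post-processing on the table

# Direct entry-wise evaluation of the polynomial kernel agrees:
direct = np.array([[(xi @ xj + 1.0) ** d for xj in X] for xi in X])
print(np.allclose(K_poly, direct))   # -> True
```

Since the server only ever sees the kernel table and applies a public function to it, no additional information about the individual training vectors is exposed by this step.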
Example 3
On the basis of the first embodiment, when processing a Gaussian kernel function that contains a linear kernel function, the Euclidean distance between any two training data points is needed. Since VHE can compute the distance between two vectors under ciphertext, the computation of the Gaussian kernel function can be split into two parts: the linear kernel function and the Gaussian function. The distance between any two vectors is first computed under ciphertext, and the Gaussian function is then computed under plaintext. Because the distances between vectors are computed in the ciphertext state, the training data remain secret; the server cannot recover the specific values of two vectors from the distance between them, so even though the Gaussian function is computed under plaintext, the server still cannot obtain information about the training data set.
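A sketch of the corresponding split for the Gaussian kernel, where σ and the data are illustrative assumptions: the squared distances come from the linear kernel table via ||x_i − x_j||^2 = K_ii + K_jj − 2K_ij, and the exponential is evaluated in plaintext:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, -1.0]])
sigma = 1.5

K_lin = X @ X.T                                  # ciphertext-domain part
diag = np.diag(K_lin)
sq_dist = diag[:, None] + diag[None, :] - 2*K_lin
K_rbf = np.exp(-sq_dist / (2 * sigma**2))        # plaintext part

# Direct evaluation of exp(-||x_i - x_j||^2 / (2*sigma^2)) agrees:
direct = np.exp(-np.array([[np.sum((xi - xj)**2) for xj in X] for xi in X])
                / (2 * sigma**2))
print(np.allclose(K_rbf, direct))                # -> True
```

The identity used here is exactly why only the linear kernel table needs to leave the ciphertext domain: pairwise squared distances are a linear combination of its entries.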
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. A privacy protection linear SVM model training method based on vector homomorphic encryption is characterized by comprising the following steps:
step 1: a user encrypts a training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to a server;
step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user;
step 3: the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server;
step 4: the server trains on the plaintext linear kernel function matrix using a ciphertext SMO algorithm and returns a training result to the user;
step 1 comprises the following steps:
S1.1. Initialize a key matrix S = [S_11, ..., S_wv], where w denotes the number of rows of the key matrix S and v denotes the number of columns of the key matrix S;
S1.2. Encrypt the training data set X = [x_1, ..., x_t] with the key matrix S to obtain a ciphertext data set matrix X_e, where t denotes the number of vectors in the training data set X;
S1.3. Compute a ciphertext conversion matrix M from the transpose matrix G of the training data set X;
S1.4. Send a data label vector y, formed from the label of each piece of data in the training data set X, the ciphertext data set matrix X_e, and the ciphertext conversion matrix M to the server;
the specific content of step 2 is as follows: the server computes a ciphertext linear kernel function matrix K_e from the ciphertext data set matrix X_e and the ciphertext conversion matrix M, and sends the ciphertext linear kernel function matrix K_e to the user;
the specific content of step 3 is as follows: the user decrypts the ciphertext linear kernel function matrix K_e with the key matrix S to obtain a plaintext linear kernel function matrix K = [K_11, ..., K_mn], and sends the plaintext linear kernel function matrix K to the server;
step 4 comprises the following steps:
S4.1. Initialize a Lagrange coefficient vector α = (α_1, ..., α_t), an offset constant b, and a penalty coefficient C;
S4.2. Select two sample points (x_ei, y_i) and (x_ej, y_j) in the ciphertext data set matrix X_e as adjustment points and compute the corresponding error values E_i = Σ_{k=1..t} α_k y_k K_ik + b − y_i and E_j = Σ_{k=1..t} α_k y_k K_jk + b − y_j, where α_k denotes the value of the kth element in the Lagrange coefficient vector α, y_k denotes the value of the kth element in the data label vector y, b denotes the offset constant, K_ik denotes the value in the ith row and kth column of the plaintext linear kernel matrix K, and K_jk denotes the value in the jth row and kth column of the plaintext linear kernel function matrix K; S4.3. Let the updated elements be α_i^new and α_j^new; compute η = K_ii + K_jj − 2K_ij and α_j^new = α_j + y_j(E_i − E_j)/η, together with the clipping bounds L and H, where L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i) if y_i ≠ y_j, and L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j) otherwise; if α_j^new > H, then α_j^new = H; if α_j^new < L, then α_j^new = L; then compute α_i^new = α_i + y_i y_j(α_j − α_j^new); wherein α_i and α_j respectively denote the values of the ith and jth elements of the Lagrange coefficient vector α, y_i and y_j respectively denote the values of the ith and jth elements of the data label vector y, C denotes the penalty coefficient, and E_i and E_j respectively denote the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. Replace the ith and jth elements of the vector α with α_i^new and α_j^new, respectively;
S4.5. Let the updated offset constant be b_new; compute the first candidate value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second candidate value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj; if 0 < α_i^new < C, then b_new = b_i; if 0 < α_j^new < C, then b_new = b_j; otherwise b_new = (b_i + b_j)/2; then replace the value of the offset constant b with b_new; wherein K_ii denotes the value in the ith row and ith column of the plaintext linear kernel matrix K, K_ij denotes the value in the ith row and jth column of the plaintext linear kernel matrix K, and K_jj denotes the value in the jth row and jth column of the plaintext linear kernel function matrix K;
S4.6. Determine whether all data sample points in the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.1.
2. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 1, wherein the encryption in step S1.2 comprises the following specific contents:
Each piece of data x_i in the training data set X is encrypted with the key matrix S, and the resulting ciphertext vector c_i ∈ Z_q^n satisfies Sc_i = wx_i + e, where i denotes the subscript, 1 ≤ i ≤ t; n denotes the length of the ciphertext vector c; p denotes a prime number; m denotes the length of the data x_i; Z_p^m denotes the set of vectors of length m over the finite field Z_p; w denotes an integer parameter; e denotes an error vector with |e| < w/2; S denotes the key matrix with S ∈ Z_q^{m×n}; q denotes a prime number; Z_q^n denotes the set of vectors of length n over the finite field Z_q; and Z_q^{m×n} denotes the set of matrices of dimension m×n over the finite field Z_q.
3. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 2, wherein the calculation of the ciphertext transformation matrix M in step S1.3 comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matched adaptive integer l, and convert each entry c_i of the ciphertext vector c into its binary representation, obtaining the bit-decomposed ciphertext vector ĉ = [ĉ_1, ..., ĉ_n], where 1 ≤ i ≤ n, ĉ_i = [ĉ_i(l−1), ..., ĉ_i1, ĉ_i0], and ĉ_ij ∈ {−1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* with c* ∈ Z^{nl}, i.e. c* is the concatenation [ĉ_1, ..., ĉ_n] of the bit decompositions, where Z^{nl} denotes the set of vectors of length nl over the finite field Z;
S1.3.3. Convert each entry S_ij of the key matrix S into the vector S*_ij = [2^{l−1}S_ij, 2^{l−2}S_ij, ..., 2S_ij, S_ij] to obtain an intermediate key matrix S* with S* ∈ Z^{w×vl}, where 1 ≤ i ≤ w, 1 ≤ j ≤ v, l denotes the adaptive integer, and Z^{w×vl} denotes the set of matrices of dimension w×vl over the finite field Z;
S1.3.4. Let the key transformation matrix be M, satisfying S′M = S* + E mod q, which gives the key translation matrix M = [S* + E − TA; A] with M ∈ Z^{n′×nl}, and the new ciphertext vector c′ = Mc*; where S′ denotes the new key matrix with S′ = [I, T], I denotes an identity matrix, T denotes an artificially specified matrix with random elements, S* denotes the intermediate key matrix, E denotes a random noise matrix with E ∈ Z^{m×nl}, q denotes a prime number, p denotes a prime number, A ∈ Z^{(n′−m)×nl} denotes a random matrix, n′ denotes the length of the new ciphertext vector c′, Z^{n′×nl} denotes the set of matrices of dimension n′×nl over the finite field Z, and Z^{(n′−m)×nl} denotes the set of matrices of dimension (n′−m)×nl over the finite field Z.
4. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 1, wherein, in step S4.6, the KKT conditions that each sample point (x_ei, y_i) in the ciphertext data set matrix X_e should satisfy include: 1) α_i = 0 ⇒ y_i f(x_i) ≥ 1; 2) 0 < α_i < C ⇒ y_i f(x_i) = 1; 3) α_i = C ⇒ y_i f(x_i) ≤ 1, where f(x_i) = Σ_{k=1..t} α_k y_k K_ik + b and α_i denotes the value of the ith element in the Lagrange coefficient vector α.
5. The method of claim 2, wherein, if the training data set X contains fractional parts, the training data set X needs to be scaled up before step S1.2 is executed so that all data in the training data set X are integers, and in step 3 the decryption result needs to be scaled back down to obtain the plaintext linear kernel function matrix K.
6. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 5, wherein the decryption operation on the ciphertext vector c, under possession of the key matrix S, is the calculation x = ⌈(Sc mod q)/w⌋, where ⌈·⌋ denotes rounding each entry to the nearest integer and Sc is first reduced modulo q.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810317657.9A CN108521326B (en) | 2018-04-10 | 2018-04-10 | Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108521326A CN108521326A (en) | 2018-09-11 |
CN108521326B true CN108521326B (en) | 2021-02-19 |
Family
ID=63431981
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810317657.9A Expired - Fee Related CN108521326B (en) | 2018-04-10 | 2018-04-10 | Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108521326B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108650269A (en) * | 2018-05-16 | 2018-10-12 | 中国科学技术大学 | A kind of graded encryption method and system based on intensified learning |
CN109359588B (en) * | 2018-10-15 | 2021-02-09 | 电子科技大学 | Novel privacy protection non-interactive K nearest neighbor classification method |
CN110059501B (en) * | 2019-04-16 | 2021-02-02 | 广州大学 | Safe outsourcing machine learning method based on differential privacy |
CN110190946B (en) * | 2019-07-12 | 2021-09-03 | 之江实验室 | Privacy protection multi-organization data classification method based on homomorphic encryption |
CN111104968B (en) * | 2019-12-02 | 2023-04-18 | 北京理工大学 | Safety SVM training method based on block chain |
CN111291781B (en) * | 2020-01-09 | 2022-05-27 | 浙江理工大学 | Encrypted image classification method based on support vector machine |
WO2021184346A1 (en) * | 2020-03-20 | 2021-09-23 | 云图技术有限公司 | Private machine learning model generation and training methods, apparatus, and electronic device |
CN111797907B (en) * | 2020-06-16 | 2023-02-03 | 武汉大学 | Safe and efficient SVM privacy protection training and classification method for medical Internet of things |
US11599806B2 (en) | 2020-06-22 | 2023-03-07 | International Business Machines Corporation | Depth-constrained knowledge distillation for inference on encrypted data |
CN112152806B (en) * | 2020-09-25 | 2023-07-18 | 青岛大学 | Cloud-assisted image identification method, device and equipment supporting privacy protection |
CN112669068B (en) * | 2020-12-28 | 2024-05-14 | 深圳前海用友力合科技服务有限公司 | Market research data transmission method and system based on big data |
CN113706323A (en) * | 2021-09-02 | 2021-11-26 | 杭州电子科技大学 | Automatic insurance policy claim settlement method based on zero knowledge proof |
CN115174035A (en) * | 2022-06-30 | 2022-10-11 | 蚂蚁区块链科技(上海)有限公司 | Data processing method and device |
CN116910818B (en) * | 2023-09-13 | 2023-11-21 | 北京数牍科技有限公司 | Data processing method, device, equipment and storage medium based on privacy protection |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105577357A (en) * | 2015-12-21 | 2016-05-11 | 东南大学 | Intelligent household data privacy protection method based on full homomorphic encryption |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105099693B (en) * | 2014-05-23 | 2018-10-19 | 华为技术有限公司 | A kind of transmission method and transmitting device |
WO2017065959A2 (en) * | 2015-09-25 | 2017-04-20 | Veracyte, Inc. | Methods and compositions that utilize transcriptome sequencing data in machine learning-based classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108521326B (en) | Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption | |
US11902413B2 (en) | Secure machine learning analytics using homomorphic encryption | |
CN112989368B (en) | Method and device for processing private data by combining multiple parties | |
CN105122721B (en) | For managing the method and system for being directed to the trustship of encryption data and calculating safely | |
US20200366459A1 (en) | Searching Over Encrypted Model and Encrypted Data Using Secure Single-and Multi-Party Learning Based on Encrypted Data | |
WO2016169346A1 (en) | Polynomial fully homomorphic encryption method and system based on coefficient mapping transform | |
US20150381349A1 (en) | Privacy-preserving ridge regression using masks | |
Liu et al. | Secure multi-label data classification in cloud by additionally homomorphic encryption | |
CN110635909B (en) | Attribute-based collusion attack resistant proxy re-encryption method | |
JP2014126865A (en) | Device and method for encryption processing | |
CN114696990B (en) | Multi-party computing method, system and related equipment based on fully homomorphic encryption | |
CN110611662B (en) | Attribute-based encryption-based fog collaborative cloud data sharing method | |
CN112906052B (en) | Aggregation method of multi-user gradient permutation in federated learning | |
CN112052466B (en) | Support vector machine user data prediction method based on multi-party secure computing protocol | |
CN112766514B (en) | Method, system and device for joint training of machine learning model | |
WO2014007296A1 (en) | Order-preserving encryption system, encryption device, decryption device, encryption method, decryption method, and programs thereof | |
CN113221153A (en) | Graph neural network training method and device, computing equipment and storage medium | |
Sharma et al. | Confidential boosting with random linear classifiers for outsourced user-generated data | |
CN116192358A (en) | Logistic regression method, device and system based on isomorphic encryption | |
WO2014030706A1 (en) | Encrypted database system, client device and server, method and program for adding encrypted data | |
Meng et al. | Privacy-preserving xgboost inference | |
Perusheska et al. | Deep learning-based cryptanalysis of different AES modes of operation | |
CN111737756B (en) | XGB model prediction method, device and system performed through two data owners | |
CN112507372B (en) | Method and device for realizing privacy protection of multi-party collaborative update model | |
CN111859440B (en) | Sample classification method of distributed privacy protection logistic regression model based on mixed protocol |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210219 |