CN108521326B - Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption - Google Patents
- Publication number
- CN108521326B (application CN201810317657.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- H04L9/008: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications involving homomorphic encryption
- G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06V10/96: Management of image or video recognition tasks
Abstract
The invention discloses a privacy protection linear SVM model training method based on vector homomorphic encryption, belonging to the field of information technology security, which comprises the following steps: step 1, a user encrypts a training data set with the vector-based homomorphic encryption scheme VHE and sends the encryption result to a server; step 2, the server computes on the encryption result to obtain a ciphertext linear kernel function matrix and returns it to the user; step 3, the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends it to the server; step 4, the server trains on the plaintext linear kernel function matrix with a ciphertext SMO algorithm and returns the training result to the user.
Description
Technical Field
The invention belongs to the field of information technology security, and particularly relates to a privacy protection linear SVM model training algorithm based on vector homomorphic encryption.
Background
A Support Vector Machine (SVM) is an important model for classification and regression analysis in machine learning. It establishes a quadratic programming model, solves it with the available training data to find an optimal decision boundary, and then uses that boundary to predict the category of new data. The core idea of the SVM is that, given a set of training data with corresponding labels, the training data are regarded as points in a space and a separating interface is sought that divides the training set into two parts, with data of the same category on the same side of the interface and the different categories separated by it. Subject to all training data being correctly classified, the interface is then kept as far from the training data points as possible, which ensures reliability at prediction time. When the SVM model makes a prediction, the trained interface is used to check on which side of the interface the prediction data lies, thereby determining its category.
Compared with other classification algorithms, the SVM has the following advantages:
(1) By establishing a mathematical programming model, the SVM finds an optimal interface, making the classification result as reliable as possible.
(2) The SVM is not only suitable for linear classification but, via the kernel trick, can also be used for nonlinear classification. The SVM also has good robustness: the trained interface is determined only by the support vectors, so adding or deleting training data has little influence on the training result.
(3) The SVM supports small-sample learning: training an SVM model does not require massive data, and a classification model with good performance can be trained from only a small data set.
Because of these advantages, the SVM is commonly used in image recognition, text analysis, and the medical and financial fields, and plays an important role especially in the artificial intelligence applications that have emerged in recent years.
There are many training algorithms for SVM models, of which the most common is the SMO algorithm, proposed specifically for the characteristics of the SVM model; compared with general SVM training algorithms it trains faster and needs less space. When the training data set is very large, training the SVM can take a great deal of time, and a user generally chooses to train the model on a cloud platform. However, the cloud platform itself is not necessarily trusted. Currently many service providers use cloud servers offered by cloud platforms such as Alibaba Cloud, Tencent Cloud and Amazon Cloud, and these cloud platform providers can monitor the servers they provide, thereby obtaining the private data of enterprises leasing the cloud services. On the other hand, from the user's perspective, to use services on the Internet such as image recognition and text analysis, local personal data must be uploaded to the cloud; once uploaded, the data are visible to the service provider, which can easily put them to other uses, such as buying and selling user information, without the user being able to prevent it. The cloud is therefore not fully trusted, and the user cannot know what is actually done with the data. In particular, existing SVM training algorithms do not consider privacy: during training the data set is stored in plaintext on the computer, so an untrusted cloud can easily obtain the training data set during training and leak the user's privacy.
Disclosure of Invention
The invention aims to solve the problem that, when an SVM model is trained on a cloud platform, the training data are not kept private and the user's privacy is therefore leaked; to this end a privacy protection linear SVM model training algorithm based on vector homomorphic encryption is provided.
The technical scheme adopted by the invention is as follows:
a privacy-protecting linear SVM model training algorithm based on vector homomorphic encryption comprises the following steps:
step 1: a user encrypts a training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to a server;
step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user;
step 3: the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server;
step 4: the server trains the plaintext linear kernel function matrix by adopting a ciphertext SMO algorithm and returns a training result to the user.
Further, the step 1 comprises the following steps:
S1.1. Initializing a key matrix S = [S_11, …, S_wv];
S1.2. Encrypting the training data set X = [x_1, …, x_t] with the key matrix S to obtain a ciphertext data set matrix X_e;
S1.3. Calculating a ciphertext conversion matrix M according to the transpose matrix G of the training data set X;
S1.4. Sending the data label vector y formed by the labels of each piece of data in the training data set X, the ciphertext data set matrix X_e and the ciphertext conversion matrix M to the server.
Further, the encryption in step S1.2 proceeds as follows:
Each piece of data x_i (1 ≤ i ≤ t) in the training data set X is encrypted with the key matrix S, giving an encrypted ciphertext vector c_i ∈ Z_q^n that satisfies S·c_i = w·x_i + e, where i denotes the index, t the number of vectors in the training data set X, n the length of the ciphertext vector c, p a prime, m the length of x_i, x_i ∈ Z_p^m (the set of vectors of length m over the finite field Z_p), w an integer parameter, e an error vector with |e| < w/2, S ∈ Z_q^{m×n} the key matrix and q a prime; Z_q^n denotes the set of vectors of length n over the finite field Z_q and Z_q^{m×n} the set of m×n matrices over Z_q.
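For concreteness, the relation S·c_i = w·x_i + e and the matching decryption can be sketched in code. The sketch below uses a toy key of the form S = [I, T] (the shape of the new key S′ later in the scheme, which makes a valid ciphertext easy to construct); the modulus q, the parameter w and all function names are illustrative assumptions, not part of the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
q, w = 999983, 1000          # toy prime modulus q and integer parameter w

def keygen(m, extra):
    # toy key of the form S = [I | T], so that S @ c = c1 + T @ a
    T = rng.integers(0, q, size=(m, extra), dtype=np.int64)
    return np.hstack([np.eye(m, dtype=np.int64), T])

def encrypt(S, x):
    m = S.shape[0]
    T = S[:, m:]
    a = rng.integers(0, q, size=S.shape[1] - m, dtype=np.int64)
    e = rng.integers(-1, 2, size=m)            # small error, |e| << w/2
    c1 = (w * x + e - T @ a) % q               # ensures S @ c = w*x + e (mod q)
    return np.concatenate([c1, a])

def decrypt(S, c):
    v = (S @ c) % q
    v = np.where(v > q // 2, v - q, v)         # centered representative mod q
    return np.rint(v / w).astype(np.int64)     # nearest integer of (S c)/w

S = keygen(3, 2)
x = np.array([3, -2, 5])
c = encrypt(S, x)
```

Because the error is additive and small, the same code also exhibits the homomorphic-addition property S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j) described later in the scheme.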
Further, the calculation of the ciphertext conversion matrix M in step S1.3 comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matching integer l and convert each entry c_i (1 ≤ i ≤ n) of the ciphertext vector c into its binary representation, obtaining the bit-decomposed ciphertext vector ĉ = [ĉ_1, …, ĉ_n], where ĉ_i = [ĉ_{i(l−1)}, …, ĉ_{i1}, ĉ_{i0}] and ĉ_{ij} ∈ {−1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* = ĉ, with c* ∈ Z^{nl}, where Z^{nl} denotes the set of vectors of length nl over the integers Z;
S1.3.3. Convert each entry S_ij (1 ≤ i ≤ w, 1 ≤ j ≤ v) of the key matrix S into the vector S*_{ij} = [2^{l−1}S_ij, 2^{l−2}S_ij, …, 2S_ij, S_ij], obtaining the intermediate key matrix S* ∈ Z^{w×vl}, where l denotes the adapted integer, w the number of rows of the key matrix S and v its number of columns;
S1.3.4. Let the key conversion matrix be M, satisfying S′M = S* + E (mod q); this yields the key conversion matrix M ∈ Z^{n′×nl} and the new ciphertext vector c′ = M·c*. Here S′ = [I, T] denotes the new key matrix, with I an identity matrix and T a matrix of randomly chosen entries of suitable dimension; S* is the intermediate key matrix; E ∈ Z^{m×nl} is a random noise matrix; q is a prime; A ∈ Z^{(n′−m)×nl} is a random matrix used in forming M; n′ denotes the length of the new ciphertext vector c′; Z^{n′×nl} and Z^{(n′−m)×nl} denote the sets of matrices of the indicated dimensions over Z.
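Steps S1.3.1 to S1.3.3 rest on the exact identity S*·c* = S·c: bit-decomposing the ciphertext and expanding the key by powers of two leave the product unchanged. A minimal sketch of that identity follows; the function names and the signed-digit convention are assumptions:

```python
import numpy as np

def bit_decompose(c, l):
    # S1.3.1/S1.3.2: expand each entry of c into l signed digits in {-1, 0, 1},
    # most significant first, so that sum_k 2^k * digit_k rebuilds the entry
    digits = []
    for ci in np.asarray(c, dtype=np.int64):
        sign = -1 if ci < 0 else 1
        digits.extend(sign * ((abs(int(ci)) >> k) & 1)
                      for k in range(l - 1, -1, -1))
    return np.array(digits, dtype=np.int64)

def expand_key(S, l):
    # S1.3.3: replace each entry S_ij by [2^(l-1)*S_ij, ..., 2*S_ij, S_ij]
    cols = []
    for j in range(S.shape[1]):
        for k in range(l - 1, -1, -1):
            cols.append((1 << k) * S[:, j])
    return np.stack(cols, axis=1)

rng = np.random.default_rng(1)
l = 8
S = rng.integers(-50, 50, size=(3, 4)).astype(np.int64)   # toy key matrix
c = rng.integers(-(2**l) + 1, 2**l, size=4)               # |c| < 2^l as in S1.3.1
```

The key conversion matrix M of step S1.3.4 is then built so that decrypting M·c* under the new key S′ reproduces S*·c* up to the small noise E·c*.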
Further, the specific content of step 2 is: the server calculates the ciphertext linear kernel function matrix K_e from the ciphertext data set matrix X_e and the ciphertext conversion matrix M, and sends the ciphertext linear kernel function matrix K_e to the user.
Further, the specific content of step 3 is: the user decrypts the ciphertext linear kernel function matrix K_e with the key matrix S to obtain the plaintext linear kernel function matrix K = [K_11, …, K_mn], and sends the plaintext linear kernel function matrix K to the server.
Further, the step 4 comprises the following steps:
S4.1. Initializing the Lagrange coefficient vector α = (α_1, …, α_t), the offset constant b and the penalty coefficient C;
S4.2. Selecting two sample points (x_ei, y_i) and (x_ej, y_j) from the ciphertext data set matrix X_e as adjustment points and calculating the corresponding error values E_i = Σ_k α_k y_k K_ik + b − y_i and E_j = Σ_k α_k y_k K_jk + b − y_j, where α_k denotes the k-th element of the Lagrange coefficient vector α, y_k the k-th element of the data label vector y, b the offset constant, K_ik the entry in row i and column k of the plaintext linear kernel function matrix K and K_jk the entry in row j and column k of K;
S4.3. Let η = K_ii + K_jj − 2K_ij and compute the updated element α_j^new = α_j + y_j(E_i − E_j)/η, clipped to the feasible interval [L, H]: if y_i ≠ y_j then L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i); if y_i = y_j then L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j). Here α_i and α_j denote the i-th and j-th elements of the Lagrange coefficient vector α, y_i and y_j the i-th and j-th elements of the data label vector y, C the penalty coefficient, and E_i and E_j the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. Compute α_i^new = α_i + y_i y_j(α_j − α_j^new) and replace the i-th and j-th elements of the vector α with α_i^new and α_j^new respectively;
S4.5. Let the new offset constant be b^new; calculate the first pending value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second pending value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj. If 0 < α_i^new < C then b^new = b_i; if 0 < α_j^new < C then b^new = b_j; otherwise b^new = (b_i + b_j)/2. Replace the value of the offset constant b with b^new; here K_ii, K_ij and K_jj denote the corresponding entries of the plaintext linear kernel function matrix K;
S4.6. Judging whether all data sample points of the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.2.
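Since the pair update in steps S4.2 to S4.5 touches only the plaintext kernel matrix K, the label vector y and the multipliers, it can be sketched as ordinary plaintext SMO code. The function name, the clipping box [L, H] and the degenerate-pair handling below are illustrative assumptions drawn from standard SMO, not the patent's exact text:

```python
import numpy as np

def smo_pair_update(K, y, alpha, b, i, j, C):
    # One working-pair update, the plaintext form of steps S4.2-S4.5
    f = K @ (alpha * y) + b                    # decision values f(x_k)
    Ei, Ej = f[i] - y[i], f[j] - y[j]          # error values at the two points
    eta = K[i, i] + K[j, j] - 2 * K[i, j]
    if eta <= 0:
        return alpha, b                        # degenerate pair: skip update
    # feasible box [L, H] keeps 0 <= alpha <= C and sum(alpha * y) constant
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    aj = float(np.clip(alpha[j] + y[j] * (Ei - Ej) / eta, L, H))
    ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
    # two pending bias values; keep the one whose multiplier stays strictly
    # inside (0, C), otherwise use their average
    bi = b - Ei - y[i] * (ai - alpha[i]) * K[i, i] - y[j] * (aj - alpha[j]) * K[i, j]
    bj = b - Ej - y[i] * (ai - alpha[i]) * K[i, j] - y[j] * (aj - alpha[j]) * K[j, j]
    if 0 < ai < C:
        b_new = bi
    elif 0 < aj < C:
        b_new = bj
    else:
        b_new = (bi + bj) / 2
    out = alpha.copy()
    out[i], out[j] = ai, aj
    return out, b_new

X = np.array([[1.0], [2.0], [-1.0], [-2.0]])   # toy 1-D training data
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                    # plaintext linear kernel matrix
alpha, b = smo_pair_update(K, y, np.zeros(4), 0.0, 0, 2, C=1.0)
```

One call on this toy data set moves both selected multipliers off zero while preserving the constraint Σ_k α_k y_k = 0 and the box 0 ≤ α_k ≤ C.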
Further, the KKT conditions that each sample point (x_ei, y_i) of the ciphertext data set matrix X_e should meet include: 1) α_i = 0 ⇒ y_i f(x_ei) ≥ 1; 2) 0 < α_i < C ⇒ y_i f(x_ei) = 1; 3) α_i = C ⇒ y_i f(x_ei) ≤ 1, where f(x_ei) = Σ_k α_k y_k K_ik + b and α_i denotes the i-th element of the Lagrange coefficient vector α.
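A per-sample check of the three conditions can be sketched as follows, writing f_i for the decision value Σ_k α_k y_k K_ik + b; the function name and the tolerance `tol` are assumptions added for floating-point comparison:

```python
def kkt_satisfied(alpha_i, y_i, f_i, C, tol=1e-3):
    # f_i = sum_k alpha_k * y_k * K_ik + b is the decision value at sample i
    m = y_i * f_i
    if alpha_i <= tol:           # condition 1): alpha_i == 0
        return m >= 1 - tol
    if alpha_i >= C - tol:       # condition 3): alpha_i == C
        return m <= 1 + tol
    return abs(m - 1) <= tol     # condition 2): 0 < alpha_i < C
```

Training stops once every sample point passes this check; otherwise another pair of adjustment points is selected.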
Further, if the training data set X contains decimals, before step S1.2 is executed the training data set X is first amplified (scaled by an integer factor) so that all its data are integers; correspondingly, in step 3 the decryption result is reduced by the same factor to obtain the plaintext linear kernel function matrix K.
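A minimal sketch of the amplification and reduction, assuming a factor of 100 and using the fact that each entry of the linear kernel matrix is an inner product of two scaled vectors (hence is reduced by the squared factor):

```python
import numpy as np

scale = 100                                   # assumed amplification factor
X = np.array([[0.25, 1.5],
              [2.0, -0.75]])
X_int = np.rint(X * scale).astype(np.int64)   # integer data suitable for VHE
K_scaled = X_int @ X_int.T                    # linear kernel on the scaled data
K = K_scaled / scale**2                       # reduce: each entry was scaled twice
```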
Further, in the case of possession of the key matrix S, the decryption operation on the ciphertext vector c is the calculation x = ⌈⌈S·c⌋_q / w⌋, where ⌈a⌋_q denotes the representative of a modulo q nearest to zero and ⌈·⌋ rounding to the nearest integer.
Further, the vector-based homomorphic encryption scheme satisfies homomorphic addition and linear transformation.
The specific content of homomorphic addition is as follows: for two plaintext-ciphertext pairs {x_i, c_i} and {x_j, c_j} formed from elements of the training data set X and the corresponding ciphertext vectors, under the same key matrix S it holds that S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j); the new ciphertext vector is c′ = c_i + c_j and the new error vector is e′ = e_i + e_j, where e_i and e_j denote the error vectors of the i-th and j-th elements of the training data set X.
The specific content of linear transformation is as follows: applying an arbitrary matrix B ∈ Z^{m′×m} to data x_i, i.e. computing B·x_i, gives (BS)c = wBx_i + Be_i, where m denotes the length of each piece of data in the training data set X, m′ the length of the linearly transformed data Bx_i, S the key matrix, c the ciphertext vector and e_i the error vector of the i-th element of X. The ciphertext vector c can then be regarded as an encryption of B·x_i under the key matrix BS, and the key conversion matrix M ∈ Z^{(m′+1)×m′l} can be calculated to convert the key matrix BS into S′ ∈ Z^{m′×(m′+1)}, giving the new ciphertext vector c′ = M·c*; here Z^{(m′+1)×m′l} and Z^{m′×(m′+1)} denote the sets of matrices of the indicated dimensions over Z.
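The property (BS)c = wBx_i + Be_i can be checked numerically with a toy key of the [I, T] form (the form of the new key S′ in the scheme): a ciphertext c with S·c = w·x + e (mod q) decrypts to B·x under the key B·S, as long as |B·e| stays below w/2. All parameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
q, w, m = 999983, 1000, 3
T = rng.integers(0, q, size=(m, 2), dtype=np.int64)
S = np.hstack([np.eye(m, dtype=np.int64), T])           # key of the form [I | T]
x = np.array([2, -1, 4])
a = rng.integers(0, q, size=2, dtype=np.int64)
e = rng.integers(-1, 2, size=m)
c = np.concatenate([(w * x + e - T @ a) % q, a])        # S c = w x + e (mod q)

B = np.array([[1, 2, 0],
              [0, -1, 3]])                              # arbitrary integer matrix
v = (B @ S @ c) % q                                     # (B S) c = w B x + B e
v = np.where(v > q // 2, v - q, v)                      # centered representative
Bx = np.rint(v / w).astype(np.int64)                    # decrypt under key B S
```

With G = X^T as the matrix B, this is exactly how the linear kernel computation is pushed under the ciphertext in steps 2 and 3.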
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. In the invention, under the VHE homomorphic encryption scheme, the training data set is converted into a ciphertext data set, and the linear kernel function calculation of the original SMO algorithm is converted into the ciphertext linear transformation operation of VHE, so that the kernel function is computed under ciphertext. All operations in the SMO algorithm that involve the training data set are expressed through the kernel function, so only the confidentiality of the linear kernel function needs to be ensured; the other operations in SMO cannot leak information about the training data set. Therefore all information in the training data set remains encrypted throughout training, and nobody except the encrypting party who owns the key, i.e. the data owner, can obtain any information about the original data. During the SMO calculation the server side only obtains the ciphertext data set, so the cloud cannot deduce any valuable information about the original data from the ciphertexts it holds, and the privacy of the data owner is fully protected.
2. In the invention, the VHE homomorphic encryption algorithm only encrypts integer vectors. To make the training data set meet this encryption requirement, when the training data set contains decimal data the data are uniformly amplified before encryption so that values with fractional parts become integers, after which the encryption and ciphertext linear transformation operations are carried out. After the linear transformation, the decrypted plaintext is reduced according to the previous amplification factor, restoring the data to the correct values, i.e. the plaintext linear kernel function matrix. Provided the calculation precision is ensured, the result of the ciphertext kernel function calculation is consistent with that of the plaintext kernel function calculation, i.e. the training result of ciphertext SMO is the same as that of plaintext SMO.
3. In the invention, all operations involving the training data set can be converted into kernel function calculations, and in the ciphertext SMO algorithm all kernel-related calculations are gathered into one matrix product, so that all kernel function values are computed at once to obtain a kernel function matrix whose entries are then used directly; moreover, the vector calculation efficiency of the VHE homomorphic encryption algorithm is higher than that of other homomorphic encryption algorithms. Since the kernel functions are computed in one batch, encryption, ciphertext calculation and decryption are each performed only once, which keeps the time overhead of the ciphertext SMO algorithm small.
4. In the invention, the user and the server interact three times. First, the user sends the data label vector, the ciphertext data set matrix and the ciphertext conversion matrix to the server, with communication space complexity approximately O(mn). Second, the server returns the ciphertext linear kernel function matrix to the user, who decrypts it and uploads the decrypted plaintext linear kernel function matrix to the server, with communication space complexity approximately O(m²). Third, the server sends the trained Lagrange coefficient vector and the offset constant to the user, with communication space complexity approximately O(m). In the case of more data dimensions, i.e. n > m, the total communication space complexity is O(n); in the case of a large amount of data, i.e. m > n, it is O(m²); if the data dimension and the amount of data are comparable, it is O(m² + mn). The communication space complexity is polynomial, which improves the efficiency of training.
5. In the invention, by setting reasonable VHE system parameters, the conversion of the kernel function calculation into the VHE ciphertext linear transformation by the ciphertext SMO algorithm guarantees the correctness of the kernel function calculation, while computing the kernel function matrix under ciphertext guarantees the confidentiality of the training data, so the training data can be classified correctly. Compared with other homomorphic encryption schemes, under optimized parameters the algorithm's running time grows more slowly and it is faster.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A privacy protection linear SVM model training algorithm based on vector homomorphic encryption, in which the training data set X is a matrix formed by t z-dimensional vectors; each piece of data of the training data set X has a label value indicating its category, and these label values, arranged in the order of the data in X, form the data label vector y = [y_1, …, y_t]. The linear SVM model training algorithm comprises the following steps:
step 1: and the user encrypts the training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to the server. The method comprises the following steps:
S1.1. Initializing a key matrix S = [S_11, …, S_wv].
The user first determines the number of rows and columns of the key matrix S and then randomly generates the values in the matrix.
S1.2. Encrypting the training data set X with the key matrix S to obtain the ciphertext data set matrix X_e.
If the data in the training data set X contain decimals, X is amplified by a certain factor l to convert the decimals into integers. Then each column of the training data set X is regarded as an integer vector and the columns are encrypted one by one, obtaining the ciphertext vector set X_e.
The training data set X is encrypted with the vector-based homomorphic encryption scheme VHE: each piece of data x_i (1 ≤ i ≤ t) in X is encrypted with the key matrix S, giving an encrypted ciphertext vector c_i ∈ Z_q^n that satisfies S·c_i = w·x_i + e, where i denotes the index, t the number of vectors in the training data set X, n the length of the ciphertext vector c, p a prime, m the length of x_i, x_i ∈ Z_p^m (the set of vectors of length m over the finite field Z_p), w a large integer parameter, e an error vector with |e| < w/2, S ∈ Z_q^{m×n} the key matrix and q a prime. In possession of the key matrix S, the decryption operation on the ciphertext vector c is the calculation x = ⌈⌈S·c⌋_q / w⌋, where ⌈a⌋_q denotes the representative of a modulo q nearest to zero and ⌈·⌋ rounding to the nearest integer.
S1.3. Calculating the ciphertext conversion matrix M according to the transpose matrix G of the training data set X.
The transpose of the training data set X is denoted G; G is regarded as a linear transformation matrix in VHE, and the key conversion matrix M for the transpose matrix G of the training data set X is calculated.
The calculation of the ciphertext transformation matrix M comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matching adapted integer l and convert each entry c_i (1 ≤ i ≤ n) of the ciphertext vector c into its binary representation, obtaining the bit-decomposed ciphertext vector ĉ = [ĉ_1, …, ĉ_n], where ĉ_i = [ĉ_{i(l−1)}, …, ĉ_{i1}, ĉ_{i0}] and ĉ_{ij} ∈ {−1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* = ĉ, with c* ∈ Z^{nl}, where Z^{nl} denotes the set of vectors of length nl over the integers Z;
S1.3.3. Convert each entry S_ij (1 ≤ i ≤ w, 1 ≤ j ≤ v) of the key matrix S into the vector S*_{ij} = [2^{l−1}S_ij, 2^{l−2}S_ij, …, 2S_ij, S_ij], obtaining the intermediate key matrix S* ∈ Z^{w×vl}, where l denotes the adapted integer, w the number of rows of the key matrix S and v its number of columns;
S1.3.4. Let the key conversion matrix be M, satisfying S′M = S* + E (mod q); this yields the key conversion matrix M ∈ Z^{n′×nl} and the new ciphertext vector c′ = M·c*. Here S′ = [I, T] denotes the new key matrix, with I an identity matrix and T a matrix of randomly chosen entries of suitable dimension; S* is the intermediate key matrix; E ∈ Z^{m×nl} is a random noise matrix; q is a prime; A ∈ Z^{(n′−m)×nl} is a random matrix used in forming M; n′ denotes the length of the new ciphertext vector c′; Z^{n′×nl} and Z^{(n′−m)×nl} denote the sets of matrices of the indicated dimensions over Z.
If we define D = M·c*, then S′D = S*c* + Ec*. Since the entries of c* lie in {−1, 0, 1} and the random noise matrix E is small, the new error vector e′ = Ec* is also small.
The vector-based homomorphic encryption scheme VHE satisfies homomorphic addition and linear transformation.
The specific content of homomorphic addition is as follows: for two plaintext-ciphertext pairs {x_i, c_i} and {x_j, c_j} formed from elements of the training data set X and the corresponding ciphertext vectors, under the same key matrix S it holds that S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j); the new ciphertext vector is c′ = c_i + c_j and the new error vector is e′ = e_i + e_j, where e_i and e_j denote the error vectors of the i-th and j-th elements of the training data set X.
The specific content of linear transformation is as follows: applying an arbitrary matrix B ∈ Z^{m′×m} to data x_i, i.e. computing B·x_i, gives (BS)c = wBx_i + Be_i, where m denotes the length of each piece of data in the training data set X, m′ the length of the linearly transformed data Bx_i, S the key matrix, c the ciphertext vector and e_i the error vector of the i-th element of X. The ciphertext vector c can then be regarded as an encryption of B·x_i under the key matrix BS, and the key conversion matrix M ∈ Z^{(m′+1)×m′l} can be calculated to convert the key matrix BS into S′ ∈ Z^{m′×(m′+1)}, giving the new ciphertext vector c′ = M·c*; here Z^{(m′+1)×m′l} and Z^{m′×(m′+1)} denote the sets of matrices of the indicated dimensions over Z.
S1.4. Sending the data label vector y formed by the labels of each piece of data in the training data set X, the ciphertext data set matrix X_e and the ciphertext conversion matrix M to the server.
Step 2: and the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user.
After receiving the ciphertext data set matrix X_e, the ciphertext conversion matrix M and the data label vector y, the server calculates the linear kernel function, i.e. computes M·X_e, takes the calculation result as the ciphertext linear kernel function matrix K_e, and sends the ciphertext linear kernel function matrix K_e back to the user.
Step 3: The user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server.
After receiving the ciphertext linear kernel function matrix K_e, the user decrypts it with the key matrix S to obtain the plaintext k = Dec(K_e, S). If the training data set X was scaled up by a factor l before encryption, the plaintext k is scaled back down according to the factor l, the result is taken as the plaintext linear kernel function matrix K, and K is sent to the server. If the training data set X was not scaled, the decrypted plaintext k is sent to the server directly as the plaintext linear kernel function matrix K.
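The user-side rescaling bookkeeping can be illustrated in plaintext (the VHE decryption itself is omitted here). Note one reading of "scaled back down according to the factor l": because the linear kernel is bilinear, scaling the data by l scales each kernel entry by l^2, so the division below uses l**2. The factor and data are illustrative assumptions:

```python
import numpy as np

# Fractional training data is scaled to integers by a factor l before
# encryption, so each linear-kernel entry <l*x_i, l*x_j> = l^2 * <x_i, x_j>
# and must be divided by l**2 after decryption.
l = 100
X = np.array([[0.12, 0.34],
              [0.56, 0.78]])
X_int = np.rint(l * X).astype(np.int64)   # integer data that gets encrypted

K_int = X_int @ X_int.T                   # what the server's kernel step yields
K = K_int / l**2                          # user-side reduction
print(np.allclose(K, X @ X.T))            # -> True
```

With l chosen so that l·X is exactly integral, the reduced matrix K equals the kernel of the original fractional data without rounding error.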
Step 4: The server trains on the plaintext linear kernel function matrix using the ciphertext SMO algorithm and returns the training result to the user.
According to the data label vector y and the plaintext linear kernel function matrix K, the server trains on the ciphertext data set matrix X_e; if every piece of data in the training data set X satisfies the KKT conditions, training stops; otherwise, step 4 is repeated.
Training the ciphertext data set matrix X_e comprises the following steps:
S4.1. After the server receives the plaintext linear kernel function matrix K, it initializes a Lagrange coefficient vector α = (α_1, ..., α_t), an offset constant b, and a penalty coefficient C.
S4.2. Select two sample points (x_ei, y_i) and (x_ej, y_j) in the ciphertext data set matrix X_e as adjustment points and compute the corresponding error values E_i = Σ_{k=1..t} α_k y_k K_ik + b − y_i and E_j = Σ_{k=1..t} α_k y_k K_jk + b − y_j, where α_k denotes the value of the kth element in the Lagrange coefficient vector α, y_k denotes the value of the kth element in the data label vector y, b denotes the offset constant, K_ik denotes the value in the ith row and kth column of the plaintext linear kernel matrix K, and K_jk denotes the value in the jth row and kth column of the plaintext linear kernel matrix K.
S4.3. Let the updated elements be α_i^new and α_j^new. Compute η = K_ii + K_jj − 2K_ij and the unclipped update α_j^new = α_j + y_j(E_i − E_j)/η, together with the clipping bounds L and H, where L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i) if y_i ≠ y_j, and L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j) otherwise. If α_j^new > H, then α_j^new = H; if α_j^new < L, then α_j^new = L. Then compute α_i^new = α_i + y_i y_j(α_j − α_j^new). Here α_i and α_j respectively denote the values of the ith and jth elements of the Lagrange coefficient vector α, y_i and y_j respectively denote the values of the ith and jth elements of the data label vector y, C denotes the penalty coefficient, and E_i and E_j respectively denote the error values of the sample points (x_ei, y_i) and (x_ej, y_j).
S4.4. Replace the ith and jth elements of the vector α with α_i^new and α_j^new, respectively.
S4.5. Let the updated offset constant be b_new. Compute the first candidate value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second candidate value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj. If 0 < α_i^new < C, then b_new = b_i; if 0 < α_j^new < C, then b_new = b_j; otherwise b_new = (b_i + b_j)/2. Then replace the value of the offset constant b with b_new. Here K_ii denotes the value in the ith row and ith column of the plaintext linear kernel matrix K, K_ij denotes the value in the ith row and jth column of the plaintext linear kernel matrix K, and K_jj denotes the value in the jth row and jth column of the plaintext linear kernel function matrix K.
S4.6. Determine whether all data sample points in the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.1.
The KKT conditions that each sample point (x_ei, y_i) in the ciphertext data set matrix X_e should satisfy include: 1) α_i = 0 ⇒ y_i f(x_i) ≥ 1; 2) 0 < α_i < C ⇒ y_i f(x_i) = 1; 3) α_i = C ⇒ y_i f(x_i) ≤ 1, where f(x_i) = Σ_{k=1..t} α_k y_k K_ik + b and α_i denotes the value of the ith element in the Lagrange coefficient vector α. If f(x_i) > 0, the data is classified as positive; otherwise it is classified as negative.
After training is finished, the Lagrange coefficient vector α and the updated offset constant b are sent to the user; once the user has received them, the ciphertext SMO algorithm ends.
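The SMO loop of steps S4.1-S4.6 operates only on the plaintext kernel matrix K and the labels y, never on the raw training vectors. The sketch below is a simplified SMO over a precomputed kernel matrix; the random choice of the second adjustment point, the tolerance, and the stopping rule are our assumptions for illustration, not the patent's exact schedule:

```python
import numpy as np

def smo_train(K, y, C=1.0, tol=1e-4, max_passes=20, seed=0):
    """Simplified SMO over a precomputed kernel matrix K (steps S4.1-S4.6)."""
    rng = np.random.default_rng(seed)
    t = len(y)
    alpha, b = np.zeros(t), 0.0                              # S4.1
    passes, iters = 0, 0
    while passes < max_passes and iters < 1000:
        iters += 1
        changed = 0
        for i in range(t):
            Ei = (alpha * y) @ K[i] + b - y[i]               # S4.2 error value
            if (y[i]*Ei < -tol and alpha[i] < C) or (y[i]*Ei > tol and alpha[i] > 0):
                j = int(rng.integers(t - 1)); j += (j >= i)  # second point, j != i
                Ej = (alpha * y) @ K[j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:                             # S4.3 clip bounds
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = K[i, i] + K[j, j] - 2.0*K[i, j]
                if H <= L or eta <= 0:
                    continue
                alpha[j] = np.clip(aj + y[j]*(Ei - Ej)/eta, L, H)
                alpha[i] = ai + y[i]*y[j]*(aj - alpha[j])    # S4.4 replace
                bi = b - Ei - y[i]*(alpha[i]-ai)*K[i, i] - y[j]*(alpha[j]-aj)*K[i, j]
                bj = b - Ej - y[i]*(alpha[i]-ai)*K[i, j] - y[j]*(alpha[j]-aj)*K[j, j]
                if 0 < alpha[i] < C:                         # S4.5 offset update
                    b = bi
                elif 0 < alpha[j] < C:
                    b = bj
                else:
                    b = (bi + bj)/2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0           # S4.6-style stop
    return alpha, b

# Toy linearly separable set; labels follow the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                    # plaintext linear kernel matrix
alpha, b = smo_train(K, y)
print(np.sign((alpha * y) @ K + b))
```

Each update preserves the constraints 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0, which is why only the pair (α_i, α_j) moves in a single step.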
The mathematical form of the SVM model can be written as the dual programming problem:
max_α Σ_{i=1..t} α_i − (1/2) Σ_{i=1..t} Σ_{j=1..t} α_i α_j y_i y_j K(x_i, x_j)
s.t. Σ_{i=1..t} α_i y_i = 0, 0 ≤ α_i ≤ C, i = 1, ..., t.
The SVM can be trained simply by solving this mathematical programming model. Each component of the Lagrange coefficient vector α is the coefficient corresponding to one piece of training data, and the penalty coefficient C limits the range of values that α may take.
Example 2
On the basis of the first embodiment, when processing a polynomial kernel function that contains a linear kernel function, the polynomial kernel function is first split into two parts: the linear kernel function and the nonlinear part. The linear kernel function is computed under ciphertext to obtain a plaintext kernel function table; adding 1 to the values in the table and raising them to the required power under plaintext then completes the computation of the polynomial kernel function. Because the linear kernel function part is computed under ciphertext, the training data set remains secret, and after the add-1 and power operations the server still cannot obtain information about the training data set, so the SVM model with a polynomial kernel function can also be trained under ciphertext.
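A sketch of this split for a polynomial kernel of the form (x_i·x_j + 1)^d, where the degree d and the data are illustrative assumptions; only the linear kernel table would be produced under ciphertext, and the add-1 and power step is plaintext post-processing:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, -1.0]])
d = 3
K_lin = X @ X.T                      # computed under ciphertext in the scheme
K_poly = (K_lin + 1.0) ** d          # plaintext post-processing on the table

# Direct entry-wise evaluation of the polynomial kernel agrees:
direct = np.array([[(xi @ xj + 1.0) ** d for xj in X] for xi in X])
print(np.allclose(K_poly, direct))   # -> True
```

Since the server only ever sees the kernel table and applies a public function to it, no additional information about the individual training vectors is exposed by this step.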
Example 3
On the basis of the first embodiment, when processing a Gaussian kernel function that contains a linear kernel function, the Euclidean distance between any two training data points is needed. Since VHE can compute the distance between two vectors under ciphertext, the computation of the Gaussian kernel function can be split into two parts: the linear kernel function and the Gaussian function. The distance between any two vectors is first computed under ciphertext, and the Gaussian function is then computed under plaintext. Because the distances between vectors are computed in the ciphertext state, the training data remain secret; the server cannot recover the specific values of two vectors from the distance between them, so even though the Gaussian function is computed under plaintext, the server still cannot obtain information about the training data set.
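A sketch of the corresponding split for the Gaussian kernel, where σ and the data are illustrative assumptions: the squared distances come from the linear kernel table via ||x_i − x_j||^2 = K_ii + K_jj − 2K_ij, and the exponential is evaluated in plaintext:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, -1.0]])
sigma = 1.5

K_lin = X @ X.T                                  # ciphertext-domain part
diag = np.diag(K_lin)
sq_dist = diag[:, None] + diag[None, :] - 2*K_lin
K_rbf = np.exp(-sq_dist / (2 * sigma**2))        # plaintext part

# Direct evaluation of exp(-||x_i - x_j||^2 / (2*sigma^2)) agrees:
direct = np.exp(-np.array([[np.sum((xi - xj)**2) for xj in X] for xi in X])
                / (2 * sigma**2))
print(np.allclose(K_rbf, direct))                # -> True
```

The identity used here is exactly why only the linear kernel table needs to leave the ciphertext domain: pairwise squared distances are a linear combination of its entries.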
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. A privacy protection linear SVM model training method based on vector homomorphic encryption is characterized by comprising the following steps:
step 1: a user encrypts a training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to a server;
step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user;
step 3: the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server;
step 4: the server trains on the plaintext linear kernel function matrix using a ciphertext SMO algorithm and returns a training result to the user;
step 1 comprises the following steps:
S1.1. Initialize a key matrix S = [S_11, ..., S_wv], where w denotes the number of rows of the key matrix S and v denotes the number of columns of the key matrix S;
S1.2. Encrypt the training data set X = [x_1, ..., x_t] with the key matrix S to obtain a ciphertext data set matrix X_e, where t denotes the number of vectors in the training data set X;
S1.3. Compute a ciphertext conversion matrix M from the transpose matrix G of the training data set X;
S1.4. Send a data label vector y, formed from the label of each piece of data in the training data set X, the ciphertext data set matrix X_e, and the ciphertext conversion matrix M to the server;
the specific content of step 2 is as follows: the server computes a ciphertext linear kernel function matrix K_e from the ciphertext data set matrix X_e and the ciphertext conversion matrix M, and sends the ciphertext linear kernel function matrix K_e to the user;
the specific content of step 3 is as follows: the user decrypts the ciphertext linear kernel function matrix K_e with the key matrix S to obtain a plaintext linear kernel function matrix K = [K_11, ..., K_mn], and sends the plaintext linear kernel function matrix K to the server;
step 4 comprises the following steps:
S4.1. Initialize a Lagrange coefficient vector α = (α_1, ..., α_t), an offset constant b, and a penalty coefficient C;
S4.2. Select two sample points (x_ei, y_i) and (x_ej, y_j) in the ciphertext data set matrix X_e as adjustment points and compute the corresponding error values E_i = Σ_{k=1..t} α_k y_k K_ik + b − y_i and E_j = Σ_{k=1..t} α_k y_k K_jk + b − y_j, where α_k denotes the value of the kth element in the Lagrange coefficient vector α, y_k denotes the value of the kth element in the data label vector y, b denotes the offset constant, K_ik denotes the value in the ith row and kth column of the plaintext linear kernel matrix K, and K_jk denotes the value in the jth row and kth column of the plaintext linear kernel function matrix K; S4.3. Let the updated elements be α_i^new and α_j^new; compute η = K_ii + K_jj − 2K_ij and α_j^new = α_j + y_j(E_i − E_j)/η, together with the clipping bounds L and H, where L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i) if y_i ≠ y_j, and L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j) otherwise; if α_j^new > H, then α_j^new = H; if α_j^new < L, then α_j^new = L; then compute α_i^new = α_i + y_i y_j(α_j − α_j^new); wherein α_i and α_j respectively denote the values of the ith and jth elements of the Lagrange coefficient vector α, y_i and y_j respectively denote the values of the ith and jth elements of the data label vector y, C denotes the penalty coefficient, and E_i and E_j respectively denote the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. Replace the ith and jth elements of the vector α with α_i^new and α_j^new, respectively;
S4.5. Let the updated offset constant be b_new; compute the first candidate value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second candidate value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj; if 0 < α_i^new < C, then b_new = b_i; if 0 < α_j^new < C, then b_new = b_j; otherwise b_new = (b_i + b_j)/2; then replace the value of the offset constant b with b_new; wherein K_ii denotes the value in the ith row and ith column of the plaintext linear kernel matrix K, K_ij denotes the value in the ith row and jth column of the plaintext linear kernel matrix K, and K_jj denotes the value in the jth row and jth column of the plaintext linear kernel function matrix K;
S4.6. Determine whether all data sample points in the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.1.
2. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 1, wherein the encryption in step S1.2 comprises the following specific contents:
Each piece of data x_i in the training data set X is encrypted with the key matrix S, and the resulting ciphertext vector c_i ∈ Z_q^n satisfies Sc_i = wx_i + e, where i denotes the subscript, 1 ≤ i ≤ t; n denotes the length of the ciphertext vector c; p denotes a prime number; m denotes the length of the data x_i; Z_p^m denotes the set of vectors of length m over the finite field Z_p; w denotes an integer parameter; e denotes an error vector with |e| < w/2; S denotes the key matrix with S ∈ Z_q^{m×n}; q denotes a prime number; Z_q^n denotes the set of vectors of length n over the finite field Z_q; and Z_q^{m×n} denotes the set of matrices of dimension m×n over the finite field Z_q.
3. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 2, wherein the calculation of the ciphertext transformation matrix M in step S1.3 comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matched adaptive integer l, and convert each entry c_i of the ciphertext vector c into its binary representation, obtaining the bit-decomposed ciphertext vector ĉ = [ĉ_1, ..., ĉ_n], where 1 ≤ i ≤ n, ĉ_i = [ĉ_i(l−1), ..., ĉ_i1, ĉ_i0], and ĉ_ij ∈ {−1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* with c* ∈ Z^{nl}, i.e. c* is the concatenation [ĉ_1, ..., ĉ_n] of the bit decompositions, where Z^{nl} denotes the set of vectors of length nl over the finite field Z;
S1.3.3. Convert each entry S_ij of the key matrix S into the vector S*_ij = [2^{l−1}S_ij, 2^{l−2}S_ij, ..., 2S_ij, S_ij] to obtain an intermediate key matrix S* with S* ∈ Z^{w×vl}, where 1 ≤ i ≤ w, 1 ≤ j ≤ v, l denotes the adaptive integer, and Z^{w×vl} denotes the set of matrices of dimension w×vl over the finite field Z;
S1.3.4. Let the key transformation matrix be M, satisfying S′M = S* + E mod q, which gives the key translation matrix M = [S* + E − TA; A] with M ∈ Z^{n′×nl}, and the new ciphertext vector c′ = Mc*; where S′ denotes the new key matrix with S′ = [I, T], I denotes an identity matrix, T denotes an artificially specified matrix with random elements, S* denotes the intermediate key matrix, E denotes a random noise matrix with E ∈ Z^{m×nl}, q denotes a prime number, p denotes a prime number, A ∈ Z^{(n′−m)×nl} denotes a random matrix, n′ denotes the length of the new ciphertext vector c′, Z^{n′×nl} denotes the set of matrices of dimension n′×nl over the finite field Z, and Z^{(n′−m)×nl} denotes the set of matrices of dimension (n′−m)×nl over the finite field Z.
4. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 1, wherein, in step S4.6, the KKT conditions that each sample point (x_ei, y_i) in the ciphertext data set matrix X_e should satisfy include: 1) α_i = 0 ⇒ y_i f(x_i) ≥ 1; 2) 0 < α_i < C ⇒ y_i f(x_i) = 1; 3) α_i = C ⇒ y_i f(x_i) ≤ 1, where f(x_i) = Σ_{k=1..t} α_k y_k K_ik + b and α_i denotes the value of the ith element in the Lagrange coefficient vector α.
5. The method of claim 2, wherein, if the training data set X contains fractional parts, the training data set X needs to be scaled up before step S1.2 is executed so that all data in the training data set X are integers, and in step 3 the decryption result needs to be scaled back down to obtain the plaintext linear kernel function matrix K.
6. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 5, wherein the decryption operation on the ciphertext vector c, under possession of the key matrix S, is the calculation x = ⌈(Sc mod q)/w⌋, where ⌈·⌋ denotes rounding each entry to the nearest integer and Sc is first reduced modulo q.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810317657.9A CN108521326B (en) | 2018-04-10 | 2018-04-10 | Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108521326A CN108521326A (en) | 2018-09-11 |
CN108521326B true CN108521326B (en) | 2021-02-19 |
Family
ID=63431981
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810317657.9A Expired - Fee Related CN108521326B (en) | 2018-04-10 | 2018-04-10 | Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108521326B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108650269A (en) * | 2018-05-16 | 2018-10-12 | 中国科学技术大学 | A kind of graded encryption method and system based on intensified learning |
CN109359588B (en) * | 2018-10-15 | 2021-02-09 | 电子科技大学 | Novel privacy protection non-interactive K nearest neighbor classification method |
CN110059501B (en) * | 2019-04-16 | 2021-02-02 | 广州大学 | Safe outsourcing machine learning method based on differential privacy |
CN110190946B (en) * | 2019-07-12 | 2021-09-03 | 之江实验室 | Privacy protection multi-organization data classification method based on homomorphic encryption |
CN111104968B (en) * | 2019-12-02 | 2023-04-18 | 北京理工大学 | Safety SVM training method based on block chain |
CN111291781B (en) * | 2020-01-09 | 2022-05-27 | 浙江理工大学 | Encrypted image classification method based on support vector machine |
WO2021184346A1 (en) * | 2020-03-20 | 2021-09-23 | 云图技术有限公司 | Private machine learning model generation and training methods, apparatus, and electronic device |
CN111797907B (en) * | 2020-06-16 | 2023-02-03 | 武汉大学 | Safe and efficient SVM privacy protection training and classification method for medical Internet of things |
US11599806B2 (en) | 2020-06-22 | 2023-03-07 | International Business Machines Corporation | Depth-constrained knowledge distillation for inference on encrypted data |
CN112152806B (en) * | 2020-09-25 | 2023-07-18 | 青岛大学 | Cloud-assisted image identification method, device and equipment supporting privacy protection |
CN112669068B (en) * | 2020-12-28 | 2024-05-14 | 深圳前海用友力合科技服务有限公司 | Market research data transmission method and system based on big data |
CN113706323A (en) * | 2021-09-02 | 2021-11-26 | 杭州电子科技大学 | Automatic insurance policy claim settlement method based on zero knowledge proof |
CN115174035A (en) * | 2022-06-30 | 2022-10-11 | 蚂蚁区块链科技(上海)有限公司 | Data processing method and device |
CN116910818B (en) * | 2023-09-13 | 2023-11-21 | 北京数牍科技有限公司 | Data processing method, device, equipment and storage medium based on privacy protection |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105577357A (en) * | 2015-12-21 | 2016-05-11 | 东南大学 | Intelligent household data privacy protection method based on full homomorphic encryption |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105099693B (en) * | 2014-05-23 | 2018-10-19 | 华为技术有限公司 | A kind of transmission method and transmitting device |
WO2017065959A2 (en) * | 2015-09-25 | 2017-04-20 | Veracyte, Inc. | Methods and compositions that utilize transcriptome sequencing data in machine learning-based classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108521326B (en) | Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption | |
US11902413B2 (en) | Secure machine learning analytics using homomorphic encryption | |
CN112989368B (en) | Method and device for processing private data by combining multiple parties | |
CN105122721B (en) | For managing the method and system for being directed to the trustship of encryption data and calculating safely | |
US20200366459A1 (en) | Searching Over Encrypted Model and Encrypted Data Using Secure Single-and Multi-Party Learning Based on Encrypted Data | |
WO2016169346A1 (en) | Polynomial fully homomorphic encryption method and system based on coefficient mapping transform | |
US20150381349A1 (en) | Privacy-preserving ridge regression using masks | |
Liu et al. | Secure multi-label data classification in cloud by additionally homomorphic encryption | |
CN110635909B (en) | Attribute-based collusion attack resistant proxy re-encryption method | |
JP2014126865A (en) | Device and method for encryption processing | |
CN114696990B (en) | Multi-party computing method, system and related equipment based on fully homomorphic encryption | |
CN110611662B (en) | Attribute-based encryption-based fog collaborative cloud data sharing method | |
CN112906052B (en) | Aggregation method of multi-user gradient permutation in federated learning | |
CN112052466B (en) | Support vector machine user data prediction method based on multi-party secure computing protocol | |
CN112766514B (en) | Method, system and device for joint training of machine learning model | |
WO2014007296A1 (en) | Order-preserving encryption system, encryption device, decryption device, encryption method, decryption method, and programs thereof | |
CN113221153A (en) | Graph neural network training method and device, computing equipment and storage medium | |
Sharma et al. | Confidential boosting with random linear classifiers for outsourced user-generated data | |
CN116192358A (en) | Logistic regression method, device and system based on isomorphic encryption | |
WO2014030706A1 (en) | Encrypted database system, client device and server, method and program for adding encrypted data | |
Meng et al. | Privacy-preserving xgboost inference | |
Perusheska et al. | Deep learning-based cryptanalysis of different AES modes of operation | |
CN111737756B (en) | XGB model prediction method, device and system performed through two data owners | |
CN112507372B (en) | Method and device for realizing privacy protection of multi-party collaborative update model | |
CN111859440B (en) | Sample classification method of distributed privacy protection logistic regression model based on mixed protocol |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210219 |