CN108521326B - Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption - Google Patents


Info

Publication number: CN108521326B (application number CN201810317657.9A)
Authority: CN (China)
Prior art keywords: matrix, vector, ciphertext, kernel function, data set
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN108521326A
Inventors: 杨浩淼, 从鑫, 张可, 黄云帆, 何伟超, 张有, 李洪伟
Current and original assignee: University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China; priority to CN201810317657.9A
Publication of CN108521326A, application granted, publication of CN108521326B


Classifications

    • H04L9/008: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications involving homomorphic encryption (H: Electricity; H04: Electric communication technique; H04L: Transmission of digital information, e.g. telegraphic communication)
    • G06F18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F18/00: Pattern recognition; G06F18/24: Classification techniques)
    • G06V10/96: Management of image or video recognition tasks (G06V: Image or video recognition or understanding)


Abstract

The invention discloses a privacy-protecting linear SVM model training method based on vector homomorphic encryption, belonging to the field of information technology security, which comprises the following steps: step 1, the user encrypts a training data set with the vector-based homomorphic encryption scheme VHE and sends the encryption result to the server; step 2, the server computes over the encryption result to obtain a ciphertext linear kernel function matrix and returns it to the user; step 3, the user decrypts the ciphertext linear kernel function matrix to obtain the plaintext linear kernel function matrix and sends it to the server; and step 4, the server trains on the plaintext linear kernel function matrix with a ciphertext SMO algorithm and returns the training result to the user.

Description

Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption
Technical Field
The invention belongs to the field of information technology security, and particularly relates to a privacy protection linear SVM model training algorithm based on vector homomorphic encryption.
Background
A Support Vector Machine (SVM) is an important model for classification and regression analysis in machine learning. It builds a quadratic programming model, solves it on the available training data to obtain an optimal decision boundary, and then uses that boundary to predict the category of new data. The core idea of the SVM is: given a set of training data and corresponding labels, treat the training data as points in a space and search for a separating surface that divides the training set into two parts, so that data of the same category lie on the same side of the surface and the different categories are separated by it. Then, while ensuring that all training data are correctly classified, the surface is kept as far from the training data points as possible, which improves the reliability of prediction. When the SVM model makes a prediction, it checks on which side of the trained surface the input lies in order to decide its category.
Compared with other classification algorithms, the SVM has the following advantages:
(1) By establishing a mathematical programming model, the SVM finds an optimal separating surface, making the classification result as reliable as possible.
(2) The SVM is not only suitable for linear classification; with the kernel trick it can also be used for nonlinear classification. The SVM is also robust: the trained separating surface depends only on the support vectors, so adding or deleting training data has little influence on the training result.
(3) The SVM supports small-sample learning: massive data are not needed to train an SVM model, and a well-performing classification model can be trained from only a small data set.
Because of these many advantages, SVMs are commonly used in image recognition, text analysis, medicine and finance. SVMs play an important role especially in the artificial intelligence applications that have emerged in recent years.
There are many training algorithms for SVM models; the most common is the SMO algorithm, which was designed specifically for the characteristics of the SVM model and, compared with generic SVM training algorithms, trains faster and needs less space. When the training data set is very large, training an SVM can take a great deal of time, so users generally choose to train the model on a cloud platform. However, the cloud platform itself is not necessarily trusted. At present many service providers rely on cloud servers offered by platforms such as Alibaba Cloud, Tencent Cloud and Amazon Web Services, and these cloud providers can monitor the servers they provide and thereby obtain the private data of the enterprises leasing the cloud service. On the other hand, from the user's perspective, using services on the Internet such as image recognition and text analysis requires uploading local personal data to the cloud; once uploaded, the data is visible to the service provider, and the server can easily put it to other uses, such as buying and selling user information, which the user cannot prevent. The cloud is therefore not fully trusted, and the user cannot know what is actually done with the data. In particular, the SVM training algorithm itself does not consider privacy: during training, the training data set is stored on the machine in plaintext, so if the cloud is untrusted it can easily obtain the training data set during training, leaking the user's privacy.
Disclosure of Invention
The invention aims to: in order to solve the problem that privacy of a user is revealed because training data information does not have privacy in the process of training an SVM model on a cloud platform, a privacy protection linear SVM model training algorithm based on vector homomorphic encryption is provided.
The technical scheme adopted by the invention is as follows:
a privacy-protecting linear SVM model training algorithm based on vector homomorphic encryption comprises the following steps:
step 1: a user encrypts a training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to a server;
step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user;
step 3: the user decrypts the ciphertext linear kernel function matrix to obtain the plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server;
step 4: the server trains on the plaintext linear kernel function matrix with the ciphertext SMO algorithm and returns the training result to the user.
Further, the step 1 comprises the following steps:
S1.1. Initialize a key matrix S = [S_11, …, S_wv];
S1.2. Encrypt the training data set X = [x_1, …, x_t] with the key matrix S to obtain the ciphertext data set matrix X_e;
S1.3. Calculate the ciphertext conversion matrix M from the transpose matrix G of the training data set X;
S1.4. Send the data label vector y, formed from the label of each piece of data in the training data set X, the ciphertext data set matrix X_e, and the ciphertext conversion matrix M to the server.
Further, the encryption in step S1.2 is specified as follows:
Each piece of data x_i (1 ≤ i ≤ t) in the training data set X is encrypted with the key matrix S, and the resulting ciphertext vector c_i ∈ Z_q^n satisfies S c_i = w x_i + e, where i denotes the index, t denotes the number of vectors in the training data set X, n denotes the length of the ciphertext vector c, x_i ∈ Z_p^m, p is a prime number, m is the length of the data x_i, Z_p^m denotes the set of vectors of length m over the finite field Z_p, w is an integer parameter, e is an error vector with |e| < w/2, S is the key matrix with S ∈ Z_q^{m×n}, q is a prime number, Z_q^n denotes the set of vectors of length n over the finite field Z_q, and Z_q^{m×n} denotes the set of matrices of dimension m×n over the finite field Z_q.
Further, the calculation of the ciphertext conversion matrix M in step S1.3 comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matching adaptation integer l, and convert each item c_i (1 ≤ i ≤ n) of the ciphertext vector c into its binary expansion, obtaining the transcoded ciphertext vector ĉ = [ĉ_1, …, ĉ_n], where ĉ_i = [ĉ_i(l-1), …, ĉ_i1, ĉ_i0] and ĉ_ij ∈ {-1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* = [ĉ_1, …, ĉ_n], which satisfies c* ∈ Z^{nl}, where Z^{nl} denotes the set of vectors of length nl over the finite field Z;
S1.3.3. Convert each item S_ij (1 ≤ i ≤ w, 1 ≤ j ≤ v) of the key matrix S into the vector S*_ij = [2^{l-1} S_ij, 2^{l-2} S_ij, …, 2 S_ij, S_ij], obtaining the intermediate key matrix S* with S* ∈ Z^{w×vl}, where l denotes the adaptation integer, w the number of rows of the key matrix S, and v the number of columns of the key matrix S; this construction guarantees S* c* = S c;
S1.3.4. Let the key conversion matrix be M, satisfying S'M = S* + E mod q; the key conversion matrix is obtained as M = [(S* − TA + E) mod q; A] with M ∈ Z^{n'×nl}, and the new ciphertext vector is c' = M c*. Here S' denotes the new key matrix with S' = [I, T], where I is an identity matrix and T is an artificially chosen matrix with random elements; S* denotes the intermediate key matrix; E is a random noise matrix with E ∈ Z^{m×nl}; q is a prime number; A ∈ Z^{(n'-m)×nl} is a random matrix; n' denotes the length of the new ciphertext vector c'; Z^{n'×nl} denotes the set of matrices of dimension n'×nl over the finite field Z, and Z^{(n'-m)×nl} the set of matrices of dimension (n'-m)×nl over the finite field Z.
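The identity behind steps S1.3.1 to S1.3.3 is S* c* = S c: bit-decomposing the ciphertext and correspondingly expanding the key leaves the product unchanged. A minimal sketch (the values of c, S and the bit length l are hypothetical toy choices):

```python
import numpy as np

l = 8                                  # bits per ciphertext entry, |c_i| < 2^l
c = np.array([5, -3, 12])              # toy ciphertext vector
S = np.array([[1, 0, 2], [0, 1, 1]])   # toy key matrix

def bit_decompose(c, l):
    # signed binary expansion: c_i = sum_j 2^j * b_ij with b_ij in {-1, 0, 1}
    out = []
    for ci in c:
        sign = -1 if ci < 0 else 1
        out.extend(sign * ((abs(ci) >> j) & 1) for j in range(l - 1, -1, -1))
    return np.array(out)

def expand_key(S, l):
    # S*_ij = [2^(l-1) S_ij, ..., 2 S_ij, S_ij], one block per key entry
    powers = 2 ** np.arange(l - 1, -1, -1)
    return np.kron(S, powers)

c_star = bit_decompose(c, l)           # c* of length n*l
S_star = expand_key(S, l)              # S* of dimension w x (v*l)
assert np.array_equal(S_star @ c_star, S @ c)
```

The key conversion matrix M of step S1.3.4 then only has to re-encrypt c* under the new key S' = [I, T].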
Further, the specific content of step 2 is: the server calculates the ciphertext linear kernel function matrix K_e from the ciphertext data set matrix X_e and the ciphertext conversion matrix M, and sends the ciphertext linear kernel function matrix K_e to the user.
Further, the specific content of step 3 is: the user decrypts the ciphertext linear kernel function matrix K_e with the key matrix S to obtain the plaintext linear kernel function matrix K = [K_11, …, K_mn], and sends the plaintext linear kernel function matrix K to the server.
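What steps 2 and 3 reconstruct in plaintext is simply the Gram (linear kernel) matrix of the data, obtained by applying the transpose G = X^T to X. A minimal sketch with hypothetical toy data, where the columns of X are the data points:

```python
import numpy as np

# toy training set: columns are the data points x_1, x_2, x_3
X = np.array([[1, 0, 2],
              [2, 1, 0]])
G = X.T                     # transpose matrix G, used as the linear map in VHE
K = G @ X                   # linear kernel matrix, K_ij = <x_i, x_j>
assert np.array_equal(K, K.T)            # a Gram matrix is symmetric
assert K[0, 1] == X[:, 0] @ X[:, 1]      # each entry is an inner product
```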
Further, the step 4 comprises the following steps:
S4.1. Initialize the Lagrange coefficient vector α = (α_1, …, α_t), the offset constant b and the penalty coefficient C;
S4.2. Select two sample points (x_ei, y_i) and (x_ej, y_j) of the ciphertext data set matrix X_e as adjustment points and calculate the corresponding error values E_i = Σ_k α_k y_k K_ik + b − y_i and E_j = Σ_k α_k y_k K_jk + b − y_j, where α_k denotes the value of the k-th element of the Lagrange coefficient vector α, y_k denotes the value of the k-th element of the data label vector y, b denotes the offset constant, K_ik denotes the value in row i and column k of the plaintext linear kernel function matrix K, and K_jk denotes the value in row j and column k of the plaintext linear kernel function matrix K;
S4.3. Let the elements to be updated be α_i^new and α_j^new. Compute η = K_ii + K_jj − 2K_ij, α_j^new = α_j + y_j(E_i − E_j)/η, and the clipping bounds L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i) if y_i ≠ y_j, or L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j) if y_i = y_j. If α_j^new > H then α_j^new = H; if α_j^new < L then α_j^new = L. Finally α_i^new = α_i + y_i y_j (α_j − α_j^new). Here α_i and α_j denote the values of the i-th and j-th elements of the Lagrange coefficient vector α, y_i and y_j denote the values of the i-th and j-th elements of the data label vector y, C denotes the penalty coefficient, and E_i and E_j denote the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. Replace the i-th and j-th elements of the vector α with the updated elements α_i^new and α_j^new respectively;
S4.5. Let the new offset constant be b^new. Calculate the first pending value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and the second pending value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj. If 0 < α_i^new < C then b^new = b_i; if 0 < α_j^new < C then b^new = b_j; otherwise b^new = (b_i + b_j)/2. Then replace the value of the offset constant b with the new offset constant b^new. Here K_ii denotes the value in row i and column i of the plaintext linear kernel function matrix K, K_ij the value in row i and column j, and K_jj the value in row j and column j;
S4.6. Judge whether all data sample points of the ciphertext data set matrix X_e satisfy the KKT conditions; if so, stop training and output the Lagrange coefficient vector α and the offset constant b; otherwise go to step S4.2.
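Steps S4.1 to S4.6 operate only on the (already decrypted) kernel matrix K and the label vector y, never on the raw data, so they can be sketched directly over a precomputed K. The following is a simplified sketch, not the patent's exact procedure: the second adjustment point is picked at random, the data and tolerances are hypothetical toy choices.

```python
import numpy as np

def smo_train(K, y, C=1.0, tol=1e-3, max_passes=20):
    """Simplified SMO over a precomputed linear kernel matrix K (steps S4.1-S4.6)."""
    t = len(y)
    alpha = np.zeros(t)                    # S4.1: Lagrange coefficient vector
    b = 0.0                                # S4.1: offset constant
    rng = np.random.default_rng(0)
    passes, total = 0, 0
    while passes < max_passes and total < 500:
        total += 1
        changed = 0
        for i in range(t):
            Ei = (alpha * y) @ K[i] + b - y[i]          # S4.2: error value E_i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(t - 1))
                j = j if j < i else j + 1               # second adjustment point, j != i
                Ej = (alpha * y) @ K[j] + b - y[j]      # S4.2: error value E_j
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:                        # S4.3: clipping bounds L, H
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = K[i, i] + K[j, j] - 2 * K[i, j]
                if L == H or eta <= 0:
                    continue
                alpha[j] = float(np.clip(aj + y[j] * (Ei - Ej) / eta, L, H))  # S4.3
                if abs(alpha[j] - aj) < 1e-7:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])                 # S4.4
                # S4.5: pending values b_i, b_j, then the new offset constant
                bi = b - Ei - y[i]*(alpha[i]-ai)*K[i, i] - y[j]*(alpha[j]-aj)*K[i, j]
                bj = b - Ej - y[i]*(alpha[i]-ai)*K[i, j] - y[j]*(alpha[j]-aj)*K[j, j]
                if 0 < alpha[i] < C:
                    b = bi
                elif 0 < alpha[j] < C:
                    b = bj
                else:
                    b = (bi + bj) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0      # S4.6: stop when stable
    return alpha, b

# toy linearly separable data; rows are data points, K is their Gram matrix
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha, b = smo_train(K, y)
```

The pairwise update keeps the constraint Σ_i α_i y_i = 0 invariant, because the changes to α_i and α_j cancel after weighting by the labels.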
Further, the KKT conditions that each sample point (x_ei, y_i) of the ciphertext data set matrix X_e should satisfy include:
1) α_i = 0 ⇒ y_i f(x_ei) ≥ 1;
2) 0 < α_i < C ⇒ y_i f(x_ei) = 1;
3) α_i = C ⇒ y_i f(x_ei) ≤ 1;
where f(x_ei) = Σ_k α_k y_k K_ik + b, and α_i denotes the value of the i-th element of the Lagrange coefficient vector α.
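The three conditions can be wrapped in a small helper. A sketch with a hypothetical numerical tolerance (the patent states the conditions exactly; tol here is only a floating-point allowance):

```python
def satisfies_kkt(alpha_i, yfx, C, tol=1e-3):
    """Check one sample: alpha_i is its Lagrange coefficient, yfx = y_i * f(x_ei)."""
    if alpha_i <= tol:            # 1) alpha_i = 0   =>  y_i f(x_ei) >= 1
        return yfx >= 1 - tol
    if alpha_i >= C - tol:        # 3) alpha_i = C   =>  y_i f(x_ei) <= 1
        return yfx <= 1 + tol
    return abs(yfx - 1) <= tol    # 2) 0 < alpha_i < C  =>  y_i f(x_ei) = 1
```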
Further, if the training data set X contains decimals, then before step S1.2 is executed the training data set X must first be scaled up so that all data in the training data set X are integers, and in step 3 the decryption result must be scaled back down to obtain the plaintext linear kernel function matrix K.
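The scaling round trip is exact for the linear kernel, because an inner product of two vectors that are each scaled by a factor s is s² times the original inner product. A minimal sketch with a hypothetical factor of 100:

```python
import numpy as np

s = 100                                   # hypothetical amplification factor
x1 = np.array([0.12, 0.5])
x2 = np.array([1.01, 0.25])
xi1 = np.rint(x1 * s).astype(np.int64)    # integer vectors, as VHE requires
xi2 = np.rint(x2 * s).astype(np.int64)
k_int = int(xi1 @ xi2)                    # inner product computed on integers
k = k_int / s**2                          # scale the decrypted result back down
assert abs(k - x1 @ x2) < 1e-9
```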
Further, when the key matrix S is possessed, the decryption operation on a ciphertext vector c is the calculation x = ⌈(S c mod q) / w⌋, where ⌈a⌋ denotes the nearest integer to a and S c is first reduced modulo q.
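The relation S c = w x + e together with the decryption rule ⌈(S c mod q)/w⌋ can be exercised with a toy instantiation in which the key has the structured form S = [I, T]. All parameters here (q, w, the dimensions and value ranges) are hypothetical toy choices, far smaller than a secure configuration:

```python
import numpy as np

q, w = 2**20, 2**10        # toy modulus and scaling weight, errors stay << w/2
m, n_extra = 4, 4          # plaintext length and extra ciphertext dimensions
rng = np.random.default_rng(0)

T = rng.integers(0, 10, size=(m, n_extra))        # secret part of the key
S = np.hstack([np.eye(m, dtype=np.int64), T])     # key matrix S = [I, T]

def encrypt(x):
    a = rng.integers(0, q, size=n_extra)
    e = rng.integers(-2, 3, size=m)               # small error vector
    body = (w * x + e - T @ a) % q
    return np.concatenate([body, a])              # satisfies S c = w x + e (mod q)

def decrypt(c):
    sc = (S @ c) % q
    sc = np.where(sc > q // 2, sc - q, sc)        # centered representative mod q
    return np.rint(sc / w).astype(np.int64)       # nearest integer of (S c)/w

x = np.array([1, 2, 3, 4])
assert np.array_equal(decrypt(encrypt(x)), x)
```

The same sketch exhibits the homomorphic-addition property: decrypting the sum of two ciphertexts of x recovers 2x, since the errors add but remain below w/2.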
Further, the vector-based homomorphic encryption scheme satisfies homomorphic addition and linear transformation.
The specific content of the homomorphic addition is: for two plaintext-ciphertext pairs {x_i, c_i} and {x_j, c_j}, formed from elements of the training data set X and elements of the ciphertext vectors, under the same key matrix S it holds that S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j); the new ciphertext vector is c' = c_i + c_j and the new error vector is e' = e_i + e_j, where e_i and e_j denote the error vectors of the i-th and j-th elements of the training data set X.
The specific content of the linear transformation is: to compute B x_i for data x_i with an arbitrary matrix B ∈ Z^{m'×m}, note that (BS)c = w B x_i + B e_i, where m denotes the length of the i-th data item in the training data set X, m' denotes the length of the data B x_i after the linear transformation, S denotes the key matrix, c denotes the ciphertext vector, and e_i denotes the error vector of the i-th element of the training data set X. The ciphertext vector c can thus be regarded as the result of encrypting the data B x_i under the key matrix BS, and a key conversion matrix M ∈ Z^{(m'+1)×m'l} can be calculated that converts the key matrix BS into S' ∈ Z^{m'×(m'+1)}; the new ciphertext vector is then c' = Mc. Here Z^{(m'+1)×m'l} denotes the set of matrices of dimension (m'+1)×m'l over the finite field Z, and Z^{m'×(m'+1)} the set of matrices of dimension m'×(m'+1) over the finite field Z.
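The linear-transformation property, namely that c, viewed under the key BS, encrypts B x_i, can be checked directly with a toy instantiation (B, q, w and the key shape are all hypothetical toy choices):

```python
import numpy as np

q, w = 2**20, 2**10
m, n_extra = 3, 3
rng = np.random.default_rng(1)
T = rng.integers(0, 10, size=(m, n_extra))
S = np.hstack([np.eye(m, dtype=np.int64), T])     # key matrix S = [I, T]

x = np.array([3, 1, 2])
a = rng.integers(0, q, size=n_extra)
e = rng.integers(-2, 3, size=m)
c = np.concatenate([(w * x + e - T @ a) % q, a])  # S c = w x + e (mod q)

B = np.array([[1, 1, 0], [0, 2, 1]])              # arbitrary linear map B

def dec(key, ct):
    sc = (key @ ct) % q
    sc = np.where(sc > q // 2, sc - q, sc)
    return np.rint(sc / w).astype(np.int64)

# (BS) c = w (B x) + B e, so c decrypts to B x under the key BS
assert np.array_equal(dec(B @ S, c), B @ x)
```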
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. In the invention, under the VHE homomorphic encryption scheme, the training data set is converted into a ciphertext data set and the linear kernel function calculation of the original SMO algorithm is converted into the ciphertext linear transformation operation of VHE, so that the kernel calculation is carried out on ciphertext. All operations of the SMO algorithm that involve the training data set are expressed through the kernel function, so only the confidentiality of the linear kernel function needs to be ensured, and no other operation in SMO can leak information about the training data set. Therefore, all information in the training data set remains encrypted during the whole training process, and nobody except the encrypting party who owns the key, i.e. the data owner, can obtain any information about the original data. During the SMO computation the server side only ever obtains the ciphertext data set, so the cloud cannot deduce any valuable information about the original data from the ciphertexts it holds, and the privacy of the data owner is fully protected.
2. In the invention, the VHE homomorphic encryption algorithm encrypts only integer vectors. To make the training data set meet this encryption requirement when it contains decimal data, the data in the training data set are scaled up as a whole before encryption, so that data with fractional parts become integers, and then the encryption and ciphertext linear transformation operations are carried out. After the linear transformation, the decrypted plaintext is scaled back down according to the previous magnification factor, restoring the data to the correct values, i.e. the plaintext linear kernel function matrix. With the calculation precision thus ensured, the ciphertext kernel calculation result agrees with the plaintext kernel calculation result, i.e. the training result of ciphertext SMO is the same as that of plaintext SMO.
3. In the invention, the operations involving the training data set can all be converted into kernel function calculations, and in the ciphertext SMO algorithm all kernel calculations are collected into a single matrix product, so that all kernel values are computed at once to obtain a kernel function matrix whose entries are then used directly. The vector calculation efficiency of the VHE homomorphic encryption algorithm is higher than that of other homomorphic encryption algorithms, and since the kernel functions are computed uniformly, encryption, ciphertext calculation and decryption are each performed only once, which keeps the time overhead of the ciphertext SMO algorithm small.
4. In the invention, the user and the server interact three times. The first time, the user sends the data label vector, the ciphertext data set matrix and the ciphertext conversion matrix to the server, with communication space complexity approximately O(mn). The second time, the server returns the ciphertext linear kernel function matrix to the user, who decrypts it and uploads the plaintext linear kernel function matrix to the server, with communication space complexity approximately O(m²). The third time, the server sends the trained Lagrange coefficient vector and the offset constant to the user, with communication space complexity approximately O(m). When there are more data dimensions, i.e. n > m, the total communication space complexity is O(n); when there is more data, i.e. m > n, the total is O(m²); if the data dimension and the amount of data are comparable, the total is O(m² + mn). The communication space complexity is polynomial, which improves the efficiency of training.
5. In the invention, by setting reasonable VHE system parameters, the conversion of the kernel calculation into the ciphertext linear transformation of VHE by the ciphertext SMO algorithm preserves the correctness of the kernel calculation, while computing the kernel matrix under ciphertext preserves the confidentiality of the training data, so correct classification of the training data can be completed. Compared with other homomorphic encryption schemes, under optimized parameters the algorithm has a smaller growth rate in time and a higher speed.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A privacy-protecting linear SVM model training algorithm based on vector homomorphic encryption, in which the training data set X is a matrix formed by t z-dimensional vectors, each piece of data of the training data set X has a label value representing its category, and the label values, arranged in the order of the corresponding data in the training data set X, form the data label vector y = [y_1, …, y_t]. The linear SVM model training algorithm comprises the following steps:
step 1: and the user encrypts the training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to the server. The method comprises the following steps:
S1.1. Initialize a key matrix S = [S_11, …, S_wv].
The user first determines the number of rows and columns of the key matrix S and then randomly generates the values in the matrix.
S1.2. Encrypt the training data set X with the key matrix S to obtain the ciphertext data set matrix X_e.
If the data in the training data set X contain decimals, the training data set X is amplified by a certain factor l so that the decimals in the training data set become integers. Then each column of the training data set X is regarded as an integer vector and encrypted column by column, c_i = Enc_S(x_i), obtaining the ciphertext vector set X_e = [c_1, …, c_t].
The training data set X is encrypted using the vector-based homomorphic encryption scheme VHE: each piece of data x_i (1 ≤ i ≤ t) in the training data set X is encrypted with the key matrix S, and the resulting ciphertext vector c_i ∈ Z_q^n satisfies S c_i = w x_i + e, where i denotes the index, t denotes the number of vectors in the training data set X, n denotes the length of the ciphertext vector c, x_i ∈ Z_p^m, p is a prime number, m is the length of the data x_i, Z_p^m denotes the set of vectors of length m over the finite field Z_p, w is a large integer parameter, e is an error vector with |e| < w/2, S is the key matrix with S ∈ Z_q^{m×n}, q is a prime number, Z_q^n denotes the set of vectors of length n over the finite field Z_q, and Z_q^{m×n} denotes the set of matrices of dimension m×n over the finite field Z_q. When the key matrix S is possessed, the decryption operation on the ciphertext vector c is the calculation x = ⌈(S c mod q)/w⌋, where ⌈a⌋ denotes the nearest integer to a.
S1.3. Calculate the ciphertext conversion matrix M from the transpose matrix G of the training data set X.
The transpose matrix of the training data set X is denoted G; it is regarded as a linear transformation matrix in VHE, and the key conversion matrix M for the transpose matrix G of the training data set X is calculated.
The calculation of the ciphertext conversion matrix M comprises the following steps:
S1.3.1. According to |c| < 2^l, select a matching adaptation integer l, and convert each item c_i (1 ≤ i ≤ n) of the ciphertext vector c into its binary expansion, obtaining the transcoded ciphertext vector ĉ = [ĉ_1, …, ĉ_n], where ĉ_i = [ĉ_i(l-1), …, ĉ_i1, ĉ_i0] and ĉ_ij ∈ {-1, 0, 1};
S1.3.2. Let the intermediate ciphertext vector be c* = [ĉ_1, …, ĉ_n], which satisfies c* ∈ Z^{nl}, where Z^{nl} denotes the set of vectors of length nl over the finite field Z;
S1.3.3. Convert each item S_ij (1 ≤ i ≤ w, 1 ≤ j ≤ v) of the key matrix S into the vector S*_ij = [2^{l-1} S_ij, 2^{l-2} S_ij, …, 2 S_ij, S_ij], obtaining the intermediate key matrix S* with S* ∈ Z^{w×vl}, where l denotes the adaptation integer, w the number of rows of the key matrix S, and v the number of columns of the key matrix S; this construction guarantees S* c* = S c;
S1.3.4. Let the key conversion matrix be M, satisfying S'M = S* + E mod q; the key conversion matrix is obtained as M = [(S* − TA + E) mod q; A] with M ∈ Z^{n'×nl}, and the new ciphertext vector is c' = M c*. Here S' denotes the new key matrix with S' = [I, T], where I is an identity matrix and T is an artificially chosen matrix with random elements; S* denotes the intermediate key matrix; E is a random noise matrix with E ∈ Z^{m×nl}; q is a prime number; A ∈ Z^{(n'-m)×nl} is a random matrix; n' denotes the length of the new ciphertext vector c'; Z^{n'×nl} denotes the set of matrices of dimension n'×nl over the finite field Z, and Z^{(n'-m)×nl} the set of matrices of dimension (n'-m)×nl over the finite field Z.
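Steps S1.3.1 to S1.3.4 can be exercised end to end: the bit-decomposed ciphertext is multiplied by M, and the result decrypts, under the new key S' = [I, T'], to the same plaintext. A toy sketch in which every parameter and random matrix (q, w, l, T, T', A, E) is a hypothetical small choice, not the patent's recommended sizes:

```python
import numpy as np

q, w, l = 2**20, 2**10, 21                 # toy parameters; entries of c < 2^l
rng = np.random.default_rng(2)

m, n_extra = 2, 2
T = rng.integers(0, 4, size=(m, n_extra))
S = np.hstack([np.eye(m, dtype=np.int64), T])       # old key, S c = w x + e
x = np.array([2, 5])
a = rng.integers(0, q, size=n_extra)
e = rng.integers(-1, 2, size=m)
c = np.concatenate([(w * x + e - T @ a) % q, a])    # old ciphertext, length n
n = m + n_extra

def bits(v):                                        # S1.3.1/S1.3.2: c* in Z^(nl)
    out = []
    for vi in v:
        s = -1 if vi < 0 else 1
        out.extend(s * ((abs(vi) >> j) & 1) for j in range(l - 1, -1, -1))
    return np.array(out)

powers = 2 ** np.arange(l - 1, -1, -1)
S_star = np.kron(S, powers)                         # S1.3.3: S* with S* c* = S c

n_new = m + n_extra                                 # new ciphertext length n'
T2 = rng.integers(0, 4, size=(m, n_new - m))        # T' of the new key S' = [I, T']
S_new = np.hstack([np.eye(m, dtype=np.int64), T2])
A = rng.integers(0, q, size=(n_new - m, n * l))
E = rng.integers(-1, 2, size=(m, n * l))            # small random noise matrix
M = np.vstack([(S_star - T2 @ A + E) % q, A])       # S1.3.4: S' M = S* + E (mod q)

c_new = (M @ bits(c)) % q                           # key-switched ciphertext c' = M c*

def dec(key, ct):
    sc = (key @ ct) % q
    sc = np.where(sc > q // 2, sc - q, sc)
    return np.rint(sc / w).astype(np.int64)

assert np.array_equal(dec(S_new, c_new), x)         # same plaintext under the new key
```

Because the entries of c* lie in {-1, 0, 1}, the added error E c* stays far below w/2, which is what keeps decryption correct after key switching.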
If we define D = Mc*, then S'D = S*c* + Ec* (mod q). Since the entries of c* lie in {-1, 0, 1} and the random noise matrix E is small, the value of the new error vector e' = Ec* remains small.
The vector-based homomorphic encryption scheme VHE satisfies homomorphic addition and linear transformation.
The specific content of the homomorphic addition is: for two plaintext-ciphertext pairs {x_i, c_i} and {x_j, c_j}, formed from elements of the training data set X and elements of the ciphertext vectors, under the same key matrix S it holds that S(c_i + c_j) = w(x_i + x_j) + (e_i + e_j); the new ciphertext vector is c' = c_i + c_j and the new error vector is e' = e_i + e_j, where e_i and e_j denote the error vectors of the i-th and j-th elements of the training data set X.
The specific content of the linear transformation is: to compute B x_i for data x_i with an arbitrary matrix B ∈ Z^{m'×m}, note that (BS)c = w B x_i + B e_i, where m denotes the length of the i-th data item in the training data set X, m' denotes the length of the data B x_i after the linear transformation, S denotes the key matrix, c denotes the ciphertext vector, and e_i denotes the error vector of the i-th element of the training data set X. The ciphertext vector c can thus be regarded as the result of encrypting the data B x_i under the key matrix BS, and a key conversion matrix M ∈ Z^{(m'+1)×m'l} can be calculated that converts the key matrix BS into S' ∈ Z^{m'×(m'+1)}; the new ciphertext vector is then c' = Mc. Here Z^{(m'+1)×m'l} denotes the set of matrices of dimension (m'+1)×m'l over the finite field Z, and Z^{m'×(m'+1)} the set of matrices of dimension m'×(m'+1) over the finite field Z.
S1.4. The data label vector y formed by the labels of each piece of data in the training data set X, the ciphertext data set matrix X_e, and the ciphertext conversion matrix M are sent to the server.
Step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns it to the user.
After receiving the ciphertext data set matrix X_e, the ciphertext conversion matrix M and the data label vector y, the server calculates the linear kernel function, namely computes M·X_e, takes the result as the ciphertext linear kernel function matrix K_e, and sends the ciphertext linear kernel function matrix K_e back to the user.
Step 3: the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends it to the server.
After receiving the ciphertext linear kernel function matrix K_e, the user decrypts it with the key matrix S to obtain the plaintext k = Dec(K_e, S). If the training data set X was amplified by a factor l before encryption, the plaintext k is reduced according to the amplification factor, and the result K = k/l² is taken as the plaintext linear kernel function matrix, which is sent to the server. If the training data set X was not amplified, the decrypted plaintext k is sent directly to the server as the plaintext linear kernel function matrix K.
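Since the linear kernel is an inner product, amplifying every datum by a factor l multiplies each kernel entry by l², so the reduction step divides by l². A small sketch (the concrete matrix and factor are illustrative only):

```python
import numpy as np

X = np.array([[0.5, 1.5],
              [2.0, 0.25]])        # illustrative data with decimal parts
l = 4                              # amplification factor making l*X integral
Xa = (l * X).astype(np.int64)      # the integer data that is actually encrypted
k = Xa @ Xa.T                      # linear kernel computed on the amplified data
K = k / l**2                       # reduction: each entry was scaled by l*l
assert np.allclose(K, X @ X.T)
```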
Step 4: the server trains the plaintext linear kernel function matrix by adopting a ciphertext SMO algorithm and returns the training result to the user.
The server trains on the ciphertext data set matrix X_e according to the data label vector y and the plaintext linear kernel function matrix K. If every piece of data in the training data set X meets the KKT conditions, training stops; otherwise step 4 is repeated.
Training the ciphertext data set matrix X_e comprises the following steps:
S4.1. After receiving the plaintext linear kernel function matrix K, the server initializes the Lagrangian coefficient vector α = (α_1, …, α_t), the offset constant b, and the penalty coefficient C.
S4.2. Select two sample points (x_ei, y_i) and (x_ej, y_j) in the ciphertext data set matrix X_e as adjustment points and calculate the corresponding error values

E_i = Σ_k α_k·y_k·K_ik + b − y_i and E_j = Σ_k α_k·y_k·K_jk + b − y_j,

wherein α_k represents the value of the kth element in the Lagrangian coefficient vector α, y_k denotes the value of the kth element in the data label vector y, b denotes the offset constant, K_ik represents the value of the ith row and kth column of the plaintext linear kernel function matrix K, and K_jk represents the value of the jth row and kth column of the plaintext linear kernel function matrix K.
S4.3. Let the to-be-determined element be

α_j^new = α_j + y_j·(E_i − E_j)/η, where η = K_ii + K_jj − 2K_ij.

Compute the clipping bounds L and H:
when y_i ≠ y_j, L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i);
when y_i = y_j, L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j).

If α_j^new > H, then α_j^new = H; if α_j^new < L, then α_j^new = L. Then let

α_i^new = α_i + y_i·y_j·(α_j − α_j^new).

Here α_i and α_j respectively represent the values of the ith and jth elements of the Lagrangian coefficient vector α, y_i and y_j respectively represent the values of the ith and jth elements in the data label vector y, C represents the penalty coefficient, and E_i and E_j respectively represent the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. The to-be-determined elements α_i^new and α_j^new replace the ith and jth elements of the vector α, respectively.
S4.5. With the to-be-determined elements α_i^new and α_j^new, let the new offset constant be b_new. Calculate a first pending value

b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij

and a second pending value

b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj.

If 0 < α_i^new < C, then b_new = b_i; if 0 < α_j^new < C, then b_new = b_j; otherwise b_new = (b_i + b_j)/2. The new offset constant b_new then replaces the value of the offset constant b. Here K_ii represents the value of the ith row and ith column of the plaintext linear kernel function matrix K, K_ij represents the value of the ith row and jth column of the plaintext linear kernel function matrix K, and K_jj represents the value of the jth row and jth column of the plaintext linear kernel function matrix K;
S4.6. Judge whether all data sample points in the ciphertext data set matrix X_e meet the KKT conditions; if yes, stop the training and output the Lagrangian coefficient vector α and the offset constant b; otherwise go to step S4.1.
The KKT conditions that each sample point (x_ei, y_i) in the ciphertext data set matrix X_e should meet include:
1) α_i = 0 ⇔ y_i·f(x_i) ≥ 1;
2) 0 < α_i < C ⇔ y_i·f(x_i) = 1;
3) α_i = C ⇔ y_i·f(x_i) ≤ 1;
wherein f(x_i) = Σ_k α_k·y_k·K_ik + b and α_i represents the value of the ith element in the Lagrangian coefficient vector α. If f(x_i) > 0, the data is classified as positive; otherwise it is classified as negative.
After training is finished, the Lagrangian coefficient vector α and the new offset constant b are sent to the user; after the user receives the Lagrangian coefficient vector α and the offset constant b, the ciphertext SMO algorithm ends.
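For reference, steps S4.1–S4.6 follow the structure of the simplified SMO algorithm over a precomputed kernel matrix. The following plaintext sketch is an assumption-laden illustration (toy data, a randomly chosen second index j, and a consecutive-unchanged-passes stopping heuristic), not the claimed ciphertext protocol:

```python
import numpy as np

def smo_train(K, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    """Simplified SMO over a precomputed t x t kernel matrix K."""
    rng = np.random.default_rng(seed)
    t = len(y)
    alpha, b = np.zeros(t), 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(t):
            Ei = (alpha * y) @ K[i] + b - y[i]          # error value E_i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(t - 1))
                j = j if j < i else j + 1               # pick j != i
                Ej = (alpha * y) @ K[j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                if y[i] != y[j]:                        # clipping bounds L, H
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:
                    continue
                aj = float(np.clip(aj_old + y[j] * (Ei - Ej) / eta, L, H))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                bi = b - Ei - y[i]*(ai - ai_old)*K[i, i] - y[j]*(aj - aj_old)*K[i, j]
                bj = b - Ej - y[i]*(ai - ai_old)*K[i, j] - y[j]*(aj - aj_old)*K[j, j]
                alpha[i], alpha[j] = ai, aj
                b = bi if 0 < ai < C else bj if 0 < aj < C else (bi + bj) / 2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# toy linearly separable data; K plays the role of the plaintext linear kernel matrix
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha, b = smo_train(K, y)
pred = np.sign((alpha * y) @ K + b)
assert np.array_equal(pred, y)
```

Note that the server only ever touches the kernel matrix K and the labels y, mirroring the division of labor in the protocol above.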
The mathematical form of the SVM model can be written as:

max over α of Σ_i α_i − (1/2)·Σ_i Σ_j α_i·α_j·y_i·y_j·K(x_i, x_j)
subject to 0 ≤ α_i ≤ C for i = 1, …, t and Σ_i α_i·y_i = 0.

The SVM can be trained by solving this mathematical programming model. Each component of the Lagrangian coefficient vector α is the coefficient corresponding to one training datum, and the penalty coefficient C limits the value range of α.
Example 2
On the basis of the first embodiment, when a polynomial kernel function containing a linear kernel function is processed, the polynomial kernel function is first split into two parts: the linear kernel function and the remaining nonlinear operation. The linear kernel function is computed under the ciphertext to obtain a plaintext kernel function table; then 1 is added to each value in the table and the result is raised to the required power in the plaintext, which completes the computation of the polynomial kernel function. Because the linear kernel function part is computed under the ciphertext, the training data set remains secret, and after the add-1 and power operations the server still cannot obtain any information about the training data set; therefore the SVM model with a polynomial kernel function can also be trained under the ciphertext.
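A sketch of the split described above, with an illustrative degree d; here K_lin stands for the plaintext kernel table that would be obtained from the ciphertext computation:

```python
import numpy as np

X = np.array([[1, 2], [3, 1], [0, 4]])
K_lin = X @ X.T                # linear kernel table, obtained under the ciphertext
d = 2                          # illustrative polynomial degree
K_poly = (K_lin + 1) ** d      # plaintext post-processing: add 1, then raise to the power
assert K_poly[0, 1] == (X[0] @ X[1] + 1) ** d
```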
Example 3
On the basis of the first embodiment, when a Gaussian kernel function containing a linear kernel function is processed, the Euclidean distance between any two training data is needed, and VHE can compute the distance between two vectors under the ciphertext; the computation of the Gaussian kernel function can therefore be split into two parts: the linear kernel function and the Gaussian function. The distance between any two vectors is first computed under the ciphertext, and the Gaussian function is then evaluated in the plaintext. Because the distances between vectors are computed in the ciphertext state, the training data remain secret; the server cannot recover the concrete values of two vectors from their distance, so even though the Gaussian function is evaluated in the plaintext, the server still obtains no information about the training data set.
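The squared Euclidean distances needed by the Gaussian kernel can be recovered from linear-kernel entries as ||x_i − x_j||² = K_ii + K_jj − 2K_ij; a sketch (the bandwidth σ is illustrative):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]])
K_lin = X @ X.T                                   # linear kernel, obtained under the ciphertext
# squared Euclidean distances from kernel entries: d2_ij = K_ii + K_jj - 2 K_ij
d2 = np.diag(K_lin)[:, None] + np.diag(K_lin)[None, :] - 2 * K_lin
sigma = 1.0                                       # illustrative bandwidth
K_rbf = np.exp(-d2 / (2 * sigma**2))              # Gaussian function evaluated in plaintext
assert np.isclose(d2[0, 1], np.sum((X[0] - X[1])**2))
```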
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A privacy protection linear SVM model training method based on vector homomorphic encryption is characterized by comprising the following steps:
step 1: a user encrypts a training data set by adopting a vector-based homomorphic encryption scheme VHE and sends an encryption result to a server;
step 2: the server calculates the encryption result to obtain a ciphertext linear kernel function matrix and returns the ciphertext linear kernel function matrix to the user;
step 3: the user decrypts the ciphertext linear kernel function matrix to obtain a plaintext linear kernel function matrix and sends the plaintext linear kernel function matrix to the server;
step 4: the server trains the plaintext linear kernel function matrix by adopting a ciphertext SMO algorithm and returns a training result to the user;
the step 1 comprises the following steps:
S1.1. initializing a key matrix S = [S_11, …, S_wv], wherein w represents the number of rows of the key matrix S and v represents the number of columns of the key matrix S;
S1.2. encrypting the training data set X = [x_1, …, x_t] with the key matrix S to obtain a ciphertext data set matrix X_e, wherein t represents the number of vectors in the training data set X;
S1.3. calculating a ciphertext conversion matrix M according to the transpose matrix G of the training data set X;
S1.4. sending the data label vector y formed by the labels of each piece of data in the training data set X, the ciphertext data set matrix X_e and the ciphertext conversion matrix M to the server;
the specific content of the step 2 is as follows: the server calculates the ciphertext linear kernel function matrix K_e according to the ciphertext data set matrix X_e and the ciphertext conversion matrix M, and sends the ciphertext linear kernel function matrix K_e to the user;
the specific content of the step 3 is as follows: the user decrypts the ciphertext linear kernel function matrix K_e with the key matrix S to obtain the plaintext linear kernel function matrix K = [K_11, …, K_mn], and sends the plaintext linear kernel function matrix K to the server;
the step 4 comprises the following steps:
S4.1. initializing a Lagrangian coefficient vector α = (α_1, …, α_t), an offset constant b and a penalty coefficient C;
S4.2. selecting two sample points (x_ei, y_i) and (x_ej, y_j) in the ciphertext data set matrix X_e as adjustment points, and calculating the corresponding error values E_i = Σ_k α_k·y_k·K_ik + b − y_i and E_j = Σ_k α_k·y_k·K_jk + b − y_j, wherein α_k represents the value of the kth element in the Lagrangian coefficient vector α, y_k denotes the value of the kth element in the data label vector y, b denotes the offset constant, K_ik represents the value of the ith row and kth column of the plaintext linear kernel function matrix K, and K_jk represents the value of the jth row and kth column of the plaintext linear kernel function matrix K;
S4.3. letting the to-be-determined element be α_j^new = α_j + y_j·(E_i − E_j)/η with η = K_ii + K_jj − 2K_ij; computing the clipping bounds L = max(0, α_j − α_i) and H = min(C, C + α_j − α_i) when y_i ≠ y_j, or L = max(0, α_i + α_j − C) and H = min(C, α_i + α_j) when y_i = y_j; if α_j^new > H, then α_j^new = H; if α_j^new < L, then α_j^new = L; letting α_i^new = α_i + y_i·y_j·(α_j − α_j^new); wherein α_i and α_j respectively represent the values of the ith and jth elements of the Lagrangian coefficient vector α, y_i and y_j respectively represent the values of the ith and jth elements in the data label vector y, C represents the penalty coefficient, and E_i and E_j respectively represent the error values of the sample points (x_ei, y_i) and (x_ej, y_j);
S4.4. replacing the ith and jth elements of the vector α with the to-be-determined elements α_i^new and α_j^new, respectively;
S4.5. with the to-be-determined elements α_i^new and α_j^new, letting the new offset constant be b_new; calculating a first pending value b_i = b − E_i − y_i(α_i^new − α_i)K_ii − y_j(α_j^new − α_j)K_ij and a second pending value b_j = b − E_j − y_i(α_i^new − α_i)K_ij − y_j(α_j^new − α_j)K_jj; if 0 < α_i^new < C, then b_new = b_i; if 0 < α_j^new < C, then b_new = b_j; otherwise b_new = (b_i + b_j)/2; the new offset constant b_new then replaces the value of the offset constant b; wherein K_ii represents the value of the ith row and ith column of the plaintext linear kernel function matrix K, K_ij represents the value of the ith row and jth column of the plaintext linear kernel function matrix K, and K_jj represents the value of the jth row and jth column of the plaintext linear kernel function matrix K;
S4.6. judging whether all data sample points in the ciphertext data set matrix X_e meet the KKT conditions; if yes, stopping the training and outputting the Lagrangian coefficient vector α and the offset constant b; otherwise going to step S4.1.
2. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 1, wherein the encryption in step S1.2 comprises the following specific contents:
encrypting each piece of data x_i in the training data set X with the key matrix S, the resulting ciphertext vector c_i ∈ Z_q^n satisfying Sc_i = wx_i + e, wherein i represents the subscript with 1 ≤ i ≤ t, n represents the length of the ciphertext vector c, p represents a prime number, m represents the length of the data x_i with x_i ∈ Z_p^m, Z_p^m denotes the set of vectors of length m over the finite field Z_p, w represents an integer parameter, e denotes an error vector with |e| < w/2, S denotes the key matrix with S ∈ Z_q^(m×n), q is a prime number with q ≫ p, Z_q^n denotes the set of vectors of length n over the finite field Z_q, and Z_q^(m×n) denotes the set of matrices with dimension m×n over the finite field Z_q.
3. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 2, wherein the calculation of the ciphertext conversion matrix M in step S1.3 comprises the following steps:
S1.3.1. according to |c| < 2^l, selecting a matched adaptive integer l, and converting each item c_i of the ciphertext vector c into a binary expression to obtain the transcoded ciphertext vector ĉ = [ĉ_1, …, ĉ_n], wherein 1 ≤ i ≤ n, ĉ_i = [ĉ_i(l−1), …, ĉ_i1, ĉ_i0], and ĉ_ij ∈ {−1, 0, 1};
S1.3.2. letting the intermediate ciphertext vector be c* with c* ∈ Z^nl, i.e. c* is the concatenation [ĉ_1, …, ĉ_n], wherein Z^nl represents the set of vectors of length nl over the finite field Z;
S1.3.3. converting each item S_ij of the key matrix S into the vector S*_ij = [2^(l−1)S_ij, 2^(l−2)S_ij, …, 2S_ij, S_ij] to obtain the intermediate key matrix S* with S* ∈ Z^(w×vl), wherein 1 ≤ i ≤ w, 1 ≤ j ≤ v, l represents the adaptive integer, and Z^(w×vl) represents the set of matrices with dimension w×vl over the finite field Z;
S1.3.4. letting the key conversion matrix be M and satisfy S'M = S* + E mod q, so that the key conversion matrix is M = [S* + E − TA ; A] mod q with M ∈ Z^(n'×nl), and the new ciphertext vector is c' = Mc*; wherein S' denotes the new key matrix with S' = [I, T], I denotes an identity matrix, T denotes an artificially defined matrix with random elements, S* represents the intermediate key matrix, E represents a random noise matrix with E ∈ Z^(m×nl), q represents a prime number with q ≫ p, p represents a prime number, A ∈ Z^((n'−m)×nl) is a random matrix, n' denotes the length of the new ciphertext vector c', Z^(n'×nl) represents the set of matrices with dimension n'×nl over the finite field Z, and Z^((n'−m)×nl) represents the set of matrices with dimension (n'−m)×nl over the finite field Z.
4. The privacy-preserving linear SVM model training method based on vector homomorphic encryption according to claim 1, wherein in step S4.6 the KKT conditions that each sample point (x_ei, y_i) in the ciphertext data set matrix X_e should meet include:
1) α_i = 0 ⇔ y_i·f(x_i) ≥ 1; 2) 0 < α_i < C ⇔ y_i·f(x_i) = 1; 3) α_i = C ⇔ y_i·f(x_i) ≤ 1;
wherein f(x_i) = Σ_k α_k·y_k·K_ik + b and α_i represents the value of the ith element in the Lagrangian coefficient vector α.
5. The method of claim 2, wherein if there are decimal parts in the training data set X, the training data set X needs to be amplified before step S1.2 is executed so that all data in the training data set X are integers, and in step 3 the decryption result needs to be reduced correspondingly to obtain the plaintext linear kernel function matrix K.
6. The privacy-preserving linear SVM model training method based on vector homomorphic encryption as claimed in claim 5, wherein the decryption operation on the ciphertext vector c, under the condition of possessing the key matrix S, is the calculation

x = ⌈(Sc mod q)/w⌋,

wherein ⌈a⌋ represents the integer nearest to a and Sc is reduced modulo q before the division.
CN201810317657.9A 2018-04-10 2018-04-10 Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption Expired - Fee Related CN108521326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810317657.9A CN108521326B (en) 2018-04-10 2018-04-10 Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810317657.9A CN108521326B (en) 2018-04-10 2018-04-10 Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption

Publications (2)

Publication Number Publication Date
CN108521326A CN108521326A (en) 2018-09-11
CN108521326B true CN108521326B (en) 2021-02-19

Family

ID=63431981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810317657.9A Expired - Fee Related CN108521326B (en) 2018-04-10 2018-04-10 Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption

Country Status (1)

Country Link
CN (1) CN108521326B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108650269A (en) * 2018-05-16 2018-10-12 中国科学技术大学 A kind of graded encryption method and system based on intensified learning
CN109359588B (en) * 2018-10-15 2021-02-09 电子科技大学 Novel privacy protection non-interactive K nearest neighbor classification method
CN110059501B (en) * 2019-04-16 2021-02-02 广州大学 Safe outsourcing machine learning method based on differential privacy
CN110190946B (en) * 2019-07-12 2021-09-03 之江实验室 Privacy protection multi-organization data classification method based on homomorphic encryption
CN111104968B (en) * 2019-12-02 2023-04-18 北京理工大学 Safety SVM training method based on block chain
CN111291781B (en) * 2020-01-09 2022-05-27 浙江理工大学 Encrypted image classification method based on support vector machine
WO2021184346A1 (en) * 2020-03-20 2021-09-23 云图技术有限公司 Private machine learning model generation and training methods, apparatus, and electronic device
CN111797907B (en) * 2020-06-16 2023-02-03 武汉大学 Safe and efficient SVM privacy protection training and classification method for medical Internet of things
US11599806B2 (en) 2020-06-22 2023-03-07 International Business Machines Corporation Depth-constrained knowledge distillation for inference on encrypted data
CN112152806B (en) * 2020-09-25 2023-07-18 青岛大学 Cloud-assisted image identification method, device and equipment supporting privacy protection
CN112669068B (en) * 2020-12-28 2024-05-14 深圳前海用友力合科技服务有限公司 Market research data transmission method and system based on big data
CN113706323A (en) * 2021-09-02 2021-11-26 杭州电子科技大学 Automatic insurance policy claim settlement method based on zero knowledge proof
CN116910818B (en) * 2023-09-13 2023-11-21 北京数牍科技有限公司 Data processing method, device, equipment and storage medium based on privacy protection

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105577357A (en) * 2015-12-21 2016-05-11 东南大学 Intelligent household data privacy protection method based on full homomorphic encryption

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105099693B (en) * 2014-05-23 2018-10-19 华为技术有限公司 A kind of transmission method and transmitting device
GB2557818A (en) * 2015-09-25 2018-06-27 Veracyte Inc Methods and compositions that utilize transciptome sequencing data in machine learning-based classification

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105577357A (en) * 2015-12-21 2016-05-11 东南大学 Intelligent household data privacy protection method based on full homomorphic encryption

Also Published As

Publication number Publication date
CN108521326A (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN108521326B (en) Privacy protection linear SVM (support vector machine) model training method based on vector homomorphic encryption
US11902413B2 (en) Secure machine learning analytics using homomorphic encryption
CN112989368B (en) Method and device for processing private data by combining multiple parties
CN105122721B (en) For managing the method and system for being directed to the trustship of encryption data and calculating safely
US20200366459A1 (en) Searching Over Encrypted Model and Encrypted Data Using Secure Single-and Multi-Party Learning Based on Encrypted Data
CN110084063B (en) Gradient descent calculation method for protecting private data
US20150381349A1 (en) Privacy-preserving ridge regression using masks
JP2014126865A (en) Device and method for encryption processing
CN110635909B (en) Attribute-based collusion attack resistant proxy re-encryption method
CN114696990B (en) Multi-party computing method, system and related equipment based on fully homomorphic encryption
CN110611662B (en) Attribute-based encryption-based fog collaborative cloud data sharing method
CN108833077A (en) Outer packet classifier encipher-decipher method based on homomorphism OU password
Liu et al. Secure multi-label data classification in cloud by additionally homomorphic encryption
CN112766514B (en) Method, system and device for joint training of machine learning model
CN112052466B (en) Support vector machine user data prediction method based on multi-party secure computing protocol
Bu et al. Privacy preserving back-propagation based on BGV on cloud
Sharma et al. Confidential boosting with random linear classifiers for outsourced user-generated data
CN112906052B (en) Aggregation method of multi-user gradient permutation in federated learning
CN111737756B (en) XGB model prediction method, device and system performed through two data owners
CN112507372B (en) Method and device for realizing privacy protection of multi-party collaborative update model
CN113221153A (en) Graph neural network training method and device, computing equipment and storage medium
CN111859440A (en) Sample classification method of distributed privacy protection logistic regression model based on mixed protocol
Perusheska et al. Deep learning-based cryptanalysis of different AES modes of operation
Teixeira et al. Towards End-to-End Private Automatic Speaker Recognition
CN116192358A (en) Logistic regression method, device and system based on isomorphic encryption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210219