CN115865307B - Data point multiplication operation method for federal learning - Google Patents

Data point multiplication operation method for federal learning

Info

Publication number
CN115865307B
Authority
CN
China
Prior art keywords
data
vector
party
data matrix
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310170136.6A
Other languages
Chinese (zh)
Other versions
CN115865307A
Inventor
冯黎明
马煜翔
刘洋
王玥
邢冰
刘文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanxiang Zhilian Hangzhou Technology Co ltd
Original Assignee
Lanxiang Zhilian Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanxiang Zhilian Hangzhou Technology Co ltd
Priority to CN202310170136.6A
Publication of CN115865307A
Application granted
Publication of CN115865307B
Legal status: Active

Abstract

The invention discloses a data point multiplication operation method for federal learning. The method comprises the following steps: the first party preprocesses its data column vector v and homomorphically encrypts the result to obtain an encrypted data matrix V1, which it sends to the second party; the second party preprocesses its data matrix w and homomorphically encrypts the result to obtain a plurality of encrypted sub-data matrices F1, generates a random number vector Z for each encrypted sub-data matrix F1, and homomorphically encrypts it to obtain an encrypted random number vector Z1; the second party computes the Hadamard product of each encrypted sub-data matrix F1 with the encrypted data matrix V1 to obtain a corresponding matrix E, sums each matrix E by columns, adds the corresponding encrypted random number vector Z1 to obtain a vector Q, and sends the vectors Q to the first party; the first party decrypts each vector Q and extracts the sums u of the corresponding data to form the dot-product result. The invention can rapidly calculate the dot product of the data matrix and the data column vector while protecting data privacy.

Description

Data point multiplication operation method for federal learning
Technical Field
The invention relates to the technical field of federal learning, in particular to a data point multiplication operation method for federal learning.
Background
With the rapid development of the domestic internet industry and of privacy protection, more and more institutions adopt federal learning for model training. To protect data privacy, the data each party trains on is encrypted, and because of the characteristics of encrypted data and of federal learning training itself, the training process involves a large number of ciphertext matrix and vector multiplications. These ciphertext operations consume a great deal of computing time, which makes federal learning training in big-data and big-model scenarios very difficult.
At present, federal learning generally adopts homomorphic encryption algorithms to encrypt and compute on the plaintext data to be trained. Homomorphic encryption refers to an encryption function for which performing ring addition and multiplication on plaintexts and then encrypting yields a result equivalent to performing the corresponding operations directly on the ciphertexts.
For homomorphic encryption algorithms such as CKKS and BFV that use SIMD to accelerate computation, SIMD batching can encrypt multiple plaintext values into a single ciphertext. When the ciphertext computation of a matrix multiplied by a column vector is required, the existing method is to homomorphically encrypt the column vector and each row of the matrix, and then compute the dot product between each ciphertext row and the ciphertext column vector. This process requires rotation operations (Rotation), and rotating a homomorphically encrypted ciphertext takes a great deal of computation time; when the matrix dimension is large, the computational overhead caused by ciphertext rotation is unacceptable.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a data point multiplication operation method for federal learning, which can rapidly calculate the dot-product result of a data matrix held by a second party and a data column vector held by a first party without leaking either party's data and without rotation operations. It solves the problem that, when homomorphic encryption algorithms such as CKKS and BFV that use SIMD to accelerate computation are used to compute the dot product of a matrix and a vector, rotation operations occupy an extremely large amount of computation time, and it improves computational efficiency.
In order to solve the problems, the invention is realized by adopting the following technical scheme:
The invention relates to a data point multiplication operation method for federal learning, wherein a first party holds an s-dimensional data column vector v and a second party holds a data matrix w with g rows and s columns; the method comprises the following steps:
S1: the second party determines preprocessing parameters according to the number of rows and the number of columns of the data matrix w, the first party preprocesses the data column vector v according to the preprocessing parameters to obtain a data matrix V, and the second party preprocesses the data matrix w according to the preprocessing parameters to obtain a plurality of sub-data matrices F;
S2: the first party generates a public key pk and a private key sk, homomorphically encrypts the data matrix V with the public key pk to obtain an encrypted data matrix V1, and sends the public key pk and the encrypted data matrix V1 to the second party;
S3: the second party generates a corresponding random number vector Z for each sub-data matrix F, where the random numbers at the positions specified by the preprocessing parameters sum to 0;
S4: the second party homomorphically encrypts each sub-data matrix F with the public key pk to obtain a corresponding encrypted sub-data matrix F1, and homomorphically encrypts each random number vector Z with the public key pk to obtain a corresponding encrypted random number vector Z1;
S5: the second party calculates the Hadamard product of each encrypted sub-data matrix F1 and the encrypted data matrix V1 to obtain a corresponding matrix E, sums each matrix E by columns to obtain a corresponding vector R, adds the vector R to the corresponding encrypted random number vector Z1 to obtain a vector Q, and sends all the calculated vectors Q to the first party;
S6: the first party decrypts each vector Q with the private key sk to obtain a corresponding vector q, and extracts the sums u of the corresponding data from the vectors q according to the preprocessing parameters to form the dot-product result.
In this scheme, the number of rows and the number of columns of each sub-data matrix F are respectively equal to those of the data matrix V, the dimension of the random number vector Z is equal to the number of columns of the sub-data matrix F, and none of the random numbers in the random number vector Z is 0. The first party holds the s-dimensional data column vector v and the second party holds the data matrix w with g rows and s columns. The first party converts the data column vector v into the data matrix V by preprocessing, and the second party converts the data matrix w into a plurality of sub-data matrices F by preprocessing; the two parties then encrypt the plaintext data matrices they hold with the same public key pk as the encryption key and the same homomorphic encryption algorithm to obtain the corresponding encrypted data matrices, and the second party additionally generates a corresponding random number vector Z for each sub-data matrix F and encrypts it with the same public key pk and the same homomorphic encryption algorithm to obtain the encrypted random number vector Z1. In this way, by the principle of homomorphic encryption, the Hadamard product of each encrypted sub-data matrix F1 and the encrypted data matrix V1 yields a corresponding matrix E; each matrix E is summed by columns to obtain a corresponding vector R; the vector R is added to the corresponding encrypted random number vector Z1 to obtain a vector Q; the vector Q is decrypted with the private key sk to obtain a vector q; and the data at the positions specified by the preprocessing parameters in each vector q are summed to obtain the sums u of data corresponding to each vector q. Because the random numbers at the positions specified by the preprocessing parameters sum to 0, each sum u is exactly the dot product of the data column vector v with the corresponding row of the data matrix w.
No Rotation operation is performed anywhere in the computation of this scheme, so no computation time is spent on ciphertext rotation. This solves the technical problem that rotation operations occupy an extremely large amount of computation time when homomorphic encryption algorithms such as CKKS and BFV, which use SIMD (single instruction, multiple data) to accelerate computation, are used to calculate the dot product of a matrix and a column vector; it greatly improves computational efficiency, facilitates federal learning training in big-data and big-model scenarios, and protects the data privacy of both parties.
Preferably, the method in step S6 for extracting the sums u of the corresponding data from the vectors q according to the preprocessing parameters is as follows:
the data at the corresponding positions in each vector q are summed according to the preprocessing parameters to obtain the sums u of data corresponding to each vector q, and the corresponding sums u are extracted according to the preprocessing parameters to form an s-dimensional data column vector p, which is the dot-product result of the data matrix w and the data column vector v.
Preferably, in the step S1, the method for determining the preprocessing parameters by the second party according to the number of rows and the number of columns of the data matrix w is as follows:
The second party calculates the preprocessing parameters k and h by the formulas:

k = ⌈√g⌉, h = ⌈s/k⌉,

where ⌈ ⌉ denotes rounding up.
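For illustration only, the parameter computation described above can be sketched in Python; the function name compute_params and the example call are assumptions of this sketch, not part of the patent:

import math

def compute_params(g: int, s: int):
    """Preprocessing parameters: k = ceil(sqrt(g)), h = ceil(s / k),
    plus the derived padded sizes m = k*k (rows) and n = h*k (columns)."""
    k = math.isqrt(g)
    if k * k < g:              # take the ceiling of the square root
        k += 1
    h = -(-s // k)             # ceiling division: ceil(s / k)
    return k, h, k * k, h * k

# Example matching the embodiment below: g = 4 rows, s = 4 columns -> (2, 2, 4, 4)
print(compute_params(4, 4))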
Preferably, in the step S1, the method by which the first party preprocesses the data column vector v according to the preprocessing parameters to obtain the data matrix V is as follows:
M1: the first party calculates the parameter n, n = h×k; if s = n, the data column vector v is transposed to obtain a data row vector v1; if s ≠ n, the data column vector v is zero-padded until its dimension reaches n and is then transposed to obtain the data row vector v1;
v1 = [a_1, a_2, a_3, …, a_n], where 1 ≤ f ≤ n and a_f represents the f-th data item in the data row vector v1;
M2: the data matrix V is calculated from the preprocessing parameters k and h and the data row vector v1 by the formula:

V =
[ A_{1,1}  A_{1,2}  A_{1,3}  …  A_{1,k} ]
[ A_{2,1}  A_{2,2}  A_{2,3}  …  A_{2,k} ]
[   …        …        …            …   ]
[ A_{k,1}  A_{k,2}  A_{k,3}  …  A_{k,k} ]
,
where 1 ≤ i ≤ k and A_{i,1}, A_{i,2}, A_{i,3}, …, A_{i,k} are all the same row vector, namely [a_{(i-1)*h+1}, a_{(i-1)*h+2}, …, a_{i*h-1}, a_{i*h}].
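A minimal NumPy sketch of steps M1-M2 (the helper name build_V and the sample values are assumptions of this sketch):

import numpy as np

def build_V(v: np.ndarray, k: int, h: int) -> np.ndarray:
    """Zero-pad v to length n = h*k, then build the k x n matrix V whose
    i-th row is the i-th block of h entries of v1 repeated k times."""
    n = h * k
    v1 = np.zeros(n)
    v1[:v.size] = v                  # zero padding when s < n
    blocks = v1.reshape(k, h)        # block i = [a_{(i-1)h+1}, ..., a_{ih}]
    return np.tile(blocks, (1, k))   # repeat each block k times along its row

# With s = 4 and k = h = 2 the rows are [a1 a2 a1 a2] and [a3 a4 a3 a4]
print(build_V(np.array([1.0, 2.0, 3.0, 4.0]), k=2, h=2))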
Preferably, in the step S1, the second party performs preprocessing on the data matrix w according to the preprocessing parameters to obtain a plurality of sub-data matrices F, which is as follows:
N1: the second party calculates the parameter m, m = k²; if g ≠ m, zero rows are appended to the data matrix w until the number of rows reaches m;
N2: the second party calculates the parameter n, n = h×k; if s ≠ n, zero columns are appended to the data matrix w until the number of columns reaches n, finally obtaining an m×n data matrix L;
N3: the data matrix L is equally divided by rows into k sub-data matrices W of size k×n,

W =
[ b_{1,1}  b_{1,2}  …  b_{1,n} ]
[ b_{2,1}  b_{2,2}  …  b_{2,n} ]
[   …        …           …    ]
[ b_{k,1}  b_{k,2}  …  b_{k,n} ]
,
where 1 ≤ i ≤ k, 1 ≤ f ≤ n, and b_{i,f} represents the data in the i-th row and f-th column of the sub-data matrix W;
N4: each sub-data matrix W is converted into a sub-data matrix F according to the preprocessing parameters k and h, obtaining k sub-data matrices F;
the method of converting the sub data matrix W into the sub data matrix F is as follows:
Each row of the sub-data matrix W is divided into k row vectors of h data each, giving the formula:

W =
[ B_{1,1}  B_{1,2}  …  B_{1,k} ]
[ B_{2,1}  B_{2,2}  …  B_{2,k} ]
[   …        …           …    ]
[ B_{k,1}  B_{k,2}  …  B_{k,k} ]
  (1),

where 1 ≤ i ≤ k, 1 ≤ j ≤ k, B_{i,j} is the row vector in block row i and block column j of the sub-data matrix W, and B_{i,j} = [b_{i,(j-1)*h+1}, b_{i,(j-1)*h+2}, …, b_{i,j*h-1}, b_{i,j*h}];
the k×k arrangement of blocks in equation (1) is transposed (the blocks themselves are not transposed) to obtain the sub-data matrix F:

F =
[ B_{1,1}  B_{2,1}  …  B_{k,1} ]
[ B_{1,2}  B_{2,2}  …  B_{k,2} ]
[   …        …           …    ]
[ B_{1,k}  B_{2,k}  …  B_{k,k} ]
.
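A NumPy sketch of steps N1-N4 (the helper name build_F_matrices is an assumption of this sketch; the block transpose follows the description above):

import numpy as np

def build_F_matrices(w: np.ndarray, k: int, h: int):
    """Zero-pad w to m x n (m = k*k, n = h*k), split it by rows into k
    sub-matrices W of size k x n, and block-transpose each into F."""
    m, n = k * k, h * k
    L = np.zeros((m, n))
    L[:w.shape[0], :w.shape[1]] = w          # row and column zero padding
    F_list = []
    for W in np.split(L, k, axis=0):         # the k sub-data matrices W
        # View W as a k x k grid of 1 x h blocks B_{i,j} and swap the block
        # row/column indices; the blocks themselves are not transposed.
        F = W.reshape(k, k, h).transpose(1, 0, 2).reshape(k, n)
        F_list.append(F)
    return F_list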
preferably, the method for generating the corresponding random number vector Z by the second party to the sub-data matrix F in the step S3 is as follows:
The second party generates a random number vector Z = [Z_1, Z_2, …, Z_n] in which each random number is non-zero and

Z_{d*h+1} + Z_{d*h+2} + … + Z_{(d+1)*h} = 0,

where 1 ≤ f ≤ n, 0 ≤ d ≤ k−1, and Z_f represents the f-th random number in the random number vector Z.
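A sketch of the mask generation (the helper name build_Z is an assumption; it presumes h ≥ 2, since a single non-zero value cannot sum to 0 on its own):

import numpy as np

def build_Z(k: int, h: int, rng=None) -> np.ndarray:
    """Random vector Z of length n = h*k: every entry is non-zero and each
    consecutive block of h entries sums to 0 (assumes h >= 2)."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = []
    for _ in range(k):
        while True:
            z = rng.uniform(-10.0, 10.0, size=h - 1)
            last = -z.sum()                 # forces the block sum to be 0
            if np.all(z != 0) and last != 0:
                break
        blocks.append(np.append(z, last))
    return np.concatenate(blocks)

Z = build_Z(k=2, h=2)
print(Z, Z[:2].sum(), Z[2:].sum())          # each block sums to 0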
Preferably, in the step S5, the formula for adding the vector R to the corresponding encrypted random number vector Z1 to obtain the vector Q is as follows:
Q = R + Z1 = [R_1+Z1_1, R_2+Z1_2, …, R_n+Z1_n], where 1 ≤ f ≤ n, R_f represents the f-th value in the vector R, and Z1_f represents the f-th encrypted random number in the encrypted random number vector Z1.
Preferably, in the step S6, the formula of decrypting the vector Q by the first party using the private key sk to obtain the corresponding vector Q is as follows:
q = DEC(sk, Q) = [r_1+Z_1, r_2+Z_2, …, r_n+Z_n],

where DEC(sk, Q) denotes decrypting the vector Q with the homomorphic encryption algorithm using the private key sk as the decryption key, 1 ≤ f ≤ n, r_f represents the result of decrypting R_f with the private key sk using the homomorphic encryption algorithm, and Z_f represents the result of decrypting Z1_f with the private key sk using the homomorphic encryption algorithm.
Preferably, in the step S6, the method for summing the data of the corresponding position in the vector q according to the preprocessing parameter by the first party to obtain the sum u of the plurality of data corresponding to the vector q is as follows:
The first party calculates k data sums u, denoted u_1, u_2, …, u_k, where for 1 ≤ i ≤ k:

u_i = (r_{(i-1)*h+1} + Z_{(i-1)*h+1}) + (r_{(i-1)*h+2} + Z_{(i-1)*h+2}) + … + (r_{i*h} + Z_{i*h}).
Preferably, the method in step S6 for extracting the sums u of the corresponding data according to the preprocessing parameters to form the s-dimensional data column vector p is as follows:
the k sub-data matrices W into which the data matrix L is divided by rows are denoted W_1, W_2, …, W_k, so that

L =
[ W_1 ]
[ W_2 ]
[  …  ]
[ W_k ]
;

the sub-data matrix F obtained by converting W_i is denoted F_i;
the vector Q corresponding to the sub-data matrix F_i is denoted Q_i;
the vector q obtained by decrypting Q_i is denoted q_i;
the k data sums u corresponding to q_i are denoted u(i)_1, u(i)_2, …, u(i)_k;
the sums u of all the data are arranged in order into a data column vector y,

y = [u(1)_1, u(1)_2, …, u(1)_k, u(2)_1, u(2)_2, …, u(2)_k, …, u(k)_1, u(k)_2, …, u(k)_k]^T;
deleting the last t values in the data column vector y to obtain a data column vector p, wherein t=m-s.
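Putting the pieces together, the following plaintext sketch checks the whole pipeline against a direct matrix-vector product. Encryption and decryption are omitted (the identity stands in for enc/DEC), and the helper names compute_params, build_V, build_F_matrices and build_Z are the assumed sketches given above, not the patent's own code:

import numpy as np

def dot_protocol_plaintext(w: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Plaintext simulation of steps S1-S6 (homomorphic encryption omitted)."""
    g, s = w.shape
    k, h, m, n = compute_params(g, s)
    V = build_V(v, k, h)                      # first party's k x n matrix
    sums = []
    for F in build_F_matrices(w, k, h):       # second party's k matrices F
        Z = build_Z(k, h)
        Q = (F * V).sum(axis=0) + Z           # Hadamard product, column sums, mask
        q = Q                                 # "decryption" is the identity here
        sums.extend(q.reshape(k, h).sum(axis=1))   # block sums cancel the mask
    y = np.array(sums)                        # m = k*k entries in total
    return y[:g]                              # drop entries from the zero-padded rows

w = np.arange(1.0, 17.0).reshape(4, 4)
v = np.array([1.0, 2.0, 3.0, 4.0])
print(dot_protocol_plaintext(w, v))           # equals w @ v for this 4 x 4 example
print(w @ v)

In this square example g = s = 4, so keeping the first g entries of y coincides with deleting the last t values described above.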
The beneficial effects of the invention are as follows: the dot-product result of the data matrix held by the second party and the data column vector held by the first party can be calculated rapidly without leaking the plaintext data of either party, so the data privacy of both parties is protected; no rotation operation is needed, which solves the problem that rotation operations occupy an extremely large amount of computation time when homomorphic encryption algorithms such as CKKS and BFV that use SIMD to accelerate computation are used to calculate the dot product of a matrix and a vector; and the computational efficiency is improved, which facilitates federal learning training in big-data and big-model scenarios.
Drawings
FIG. 1 is a flow chart of an embodiment;
FIG. 2 is a schematic diagram illustrating the conversion of the data column vector v into the data matrix V and the conversion of the data matrix w into the sub-data matrices F;
FIG. 3 is a schematic diagram of the matrices E_1 and E_2.
Detailed Description
The technical scheme of the invention is further specifically described below through examples and with reference to the accompanying drawings.
Example: in this embodiment, a first party holds an s-dimensional data column vector v and a second party holds a data matrix w with g rows and s columns; as shown in FIG. 1, the data point multiplication operation method for federal learning comprises the following steps:
S1: the second party determines the preprocessing parameters k and h according to the number of rows and the number of columns of the data matrix w, by the formulas:

k = ⌈√g⌉, h = ⌈s/k⌉,

where ⌈ ⌉ denotes rounding up;
The first party preprocesses the data column vector v according to the preprocessing parameters k and h to obtain the data matrix V; the specific steps are as follows:
M1: the first party calculates the parameter n, n = h×k; if s = n, the data column vector v is transposed to obtain a data row vector v1; if s ≠ n, the data column vector v is zero-padded until its dimension reaches n and is then transposed to obtain the data row vector v1;
v1 = [a_1, a_2, a_3, …, a_n], where 1 ≤ f ≤ n and a_f represents the f-th data item in the data row vector v1;
M2: the data matrix V is calculated from the preprocessing parameters k and h and the data row vector v1 by the formula:

V =
[ A_{1,1}  A_{1,2}  A_{1,3}  …  A_{1,k} ]
[ A_{2,1}  A_{2,2}  A_{2,3}  …  A_{2,k} ]
[   …        …        …            …   ]
[ A_{k,1}  A_{k,2}  A_{k,3}  …  A_{k,k} ]
,
where 1 ≤ i ≤ k and A_{i,1}, A_{i,2}, A_{i,3}, …, A_{i,k} are all the same row vector, namely [a_{(i-1)*h+1}, a_{(i-1)*h+2}, …, a_{i*h-1}, a_{i*h}];
The second party preprocesses the data matrix w according to the preprocessing parameters to obtain a plurality of sub-data matrices F, where the number of rows and the number of columns of each sub-data matrix F are respectively equal to those of the data matrix V; the specific steps are as follows:
N1: the second party calculates the parameter m, m = k²; if g ≠ m, zero rows are appended to the data matrix w until the number of rows reaches m;
N2: the second party calculates the parameter n, n = h×k; if s ≠ n, zero columns are appended to the data matrix w until the number of columns reaches n, finally obtaining an m×n data matrix L;
N3: the data matrix L is equally divided by rows into k sub-data matrices W of size k×n,

W =
[ b_{1,1}  b_{1,2}  …  b_{1,n} ]
[ b_{2,1}  b_{2,2}  …  b_{2,n} ]
[   …        …           …    ]
[ b_{k,1}  b_{k,2}  …  b_{k,n} ]
,
where 1 ≤ i ≤ k, 1 ≤ f ≤ n, and b_{i,f} represents the data in the i-th row and f-th column of the sub-data matrix W;
N4: each sub-data matrix W is converted into a sub-data matrix F according to the preprocessing parameters k and h, obtaining k sub-data matrices F;
the method of converting the sub data matrix W into the sub data matrix F is as follows:
Each row of the sub-data matrix W is divided into k row vectors of h data each, giving the formula:

W =
[ B_{1,1}  B_{1,2}  …  B_{1,k} ]
[ B_{2,1}  B_{2,2}  …  B_{2,k} ]
[   …        …           …    ]
[ B_{k,1}  B_{k,2}  …  B_{k,k} ]
  (1),

where 1 ≤ i ≤ k, 1 ≤ j ≤ k, B_{i,j} is the row vector in block row i and block column j of the sub-data matrix W, and B_{i,j} = [b_{i,(j-1)*h+1}, b_{i,(j-1)*h+2}, …, b_{i,j*h-1}, b_{i,j*h}];
the k×k arrangement of blocks in equation (1) is transposed (the blocks themselves are not transposed) to obtain the sub-data matrix F:

F =
[ B_{1,1}  B_{2,1}  …  B_{k,1} ]
[ B_{1,2}  B_{2,2}  …  B_{k,2} ]
[   …        …           …    ]
[ B_{1,k}  B_{2,k}  …  B_{k,k} ]
;
S2: the first party generates a public key pk and a private key sk, homomorphically encrypts the data matrix V with the public key pk to obtain an encrypted data matrix V1, and sends the public key pk and the encrypted data matrix V1 to the second party;
S3: the second party generates a corresponding random number vector Z for each sub-data matrix F; the dimension of the random number vector Z is equal to the number of columns of the sub-data matrix F, each random number in the random number vector Z is non-zero, and the random numbers at the positions determined by the preprocessing parameters k and h sum to 0;
the method for generating the corresponding random number vector Z for the sub-data matrix F by the second party is as follows:
The second party generates a random number vector Z = [Z_1, Z_2, …, Z_n] in which each random number is non-zero and

Z_{d*h+1} + Z_{d*h+2} + … + Z_{(d+1)*h} = 0,

where 1 ≤ f ≤ n, 0 ≤ d ≤ k−1, and Z_f represents the f-th random number in the random number vector Z;
S4: the second party homomorphically encrypts each sub-data matrix F with the public key pk to obtain a corresponding encrypted sub-data matrix F1, and homomorphically encrypts each random number vector Z with the public key pk to obtain a corresponding encrypted random number vector Z1;
S5: the second party calculates the Hadamard product of each encrypted sub-data matrix F1 and the encrypted data matrix V1 to obtain a corresponding matrix E, sums each matrix E by columns to obtain a corresponding vector R, adds the vector R to the corresponding encrypted random number vector Z1 to obtain a vector Q, and sends all the calculated vectors Q to the first party in order;
the formula for adding the vector R to the corresponding encrypted random number vector Z1 to obtain the vector Q is as follows:
Q = R + Z1 = [R_1+Z1_1, R_2+Z1_2, …, R_n+Z1_n], where 1 ≤ f ≤ n, R_f represents the f-th value in the vector R, and Z1_f represents the f-th encrypted random number in the encrypted random number vector Z1;
S6: the first party decrypts each vector Q with the private key sk to obtain a corresponding vector q, sums the data at the corresponding positions in each vector q according to the preprocessing parameters k and h to obtain several data sums u corresponding to each vector q, and extracts the corresponding data sums u according to the preprocessing parameters k and h to form an s-dimensional data column vector p, which is the dot-product result of the data matrix w and the data column vector v;
the formula for decrypting the vector Q by the first party by using the private key sk to obtain the corresponding vector Q is as follows:
q = DEC(sk, Q) = [r_1+Z_1, r_2+Z_2, …, r_n+Z_n],

where DEC(sk, Q) denotes decrypting the vector Q with the homomorphic encryption algorithm using the private key sk as the decryption key, 1 ≤ f ≤ n, r_f represents the result of decrypting R_f with the private key sk using the homomorphic encryption algorithm, and Z_f represents the result of decrypting Z1_f with the private key sk using the homomorphic encryption algorithm;
the method for summing the data of the corresponding position in the vector q by the first party according to the preprocessing parameters k and h to obtain the sum u of a plurality of data corresponding to the vector q is as follows:
The first party calculates k data sums u, denoted u_1, u_2, …, u_k, where for 1 ≤ i ≤ k:

u_i = (r_{(i-1)*h+1} + Z_{(i-1)*h+1}) + (r_{(i-1)*h+2} + Z_{(i-1)*h+2}) + … + (r_{i*h} + Z_{i*h}).
In step S6, the method for extracting the sums u of the corresponding data according to the preprocessing parameters k and h to form the s-dimensional data column vector p is as follows:
the k sub-data matrices W into which the data matrix L is divided by rows are denoted W_1, W_2, …, W_k, so that

L =
[ W_1 ]
[ W_2 ]
[  …  ]
[ W_k ]
;

the sub-data matrix F obtained by converting W_i is denoted F_i;
the vector Q corresponding to the sub-data matrix F_i is denoted Q_i;
the vector q obtained by decrypting Q_i is denoted q_i;
the k data sums u corresponding to q_i are denoted u(i)_1, u(i)_2, …, u(i)_k;
the sums u of all the data are arranged in order into a data column vector y,

y = [u(1)_1, u(1)_2, …, u(1)_k, u(2)_1, u(2)_2, …, u(2)_k, …, u(k)_1, u(k)_2, …, u(k)_k]^T;
deleting the last t values in the data column vector y to obtain a data column vector p, wherein t=m-s.
In this scheme, the second party calculates the preprocessing parameters k and h according to the number of rows and columns of the data matrix w, and calculates the parameters m and n, m = k², n = h×k. If g = m the number of rows is unchanged; otherwise zero rows are appended to the data matrix w until the number of rows reaches m. If s = n the number of columns is unchanged; otherwise zero columns are appended to the data matrix w until the number of columns reaches n. The resulting m×n matrix is recorded as the data matrix L; the data matrix L is equally divided by rows into k sub-data matrices W of size k×n, and each sub-data matrix W is converted into a sub-data matrix F, giving k sub-data matrices F in total.
The first party calculates the parameter n; if s = n, the data column vector v is directly transposed to obtain the data row vector v1; if s ≠ n, the data column vector v is zero-padded until its dimension reaches n and is then transposed to obtain the data row vector v1. The data matrix V of k rows and n columns is calculated from the preprocessing parameters k and h and the data row vector v1; the number of rows and the number of columns of each sub-data matrix F are respectively equal to those of the data matrix V.
The first party and the second party encrypt the plaintext data matrices they hold with the same public key pk as the encryption key and the same homomorphic encryption algorithm to obtain the corresponding encrypted data matrices. The second party also generates a corresponding random number vector Z for each sub-data matrix F and encrypts it with the same public key pk and the same homomorphic encryption algorithm to obtain the encrypted random number vector Z1; each random number in the random number vector Z is non-zero, and every group of h consecutive random numbers, taken from front to back in the random number vector Z, sums to 0.
Then the second party calculates the Hadamard product of each encrypted sub-data matrix F1 and the encrypted data matrix V1 to obtain a corresponding matrix E, sums each matrix E by columns to obtain a corresponding vector R, adds the vector R to the corresponding encrypted random number vector Z1 to obtain a vector Q, and sends all the calculated vectors Q to the first party in order. The first party decrypts each vector Q with the private key sk to obtain a corresponding vector q and sums every group of h consecutive values, from front to back, in each vector q to obtain a corresponding data sum u; summing the data at the corresponding positions in each vector q in this way gives k data sums u, and the k data sums u of a vector q are, from front to back, exactly the k values of the column vector obtained by the dot product of the corresponding sub-data matrix W with the data column vector v. Because the data matrix w was zero-padded by rows and columns at the initial stage to obtain the data matrix L, the dot product of every all-zero row of the data matrix L with the data column vector v is 0; after the sums u of all the data are arranged in order into the data column vector y, the last t values of the data column vector y correspond to the dot products of the zero-padded rows with the data column vector v, t = m−s, so the last t values of the data column vector y are simply deleted to obtain the data column vector p, which is the dot-product result of the data matrix w and the data column vector v.
No Rotation operation is performed anywhere in the computation of this scheme, so no computation time is spent on ciphertext rotation. This solves the technical problem that rotation operations occupy an extremely large amount of computation time when homomorphic encryption algorithms such as CKKS and BFV, which use SIMD (single instruction, multiple data) to accelerate computation, are used to calculate the dot product of a matrix and a column vector; it greatly improves computational efficiency, facilitates federal learning training in big-data and big-model scenarios, and protects the data privacy of both parties.
In this scheme, a homomorphic encryption algorithm such as CKKS or BFV is used to encrypt the matrices row by row; that is, the first party sends k ciphertexts when sending the encrypted data matrix V1 to the second party, and the second party sends k computed vectors Q to the first party, so the encrypted computation communicates 2k ciphertexts in total. With the existing algorithm, since the data matrix w has g rows, at least g ciphertexts are sent to the first party by the second party; because 2k = 2⌈√g⌉ is much smaller than g when g is large, the communication volume of this method is also much lower than that of the existing algorithm.
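As a purely numerical illustration of this comparison (the loop values below are arbitrary examples, not figures from the patent):

import math

def ceil_sqrt(g: int) -> int:
    k = math.isqrt(g)
    return k if k * k == g else k + 1

for g in (16, 256, 4096, 65536):
    # existing row-by-row approach: at least g ciphertexts; this method: 2k
    print(f"g = {g:6d}   existing: >= {g:6d} ciphertexts   this method: 2k = {2 * ceil_sqrt(g)}")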
Illustration:
the first party and the second party perform federal learning model training, as shown in fig. 2, the second party holds a 4-row 4-column data matrix w, the first column characteristic value of the data matrix w is age data, the second column characteristic value is gender data, the third column characteristic value is personal income data, and the fourth column characteristic value is household overall income data; the first party holds a 4-dimensional data column vector v, the first data is an age characteristic parameter, the second data is a gender characteristic parameter, the third data is a personal income characteristic parameter, and the fourth data is a family overall income characteristic parameter.
The second party calculates k = 2, h = 2, m = 4, n = 4; since the data matrix w has 4 rows and 4 columns and the dimension of the data column vector v is 4, neither the data matrix w nor the data column vector v needs zero padding. As shown in FIG. 2, the first party preprocesses the data column vector v to obtain the data matrix V, and the second party preprocesses the data matrix w, equally dividing it into sub-data matrices W_1 and W_2 and then converting the sub-data matrices W_1 and W_2 into sub-data matrices F_1 and F_2.
The first party generates a public key pk and a private key sk, homomorphically encrypts the data matrix V with the public key pk to obtain the encrypted data matrix V1, V1 = enc(V), where enc(V) denotes homomorphic encryption of the data matrix V, and sends the public key pk and the encrypted data matrix V1 to the second party.
The second party generates a corresponding random number vector Z_1 for the sub-data matrix F_1 and a corresponding random number vector Z_2 for the sub-data matrix F_2: Z_1 = [Z_{1,1}, Z_{1,2}, Z_{1,3}, Z_{1,4}], with Z_{1,1}+Z_{1,2} = 0, Z_{1,3}+Z_{1,4} = 0 and none of Z_{1,1}, Z_{1,2}, Z_{1,3}, Z_{1,4} equal to 0; Z_2 = [Z_{2,1}, Z_{2,2}, Z_{2,3}, Z_{2,4}], with Z_{2,1}+Z_{2,2} = 0, Z_{2,3}+Z_{2,4} = 0 and none of Z_{2,1}, Z_{2,2}, Z_{2,3}, Z_{2,4} equal to 0.
The second party uses the public key pk to homomorphically encrypt the sub-data matrices F_1 and F_2, obtaining the corresponding encrypted sub-data matrices F1_1 = enc(F_1) and F1_2 = enc(F_2), and uses the public key pk to homomorphically encrypt the random number vectors Z_1 and Z_2, obtaining the corresponding encrypted random number vectors Z1_1 = enc(Z_1) and Z1_2 = enc(Z_2).
The second party calculates the Hadamard product of the encrypted sub-data matrix F1_1 and the encrypted data matrix V1 to obtain the corresponding matrix E_1, and calculates the Hadamard product of the encrypted sub-data matrix F1_2 and the encrypted data matrix V1 to obtain the corresponding matrix E_2. The matrices E_1 and E_2 are shown in FIG. 3; in matrix E_1, enc(a_1*d_{1,1}) represents the product of the homomorphically encrypted ciphertext of a_1 in the data matrix V and the homomorphically encrypted ciphertext of d_{1,1} in the sub-data matrix W_1.
The second party sums the matrix E_1 by columns to obtain the corresponding vector R_1, and adds the vector R_1 to the corresponding encrypted random number vector Z1_1 to obtain the vector Q_1:

Q_1 = [enc(a_1*d_{1,1}+a_3*d_{1,3}+Z_{1,1}), enc(a_2*d_{1,2}+a_4*d_{1,4}+Z_{1,2}), enc(a_1*d_{2,1}+a_3*d_{2,3}+Z_{1,3}), enc(a_2*d_{2,2}+a_4*d_{2,4}+Z_{1,4})];

the matrix E_2 is summed by columns to obtain the corresponding vector R_2, and the vector R_2 is added to the corresponding encrypted random number vector Z1_2 to obtain the vector Q_2:

Q_2 = [enc(a_1*d_{3,1}+a_3*d_{3,3}+Z_{2,1}), enc(a_2*d_{3,2}+a_4*d_{3,4}+Z_{2,2}), enc(a_1*d_{4,1}+a_3*d_{4,3}+Z_{2,3}), enc(a_2*d_{4,2}+a_4*d_{4,4}+Z_{2,4})];

the calculated vectors Q_1 and Q_2 are sent to the first party.
The first party uses the private key sk to decrypt the vectors Q_1 and Q_2, obtaining the corresponding vectors q_1 and q_2:

q_1 = [a_1*d_{1,1}+a_3*d_{1,3}+Z_{1,1}, a_2*d_{1,2}+a_4*d_{1,4}+Z_{1,2}, a_1*d_{2,1}+a_3*d_{2,3}+Z_{1,3}, a_2*d_{2,2}+a_4*d_{2,4}+Z_{1,4}],

q_2 = [a_1*d_{3,1}+a_3*d_{3,3}+Z_{2,1}, a_2*d_{3,2}+a_4*d_{3,4}+Z_{2,2}, a_1*d_{4,1}+a_3*d_{4,3}+Z_{2,3}, a_2*d_{4,2}+a_4*d_{4,4}+Z_{2,4}].
Since h = 2, the first party sums the vector q_1 in groups of 2 values from front to back to obtain u(1)_1 and u(1)_2, and sums the vector q_2 in groups of 2 values from front to back to obtain u(2)_1 and u(2)_2:

u(1)_1 = a_1*d_{1,1}+a_3*d_{1,3}+Z_{1,1}+a_2*d_{1,2}+a_4*d_{1,4}+Z_{1,2},
u(1)_2 = a_1*d_{2,1}+a_3*d_{2,3}+Z_{1,3}+a_2*d_{2,2}+a_4*d_{2,4}+Z_{1,4},
u(2)_1 = a_1*d_{3,1}+a_3*d_{3,3}+Z_{2,1}+a_2*d_{3,2}+a_4*d_{3,4}+Z_{2,2},
u(2)_2 = a_1*d_{4,1}+a_3*d_{4,3}+Z_{2,3}+a_2*d_{4,2}+a_4*d_{4,4}+Z_{2,4}.

Since Z_{1,1}+Z_{1,2} = 0, Z_{1,3}+Z_{1,4} = 0, Z_{2,1}+Z_{2,2} = 0 and Z_{2,3}+Z_{2,4} = 0,

u(1)_1 = a_1*d_{1,1}+a_2*d_{1,2}+a_3*d_{1,3}+a_4*d_{1,4},
u(1)_2 = a_1*d_{2,1}+a_2*d_{2,2}+a_3*d_{2,3}+a_4*d_{2,4},
u(2)_1 = a_1*d_{3,1}+a_2*d_{3,2}+a_3*d_{3,3}+a_4*d_{3,4},
u(2)_2 = a_1*d_{4,1}+a_2*d_{4,2}+a_3*d_{4,3}+a_4*d_{4,4}.

Therefore, u(1)_1, u(1)_2, u(2)_1 and u(2)_2 form the 4-dimensional data column vector p, which is the dot-product result of the data matrix w and the data column vector v and is consistent with the plaintext dot product of the data matrix w and the data column vector v.
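The cancellation shown above can be checked numerically. The following sketch uses arbitrary made-up values for w, v and the masks (the actual feature values of FIG. 2 are not reproduced here), with k = h = 2 as in the example:

import numpy as np

rng = np.random.default_rng(0)
w = rng.integers(1, 10, size=(4, 4)).astype(float)   # stands in for the 4 x 4 data matrix
v = rng.integers(1, 10, size=4).astype(float)        # stands in for the 4-dim column vector

V = np.tile(v.reshape(2, 2), (1, 2))                 # k = h = 2, so V is 2 x 4
p = []
for W in np.split(w, 2, axis=0):                     # W_1 and W_2
    F = W.reshape(2, 2, 2).transpose(1, 0, 2).reshape(2, 4)   # block transpose
    Z = np.array([3.5, -3.5, -1.2, 1.2])             # each pair of masks sums to 0
    q = (F * V).sum(axis=0) + Z                      # masked column sums of F o V
    p.extend(q.reshape(2, 2).sum(axis=1))            # u(i)_1, u(i)_2: masks cancel

print(np.array(p))                                   # equals w @ v
print(w @ v)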
The calculated data column vector p is used as a training sample of the batch in logistic regression and is used for calculating a loss function later and updating the federal learning model through gradient back propagation.

Claims (6)

1. A data point multiplication operation method for federal learning, wherein a first party holds an s-dimensional data column vector v and a second party holds a data matrix w with g rows and s columns, characterized in that the method comprises the following steps:
S1: the second party determines preprocessing parameters according to the number of rows and the number of columns of the data matrix w, the first party preprocesses the data column vector v according to the preprocessing parameters to obtain a data matrix V, and the second party preprocesses the data matrix w according to the preprocessing parameters to obtain a plurality of sub-data matrices F;
S2: the first party generates a public key pk and a private key sk, homomorphically encrypts the data matrix V with the public key pk to obtain an encrypted data matrix V1, and sends the public key pk and the encrypted data matrix V1 to the second party;
S3: the second party generates a corresponding random number vector Z for each sub-data matrix F, where the random numbers at the positions specified by the preprocessing parameters sum to 0;
S4: the second party homomorphically encrypts each sub-data matrix F with the public key pk to obtain a corresponding encrypted sub-data matrix F1, and homomorphically encrypts each random number vector Z with the public key pk to obtain a corresponding encrypted random number vector Z1;
S5: the second party calculates the Hadamard product of each encrypted sub-data matrix F1 and the encrypted data matrix V1 to obtain a corresponding matrix E, sums each matrix E by columns to obtain a corresponding vector R, adds the vector R to the corresponding encrypted random number vector Z1 to obtain a vector Q, and sends all the calculated vectors Q to the first party;
S6: the first party decrypts each vector Q with the private key sk to obtain a corresponding vector q, and extracts the sums u of the corresponding data from the vectors q according to the preprocessing parameters to form the dot-product result;
the method in step S6 for extracting the sums u of the corresponding data from the vectors q according to the preprocessing parameters to form the dot-product result is as follows:
the data at the corresponding positions in each vector q are summed according to the preprocessing parameters to obtain the sums u of data corresponding to each vector q, and the corresponding sums u are extracted according to the preprocessing parameters to form an s-dimensional data column vector p, which is the dot-product result of the data matrix w and the data column vector v;
the method for determining the preprocessing parameters by the second party according to the row number and the column number of the data matrix w in the step S1 is as follows:
the second party calculates the preprocessing parameters k and h by the formulas:

k = ⌈√g⌉, h = ⌈s/k⌉,

where ⌈ ⌉ denotes rounding up;
in the step S1, the method by which the first party preprocesses the data column vector v according to the preprocessing parameters to obtain the data matrix V is as follows:
M1: the first party calculates the parameter n, n = h×k; if s = n, the data column vector v is transposed to obtain a data row vector v1; if s ≠ n, the data column vector v is zero-padded until its dimension reaches n and is then transposed to obtain the data row vector v1;
v1 = [a_1, a_2, a_3, …, a_n], where 1 ≤ f ≤ n and a_f represents the f-th data item in the data row vector v1;
M2: the data matrix V is calculated from the preprocessing parameters k and h and the data row vector v1 by the formula:

V =
[ A_{1,1}  A_{1,2}  A_{1,3}  …  A_{1,k} ]
[ A_{2,1}  A_{2,2}  A_{2,3}  …  A_{2,k} ]
[   …        …        …            …   ]
[ A_{k,1}  A_{k,2}  A_{k,3}  …  A_{k,k} ]
,
where 1 ≤ i ≤ k and A_{i,1}, A_{i,2}, A_{i,3}, …, A_{i,k} are all the same row vector, namely [a_{(i-1)*h+1}, a_{(i-1)*h+2}, …, a_{i*h-1}, a_{i*h}];
In the step S1, the second party performs preprocessing on the data matrix w according to the preprocessing parameters, and the method for obtaining a plurality of sub-data matrices F is as follows:
N1: the second party calculates the parameter m, m = k²; if g ≠ m, zero rows are appended to the data matrix w until the number of rows reaches m;
N2: the second party calculates the parameter n, n = h×k; if s ≠ n, zero columns are appended to the data matrix w until the number of columns reaches n, finally obtaining an m×n data matrix L;
N3: the data matrix L is equally divided by rows into k sub-data matrices W of size k×n,

W =
[ b_{1,1}  b_{1,2}  …  b_{1,n} ]
[ b_{2,1}  b_{2,2}  …  b_{2,n} ]
[   …        …           …    ]
[ b_{k,1}  b_{k,2}  …  b_{k,n} ]
,
where 1 ≤ i ≤ k, 1 ≤ f ≤ n, and b_{i,f} represents the data in the i-th row and f-th column of the sub-data matrix W;
N4: each sub-data matrix W is converted into a sub-data matrix F according to the preprocessing parameters k and h, obtaining k sub-data matrices F;
the method of converting the sub data matrix W into the sub data matrix F is as follows:
each row of the sub-data matrix W is divided into k row vectors of h data each, giving the formula:

W =
[ B_{1,1}  B_{1,2}  …  B_{1,k} ]
[ B_{2,1}  B_{2,2}  …  B_{2,k} ]
[   …        …           …    ]
[ B_{k,1}  B_{k,2}  …  B_{k,k} ]
  (1),

where 1 ≤ i ≤ k, 1 ≤ j ≤ k, B_{i,j} is the row vector in block row i and block column j of the sub-data matrix W, and B_{i,j} = [b_{i,(j-1)*h+1}, b_{i,(j-1)*h+2}, …, b_{i,j*h-1}, b_{i,j*h}];
the k×k arrangement of blocks in equation (1) is transposed (the blocks themselves are not transposed) to obtain the sub-data matrix F:

F =
[ B_{1,1}  B_{2,1}  …  B_{k,1} ]
[ B_{1,2}  B_{2,2}  …  B_{k,2} ]
[   …        …           …    ]
[ B_{1,k}  B_{2,k}  …  B_{k,k} ]
.
2. The method of claim 1, wherein the method for the second party to generate the corresponding random number vector Z for the sub-data matrix F in the step S3 is as follows:
the second party generates a random number vector Z = [Z_1, Z_2, …, Z_n] in which each random number is non-zero and

Z_{d*h+1} + Z_{d*h+2} + … + Z_{(d+1)*h} = 0,

where 1 ≤ f ≤ n, 0 ≤ d ≤ k−1, and Z_f represents the f-th random number in the random number vector Z.
3. The method according to claim 2, wherein the formula for adding the vector R to the corresponding encrypted random number vector Z1 in the step S5 to obtain the vector Q is as follows:
Q = R + Z1 = [R_1+Z1_1, R_2+Z1_2, …, R_n+Z1_n], where 1 ≤ f ≤ n, R_f represents the f-th value in the vector R, and Z1_f represents the f-th encrypted random number in the encrypted random number vector Z1.
4. A data point multiplication method for federal learning according to claim 3, wherein the formula of decrypting the vector Q by the first party using the private key sk in the step S6 to obtain the corresponding vector Q is as follows:
q = DEC(sk, Q) = [r_1+Z_1, r_2+Z_2, …, r_n+Z_n],

where DEC(sk, Q) denotes decrypting the vector Q with the homomorphic encryption algorithm using the private key sk as the decryption key, 1 ≤ f ≤ n, r_f represents the result of decrypting R_f with the private key sk using the homomorphic encryption algorithm, and Z_f represents the result of decrypting Z1_f with the private key sk using the homomorphic encryption algorithm.
5. The method for performing the data point multiplication operation for federal learning according to claim 4, wherein the method for summing the data at the corresponding position in the vector q according to the preprocessing parameter by the first party in the step S6 to obtain the sum u of the plurality of data corresponding to the vector q is as follows:
the first party calculates k data sums u, denoted u_1, u_2, …, u_k, where for 1 ≤ i ≤ k:

u_i = (r_{(i-1)*h+1} + Z_{(i-1)*h+1}) + (r_{(i-1)*h+2} + Z_{(i-1)*h+2}) + … + (r_{i*h} + Z_{i*h}).
6. The method for performing the data point multiplication operation for federal learning according to claim 5, wherein the method in step S6 for extracting the sums u of the corresponding data according to the preprocessing parameters to form the s-dimensional data column vector p is as follows:
the k sub-data matrices W into which the data matrix L is divided by rows are denoted W_1, W_2, …, W_k, so that

L =
[ W_1 ]
[ W_2 ]
[  …  ]
[ W_k ]
;

the sub-data matrix F obtained by converting W_i is denoted F_i;
the vector Q corresponding to the sub-data matrix F_i is denoted Q_i;
the vector q obtained by decrypting Q_i is denoted q_i;
the k data sums u corresponding to q_i are denoted u(i)_1, u(i)_2, …, u(i)_k;
the sums u of all the data are arranged in order into a data column vector y,

y = [u(1)_1, u(1)_2, …, u(1)_k, u(2)_1, u(2)_2, …, u(2)_k, …, u(k)_1, u(k)_2, …, u(k)_k]^T;
deleting the last t values in the data column vector y to obtain a data column vector p, wherein t=m-s.
CN202310170136.6A (priority date 2023-02-27, filing date 2023-02-27) — Data point multiplication operation method for federal learning — granted as CN115865307B, status Active

Priority Applications (1)

CN202310170136.6A — priority date 2023-02-27, filing date 2023-02-27 — Data point multiplication operation method for federal learning (granted as CN115865307B)

Applications Claiming Priority (1)

CN202310170136.6A — priority date 2023-02-27, filing date 2023-02-27 — Data point multiplication operation method for federal learning (granted as CN115865307B)

Publications (2)

Publication Number Publication Date
CN115865307A CN115865307A (en) 2023-03-28
CN115865307B (en) 2023-05-09

Family

ID=85659126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310170136.6A — Data point multiplication operation method for federal learning — priority date 2023-02-27, filing date 2023-02-27 (granted as CN115865307B, Active)

Country Status (1)

Country Link
CN (1) CN115865307B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116248252B (en) * 2023-05-10 2023-07-14 蓝象智联(杭州)科技有限公司 Data dot multiplication processing method for federal learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987559A (en) * 2021-12-24 2022-01-28 支付宝(杭州)信息技术有限公司 Method and device for jointly processing data by two parties for protecting data privacy
CN115225405A (en) * 2022-07-28 2022-10-21 上海光之树科技有限公司 Matrix decomposition method based on security aggregation and key exchange under federated learning framework
CN115392487A (en) * 2022-06-30 2022-11-25 中国人民解放军战略支援部队信息工程大学 Privacy protection nonlinear federal support vector machine training method and system based on homomorphic encryption

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008209499A (en) * 2007-02-23 2008-09-11 Toshiba Corp Aes decryption apparatus and program
CN110766128A (en) * 2018-07-26 2020-02-07 北京深鉴智能科技有限公司 Convolution calculation unit, calculation method and neural network calculation platform
CN109768864A (en) * 2019-01-14 2019-05-17 大连大学 Encryption method based on ECC and homomorphic cryptography
CN109787743B (en) * 2019-01-17 2022-06-14 广西大学 Verifiable fully homomorphic encryption method based on matrix operation
CN110324135B (en) * 2019-07-04 2022-05-31 浙江理工大学 Homomorphic encryption matrix determinant security outsourcing method based on cloud computing
US11431470B2 (en) * 2019-08-19 2022-08-30 The Board Of Regents Of The University Of Texas System Performing computations on sensitive data while guaranteeing privacy
CN112199702A (en) * 2020-10-16 2021-01-08 鹏城实验室 Privacy protection method, storage medium and system based on federal learning
CN113434878B (en) * 2021-06-25 2023-07-07 平安科技(深圳)有限公司 Modeling and application method, device, equipment and storage medium based on federal learning
CN113516253B (en) * 2021-07-02 2022-04-05 深圳市洞见智慧科技有限公司 Data encryption optimization method and device in federated learning
CN114237548B (en) * 2021-11-22 2023-07-18 南京大学 Method and system for complex point multiplication operation based on nonvolatile memory array
CN114168991B (en) * 2022-02-10 2022-05-20 北京鹰瞳科技发展股份有限公司 Method, circuit and related product for processing encrypted data
CN115169576B (en) * 2022-06-24 2024-02-09 上海富数科技有限公司 Model training method and device based on federal learning and electronic equipment
CN115643105B (en) * 2022-11-17 2023-03-10 杭州量安科技有限公司 Federal learning method and device based on homomorphic encryption and depth gradient compression

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987559A (en) * 2021-12-24 2022-01-28 支付宝(杭州)信息技术有限公司 Method and device for jointly processing data by two parties for protecting data privacy
CN115392487A (en) * 2022-06-30 2022-11-25 中国人民解放军战略支援部队信息工程大学 Privacy protection nonlinear federal support vector machine training method and system based on homomorphic encryption
CN115225405A (en) * 2022-07-28 2022-10-21 上海光之树科技有限公司 Matrix decomposition method based on security aggregation and key exchange under federated learning framework

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Fully homomorphic encryption: Searching over encrypted cloud data; Alexander Wood; ACM Computing Surveys, Vol. 53, No. 4; full text *
Simulation research on encryption of characteristic data in Internet of Things communication; Cheng Zhiqiang, Lian Hongpeng; Computer Simulation, No. 11; full text *
Discussion on building a shared data lake for federated learning; Liu Yang; Cybersecurity and Informatization, No. 9; full text *

Also Published As

Publication number Publication date
CN115865307A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN112989368B (en) Method and device for processing private data by combining multiple parties
US6298136B1 (en) Cryptographic method and apparatus for non-linearly merging a data block and a key
Abraham et al. Secure image encryption algorithms: A review
CN109660696B (en) New image encryption method
CN113940028B (en) Method and device for realizing white box password
CN110166223B (en) Rapid implementation method of cryptographic block cipher algorithm SM4
CN112134688B (en) Asymmetric image encryption method based on quantum chaotic mapping and SHA-3
CN110880967B (en) Method for parallel encryption and decryption of multiple messages by adopting packet symmetric key algorithm
CN115276947B (en) Private data processing method, device, system and storage medium
CN115865307B (en) Data point multiplication operation method for federal learning
Nazeer et al. Implication of genetic algorithm in cryptography to enhance security
CN111597574A (en) Parallel image encryption system and method based on spatial diffusion structure
CN115392487A (en) Privacy protection nonlinear federal support vector machine training method and system based on homomorphic encryption
CN112311524A (en) Image encryption method based on new chaotic mapping and compressed sensing
CN113076551B (en) Color image encryption method based on lifting scheme and cross-component scrambling
JP5689826B2 (en) Secret calculation system, encryption apparatus, secret calculation apparatus and method, program
CN113869499A (en) High-efficiency conversion method for unintentional neural network
Paul et al. Matrix based cryptographic procedure for efficient image encryption
Das et al. Diffusion and encryption of digital image using genetic algorithm
CN106921486A (en) The method and apparatus of data encryption
Khalaf et al. Proposed triple hill cipher algorithm for increasing the security level of encrypted binary data and its implementation using FPGA
Bajaj et al. AES algorithm for encryption
CN116248252B (en) Data dot multiplication processing method for federal learning
CN111756518B (en) Color image encryption method based on memristor hyperchaotic system
CN106961328A (en) A kind of VHE implementation methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant