CN115587139A - Distributed privacy protection classification method and system based on homomorphic encryption - Google Patents
- Publication number
- CN115587139A CN202211372124.3A
- Authority
- CN
- China
- Prior art keywords
- data
- distance
- vector
- global
- participant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a distributed privacy protection classification method and system based on homomorphic encryption. The n data participants taking part in data mining, the attributes of the global data set D, and the prediction sample are first defined. The data miner $P_n$ generates the public and private keys required for homomorphic encryption. Each data participant $P_i$ computes the Euclidean distances between every data point in its local data set and the prediction sample X, and builds a local distance vector M. $P_i$ then generates random number vectors, adds them to the local distance vector, and produces encrypted distance sub-vectors, so that each $P_i$ obtains an encrypted distance vector M'. From these a global encrypted distance vector is obtained, which $P_n$ sorts with respect to the prediction sample X in order to select the K data points closest to X. The data miner counts the class labels of these K data points and uses the most frequent class label as the predicted class label of the prediction sample. The method protects the data privacy of all parties while effectively preserving classification accuracy.
Description
Technical Field
The invention relates to the field of information security, in particular to a distributed privacy protection classification method and a distributed privacy protection classification system based on homomorphic encryption.
Background
With the rapid development and deepening digitization of modern society, data that used to be stored centrally is becoming more and more dispersed, so data mining has to involve multiple parties. Because the owners of the information do not want to reveal their private data to one another, secure multi-party computation is well suited to this situation.
The Turing Award winner and Chinese scientist Yao Qizhi (Andrew Chi-Chih Yao) posed the millionaires' problem: how can two millionaires determine who is richer without revealing their wealth to each other? This problem has evolved into today's Secure Multi-Party Computation (SMC), that is, reaching the desired conclusion without sharing the original data. In secure multi-party computation, data is randomly encrypted afresh each time, so encrypted data cannot be reused; operations are carried out directly on the encrypted data without restoring the original data; the participants are fixed before each computation and must all coordinate; and the value in the data can be obtained without revealing the original data. Using secure multi-party computation to aggregate model gradient updates can reduce the possibility of information leakage. In summary, secure multi-party computation addresses how multiple mutually untrusting participants in a distributed network can compute cooperatively without revealing the secret information each of them holds. On the one hand, it requires the participants to jointly compute an agreed function; on the other hand, the secret input data held by each participant must remain protected.
The invention researches how multiple parties cooperate to build a k-nearest neighbor classifier on vertically partitioned data. We have developed a distributed k nearest neighbor classification protocol based on homomorphic encryption. In the protocol, all the data of all the parties do not need to be sent to a central and trusted party, and the collaborative calculation of the parties is realized by using homomorphic encryption and random perturbation technology under the condition that the data privacy is not disclosed.
Disclosure of Invention
To solve the problems in existing scenarios, the invention aims to provide a privacy protection classification method that lets all parties cooperate on the classification computation without revealing their private data, while maintaining classification accuracy in the presence of an untrusted mining analyst.
Specifically, the invention provides a distributed privacy protection classification method based on homomorphic encryption, which comprises the following steps:
Step 1, define the n data participants $P_i$, $i \in [1,n]$, taking part in data mining, the attributes of the global data set $D$, and the global prediction sample $x'$. One of the data participants acts as the data miner and is denoted $P_n$. The global data set $D$ is split vertically into local data sets $D_i$, $i \in [1,n]$, and each $D_i$ is assigned to the corresponding data participant $P_i$; the part of the prediction sample held by $P_i$ is $x'_i$;
Step 2, the data miner $P_n$ generates the public key pk and the private key sk required for homomorphic encryption, retains sk for decryption, and sends pk to the other data participants $P_i$, $i \in [1,n-1]$;
Step 3, each data participant $P_i$ computes the local distance vector $d_i$ between its local data set $D_i$ and the corresponding prediction sample part $x'_i$. $P_i$ decomposes $d_i$ into n distance sub-vectors $s_{ji}$ of the same dimension as $d_i$, randomly generates n perturbation sub-vectors $r_{ji}$ whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors $s'_{ji}$;
Step 4, each data participant $P_i$ holds the perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$. It retains $s'_{ii}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1,n]$, $j \neq i$) to obtain the encrypted perturbed distance sub-vector $e(s'_{ji})$, and sends $e(s'_{ji})$ to data participant $P_j$. Once $P_i$ has received all the encrypted perturbed distance sub-vectors $e(s'_{ij})$ ($j \neq i$) sent by the other data participants, $P_i$ performs a homomorphic summation to obtain $e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$;
Step 5, each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to the data miner $P_n$, and $P_n$ performs the summation $e(S) = \sum_{i=1}^{n} e(S_i)$ to obtain the encrypted global distance vector $e(S)$, from which the perturbations have been removed;
Step 6, the data miner $P_n$ decrypts the encrypted global distance vector $e(S)$ with the private key sk to obtain the global distance vector $S$, counts the class labels of the K nearest data points, and uses the most frequent class label as the predicted class label of the prediction sample.
Further, the step 1 comprises:
Denote the n data participants taking part in data mining by $P_i$, $i \in [1,n]$. The global data set $D$ has the following attributes:
$\langle ID, A_{11}, A_{21}, \dots, A_{m1}, \dots, A_{1n}, \dots, A_{kn}, L \rangle$, $m, k \in \mathbb{Z}$
where ID is the primary-key attribute, the $A$ are numerical condition attributes, and $L$ is the class-label attribute; m and k are integers giving the number of attributes held by each participant;
$D$ is divided into n parts, one held by each data participant. The i-th part is defined as the local data set $D_i$, $i \in [1,n]$, held by $P_i$, and $D_i$ has the following attributes:
$\langle ID, A_{1i}, A_{2i}, \dots, A_{mi} \rangle$, where m is an integer
$D_i = \{x_{1i}, \dots, x_{ji}, \dots, x_{li}\}$
In the formula:
l denotes the number of records contained in the original data set $D$,
$x_{ji}$ denotes the j-th data point of the data set $D_i$, $j = 1, 2, \dots, l$,
m denotes the number of attributes of the data points in $D_i$, i.e. $D_i$ is an m-dimensional data set containing l data points,
the local data set $D_n$ held by $P_n$ additionally contains the class-label attribute, and the attributes of $D_n$ are
$\langle ID, A_{1n}, A_{2n}, \dots, A_{mn}, L \rangle$, $m \in \mathbb{Z}$
The data participant $P_n$ is called the data miner and takes part in predicting the prediction data;
the global prediction sample whose class label is to be predicted is defined as $x'$:
$x' = \langle x'_{(11)}, x'_{(21)}, \dots, x'_{(m1)}, \dots, x'_{(1n)}, \dots, x'_{(kn)} \rangle$, $m, k \in \mathbb{Z}$
in the formula:
$x'_{(ji)}$ denotes the value of the j-th attribute held by participant $P_i$ in the global prediction sample $x'$;
m and k are integers giving the number of attributes held by each participant;
each data participant $P_i$ holds one part of the global prediction sample, defined as $x'_i$ and expressed by the following formula:
$x'_i = \langle x'_{(1i)}, \dots, x'_{(ki)} \rangle$, $k \in \mathbb{Z}$, $i \in [1,n]$.
Step 2 comprises the following:
the data miner generates the public key pk and the private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two prime numbers p and q satisfying $\gcd(pq, (p-1)(q-1)) = 1$, where $\gcd(x, y)$ is the greatest common divisor of the scalars x and y;
compute $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, where $\mathrm{lcm}(x, y)$ is the least common multiple of the scalars x and y;
define the function $L(x) = \frac{x-1}{n}$;
randomly select an integer $g$ less than $n^2$ such that the following $\mu$ exists:
$\mu = \left(L(g^{\lambda} \bmod n^2)\right)^{-1} \bmod n$
The public key pk is $\langle n, g \rangle$ and the private key sk is $\langle \lambda, \mu \rangle$.
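As a minimal illustration of the key generation above, and of the encryption and decryption it supports in Steps 4 and 6, the following Python sketch implements textbook Paillier with toy primes. The function names (`keygen`, `encrypt`, `decrypt`, `L`) and the primes 1009 and 1013 are assumptions chosen for readability; they are far too small for real use, and a deployment would rely on a vetted cryptographic library.

```python
import math
import secrets


def L(x, n):
    """L(x) = (x - 1) / n; the division is exact when x is 1 modulo n."""
    return (x - 1) // n


def keygen(p, q):
    """Return (pk, sk) = ((n, g), (lam, mu)) for the primes p and q."""
    n = p * q
    if math.gcd(n, (p - 1) * (q - 1)) != 1:
        raise ValueError("need gcd(pq, (p-1)(q-1)) = 1")
    lam = math.lcm(p - 1, q - 1)
    n2 = n * n
    while True:
        g = secrets.randbelow(n2 - 2) + 2      # candidate g in [2, n^2 - 1]
        if math.gcd(g, n2) != 1:
            continue
        x = L(pow(g, lam, n2), n)
        if math.gcd(x, n) == 1:                # mu exists only if x is invertible mod n
            return (n, g), (lam, pow(x, -1, n))


def encrypt(pk, m):
    """Encrypt an integer m with 0 <= m < n."""
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1       # 0 < r < n
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pk, sk, c):
    """Recover m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return L(pow(c, lam, n * n), n) * mu % n


pk, sk = keygen(1009, 1013)
ciphertext = encrypt(pk, 123456)
recovered = decrypt(pk, sk, ciphertext)
```

Because `r` is drawn afresh on every call, encrypting the same value twice gives different ciphertexts that still decrypt to the same plaintext.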
Step 3 specifically includes:
Step 3.1, data participant $P_i$ computes the local distance vector $d_i$ between the local data set $D_i$ and the prediction sample part $x'_i$, expressed by the following formula:
$d_i = \langle dist(x_{1i}, x'_i), dist(x_{2i}, x'_i), \dots, dist(x_{li}, x'_i) \rangle$
in the formula:
$dist(x_i, x_j)$ denotes the squared distance between data point $x_i$ and data point $x_j$, a scalar;
$d_i$ denotes the distance vector built from the Euclidean distances between every data point of $D_i$ and the prediction sample;
Step 3.2, data participant $P_i$ decomposes the distance vector $d_i$ as follows: first randomly generate n-1 vectors $s_{1i}, s_{2i}, \dots, s_{n-1,i}$ of the same dimension as $d_i$, then compute $s_{ni} = d_i - \sum_{j=1}^{n-1} s_{ji}$;
Step 3.3, data participant $P_i$ generates the n perturbation sub-vectors $r_{ji}$ as follows: first randomly generate n-1 vectors $r_{1i}, r_{2i}, \dots, r_{n-1,i}$ of the same dimension as $d_i$, then compute $r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$.
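Each participant can work on its own columns only because the squared Euclidean distance is additive over attributes: adding up the per-participant values of `dist` gives the squared distance over all attributes. The short Python check below illustrates this with hypothetical toy values (not data from this document); the name `sq_dist` is an assumption.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Hypothetical record and prediction sample with four attributes,
# vertically split between two participants (two attributes each).
record = [3, 4, 1, 2]
sample = [0, 0, 5, 5]
slices = [(0, 2), (2, 4)]

# Each participant computes the squared distance on its own attributes only.
local = [sq_dist(record[a:b], sample[a:b]) for a, b in slices]

# The sum of the local values equals the distance over all attributes.
total = sq_dist(record, sample)
```

Since the square root is monotonic, ranking neighbors by the summed squared distances gives the same order as ranking by Euclidean distance.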
Step 4 specifically includes:
data participant $P_i$ homomorphically encrypts $s'_{ji}$ by the following formula:
$e(s'_{ji}) = g^{s'_{ji}} \cdot r^{n} \bmod n^2$
where r is randomly selected and satisfies $0 < r < n$ and $r \in \mathbb{Z}^*_{n^2}$, and $\langle n, g \rangle$ is the public key pk.
Step 5 specifically comprises: $P_n$ obtains the perturbation-free encrypted global distance vector $e(S)$, expressed as $e(S) = \sum_{i=1}^{n} e(S_i)$.
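With Paillier, the "summation" of ciphertexts is carried out by multiplying them modulo $n^2$, which adds the underlying plaintexts. The Python sketch below shows this under several assumptions that are not taken from the text: toy primes, the common simplification g = n + 1 (the text picks g at random), and a signed encoding modulo n so that negative perturbed values survive the round trip. The names `enc`, `dec` and `add` are illustrative.

```python
import math
import secrets

# Toy Paillier parameters (assumed; far too small for real use).
p, q = 1009, 1013
n = p * q
n2 = n * n
g = n + 1                                   # simple valid choice of g (assumption)
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)


def enc(m):
    """Encrypt a signed integer; negatives are encoded modulo n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m % n, n2) * pow(r, n, n2) % n2


def dec(c):
    """Decrypt and map the result back to the signed range."""
    m = (pow(c, lam, n2) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def add(c1, c2):
    """Homomorphic addition: multiply the ciphertexts modulo n^2."""
    return c1 * c2 % n2


# Three perturbed values, one of them negative, summed under encryption.
total = dec(add(add(enc(1200), enc(-450)), enc(30)))
```

The signed decoding is only correct while the true sum stays below n/2 in absolute value, so the modulus has to be chosen large enough for the distances and perturbations involved.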
Step 6 specifically includes:
Step 6.1, the data miner $P_n$ homomorphically decrypts $e(S)$ by the following formulas:
$e(S) = \langle e(S_1), e(S_2), \dots, e(S_l) \rangle$
$S_l = L(e(S_l)^{\lambda} \bmod n^2) \cdot \mu \bmod n$
where $\langle \lambda, \mu \rangle$ is the private key sk;
Step 6.2, the data miner $P_n$ combines the obtained global distance vector $S = \langle S_1, S_2, \dots, S_l \rangle$ with its own class-label record $L = \langle L_1, L_2, \dots, L_l \rangle$ into key-value pairs
$SL = \{\langle S_1, L_1 \rangle, \langle S_2, L_2 \rangle, \dots, \langle S_l, L_l \rangle\}$
To perform kNN prediction on the prediction sample, $P_n$ first presets a value K;
Step 6.3, the data miner $P_n$ sorts SL in ascending order of distance and keeps the first K elements, $Sort(SL) = \{\langle S_1, L_1 \rangle', \langle S_2, L_2 \rangle', \dots, \langle S_K, L_K \rangle'\}$, then counts the class labels $L'$ of these elements; the most frequent class is the prediction result for the prediction sample.
Further, the present invention also provides a distributed privacy protection classification system based on homomorphic encryption, wherein the system comprises:
a parameter definition module, configured to define the n data participants $P_i$, $i \in [1,n]$, taking part in data mining, the attributes of the global data set $D$, and the global prediction sample $x'$, wherein one of the data participants acts as the data miner and is denoted $P_n$, the global data set $D$ is split vertically into local data sets $D_i$, $i \in [1,n]$, each $D_i$ is assigned to the corresponding data participant $P_i$, and the part of the prediction sample held by $P_i$ is $x'_i$;
a key generation module, in which the data miner $P_n$ generates the public key pk and the private key sk required for homomorphic encryption, retains sk for decryption, and sends pk to the other data participants $P_i$, $i \in [1,n-1]$;
a vector generation module, configured to compute the local distance vector $d_i$ between the local data set $D_i$ held by each data participant and the corresponding prediction sample part $x'_i$, wherein each data participant $P_i$ decomposes $d_i$ into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors $s'_{ji}$;
a vector encryption module, in which each data participant $P_i$ holds the perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$, retains $s'_{ii}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1,n]$, $j \neq i$) to obtain the encrypted perturbed distance sub-vector $e(s'_{ji})$, and sends $e(s'_{ji})$ to data participant $P_j$; once $P_i$ has received all the encrypted perturbed distance sub-vectors $e(s'_{ij})$ ($j \neq i$) sent by the other data participants, $P_i$ performs a homomorphic summation to obtain $e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$;
a global vector generation module, in which each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to the data miner $P_n$, and $P_n$ performs the summation $e(S) = \sum_{i=1}^{n} e(S_i)$ to obtain the perturbation-free encrypted global distance vector $e(S)$;
a classification output module, in which the data miner $P_n$ decrypts the encrypted global distance vector $e(S)$ with the private key sk to obtain the global distance vector $S$, counts the class labels of the K nearest data points, and uses the most frequent class label as the predicted class label of the prediction sample.
Compared with the prior art, the beneficial effect of the invention is that, for privacy protection classification scenarios, it enables classification mining in which an untrusted data miner participates. The method and system prevent a malicious miner from mounting attacks with the partial background knowledge it has acquired, and improve the security of user data privacy protection.
Drawings
Fig. 1 is a flowchart of a distributed privacy protection classification method based on homomorphic encryption according to the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
In order to achieve the above object, the technical solution adopted by the present invention is a distributed privacy protection classification method based on homomorphic encryption, which is further described below with reference to fig. 1, and includes the following steps:
step 1, defining symbols of n data participants participating in data mining as P i ,i∈[1,n]The global dataset D contains the following attributes
<ID,A 11 ,A 21 ,…,A m1 ,…,A 1n ,…,A kn ,L>,m,k∈Z
where ID is the primary-key attribute, the $A_i$ are numerical condition attributes, and $L$ is the class-label attribute. $D$ is divided into n parts, one held by each data participant. The i-th part is defined as the local data set $D_i$, $i \in [1,n]$, held by $P_i$, and $D_i$ has the following attributes:
$\langle ID, A_{1i}, A_{2i}, \dots, A_{mi} \rangle$, where m is an integer
$D_i = \{x_{1i}, \dots, x_{ji}, \dots, x_{li}\}$
In the formula:
l denotes the number of records contained in the original data set $D$,
$x_{ji}$ denotes the j-th data point of the data set $D_i$, $j = 1, 2, \dots, l$,
m denotes the number of attributes of the data points in $D_i$, i.e. $D_i$ is an m-dimensional data set containing l data points,
in particular, the local data set $D_n$ held by $P_n$ additionally contains the class-label attribute, and the attributes of $D_n$ are
$\langle ID, A_{1n}, A_{2n}, \dots, A_{mn}, L \rangle$, $m \in \mathbb{Z}$
The data participant $P_n$ is called the data miner and takes part in predicting the prediction data.
The global prediction sample whose class label is to be predicted is defined as $x'$:
$x' = \langle x'_{(11)}, x'_{(21)}, \dots, x'_{(m1)}, \dots, x'_{(1n)}, \dots, x'_{(kn)} \rangle$
in the formula:
$x'_{(ji)}$ denotes the value of the j-th attribute held by participant $P_i$ in the global prediction sample $x'$,
m and k are integers giving the number of attributes held by each participant.
Each data participant $P_i$ holds one part of the global prediction sample, defined as $x'_i$ and expressed by the following formula:
$x'_i = \langle x'_{(1i)}, \dots, x'_{(ki)} \rangle$, $k \in \mathbb{Z}$, $i \in [1,n]$
Step 2, the data miner $P_n$ generates the public key and the private key required for homomorphic encryption; the public key is denoted pk and the private key sk. $P_n$ retains sk for decryption and sends pk to all other data participants $P_1, \dots, P_{n-1}$;
Step 2.1, the data miner generates the public key pk and the private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two prime numbers p and q satisfying $\gcd(pq, (p-1)(q-1)) = 1$, where $\gcd(x, y)$ is the greatest common divisor of the scalars x and y.
Compute $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, where $\mathrm{lcm}(x, y)$ is the least common multiple of the scalars x and y;
define the function $L(x) = \frac{x-1}{n}$;
randomly select an integer $g$ less than $n^2$ such that the following $\mu$ exists:
$\mu = \left(L(g^{\lambda} \bmod n^2)\right)^{-1} \bmod n$
The public key pk is $\langle n, g \rangle$ and the private key sk is $\langle \lambda, \mu \rangle$.
Step 3, each data participant $P_i$, $i \in [1,n]$, computes the local distance vector $d_i$ between the local data set $D_i$ and the prediction sample part $x'_i$. $P_i$ decomposes $d_i$ into n vectors $s_{ji}$ of the same dimension satisfying $\sum_{j=1}^{n} s_{ji} = d_i$. $P_i$ randomly generates n perturbation sub-vectors $r_{ji}$ of the same dimension as $d_i$ satisfying $\sum_{j=1}^{n} r_{ji} = 0$, then merges the perturbations: $s'_{ji} = s_{ji} + r_{ji}$;
Step 3.1, $P_i$ computes the local distance vector $d_i$ between the local data set $D_i$ and the prediction sample part $x'_i$, expressed by the following formula:
$d_i = \langle dist(x_{1i}, x'_i), dist(x_{2i}, x'_i), \dots, dist(x_{li}, x'_i) \rangle$
in the formula:
$dist(x_i, x_j)$ denotes the squared distance between data point $x_i$ and data point $x_j$, a scalar;
$d_i$ denotes the distance vector built from the Euclidean distances between every data point of $D_i$ and the prediction sample;
Step 3.2, $P_i$ decomposes the distance vector $d_i$ as follows: first randomly generate n-1 vectors $s_{1i}, s_{2i}, \dots, s_{n-1,i}$ of the same dimension as $d_i$, then compute $s_{ni} = d_i - \sum_{j=1}^{n-1} s_{ji}$;
Step 3.3, $P_i$ generates the n perturbation sub-vectors $r_{ji}$ as follows: first randomly generate n-1 vectors $r_{1i}, r_{2i}, \dots, r_{n-1,i}$ of the same dimension as $d_i$, then compute $r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$.
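Steps 3.2 and 3.3 use the same trick: draw n-1 random vectors and let the last one absorb the remainder, with target $d_i$ for the distance sub-vectors and target 0 for the perturbations. A minimal Python sketch follows; the helper name `split_with_sum`, the sampling range `bound` and the toy distance vector are assumptions, and Python's `random` module is used only for illustration (it is not a cryptographically secure generator).

```python
import random


def split_with_sum(target, n, bound=10**6):
    """Return n integer vectors that add up elementwise to `target`.

    The first n - 1 vectors are random; the last one is the remainder.
    Assumes n >= 2.
    """
    parts = [[random.randint(-bound, bound) for _ in target] for _ in range(n - 1)]
    last = [t - sum(col) for t, col in zip(target, zip(*parts))]
    return parts + [last]


d_i = [12665, 48409, 55234]              # hypothetical local distance vector
n = 3

s = split_with_sum(d_i, n)               # distance sub-vectors, sum = d_i
r = split_with_sum([0] * len(d_i), n)    # perturbation sub-vectors, sum = 0

# Perturbed sub-vectors s'_j = s_j + r_j
s_perturbed = [[a + b for a, b in zip(sj, rj)] for sj, rj in zip(s, r)]

# Adding all perturbed sub-vectors recovers d_i because the perturbations cancel.
column_sums = [sum(col) for col in zip(*s_perturbed)]
```

No single perturbed sub-vector reveals $d_i$ on its own; only the sum of all n of them does.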
Step 4, data participant $P_i$ holds the perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$. It retains $s'_{ii}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1,n]$, $j \neq i$) to obtain $e(s'_{ji})$, and sends $e(s'_{ji})$ to $P_j$. Once $P_i$ has received all the encrypted perturbed distance sub-vectors $e(s'_{ij})$ ($j \neq i$) sent by the other data participants, $P_i$ performs a homomorphic summation to obtain $e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$;
Step 4.1, $P_i$ homomorphically encrypts $s'_{ji}$ by the following formula: $e(s'_{ji}) = g^{s'_{ji}} \cdot r^{n} \bmod n^2$
Step 5, each data owner $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to $P_n$, and $P_n$ performs the summation $e(S) = \sum_{i=1}^{n} e(S_i)$ to obtain the perturbation-free encrypted global distance vector $e(S)$;
Step 5.1, $P_n$ obtains the perturbation-free encrypted global distance vector $e(S) = \sum_{i=1}^{n} e(S_i)$.
That $e(S)$ contains no perturbation can be demonstrated by the following formula:
$e(S) = \sum_{i=1}^{n} e(S_i) = e\left(\sum_{i=1}^{n} \sum_{j=1}^{n} s'_{ij}\right) = e\left(\sum_{j=1}^{n} \sum_{i=1}^{n} (s_{ij} + r_{ij})\right) = e\left(\sum_{j=1}^{n} d_j\right)$
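The cancellation of the perturbations can also be checked numerically. The Python sketch below simulates Steps 3 to 5 on plaintext values, leaving out encryption entirely, so it only illustrates the bookkeeping of who sums what. The function names and the toy distance vectors are assumptions, and `shares[i][j]` follows the indexing of the worked example later in this document (sub-vector produced by party i and delivered to party j).

```python
import random


def split(vec, n, bound=1000):
    """Split `vec` into n random integer vectors that sum to `vec` (n >= 2)."""
    parts = [[random.randint(-bound, bound) for _ in vec] for _ in range(n - 1)]
    parts.append([v - sum(col) for v, col in zip(vec, zip(*parts))])
    return parts


def vec_sum(vectors):
    """Elementwise sum of a list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


def simulate(local_distances):
    """Plaintext simulation of Steps 3-5; returns the global distance vector."""
    n = len(local_distances)
    shares = []                          # shares[i][j]: from party i, for party j
    for d in local_distances:
        s = split(d, n)                  # distance sub-vectors, sum = d
        r = split([0] * len(d), n)       # perturbations, sum = 0
        shares.append([[a + b for a, b in zip(sj, rj)] for sj, rj in zip(s, r)])
    # Party j adds everything it keeps or receives to form its partial sum S_j.
    partial = [vec_sum([shares[i][j] for i in range(n)]) for j in range(n)]
    # The data miner adds the partial sums to obtain S.
    return vec_sum(partial)


global_distances = simulate([[5, 25], [1, 4], [9, 16]])
```

Whatever random values are drawn, the result equals the elementwise sum of the local distance vectors.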
Step 6, $P_n$ decrypts the encrypted global distance vector $e(S)$ with the private key sk to obtain the global distance vector $S$. Since $P_n$ also holds the class-label attribute, $P_n$ counts the class labels of the K nearest data points and uses the most frequent class label as the predicted class label of the prediction sample.
Step 6.1, $P_n$ homomorphically decrypts $e(S)$ by the following formulas:
$e(S) = \langle e(S_1), e(S_2), \dots, e(S_l) \rangle$
$S_l = L(e(S_l)^{\lambda} \bmod n^2) \cdot \mu \bmod n$
where $\langle \lambda, \mu \rangle$ is the private key sk.
Step 6.2, $P_n$ combines the obtained global distance vector $S = \langle S_1, S_2, \dots, S_l \rangle$ with its own class-label record $L = \langle L_1, L_2, \dots, L_l \rangle$ into key-value pairs
$SL = \{\langle S_1, L_1 \rangle, \langle S_2, L_2 \rangle, \dots, \langle S_l, L_l \rangle\}$
To perform kNN prediction on the prediction sample, $P_n$ first presets a value K, where K is a preset parameter giving the number of nearest data points whose class labels are counted.
Step 6.3, $P_n$ sorts SL in ascending order of distance and keeps the first K elements, $Sort(SL) = \{\langle S_1, L_1 \rangle', \langle S_2, L_2 \rangle', \dots, \langle S_K, L_K \rangle'\}$, then counts the class labels $L'$ of these elements; the most frequent class is the prediction result for the prediction sample.
Correspondingly, the invention also provides a distributed privacy protection classification system based on homomorphic encryption, which comprises:
a parameter definition module, configured to define the n data participants $P_i$, $i \in [1,n]$, taking part in data mining, the attributes of the global data set $D$, and the global prediction sample $x'$, wherein one of the data participants acts as the data miner and is denoted $P_n$, the global data set $D$ is split vertically into local data sets $D_i$, $i \in [1,n]$, each $D_i$ is assigned to the corresponding data participant $P_i$, and the part of the prediction sample held by $P_i$ is $x'_i$;
a key generation module, in which the data miner $P_n$ generates the public key pk and the private key sk required for homomorphic encryption, retains sk for decryption, and sends pk to the other data participants $P_i$, $i \in [1,n-1]$;
a vector generation module, configured to compute the local distance vector $d_i$ between the local data set $D_i$ held by each data participant and the corresponding prediction sample part $x'_i$, wherein each data participant $P_i$ decomposes $d_i$ into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors $s'_{ji}$;
a vector encryption module, in which each data participant $P_i$ holds the perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$, retains $s'_{ii}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1,n]$, $j \neq i$) to obtain the encrypted perturbed distance sub-vector $e(s'_{ji})$, and sends $e(s'_{ji})$ to data participant $P_j$; once $P_i$ has received all the encrypted perturbed distance sub-vectors $e(s'_{ij})$ ($j \neq i$) sent by the other data participants, $P_i$ performs a homomorphic summation to obtain $e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$;
a global vector generation module, in which each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to the data miner $P_n$, and $P_n$ performs the summation $e(S) = \sum_{i=1}^{n} e(S_i)$ to obtain the perturbation-free encrypted global distance vector $e(S)$;
a classification output module, in which the data miner $P_n$ decrypts the encrypted global distance vector $e(S)$ with the private key sk to obtain the global distance vector $S$, counts the class labels of the K nearest data points, and uses the most frequent class label as the predicted class label of the prediction sample.
In order to more clearly introduce the technical scheme of the invention, a distributed privacy protection classification example based on homomorphic encryption is provided. As shown in the table 1 below, the following examples,
TABLE 1
X1 | X2 | X3 | X4 | X5 | X6 | Y |
1187 | 431 | 239 | 280 | 75 | 364 | 2 |
1260 | 246 | 220 | 710 | 73 | 158 | 3 |
1422 | 399 | 251 | 510 | 89 | 353 | 1 |
1252 | 243 | 217 | 200 | 90 | 278 | 2 |
1307 | 150 | 210 | 370 | 118 | 269 | 1 |
1207 | 216 | 217 | 276 | 86 | 328 | 2 |
1383 | 165 | 260 | 560 | 124 | 337 | 1 |
1388 | 504 | 223 | 490 | 58 | 133 | 3 |
1324 | 398 | 229 | 436 | 82 | 300 | 1 |
1225 | 388 | 220 | 821 | 65 | 200 | 3 |
The global data set $D$ consists of the local data set of $P_1$ (attributes X1, X2), the local data set of $P_2$ (attributes X3, X4), and the local data set of $P_3$ (attributes X5, X6 and the class label Y). It contains ten records over six condition attributes, i.e. n = 3 and l = 10, with m = 2 attributes per participant. The data sets are expressed as follows,
D 1 ={X1,X2}
D 2 ={X3,X4}
D 3 ={X5,X6,Y}
D={X1,X2,X3,X4,X5,X6,Y}
P 3 generating a key pk, sk by a paillier algorithm, and then sending pk to P 1 And P 2 。
We have the following prediction sample $x'$ = {1270,355,236,500,78,129}, where $P_1$ holds only $x'_1$ = {1270,355}, $P_2$ holds only $x'_2$ = {236,500}, and $P_3$ holds only $x'_3$ = {78,129}.
$P_1$ computes the local distance vector $d_1$ = {6654645,6762101,7815380,6718088,6895954,6461570,7308809,7802845,7295845,6777074}. $P_1$ decomposes $d_1$ to obtain
s 11 ={944152,202554,117739,701284,179399,834307,937513,429851,618391,551784}
s 12 ={516237,300613,615949,916074,823429,809045,817028,867594,285042,227010}
s 13 ={5194256,6258934,7081692,5100730,5893126,4818218,5554268,6505400,6392412,5998280}
$P_1$ generates the perturbation vectors
r 11 ={-495209,-722845,151779,671406,-722113,-117051,981280,-335316,391540,336789}
r 12 ={-572483,-545057,-518861,767570,-734741,196243,-398850,81357,-597024,576812}
r 13 ={1067692,1267902,367082,-1438976,1456854,-79192,-582430,253959,205484,-913601}
$P_1$ adds the perturbation vectors to the distance sub-vectors to obtain
s' 11 ={448943,-520291,269518,1372690,-542714,717256,1918793,94535,1009931,888573}
s' 12 ={-56246,-244444,97088,1683644,88688,1005288,418178,948951,-311982,803822}
s' 13 ={6261948,7526836,7448774,3661754,7349980,4739026,4971838,6759359,6597896,5084679}
$P_1$ encrypts these to obtain $e(pk, s'_{11})$, $e(pk, s'_{12})$, $e(pk, s'_{13})$, then sends $e(pk, s'_{12})$ to $P_2$ and $e(pk, s'_{13})$ to $P_3$.
$P_2$ performs the same operations and obtains
d 2 ={834025,1672036,1257269,695209,955816,807385,1369616,1190781,1092321,1952977}
s 21 ={888708,373635,203935,509229,899667,900983,225738,459837,574678,264926}
s 22 ={775161,401124,912219,897221,631254,942990,525456,985425,883576,348348}
s 23 ={-829844,897277,141115,-711241,-575105,-1036588,618422,-254481,-365933,1339703}
r 21 ={341672,378146,-957230,49325,-522974,-172005,-726181,-605921,756926,-838636}
r 22 ={-839858,-614081,584053,880419,-69298,497012,909647,-866401,318681,-899783}
r 23 ={498186,235935,373177,-929744,592272,-325007,-183466,1472322,-1075607,1738419}
s' 21 ={1230380,751781,-753295,558554,376693,728978,-500443,-146084,1331604,-573710}
s' 22 ={-64697,-212957,1496272,1777640,561956,1440002,1435103,119024,1202257,-551435}
s' 23 ={-331658,1133212,514292,-1640985,17167,-1361595,434956,1217841,-1441540,3078122}
Likewise, $P_3$ obtains
d 3 ={266458,105170,260213,193873,196820,235745,257960,87140,209641,128690}
s 31 ={985273,314437,300723,504939,931920,697098,906371,830535,558415,439459}
s 32 ={42945,633334,763828,913768,539469,616078,388222,558941,121756,500723}
r 31 ={-217937,841799,-418199,832906,989079,931987,58489,-972894,-143857,-125059}
r 32 ={667088,-767364,-723753,519276,865585,49575,-725175,267744,-945622,222052}
r 33 ={-449151,-74435,1141952,-1352182,-1854664,-981562,666686,705150,1089479,-96993}
s' 31 ={767336,1156236,-117476,1337845,1920999,1629085,964860,-142359,414558,314400}
s' 32 ={710033,-134030,40075,1433044,1405054,665653,-336953,826685,-823866,722775}
s' 33 ={-1210911,-917036,337614,-2577016,-3129233,-2058993,-369947,-597186,618949,-908485}
After the participants have exchanged data with each other, $P_1$ holds $e(pk, s'_{11})$, $e(pk, s'_{21})$, $e(pk, s'_{31})$ and sums them to obtain $e(pk, S_1)$, where
S 1 ={2446659,1387726,-601253,3269089,1754978,3075319,2383210,-193908,2756093,629263}
Similarly, $P_2$ obtains $e(pk, S_2)$ and $P_3$ obtains $e(pk, S_3)$, where
S 2 ={589090,-591431,1633435,4894328,2055698,3110943,1516328,1894660,66409,975162}
S 3 ={4719379,7743012,8300680,-556247,4237914,1318438,5036847,7380014,5775305,7254316}
$P_1$ sends $e(pk, S_1)$ to $P_3$, and $P_2$ sends $e(pk, S_2)$ to $P_3$.
$P_3$ performs the summation $e(pk, S) = e(pk, S_1) + e(pk, S_2) + e(pk, S_3)$ and decrypts $e(pk, S)$ with sk to obtain
S={7755128,8539307,9332862,7607170,8048590,7504700,8936385,9080766,8597807,8858741}
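The decrypted vector can be cross-checked against the local distance vectors listed earlier in this example: because the perturbations cancel, S should equal the elementwise sum of d 1, d 2 and d 3. The Python snippet below performs that check; the vectors are copied from the example, and the variable names are illustrative.

```python
# Local distance vectors and decrypted global vector, copied from the example.
d1 = [6654645, 6762101, 7815380, 6718088, 6895954,
      6461570, 7308809, 7802845, 7295845, 6777074]
d2 = [834025, 1672036, 1257269, 695209, 955816,
      807385, 1369616, 1190781, 1092321, 1952977]
d3 = [266458, 105170, 260213, 193873, 196820,
      235745, 257960, 87140, 209641, 128690]
S = [7755128, 8539307, 9332862, 7607170, 8048590,
     7504700, 8936385, 9080766, 8597807, 8858741]

# Elementwise sum of the three local distance vectors.
summed = [a + b + c for a, b, c in zip(d1, d2, d3)]
matches = summed == S
```

The check confirms that the splitting and perturbation steps leave the global distances unchanged.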
$P_n$ pairs each distance in $S$ with the class label of the corresponding record and sorts the pairs by distance in ascending order.
Suppose $P_n$ presets K = 5. Class labels 1, 2 and 3 then account for 20%, 60% and 20% of the first K records, respectively, so $P_n$ predicts class label 2 for the prediction sample.
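The final vote can be reproduced from the decrypted vector S and the Y column of Table 1. The Python sketch below sorts the (distance, label) pairs, keeps the K = 5 nearest, and takes the majority label; the variable names are illustrative.

```python
from collections import Counter

# Global distances from the example and class labels Y from Table 1,
# both in record order.
S = [7755128, 8539307, 9332862, 7607170, 8048590,
     7504700, 8936385, 9080766, 8597807, 8858741]
Y = [2, 3, 1, 2, 1, 2, 1, 3, 1, 3]
K = 5

# Sort (distance, label) pairs by distance and keep the K nearest.
nearest = sorted(zip(S, Y))[:K]

# Majority vote over the labels of the K nearest records.
votes = Counter(label for _, label in nearest)
predicted = votes.most_common(1)[0][0]
```

The five nearest records carry the labels 2, 2, 2, 1 and 3, which gives the 60%, 20% and 20% split stated above.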
The applicant has described and illustrated embodiments of the invention in detail with reference to the accompanying drawings. Those skilled in the art should understand, however, that the above embodiments are only preferred embodiments of the invention, and that the detailed description is intended only to help the reader better understand the spirit of the invention, not to limit its scope of protection. On the contrary, any improvement or modification based on the spirit of the invention falls within its scope of protection.
Claims (8)
1. A distributed privacy protection classification method based on homomorphic encryption is characterized by comprising the following steps:
step 1, defining the n data participants P_i, i∈[1,n], that take part in data mining, the attributes of the global data set D, and the global prediction sample x′; taking one of the data participants as the data mining party, denoted P_n; splitting the global data set D vertically into local data sets D_i, i∈[1,n]; and distributing each local data set D_i to the corresponding data participant P_i, the portion of the prediction sample held by each data participant P_i being x′_i;
step 2, the data mining party P_n generates the public key pk and the private key sk required for homomorphic encryption, retains the private key sk for decryption, and sends the public key pk to the other data participants P_i, i∈[1,n-1];
step 3, each data participant P_i calculates the local distance vector d_i between its local data set D_i and the corresponding prediction sample portion x′_i, decomposes d_i into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and combines each perturbation sub-vector with the corresponding distance sub-vector to obtain the perturbed distance sub-vectors s′_ji;
step 4, each data participant P_i, holding the perturbed distance sub-vectors s′_1i, s′_2i, ..., s′_ni, retains s′_ii, homomorphically encrypts each s′_ji (j∈[1,n], j≠i) to obtain the encrypted perturbed distance sub-vector e(s′_ji), and sends e(s′_ji) to data participant P_j; once P_i has received all of the encrypted perturbed distance sub-vectors sent to it by the other data participants, P_i performs homomorphic summation to obtain e(S_i);
step 5, each data participant P_1 to P_{n-1} sends e(S_i) to the data mining party P_n, and the data mining party P_n performs the summation e(S) = e(S_1) + e(S_2) + ... + e(S_n) to obtain the unperturbed encrypted global distance vector e(S);
step 6, the data mining party P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S, takes the K data points with the smallest global distance, counts their class labels to find the most frequent class label, and uses it as the predicted class label of the prediction sample.
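The data flow of steps 3 to 6 can be sketched in plaintext as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: encryption is omitted (the sums that would be computed on ciphertexts are computed on plain integers), the local distance vectors, class labels, K and the helper `shares_of` are all hypothetical, and combining a perturbation sub-vector with a distance sub-vector is assumed to mean element-wise addition.

```python
import random
from collections import Counter

rng = random.Random(1)
n = 3                                    # number of data participants
num_records = 4
# HYPOTHETICAL local distance vectors d_i, one entry per record.
d = [[4, 9, 1, 16], [1, 0, 4, 9], [0, 25, 1, 4]]
labels = [1, 2, 1, 2]                    # held only by the data mining party P_n
K = 3

def shares_of(vec, parts):
    """Split vec into `parts` random integer vectors that sum to vec (parts >= 2)."""
    out = [[rng.randint(-1000, 1000) for _ in vec] for _ in range(parts - 1)]
    out.append([v - sum(col) for v, col in zip(vec, zip(*out))])
    return out

# Step 3: participant i builds n perturbed sub-vectors; s[i][j] is intended for P_j.
s = []
for i in range(n):
    subs = shares_of(d[i], n)
    noise = shares_of([0] * num_records, n)      # perturbations that sum to zero
    s.append([[a + b for a, b in zip(x, r)] for x, r in zip(subs, noise)])

# Step 4: P_j sums the sub-vector it kept and those it received (encryption omitted).
S_part = [[sum(s[i][j][k] for i in range(n)) for k in range(num_records)]
          for j in range(n)]

# Step 5: the data mining party adds the partial sums; the perturbations cancel.
S = [sum(col) for col in zip(*S_part)]

# Step 6: majority vote over the K records with the smallest global distance.
nearest = sorted(zip(S, labels))[:K]
pred = Counter(lab for _, lab in nearest).most_common(1)[0][0]
```

In this sketch no single partial sum in `S_part` equals a local distance vector, yet their total `S` equals the element-wise sum of the three vectors in `d`.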
2. The distributed privacy protection classification method based on homomorphic encryption according to claim 1, wherein the step 1 comprises:
defining the n data participants taking part in data mining as P_i, i∈[1,n], the global data set D containing the following attributes:
<ID, A_11, A_21, ..., A_m1, ..., A_1n, ..., A_kn, L>, m, k∈Z
where ID is the primary-key attribute, each A is a numerical condition attribute, and L is the class label attribute; m and k are integers denoting the number of attributes held by the respective participants;
dividing D into n parts, one held by each data participant, the i-th part being defined as the local data set D_i, i∈[1,n], held by P_i, with D_i containing the following attributes:
<ID, A_1i, A_2i, ..., A_mi>, where m is an integer
D_i = {x_1i, ..., x_ji, ..., x_li}
where:
l denotes the number of records contained in the original data set D,
x_ji denotes the j-th data point of data set D_i, j = 1, 2, …, l,
m denotes the number of attributes of the data points in data set D_i, i.e. D_i is an m-dimensional data set containing l data points;
the local data set D_n held by P_n additionally contains the class label attribute, D_n being expressed as
<ID, A_1n, A_2n, ..., A_mn, L>, m∈Z
the data participant P_n is called the data mining party and carries out the prediction for the prediction sample;
defining the global prediction sample whose class label is to be predicted as x′:
x′ = <x^(11)′, x^(21)′, ..., x^(m1)′, …, x^(1n)′, ..., x^(kn)′>, m, k∈Z
where:
x^(ii)′ denotes the value of the ii-th attribute of the global prediction sample x′;
m and k are integers denoting the number of attributes held by the respective participants;
each data participant P_i holds a portion of the global prediction sample, defined as x′_i and expressed by the following formula:
x′_i = <x^(1i)′, ..., x^(ki)′>, k∈Z, i∈[1,n].
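The vertical partition defined above can be illustrated with a short Python sketch. The data values, the column split and the helper `sq_dist` are hypothetical. The sketch also shows a property that the later steps rely on: when the attributes are split by column, the squared Euclidean distances computed locally by each participant add up to the squared distance over all attributes.

```python
# Toy global data set: one row per record, attribute values only (ID and class label omitted).
D = [
    [1, 2, 0, 4, 3],
    [0, 1, 2, 1, 1],
    [3, 3, 1, 0, 2],
]
x_pred = [1, 1, 1, 2, 2]                 # global prediction sample x'

# HYPOTHETICAL vertical split: column ranges held by P_1, P_2 and P_3.
columns = [(0, 2), (2, 3), (3, 5)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

# Local distance vector of each participant, computed over its own columns only.
local = [[sq_dist(row[lo:hi], x_pred[lo:hi]) for row in D] for lo, hi in columns]

# Element-wise sum of the local vectors versus the distance over all attributes.
global_from_parts = [sum(parts) for parts in zip(*local)]
global_direct = [sq_dist(row, x_pred) for row in D]

print(global_from_parts == global_direct)  # True
```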
3. The distributed privacy protection classification method based on homomorphic encryption according to claim 2, characterized in that the step 2 comprises:
the data mining party generates the public key pk and the private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly selecting two prime numbers p and q satisfying gcd(pq, (p-1)(q-1)) = 1, where gcd(x, y) is the function returning the greatest common divisor of the scalars x and y;
calculating n = pq and λ = lcm(p-1, q-1), where lcm(x, y) is the function returning the least common multiple of the scalars x and y;
defining the function L(x) = (x-1)/n;
randomly selecting a positive integer g less than n^2 such that the following μ exists:
μ = (L(g^λ mod n^2))^(-1) mod n
the public key pk is <n, g> and the private key sk is <λ, μ>.
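The key generation described in this claim can be sketched in Python as follows. This is a toy illustration with tiny primes, not a secure implementation. Two points are assumptions that do not come from the claim text: g is fixed to n + 1 (a commonly used valid choice) instead of being drawn at random, and `encrypt` uses the standard Paillier encryption c = g^m · r^n mod n^2. The function names are illustrative.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation following the steps listed in the claim."""
    n = p * q
    assert math.gcd(n, (p - 1) * (q - 1)) == 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)
    g = n + 1                                            # ASSUMPTION: fixed choice of g
    l_value = (pow(g, lam, n * n) - 1) // n              # L(g^lam mod n^2)
    mu = pow(l_value, -1, n)                             # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """Standard Paillier encryption of an integer 0 <= m < n (not given in the claim)."""
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

pk, sk = keygen(1009, 1013)
c1, c2 = encrypt(pk, 1234), encrypt(pk, 4321)
# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
c_sum = c1 * c2 % (pk[0] ** 2)
print(decrypt(pk, sk, c_sum))  # 5555
```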
4. The distributed privacy protection classification method based on homomorphic encryption according to claim 3, characterized in that:
the step 3 specifically includes:
step 3.1, data participant P_i calculates the local distance vector d_i between the local data set D_i and the prediction sample portion x′_i, expressed by the following formula,
d_i = <dist(x_1i, x′_i), dist(x_2i, x′_i), ..., dist(x_li, x′_i)>
where:
dist(x_i, x_j), i, j = 1, 2, …, n, denotes the squared distance function between data point x_i and data point x_j, whose value is a scalar;
d_i denotes the distance vector formed from the Euclidean distances between each data point of data set D_i and the prediction sample;
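step 3.2, data participant P_i decomposes the distance vector d_i as follows: first, n-1 vectors d_1i, d_2i, ..., d_{n-1,i} of the same dimension as d_i are generated at random, and then the remaining sub-vector d_ni = d_i - (d_1i + d_2i + ... + d_{n-1,i}) is calculated;
step 3.3, data participant P_i generates n perturbation sub-vectors r_ji as follows: first, n-1 vectors r_1i, r_2i, ..., r_{n-1,i} of the same dimension as d_i are generated at random, and then r_ni = -(r_1i + r_2i + ... + r_{n-1,i}) is calculated, so that the n perturbation sub-vectors sum to 0.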
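Steps 3.2 and 3.3 can be sketched in Python as follows. The sketch assumes that the last distance sub-vector is chosen so that all n sub-vectors sum to d_i, that the last perturbation sub-vector is chosen so that all n perturbations sum to zero, and that a perturbed sub-vector is the element-wise sum of a distance sub-vector and a perturbation sub-vector. The function name, the random range `bound` and the sample vector are hypothetical.

```python
import random

def split_with_perturbation(d, n, bound=10**6, seed=None):
    """Return (sub-vectors, perturbations, perturbed sub-vectors) for one participant."""
    rng = random.Random(seed)
    dim = len(d)
    # Step 3.2: n-1 random sub-vectors; the last one fixes the total to d.
    subs = [[rng.randint(-bound, bound) for _ in range(dim)] for _ in range(n - 1)]
    subs.append([d[k] - sum(v[k] for v in subs) for k in range(dim)])
    # Step 3.3: n-1 random perturbations; the last one makes the total zero.
    perts = [[rng.randint(-bound, bound) for _ in range(dim)] for _ in range(n - 1)]
    perts.append([-sum(v[k] for v in perts) for k in range(dim)])
    # ASSUMPTION: perturbed sub-vector = distance sub-vector + perturbation.
    perturbed = [[a + b for a, b in zip(dj, rj)] for dj, rj in zip(subs, perts)]
    return subs, perts, perturbed

d_i = [12, 5, 40, 7]                     # HYPOTHETICAL local distance vector
subs, perts, perturbed = split_with_perturbation(d_i, n=3, seed=0)

# The perturbations cancel, so the perturbed sub-vectors still sum to d_i.
print([sum(col) for col in zip(*perturbed)] == d_i)  # True
```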
5. The distributed privacy protection classification method based on homomorphic encryption according to claim 4, characterized in that: the step 4 specifically includes:
data participant P_i homomorphically encrypts s′_ji by the following formula,
7. The distributed privacy protection classification method based on homomorphic encryption according to claim 6, characterized in that the step 6 specifically includes:
step 6.1, the data mining party P_n homomorphically decrypts e(S) by the following formulas,
e(S) = <e(S_1), e(S_2), ..., e(S_l)>
S_l = L(e(S_l)^λ mod N^2) * μ mod N
where <λ, μ> is the private key sk;
step 6.2, the data mining party P_n combines the obtained global distance vector S = <S_1, S_2, ..., S_l> with its own class label record L = <L_1, L_2, ..., L_l> into key-value pairs
SL = {<S_1, L_1>, <S_2, L_2>, ..., <S_l, L_l>}
to perform kNN prediction on the prediction sample, P_n first presets a value of K;
step 6.3, the data mining party P_n sorts SL by distance in ascending order and retains the first K elements, Sort(SL) = {<S_1, L_1>′, <S_2, L_2>′, ..., <S_K, L_K>′}, then counts the class labels recorded in L′ to find the most frequent class, which is the prediction result for the prediction sample.
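The element-wise decryption of step 6.1 can be sketched in Python as follows. Perturbed sub-vectors may contain negative entries, whereas Paillier plaintexts are residues modulo N. The sketch therefore ASSUMES a signed encoding that the claim text does not specify: a negative value m is encrypted as m mod N, and a decrypted value above N/2 is mapped back to a negative number. The primes are toy values, g = N + 1 is an assumed choice, `enc` uses the standard Paillier encryption, and the sample vectors are hypothetical.

```python
import math
import random

# Toy Paillier parameters (tiny primes, for illustration only).
p, q = 1009, 1013
N = p * q
N2 = N * N
g = N + 1                                            # ASSUMPTION: fixed choice of g
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)
mu = pow((pow(g, lam, N2) - 1) // N, -1, N)

def enc(m):
    """Encrypt an integer m; a negative m is represented as m mod N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(g, m % N, N2) * pow(r, N, N2) % N2

def dec_signed(c):
    """Apply the step 6.1 formula, then map results above N/2 back to negatives."""
    m = (pow(c, lam, N2) - 1) // N * mu % N
    return m - N if m > N // 2 else m

# Two HYPOTHETICAL perturbed sub-vectors containing negative entries.
a = [120, -45, 300]
b = [-100, 50, -290]

# Homomorphic element-wise addition: multiply ciphertexts modulo N^2.
e_sum = [enc(x) * enc(y) % N2 for x, y in zip(a, b)]
S = [dec_signed(c) for c in e_sum]
print(S)  # [20, 5, 10]
```

With this encoding every intermediate and final value must stay within the range (-N/2, N/2), so a real deployment would need a modulus far larger than the distances being summed.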
8. A distributed privacy protection classification system based on homomorphic encryption, for implementing the classification method according to any one of claims 1-7, characterized in that the system comprises:
a parameter definition module, for defining the n data participants P_i, i∈[1,n], that take part in data mining, the attributes of the global data set D, and the global prediction sample x′; taking one of the data participants as the data mining party, denoted P_n; splitting the global data set D vertically into local data sets D_i, i∈[1,n]; and distributing each local data set D_i to the corresponding data participant P_i, the portion of the prediction sample held by each data participant P_i being x′_i;
a key generation module, by which the data mining party P_n generates the public key pk and the private key sk required for homomorphic encryption, retains the private key sk for decryption, and sends the public key pk to the other data participants P_i, i∈[1,n-1];
a vector generation module, for calculating the local distance vector d_i between the local data set D_i held by each data participant and the corresponding prediction sample portion x′_i, each data participant P_i decomposing its local distance vector d_i into n distance sub-vectors of the same dimension, randomly generating n perturbation sub-vectors whose sum is 0, and combining each perturbation sub-vector with the corresponding distance sub-vector to obtain the perturbed distance sub-vectors s′_ji;
a vector encryption module, by which each data participant P_i, holding the perturbed distance sub-vectors s′_1i, s′_2i, ..., s′_ni, retains s′_ii, homomorphically encrypts each s′_ji (j∈[1,n], j≠i) to obtain the encrypted perturbed distance sub-vector e(s′_ji), and sends e(s′_ji) to data participant P_j; once P_i has received all of the encrypted perturbed distance sub-vectors sent to it by the other data participants, P_i performs homomorphic summation to obtain e(S_i);
a global vector generation module, by which each data participant P_1 to P_{n-1} sends e(S_i) to the data mining party P_n, and the data mining party P_n performs the summation e(S) = e(S_1) + e(S_2) + ... + e(S_n) to obtain the unperturbed encrypted global distance vector e(S);
a classification output module, by which the data mining party P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S, takes the K data points with the smallest global distance, counts their class labels to find the most frequent class label, and uses it as the predicted class label of the prediction sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211372124.3A CN115587139B (en) | 2022-11-03 | 2022-11-03 | Distributed privacy protection classification method and system based on homomorphic encryption |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115587139A (en) | 2023-01-10 |
CN115587139B (en) | 2024-03-22 |
Family
ID=84781087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211372124.3A Active CN115587139B (en) | 2022-11-03 | 2022-11-03 | Distributed privacy protection classification method and system based on homomorphic encryption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115587139B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104601596A (en) * | 2015-02-05 | 2015-05-06 | 南京邮电大学 | Data privacy protection method in classification data mining system |
CN106778314A (en) * | 2017-03-01 | 2017-05-31 | 全球能源互联网研究院 | A kind of distributed difference method for secret protection based on k means |
CN108111294A (en) * | 2017-12-13 | 2018-06-01 | 南京航空航天大学 | A kind of multiple labeling sorting technique of the protection privacy based on ML-kNN |
CN110008717A (en) * | 2019-02-26 | 2019-07-12 | 东北大学 | Support the decision tree classification service system and method for secret protection |
CN111967514A (en) * | 2020-08-14 | 2020-11-20 | 安徽大学 | Data packaging-based sample classification method for privacy protection decision tree |
WO2020233260A1 (en) * | 2019-07-12 | 2020-11-26 | 之江实验室 | Homomorphic encryption-based privacy-protecting multi-institution data classification method |
US20210399874A1 (en) * | 2020-06-19 | 2021-12-23 | Duality Technologies, Inc. | Secure distributed key generation for multiparty homomorphic encryption |
CN114817999A (en) * | 2022-06-28 | 2022-07-29 | 北京金睛云华科技有限公司 | Outsourcing privacy protection method and device based on multi-key homomorphic encryption |
CN115150060A (en) * | 2022-07-06 | 2022-10-04 | 三未信安科技股份有限公司 | Data privacy protection method based on secure multi-party clustering method |
Non-Patent Citations (3)
Title |
---|
SHAUL, H.; FELDMAN, D.; RUS, D: "Secure k-ish Nearest Neighbors Classifier", PROCEEDINGS ON PRIVACY ENHANCING TECHNOLOGIES, pages 42 - 61 * |
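CUI Jianjing; LONG Jun; MIN Erxue; YU Yang; YIN Jianping: "A survey of research on applications of homomorphic encryption in encrypted machine learning" (同态加密在加密机器学习中的应用研究综述), Computer Science (计算机科学), pages 46 - 52 * |
QIAN Ping; WU Meng: "A survey of homomorphic-encryption-based privacy-preserving data mining methods" (同态加密隐私保护数据挖掘方法综述), Application Research of Computers (计算机应用研究), no. 05, pages 20 - 23 * |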
Also Published As
Publication number | Publication date |
---|---|
CN115587139B (en) | 2024-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||