CN115587139A - Distributed privacy protection classification method and system based on homomorphic encryption

Info

Publication number
CN115587139A
Authority
CN
China
Prior art keywords: data, distance, vector, global, participant
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211372124.3A
Other languages
Chinese (zh)
Other versions
CN115587139B (en)
Inventor
邹云峰
吴宁
周红勇
单超
祝宇楠
范环宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center, State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority to CN202211372124.3A priority Critical patent/CN115587139B/en
Publication of CN115587139A publication Critical patent/CN115587139A/en
Application granted granted Critical
Publication of CN115587139B publication Critical patent/CN115587139B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed privacy protection classification method and system based on homomorphic encryption. The n data participants taking part in data mining, the attributes of a global data set D, and a prediction sample X are defined. The data mining party P_n generates the public and private keys required for homomorphic encryption. Each data participant P_i computes the Euclidean distance between every data point in its local data set and the prediction sample X and constructs a local distance vector M; it then generates random number vectors, adds them to the local distance vector to produce encrypted distance sub-vectors, and obtains an encrypted distance vector M'. A global encrypted distance vector is then obtained; P_n sorts the global distance vector with respect to the prediction sample X and selects the K data points closest to X. The data mining party counts the class labels of these K data points and takes the most frequent class label as the predicted class label of the prediction sample. The method protects the data privacy of all parties while effectively preserving classification accuracy.

Description

Distributed privacy protection classification method and system based on homomorphic encryption
Technical Field
The invention relates to the field of information security, in particular to a distributed privacy protection classification method and a distributed privacy protection classification system based on homomorphic encryption.
Background
With the rapid development of modern society and its ever-deepening digitization, data that used to be stored centrally is becoming increasingly dispersed, so data mining often has to involve multiple parties. Because the owners of the information do not want the other parties to learn their private data, secure multi-party computation is well suited to this situation.
The Turing Award winner and Chinese scientist Professor Yao Qizhi (Andrew Chi-Chih Yao) posed the millionaires' problem: how can two millionaires determine which of them is richer without revealing their wealth to each other? This problem has evolved into today's secure multi-party computation (SMC), whose goal is to reach the desired conclusion without sharing the original data. Secure multi-party computation uses fresh random encryption each time, so encrypted data cannot be reused; operations are carried out directly on the encrypted data and the original data are never restored. The participants are fixed before each computation and must all cooperate, and the value in the data can be extracted without revealing the original data. Using secure multi-party computation to aggregate model gradient updates can reduce the possibility of information leakage. In summary, secure multi-party computation addresses the problem of how a set of mutually untrusting participants in a distributed network can compute cooperatively without revealing the secret information each of them holds. On the one hand, it requires that the participants cooperatively compute an agreed function; on the other hand, it requires that the secret input data held by each participant remain protected.
The invention studies how multiple parties can cooperate to build a k-nearest-neighbor classifier on vertically partitioned data, and provides a distributed k-nearest-neighbor classification protocol based on homomorphic encryption. In the protocol, the parties do not need to send all their data to a central trusted party; the collaborative computation is carried out with homomorphic encryption and random perturbation, without disclosing any party's private data.
Disclosure of Invention
To solve the problems in existing scenarios, the invention aims to provide a privacy protection classification method in which all parties cooperate to perform the classification computation without revealing their private data, while maintaining classification accuracy even when the mining analyst is untrusted.
Specifically, the invention provides a distributed privacy protection classification method based on homomorphic encryption, which comprises the following steps:
Step 1, define the n data participants P_i, i ∈ [1, n], taking part in data mining, the attributes of the global data set D, and the global prediction sample x'; take one of the data participants as the data mining party, denoted P_n; vertically split the global data set D into local data sets D_i, i ∈ [1, n], and assign each local data set D_i to the corresponding data participant P_i; the portion of the prediction sample corresponding to each data participant P_i is x'_i;
Step 2, the data mining party P_n generates the public key pk and private key sk required for homomorphic encryption, keeps the private key sk for decryption, and sends the public key pk to the other data participants P_i, i ∈ [1, n-1];
Step 3, each data participant P_i computes the local distance vector d_i between its own local data set D_i and the corresponding prediction sample x'_i; P_i decomposes d_i into n distance sub-vectors s_ji of the same dimension, randomly generates n perturbation sub-vectors r_ji whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors s'_ji;
Step 4, each data participant P_i holds its own perturbed distance sub-vectors s'_1i, s'_2i, …, s'_ni; it retains s'_ii, homomorphically encrypts each s'_ji (j ∈ [1, n], j ≠ i) to obtain the encrypted perturbed distance sub-vector e(s'_ji), and sends e(s'_ji) to data participant P_j; once P_i has received all the encrypted perturbed distance sub-vectors e(s'_ij) (j ≠ i) sent by the other data participants, P_i performs a homomorphic summation to obtain
$$e(S_i) = \sum_{j=1}^{n} e(s'_{ij}), \qquad S_i = \sum_{j=1}^{n} s'_{ij};$$
Step 5, each data participant P_1 to P_{n-1} sends e(S_i) to the data mining party P_n, and P_n performs the summation
$$e(S) = \sum_{i=1}^{n} e(S_i)$$
to obtain the unperturbed encrypted global distance vector e(S);
Step 6, the data mining party P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S, sorts S to select the K data points closest to the prediction sample, counts the class labels of these K data points, and uses the most frequent class label as the predicted class label of the prediction sample.
Further, step 1 comprises:
The n data participants taking part in data mining are denoted P_i, i ∈ [1, n]. The global data set D contains the following attributes:
<ID, A_11, A_21, …, A_m1, …, A_1n, …, A_kn, L>, m, k ∈ Z
where ID is the primary key attribute, each A is a numerical condition attribute, and L is the class label attribute; m and k are integers giving the number of attributes of the respective participants;
D is divided into n parts, one held by each data participant; the i-th part is defined as the local data set D_i, i ∈ [1, n], held by P_i, and D_i contains the following attributes:
<ID, A_1i, A_2i, …, A_mi>, m ∈ Z
D_i = {x_1i, …, x_ji, …, x_li}
$$x_{ji} = \langle x_{ji}^{(1)}, x_{ji}^{(2)}, \ldots, x_{ji}^{(m)} \rangle$$
In the formula:
l is the number of records contained in the original data set D;
x_ji is the j-th data point of data set D_i, j = 1, 2, …, l;
m is the number of attributes of the data points in D_i, i.e. D_i is an m-dimensional data set containing l data points;
x_ji^(r) is the value of the r-th attribute of data point x_ji, r = 1, 2, …, m;
The local data set D_n owned by P_n additionally contains the class label attribute; D_n is expressed as
<ID, A_1n, A_2n, …, A_mn, L>, m ∈ Z
The data participant P_n is called the data mining party and carries out the prediction for the prediction data;
The global prediction sample whose class label is to be predicted is defined as x':
x' = <x^(11)', x^(21)', …, x^(m1)', …, x^(1n)', …, x^(kn)'>, m, k ∈ Z
In the formula:
x^(ri)' is the value of attribute A_ri of the global prediction sample x';
m and k are integers giving the number of attributes of the respective participants;
Each data participant P_i owns a portion of the global prediction sample, defined as x'_i and expressed as:
x'_i = <x^(1i)', …, x^(ki)'>, k ∈ Z, i ∈ [1, n].
Step 2 comprises:
The data mining party generates the public key pk and private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two prime numbers p and q satisfying gcd(pq, (p-1)(q-1)) = 1, where gcd(x, y) returns the greatest common divisor of the scalars x and y;
compute n = pq and λ = lcm(p-1, q-1), where lcm(x, y) returns the least common multiple of the scalars x and y;
define the function
$$L(x) = \frac{x - 1}{n}$$
randomly select a positive integer g less than n^2 such that the following μ exists:
μ = (L(g^λ mod n^2))^(-1) mod n
The public key pk is <n, g> and the private key sk is <λ, μ>.
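By way of illustration only, the key generation described above can be sketched in Python as follows. The function names `L` and `keygen`, the explicit check that g is invertible modulo n^2, and the toy primes 1009 and 1013 are assumptions made for this sketch and are not taken from the patent; real deployments would use large random primes.

```python
import math
import secrets

def L(x: int, n: int) -> int:
    """Paillier L function: L(x) = (x - 1) / n, computed as exact integer division."""
    return (x - 1) // n

def keygen(p: int, q: int):
    """Build (pk, sk) = ((n, g), (lam, mu)) from two primes supplied by the caller."""
    n = p * q
    if math.gcd(n, (p - 1) * (q - 1)) != 1:
        raise ValueError("need gcd(pq, (p-1)(q-1)) = 1")
    lam = math.lcm(p - 1, q - 1)
    n2 = n * n
    while True:
        g = secrets.randbelow(n2 - 1) + 1      # candidate with 1 <= g < n^2
        if math.gcd(g, n2) != 1:               # assumption: keep g invertible mod n^2
            continue
        ell = L(pow(g, lam, n2), n)
        if math.gcd(ell, n) == 1:              # mu exists only if ell is invertible mod n
            mu = pow(ell, -1, n)
            return (n, g), (lam, mu)

# Toy primes, far too small for real use; they only keep the example fast.
pk, sk = keygen(1009, 1013)
```

The loop simply retries until a g is found for which μ exists, which mirrors the condition "such that μ exists" in the text.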
Step 3 specifically comprises:
Step 3.1, data participant P_i computes the local distance vector d_i between its local data set D_i and the prediction sample x'_i, given by
$$dist(x_{ji}, x'_i) = \sum_{r=1}^{m} \left( x_{ji}^{(r)} - x^{(ri)'} \right)^2$$
d_i = <dist(x_1i, x'_i), dist(x_2i, x'_i), …, dist(x_li, x'_i)>
In the formula:
dist(x_i, x_j) is the squared distance between data point x_i and data point x_j, a scalar;
d_i is the distance vector formed from the Euclidean distance between each data point of data set D_i and the prediction sample;
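As a minimal sketch of step 3.1, the local distance vector can be computed as below. The helper names and the three toy records are hypothetical and are not data from the patent. Because squared Euclidean distance is a sum over attributes, the local vectors of vertically partitioned data add up to the global squared distance.

```python
def dist(x, y):
    """Squared Euclidean distance between two equal-length attribute tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def local_distance_vector(D_i, x_i):
    """One squared distance per local record, in record order."""
    return [dist(row, x_i) for row in D_i]

# Hypothetical local data set with l = 3 records and m = 2 attributes.
D_i = [(1, 2), (4, 6), (0, 0)]
x_i = (1, 1)                      # this participant's part of the prediction sample
d_i = local_distance_vector(D_i, x_i)
print(d_i)                        # [1, 34, 2]
```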
Step 3.2, data participant P_i decomposes the distance vector d_i as follows: first randomly generate n-1 vectors s_1i, s_2i, …, s_{n-1,i} of the same dimension as d_i, then compute
$$s_{ni} = d_i - \sum_{j=1}^{n-1} s_{ji}$$
Step 3.3, data participant P_i generates the n perturbation sub-vectors r_ji as follows: first randomly generate n-1 vectors r_1i, r_2i, …, r_{n-1,i} of the same dimension as d_i, then compute
$$r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$$
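Steps 3.2 and 3.3 can be sketched as below, purely as an illustration. The function names, the range of the random values (`bound`), and the use of Python's `secrets` module are assumptions made for this sketch; the patent does not specify how the random vectors are drawn.

```python
import secrets

def random_vector(length, bound=10**6):
    """Vector of integers drawn uniformly from [-bound, bound] (assumed range)."""
    return [secrets.randbelow(2 * bound + 1) - bound for _ in range(length)]

def split_with_sum(target, n, bound=10**6):
    """Return n vectors whose element-wise sum equals `target`."""
    parts = [random_vector(len(target), bound) for _ in range(n - 1)]
    # The last vector is whatever is needed to reach the target sum.
    last = [t - sum(p[k] for p in parts) for k, t in enumerate(target)]
    return parts + [last]

def perturbed_shares(d_i, n):
    """Compute s'_ji = s_ji + r_ji with sum_j s_ji = d_i and sum_j r_ji = 0."""
    s = split_with_sum(d_i, n)
    r = split_with_sum([0] * len(d_i), n)
    return [[a + b for a, b in zip(s_j, r_j)] for s_j, r_j in zip(s, r)]

shares = perturbed_shares([1, 34, 2], n=3)
```

Since the perturbations sum to zero, the perturbed shares still add up to d_i, although no single share reveals it.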
Step 4 specifically comprises:
data participant P_i homomorphically encrypts s'_ji element by element using
$$e(s'_{ji}) = \langle e(s'_{ji,1}), e(s'_{ji,2}), \ldots, e(s'_{ji,l}) \rangle$$
$$e(m) = g^{m} \cdot r^{n} \bmod n^2$$
where r is chosen at random such that 0 < r < n and r ∈ Z*_{n^2}, and <n, g> is the public key pk.
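Step 5 specifically comprises: P_n obtains the unperturbed encrypted global distance vector e(S), expressed as:
$$e(S) = \sum_{i=1}^{n} e(S_i) = e\Big(\sum_{i=1}^{n}\sum_{j=1}^{n} s'_{ij}\Big) = e\Big(\sum_{i=1}^{n}\sum_{j=1}^{n} (s_{ji} + r_{ji})\Big) = e\Big(\sum_{i=1}^{n} d_i\Big)$$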
Step 6 specifically comprises:
Step 6.1, the data mining party P_n homomorphically decrypts e(S) using
e(S) = <e(S_1), e(S_2), …, e(S_l)>
S_j = L(e(S_j)^λ mod n^2) · μ mod n, j = 1, 2, …, l
where <λ, μ> is the private key sk;
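For illustration, the encryption, homomorphic summation and decryption formulas can be exercised with the following sketch. Several points are assumptions of this sketch rather than statements of the patent: the toy primes, the simple choice g = n + 1, drawing r coprime to n, and the signed encoding (values above n/2 are read as negative) that lets negative perturbed distances survive the modular arithmetic. In Paillier, the homomorphic "sum" of two ciphertexts is their product modulo n^2.

```python
import math
import secrets

p, q = 1009, 1013                  # toy primes for illustration only
n = p * q
n2 = n * n
g = n + 1                          # assumed simple choice of g
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    """e(m) = g^m * r^n mod n^2 with a fresh random r, 0 < r < n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def add(c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % n2

def decrypt(c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, mapped back to a signed value."""
    m = ((pow(c, lam, n2) - 1) // n * mu) % n
    return m - n if m > n // 2 else m

total = decrypt(add(encrypt(1200), encrypt(-350)))
print(total)                       # 850
```

With such small primes the plaintext sums must stay well below n/2 in absolute value; the seven-digit distances of the worked example would need a much larger modulus.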
Step 6.2, the data mining party P_n combines the resulting global distance vector S = <S_1, S_2, …, S_l> with its own class label records L = <L_1, L_2, …, L_l> into key-value pairs
SL = {<S_1, L_1>, <S_2, L_2>, …, <S_l, L_l>}
To perform kNN prediction on the prediction sample, P_n first presets a value K;
Step 6.3, the data mining party P_n sorts SL in ascending order of distance and keeps the first K elements, Sort(SL) = {<S_1, L_1>', <S_2, L_2>', …, <S_K, L_K>'}, then counts the class labels among them to find the most frequent class in L'; this class is the prediction result for the prediction sample.
Further, the present invention also provides a distributed privacy protection classification system based on homomorphic encryption, wherein the system comprises:
a parameter definition module, which defines the n data participants P_i, i ∈ [1, n], taking part in data mining, the attributes of the global data set D, and the global prediction sample x'; takes one of the data participants as the data mining party, denoted P_n; vertically splits the global data set D into local data sets D_i, i ∈ [1, n]; and assigns each local data set D_i to the corresponding data participant P_i, the portion of the prediction sample corresponding to each data participant P_i being x'_i;
a key generation module, in which the data mining party P_n generates the public key pk and private key sk required for homomorphic encryption, keeps the private key sk for decryption, and sends the public key pk to the other data participants P_i, i ∈ [1, n-1];
a vector generation module, which computes the local distance vector d_i between the local data set D_i owned by each data participant and the corresponding prediction sample x'_i; each data participant P_i decomposes d_i into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors s'_ji;
a vector encryption module, in which each data participant P_i holds its own perturbed distance sub-vectors s'_1i, s'_2i, …, s'_ni, retains s'_ii, homomorphically encrypts each s'_ji (j ∈ [1, n], j ≠ i) to obtain the encrypted perturbed distance sub-vector e(s'_ji), and sends e(s'_ji) to data participant P_j; once P_i has received all the encrypted perturbed distance sub-vectors e(s'_ij) (j ≠ i) sent by the other data participants, P_i performs a homomorphic summation to obtain
$$e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$$
a global vector generation module, in which each data participant P_1 to P_{n-1} sends e(S_i) to the data mining party P_n, and P_n performs the summation
$$e(S) = \sum_{i=1}^{n} e(S_i)$$
to obtain the unperturbed encrypted global distance vector e(S);
a classification output module, in which the data mining party P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S, sorts S to select the K data points closest to the prediction sample, counts the class labels of these K data points, and uses the most frequent class label as the predicted class label of the prediction sample.
Compared with the prior art, the invention has the beneficial effect that, for privacy protection classification scenarios, it enables classification mining in which an untrusted data mining party participates. The method and system prevent a malicious mining party from mounting attacks with the partial background knowledge it has obtained, and improve the security of user data privacy protection.
Drawings
Fig. 1 is a flowchart of a distributed privacy protection classification method based on homomorphic encryption according to the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
In order to achieve the above object, the technical solution adopted by the present invention is a distributed privacy protection classification method based on homomorphic encryption, which is further described below with reference to fig. 1, and includes the following steps:
Step 1, the n data participants taking part in data mining are denoted P_i, i ∈ [1, n]. The global data set D contains the following attributes:
<ID, A_11, A_21, …, A_m1, …, A_1n, …, A_kn, L>, m, k ∈ Z
where ID is the primary key attribute, each A is a numerical condition attribute, and L is the class label attribute. D is divided into n parts, one held by each data participant; the i-th part is defined as the local data set D_i, i ∈ [1, n], held by P_i, and D_i contains the following attributes:
<ID, A_1i, A_2i, …, A_mi>, m ∈ Z
D_i = {x_1i, …, x_ji, …, x_li}
$$x_{ji} = \langle x_{ji}^{(1)}, x_{ji}^{(2)}, \ldots, x_{ji}^{(m)} \rangle$$
In the formula:
l is the number of records contained in the original data set D;
x_ji is the j-th data point of data set D_i, j = 1, 2, …, l;
m is the number of attributes of the data points in D_i, i.e. D_i is an m-dimensional data set containing l data points;
x_ji^(r) is the value of the r-th attribute of data point x_ji, r = 1, 2, …, m;
In particular, the local data set D_n owned by P_n additionally contains the class label attribute; D_n is expressed as
<ID, A_1n, A_2n, …, A_mn, L>, m ∈ Z
The data participant P_n is called the data mining party and carries out the prediction for the prediction data.
The global prediction sample whose class label is to be predicted is defined as x':
x' = <x^(11)', x^(21)', …, x^(m1)', …, x^(1n)', …, x^(kn)'>
In the formula:
x^(ri)' is the value of attribute A_ri of the global prediction sample x';
m and k are integers giving the number of attributes of the respective participants.
Each data participant P_i owns a portion of the global prediction sample, defined as x'_i and expressed as:
x'_i = <x^(1i)', …, x^(ki)'>, k ∈ Z, i ∈ [1, n]
step 2, data mining party P n Generating the public key sum required for homomorphic encryptionPrivate key, we define public key as pk, private key as sk. P n Reserving sk for decryption, P n Sending pk to all data participants P 1 ~P n-1
Step 2.1, the data mining party generates a public key pk and a private key sk of the paillier homomorphic encryption algorithm by the following formulas:
two prime numbers p and q are randomly selected, satisfying gcd (pq, (p-1) (q-1)) =1, where gcd (x, y) is the greatest common divisor function used to solve the greatest common divisor of the x and y scalars.
Calculating n = pq and λ = lcm (p-1,q-1), where lcm (x, y) is the least common multiple function that is used to solve the least common multiple of the x and y scalars;
defining functions
Figure BDA0003925374440000081
Randomly selecting less than n 2 And satisfies the presence of μ
μ=(L(g λ mod n 2 ))mod n
Wherein
Figure BDA0003925374440000082
The public key pk is < n, g > and the private key sk is < λ, μ >.
Step 3, each data participant P_i, i ∈ [1, n], computes the local distance vector d_i between its local data set D_i and the prediction sample x'_i. P_i decomposes d_i into n vectors s_ji of the same dimension satisfying
$$\sum_{j=1}^{n} s_{ji} = d_i$$
P_i randomly generates n perturbation sub-vectors r_ji of the same dimension as d_i satisfying
$$\sum_{j=1}^{n} r_{ji} = 0$$
and then adds the perturbation vectors: s'_ji = s_ji + r_ji.
Step 3.1, P_i computes the local distance vector d_i between its local data set D_i and the prediction sample x'_i, given by
$$dist(x_{ji}, x'_i) = \sum_{r=1}^{m} \left( x_{ji}^{(r)} - x^{(ri)'} \right)^2$$
d_i = <dist(x_1i, x'_i), dist(x_2i, x'_i), …, dist(x_li, x'_i)>
In the formula:
dist(x_i, x_j) is the squared distance between data point x_i and data point x_j, a scalar;
d_i is the distance vector formed from the Euclidean distance between each data point of data set D_i and the prediction sample;
Step 3.2, P_i decomposes the distance vector d_i as follows: first randomly generate n-1 vectors s_1i, s_2i, …, s_{n-1,i} of the same dimension as d_i, then compute
$$s_{ni} = d_i - \sum_{j=1}^{n-1} s_{ji}$$
Step 3.3, P_i generates the n perturbation sub-vectors r_ji as follows: first randomly generate n-1 vectors r_1i, r_2i, …, r_{n-1,i} of the same dimension as d_i, then compute
$$r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$$
Step 4, data participant P_i holds its own perturbed distance sub-vectors s'_1i, s'_2i, …, s'_ni; it retains s'_ii, homomorphically encrypts each s'_ji (j ∈ [1, n], j ≠ i) to obtain e(s'_ji), and sends e(s'_ji) to P_j. Once P_i has received all the encrypted perturbed distance sub-vectors e(s'_ij) (j ≠ i) sent by the other data participants, P_i performs a homomorphic summation to obtain
$$e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$$
Step 4.1, P_i homomorphically encrypts s'_ji element by element using
$$e(s'_{ji}) = \langle e(s'_{ji,1}), e(s'_{ji,2}), \ldots, e(s'_{ji,l}) \rangle$$
$$e(m) = g^{m} \cdot r^{n} \bmod n^2$$
where r is chosen at random such that 0 < r < n and
$$r \in \mathbb{Z}^{*}_{n^2}$$
and <n, g> is the public key pk.
Step 5, each data owner P_1 to P_{n-1} sends e(S_i) to P_n, and P_n performs the summation
$$e(S) = \sum_{i=1}^{n} e(S_i)$$
to obtain the unperturbed encrypted global distance vector e(S);
Step 5.1, that P_n obtains the unperturbed encrypted global distance vector e(S) can be shown by the following formula:
$$e(S) = \sum_{i=1}^{n} e(S_i) = e\Big(\sum_{i=1}^{n}\sum_{j=1}^{n} s'_{ij}\Big) = e\Big(\sum_{i=1}^{n}\sum_{j=1}^{n} (s_{ji} + r_{ji})\Big) = e\Big(\sum_{i=1}^{n} d_i\Big)$$
Step 6, P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S. Since P_n also owns the class label attribute, P_n sorts S to select the K data points closest to the prediction sample, counts the class labels of these K data points, and uses the most frequent class label as the predicted class label of the prediction sample.
Step 6.1, P_n homomorphically decrypts e(S) using
e(S) = <e(S_1), e(S_2), …, e(S_l)>
S_j = L(e(S_j)^λ mod n^2) · μ mod n, j = 1, 2, …, l
where <λ, μ> is the private key sk.
Step 6.2, P_n combines the resulting global distance vector S = <S_1, S_2, …, S_l> with its own class label records L = <L_1, L_2, …, L_l> into key-value pairs
SL = {<S_1, L_1>, <S_2, L_2>, …, <S_l, L_l>}
To perform kNN prediction on the prediction sample, P_n first presets a value K, where K is a preset parameter giving the number of nearest data points whose class labels are counted.
Step 6.3, P_n sorts SL in ascending order of distance and keeps the first K elements, Sort(SL) = {<S_1, L_1>', <S_2, L_2>', …, <S_K, L_K>'}, then counts the class labels among them to find the most frequent class in L'; this class is the prediction result for the prediction sample.
Correspondingly, the invention also provides a distributed privacy protection classification system based on homomorphic encryption, which comprises:
a parameter definition module, which defines the n data participants P_i, i ∈ [1, n], taking part in data mining, the attributes of the global data set D, and the global prediction sample x'; takes one of the data participants as the data mining party, denoted P_n; vertically splits the global data set D into local data sets D_i, i ∈ [1, n]; and assigns each local data set D_i to the corresponding data participant P_i, the portion of the prediction sample corresponding to each data participant P_i being x'_i;
a key generation module, in which the data mining party P_n generates the public key pk and private key sk required for homomorphic encryption, keeps the private key sk for decryption, and sends the public key pk to the other data participants P_i, i ∈ [1, n-1];
a vector generation module, which computes the local distance vector d_i between the local data set D_i owned by each data participant and the corresponding prediction sample x'_i; each data participant P_i decomposes d_i into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors s'_ji;
a vector encryption module, in which each data participant P_i holds its own perturbed distance sub-vectors s'_1i, s'_2i, …, s'_ni, retains s'_ii, homomorphically encrypts each s'_ji (j ∈ [1, n], j ≠ i) to obtain the encrypted perturbed distance sub-vector e(s'_ji), and sends e(s'_ji) to data participant P_j; once P_i has received all the encrypted perturbed distance sub-vectors e(s'_ij) (j ≠ i) sent by the other data participants, P_i performs a homomorphic summation to obtain
$$e(S_i) = \sum_{j=1}^{n} e(s'_{ij})$$
a global vector generation module, in which each data participant P_1 to P_{n-1} sends e(S_i) to the data mining party P_n, and P_n performs the summation
$$e(S) = \sum_{i=1}^{n} e(S_i)$$
to obtain the unperturbed encrypted global distance vector e(S);
a classification output module, in which the data mining party P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S, sorts S to select the K data points closest to the prediction sample, counts the class labels of these K data points, and uses the most frequent class label as the predicted class label of the prediction sample.
To present the technical solution of the invention more clearly, an example of distributed privacy protection classification based on homomorphic encryption is given below, using the data in Table 1.
TABLE 1
X1    X2   X3   X4   X5   X6   Y
1187  431  239  280  75   364  2
1260  246  220  710  73   158  3
1422  399  251  510  89   353  1
1252  243  217  200  90   278  2
1307  150  210  370  118  269  1
1207  216  217  276  86   328  2
1383  165  260  560  124  337  1
1388  504  223  490  58   133  3
1324  398  229  436  82   300  1
1225  388  220  821  65   200  3
The global data set D consists of the local data set of P_1 (attributes X1, X2), the local data set of P_2 (attributes X3, X4), and the local data set of P_3 (attributes X5, X6 and the class label Y), with ten records in total, i.e. n = 3, l = 10, and m = 2 attributes per participant (six condition attributes overall). The data sets are expressed as follows:
D_1 = {X1, X2}
D_2 = {X3, X4}
D_3 = {X5, X6, Y}
D = {X1, X2, X3, X4, X5, X6, Y}
P_3 generates the keys pk and sk with the Paillier algorithm and then sends pk to P_1 and P_2.
The prediction sample is x' = {1270, 355, 236, 500, 78, 129}, of which P_1 holds only x'_1 = {1270, 355}, P_2 holds only x'_2 = {236, 500}, and P_3 holds only x'_3 = {78, 129}.
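The vertical partition used in this example can be sketched as follows. The variable names are illustrative, and using the row position as the shared record ID is an assumption of this sketch, since Table 1 does not list an ID column.

```python
# Rows of Table 1: (X1, X2, X3, X4, X5, X6, Y); the row position serves as the ID.
rows = [
    (1187, 431, 239, 280, 75, 364, 2),
    (1260, 246, 220, 710, 73, 158, 3),
    (1422, 399, 251, 510, 89, 353, 1),
    (1252, 243, 217, 200, 90, 278, 2),
    (1307, 150, 210, 370, 118, 269, 1),
    (1207, 216, 217, 276, 86, 328, 2),
    (1383, 165, 260, 560, 124, 337, 1),
    (1388, 504, 223, 490, 58, 133, 3),
    (1324, 398, 229, 436, 82, 300, 1),
    (1225, 388, 220, 821, 65, 200, 3),
]

# Each participant keeps only its own columns of every record.
D1 = [r[0:2] for r in rows]        # P_1: X1, X2
D2 = [r[2:4] for r in rows]        # P_2: X3, X4
D3 = [r[4:7] for r in rows]        # P_3: X5, X6 and the class label Y

# The prediction sample is split along the same column boundaries.
x = (1270, 355, 236, 500, 78, 129)
x1, x2, x3 = x[0:2], x[2:4], x[4:6]
```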
P_1 computes the local distance vector
d_1 = {6654645,6762101,7815380,6718088,6895954,6461570,7308809,7802845,7295845,6777074}
P_1 decomposes d_1 to obtain
s_11 = {944152,202554,117739,701284,179399,834307,937513,429851,618391,551784}
s_12 = {516237,300613,615949,916074,823429,809045,817028,867594,285042,227010}
s_13 = {5194256,6258934,7081692,5100730,5893126,4818218,5554268,6505400,6392412,5998280}
P_1 generates the perturbation vectors
r_11 = {-495209,-722845,151779,671406,-722113,-117051,981280,-335316,391540,336789}
r_12 = {-572483,-545057,-518861,767570,-734741,196243,-398850,81357,-597024,576812}
r_13 = {1067692,1267902,367082,-1438976,1456854,-79192,-582430,253959,205484,-913601}
P_1 adds the perturbation vectors to the distance sub-vectors to obtain
s'_11 = {448943,-520291,269518,1372690,-542714,717256,1918793,94535,1009931,888573}
s'_12 = {-56246,-244444,97088,1683644,88688,1005288,418178,948951,-311982,803822}
s'_13 = {6261948,7526836,7448774,3661754,7349980,4739026,4971838,6759359,6597896,5084679}
P_1 encrypts them to obtain e(pk, s'_11), e(pk, s'_12), e(pk, s'_13), then sends e(pk, s'_12) to P_2 and e(pk, s'_13) to P_3.
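The figures listed for P_1 can be checked with a short script. The sketch below only re-adds the vectors given in the text; the helper `add_vec` is a name introduced for this illustration.

```python
d1 = [6654645, 6762101, 7815380, 6718088, 6895954, 6461570, 7308809, 7802845, 7295845, 6777074]
s = [
    [944152, 202554, 117739, 701284, 179399, 834307, 937513, 429851, 618391, 551784],
    [516237, 300613, 615949, 916074, 823429, 809045, 817028, 867594, 285042, 227010],
    [5194256, 6258934, 7081692, 5100730, 5893126, 4818218, 5554268, 6505400, 6392412, 5998280],
]
r = [
    [-495209, -722845, 151779, 671406, -722113, -117051, 981280, -335316, 391540, 336789],
    [-572483, -545057, -518861, 767570, -734741, 196243, -398850, 81357, -597024, 576812],
    [1067692, 1267902, 367082, -1438976, 1456854, -79192, -582430, 253959, 205484, -913601],
]

def add_vec(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]

assert add_vec(*s) == d1                 # the sub-vectors add up to d_1
assert add_vec(*r) == [0] * 10           # the perturbations cancel out
s_prime = [add_vec(s_j, r_j) for s_j, r_j in zip(s, r)]
assert add_vec(*s_prime) == d1           # the perturbed shares still add up to d_1
print(s_prime[0][:3])                    # [448943, -520291, 269518]
```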
In the same way, P_2 performs the same operations and obtains
d_2 = {834025,1672036,1257269,695209,955816,807385,1369616,1190781,1092321,1952977}
s_21 = {888708,373635,203935,509229,899667,900983,225738,459837,574678,264926}
s_22 = {775161,401124,912219,897221,631254,942990,525456,985425,883576,348348}
s_23 = {-829844,897277,141115,-711241,-575105,-1036588,618422,-254481,-365933,1339703}
r_21 = {341672,378146,-957230,49325,-522974,-172005,-726181,-605921,756926,-838636}
r_22 = {-839858,-614081,584053,880419,-69298,497012,909647,-866401,318681,-899783}
r_23 = {498186,235935,373177,-929744,592272,-325007,-183466,1472322,-1075607,1738419}
s'_21 = {1230380,751781,-753295,558554,376693,728978,-500443,-146084,1331604,-573710}
s'_22 = {-64697,-212957,1496272,1777640,561956,1440002,1435103,119024,1202257,-551435}
s'_23 = {-331658,1133212,514292,-1640985,17167,-1361595,434956,1217841,-1441540,3078122}
P_3 obtains
d_3 = {266458,105170,260213,193873,196820,235745,257960,87140,209641,128690}
s_31 = {985273,314437,300723,504939,931920,697098,906371,830535,558415,439459}
s_32 = {42945,633334,763828,913768,539469,616078,388222,558941,121756,500723}
s_33 = {-761760,-842601,-804338,-1224834,-1274569,-1077431,-1036633,-1302336,-470530,-811492}
r_31 = {-217937,841799,-418199,832906,989079,931987,58489,-972894,-143857,-125059}
r_32 = {667088,-767364,-723753,519276,865585,49575,-725175,267744,-945622,222052}
r_33 = {-449151,-74435,1141952,-1352182,-1854664,-981562,666686,705150,1089479,-96993}
s'_31 = {767336,1156236,-117476,1337845,1920999,1629085,964860,-142359,414558,314400}
s'_32 = {710033,-134030,40075,1433044,1405054,665653,-336953,826685,-823866,722775}
s'_33 = {-1210911,-917036,337614,-2577016,-3129233,-2058993,-369947,-597186,618949,-908485}
After the participants exchange data with each other, P 1 holds e(pk, s' 11), e(pk, s' 21), e(pk, s' 31) and sums them to obtain e(pk, S 1), where
S 1 ={2446659,1387726,-601253,3269089,1754978,3075319,2383210,-193908,2756093,629263}
Similarly, P 2 obtains e(pk, S 2) and P 3 obtains e(pk, S 3), where
S 2 ={589090,-591431,1633435,4894328,2055698,3110943,1516328,1894660,66409,975162}
S 3 ={4719379,7743012,8300680,-556247,4237914,1318438,5036847,7380014,5775305,7254316}
P 1 sends e(pk, S 1) to P 3, and P 2 sends e(pk, S 2) to P 3.
P 3 performs the summation e(pk, S) = e(pk, S 1) + e(pk, S 2) + e(pk, S 3) and decrypts e(pk, S) with sk to obtain
S={7755128,8539307,9332862,7607170,8048590,7504700,8936385,9080766,8597807,8858741}
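The aggregation in this example can be checked with a few lines of Python. The sketch below works on the plaintext values only (encryption is left out), and the variable names are chosen here purely for illustration. It confirms that the element-wise sum of the three partial sums equals the decrypted vector S, and lists the record numbers of the five smallest distances.

```python
# Plaintext check of the worked example: S should equal S1 + S2 + S3 element-wise.
S1 = [2446659, 1387726, -601253, 3269089, 1754978, 3075319, 2383210, -193908, 2756093, 629263]
S2 = [589090, -591431, 1633435, 4894328, 2055698, 3110943, 1516328, 1894660, 66409, 975162]
S3 = [4719379, 7743012, 8300680, -556247, 4237914, 1318438, 5036847, 7380014, 5775305, 7254316]
S = [7755128, 8539307, 9332862, 7607170, 8048590, 7504700, 8936385, 9080766, 8597807, 8858741]

# Element-wise sum of the three partial sums held by P1, P2 and P3.
total = [a + b + c for a, b, c in zip(S1, S2, S3)]
print(total == S)

# 1-based record numbers of the K = 5 smallest global distances.
K = 5
nearest = sorted(range(1, len(S) + 1), key=lambda i: S[i - 1])[:K]
print(nearest)
```

Only the distances are checked here; the class labels of the records are not listed in this part of the example, so the vote itself is not reproduced.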
P n pairs S with its own class labels and sorts the pairs by distance to obtain
Figure BDA0003925374440000161
Suppose P n presets K = 5. Class labels 1, 2, and 3 then account for 20%, 60%, and 20% of the first K records, respectively, so P n predicts class label 2 for the prediction sample.
The applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings. Those skilled in the art should understand, however, that the above embodiments are only preferred embodiments of the present invention, and that the detailed description is intended only to help the reader better understand the spirit of the present invention, not to limit its scope of protection. On the contrary, any improvement or modification based on the spirit of the present invention falls within the scope of protection of the present invention.

Claims (8)

1. A distributed privacy protection classification method based on homomorphic encryption, characterized by comprising the following steps:
Step 1: define the n data participants P i, i∈[1,n], taking part in data mining, the attributes of the global data set D, and the global prediction sample x′; take one of the data participants as the data miner, denoted P n; split the global data set D vertically into different local data sets D i, i∈[1,n], and distribute each local data set D i to the corresponding data participant P i; the portion of the prediction sample held by each data participant P i is x′ i;
Step 2: the data miner P n generates the public key pk and the private key sk required for homomorphic encryption, keeps the private key sk for decryption, and sends the public key pk to the other data participants P i, i∈[1,n-1];
Step 3: each data participant P i computes the local distance vector d i between its own local data set D i and the corresponding prediction sample portion x′ i; each data participant P i decomposes d i into n distance sub-vectors of the same dimension, randomly generates n disturbance sub-vectors whose sum is 0, and adds each disturbance sub-vector to the corresponding distance sub-vector to obtain the disturbed distance sub-vectors s′ ji;
Step 4: of the disturbed distance sub-vectors s′ 1i, s′ 2i, ..., s′ ni that it holds, each data participant P i retains s′ ii, homomorphically encrypts each s′ ji (j∈[1,n], j≠i) to obtain the encrypted disturbed distance sub-vector e(s′ ji), and sends e(s′ ji) to data participant P j; once P i has received all of the encrypted disturbed distance sub-vectors e(s′ ji) (j≠i) sent by the other data participants, P i performs homomorphic summation to obtain
Figure FDA0003925374430000011
Figure FDA0003925374430000012
Step 5: each data participant P 1 ~ P n-1 sends e(S i) to the data miner P n, and the data miner P n performs the summation
Figure FDA0003925374430000013
to obtain the disturbance-free encrypted global distance vector e(S);
Step 6: the data miner P n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S; the data miner P n then counts the class labels of the K data points with the smallest distances and takes the most frequent class label as the predicted class label of the prediction sample.
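The flow of steps 3 to 6 can be sketched in plain Python to show why the disturbances do not affect the result. This is a minimal simulation under stated assumptions, not the claimed implementation: homomorphic encryption is omitted, all values are small integers, and the names `split_with_sum`, `simulate`, and the input data are hypothetical.

```python
import random
from collections import Counter

def split_with_sum(target, n, bound, rng):
    """Return n integer vectors whose element-wise sum equals `target`."""
    parts = [[rng.randint(-bound, bound) for _ in target] for _ in range(n - 1)]
    parts.append([t - sum(p[k] for p in parts) for k, t in enumerate(target)])
    return parts

def simulate(local_distances, labels, k, seed=0):
    rng = random.Random(seed)
    n, length = len(local_distances), len(labels)
    # Step 3: party i builds n disturbed sub-vectors; sent[i][j] is destined for party j.
    sent = []
    for d in local_distances:
        shares = split_with_sum(d, n, 10**6, rng)             # sub-vectors summing to d_i
        noise = split_with_sum([0] * length, n, 10**6, rng)   # disturbances summing to 0
        sent.append([[s + r for s, r in zip(sv, rv)] for sv, rv in zip(shares, noise)])
    # Step 4: party j sums the sub-vector it kept and those it received (encryption omitted).
    partial = [[sum(sent[i][j][t] for i in range(n)) for t in range(length)] for j in range(n)]
    # Step 5: the data miner adds the partial sums; the disturbances cancel out.
    S = [sum(p[t] for p in partial) for t in range(length)]
    # Step 6: most frequent label among the k smallest global distances.
    nearest = sorted(range(length), key=lambda t: S[t])[:k]
    label = Counter(labels[t] for t in nearest).most_common(1)[0][0]
    return S, label

d = [[4, 1, 9, 16], [1, 0, 4, 9], [0, 1, 1, 4]]   # hypothetical local distance vectors
labels = [1, 2, 2, 1]                             # hypothetical class labels held by the miner
S, label = simulate(d, labels, k=3)
print(S, label)
```

Whatever random values are drawn, S equals the element-wise sum of the local distance vectors, because each party's sub-vectors sum to its d i and its disturbances sum to 0.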
2. The distributed privacy protection classification method based on homomorphic encryption according to claim 1, wherein the step 1 comprises:
denoting the n data participants taking part in data mining as P i, i∈[1,n]; the global data set D contains the following attributes:
$$\langle ID, A_{11}, A_{21}, \ldots, A_{m1}, \ldots, A_{1n}, \ldots, A_{kn}, L \rangle, \quad m, k \in \mathbb{Z}$$
where ID is the primary-key attribute, A denotes a numerical condition attribute, and L is the class-label attribute; m and k are integers giving the number of attributes held by each participant;
dividing D into n parts, one held by each data participant; the i-th part is defined as the local data set D i, i∈[1,n], and is held by P i; D i contains the following attributes:
$$\langle ID, A_{1i}, A_{2i}, \ldots, A_{mi} \rangle, \quad m \in \mathbb{Z}$$
$$D_i = \{x_{1i}, \ldots, x_{ji}, \ldots, x_{li}\}$$
Figure FDA0003925374430000021
where:
l denotes the number of records contained in the original data set D,
x ji denotes the j-th data point of data set D i, j = 1, 2, …, l,
m denotes the number of attributes of the data points in data set D i; that is, D i is an m-dimensional data set containing l data points,
Figure FDA0003925374430000022
denotes the value of the r-th attribute of data point x ji, r = 1, 2, …, m;
the local data set D n held by P n additionally contains the class-label attribute; D n is expressed as
$$\langle ID, A_{1i}, A_{2i}, \ldots, A_{mi}, L \rangle, \quad m \in \mathbb{Z}$$
the data participant P n is called the data miner and takes part in predicting the class of the prediction sample;
defining the global prediction sample whose class label is to be predicted as x′:
$$x' = \langle x^{(11)\prime}, x^{(21)\prime}, \ldots, x^{(m1)\prime}, \ldots, x^{(1n)\prime}, \ldots, x^{(kn)\prime} \rangle, \quad m, k \in \mathbb{Z}$$
where:
x (ii)′ denotes the value of the ii-th attribute of the global prediction sample x′;
m and k are integers giving the number of attributes held by each participant;
each data participant P i holds a portion of the global prediction sample, denoted x′ i and expressed as:
$$x'_i = \langle x^{(1i)\prime}, \ldots, x^{(ki)\prime} \rangle, \quad k \in \mathbb{Z}, \; i \in [1, n].$$
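The vertical split described in this claim can be illustrated as follows. The sketch assumes a tiny hypothetical data set with three participants; the attribute names follow the A ri pattern (r-th attribute of participant i), while the values and the function name `vertical_split` are invented for illustration. Every participant keeps the ID column, and only the data miner keeps the class label L.

```python
# Hypothetical global data set: ID, four numeric attributes, class label L.
global_rows = [
    {"ID": 1, "A11": 5.0, "A21": 1.2, "A12": 3.3, "A13": 0.7, "L": 1},
    {"ID": 2, "A11": 4.1, "A21": 0.9, "A12": 2.8, "A13": 1.5, "L": 2},
]

# Which attributes each participant holds; participant 3 plays the role of the data miner P_n.
columns = {1: ["A11", "A21"], 2: ["A12"], 3: ["A13"]}
miner = 3

def vertical_split(rows, columns, miner):
    """Split rows by column: each party gets ID plus its own attributes; the miner also gets L."""
    local = {}
    for party, attrs in columns.items():
        keep = ["ID"] + attrs + (["L"] if party == miner else [])
        local[party] = [{key: row[key] for key in keep} for row in rows]
    return local

local_sets = vertical_split(global_rows, columns, miner)
for party, rows in local_sets.items():
    print(party, rows)
```

The prediction sample x′ would be split by the same column assignment, without the ID and L fields.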
3. The distributed privacy protection classification method based on homomorphic encryption according to claim 2, characterized in that step 2 comprises:
the data miner generates the public key pk and the private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two prime numbers p and q satisfying gcd(pq, (p-1)(q-1)) = 1, where gcd(x, y) returns the greatest common divisor of the scalars x and y;
compute n = pq and λ = lcm(p-1, q-1), where lcm(x, y) returns the least common multiple of the scalars x and y;
define the function
$$L(x) = \frac{x - 1}{n}$$
randomly select a positive integer g smaller than n² such that the following μ exists:
$$\mu = \left(L\left(g^{\lambda} \bmod n^{2}\right)\right)^{-1} \bmod n$$
the public key pk is <n, g> and the private key sk is <λ, μ>.
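The key generation above can be sketched in Python as follows. This is a toy illustration under stated assumptions: the primes are far too small for real use, L is taken to be the standard Paillier function L(x) = (x - 1) / n, and the function names `L` and `keygen` are chosen here for illustration. Candidates for g are redrawn until the modular inverse μ exists.

```python
import math
import random

def L(x, n):
    """Standard Paillier L function, L(x) = (x - 1) / n, for x congruent to 1 modulo n."""
    return (x - 1) // n

def keygen(p, q, rng):
    """Return (pk, sk) = ((n, g), (lam, mu)) for primes p and q."""
    n = p * q
    if math.gcd(n, (p - 1) * (q - 1)) != 1:
        raise ValueError("p and q must satisfy gcd(pq, (p-1)(q-1)) = 1")
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    n2 = n * n
    while True:
        g = rng.randrange(2, n2)
        if math.gcd(g, n) != 1:
            continue                      # g must be invertible modulo n^2
        x = L(pow(g, lam, n2), n)
        if math.gcd(x, n) == 1:           # mu exists only if x is invertible modulo n
            return (n, g), (lam, pow(x, -1, n))

pk, sk = keygen(17, 19, random.Random(1))   # toy primes for illustration only
print(pk, sk)
```

The three-argument `pow` with exponent -1 computes a modular inverse and requires Python 3.8 or later.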
4. The distributed privacy protection classification method based on homomorphic encryption according to claim 3, characterized in that:
step 3 specifically includes:
Step 3.1: data participant P i computes the local distance vector d i between the local data set D i and the prediction sample portion x′ i, expressed as follows:
Figure FDA0003925374430000032
$$d_i = \langle \mathrm{dist}(x_{1i}, x'_i), \mathrm{dist}(x_{2i}, x'_i), \ldots, \mathrm{dist}(x_{ni}, x'_i) \rangle$$
where:
dist(x i, x j), i, j = 1, 2, …, n, denotes the squared distance between data point x i and data point x j, a scalar;
d i denotes the distance vector formed from the Euclidean distances between each data point of data set D i and the prediction sample;
Step 3.2: data participant P i decomposes the distance vector d i as follows: first randomly generate n-1 vectors d 1i, d 2i, ..., d n-1,i of the same dimension as d i, then compute
Figure FDA0003925374430000041
Step 3.3: data participant P i generates n disturbance sub-vectors r ji as follows: first randomly generate n-1 vectors r 1i, r 2i, ..., r n-1,i of the same dimension as d i, then compute
Figure FDA0003925374430000042
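Steps 3.1 to 3.3 can be sketched for a single participant as follows. The sketch assumes integer attributes and the squared Euclidean distance (following the "squared distance" wording above); the function names, the random bound, and the sample points are hypothetical. In both decompositions the first n-1 vectors are random and the last one is fixed so that the required sum holds.

```python
import random

def local_distance_vector(local_points, local_query):
    """Squared Euclidean distance of each local record to the local part of the query."""
    return [sum((a - b) ** 2 for a, b in zip(point, local_query)) for point in local_points]

def random_split(target, n, bound, rng):
    """n integer vectors: n-1 random ones, the last chosen so that the sum equals `target`."""
    parts = [[rng.randint(-bound, bound) for _ in target] for _ in range(n - 1)]
    parts.append([t - sum(p[k] for p in parts) for k, t in enumerate(target)])
    return parts

rng = random.Random(42)
points = [(1, 2), (4, 6), (0, 0)]     # hypothetical local records of one participant
query = (1, 1)                        # that participant's part of the prediction sample

d_i = local_distance_vector(points, query)
shares = random_split(d_i, 3, 1000, rng)               # step 3.2: sub-vectors summing to d_i
noise = random_split([0] * len(d_i), 3, 1000, rng)     # step 3.3: disturbances summing to 0
perturbed = [[s + r for s, r in zip(sv, rv)] for sv, rv in zip(shares, noise)]

print(d_i)
print([sum(col) for col in zip(*perturbed)])
```

Both printed vectors are identical: adding the disturbances changes every individual sub-vector but not their sum.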
5. The distributed privacy protection classification method based on homomorphic encryption according to claim 4, characterized in that: the step 4 specifically includes:
data participant P i homomorphically encrypts s′ ji by the following formulas:
Figure FDA0003925374430000043
Figure FDA0003925374430000044
where r is randomly selected and satisfies 0 < r < n and
Figure FDA0003925374430000045
<n, g> is the public key pk.
6. The distributed privacy protection classification method based on homomorphic encryption according to claim 5, characterized in that step 5 specifically comprises: the disturbance-free encrypted global distance vector e(S) obtained by P n is expressed as:
Figure FDA0003925374430000046
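The encryption of claim 5 and the ciphertext summation of claim 6 can be illustrated with a toy Paillier sketch. Several points are assumptions made here, not statements of the patent: the primes are tiny, g = n + 1 is used as one commonly chosen valid generator, the encryption follows the standard Paillier form c = g^m · r^n mod n², and negative plaintexts are encoded modulo n and decoded by centring. The helper names are hypothetical.

```python
import math
import random

p, q = 17, 19                          # toy primes, far too small for real use
n, n2 = p * q, (p * q) ** 2
g = n + 1                              # assumed choice of g
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m, rng):
    """c = g^m * r^n mod n^2 with random r, 0 < r < n and gcd(r, n) = 1."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def add(c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % n2

def decrypt(c):
    """m = L(c^lam mod n^2) * mu mod n, mapped back to a signed value."""
    m = ((pow(c, lam, n2) - 1) // n * mu) % n
    return m - n if m > n // 2 else m

rng = random.Random(7)
sub_vector_a = [12, -5, 30]            # hypothetical disturbed sub-vectors
sub_vector_b = [-20, 9, 1]
enc_sum = [add(encrypt(a, rng), encrypt(b, rng)) for a, b in zip(sub_vector_a, sub_vector_b)]
result = [decrypt(c) for c in enc_sum]
print(result)
```

With a modulus this small only plaintext sums between roughly -161 and 161 decode correctly; values of the size used in the worked example would need a correspondingly larger n.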
7. The distributed privacy protection classification method based on homomorphic encryption according to claim 6, characterized in that step 6 specifically includes:
Step 6.1: the data miner P n homomorphically decrypts e(S) by the following formulas:
$$e(S) = \langle e(S_1), e(S_2), \ldots, e(S_l) \rangle$$
$$S_l = L\left(e(S_l)^{\lambda} \bmod N^{2}\right) \cdot \mu \bmod N$$
where <λ, μ> is the private key sk;
Step 6.2: the data miner P n combines the obtained global distance vector S = <S 1, S 2, ..., S l> with its own class-label record L = <L 1, L 2, ..., L l> into key-value pairs
$$SL = \{\langle S_1, L_1 \rangle, \langle S_2, L_2 \rangle, \ldots, \langle S_l, L_l \rangle\}$$
to perform kNN prediction for the prediction sample, P n first presets a value K;
Step 6.3: the data miner P n sorts SL in ascending order of distance and keeps the first K elements, Sort(SL) = {<S 1, L 1>′, <S 2, L 2>′, ..., <S K, L K>′}, then counts the class labels L′ among them; the class that occurs most often is the prediction result for the prediction sample.
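Steps 6.2 and 6.3 amount to a sort followed by a majority vote, which can be sketched as below. The function name `knn_vote` and the sample values are hypothetical, and the tie-breaking behaviour is an artefact of this sketch, since the claim does not specify one.

```python
from collections import Counter

def knn_vote(S, L, K):
    """Pair distances with labels, keep the K smallest distances, return the most frequent label."""
    SL = sorted(zip(S, L), key=lambda pair: pair[0])[:K]
    counts = Counter(label for _, label in SL)
    return counts.most_common(1)[0][0]

S = [7, 3, 9, 1, 5]      # hypothetical decrypted global distances
L = [1, 2, 1, 2, 3]      # hypothetical class labels held by the data miner
predicted = knn_vote(S, L, K=3)
print(predicted)
```

If two classes are equally frequent among the K nearest records, `Counter.most_common` returns the one encountered first in sorted order, that is, the class of the nearer record.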
8. A distributed privacy protection classification system based on homomorphic encryption, for implementing the classification method according to any one of claims 1-7, characterized in that the system comprises:
a parameter definition module, configured to define the n data participants P i, i∈[1,n], taking part in data mining, the attributes of the global data set D, and the global prediction sample x′; to take one of the data participants as the data miner, denoted P n; to split the global data set D vertically into different local data sets D i, i∈[1,n]; and to distribute each local data set D i to the corresponding data participant P i, the portion of the prediction sample held by each data participant P i being x′ i;
a key generation module, by which the data miner P n generates the public key pk and the private key sk required for homomorphic encryption, keeps the private key sk for decryption, and sends the public key pk to the other data participants P i, i∈[1,n-1];
a vector generation module, configured to compute the local distance vector d i between the local data set D i held by each data participant and the corresponding prediction sample portion x′ i; each data participant P i decomposes d i into n distance sub-vectors of the same dimension, randomly generates n disturbance sub-vectors whose sum is 0, and adds each disturbance sub-vector to the corresponding distance sub-vector to obtain the disturbed distance sub-vectors s′ ji;
a vector encryption module, by which each data participant P i, of the disturbed distance sub-vectors s′ 1i, s′ 2i, ..., s′ ni that it holds, retains s′ ii, homomorphically encrypts each s′ ji (j∈[1,n], j≠i) to obtain the encrypted disturbed distance sub-vector e(s′ ji), and sends e(s′ ji) to data participant P j; once P i has received all of the encrypted disturbed distance sub-vectors e(s′ ji) (j≠i) sent by the other data participants, P i performs homomorphic summation to obtain
Figure FDA0003925374430000051
a global vector generation module, by which each data participant P 1 ~ P n-1 sends e(S i) to the data miner P n, and the data miner P n performs the summation
Figure FDA0003925374430000061
to obtain the disturbance-free encrypted global distance vector e(S);
a classification output module, by which the data miner P n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S; the data miner P n then counts the class labels of the K data points with the smallest distances and takes the most frequent class label as the predicted class label of the prediction sample.
CN202211372124.3A 2022-11-03 2022-11-03 Distributed privacy protection classification method and system based on homomorphic encryption Active CN115587139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211372124.3A CN115587139B (en) 2022-11-03 2022-11-03 Distributed privacy protection classification method and system based on homomorphic encryption

Publications (2)

Publication Number Publication Date
CN115587139A true CN115587139A (en) 2023-01-10
CN115587139B CN115587139B (en) 2024-03-22

Family

ID=84781087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211372124.3A Active CN115587139B (en) 2022-11-03 2022-11-03 Distributed privacy protection classification method and system based on homomorphic encryption

Country Status (1)

Country Link
CN (1) CN115587139B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601596A (en) * 2015-02-05 2015-05-06 南京邮电大学 Data privacy protection method in classification data mining system
CN106778314A (en) * 2017-03-01 2017-05-31 全球能源互联网研究院 A kind of distributed difference method for secret protection based on k means
CN108111294A (en) * 2017-12-13 2018-06-01 南京航空航天大学 A kind of multiple labeling sorting technique of the protection privacy based on ML-kNN
CN110008717A (en) * 2019-02-26 2019-07-12 东北大学 Support the decision tree classification service system and method for secret protection
CN111967514A (en) * 2020-08-14 2020-11-20 安徽大学 Data packaging-based sample classification method for privacy protection decision tree
WO2020233260A1 (en) * 2019-07-12 2020-11-26 之江实验室 Homomorphic encryption-based privacy-protecting multi-institution data classification method
US20210399874A1 (en) * 2020-06-19 2021-12-23 Duality Technologies, Inc. Secure distributed key generation for multiparty homomorphic encryption
CN114817999A (en) * 2022-06-28 2022-07-29 北京金睛云华科技有限公司 Outsourcing privacy protection method and device based on multi-key homomorphic encryption
CN115150060A (en) * 2022-07-06 2022-10-04 三未信安科技股份有限公司 Data privacy protection method based on secure multi-party clustering method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHAUL, H.; FELDMAN, D.; RUS, D: "Secure k-ish Nearest Neighbors Classifier", PROCEEDINGS ON PRIVACY ENHANCING TECHNOLOGIES, pages 42 - 61 *
崔建京;龙军;闵尔学;于洋;殷建平: "同态加密在加密机器学习中的应用研究综述", 计算机科学, pages 46 - 52 *
钱萍;吴蒙;: "同态加密隐私保护数据挖掘方法综述", 计算机应用研究, no. 05, pages 20 - 23 *

Also Published As

Publication number Publication date
CN115587139B (en) 2024-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant