CN115587139B - Distributed privacy protection classification method and system based on homomorphic encryption - Google Patents


Info

Publication number
CN115587139B
Authority
CN
China
Prior art keywords
data
distance
vector
global
participant
Legal status
Active
Application number
CN202211372124.3A
Other languages
Chinese (zh)
Other versions
CN115587139A (en)
Inventor
邹云峰
吴宁
周红勇
单超
祝宇楠
范环宇
Current Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center and State Grid Jiangsu Electric Power Co Ltd
Priority to CN202211372124.3A
Publication of CN115587139A
Application granted
Publication of CN115587139B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed privacy protection classification method and system based on homomorphic encryption. The method defines the n data participants taking part in data mining, the attributes of the global data set D, and the prediction sample. The data miner $P_n$ generates the public and private keys required for homomorphic encryption. Each data participant $P_i$ computes the Euclidean distance between every data point of its local data set and the prediction sample X to build a local distance vector M, generates random vectors that are added to the local distance vector to produce encrypted distance sub-vectors, and obtains an encrypted distance vector M'. These are combined into a global encrypted distance vector, which $P_n$ decrypts; $P_n$ then sorts the global distances to the prediction sample X and selects the K data points closest to X. By counting the class labels of these K data points, the data miner finds the most frequent class label and uses it as the predicted class label of the prediction sample. The method protects the data privacy of all parties while maintaining classification accuracy.

Description

Distributed privacy protection classification method and system based on homomorphic encryption
Technical Field
The invention relates to the field of information security, in particular to a distributed privacy protection classification method and system based on homomorphic encryption.
Background
With the rapid development of modern society and its ever-deepening digitalization, data that used to be stored centrally has become increasingly scattered. Data mining therefore requires the participation of multiple parties, while no information owner wants the other parties to learn its private information. In this situation, secure multi-party computation solves the problem well.
Turing Award laureate Professor Andrew Chi-Chih Yao (Yao Qizhi), a Chinese computer scientist, posed Yao's Millionaires' Problem: how can two millionaires determine who is richer without revealing their own wealth? This problem evolved into what is now called Secure Multi-party Computation (SMC), i.e., obtaining the desired result without sharing the original data. In secure multi-party computation, encryption is randomized every time and encrypted data cannot be reused; computation is performed directly on the encrypted data, and the original data is never restored. Before each computation the participants must be fixed and must all coordinate with one another, so that the value contained in the data can be obtained without revealing the original data. Using secure multi-party computation to aggregate model gradient updates reduces the possibility of information leakage. In summary, secure multi-party computation mainly addresses how multiple mutually untrusted parties in a distributed network can compute collaboratively without revealing the secret information each of them holds. On the one hand, it must let the participants jointly compute an agreed function; on the other hand, it must keep the secret input data held by each participant secure.
The invention studies how multiple parties can cooperate to build a k-nearest neighbor (kNN) classifier on vertically partitioned data. We developed a distributed kNN classification protocol based on homomorphic encryption. In this protocol, no party needs to send its data to a central trusted party; homomorphic encryption and random perturbation allow the parties to compute collaboratively without revealing private data.
Disclosure of Invention
To solve the problems of existing scenarios, the invention provides a privacy protection classification method in which all parties cooperate on classification without revealing private data, while classification accuracy is preserved even when the mining analyst is untrusted.
Specifically, the invention provides a distributed privacy protection classification method based on homomorphic encryption, which comprises the following steps:
Step 1: define the n data participants $P_i$, $i \in [1, n]$, that take part in data mining, the attributes of the global data set D, and the global prediction sample $x'$. One data participant acts as the data miner and is denoted $P_n$. The global data set D is split vertically into local data sets $D_i$, $i \in [1, n]$, where $D_i$ is held by data participant $P_i$, and the part of the prediction sample held by $P_i$ is $x'_i$;
Step 2: the data miner $P_n$ generates the public key pk and private key sk required for homomorphic encryption, keeps sk for decryption, and sends pk to the other data participants $P_i$, $i \in [1, n-1]$;
Step 3: each data participant $P_i$ computes the local distance vector $d_i$ between its local data set $D_i$ and its prediction sample part $x'_i$. $P_i$ decomposes $d_i$ into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds each perturbation sub-vector to the corresponding distance sub-vector, obtaining the perturbed distance sub-vectors $s'_{ji}$, $j \in [1, n]$;
Step 4: of its perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$, each data participant $P_i$ retains $s'_{ii}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1, n]$, $j \ne i$) into $e(s'_{ji})$, and sends $e(s'_{ji})$ to data participant $P_j$. After $P_i$ has received the encrypted perturbed sub-vectors $e(s'_{ij})$ from all other participants $P_j$ ($j \ne i$), it performs a homomorphic summation to obtain $e(S_i)$, the encryption of $S_i = \sum_{j=1}^{n} s'_{ij}$;
Step 5: each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to the data miner $P_n$, which homomorphically sums $e(S_1), \dots, e(S_n)$ to obtain the perturbation-free encrypted global distance vector $e(S)$, the encryption of $S = \sum_{i=1}^{n} S_i$;
Step 6: the data miner $P_n$ decrypts $e(S)$ with the private key sk to obtain the global distance vector S, sorts it in ascending order, and selects the K data points closest to the prediction sample; by counting the class labels of these K data points, it finds the most frequent class label and uses it as the predicted class label of the prediction sample.
Further, Step 1 comprises:
defining the n data participants taking part in data mining as $P_i$, $i \in [1, n]$, where the global data set D contains the attributes
$$\langle ID, A_{11}, A_{21}, \dots, A_{m1}, \dots, A_{1n}, \dots, A_{kn}, L \rangle,\quad m, k \in \mathbb{Z}$$
where ID is the primary-key attribute, each A is a numeric condition attribute, and L is the class-label attribute; m and k are integers giving the number of attributes held by each participant;
splitting D vertically into n parts, each held by one data participant, where the i-th part is defined as the local data set $D_i$, $i \in [1, n]$, held by $P_i$; $D_i$ contains the attributes
$$\langle ID, A_{1i}, A_{2i}, \dots, A_{mi} \rangle,\quad m \in \mathbb{Z}$$
$$D_i = \{x_{1i}, \dots, x_{ji}, \dots, x_{li}\}$$
where:
l denotes the number of records in the original data set D;
$x_{ji}$ denotes the j-th data point of data set $D_i$, $j = 1, 2, \dots, l$;
m denotes the number of attributes of the data points in $D_i$, i.e., $D_i$ is an m-dimensional data set containing l data points, and each data point $x_{ji}$ consists of m attribute values indexed by $r = 1, 2, \dots, m$;
the local data set $D_n$ owned by $P_n$ additionally contains the class-label attribute:
$$\langle ID, A_{1n}, A_{2n}, \dots, A_{mn}, L \rangle,\quad m \in \mathbb{Z}$$
the data participant $P_n$ is called the data miner and takes part in predicting the class of the prediction data;
defining the global prediction sample whose class label is to be predicted as $x'$:
$$x' = \langle x'_{(11)}, x'_{(21)}, \dots, x'_{(m1)}, \dots, x'_{(1n)}, \dots, x'_{(kn)} \rangle,\quad m, k \in \mathbb{Z}$$
where:
$x'_{(ri)}$ denotes the value of the r-th attribute of participant $P_i$ in the global prediction sample $x'$;
m and k are integers giving the number of attributes held by each participant;
each data participant $P_i$ owns part of the global prediction sample, defined as $x'_i$:
$$x'_i = \langle x'_{(1i)}, \dots, x'_{(ki)} \rangle,\quad k \in \mathbb{Z},\ i \in [1, n].$$
Step 2 comprises:
the data miner generates the public key pk and private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two primes p and q satisfying $\gcd(pq, (p-1)(q-1)) = 1$, where $\gcd(x, y)$ returns the greatest common divisor of the scalars x and y;
compute $N = pq$ and $\lambda = \operatorname{lcm}(p-1, q-1)$, where $\operatorname{lcm}(x, y)$ returns the least common multiple of the scalars x and y (the modulus is written N to distinguish it from the number of participants n);
define the function $L(x) = \frac{x - 1}{N}$;
randomly select an integer g less than $N^2$ such that the following μ exists:
$$\mu = \big(L(g^{\lambda} \bmod N^2)\big)^{-1} \bmod N$$
the public key pk is $\langle N, g \rangle$ and the private key sk is $\langle \lambda, \mu \rangle$.
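The key generation above can be sketched in Python as follows. This is a toy illustration under stated assumptions: the primes are hypothetical small values chosen for readability (real deployments use primes of at least 1024 bits), the helper names `paillier_keygen` and `L` are illustrative, and g is drawn at random until μ exists, as the step describes.

```python
import math
import random


def L(x: int, N: int) -> int:
    """L(x) = (x - 1) / N; exact whenever x = 1 (mod N)."""
    return (x - 1) // N


def paillier_keygen(p: int, q: int):
    """Toy Paillier key generation following Step 2 (tiny primes, NOT secure)."""
    N = p * q
    if math.gcd(N, (p - 1) * (q - 1)) != 1:
        raise ValueError("p and q must satisfy gcd(pq, (p-1)(q-1)) = 1")
    lam = math.lcm(p - 1, q - 1)
    N2 = N * N
    while True:
        g = random.randrange(2, N2)
        if math.gcd(g, N) != 1:
            continue  # g must be a unit modulo N^2
        u = L(pow(g, lam, N2), N)
        if math.gcd(u, N) == 1:  # mu exists only when u is invertible mod N
            mu = pow(u, -1, N)
            return (N, g), (lam, mu)


pk, sk = paillier_keygen(10007, 10009)
print(pk[0], sk[0])  # 100160063 50070024
```

A common fixed alternative is g = N + 1, for which μ always exists under the gcd condition and equals λ⁻¹ mod N.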
Step 3 specifically comprises:
Step 3.1: data participant $P_i$ computes the local distance vector $d_i$ between its local data set $D_i$ and its prediction sample part $x'_i$:
$$d_i = \langle \mathrm{dist}(x_{1i}, x'_i), \mathrm{dist}(x_{2i}, x'_i), \dots, \mathrm{dist}(x_{li}, x'_i) \rangle$$
where:
$\mathrm{dist}(x, y)$ is the squared Euclidean distance between data points x and y, a scalar;
$d_i$ is the vector of squared Euclidean distances from each data point of $D_i$ to the prediction sample;
Step 3.2: data participant $P_i$ decomposes the distance vector $d_i$ as follows: first randomly generate $n-1$ vectors $s_{1i}, s_{2i}, \dots, s_{n-1,i}$ with the same dimension as $d_i$, then compute
$$s_{ni} = d_i - \sum_{j=1}^{n-1} s_{ji}$$
so that $\sum_{j=1}^{n} s_{ji} = d_i$;
Step 3.3: data participant $P_i$ generates the n perturbation sub-vectors $r_{ji}$ as follows: first randomly generate $n-1$ vectors $r_{1i}, r_{2i}, \dots, r_{n-1,i}$ with the same dimension as $d_i$, then compute
$$r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$$
so that $\sum_{j=1}^{n} r_{ji} = 0$; the perturbed distance sub-vectors are then $s'_{ji} = s_{ji} + r_{ji}$, $j \in [1, n]$.
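Steps 3.2 and 3.3 can be sketched as below. This is an illustration rather than the reference implementation: the function name `perturbed_shares`, the integer range `bound` for the random shares and perturbations, and the use of Python's `random` module are assumptions, since the method does not specify how the random vectors are drawn. Entry j of the returned list plays the role of the destination index in $s'_{ji}$.

```python
import random


def perturbed_shares(d, n, bound=10**6):
    """Split distance vector d into n perturbed sub-vectors (Steps 3.2-3.3).

    Returns a list whose j-th entry is the perturbed share destined for P_(j+1).
    """
    if n < 2:
        raise ValueError("need at least two participants")
    dim = len(d)
    # Step 3.2: n-1 random shares; the n-th share is the remainder, so shares sum to d
    s = [[random.randrange(bound) for _ in range(dim)] for _ in range(n - 1)]
    s.append([d[k] - sum(share[k] for share in s) for k in range(dim)])
    # Step 3.3: n-1 random perturbations; the n-th cancels them, so they sum to 0
    r = [[random.randrange(-bound, bound) for _ in range(dim)] for _ in range(n - 1)]
    r.append([-sum(pert[k] for pert in r) for k in range(dim)])
    # s'_j = s_j + r_j
    return [[s[j][k] + r[j][k] for k in range(dim)] for j in range(n)]


d1 = [6654645, 6762101, 7815380]
shares = perturbed_shares(d1, 3)
print([sum(col) for col in zip(*shares)] == d1)  # True
```

Whatever random values are drawn, the perturbed shares still sum component-wise to $d_i$, which is the property Step 5 relies on.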
Step 4 specifically comprises:
data participant $P_i$ homomorphically encrypts $s'_{ji}$ element by element using
$$e(m) = g^{m} \cdot r^{N} \bmod N^2$$
where r is chosen at random such that $0 < r < N$ and $r \in \mathbb{Z}^{*}_{N^2}$, and $\langle N, g \rangle$ is the public key pk.
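The encryption formula, together with the homomorphic addition used in Steps 4 and 5 and the decryption of Step 6.1, can be exercised with the toy sketch below. Assumptions not taken from the method: the primes are small hypothetical values (not secure), g is fixed to N + 1 (a standard valid choice rather than a random draw), and, because the perturbed sub-vectors may be negative, plaintexts are encoded modulo N and decrypted values above N/2 are read as negative.

```python
import math
import random

# Toy key material (hypothetical primes, NOT secure); g = N + 1 is a valid choice of g.
p, q = 10007, 10009
N = p * q
N2 = N * N
g = N + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, N2) - 1) // N, -1, N)


def encrypt(m: int) -> int:
    """e(m) = g^m * r^N mod N^2 with random r, 0 < r < N, gcd(r, N) = 1."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(g, m % N, N2) * pow(r, N, N2) % N2  # m % N encodes negatives


def decrypt(c: int) -> int:
    """m = L(c^lambda mod N^2) * mu mod N, mapped back to a signed integer."""
    m = (pow(c, lam, N2) - 1) // N * mu % N
    return m - N if m > N // 2 else m


def add_encrypted(*ciphertexts: int) -> int:
    """Multiplying Paillier ciphertexts adds the underlying plaintexts."""
    acc = 1
    for c in ciphertexts:
        acc = acc * c % N2
    return acc


# First components of s'_11, s'_21, s'_31 in the worked example; their sum is S_1[0].
c = add_encrypted(encrypt(448943), encrypt(1230380), encrypt(767336))
print(decrypt(c))  # 2446659
```

The signed decoding only works while every true sum stays within (-N/2, N/2), so N must be chosen comfortably larger than the distances involved.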
Step 5 specifically comprises: $P_n$ obtains the perturbation-free encrypted global distance vector $e(S)$ as
$$e(S) = \prod_{i=1}^{n} e(S_i) \bmod N^2 = e\Big(\sum_{i=1}^{n} S_i\Big)$$
since, in Paillier, multiplying ciphertexts adds the underlying plaintexts.
Step 6 specifically comprises:
Step 6.1: the data miner $P_n$ decrypts $e(S)$ component by component:
$$e(S) = \langle e(S_1), e(S_2), \dots, e(S_l) \rangle$$
$$S_t = L\big(e(S_t)^{\lambda} \bmod N^2\big) \cdot \mu \bmod N,\quad t = 1, 2, \dots, l$$
where $\langle \lambda, \mu \rangle$ is the private key sk;
Step 6.2: the data miner $P_n$ pairs the global distance vector $S = \langle S_1, S_2, \dots, S_l \rangle$ with its own class-label record $L = \langle L_1, L_2, \dots, L_l \rangle$ to form the key-value pairs
$$SL = \{\langle S_1, L_1 \rangle, \langle S_2, L_2 \rangle, \dots, \langle S_l, L_l \rangle\}$$
To perform kNN prediction for the prediction sample, $P_n$ first presets the value K;
Step 6.3: the data miner $P_n$ sorts SL in ascending order of distance and keeps the first K elements, $\mathrm{Sort}(SL) = \{\langle S_1, L_1 \rangle', \langle S_2, L_2 \rangle', \dots, \langle S_K, L_K \rangle'\}$; it then finds the most frequent class in the class-label record $L'$ of these K elements, which is the prediction result for the prediction sample.
Further, the invention also provides a distributed privacy protection classification system based on homomorphic encryption, comprising:
a parameter definition module, which defines the n data participants $P_i$, $i \in [1, n]$, taking part in data mining, the attributes of the global data set D, and the global prediction sample $x'$, where one data participant acts as the data miner and is denoted $P_n$; the global data set D is split vertically into local data sets $D_i$, $i \in [1, n]$, with $D_i$ held by $P_i$, and the part of the prediction sample held by $P_i$ is $x'_i$;
a key generation module, through which the data miner $P_n$ generates the public key pk and private key sk required for homomorphic encryption, keeps sk for decryption, and sends pk to the other data participants $P_i$, $i \in [1, n-1]$;
a vector generation module, which computes for each data participant the local distance vector $d_i$ between its local data set $D_i$ and its prediction sample part $x'_i$; each $P_i$ decomposes $d_i$ into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds them to the corresponding distance sub-vectors to obtain the perturbed distance sub-vectors $s'_{ji}$;
a vector encryption module, through which each data participant $P_i$ retains $s'_{ii}$ among its perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1, n]$, $j \ne i$) into $e(s'_{ji})$, and sends it to data participant $P_j$; after receiving the encrypted perturbed sub-vectors $e(s'_{ij})$ from all other participants ($j \ne i$), $P_i$ performs a homomorphic summation to obtain $e(S_i)$, the encryption of $S_i = \sum_{j=1}^{n} s'_{ij}$;
a global vector generation module, through which each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to the data miner $P_n$, which homomorphically sums them to obtain the perturbation-free encrypted global distance vector $e(S)$, the encryption of $S = \sum_{i=1}^{n} S_i$;
a classification output module, through which the data miner $P_n$ decrypts $e(S)$ with the private key sk to obtain the global distance vector S, selects the K data points closest to the prediction sample, counts their class labels, and uses the most frequent class label as the predicted class label of the prediction sample.
Compared with the prior art, the invention has the following beneficial effects: in privacy-preserving classification scenarios, it enables classification mining in which an untrusted data miner participates. It prevents a malicious miner from mounting attacks based on partial background knowledge it holds, and improves the privacy protection of user data.
Drawings
Fig. 1 is a flowchart of a distributed privacy protection classification method based on homomorphic encryption.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical solutions of the present invention and are not intended to limit the scope of protection of the present application.
To achieve the above object, the invention adopts the following technical solution: a distributed privacy protection classification method based on homomorphic encryption, described further with reference to Fig. 1, which comprises the following steps:
Step 1: the n data participants taking part in data mining are denoted $P_i$, $i \in [1, n]$, and the global data set D contains the attributes
$$\langle ID, A_{11}, A_{21}, \dots, A_{m1}, \dots, A_{1n}, \dots, A_{kn}, L \rangle,\quad m, k \in \mathbb{Z}$$
where ID is the primary-key attribute, each A is a numeric condition attribute, and L is the class-label attribute. D is split vertically into n parts, each held by one data participant; the i-th part is defined as the local data set $D_i$, $i \in [1, n]$, held by $P_i$, and $D_i$ contains the attributes $\langle ID, A_{1i}, A_{2i}, \dots, A_{mi} \rangle$, where m is an integer:
$$D_i = \{x_{1i}, \dots, x_{ji}, \dots, x_{li}\}$$
where:
l denotes the number of records in the original data set D;
$x_{ji}$ denotes the j-th data point of data set $D_i$, $j = 1, 2, \dots, l$;
m denotes the number of attributes of the data points in $D_i$, i.e., $D_i$ is an m-dimensional data set containing l data points, and each data point $x_{ji}$ consists of m attribute values indexed by $r = 1, 2, \dots, m$.
In particular, the local data set $D_n$ owned by $P_n$ also contains the class-label attribute:
$$\langle ID, A_{1n}, A_{2n}, \dots, A_{mn}, L \rangle,\quad m \in \mathbb{Z}$$
The data participant $P_n$ is called the data miner; it takes part in predicting the class of the prediction data.
The global prediction sample whose class label is to be predicted is defined as $x'$:
$$x' = \langle x'_{(11)}, x'_{(21)}, \dots, x'_{(m1)}, \dots, x'_{(1n)}, \dots, x'_{(kn)} \rangle$$
where:
$x'_{(ri)}$ denotes the value of the r-th attribute of participant $P_i$ in the global prediction sample $x'$;
m and k are integers giving the number of attributes held by each participant.
Each data participant $P_i$ owns part of the global prediction sample, defined as $x'_i$:
$$x'_i = \langle x'_{(1i)}, \dots, x'_{(ki)} \rangle,\quad k \in \mathbb{Z},\ i \in [1, n]$$
Step 2: the data miner $P_n$ generates the public key pk and private key sk required for homomorphic encryption. $P_n$ keeps sk for decryption and issues pk to all data participants $P_1, \dots, P_{n-1}$.
Step 2.1: the data miner generates the public key pk and private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two primes p and q satisfying $\gcd(pq, (p-1)(q-1)) = 1$, where $\gcd(x, y)$ returns the greatest common divisor of the scalars x and y;
compute $N = pq$ and $\lambda = \operatorname{lcm}(p-1, q-1)$, where $\operatorname{lcm}(x, y)$ returns the least common multiple of the scalars x and y;
define the function $L(x) = \frac{x - 1}{N}$;
randomly select an integer g less than $N^2$ such that the following μ exists:
$$\mu = \big(L(g^{\lambda} \bmod N^2)\big)^{-1} \bmod N$$
The public key pk is $\langle N, g \rangle$ and the private key sk is $\langle \lambda, \mu \rangle$.
Step 3: each data participant $P_i$, $i \in [1, n]$, computes the local distance vector $d_i$ between its local data set $D_i$ and its prediction sample part $x'_i$. $P_i$ decomposes $d_i$ into n vectors $s_{ji}$ of the same dimension satisfying $\sum_{j=1}^{n} s_{ji} = d_i$, randomly generates n perturbation sub-vectors $r_{ji}$ of the same dimension as $d_i$ satisfying $\sum_{j=1}^{n} r_{ji} = 0$, and then merges them into the perturbed vectors $s'_{ji} = s_{ji} + r_{ji}$.
Step 3.1: $P_i$ computes the local distance vector $d_i$ between its local data set $D_i$ and its prediction sample part $x'_i$:
$$d_i = \langle \mathrm{dist}(x_{1i}, x'_i), \mathrm{dist}(x_{2i}, x'_i), \dots, \mathrm{dist}(x_{li}, x'_i) \rangle$$
where:
$\mathrm{dist}(x, y)$ is the squared Euclidean distance between data points x and y, a scalar;
$d_i$ is the vector of squared Euclidean distances between each data point of $D_i$ and the prediction sample.
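Step 3.1 can be illustrated with a short sketch. Because the squared Euclidean distance is a sum over attributes, the local distance vectors computed on vertically partitioned attributes add up to the global squared distance, and ranking by squared distance gives the same order as ranking by Euclidean distance. The column split below (X1, X2 for $P_1$; X3, X4 for $P_2$; X5, X6 for $P_3$) follows the worked example's partition of Table 1; the function names are hypothetical.

```python
def squared_distance(a, b):
    """dist(x, y): squared Euclidean distance between two equal-length points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def local_distance_vector(records, query_part):
    """d_i: squared distance from every local record to P_i's part of the query."""
    return [squared_distance(rec, query_part) for rec in records]


# First two records of Table 1 (X1..X6) and the prediction sample x'
rows = [[1187, 431, 239, 280, 75, 364], [1260, 246, 220, 710, 73, 158]]
x_prime = [1270, 355, 236, 500, 78, 129]
columns = {1: slice(0, 2), 2: slice(2, 4), 3: slice(4, 6)}  # attribute split per party

local = {
    i: local_distance_vector([row[cols] for row in rows], x_prime[cols])
    for i, cols in columns.items()
}
summed = [sum(local[i][t] for i in local) for t in range(len(rows))]
global_d = local_distance_vector(rows, x_prime)
print(summed == global_d, global_d[0])  # True 116308
```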
Step 3.2: $P_i$ decomposes the distance vector $d_i$ as follows: first randomly generate $n-1$ vectors $s_{1i}, s_{2i}, \dots, s_{n-1,i}$ with the same dimension as $d_i$, then compute $s_{ni} = d_i - \sum_{j=1}^{n-1} s_{ji}$.
Step 3.3: $P_i$ generates the n perturbation sub-vectors $r_{ji}$ as follows: first randomly generate $n-1$ vectors $r_{1i}, r_{2i}, \dots, r_{n-1,i}$ with the same dimension as $d_i$, then compute $r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$.
Step 4: of its perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$, data participant $P_i$ retains $s'_{ii}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1, n]$, $j \ne i$) to obtain $e(s'_{ji})$, and sends $e(s'_{ji})$ to $P_j$. After $P_i$ has received the encrypted perturbed sub-vectors $e(s'_{ij})$ from all other data participants ($j \ne i$), it also encrypts its retained $s'_{ii}$ and performs a homomorphic summation to obtain
$$e(S_i) = \prod_{j=1}^{n} e(s'_{ij}) \bmod N^2 = e\Big(\sum_{j=1}^{n} s'_{ij}\Big)$$
Step 4.1: $P_i$ homomorphically encrypts $s'_{ji}$ element by element using
$$e(m) = g^{m} \cdot r^{N} \bmod N^2$$
where r is chosen at random such that $0 < r < N$, and $\langle N, g \rangle$ is the public key pk.
Step 5: each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to $P_n$, and $P_n$ computes the homomorphic sum $e(S) = \prod_{i=1}^{n} e(S_i) \bmod N^2$, obtaining the perturbation-free encrypted global distance vector $e(S)$.
Step 5.1: $P_n$ obtains the perturbation-free encrypted global distance vector $e(S)$. That $e(S)$ carries no perturbation follows from
$$S = \sum_{i=1}^{n} S_i = \sum_{i=1}^{n}\sum_{j=1}^{n} s'_{ij} = \sum_{j=1}^{n}\Big(\sum_{i=1}^{n} s_{ij} + \sum_{i=1}^{n} r_{ij}\Big) = \sum_{j=1}^{n} (d_j + 0) = \sum_{j=1}^{n} d_j$$
i.e., the perturbations cancel and S is the sum of all local distance vectors.
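The cancellation argument of Steps 3 to 6.1 can be checked by simulating the protocol end to end in one process. Assumptions not taken from the method: hypothetical small primes (not secure), g = N + 1, signed encoding modulo N, integer random shares from an arbitrary range, and no networking (all parties live in one program). The local distance vectors $d_1, d_2, d_3$ are those of the worked example in this description.

```python
import math
import random

# Toy Paillier key held by the data miner P_n (hypothetical primes, NOT secure)
P, Q = 10007, 10009
N, N2 = P * Q, (P * Q) ** 2
G, LAM = N + 1, math.lcm(P - 1, Q - 1)
MU = pow((pow(G, LAM, N2) - 1) // N, -1, N)


def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(G, m % N, N2) * pow(r, N, N2) % N2


def dec(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def perturbed_shares(d, n, bound=10**6):
    # Steps 3.2-3.3: additive shares of d plus perturbations that sum to zero
    s = [[random.randrange(bound) for _ in d] for _ in range(n - 1)]
    s.append([v - sum(col) for v, col in zip(d, zip(*s))])
    r = [[random.randrange(-bound, bound) for _ in d] for _ in range(n - 1)]
    r.append([-sum(col) for col in zip(*r)])
    return [[a + b for a, b in zip(sj, rj)] for sj, rj in zip(s, r)]


def hom_sum(vectors):
    # Component-wise product of ciphertext vectors = encryption of the plaintext sum
    acc = [1] * len(vectors[0])
    for vec in vectors:
        acc = [a * c % N2 for a, c in zip(acc, vec)]
    return acc


def run_protocol(local_distances):
    n = len(local_distances)
    # Steps 3-4: sent[i][j] = encrypted share of P_i's vector destined for P_j;
    # sent[i][i] stands for the share P_i keeps and encrypts itself
    sent = [[[enc(v) for v in share] for share in perturbed_shares(d, n)]
            for d in local_distances]
    # Step 4: each P_j homomorphically sums the n ciphertext vectors it holds -> e(S_j)
    e_parts = [hom_sum([sent[i][j] for i in range(n)]) for j in range(n)]
    # Step 5: P_n aggregates e(S_1..S_n); Step 6.1: P_n decrypts with sk
    return [dec(c) for c in hom_sum(e_parts)]


d1 = [6654645, 6762101, 7815380, 6718088, 6895954, 6461570, 7308809, 7802845, 7295845, 6777074]
d2 = [834025, 1672036, 1257269, 695209, 955816, 807385, 1369616, 1190781, 1092321, 1952977]
d3 = [266458, 105170, 260213, 193873, 196820, 235745, 257960, 87140, 209641, 128690]
S = run_protocol([d1, d2, d3])
print(S)  # [7755128, 8539307, 9332862, 7607170, 8048590, 7504700, 8936385, 9080766, 8597807, 8858741]
```

The random shares differ on every run, but the decrypted S does not, because only the sums survive aggregation.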
Step 6: $P_n$ decrypts the encrypted global distance vector $e(S)$ with the private key sk to obtain the global distance vector S. Since $P_n$ also holds the class-label attribute, it counts the class labels of the K data points closest to the prediction sample and uses the most frequent class label as the predicted class label of the prediction sample.
Step 6.1: $P_n$ decrypts $e(S)$ component by component:
$$e(S) = \langle e(S_1), e(S_2), \dots, e(S_l) \rangle$$
$$S_t = L\big(e(S_t)^{\lambda} \bmod N^2\big) \cdot \mu \bmod N,\quad t = 1, 2, \dots, l$$
where $\langle \lambda, \mu \rangle$ is the private key sk.
Step 6.2: $P_n$ pairs the global distance vector $S = \langle S_1, S_2, \dots, S_l \rangle$ with its own class-label record $L = \langle L_1, L_2, \dots, L_l \rangle$ to form the key-value pairs
$$SL = \{\langle S_1, L_1 \rangle, \langle S_2, L_2 \rangle, \dots, \langle S_l, L_l \rangle\}$$
To perform kNN prediction for the prediction sample, $P_n$ first presets K, a parameter giving the number of nearest records whose class labels are counted.
Step 6.3: $P_n$ sorts SL in ascending order of distance and keeps the first K elements, $\mathrm{Sort}(SL) = \{\langle S_1, L_1 \rangle', \langle S_2, L_2 \rangle', \dots, \langle S_K, L_K \rangle'\}$; the most frequent class in the resulting class-label record $L'$ is the prediction result for the prediction sample.
Correspondingly, the invention also provides a distributed privacy protection classification system based on homomorphic encryption, comprising:
a parameter definition module, which defines the n data participants $P_i$, $i \in [1, n]$, taking part in data mining, the attributes of the global data set D, and the global prediction sample $x'$, where one data participant acts as the data miner and is denoted $P_n$; the global data set D is split vertically into local data sets $D_i$, $i \in [1, n]$, with $D_i$ held by $P_i$, and the part of the prediction sample held by $P_i$ is $x'_i$;
a key generation module, through which the data miner $P_n$ generates the public key pk and private key sk required for homomorphic encryption, keeps sk for decryption, and sends pk to the other data participants $P_i$, $i \in [1, n-1]$;
a vector generation module, which computes for each data participant the local distance vector $d_i$ between its local data set $D_i$ and its prediction sample part $x'_i$; each $P_i$ decomposes $d_i$ into n distance sub-vectors of the same dimension, randomly generates n perturbation sub-vectors whose sum is 0, and adds them to the corresponding distance sub-vectors to obtain the perturbed distance sub-vectors $s'_{ji}$;
a vector encryption module, through which each data participant $P_i$ retains $s'_{ii}$ among its perturbed distance sub-vectors $s'_{1i}, s'_{2i}, \dots, s'_{ni}$, homomorphically encrypts each $s'_{ji}$ ($j \in [1, n]$, $j \ne i$) into $e(s'_{ji})$, and sends it to data participant $P_j$; after receiving the encrypted perturbed sub-vectors $e(s'_{ij})$ from all other participants ($j \ne i$), $P_i$ performs a homomorphic summation to obtain $e(S_i)$, the encryption of $S_i = \sum_{j=1}^{n} s'_{ij}$;
a global vector generation module, through which each data participant $P_1, \dots, P_{n-1}$ sends $e(S_i)$ to the data miner $P_n$, which homomorphically sums them to obtain the perturbation-free encrypted global distance vector $e(S)$, the encryption of $S = \sum_{i=1}^{n} S_i$;
a classification output module, through which the data miner $P_n$ decrypts $e(S)$ with the private key sk to obtain the global distance vector S, selects the K data points closest to the prediction sample, counts their class labels, and uses the most frequent class label as the predicted class label of the prediction sample.
To present the technical solution of the invention more clearly, an example of distributed privacy protection classification based on homomorphic encryption is given below, using the data in Table 1.
TABLE 1
X1 X2 X3 X4 X5 X6 Y
1187 431 239 280 75 364 2
1260 246 220 710 73 158 3
1422 399 251 510 89 353 1
1252 243 217 200 90 278 2
1307 150 210 370 118 269 1
1207 216 217 276 86 328 2
1383 165 260 560 124 337 1
1388 504 223 490 58 133 3
1324 398 229 436 82 300 1
1225 388 220 821 65 200 3
The global data set D consists of the local data set of $P_1$ (attributes X1, X2), the local data set of $P_2$ (attributes X3, X4), and the local data set of $P_3$ (attributes X5, X6 and the class label Y), ten records in total; that is, n = 3, l = 10, and m = 2 (each participant holds two condition attributes, six in total). The data sets are:
$D_1 = \{X1, X2\}$
$D_2 = \{X3, X4\}$
$D_3 = \{X5, X6, Y\}$
$D = \{X1, X2, X3, X4, X5, X6, Y\}$
$P_3$ generates the keys pk and sk with the Paillier algorithm and sends pk to $P_1$ and $P_2$.
The prediction sample is $x' = \{1270, 355, 236, 500, 78, 129\}$, of which $P_1$ holds only $x'_1 = \{1270, 355\}$, $P_2$ holds only $x'_2 = \{236, 500\}$, and $P_3$ holds only $x'_3 = \{78, 129\}$.
$P_1$ computes its local distance vector $d_1$ = {6654645,6762101,7815380,6718088,6895954,6461570,7308809,7802845,7295845,6777074} and decomposes $d_1$ into
$s_{11}$ = {944152,202554,117739,701284,179399,834307,937513,429851,618391,551784}
$s_{12}$ = {516237,300613,615949,916074,823429,809045,817028,867594,285042,227010}
$s_{13}$ = {5194256,6258934,7081692,5100730,5893126,4818218,5554268,6505400,6392412,5998280}
$P_1$ generates the perturbation vectors
$r_{11}$ = {-495209,-722845,151779,671406,-722113,-117051,981280,-335316,391540,336789}
$r_{12}$ = {-572483,-545057,-518861,767570,-734741,196243,-398850,81357,-597024,576812}
$r_{13}$ = {1067692,1267902,367082,-1438976,1456854,-79192,-582430,253959,205484,-913601}
$P_1$ adds the perturbation vectors to the distance sub-vectors and obtains
$s'_{11}$ = {448943,-520291,269518,1372690,-542714,717256,1918793,94535,1009931,888573}
$s'_{12}$ = {-56246,-244444,97088,1683644,88688,1005288,418178,948951,-311982,803822}
$s'_{13}$ = {6261948,7526836,7448774,3661754,7349980,4739026,4971838,6759359,6597896,5084679}
$P_1$ encrypts them as $e(pk, s'_{11})$, $e(pk, s'_{12})$ and $e(pk, s'_{13})$, then sends $e(pk, s'_{12})$ to $P_2$ and $e(pk, s'_{13})$ to $P_3$.
Similarly, $P_2$ performs the same operations and obtains
$d_2$ = {834025,1672036,1257269,695209,955816,807385,1369616,1190781,1092321,1952977}
$s_{21}$ = {888708,373635,203935,509229,899667,900983,225738,459837,574678,264926}
$s_{22}$ = {775161,401124,912219,897221,631254,942990,525456,985425,883576,348348}
$s_{23}$ = {-829844,897277,141115,-711241,-575105,-1036588,618422,-254481,-365933,1339703}
$r_{21}$ = {341672,378146,-957230,49325,-522974,-172005,-726181,-605921,756926,-838636}
$r_{22}$ = {-839858,-614081,584053,880419,-69298,497012,909647,-866401,318681,-899783}
$r_{23}$ = {498186,235935,373177,-929744,592272,-325007,-183466,1472322,-1075607,1738419}
$s'_{21}$ = {1230380,751781,-753295,558554,376693,728978,-500443,-146084,1331604,-573710}
$s'_{22}$ = {-64697,-212957,1496272,1777640,561956,1440002,1435103,119024,1202257,-551435}
$s'_{23}$ = {-331658,1133212,514292,-1640985,17167,-1361595,434956,1217841,-1441540,3078122}
$P_3$ obtains
$d_3$ = {266458,105170,260213,193873,196820,235745,257960,87140,209641,128690}
$s_{31}$ = {985273,314437,300723,504939,931920,697098,906371,830535,558415,439459}
$s_{32}$ = {42945,633334,763828,913768,539469,616078,388222,558941,121756,500723}
$r_{31}$ = {-217937,841799,-418199,832906,989079,931987,58489,-972894,-143857,-125059}
$r_{32}$ = {667088,-767364,-723753,519276,865585,49575,-725175,267744,-945622,222052}
$r_{33}$ = {-449151,-74435,1141952,-1352182,-1854664,-981562,666686,705150,1089479,-96993}
$s'_{31}$ = {767336,1156236,-117476,1337845,1920999,1629085,964860,-142359,414558,314400}
$s'_{32}$ = {710033,-134030,40075,1433044,1405054,665653,-336953,826685,-823866,722775}
$s'_{33}$ = {-1210911,-917036,337614,-2577016,-3129233,-2058993,-369947,-597186,618949,-908485}
After the encrypted sub-vectors have been exchanged, $P_1$ holds $e(pk, s'_{11})$, $e(pk, s'_{21})$ and $e(pk, s'_{31})$ and sums them homomorphically to obtain $e(pk, S_1)$, where
$S_1$ = {2446659,1387726,-601253,3269089,1754978,3075319,2383210,-193908,2756093,629263}
Likewise, $P_2$ obtains $e(pk, S_2)$ and $P_3$ obtains $e(pk, S_3)$, where
$S_2$ = {589090,-591431,1633435,4894328,2055698,3110943,1516328,1894660,66409,975162}
$S_3$ = {4719379,7743012,8300680,-556247,4237914,1318438,5036847,7380014,5775305,7254316}
$P_1$ sends $e(pk, S_1)$ to $P_3$, and $P_2$ sends $e(pk, S_2)$ to $P_3$.
$P_3$ computes the homomorphic sum $e(pk, S) = e(pk, S_1) + e(pk, S_2) + e(pk, S_3)$ (for Paillier, the product of the three ciphertexts modulo $N^2$) and decrypts $e(pk, S)$ with sk, obtaining
S = {7755128,8539307,9332862,7607170,8048590,7504700,8936385,9080766,8597807,8858741}
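The arithmetic of this worked example can be checked in plaintext: the aggregated vectors $S_1, S_2, S_3$ and the local distance vectors $d_1, d_2, d_3$ should both sum component-wise to the decrypted S. The snippet below only re-enters the values listed above and involves no encryption.

```python
d1 = [6654645, 6762101, 7815380, 6718088, 6895954, 6461570, 7308809, 7802845, 7295845, 6777074]
d2 = [834025, 1672036, 1257269, 695209, 955816, 807385, 1369616, 1190781, 1092321, 1952977]
d3 = [266458, 105170, 260213, 193873, 196820, 235745, 257960, 87140, 209641, 128690]
S1 = [2446659, 1387726, -601253, 3269089, 1754978, 3075319, 2383210, -193908, 2756093, 629263]
S2 = [589090, -591431, 1633435, 4894328, 2055698, 3110943, 1516328, 1894660, 66409, 975162]
S3 = [4719379, 7743012, 8300680, -556247, 4237914, 1318438, 5036847, 7380014, 5775305, 7254316]
S = [7755128, 8539307, 9332862, 7607170, 8048590, 7504700, 8936385, 9080766, 8597807, 8858741]


def elementwise_sum(*vectors):
    """Component-wise sum of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


# The perturbed aggregates and the raw local distances both reproduce S.
print(elementwise_sum(S1, S2, S3) == S)  # True
print(elementwise_sum(d1, d2, d3) == S)  # True
```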
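$P_3$ (the data miner $P_n$) pairs S with its own class labels and sorts the pairs by distance in ascending order, obtaining
$\mathrm{Sort}(SL) = \{\langle 7504700, 2\rangle, \langle 7607170, 2\rangle, \langle 7755128, 2\rangle, \langle 8048590, 1\rangle, \langle 8539307, 3\rangle, \langle 8597807, 1\rangle, \langle 8858741, 3\rangle, \langle 8936385, 1\rangle, \langle 9080766, 3\rangle, \langle 9332862, 1\rangle\}$
Assuming $P_n$ presets K = 5, class labels 1, 2 and 3 account for 20%, 60% and 20% of the first K records, respectively, so $P_n$ assigns class label 2 to the prediction sample.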
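Steps 6.2 and 6.3 amount to a sort followed by a majority vote, sketched below with the decrypted S of this example and the class labels Y of Table 1. The function name `knn_vote` is illustrative. The method does not state how ties in the vote are broken; here `Counter.most_common` returns, among equally frequent labels, the one first seen in the sorted list (the label of the nearer record), which is an assumption of this sketch.

```python
from collections import Counter


def knn_vote(distances, labels, k):
    """Pair distances with labels (SL), sort ascending, vote among the first k."""
    sl = sorted(zip(distances, labels), key=lambda pair: pair[0])
    top_k_labels = [label for _, label in sl[:k]]
    predicted = Counter(top_k_labels).most_common(1)[0][0]
    return predicted, top_k_labels


S = [7755128, 8539307, 9332862, 7607170, 8048590, 7504700, 8936385, 9080766, 8597807, 8858741]
Y = [2, 3, 1, 2, 1, 2, 1, 3, 1, 3]  # class labels of the ten records in Table 1
print(knn_vote(S, Y, 5))  # (2, [2, 2, 2, 1, 3])
```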
Although the applicant has described and illustrated the embodiments of the present invention in detail with reference to the drawings, those skilled in the art should understand that these embodiments are only preferred embodiments. The detailed description is intended only to help the reader understand the spirit of the invention and does not limit its scope; any improvement or modification based on the spirit of the invention falls within its scope.

Claims (6)

1. A distributed privacy-protection classification method based on homomorphic encryption, characterized by comprising the following steps:
Step 1: define the n data participants P_i, i ∈ [1, n], taking part in data mining, the attributes of the global dataset D, and the global prediction sample x'. One data participant is designated the data mining party and denoted P_n. The global dataset D is split vertically into local datasets D_i, i ∈ [1, n], each held by the corresponding data participant P_i. The part of the prediction sample held by P_i is x'_i. This step comprises:
The n data participants taking part in data mining are denoted P_i, i ∈ [1, n]. The global dataset D contains the attributes
$$\langle ID, A_{11}, A_{21}, \ldots, A_{m1}, \ldots, A_{1n}, \ldots, A_{kn}, L \rangle,\quad m, k \in \mathbb{Z}$$
where ID is the primary-key attribute, each A is a numeric condition attribute, and L is the class-label attribute; m and k are integers giving the number of attributes held by each participant;
D is divided vertically into n parts, one held by each data participant. The i-th part is defined as the local dataset D_i, i ∈ [1, n], held by P_i, and D_i contains the attributes
$$\langle ID, A_{1i}, A_{2i}, \ldots, A_{mi} \rangle,\quad m \in \mathbb{Z}$$
$$D_i = \{x_{1i}, \ldots, x_{ji}, \ldots, x_{li}\}$$
where:
l denotes the number of records in the original dataset D;
x_{ji} denotes the j-th data point of D_i, j = 1, 2, …, l;
m denotes the number of attributes of the data points in D_i, i.e. D_i is an m-dimensional dataset containing l data points;
the attribute values of data point x_{ji} are indexed by r = 1, 2, …, m;
The local dataset D_n owned by P_n additionally contains the class-label attribute and is expressed as
$$\langle ID, A_{1n}, A_{2n}, \ldots, A_{mn}, L \rangle,\quad m \in \mathbb{Z}$$
The data participant P_n is called the data mining party; it carries out the prediction for the sample to be classified;
The global prediction sample whose class label is to be predicted is defined as x':
$$x' = \langle x'_{(11)}, x'_{(21)}, \ldots, x'_{(m1)}, \ldots, x'_{(1n)}, \ldots, x'_{(kn)} \rangle,\quad m, k \in \mathbb{Z}$$
where:
x'_{(ii)} denotes the value of the ii-th attribute of the global prediction sample x';
m and k are integers giving the number of attributes held by each participant;
each data participant P_i holds a portion of the global prediction sample, defined as x'_i:
$$x'_i = \langle x'_{(1i)}, \ldots, x'_{(ki)} \rangle,\quad k \in \mathbb{Z},\ i \in [1, n];$$
Step 2: the data mining party P_n generates the public key pk and private key sk required for homomorphic encryption. It retains sk for decryption and sends pk to the other data participants P_i, i ∈ [1, n − 1];
Step 3: each data participant P_i computes the local distance vector d_i between its local dataset D_i and its prediction-sample part x'_i. P_i then decomposes d_i into n distance sub-vectors of the same dimension and randomly generates n perturbation sub-vectors whose sum is 0. Finally, it adds each perturbation sub-vector to the corresponding distance sub-vector to obtain the perturbed distance sub-vectors s'_{ji};
Step 4: each data participant P_i holds the perturbed distance sub-vectors s'_{1i}, s'_{2i}, …, s'_{ni} and retains s'_{ii}. It homomorphically encrypts each s'_{ji} (j ∈ [1, n], j ≠ i) to obtain the encrypted perturbed sub-vector e(s'_{ji}) and sends e(s'_{ji}) to data participant P_j. Once P_i has received the encrypted perturbed sub-vectors from all other participants, it sums them homomorphically together with its retained s'_{ii} to obtain e(S_i);
Step 5: each data participant P_1 to P_{n−1} sends e(S_i) to the data mining party P_n. P_n homomorphically sums these together with its own e(S_n) to obtain the encrypted global distance vector e(S), in which the perturbations have cancelled out;
Step 6: the data mining party P_n decrypts the encrypted global distance vector e(S) with the private key sk to obtain the global distance vector S. It then counts the class labels of the K nearest data points and takes the most frequent label as the predicted class label of the prediction sample. This step comprises:
Step 6.1: the data mining party P_n decrypts e(S) element-wise with the following formula:
$$e(S) = \langle e(S_1), e(S_2), \ldots, e(S_l) \rangle$$
$$S_j = L\left(e(S_j)^{\lambda} \bmod N^2\right) \cdot \mu \bmod N,\quad j = 1, 2, \ldots, l$$
where ⟨λ, μ⟩ is the private key sk and N is the modulus n of the public key;
Step 6.2: the data mining party P_n combines the global distance vector S = ⟨S_1, S_2, …, S_l⟩ with its own class-label record L = ⟨L_1, L_2, …, L_l⟩ into key-value pairs
$$SL = \{\langle S_1, L_1 \rangle, \langle S_2, L_2 \rangle, \ldots, \langle S_l, L_l \rangle\}$$
To perform kNN prediction on the prediction sample, P_n first presets a value of K;
Step 6.3: the data mining party P_n sorts SL in ascending order of distance and retains the first K elements
$$\mathrm{Sort}(SL) = \{\langle S_1, L_1 \rangle', \langle S_2, L_2 \rangle', \ldots, \langle S_K, L_K \rangle'\}$$
It then finds the most frequent class in the retained class-label record L'; this class is the prediction result for the prediction sample.
2. The distributed privacy-protection classification method based on homomorphic encryption according to claim 1, wherein step 2 comprises:
the data mining party generates the public key pk and private key sk of the Paillier homomorphic encryption algorithm as follows:
randomly select two primes p and q satisfying gcd(pq, (p − 1)(q − 1)) = 1, where gcd(x, y) returns the greatest common divisor of the scalars x and y;
compute n = pq and λ = lcm(p − 1, q − 1), where lcm(x, y) returns the least common multiple of the scalars x and y;
define the function L(x) = (x − 1)/n;
randomly select an integer g less than n² such that the following modular inverse μ exists:
$$\mu = \left(L\left(g^{\lambda} \bmod n^2\right)\right)^{-1} \bmod n$$
the public key pk is ⟨n, g⟩ and the private key sk is ⟨λ, μ⟩.
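A toy Python sketch of the key generation above is shown below, together with standard Paillier encryption c = g^m · r^n mod n² and decryption m = L(c^λ mod n²) · μ mod n. Several assumptions apply, none of which come from the source. It uses g = n + 1, a common valid choice for which μ always exists. It uses tiny primes that are insecure and for illustration only. It also uses a signed encoding that stores negative values as m mod n; the example's negative perturbed sub-vectors need some such encoding, but the source does not describe one. The function names are hypothetical.

```python
import math
import random

def keygen(p, q):
    """Paillier key generation following claim 2 (toy primes, insecure).

    Assumption: g = n + 1, a standard choice for which mu always exists."""
    n = p * q
    assert math.gcd(n, (p - 1) * (q - 1)) == 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    n2 = n * n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # (L(g^lam mod n^2))^-1 mod n
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """c = g^m * r^n mod n^2 with random r, 0 < r < n, gcd(r, n) = 1."""
    n, g = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m % n, n2) * pow(r, n, n2) % n2  # m % n encodes negatives

def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, then undo the signed encoding."""
    n, _ = pk
    lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

pk, sk = keygen(10007, 10009)
for value in (266458, -1210911, 0):
    assert decrypt(pk, sk, encrypt(pk, value)) == value
```

Because r is drawn afresh for each call, encrypting the same value twice gives different ciphertexts, yet both decrypt to the same plaintext.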
3. The distributed privacy-protection classification method based on homomorphic encryption according to claim 2, wherein step 3 specifically comprises:
Step 3.1: data participant P_i computes the local distance vector d_i between its local dataset D_i and its prediction-sample part x'_i:
$$d_i = \langle \mathrm{dist}(x_{1i}, x'_i), \mathrm{dist}(x_{2i}, x'_i), \ldots, \mathrm{dist}(x_{li}, x'_i) \rangle$$
where:
dist(x_i, x_j) is a scalar function returning the squared distance between data points x_i and x_j;
d_i is the vector of squared Euclidean distances between each data point of D_i and the prediction sample;
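Summing the participants' local distance vectors yields global distances because the squared Euclidean distance decomposes over a vertical split of the attributes. Each participant's local term covers only its own columns. A minimal check with a hypothetical 5-attribute record split among three participants:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

# Hypothetical record x and prediction sample x_pred with 5 attributes,
# split vertically: P_1 holds columns 0-1, P_2 column 2, P_3 columns 3-4.
x = [3, 7, 1, 4, 9]
x_pred = [5, 6, 2, 0, 8]
columns = [(0, 2), (2, 3), (3, 5)]

local = [sq_dist(x[a:b], x_pred[a:b]) for a, b in columns]
print(local, sum(local), sq_dist(x, x_pred))  # → [5, 1, 17] 23 23
```

The same identity would not hold for the unsquared Euclidean distance, which is why a squared distance is the natural choice for additive aggregation.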
Step 3.2: data participant P_i decomposes the distance vector d_i as follows. It first randomly generates n − 1 vectors d_{1i}, d_{2i}, …, d_{n−1,i} of the same dimension as d_i, then computes
$$d_{ni} = d_i - \sum_{j=1}^{n-1} d_{ji}$$
so that the n sub-vectors sum to d_i;
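Step 3.3: data participant P_i generates n perturbation sub-vectors r_{ji} as follows. It first randomly generates n − 1 vectors r_{1i}, r_{2i}, …, r_{n−1,i} of the same dimension as d_i, then computes
$$r_{ni} = -\sum_{j=1}^{n-1} r_{ji}$$
so that the n perturbation sub-vectors sum to 0.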
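Steps 3.2 and 3.3 can be sketched as follows, together with the element-wise addition of each perturbation to its distance sub-vector from claim 1. The sampling range `bound`, the use of uniform random integers, and the function name are assumptions for illustration; the source does not say how the random vectors are drawn.

```python
import random

def split_and_perturb(d, n, bound=10**6, rng=random):
    """Return n perturbed sub-vectors s'_1..s'_n whose element-wise sum is d.

    bound is an illustrative sampling range (assumption, not from the source).
    """
    dim = len(d)
    # Step 3.2: n - 1 random sub-vectors, the last one makes them sum to d
    subs = [[rng.randint(-bound, bound) for _ in range(dim)] for _ in range(n - 1)]
    subs.append([d[k] - sum(v[k] for v in subs) for k in range(dim)])
    # Step 3.3: n - 1 random perturbations, the last one makes them sum to 0
    perts = [[rng.randint(-bound, bound) for _ in range(dim)] for _ in range(n - 1)]
    perts.append([-sum(v[k] for v in perts) for k in range(dim)])
    # Claim 1: add each perturbation to the matching distance sub-vector
    return [[a + b for a, b in zip(s, r)] for s, r in zip(subs, perts)]

d3 = [266458, 105170, 260213, 193873, 196820, 235745, 257960, 87140, 209641, 128690]
shares = split_and_perturb(d3, 3, rng=random.Random(42))
print([sum(col) for col in zip(*shares)] == d3)  # → True
```

Individually each share looks random, but the n shares still add up to d_i because the perturbations cancel.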
4. The distributed privacy-protection classification method based on homomorphic encryption according to claim 3, wherein step 4 specifically comprises:
data participant P_i homomorphically encrypts each s'_{ji} with the Paillier encryption function under the public key pk = ⟨n, g⟩, using a randomly selected r satisfying 0 < r < n.
5. The distributed privacy-protection classification method based on homomorphic encryption according to claim 4, wherein step 5 specifically comprises: P_n obtains the unperturbed encrypted global distance vector e(S) as the homomorphic sum of e(S_1), e(S_2), …, e(S_n).
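In Paillier, homomorphic addition is ciphertext multiplication modulo n², so P_n can form e(S) as the product of the e(S_i) and decrypt only once. The sketch below applies this element-wise to the partial sums S_1, S_2, S_3 from the worked example. It is an illustration, not the patent's implementation. It assumes g = n + 1, tiny insecure primes, and a signed encoding of negative values as m mod n; none of these are stated in the source.

```python
import math
import random

# Toy Paillier parameters (insecure; illustration only). Assumption: g = n + 1.
p, q = 10007, 10009
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m):
    """Encrypt a signed integer element (negatives are encoded as m mod n)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m % n, n2) * pow(r, n, n2) % n2

def dec(c):
    """Decrypt and map values above n/2 back to negatives."""
    m = (pow(c, lam, n2) - 1) // n * mu % n
    return m - n if m > n // 2 else m

S1 = [2446659, 1387726, -601253, 3269089, 1754978, 3075319, 2383210, -193908, 2756093, 629263]
S2 = [589090, -591431, 1633435, 4894328, 2055698, 3110943, 1516328, 1894660, 66409, 975162]
S3 = [4719379, 7743012, 8300680, -556247, 4237914, 1318438, 5036847, 7380014, 5775305, 7254316]

# Each partial sum arrives at P_n encrypted element-wise
e_S = [[enc(v) for v in S_i] for S_i in (S1, S2, S3)]
# Homomorphic addition: multiply ciphertexts element-wise modulo n^2
e_total = [c1 * c2 * c3 % n2 for c1, c2, c3 in zip(*e_S)]
# One decryption per element recovers the global distance vector
S = [dec(c) for c in e_total]
print(S)
```

The modulus must exceed twice the largest absolute total for the signed decoding to be unambiguous; here n ≈ 1.0 × 10^8 comfortably covers totals of about 9.3 × 10^6.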
6. A distributed privacy-protection classification system based on homomorphic encryption, for implementing the classification method of any one of claims 1-5, the system comprising:
a parameter definition module, configured to define the n data participants P_i, i ∈ [1, n], taking part in data mining, the attributes of the global dataset D, and the global prediction sample x'. One data participant is designated the data mining party and denoted P_n. The global dataset D is split vertically into local datasets D_i, i ∈ [1, n], each held by the corresponding data participant P_i, and the part of the prediction sample held by P_i is x'_i;
a key generation module, by which the data mining party P_n generates the public key pk and private key sk required for homomorphic encryption, retains sk for decryption, and sends pk to the other data participants P_i, i ∈ [1, n − 1];
a vector generation module, configured to compute the local distance vector d_i between each participant's local dataset D_i and its prediction-sample part x'_i. Each data participant P_i decomposes d_i into n distance sub-vectors of the same dimension and randomly generates n perturbation sub-vectors whose sum is 0. It then adds these correspondingly to obtain the perturbed distance sub-vectors s'_{ji};
a vector encryption module, by which each data participant P_i retains s'_{ii} from its perturbed sub-vectors s'_{1i}, s'_{2i}, …, s'_{ni}. P_i homomorphically encrypts each s'_{ji} (j ∈ [1, n], j ≠ i) to obtain e(s'_{ji}) and sends e(s'_{ji}) to data participant P_j. Once P_i has received the encrypted perturbed sub-vectors from all other participants, it sums them homomorphically together with its retained s'_{ii} to obtain e(S_i);
a global vector generation module, by which each data participant P_1 to P_{n−1} sends e(S_i) to the data mining party P_n. P_n sums these homomorphically together with its own e(S_n) to obtain the unperturbed encrypted global distance vector e(S);
a classification output module, by which the data mining party P_n decrypts e(S) with the private key sk to obtain the global distance vector S. It then counts the class labels of the K nearest data points and takes the most frequent label as the predicted class label of the prediction sample.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211372124.3A CN115587139B (en) 2022-11-03 2022-11-03 Distributed privacy protection classification method and system based on homomorphic encryption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211372124.3A CN115587139B (en) 2022-11-03 2022-11-03 Distributed privacy protection classification method and system based on homomorphic encryption

Publications (2)

Publication Number Publication Date
CN115587139A CN115587139A (en) 2023-01-10
CN115587139B true CN115587139B (en) 2024-03-22

Family

ID=84781087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211372124.3A Active CN115587139B (en) 2022-11-03 2022-11-03 Distributed privacy protection classification method and system based on homomorphic encryption

Country Status (1)

Country Link
CN (1) CN115587139B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601596A (en) * 2015-02-05 2015-05-06 Nanjing University of Posts and Telecommunications Data privacy protection method in classification data mining system
CN106778314A (en) * 2017-03-01 2017-05-31 Global Energy Interconnection Research Institute A kind of distributed difference method for secret protection based on k means
CN108111294A (en) * 2017-12-13 2018-06-01 Nanjing University of Aeronautics and Astronautics A kind of multiple labeling sorting technique of the protection privacy based on ML-kNN
CN110008717A (en) * 2019-02-26 2019-07-12 Northeastern University Support the decision tree classification service system and method for secret protection
CN111967514A (en) * 2020-08-14 2020-11-20 Anhui University Data packaging-based sample classification method for privacy protection decision tree
WO2020233260A1 (en) * 2019-07-12 2020-11-26 Zhejiang Lab Homomorphic encryption-based privacy-protecting multi-institution data classification method
CN114817999A (en) * 2022-06-28 2022-07-29 Beijing Jinjing Yunhua Technology Co., Ltd. Outsourcing privacy protection method and device based on multi-key homomorphic encryption
CN115150060A (en) * 2022-07-06 2022-10-04 Sansec Technology Co., Ltd. Data privacy protection method based on secure multi-party clustering method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11962679B2 (en) * 2020-06-19 2024-04-16 Duality Technologies, Inc. Secure distributed key generation for multiparty homomorphic encryption


Non-Patent Citations (3)

Title
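Shaul, H.; Feldman, D.; Rus, D. Secure k-ish Nearest Neighbors Classifier. Proceedings on Privacy Enhancing Technologies, 2020, pp. 42-61. *
Qian Ping; Wu Meng. A survey of privacy-preserving data mining methods based on homomorphic encryption. Application Research of Computers (05), pp. 20-23, 28. *
Cui Jianjing; Long Jun; Min Erxue; Yu Yang; Yin Jianping. A survey of research on the application of homomorphic encryption in encrypted machine learning. Computer Science, 2018, pp. 46-52. *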

Also Published As

Publication number Publication date
CN115587139A (en) 2023-01-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant