CN111259443B - Method for protecting privacy in the federated learning prediction stage based on PSI (private set intersection) technology - Google Patents


Info

Publication number
CN111259443B
CN111259443B CN202010046301.3A CN202010046301A CN111259443B CN 111259443 B CN111259443 B CN 111259443B CN 202010046301 A CN202010046301 A CN 202010046301A CN 111259443 B CN111259443 B CN 111259443B
Authority
CN
China
Prior art keywords
eid
alice
prediction
bob
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010046301.3A
Other languages
Chinese (zh)
Other versions
CN111259443A (en)
Inventor
张韶峰
单进勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bairong Yunchuang Technology Co ltd
Original Assignee
Bairong Yunchuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bairong Yunchuang Technology Co ltd
Priority to CN202010046301.3A
Publication of CN111259443A
Application granted
Publication of CN111259443B

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for protecting privacy in the prediction stage of federated learning based on private set intersection (PSI) technology comprises the following steps: first, the prediction service party computes the prediction results of its own partial model; the two parties then execute an improved PSI protocol, in which the prediction service party encrypts its partial model results and the prediction demand party, using its own data, decrypts the results that correspond to the intersection, finally obtaining the prediction results for the keyword ids shared by the two parties. The method uses PSI technology and encrypts data with keys produced by a key derivation function, meets the privacy protection requirements of the federated learning prediction stage, completes the last link of privacy and security protection in federated learning, and helps federated learning reach more practical application scenarios.

Description

Method for protecting privacy in the federated learning prediction stage based on PSI (private set intersection) technology
Technical Field
The invention relates to the fields of information security and artificial intelligence, and in particular to the application of private set intersection (PSI) technology in federated learning. It provides a PSI-based method for protecting privacy in the prediction stage of federated learning.
Background
The rapid development of emerging technologies such as big data, cloud computing and the Internet of Things has led to explosive growth of data, and these data are held by different entities and organizations. On the one hand, with the promulgation of laws and regulations such as the Cybersecurity Law of China and the General Data Protection Regulation (GDPR) of the European Union, governments are imposing increasingly strict requirements on the protection of user data privacy, and the use and analysis of big data by traditional methods alone has already hit bottlenecks. On the other hand, each organization holding a large amount of data hopes to share data and mine its potentially huge value without revealing its own data. Data privacy and security therefore receive more and more attention in big data applications. Common privacy protection techniques include k-anonymity, differential privacy, homomorphic encryption, secure multi-party computation, and the like.
Federated learning, proposed by Google in 2016, is considered one of the techniques for getting out of this dilemma and is gaining increasing attention. Federated learning is in essence a cryptography-based distributed learning technology: the participating organizations jointly train a model, namely the federated model, on an equal footing and without revealing their own data, so that the requirements of user privacy protection, organizational data security and government regulations are met while the training effect of the model is preserved. According to the distribution characteristics of the data sets, federated learning is divided into horizontal federated learning, vertical federated learning and federated transfer learning.
Judging from currently published code, papers and patents, existing federated learning schemes focus mainly on the training phase of the model and rarely mention how the prediction phase is implemented. Most of the prior art merely states that the participants perform joint prediction with the federated learning model and then assumes that users apply the model directly. Although the prior art considers data privacy protection in the training phase, it does not consider in depth the data security problems that may arise when the model is subsequently used. As the last link of federated learning, prediction is the most frequently used stage in practical application scenarios and is where the application value of federated learning is realized; yet most of the prior art only considers how to obtain the federated learning model, and little research addresses the problems that arise when the model is used. In practice, each organization obtains only part of the model through federated learning, so the participation of every organization is still required in the prediction stage. Meanwhile, in real-world scenarios, the party that raises a prediction demand often does not want the other participants to learn the ids of the data to be predicted, so privacy protection is also needed in the prediction stage. For example, when a lending institution and a credit bureau cooperate to predict the credit of a borrower, the lending institution does not want the other party to learn that the borrower has a borrowing need, because the credit bureau could otherwise provide the borrower's information to other lending institutions; the lending institution therefore hopes that the borrower's information is not leaked while the borrower's credit is predicted.
Private Set Intersection (PSI) is a specific application problem in the field of secure multi-party computation and one of the active topics in privacy protection. It can be used for data alignment before model training in federated learning, and it can also be used to realize privacy protection in the prediction phase. PSI allows participants to compute the intersection of their respective data sets through a series of underlying cryptographic techniques without revealing any data other than the intersection; the intersection can be obtained by one designated participant or by all participants. PSI therefore has potential application value in scenarios such as blacklist sharing, marketing matching, similar-document detection and private contact discovery. According to the underlying cryptographic technique, PSI protocols fall mainly into three types: PSI based on public-key cryptography, PSI based on garbled circuits, and PSI based on the OT (Oblivious Transfer) protocol. Although PSI technology is mature, there has been no study of how to use it specifically in the prediction phase of federated learning.
Disclosure of Invention
The problems to be solved by the invention are as follows: existing federated learning technology places most of its emphasis on building the federated learning model, and there is little research on the practical deployment of federated learning. As the last link of federated learning, prediction is the most frequently used stage in practical application scenarios, yet the prior art has rarely studied the prediction stage and does not consider its data security problems.
The technical solution of the invention is as follows: in the prediction stage of federated learning, data are exchanged using private set intersection (PSI) technology, the exchanged information is encrypted during the exchange, and the privacy of the parties participating in federated learning is protected.
The federated learning is vertical federated learning. The participants comprise a prediction demand party Alice, a prediction service party Bob and a trusted third party Carol. With the help of Carol, Alice and Bob obtain through federated learning their respective partial models $Model_A$ and $Model_B$ of the federated learning model. In the prediction phase, Alice and Bob use $Model_A$ and $Model_B$ respectively to compute prediction results, and then perform merged prediction. In the merged prediction, the privacy of Alice and Bob is protected based on PSI: Bob encrypts the prediction results that he sends, and Alice, after completing the PSI-based set intersection with Bob, decrypts the prediction results corresponding to the intersection to obtain the required prediction data. Alice thus obtains Bob's prediction results without revealing her own private information, and Bob provides prediction data without revealing his own private information.
In the federated learning model, a data sample $(id, f)$ consists of a keyword $id$ and a feature value $f$; the set of ids is denoted $ID$ and the feature set is denoted $F$. When Alice sends a prediction request in the prediction stage of federated learning, Bob first computes the prediction results of his model according to $Model_B$. Alice and Bob then execute a PSI protocol to obtain masks of their respective keywords, and Bob encrypts the prediction results of his model with keys produced by a key derivation function. Alice performs set intersection on the PSI-transformed keywords, decrypts the prediction results of the intersection using the key derivation function, and merges them with the prediction results computed by her own $Model_A$, finally obtaining the prediction results for the keyword ids shared by the two parties.
To solve the problem of privacy protection in the federated learning prediction stage, the invention provides a solution based on PSI technology that meets the privacy protection requirements of this stage, completes the last link of privacy and security protection in federated learning, and helps federated learning reach more application scenarios. The problem addressed is the following: when the prediction demand party Alice cooperates with the prediction service party Bob for prediction, Bob must not learn the ids, or the corresponding feature values, of the data that Alice needs predicted, and Alice must not learn Bob's data set, not even the feature values for the ids that overlap with Bob's; at the same time, without knowing which ids Alice needs predicted, Bob must still be able to send the results of his own model to Alice.
Research on privacy in the federated learning prediction stage is still blank in the prior art. PSI technology gives two interacting parties a secure way to obtain intersection information, but the prediction result of federated learning depends not only on the overlapping ids of Alice and Bob but also on the specific values associated with those ids. Privacy protection in the prediction stage therefore consists not only in preventing Bob from learning which ids Alice submits, but also in protecting the information provided by Bob. The invention studies the privacy and data security problems of the federated learning prediction stage, analyzes the problems that exist in this stage, and provides a solution: after the prediction demand party and the prediction service party each obtain their own prediction results from the federated learning model, the prediction demand party Alice, under the privacy protection method of the invention, further obtains the relevant prediction results of the prediction service party Bob and merges them into a more complete prediction result, while the privacy of Alice and Bob and the security of the data during the exchange are ensured.
Drawings
Fig. 1 is a flow chart of federated learning, which can generally be divided into stages such as data alignment, training and prediction; the dashed box marks the prediction stage addressed by the invention.
Fig. 2 is a schematic flow chart of the federated learning prediction phase of the invention, i.e. a refinement of the contents of the dashed box in Fig. 1.
Detailed Description
The invention provides a method for protecting privacy in the federated learning prediction stage based on PSI technology. As shown in Fig. 1, federated learning can be roughly divided into stages such as data alignment, training and prediction; the invention is a technical solution for data security and privacy security in the prediction stage, i.e. the part in the dashed box in Fig. 1.
The method is mainly used for vertical federated learning. The participants comprise a prediction demand party Alice, a prediction service party Bob and a trusted third party Carol. With the help of Carol, Alice and Bob obtain through federated learning their respective partial models $Model_A$ and $Model_B$ of the federated learning model. A data sample $(id, f)$ consists of a keyword $id$ and a feature value $f$. In the prediction stage, Alice and Bob use $Model_A$ and $Model_B$ respectively to compute prediction results for their own data samples, and then perform merged prediction. In the merged prediction, the privacy of Alice and Bob is protected based on PSI: Alice and Bob execute the PSI protocol to obtain masks of their respective keywords, and Bob additionally encrypts the prediction results that he sends with keys produced by a key derivation function. Alice performs set intersection on the PSI-transformed keywords, decrypts the prediction results of the intersection using the key derivation function, and merges them with the prediction results computed by her own $Model_A$, finally obtaining the prediction results for the keyword ids shared by the two parties. Alice thus obtains Bob's prediction results without revealing her own private information, and Bob provides prediction data without revealing his own private information.
The implementation of the invention is illustrated in Fig. 2. In the federated learning model, a data sample $(id, f)$ consists of a keyword $id$ and a feature value $f$; the set of ids is denoted $ID$ and the feature set is denoted $F$. The data sets that Alice and Bob use for prediction are denoted $Data_A$ and $Data_B$, the corresponding keyword sets are $ID_A$ and $ID_B$, and the corresponding feature sets are $F_A$ and $F_B$. The keywords of $Data_A$ and $Data_B$ intersect, while their feature values differ. The samples in $Data_A$ and $Data_B$ are $(id, f_{A,id})$ and $(id, f_{B,id})$ respectively, where $f_{A,id}$ and $f_{B,id}$ are the feature values that Alice and Bob hold for keyword $id$. Alice needs to compute, with Bob's cooperation, the prediction results for $id \in ID_A$. When Alice sends a prediction request, Alice and Bob execute the following steps:
S1) Bob applies $Model_B$ to his own data set $Data_B$ and computes $score_{B,id} = Model_B(f_{B,id})$ for all $id \in ID_B$, forming a new data set denoted $Predict_B = \{(id, score_{B,id}) \mid id \in ID_B\}$.
S2) Alice and Bob execute the PSI protocol, which transforms Bob's data set $Predict_B$ into $eP_B = \{(eid_{B,id}, score_{B,id}) \mid id \in ID_B\}$, where the same $id$ may correspond to multiple $eid_{B,id}$, and transforms Alice's keyword set $ID_A$ into $eID_A = \{eid_{A,id} \mid id \in ID_A\}$, where the same $id$ corresponds to only one $eid_{A,id}$.
S3) For each $(eid_{B,id}, score_{B,id}) \in eP_B$, Bob derives a symmetric encryption key with a key derivation function KDF: $k_{id} = KDF(eid_{B,id}, id, iter, klen)$, where $eid_{B,id}$ serves as the password, $id$ serves as the salt, $iter$ is the number of iterations, and $klen$ is the key length of the symmetric encryption algorithm. In cryptography, a salt is a specific string inserted at a fixed position of a password so that the hash of the salted password no longer matches the hash of the original password.
S4) Bob uses $k_{id}$ from step S3) to encrypt $score_{B,id}$, obtaining $c_{id} = Enc(k_{id}, score_{B,id})$, where $Enc(\cdot,\cdot)$ is the encryption algorithm of a symmetric cipher, and obtains a new data set $EPredict_B = \{(eid_{B,id}, c_{id}) \mid id \in ID_B\}$, which he sends to Alice. Preferably, a hash operation is also applied to $eid_{B,id}$, giving $EPredict_B = \{(H(eid_{B,id}), c_{id}) \mid id \in ID_B\}$, where $H(\cdot)$ is a hash function; this protects the data sent by Bob against brute-force search and further ensures data security, and Alice computes with the same hash function in step S5).
S5) From the data set $EPredict_B = \{(eid_{B,id}, c_{id}) \mid id \in ID_B\}$ sent by Bob in step S4), Alice computes the intersection:
$EPredict_{A \cap B} = \{(eid_{A,id}, c_{id}) \mid eid_{A,id} = eid_{B,id},\ eid_{A,id} \in eID_A,\ (eid_{B,id}, c_{id}) \in eP_B\}$
If Bob has applied the hash operation to $eid_{B,id}$, Alice computes with the same hash function, and the intersection is:
$EPredict_{A \cap B} = \{(eid_{A,id}, c_{id}) \mid H(eid_{A,id}) = H(eid_{B,id}),\ eid_{A,id} \in eID_A,\ (H(eid_{B,id}), c_{id}) \in eP_B\}$;
In this step, a cuckoo filter is preferably also used to improve the efficiency of the intersection computation.
S6) For all $(eid_{A,id}, c_{id}) \in EPredict_{A \cap B}$, Alice obtains the corresponding intersection keyword $id_{A \cap B} \in ID_A \cap ID_B$, computes the symmetric key $k_{id,A \cap B} = KDF(eid_{A,id}, id_{A \cap B}, iter, klen)$ with the key derivation function, and decrypts $c_{id}$ to obtain Bob's prediction result for the intersection, $score_{B,id} = Dec(k_{id,A \cap B}, c_{id})$, where $Dec(\cdot,\cdot)$ is the decryption algorithm of the symmetric cipher, yielding the data intersection $Predict_{A \cap B} = \{(id, score_{B,id}) \mid id \in ID_A \cap ID_B\}$.
S7) From the result $Predict_{A \cap B}$ of step S6), Alice computes the merged prediction result: for $id \in ID_A \cap ID_B$, she computes $score_{A,id} = Model_A(f_{A,id})$ according to $Model_A$ and combines $score_{A,id}$ with $score_{B,id}$ from $Predict_{A \cap B}$ to obtain the final prediction result $score_{id} = Merge(score_{A,id}, score_{B,id})$, where $Merge(\cdot,\cdot)$ is determined by the machine learning algorithm set for the federated learning.
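Steps S3) and S6) depend on both parties deriving the same symmetric key from matching masks. The following is a minimal sketch of that derivation, assuming PBKDF2-HMAC-SHA256 as the KDF (the text does not name a specific KDF). The function name `derive_key`, the placeholder masks and the parameter values are hypothetical and chosen only for illustration.

```python
import hashlib


def derive_key(eid: bytes, record_id: str, iterations: int = 10_000, key_len: int = 32) -> bytes:
    """k_id = KDF(eid, id, iter, klen): eid is the password, id is the salt."""
    return hashlib.pbkdf2_hmac("sha256", eid, record_id.encode("utf-8"), iterations, key_len)


# Placeholder masks: in the real protocol these come out of the PSI step (S2).
# Here a plain hash stands in so that matching ids give matching masks.
eid_bob = hashlib.sha256(b"toy-psi-mask:user-42").digest()
eid_alice = hashlib.sha256(b"toy-psi-mask:user-42").digest()

k_bob = derive_key(eid_bob, "user-42")      # Bob, step S3
k_alice = derive_key(eid_alice, "user-42")  # Alice, step S6
k_other = derive_key(eid_alice, "user-43")  # a different salt gives a different key
```

Because the mask acts as the password and the id as the salt, Alice can only reproduce `k_bob` for an id whose mask she actually holds.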
As shown in Fig. 2, the invention uses PSI technology to obtain the respective masks of Alice and Bob. The prediction service party uses the mask as the seed of a key derivation function (KDF) to obtain a symmetric encryption key, and encrypts the results of its own model with it. The prediction demand party can derive the decryption key through the same key derivation function from the intersection result, and obtains the required data from the prediction service party to complete the final prediction.
The invention can use various PSI protocols: for example, PSI based on the OT protocol enables prediction on large-scale data while protecting privacy, whereas public-key-based PSI is suited to the case where Alice's data volume is small and Bob's is large. Various machine learning algorithms can also be used, so the method suits different application scenarios.
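The flow of steps S4) to S6) can be sketched as an encrypted lookup table keyed by hashed masks. This is an illustration under stated assumptions, not the patented implementation: `toy_mask` is a hypothetical placeholder for the PSI output (a real PSI protocol would not let either party compute the other's masks freely), and `xor_stream` is a toy stand-in for $Enc$/$Dec$ that is not a secure cipher.

```python
import hashlib
import struct


def kdf(eid: bytes, record_id: str, iterations: int = 1000, klen: int = 32) -> bytes:
    # PBKDF2 is an assumed choice of KDF; eid is the password, id is the salt.
    return hashlib.pbkdf2_hmac("sha256", eid, record_id.encode(), iterations, klen)


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy Enc/Dec stand-in: XOR with a SHA-256(key || offset) keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], block))
    return bytes(out)


def toy_mask(record_id: str) -> bytes:
    """Hypothetical placeholder for the PSI mask eid of an id."""
    return hashlib.sha256(b"shared-psi-state|" + record_id.encode()).digest()


def bob_publish(predict_b: dict) -> dict:
    """S3-S4: build EPredict_B = {H(eid): Enc(k_id, score)}."""
    table = {}
    for record_id, score in predict_b.items():
        eid = toy_mask(record_id)
        key = kdf(eid, record_id)
        table[hashlib.sha256(eid).digest()] = xor_stream(key, struct.pack(">d", score))
    return table


def alice_recover(ids_a: list, table: dict) -> dict:
    """S5-S6: match hashed masks, then derive the key and decrypt."""
    result = {}
    for record_id in ids_a:
        eid = toy_mask(record_id)
        ciphertext = table.get(hashlib.sha256(eid).digest())
        if ciphertext is not None:
            (score,) = struct.unpack(">d", xor_stream(kdf(eid, record_id), ciphertext))
            result[record_id] = score
    return result


predict_b = {"u1": 0.25, "u2": -1.5, "u3": 3.0}
recovered = alice_recover(["u2", "u3", "u9"], bob_publish(predict_b))
```

Alice recovers scores only for ids in the intersection; entries for `u1` stay encrypted under a key she cannot derive, and `u9` simply has no match.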
The practice of the invention is illustrated by the following specific examples.
Example 1:
Assume that the machine learning algorithm used in the invention is logistic regression. Alice's data features are
Figure BDA0002369510770000051
and Bob's data features are
Figure BDA0002369510770000052
where $d_A$ and $d_B$ are the numbers of features owned by Alice and Bob, respectively. The partial models obtained by Alice and Bob through federated learning are, respectively,
Figure BDA0002369510770000053
and
Figure BDA0002369510770000054
where $\theta_{-1}$ is the intercept. The prediction function of the logistic regression is
Figure BDA0002369510770000055
where
Figure BDA0002369510770000056
For a specific data sample, the first half
Figure BDA0002369510770000057
is owned by Alice and the second half
Figure BDA0002369510770000058
is owned by Bob. Alice then predicts according to the following steps:
1. For all $id \in ID_B$, Bob computes
Figure BDA0002369510770000061
forming a new data set denoted $Predict_B = \{(id, score_{B,id}) \mid id \in ID_B\}$;
2. Alice and Bob execute an improved PSI based on the OT protocol to obtain the new data sets $eID_A = \{eid_{A,id} \mid id \in ID_A\}$ and $eP_B = \{(eid_{B,id}, score_{B,id}) \mid id \in ID_B\}$ respectively;
3. For each $(eid_{B,id}, score_{B,id}) \in eP_B$, Bob derives a symmetric encryption key with a key derivation function KDF: $k_{id} = KDF(eid_{B,id}, id, iter, klen)$, where $eid_{B,id}$ serves as the password, $id$ serves as the salt, $iter$ is the number of iterations, and $klen$ is the key length of the symmetric encryption algorithm;
4. Bob uses $k_{id}$ from step 3 to encrypt $score_{B,id}$, obtaining $c_{id} = Enc(k_{id}, score_{B,id})$, where $Enc(\cdot,\cdot)$ is the encryption algorithm of a symmetric cipher, and obtains a new data set $EPredict_B = \{(H(eid_{B,id}), c_{id}) \mid id \in ID_B\}$, where $H(\cdot)$ is a hash function, which he sends to Alice;
5. From the data set $EPredict_B$ sent by Bob in step 4, Alice computes the intersection:
$EPredict_{A \cap B} = \{(eid_{A,id}, c_{id}) \mid H(eid_{A,id}) = H(eid_{B,id}),\ eid_{A,id} \in eID_A,\ (H(eid_{B,id}), c_{id}) \in eP_B\}$;
a cuckoo filter is also used to improve the efficiency of the intersection computation in step 5;
6. For all $(eid_{A,id}, c_{id}) \in EPredict_{A \cap B}$, Alice obtains the corresponding $id_{A \cap B} \in ID_A \cap ID_B$, computes the symmetric key $k_{id,A \cap B} = KDF(eid_{A,id}, id_{A \cap B}, iter, klen)$ with the key derivation function, and decrypts $c_{id}$ to obtain the score for the intersection, $score_{B,id} = Dec(k_{id,A \cap B}, c_{id})$, where $Dec(\cdot,\cdot)$ is the decryption algorithm of the symmetric cipher, yielding the data intersection $Predict_{A \cap B} = \{(id, score_{B,id}) \mid id \in ID_A \cap ID_B\}$;
7. From the result $Predict_{A \cap B}$ obtained in step 6, Alice computes the prediction result: for $(id, score_{B,id}) \in Predict_{A \cap B}$, she computes
Figure BDA0002369510770000062
and merges the two parts $score_{A,id}$ and $score_{B,id}$ to obtain the final prediction result
Figure BDA0002369510770000063
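For logistic regression, the merge in step 7 amounts to adding the two partial linear scores and applying the sigmoid. The sketch below illustrates this under assumptions that are not taken from the lost equation images: the intercept is placed on Alice's side, and the names `partial_score` and `merge` as well as all weights and feature values are hypothetical.

```python
import math


def partial_score(weights, features, intercept=0.0):
    """Linear part held by one party: intercept + sum(theta_i * x_i)."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def merge(score_a, score_b):
    """Merge for logistic regression: sigmoid of the summed partial scores."""
    return 1.0 / (1.0 + math.exp(-(score_a + score_b)))


# Hypothetical partial models and one sample split between the parties.
theta_a, x_a, intercept = [0.5, -0.25], [2.0, 4.0], 0.1   # Alice's side (assumed to hold the intercept)
theta_b, x_b = [1.0], [0.3]                               # Bob's side

score_a = partial_score(theta_a, x_a, intercept)
score_b = partial_score(theta_b, x_b)
prediction = merge(score_a, score_b)

# The same prediction computed as if one party held all features and weights.
joint = 1.0 / (1.0 + math.exp(-partial_score(theta_a + theta_b, x_a + x_b, intercept)))
```

Because the linear term is additive across features, Bob only needs to reveal the scalar $score_{B,id}$, not his feature values or weights.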
Example 2:
This example uses the same notation as Example 1 and also predicts with logistic regression; the only difference is that public-key-based PSI is used. Some additional symbols are required. The public-key scheme in this embodiment is assumed to be based on elliptic curves, where $q$ is the order of the elliptic curve,
Figure BDA0002369510770000064
represents the set $\{1, 2, \ldots, q-1\}$, and $H_1$ and $H_2$ are hash functions, where $H_1$ maps a message to a point on the elliptic curve. The specific steps are as follows:
1. Precomputation: for all $id \in ID_B$, Bob randomly selects
Figure BDA0002369510770000071
computes $s_{id} = H_2(a * H_1(id))$, and sends all $s_{id}$, $id \in ID_B$, to Alice;
2. For all $id \in ID_A$, Alice randomly selects
Figure BDA0002369510770000072
computes $t_{id} = b * H_1(id)$, and sends the $t_{id}$, $id \in ID_A$, to Bob;
3. Bob performs the following steps:
3.1. For all $id \in ID_B$, based on Alice's needs, Bob computes
Figure BDA0002369510770000073
3.2. For all $id \in ID_B$, Bob computes the key $k_{id} = KDF(a * H_1(id), id, iter, klen)$ with the key derivation function and encrypts $score_{B,id}$: $c_{id} = Enc(k_{id}, score_{B,id})$;
3.3. For each $t_{id}$ sent by Alice, Bob computes $t'_{id} = a * t_{id} = a * b * H_1(id)$;
3.4. Bob sends $c_{id}$ and $t'_{id}$ to Alice, ensuring a one-to-one correspondence between $c_{id}$ and $t_{id}$;
4. Alice performs the following steps:
4.1. For all $id \in ID_A$, Alice computes $t''_{id} = b^{-1} * t'_{id} = a * H_1(id)$ and $s'_{id} = H_2(t''_{id})$;
4.2. Alice computes $S = \{s'_{id}, id \in ID_A\} \cap \{s_{id}, id \in ID_B\}$;
4.3. For $s'_{id} \in S$, Alice computes $k_{id} = KDF(t''_{id}, id, iter, klen)$ with the key derivation function and decrypts $c_{id}$ to obtain $score_{B,id} = Dec(k_{id}, c_{id})$;
4.4. Alice finds the ids corresponding to the ciphertext intersection and obtains $Predict_{A \cap B} = \{(id, score_{B,id}) \mid id \in ID_A \cap ID_B\}$;
5. From the result $Predict_{A \cap B}$ obtained in step 4.4, Alice computes the prediction result: for $id \in ID_A \cap ID_B$, so that $(id, score_{B,id}) \in Predict_{A \cap B}$, she computes
Figure BDA0002369510770000074
and merges the two parts $score_{A,id}$ and $score_{B,id}$ to obtain the final prediction result
Figure BDA0002369510770000075
This embodiment is particularly suitable for the situation where the data volume of the prediction demand party is small and that of the other party is large, and it guarantees that sensitive data such as $score_{B,id}$ are not revealed. In addition, together with the hash function $H_2$ described above, a cuckoo filter can be adopted to improve the efficiency of the intersection computation in step 4.2.
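The blinding and unblinding in steps 1 to 4.2 rest on the two secret scalars commuting. The following toy sketch shows that property using modular exponentiation in place of elliptic-curve scalar multiplication, so $a * H_1(id)$ becomes $H_1(id)^a \bmod P$. This substitution, the modulus $P = 2^{127} - 1$, and the helper names are assumptions made so that the example runs with the standard library only; it is not a secure instantiation.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime used as a toy modulus (assumption, not from the text)


def h1(record_id: str) -> int:
    """Stand-in for H1: hash an id to a nonzero element modulo P."""
    digest = int.from_bytes(hashlib.sha256(record_id.encode()).digest(), "big")
    return digest % (P - 1) + 1


def h2(element: int) -> bytes:
    """Stand-in for H2: hash a group element to a fixed-length tag."""
    return hashlib.sha256(element.to_bytes(16, "big")).digest()


def random_exponent() -> int:
    """Secret exponent invertible modulo P - 1, so that blinding can be undone."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


a = random_exponent()  # Bob's secret
b = random_exponent()  # Alice's secret
ids_bob = ["u1", "u2", "u3"]
ids_alice = ["u2", "u3", "u9"]

s_bob = {h2(pow(h1(i), a, P)) for i in ids_bob}   # step 1: s_id = H2(a * H1(id))
t = [pow(h1(i), b, P) for i in ids_alice]         # step 2: t_id = b * H1(id)
t_prime = [pow(v, a, P) for v in t]               # step 3.3: Bob sees only blinded values
b_inv = pow(b, -1, P - 1)                         # inverse of Alice's secret
t_dprime = [pow(v, b_inv, P) for v in t_prime]    # step 4.1: equals a * H1(id)
common = [i for i, v in zip(ids_alice, t_dprime) if h2(v) in s_bob]  # step 4.2
```

After unblinding, Alice holds $a * H_1(id)$ for her own ids only, which is exactly the value Bob used as the KDF password in step 3.2.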
In a specific implementation, an intermediate service provider can be introduced to extend the application scenarios. For example, in Example 2 the result computed in step 1 may be sent to an intermediate service provider, which then carries out the intersection computation of step 4.2; the intermediate service provider cannot obtain any useful information.

Claims (5)

1. A method for protecting privacy of a federal learning prediction stage based on a PSI technology is characterized in that in the prediction stage of the federal learning, data is interacted with the PSI technology based on a privacy set for interaction, interaction information is encrypted in the interaction process, and privacy protection is carried out on participants of the federal learning; the federal learning is longitudinal federal learning, the participants of the federal learning comprise a forecasting demand party Alice, a forecasting service party Bob and a trusted third party Carol, and Alice and Bob respectively obtain corresponding part models of the federal learning Model through the federal learning under the help of CarolAAnd a ModelBIn the prediction phase, Alice and Bob use their Model models respectivelyAAnd a ModelBCalculating a prediction result and then performing merging prediction: in the combined prediction, privacy protection is carried out on Alice and Bob based on PSI, wherein Bob encrypts the sent prediction result, Alice decrypts the prediction result corresponding to the intersection part after completing set intersection based on PSI and Bob to obtain the required prediction data, so that Alice does not reveal own privacy information while obtaining the prediction result of Bob, and Bob does not reveal own privacy information when providing the prediction data;
in the federal learning model, Data samples (ID, F) are composed of keywords ID and characteristic values F, the set of ID is represented as ID, the set of characteristic is represented as F, and Data sets using Alice and Bob as prediction are respectively represented as DataAAnd DataBThe corresponding key word sets are respectively IDA、IDBThe corresponding feature sets are respectively FA、FB,DataAAnd DataBThe keywords have intersection, and the characteristic values are different; dataAAnd DataBThe samples in the dataset are (id, f) respectivelyA,id) And (id, f)B,id) Wherein f isA,idAnd fB,idRespectively represent the corresponding characters of the key words id of Alice and BobProper value, Alice needs to calculate ID e ID under the cooperation of BobAWhen Alice sends out a prediction request, Alice and Bob execute the following steps:
s1) Bob utilizes ModelBFor self Data set DataBCalculating to obtain scoreB,id=ModelB(fB,id) For all ID ∈ IDBForm a new data set as PresectB={(id,scoreB,id)|id∈IDB};
S2) Alice and Bob execute the PSI protocol to Predict the data set of BobBTransformation to ePB={(eidB,id,scoreB,id)|id∈IDBWhere the same id corresponds to one or more eidsB,idAlice's keyword set IDAConversion to eIDA={eidA,id|id∈IDAWhere the same id corresponds to only one eidA,id
S3) for each (eid)B,id,scoreB,id)∈ePBBob derives a symmetric encryption key k using a key derivation function KDFid=KDF(eidB,idId, iter, klen), among othersB,idRepresenting a password, taking id as a salt value of a key, iter representing iteration times, and klen representing the key length of a symmetric encryption algorithm;
s4) Bob uses k in step S3)idTo scoreB,idIs encrypted to obtain cid=Enc(kid,scoreB,id) Wherein Enc () is an encryption algorithm of a symmetric cipher to obtain a new data set EPredictB={(eidB,id,cid)|id∈IDBAnd sending the data to Alice;
s5) according to the data set EPredict sent by Bob in step S4)BAnd Alice calculates the intersection according to the PSI protocol:
EPredictA∩B={(eidA,id,cid)|eidA,id=eidB,id,eidA,id∈eIDA,(eidB,id,cid)∈ePB};
s6) for all (eid)A,id,cid)∈EPredictA∩BAlice gets the corresponding idA∩B∈IDA∩IDBComputing a symmetric key k using a key derivation functionid,A∩B=KDF(eidA,id,idA∩BIter, klen), decrypt cidObtain score of corresponding intersectionB,id=Dec(kid,A∩B,cid) Where Dec (·,) is the decryption algorithm of symmetric cipher to obtain data intersection PredictA∩B={(id,scoreB,id)|id∈IDA∩IDB};
S7) from $\mathrm{Predict}_{A\cap B}$ obtained in step S6), Alice computes the merged prediction result: for $id \in ID_A \cap ID_B$, Alice computes $score_{A,id} = \mathrm{Model}_A(f_{A,id})$ using $\mathrm{Model}_A$ and combines $score_{A,id}$ with $score_{B,id}$ from $\mathrm{Predict}_{A\cap B}$ to obtain the final prediction result $score_{id} = \mathrm{Merge}(score_{A,id}, score_{B,id})$, where $\mathrm{Merge}(\cdot,\cdot)$ is the machine learning algorithm set for federated learning.
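Steps S1) to S7) can be traced end to end with a minimal Python sketch. Everything concrete here is an assumption made for illustration: PBKDF2-HMAC-SHA256 is used as the KDF, a toy HMAC-based keystream stands in for Enc/Dec (it is not a vetted cipher), `toy_eid` merely simulates the output of the PSI step and is not a PSI protocol, and the feature data, models, and averaging `merge` are hypothetical.

```python
import hashlib
import hmac
import struct

ITER, KLEN = 10_000, 32  # hypothetical values for iter and klen


def toy_eid(id_: str) -> bytes:
    # Stand-in for the PSI output of step S2 (NOT a real PSI protocol):
    # both parties end up with the same eid for the same id.
    return hashlib.sha256(b"psi-demo|" + id_.encode()).digest()


def kdf(eid: bytes, id_: str) -> bytes:
    # S3/S6: eid is the password, id is the salt (assumed KDF: PBKDF2).
    return hashlib.pbkdf2_hmac("sha256", eid, id_.encode(), ITER, KLEN)


def _keystream(key: bytes, n: int) -> bytes:
    # HMAC-SHA256 over a block counter, truncated to n bytes.
    blocks = (hmac.new(key, struct.pack(">Q", i), hashlib.sha256).digest()
              for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]


def enc(key: bytes, score: float) -> bytes:
    # Toy stream cipher standing in for Enc; use a vetted cipher in practice.
    plain = struct.pack(">d", score)
    return bytes(p ^ k for p, k in zip(plain, _keystream(key, len(plain))))


def dec(key: bytes, cipher: bytes) -> float:
    plain = bytes(c ^ k for c, k in zip(cipher, _keystream(key, len(cipher))))
    return struct.unpack(">d", plain)[0]


# Hypothetical feature data and models for the two parties.
data_a = {"u1": 0.2, "u2": 0.9, "u3": 0.5}
data_b = {"u2": 0.4, "u3": 0.1, "u4": 0.7}
model_a = lambda f: 2.0 * f
model_b = lambda f: f + 1.0
merge = lambda score_a, score_b: (score_a + score_b) / 2

# S1: Bob scores his own data.
predict_b = {i: model_b(f) for i, f in data_b.items()}
# S2: PSI transformation (simulated); Alice keeps eid -> id for her own ids.
eid_b = {i: toy_eid(i) for i in predict_b}
eid_a = {toy_eid(i): i for i in data_a}
# S3 + S4: Bob derives a per-record key and encrypts each score.
epredict_b = {eid_b[i]: enc(kdf(eid_b[i], i), s) for i, s in predict_b.items()}
# S5: Alice keeps only the records whose eid she also holds.
epredict_ab = {e: c for e, c in epredict_b.items() if e in eid_a}
# S6: Alice re-derives each key and decrypts Bob's score.
predict_ab = {eid_a[e]: dec(kdf(e, eid_a[e]), c) for e, c in epredict_ab.items()}
# S7: Alice merges her own score with Bob's.
scores = {i: merge(model_a(data_a[i]), sb) for i, sb in predict_ab.items()}
```

In this sketch Alice ends up with merged scores only for the ids she shares with Bob; the record for `u4` stays encrypted under a key she cannot derive, because she holds neither its eid nor its id.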
2. The method for protecting privacy in the federated learning prediction stage based on PSI technology as claimed in claim 1, wherein in step S4) a hash operation is additionally applied to $eid_{B,id}$ to obtain $\mathrm{EPredict}_B = \{(H(eid_{B,id}), c_{id}) \mid id \in ID_B\}$, where $H(\cdot)$ denotes a hash function, and in the corresponding step S5) Alice computes the intersection as:
$$\mathrm{EPredict}_{A\cap B} = \{(eid_{A,id}, c_{id}) \mid H(eid_{A,id}) = H(eid_{B,id}),\ eid_{A,id} \in eID_A,\ (H(eid_{B,id}), c_{id}) \in eP_B\}.$$
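The hashed variant of the intersection in claim 2 might look like the following sketch. SHA-256 is assumed for $H(\cdot)$, and the eids and ciphertexts are hypothetical placeholders rather than real protocol output; the function name `hashed_intersection` is likewise illustrative.

```python
import hashlib


def h(eid: bytes) -> bytes:
    # H(.) instantiated with SHA-256 purely for illustration.
    return hashlib.sha256(eid).digest()


def hashed_intersection(eid_a, epredict_b_hashed):
    # Index Alice's own eids by their hash, then match Bob's hashed entries.
    by_hash = {h(e): e for e in eid_a}
    return {by_hash[he]: c
            for he, c in epredict_b_hashed.items() if he in by_hash}


# Hypothetical eids and ciphertexts (placeholders, not real protocol output).
alice_eids = {b"eid-u1", b"eid-u2", b"eid-u3"}
bob_records = {h(b"eid-u2"): b"c2", h(b"eid-u3"): b"c3", h(b"eid-u4"): b"c4"}
common = hashed_intersection(alice_eids, bob_records)
```

Because Bob transmits only $H(eid_{B,id})$, Alice can recover an eid, and hence derive a key, only for entries whose eid she already holds.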
3. The method according to claim 1, wherein in step S5) a cuckoo filter is used to improve the efficiency of the intersection computation.
4. The method for protecting privacy in the federated learning prediction stage based on PSI technology as claimed in any one of claims 1-3, wherein the PSI is a PSI technique based on the OT (oblivious transfer) protocol.
5. The method for protecting privacy in the federated learning prediction stage based on PSI technology as claimed in any one of claims 1-3, wherein the PSI is a public-key-based PSI technique.
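The claims do not fix a particular public-key PSI construction. As one possible illustration of how both parties could arrive at the same eid for a shared id, the sketch below uses a Diffie-Hellman-style blinded exponentiation: Bob computes $H(id)^b \bmod P$ on his own ids, while Alice obtains the same value for her ids without revealing them, by blinding with a random exponent $r$. The modulus, the function names, and the choice of construction are all assumptions, and the parameters are for demonstration only, not a secure instantiation.

```python
import hashlib
import math
import secrets

P = 2**255 - 19  # a well-known prime; chosen here only for a compact demo


def hash_to_group(id_: str) -> int:
    x = int.from_bytes(hashlib.sha256(id_.encode()).digest(), "big") % P
    return x or 1  # avoid the zero element


def bob_eid(id_: str, b: int) -> bytes:
    # Bob evaluates H(id)^b mod P directly on his own ids.
    return pow(hash_to_group(id_), b, P).to_bytes(32, "big")


def alice_blind(id_: str):
    # Alice picks r invertible mod P-1 and sends H(id)^r mod P to Bob.
    while True:
        r = secrets.randbelow(P - 3) + 2
        if math.gcd(r, P - 1) == 1:
            return r, pow(hash_to_group(id_), r, P)


def bob_evaluate(blinded: int, b: int) -> int:
    # Bob raises the blinded value to his secret exponent.
    return pow(blinded, b, P)


def alice_unblind(response: int, r: int) -> bytes:
    # Removing r leaves H(id)^b mod P, since r * r_inv = 1 mod (P-1).
    r_inv = pow(r, -1, P - 1)  # modular inverse via pow needs Python 3.8+
    return pow(response, r_inv, P).to_bytes(32, "big")


b = secrets.randbelow(P - 3) + 2  # Bob's secret exponent
r, blinded = alice_blind("u2")
eid_alice_u2 = alice_unblind(bob_evaluate(blinded, b), r)
```

Here `eid_alice_u2` matches `bob_eid("u2", b)`, so an id held by both parties maps to a common eid, while Bob sees only the blinded value of Alice's id.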
CN202010046301.3A 2020-01-16 2020-01-16 PSI (program specific information) technology-based method for protecting privacy of federal learning prediction stage Active CN111259443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010046301.3A CN111259443B (en) 2020-01-16 2020-01-16 PSI (program specific information) technology-based method for protecting privacy of federal learning prediction stage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010046301.3A CN111259443B (en) 2020-01-16 2020-01-16 PSI (program specific information) technology-based method for protecting privacy of federal learning prediction stage

Publications (2)

Publication Number Publication Date
CN111259443A CN111259443A (en) 2020-06-09
CN111259443B true CN111259443B (en) 2022-07-01

Family

ID=70950642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010046301.3A Active CN111259443B (en) 2020-01-16 2020-01-16 PSI (program specific information) technology-based method for protecting privacy of federal learning prediction stage

Country Status (1)

Country Link
CN (1) CN111259443B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113922960B (en) * 2020-07-08 2023-10-24 神州融安数字科技(北京)有限公司 PSI acquisition method, device and system based on SM2
CN111881480A (en) * 2020-07-31 2020-11-03 平安付科技服务有限公司 Private data encryption method and device, computer equipment and storage medium
CN111914277B (en) * 2020-08-07 2023-09-01 平安科技(深圳)有限公司 Intersection data generation method and federal model training method based on intersection data
CN111970277B (en) * 2020-08-18 2022-09-27 中国工商银行股份有限公司 Flow identification method and device based on federal learning
CN111784001B (en) * 2020-09-07 2020-12-25 腾讯科技(深圳)有限公司 Model training method and device and computer readable storage medium
CN112183612B (en) * 2020-09-24 2023-01-24 重庆邮电大学 Joint learning method, device and system based on parameter expansion
CN114465740A (en) * 2020-10-22 2022-05-10 阿里巴巴集团控股有限公司 Method and device for converting identifier and method and device for converting privacy item
CN112100680B (en) * 2020-11-16 2021-02-05 支付宝(杭州)信息技术有限公司 Multi-platform data processing method and device for protecting privacy
CN112287377A (en) * 2020-11-25 2021-01-29 南京星环智能科技有限公司 Model training method based on federal learning, computer equipment and storage medium
CN112613726A (en) * 2020-12-18 2021-04-06 深圳前海微众银行股份有限公司 Risk detection method based on federal learning, client, equipment and storage medium
CN112598139B (en) * 2020-12-22 2023-08-18 百度在线网络技术(北京)有限公司 Category encoding method, category encoding device, category encoding apparatus, category encoding device, category encoding storage medium, and category encoding program product
CN112700012B (en) * 2020-12-30 2024-08-09 深圳前海微众银行股份有限公司 Federal feature selection method, federal feature selection device, federal feature selection equipment and federal feature selection storage medium
CN113065155A (en) * 2021-03-26 2021-07-02 杭州宇链科技有限公司 Privacy set intersection method based on trusted execution environment assistance
CN112989386B (en) * 2021-03-31 2023-09-22 苏州黑云智能科技有限公司 Blacklist sharing method and system based on careless transmission
CN115567427A (en) * 2021-07-02 2023-01-03 中国移动通信有限公司研究院 Evaluation method, evaluation node and federal learning system for consistency of interaction protocols
CN113591151A (en) * 2021-08-04 2021-11-02 神谱科技(上海)有限公司 Privacy intersection method based on efficient compressed data transmission
CN114448702B (en) * 2022-01-29 2024-02-27 中国工商银行股份有限公司 Data encryption method and device based on privacy set intersection
CN115021985B (en) * 2022-05-23 2023-06-16 北京融数联智科技有限公司 Logistic regression model training method and system without third party participation
CN115378693B (en) * 2022-08-18 2024-02-23 西安电子科技大学 Hidden set intersection method for longitudinal federal learning data alignment
CN116383865B (en) * 2022-12-30 2023-10-10 上海零数众合信息科技有限公司 Federal learning prediction stage privacy protection method and system
CN116049909B (en) * 2023-01-28 2023-06-27 腾讯科技(深圳)有限公司 Feature screening method, device, equipment and storage medium in federal feature engineering
CN116244753B (en) * 2023-05-12 2023-08-15 建信金融科技有限责任公司 Method, device, equipment and storage medium for intersection of private data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492420B (en) * 2018-12-28 2021-07-20 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federal learning
CN110572253B (en) * 2019-09-16 2023-03-24 济南大学 Method and system for enhancing privacy of federated learning training data
CN110633806B (en) * 2019-10-21 2024-04-26 深圳前海微众银行股份有限公司 Longitudinal federal learning system optimization method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN111259443A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN111259443B (en) PSI (program specific information) technology-based method for protecting privacy of federal learning prediction stage
CN109495465B (en) Privacy set intersection method based on intelligent contracts
CN104486315B (en) A kind of revocable key outsourcing decryption method based on contents attribute
US20230108682A1 (en) Data processing method and apparatus, device, and computer-readable storage medium
CN107196926B (en) Cloud outsourcing privacy set comparison method and device
CN108259158A (en) Efficient and secret protection individual layer perceptron learning method under a kind of cloud computing environment
CN105577368A (en) Two-way privacy protective system and method for inquiring medical diagnostic service
CN113609508A (en) Block chain-based federal learning method, device, equipment and storage medium
CN112685783B (en) Data sharing method supporting privacy protection in financial permission blockchain
CN104601596A (en) Data privacy protection method in classification data mining system
CN104038493B (en) Bilinear pairing-free cloud storage data security audit method
Ma et al. Secure multiparty learning from the aggregation of locally trained models
CN116502732B (en) Federal learning method and system based on trusted execution environment
CN111104968A (en) Safety SVM training method based on block chain
CN104092686A (en) Privacy protection and data safety access method based on SVW classifier
CN114285556B (en) Privacy protection feature engineering method based on federal learning
Zhao et al. SGBoost: An efficient and privacy-preserving vertical federated tree boosting framework
CN114301677A (en) Key negotiation method, device, electronic equipment and storage medium
CN111859440B (en) Sample classification method of distributed privacy protection logistic regression model based on mixed protocol
CN113468440A (en) Anonymous query method for protecting location privacy based on SF-blind filtering protocol
CN117675270A (en) Multi-mode data encryption transmission method and system for longitudinal federal learning
CN113962286A (en) Decentralized logistic regression classification prediction method based on piecewise function
CN117353912A (en) Three-party privacy set intersection base number calculation method and system based on bilinear mapping
WO2024138854A1 (en) Method and system for protecting privacy in federated learning prediction stage
Yang et al. Federated medical learning framework based on blockchain and homomorphic encryption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant