CN109992977A - A data anomaly point cleaning method based on secure multi-party computing technology - Google Patents
- Publication number
- CN109992977A (application number CN201910156492.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Abstract
The invention belongs to the field of information security technology and discloses a data anomaly point cleaning method based on secure multi-party computing technology. The method comprises: unifying the data of the two participants A and B into a matrix format with the same dimensionality, the last dimension being the AVF value of each data item; participant A and participant B encrypting their data matrices using the Yao's encryption algorithm in the secure multi-party computing algorithm ABY; and server A and server B cleaning data anomaly points from the encrypted data sets uploaded by each participant. By combining secure multi-party computing technology with the AVF outlier detection algorithm and using the existing secure multi-party computing tool ABY, the invention achieves efficient detection on high-dimensional data, and by using the Yao's encryption algorithm it protects the data privacy of each party to a comparable degree while maintaining a certain level of efficiency.
Description
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a data anomaly point cleaning method based on a secure multi-party computing technology.
Background
Currently, the closest prior art is as follows. A joint data source refers to the situation in machine learning training where several participants hold data of the same type; fusing these data enlarges the training data set and improves the accuracy of the trained model. Because the quality of a model depends on the scale and quality of the data set, learning from joint data sources has become a major trend in the development of machine learning. However, alongside these advantages, joint data source training raises a new problem: protecting the privacy and security of data from multiple sources. In some situations the data held by each participant may be privacy-sensitive, such as commercial data or customers' private information (for example medical or property information), so the privacy requirements are extremely high and the data cannot be shared at will.
With the growing demand for data fusion, algorithms aimed at protecting data privacy are also emerging. One approach introduces a trusted third party: the participants jointly designate a trusted third party and upload their plaintext data to it, and the third party performs tasks such as data cleaning and training. The trusted third party is often an organization with public credibility or a cloud computing provider offering paid services. This approach achieves data fusion while offering some privacy protection. However, it carries security risks: a trusted third party is usually honest but curious, and if data leak unexpectedly during collection and processing, or a malicious third party steals data, the consequences are often serious.
With the integration of technologies from different fields, ideas from cryptography have been applied to joint data source training: each participant's data is encrypted with a mature encryption algorithm, and the encrypted data is then gathered and sent to a trusted third party. The third party holds no sensitive plaintext, only ciphertext that appears to have no practical meaning. The encryption algorithm is usually homomorphic, meaning that an operation performed on the ciphertext is equivalent to performing the same operation on the plaintext. This makes training on ciphertext feasible and largely guarantees data privacy. However, this approach also has a practical problem, chiefly the trade-off between security and efficiency: existing homomorphic encryption algorithms usually consume a large amount of time and computing resources to produce a result, so in scenarios where privacy requirements are less strict they are very inefficient to use and are not suitable for large-scale deployment.
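The homomorphic property described above can be illustrated with a minimal sketch. The source does not name a specific homomorphic scheme; the example below assumes textbook RSA with tiny, insecure parameters purely because its multiplicative homomorphism is easy to show, and the function names are hypothetical.

```python
# Textbook RSA with toy parameters: insecure, for illustration only.
# Assumption: the source does not specify which homomorphic scheme is used.
P, Q = 61, 53
N = P * Q          # modulus, 3233
E, D = 17, 2753    # E * D == 1 mod (P - 1) * (Q - 1)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext produced by encrypt()."""
    return pow(c, D, N)


def homomorphic_product(a: int, b: int) -> int:
    """Multiply two plaintexts by operating only on their ciphertexts."""
    c_product = (encrypt(a) * encrypt(b)) % N  # no plaintext is touched here
    return decrypt(c_product)


if __name__ == "__main__":
    # The product computed on ciphertexts matches the plaintext product
    # as long as a * b < N.
    print(homomorphic_product(7, 3))
```

The modular exponentiations in this sketch hint at why such schemes become expensive at realistic key sizes, which is the efficiency concern raised in the text.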
The first prior art provides an algorithm for cleaning anomaly points of multi-data-source joint data using homomorphic encryption: each data item is encrypted with a homomorphic encryption algorithm, and anomaly points in the data set are then screened out and cleaned with the AVF anomaly point detection algorithm. However, because homomorphic encryption is inefficient, encryption and decryption require considerable time and computing resources, so the algorithm has relatively low computational efficiency and cannot meet the requirement of processing large amounts of data. The second prior art provides a privacy-preserving data cleaning scheme based on the LOF anomaly point detection algorithm. Because that scheme decides whether data are anomaly points based on data distribution density, when the dimensionality of the data is high it cannot effectively distinguish anomaly points by differences in distribution density, so it suffers from low processing efficiency on high-dimensional data sets.
In summary, the problems of the prior art are as follows:
(1) The existing algorithm for cleaning anomaly points of multi-data-source joint data using homomorphic encryption has low computational efficiency and cannot meet the requirement of processing large amounts of data.
(2) The existing privacy protection data cleaning scheme based on the LOF abnormal point detection algorithm has the problem of low processing efficiency when facing a high-dimensional data set.
In view of the above problems, a new technique is needed that balances computational efficiency and security, improves on the low efficiency and high energy consumption of conventional homomorphic encryption algorithms, meets the necessary data privacy and security requirements, and supports the processing of high-dimensional data so as to better suit practical deployment.
The significance of solving the technical problems is as follows:
once the above problems are addressed, the algorithm is better suited to real usage environments, its practical efficiency and implementability improve, and the privacy and security of sensitive data are better protected.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a data anomaly point cleaning method based on secure multi-party computing technology.
The invention is realized as follows. A data anomaly point cleaning method based on secure multi-party computing technology comprises the following steps:
step one, unifying the data of the two participants A and B into a matrix format, wherein the data have the same dimensionality and the last dimension is the AVF value of the data;
step two, participant A and participant B encrypting their data matrices using the Yao's encryption algorithm in the secure multi-party computing algorithm ABY;
step three, server A and server B performing data anomaly point cleaning on the encrypted data sets uploaded by the participants.
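Further, in step one, participant A and participant B unify the format of their own data sets as specified:
$$D_1 = \begin{bmatrix} a_{11} & \cdots & a_{1M} & avf_{a_1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & avf_{a_N} \end{bmatrix}, \qquad D_2 = \begin{bmatrix} b_{11} & \cdots & b_{1M} & avf_{b_1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & avf_{b_P} \end{bmatrix}$$
where $D_1$ is the $N \times (M+1)$ data set matrix of participant A, $a_{ij}$ is any data element in participant A's data set, and $avf_{a_i}$ is the AVF value of the $i$-th data item of participant A, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ is the $P \times (M+1)$ data set matrix of participant B, $b_{kj}$ is any data element in participant B's data set, and $avf_{b_k}$ is the AVF value of the $k$-th data item of participant B, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$. The data dimensions of the two participants are the same.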
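As an illustration of the unified format, the following minimal sketch appends an AVF column to a categorical data set so that each row has M + 1 entries. The source does not spell out how the AVF value is computed; the sketch assumes the usual Attribute Value Frequency definition (the mean, over a row's M attributes, of how often each attribute value occurs in its column), and the function name `append_avf` is hypothetical.

```python
from collections import Counter


def append_avf(rows):
    """Return a copy of `rows` with each row's AVF score appended as the last column.

    Assumption: AVF(row) = mean over attributes j of the number of rows in the
    data set that share this row's value for attribute j.
    """
    if not rows:
        return []
    m = len(rows[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(m)]
    result = []
    for row in rows:
        avf = sum(counts[j][row[j]] for j in range(m)) / m
        result.append(list(row) + [avf])
    return result


if __name__ == "__main__":
    data = [
        ("red", "small"),
        ("red", "small"),
        ("red", "large"),
        ("blue", "large"),
    ]
    for row in append_avf(data):
        print(row)
```

Under this definition a row made of rare attribute values receives a low score, which is why the later steps treat rows whose AVF falls below the threshold as anomalous.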
Further, in step two, participant A and participant B encrypt their own data sets as specified, which specifically includes:
1) encrypting the data set $D_1$ of participant A with the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$(\langle D_1 \rangle_0, \langle D_1 \rangle_1) = \mathrm{Enc}(D_1)$$
where $\langle D_1 \rangle_0$ is the part of the encrypted data set handed to server A, $\langle D_1 \rangle_1$ is the part of the encrypted data set handed to server B, Enc denotes the Yao's encryption algorithm, and $D_1$ is the data set of participant A;
specifically, each element is encrypted according to the following formulas:
$$(\langle a_{ij} \rangle_0, \langle a_{ij} \rangle_1) = \mathrm{Enc}(a_{ij}), \qquad (\langle avf_{a_i} \rangle_0, \langle avf_{a_i} \rangle_1) = \mathrm{Enc}(avf_{a_i})$$
where $\langle a_{ij} \rangle_0$ is the part of the encrypted data handed to server A, $\langle a_{ij} \rangle_1$ is the part of the encrypted data handed to server B, and $a_{ij}$ is any data element of participant A; $\langle avf_{a_i} \rangle_0$ is the part of the encrypted AVF value of the $i$-th data item of participant A handed to server A, $\langle avf_{a_i} \rangle_1$ is the part handed to server B, and $avf_{a_i}$ is the AVF value of the $i$-th data item of participant A;
2) the encrypted data set of participant A is represented by:
$$X_{10} = \big[\, \langle a_{ij} \rangle_0 \;\big|\; \langle avf_{a_i} \rangle_0 \,\big]_{N \times (M+1)}, \qquad X_{11} = \big[\, \langle a_{ij} \rangle_1 \;\big|\; \langle avf_{a_i} \rangle_1 \,\big]_{N \times (M+1)}$$
where $X_{10}$ is the encrypted data set of participant A held by server A and $X_{11}$ is the encrypted data set of participant A held by server B, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
3) encrypting the data set $D_2$ of participant B with the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$(\langle D_2 \rangle_0, \langle D_2 \rangle_1) = \mathrm{Enc}(D_2)$$
where $\langle D_2 \rangle_0$ is the part of the encrypted data set handed to server A, $\langle D_2 \rangle_1$ is the part of the encrypted data set handed to server B, Enc denotes the Yao's encryption algorithm, and $D_2$ is the data set of participant B;
specifically, each element is encrypted according to the following formulas:
$$(\langle b_{kj} \rangle_0, \langle b_{kj} \rangle_1) = \mathrm{Enc}(b_{kj}), \qquad (\langle avf_{b_k} \rangle_0, \langle avf_{b_k} \rangle_1) = \mathrm{Enc}(avf_{b_k})$$
where $\langle b_{kj} \rangle_0$ is the part of the encrypted data handed to server A, $\langle b_{kj} \rangle_1$ is the part of the encrypted data handed to server B, and $b_{kj}$ is any data element of participant B; $\langle avf_{b_k} \rangle_0$ is the part of the encrypted AVF value of the $k$-th data item of participant B handed to server A, $\langle avf_{b_k} \rangle_1$ is the part handed to server B, and $avf_{b_k}$ is the AVF value of the $k$-th data item of participant B;
4) the encrypted data set of participant B is represented by:
$$X_{20} = \big[\, \langle b_{kj} \rangle_0 \;\big|\; \langle avf_{b_k} \rangle_0 \,\big]_{P \times (M+1)}, \qquad X_{21} = \big[\, \langle b_{kj} \rangle_1 \;\big|\; \langle avf_{b_k} \rangle_1 \,\big]_{P \times (M+1)}$$
where $X_{20}$ is the encrypted data set of participant B held by server A and $X_{21}$ is the encrypted data set of participant B held by server B, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
5) participant A and participant B each upload the encrypted data to the corresponding servers.
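The idea of splitting every element into one part for server A and one part for server B can be sketched as follows. This is not the Yao's encryption algorithm of ABY, which works with garbled-circuit wire keys; it is a simple XOR-based two-party secret sharing swapped in to show the data flow only. The names `share`, `reconstruct`, and `share_matrix`, the 32-bit width, and the assumption that values are already encoded as non-negative integers are all hypothetical.

```python
import secrets

BITS = 32  # assumed bit width of each encoded value


def share(value: int):
    """Split a non-negative integer into two XOR shares (one per server)."""
    if not 0 <= value < 2 ** BITS:
        raise ValueError("value must fit in BITS bits")
    mask = secrets.randbits(BITS)   # uniformly random share for server A
    return mask, value ^ mask       # second share goes to server B


def reconstruct(share_a: int, share_b: int) -> int:
    """Recombine the two shares; neither share alone reveals the value."""
    return share_a ^ share_b


def share_matrix(matrix):
    """Share every element of a data matrix; returns (part for A, part for B)."""
    part_a, part_b = [], []
    for row in matrix:
        pairs = [share(v) for v in row]
        part_a.append([p[0] for p in pairs])
        part_b.append([p[1] for p in pairs])
    return part_a, part_b


if __name__ == "__main__":
    d1 = [[5, 7, 3], [1, 2, 9]]   # last column stands in for an encoded AVF value
    x10, x11 = share_matrix(d1)
    restored = [[reconstruct(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(x10, x11)]
    print(restored == d1)
```

Each server receives a matrix of the same shape as the original, mirroring the way each participant's data set yields one encrypted copy per server.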
Further, in step three, the data anomaly point cleaning performed by server A and server B on the encrypted data sets uploaded by the participants specifically includes:
1) server A extracts the last-dimension data $A_{10}$ from the encrypted data set $X_{10}$ of participant A that it holds;
server A sorts $A_{10}$ using the sorting algorithm of the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$A'_{10} = \mathrm{Sort}(A_{10})$$
where $A_{10}$ is the last-dimension data of the encrypted data set of participant A held by server A, $A'_{10}$ is $A_{10}$ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;
taking $A_{10}$ as the reference, $X_{10}$ is sorted at the same time, i.e. its rows are reordered according to the descending order of $A_{10}$; the result is $X'_{10}$, the data set submitted by participant A to server A after descending sorting with its last dimension $A_{10}$ as the reference, where $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{10}$ are compared with Thre in order:
$$Res_i = \mathrm{Comp}(A'_{10i}, Thre)$$
where $A'_{10i}$ is an element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() denotes the size comparison algorithm in the Yao's encryption algorithm, and $Res_i$ is the result of comparing $A'_{10i}$ with Thre: $Res_i = 1$ means $A'_{10i} \geq Thre$, and $Res_i = 0$ means $A'_{10i} < Thre$. The data in $A'_{10}$ are compared with Thre in sequence until $Res_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{10}$ are retained:
$$X''_{10} = \text{the first } I \text{ rows of } X'_{10}$$
where $I = i$ is the number of rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is the data set of participant A held by server A after the final data cleaning is finished;
2) server A extracts the last-dimension data $A_{20}$ from the encrypted data set $X_{20}$ of participant B that it holds;
server A sorts $A_{20}$ using the sorting algorithm of the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$A'_{20} = \mathrm{Sort}(A_{20})$$
where $A_{20}$ is the last-dimension data of the encrypted data set of participant B held by server A, $A'_{20}$ is $A_{20}$ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;
taking $A_{20}$ as the reference, $X_{20}$ is sorted at the same time, i.e. its rows are reordered according to the descending order of $A_{20}$; the result is $X'_{20}$, the data set submitted by participant B to server A after descending sorting with its last dimension $A_{20}$ as the reference, where $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{20}$ are compared with Thre in order:
$$Res_k = \mathrm{Comp}(A'_{20k}, Thre)$$
where $A'_{20k}$ is an element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() denotes the size comparison algorithm in the Yao's encryption algorithm, and $Res_k$ is the result of comparing $A'_{20k}$ with Thre: $Res_k = 1$ means $A'_{20k} \geq Thre$, and $Res_k = 0$ means $A'_{20k} < Thre$. The data in $A'_{20}$ are compared with Thre in sequence until $Res_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{20}$ are retained:
$$X''_{20} = \text{the first } K \text{ rows of } X'_{20}$$
where $K = k$ is the number of rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is the data set of participant B held by server A after the final data cleaning is finished;
3) server B extracts the last-dimension data $A_{11}$ from the encrypted data set $X_{11}$ of participant A that it holds;
server B sorts $A_{11}$ using the sorting algorithm of the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$A'_{11} = \mathrm{Sort}(A_{11})$$
where $A_{11}$ is the last-dimension data of the encrypted data set of participant A held by server B, $A'_{11}$ is $A_{11}$ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;
taking $A_{11}$ as the reference, $X_{11}$ is sorted at the same time, i.e. its rows are reordered according to the descending order of $A_{11}$; the result is $X'_{11}$, the data set submitted by participant A to server B after descending sorting with its last dimension $A_{11}$ as the reference, where $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{11}$ are compared with Thre in order:
$$Res_i = \mathrm{Comp}(A'_{11i}, Thre)$$
where $A'_{11i}$ is an element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() denotes the size comparison algorithm in the Yao's encryption algorithm, and $Res_i$ is the result of comparing $A'_{11i}$ with Thre: $Res_i = 1$ means $A'_{11i} \geq Thre$, and $Res_i = 0$ means $A'_{11i} < Thre$. The data in $A'_{11}$ are compared with Thre in sequence until $Res_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{11}$ are retained:
$$X''_{11} = \text{the first } I \text{ rows of } X'_{11}$$
where $I = i$ is the number of rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is the data set of participant A held by server B after the final data cleaning is finished;
4) server B extracts the last-dimension data $A_{21}$ from the encrypted data set $X_{21}$ of participant B that it holds;
server B sorts $A_{21}$ using the sorting algorithm of the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$A'_{21} = \mathrm{Sort}(A_{21})$$
where $A_{21}$ is the last-dimension data of the encrypted data set of participant B held by server B, $A'_{21}$ is $A_{21}$ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;
taking $A_{21}$ as the reference, $X_{21}$ is sorted at the same time, i.e. its rows are reordered according to the descending order of $A_{21}$; the result is $X'_{21}$, the data set submitted by participant B to server B after descending sorting with its last dimension $A_{21}$ as the reference, where $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{21}$ are compared with Thre in order:
$$Res_k = \mathrm{Comp}(A'_{21k}, Thre)$$
where $A'_{21k}$ is an element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() denotes the size comparison algorithm in the Yao's encryption algorithm, and $Res_k$ is the result of comparing $A'_{21k}$ with Thre: $Res_k = 1$ means $A'_{21k} \geq Thre$, and $Res_k = 0$ means $A'_{21k} < Thre$. The data in $A'_{21}$ are compared with Thre in sequence until $Res_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{21}$ are retained:
$$X''_{21} = \text{the first } K \text{ rows of } X'_{21}$$
where $K = k$ is the number of rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is the data set of participant B held by server B after the final data cleaning is finished;
5) the resulting $X''_{10}$, $X''_{11}$, $X''_{20}$, and $X''_{21}$ are the data sets after the final data cleaning is finished.
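The cleaning logic of step three (sort by the AVF column in descending order, compare with Thre in sequence, stop at the first value below Thre) can be sketched on plaintext data as follows. This is only a plaintext stand-in: in the method the same operations run on encrypted data through Sort() and Comp(). The function name `clean` and the parameter `thre` are hypothetical, and the sketch assumes one reading of the retention rule, namely that exactly the rows whose comparison result is 1 are kept.

```python
def clean(rows, thre):
    """Keep the rows whose AVF value (last column) is at least `thre`.

    Plaintext stand-in for the Sort / Comp / retain sequence; assumes the
    rows for which the comparison result is 1 are the ones retained.
    """
    # Sort step: order whole rows by their AVF value, largest first.
    ordered = sorted(rows, key=lambda row: row[-1], reverse=True)
    kept = []
    for row in ordered:
        res = 1 if row[-1] >= thre else 0   # Comp step
        if res == 0:
            break  # every remaining row has an even lower AVF value
        kept.append(row)
    return kept


if __name__ == "__main__":
    dataset = [[1, 2, 2.5], [3, 4, 1.5], [5, 6, 3.0]]
    print(clean(dataset, thre=2.0))
```

Because the rows are sorted first, the scan can stop at the first failing comparison instead of comparing every row with the threshold.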
Another object of the present invention is to provide a machine learning system applying the data anomaly point cleaning method based on the secure multi-party computing technology.
In summary, the advantages and positive effects of the invention are as follows: the invention combines secure multi-party computing technology with the AVF outlier detection algorithm, uses the existing secure multi-party computing tool ABY to achieve efficient detection on high-dimensional data, and uses the Yao's encryption algorithm in secure multi-party computing technology to protect the data privacy of each party to a comparable degree while maintaining a certain level of efficiency.
TABLE 1 comparison of technical Properties
Drawings
Fig. 1 is a flowchart of a data anomaly point cleaning method based on a secure multi-party computing technology according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an application scenario provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention addresses two problems: existing algorithms that clean anomaly points of multi-data-source joint data using homomorphic encryption have low computational efficiency and cannot meet the requirement of processing large amounts of data, and the existing privacy-preserving data cleaning scheme based on the LOF anomaly point detection algorithm has low processing efficiency on high-dimensional data sets. The invention is mainly used to implement a secure data anomaly point cleaning algorithm in a joint data source environment: based on secure multi-party computing technology, it cleans data anomaly points under privacy protection in machine learning scenarios that combine multiple data sources.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
As shown in Fig. 1, the data anomaly point cleaning method based on secure multi-party computing technology according to the embodiment of the present invention includes the following steps:
S101: participant A and participant B unify the format of their own data sets as specified: the data of the two participants A and B are unified into a matrix format with the same dimensionality, and the last dimension is the Attribute Value Frequency (AVF) value of the data;
S102: participant A and participant B encrypt their own data sets as specified: participant A and participant B encrypt their data matrices using the Yao's encryption algorithm in the secure multi-party computing algorithm ABY;
S103: server A and server B perform data anomaly point cleaning on the encrypted data sets uploaded by the participants.
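Step S103 depends on sorting whole rows by their AVF column. The source does not state how the sort is realised inside the Yao's encryption algorithm; one common choice in circuit-based secure computation is a sorting network, whose sequence of compare-and-exchange positions does not depend on the data. The following plaintext sketch assumes an odd-even transposition network; the function names are hypothetical, and the `if` inside `compare_exchange` would be a secure comparison plus conditional swap in an actual circuit.

```python
def compare_exchange(rows, i, j):
    """Ensure the row with the larger AVF value (last column) sits at index i."""
    if rows[i][-1] < rows[j][-1]:
        rows[i], rows[j] = rows[j], rows[i]


def oblivious_sort_desc(rows):
    """Sort rows by their last column, descending, with a fixed comparison schedule.

    Odd-even transposition sort: n rounds over adjacent pairs, alternating
    between even and odd starting positions. Which pairs are compared depends
    only on n, never on the values.
    """
    rows = [list(r) for r in rows]  # work on a copy
    n = len(rows)
    for rnd in range(n):
        start = rnd % 2
        for i in range(start, n - 1, 2):
            compare_exchange(rows, i, i + 1)
    return rows


if __name__ == "__main__":
    data = [[1, 0.5], [2, 2.5], [3, 1.5], [4, 3.0]]
    print(oblivious_sort_desc(data))
```

Swapping whole rows rather than only the AVF values is what keeps each data row attached to its AVF value, corresponding to sorting the data set with its last dimension as the reference.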
The application of the principles of the present invention will now be described in further detail with reference to specific embodiments.
The data anomaly point cleaning method based on secure multi-party computing technology provided by the embodiment of the invention specifically comprises the following steps:
Step one, participant A and participant B unify the format of their own data sets as specified:
$$D_1 = \begin{bmatrix} a_{11} & \cdots & a_{1M} & avf_{a_1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & avf_{a_N} \end{bmatrix}, \qquad D_2 = \begin{bmatrix} b_{11} & \cdots & b_{1M} & avf_{b_1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & avf_{b_P} \end{bmatrix}$$
where $D_1$ is the $N \times (M+1)$ data set matrix of participant A, $a_{ij}$ is any data element in participant A's data set, and $avf_{a_i}$ is the Attribute Value Frequency (AVF) value of the $i$-th data item of participant A, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ is the $P \times (M+1)$ data set matrix of participant B, $b_{kj}$ is any data element in participant B's data set, and $avf_{b_k}$ is the Attribute Value Frequency (AVF) value of the $k$-th data item of participant B, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$. The data dimensions (i.e., the value of $M$) of the two participants are the same.
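The text does not define how the AVF value in the last column is obtained. Under the usual definition of the Attribute Value Frequency algorithm, which is assumed here, the score of a row is the average frequency of its attribute values, where $f(x_{ij})$ is an illustrative symbol for the number of rows in the same data set whose $j$-th attribute equals $x_{ij}$:

```latex
\mathrm{AVF}(x_i) = \frac{1}{M} \sum_{j=1}^{M} f(x_{ij})
% Worked example with M = 2: if the first attribute value of a row occurs
% 3 times in its column and the second occurs 2 times, then
% AVF = (3 + 2) / 2 = 2.5.
```

A row built from rare attribute values therefore gets a low score, which matches the later use of a threshold Thre below which rows are treated as anomalous.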
Step two, participant A and participant B encrypt their own data sets as specified:
2a) The data set $D_1$ of participant A is encrypted with the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$(\langle D_1 \rangle_0, \langle D_1 \rangle_1) = \mathrm{Enc}(D_1)$$
where $\langle D_1 \rangle_0$ is the part of the encrypted data set handed to server A, $\langle D_1 \rangle_1$ is the part of the encrypted data set handed to server B, Enc denotes the Yao's encryption algorithm, and $D_1$ is the data set of participant A.
Specifically, each element is encrypted according to the following formulas:
$$(\langle a_{ij} \rangle_0, \langle a_{ij} \rangle_1) = \mathrm{Enc}(a_{ij}), \qquad (\langle avf_{a_i} \rangle_0, \langle avf_{a_i} \rangle_1) = \mathrm{Enc}(avf_{a_i})$$
where $\langle a_{ij} \rangle_0$ is the part of the encrypted data handed to server A, $\langle a_{ij} \rangle_1$ is the part of the encrypted data handed to server B, and $a_{ij}$ is any data element of participant A; $\langle avf_{a_i} \rangle_0$ is the part of the encrypted AVF value of the $i$-th data item of participant A handed to server A, $\langle avf_{a_i} \rangle_1$ is the part handed to server B, and $avf_{a_i}$ is the AVF value of the $i$-th data item of participant A.
2b) The encrypted data set of participant A is represented by:
$$X_{10} = \big[\, \langle a_{ij} \rangle_0 \;\big|\; \langle avf_{a_i} \rangle_0 \,\big]_{N \times (M+1)}, \qquad X_{11} = \big[\, \langle a_{ij} \rangle_1 \;\big|\; \langle avf_{a_i} \rangle_1 \,\big]_{N \times (M+1)}$$
where $X_{10}$ is the encrypted data set of participant A held by server A and $X_{11}$ is the encrypted data set of participant A held by server B, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$.
2c) The data set $D_2$ of participant B is encrypted with the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$(\langle D_2 \rangle_0, \langle D_2 \rangle_1) = \mathrm{Enc}(D_2)$$
where $\langle D_2 \rangle_0$ is the part of the encrypted data set handed to server A, $\langle D_2 \rangle_1$ is the part of the encrypted data set handed to server B, Enc denotes the Yao's encryption algorithm, and $D_2$ is the data set of participant B.
Specifically, each element is encrypted according to the following formulas:
$$(\langle b_{kj} \rangle_0, \langle b_{kj} \rangle_1) = \mathrm{Enc}(b_{kj}), \qquad (\langle avf_{b_k} \rangle_0, \langle avf_{b_k} \rangle_1) = \mathrm{Enc}(avf_{b_k})$$
where $\langle b_{kj} \rangle_0$ is the part of the encrypted data handed to server A, $\langle b_{kj} \rangle_1$ is the part of the encrypted data handed to server B, and $b_{kj}$ is any data element of participant B; $\langle avf_{b_k} \rangle_0$ is the part of the encrypted AVF value of the $k$-th data item of participant B handed to server A, $\langle avf_{b_k} \rangle_1$ is the part handed to server B, and $avf_{b_k}$ is the AVF value of the $k$-th data item of participant B.
2d) The encrypted data set of participant B is represented by:
$$X_{20} = \big[\, \langle b_{kj} \rangle_0 \;\big|\; \langle avf_{b_k} \rangle_0 \,\big]_{P \times (M+1)}, \qquad X_{21} = \big[\, \langle b_{kj} \rangle_1 \;\big|\; \langle avf_{b_k} \rangle_1 \,\big]_{P \times (M+1)}$$
where $X_{20}$ is the encrypted data set of participant B held by server A and $X_{21}$ is the encrypted data set of participant B held by server B, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$.
2e) Participant A and participant B each upload the encrypted data to the corresponding servers.
Step three, server A and server B perform data anomaly point cleaning on the encrypted data sets uploaded by the participants:
3a) Server A extracts the last-dimension data $A_{10}$ from the encrypted data set $X_{10}$ of participant A that it holds.
Server A sorts $A_{10}$ using the sorting algorithm of the Yao's encryption algorithm in the secure multi-party computing algorithm ABY:
$$A'_{10} = \mathrm{Sort}(A_{10})$$
where $A_{10}$ is the last-dimension data of the encrypted data set of participant A held by server A, $A'_{10}$ is $A_{10}$ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm.
Taking $A_{10}$ as the reference, $X_{10}$ is sorted at the same time, i.e. its rows are reordered according to the descending order of $A_{10}$. The result is $X'_{10}$, the data set submitted by participant A to server A after descending sorting with its last dimension $A_{10}$ as the reference, where $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$.
A fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{10}$ are compared with Thre in order:
$$Res_i = \mathrm{Comp}(A'_{10i}, Thre)$$
where $A'_{10i}$ is an element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() denotes the size comparison algorithm in the Yao's encryption algorithm, and $Res_i$ is the result of comparing $A'_{10i}$ with Thre: $Res_i = 1$ means $A'_{10i} \geq Thre$, and $Res_i = 0$ means $A'_{10i} < Thre$. The data in $A'_{10}$ are compared with Thre in sequence until $Res_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{10}$ are retained:
$$X''_{10} = \text{the first } I \text{ rows of } X'_{10}$$
where $I = i$ is the number of rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is the data set of participant A held by server A after the final data cleaning is finished.
3b) Server A extracts the last-dimension data, denoted $A_{20}$, from the encrypted data set of party B that it holds.
Server A sorts $A_{20}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{20} = \mathrm{Sort}(A_{20})$;
where $A_{20}$ denotes the last-dimension data in the encrypted data set of party B held by server A, $A'_{20}$ denotes $A_{20}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm.
Taking $A_{20}$ as the reference, $X_{20}$ is sorted at the same time, i.e., the rows of $X_{20}$ are reordered according to the descending order of $A_{20}$, and the sorted result is denoted $X'_{20}$,
where $X'_{20}$ denotes the data set submitted by party B to server A after descending sorting with its last-dimension data $A_{20}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$.
A fixed value Thre (the same Thre as above) is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{20}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{20k}, \mathrm{Thre})$;
where $A'_{20k}$ denotes the $k$-th element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_k$ denotes the result of comparing $A'_{20k}$ with Thre: $\mathrm{Res}_k = 1$ means $A'_{20k} \ge \mathrm{Thre}$, and $\mathrm{Res}_k = 0$ means $A'_{20k} < \mathrm{Thre}$. The data in $A'_{20}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of data in $X'_{20}$ are retained, giving $X''_{20}$,
where $K = k$, i.e., the first $k$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is the data set of party B held by server A after the final data cleaning is completed.
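The text does not state how Sort() is realized. Secure computation over circuits generally needs a sorting procedure whose sequence of comparisons does not depend on the data, and a sorting network is one common way to obtain that. The plaintext Python sketch below uses odd-even transposition sort as an example of such a network. It is an assumption made for illustration, not the ABY implementation, and the names `compare_and_swap` and `network_sort_desc` are hypothetical.

```python
from typing import List

Row = List[float]


def compare_and_swap(rows: List[Row], i: int, j: int) -> None:
    """Put the row with the larger last column (AVF) at index i.

    In plaintext this is a branch; in a circuit it would be a comparison
    followed by a conditional swap applied to every pair regardless of data.
    """
    if rows[i][-1] < rows[j][-1]:
        rows[i], rows[j] = rows[j], rows[i]


def network_sort_desc(data: List[Row]) -> List[Row]:
    """Odd-even transposition sort in descending order of the last column.

    The pairs (i, i + 1) visited in each round depend only on len(data),
    never on the values, which is what makes this a sorting network.
    """
    rows = [list(r) for r in data]  # work on a copy
    n = len(rows)
    for rnd in range(n):  # n rounds are sufficient for n rows
        for i in range(rnd % 2, n - 1, 2):
            compare_and_swap(rows, i, i + 1)
    return rows


# Hypothetical data: one feature column followed by the AVF column.
sorted_rows = network_sort_desc([[1.0, 0.2], [2.0, 0.9], [3.0, 0.5], [4.0, 0.7]])
```

Whole rows are swapped rather than only the AVF values, so the data set is reordered together with its reference column, as the step requires.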
3c) Server B extracts the last-dimension data, denoted $A_{11}$, from the encrypted data set of party A that it holds.
Server B sorts $A_{11}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{11} = \mathrm{Sort}(A_{11})$;
where $A_{11}$ denotes the last-dimension data in the encrypted data set of party A held by server B, $A'_{11}$ denotes $A_{11}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm.
Taking $A_{11}$ as the reference, $X_{11}$ is sorted at the same time, i.e., the rows of $X_{11}$ are reordered according to the descending order of $A_{11}$, and the sorted result is denoted $X'_{11}$,
where $X'_{11}$ denotes the data set submitted by party A to server B after descending sorting with its last-dimension data $A_{11}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$.
A fixed value Thre (the same Thre as above) is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{11}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{11i}, \mathrm{Thre})$;
where $A'_{11i}$ denotes the $i$-th element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_i$ denotes the result of comparing $A'_{11i}$ with Thre: $\mathrm{Res}_i = 1$ means $A'_{11i} \ge \mathrm{Thre}$, and $\mathrm{Res}_i = 0$ means $A'_{11i} < \mathrm{Thre}$. The data in $A'_{11}$ are compared with Thre in sequence until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of data in $X'_{11}$ are retained, giving $X''_{11}$,
where $I = i$, i.e., the first $i$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is the data set of party A held by server B after the final data cleaning is completed.
3d) Server B extracts the last-dimension data, denoted $A_{21}$, from the encrypted data set of party B that it holds.
Server B sorts $A_{21}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{21} = \mathrm{Sort}(A_{21})$;
where $A_{21}$ denotes the last-dimension data in the encrypted data set of party B held by server B, $A'_{21}$ denotes $A_{21}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm.
Taking $A_{21}$ as the reference, $X_{21}$ is sorted at the same time, i.e., the rows of $X_{21}$ are reordered according to the descending order of $A_{21}$, and the sorted result is denoted $X'_{21}$,
where $X'_{21}$ denotes the data set submitted by party B to server B after descending sorting with its last-dimension data $A_{21}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$.
A fixed value Thre (the same Thre as above) is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{21}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{21k}, \mathrm{Thre})$;
where $A'_{21k}$ denotes the $k$-th element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_k$ denotes the result of comparing $A'_{21k}$ with Thre: $\mathrm{Res}_k = 1$ means $A'_{21k} \ge \mathrm{Thre}$, and $\mathrm{Res}_k = 0$ means $A'_{21k} < \mathrm{Thre}$. The data in $A'_{21}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of data in $X'_{21}$ are retained, giving $X''_{21}$,
where $K = k$, i.e., the first $k$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is the data set of party B held by server B after the final data cleaning is completed.
3e) The resulting $X''_{10}$, $X''_{11}$, $X''_{20}$ and $X''_{21}$ are the data sets after the final data cleaning is completed.
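Steps 3a) to 3d) apply one procedure to four data sets. A compact plaintext sketch of that repetition is given below, using `itertools.takewhile` to express "compare in order until the result is 0, then stop". The names `clean` and `clean_all` and the toy values are hypothetical; the real inputs are encrypted, and Sort() and Comp() run under Yao's encryption algorithm. As a stated assumption, the row at which the comparison fails is not retained.

```python
from itertools import takewhile
from typing import Dict, List

Matrix = List[List[float]]


def clean(x: Matrix, thre: float) -> Matrix:
    """Sort rows by the last column (AVF), descending, then keep the
    leading rows until the first one whose AVF falls below `thre`."""
    ordered = sorted(x, key=lambda row: row[-1], reverse=True)
    return list(takewhile(lambda row: row[-1] >= thre, ordered))


def clean_all(datasets: Dict[str, Matrix], thre: float) -> Dict[str, Matrix]:
    """Apply the same cleaning, with the same Thre, to every data set."""
    return {name: clean(x, thre) for name, x in datasets.items()}


# Toy plaintext stand-ins for the four encrypted data sets.
datasets = {
    "X10": [[1.0, 0.2], [2.0, 0.8]],
    "X11": [[1.0, 0.2], [2.0, 0.8]],
    "X20": [[3.0, 0.9], [4.0, 0.7], [5.0, 0.1]],
    "X21": [[3.0, 0.9], [4.0, 0.7], [5.0, 0.1]],
}
cleaned_sets = clean_all(datasets, thre=0.5)
```

Keeping Thre as a single argument of `clean_all` mirrors the text's requirement that the same threshold is used in every sub-step.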
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (5)
1. A data anomaly point cleaning method based on a secure multi-party computing technology, characterized by comprising the following steps:
step one, unifying the data of two parties A and B into a matrix format, wherein the data have the same dimensionality and the last dimension is the AVF value of the data;
step two, party A and party B encrypting their data matrices using Yao's encryption algorithm in the secure multi-party computation algorithm ABY;
step three, server A and server B performing data anomaly point cleaning on the encrypted data sets uploaded by the parties.
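2. The data anomaly point cleaning method based on the secure multi-party computing technology according to claim 1, wherein in step one party A and party B unify the formats of their own data sets as follows:
$$D_1 = \begin{pmatrix} a_{11} & \cdots & a_{1M} & \mathrm{avf}_{a1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & \mathrm{avf}_{aN} \end{pmatrix}, \qquad D_2 = \begin{pmatrix} b_{11} & \cdots & b_{1M} & \mathrm{avf}_{b1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & \mathrm{avf}_{bP} \end{pmatrix}$$
where $D_1$ denotes the $N \times (M+1)$ data set matrix of party A, $a_{ij}$ denotes any data item in the data set of party A, $\mathrm{avf}_{ai}$ denotes the AVF value of the $i$-th piece of data of party A, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ denotes the $P \times (M+1)$ data set matrix of party B, $b_{kj}$ denotes any data item in the data set of party B, $\mathrm{avf}_{bk}$ denotes the AVF value of the $k$-th piece of data of party B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$; and the data dimensions of the two parties are the same.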
3. The data anomaly point cleaning method based on the secure multi-party computing technology according to claim 1, wherein in step two the encryption by party A and party B of the data sets they own specifically comprises:
1) using Yao's encryption algorithm in the secure multi-party computation algorithm ABY, the data set $D_1$ of party A is encrypted into two parts, one part of the encrypted data set being handed to server A and the other part being handed to server B, where Enc denotes Yao's encryption algorithm and $D_1$ denotes the data set of party A;
specifically, each element is encrypted in the same way: each data item $a_{ij}$ of party A is encrypted into a part handed to server A and a part handed to server B, and the AVF value $\mathrm{avf}_{ai}$ of the $i$-th piece of data of party A is encrypted into a part handed to server A and a part handed to server B;
2) the encrypted data set of party A is represented by $X_{10}$ and $X_{11}$,
where $X_{10}$ denotes the encrypted data set of party A held by server A, $X_{11}$ denotes the encrypted data set of party A held by server B, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
3) using Yao's encryption algorithm in the secure multi-party computation algorithm ABY, the data set $D_2$ of party B is encrypted into two parts, one part of the encrypted data set being handed to server A and the other part being handed to server B, where Enc denotes Yao's encryption algorithm and $D_2$ denotes the data set of party B;
specifically, each element is encrypted in the same way: each data item $b_{kj}$ of party B is encrypted into a part handed to server A and a part handed to server B, and the AVF value $\mathrm{avf}_{bk}$ of the $k$-th piece of data of party B is encrypted into a part handed to server A and a part handed to server B;
4) the encrypted data set of party B is represented by $X_{20}$ and $X_{21}$,
where $X_{20}$ denotes the encrypted data set of party B held by server A, $X_{21}$ denotes the encrypted data set of party B held by server B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
5) party A and party B each upload the encrypted data to the corresponding servers.
4. The data anomaly point cleaning method based on the secure multi-party computing technology according to claim 1, wherein in step three the data anomaly point cleaning performed by server A and server B on the encrypted data sets uploaded by the parties specifically comprises:
1) server A extracts the last-dimension data $A_{10}$ from the encrypted data set of party A that it holds;
server A sorts $A_{10}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{10} = \mathrm{Sort}(A_{10})$;
where $A_{10}$ denotes the last-dimension data in the encrypted data set of party A held by server A, $A'_{10}$ denotes $A_{10}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
taking $A_{10}$ as the reference, $X_{10}$ is sorted at the same time, i.e., the rows of $X_{10}$ are reordered according to the descending order of $A_{10}$, and the sorted result is denoted $X'_{10}$,
where $X'_{10}$ denotes the data set submitted by party A to server A after descending sorting with its last-dimension data $A_{10}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{10}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{10i}, \mathrm{Thre})$;
where $A'_{10i}$ denotes the $i$-th element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_i$ denotes the result of comparing $A'_{10i}$ with Thre: $\mathrm{Res}_i = 1$ means $A'_{10i} \ge \mathrm{Thre}$, and $\mathrm{Res}_i = 0$ means $A'_{10i} < \mathrm{Thre}$; the data in $A'_{10}$ are compared with Thre in sequence until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of data in $X'_{10}$ are retained, giving $X''_{10}$,
where $I = i$, i.e., the first $i$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is the data set of party A held by server A after the final data cleaning is completed;
2) server A extracts the last-dimension data $A_{20}$ from the encrypted data set of party B that it holds;
server A sorts $A_{20}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{20} = \mathrm{Sort}(A_{20})$;
where $A_{20}$ denotes the last-dimension data in the encrypted data set of party B held by server A, $A'_{20}$ denotes $A_{20}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
taking $A_{20}$ as the reference, $X_{20}$ is sorted at the same time, i.e., the rows of $X_{20}$ are reordered according to the descending order of $A_{20}$, and the sorted result is denoted $X'_{20}$,
where $X'_{20}$ denotes the data set submitted by party B to server A after descending sorting with its last-dimension data $A_{20}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{20}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{20k}, \mathrm{Thre})$;
where $A'_{20k}$ denotes the $k$-th element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_k$ denotes the result of comparing $A'_{20k}$ with Thre: $\mathrm{Res}_k = 1$ means $A'_{20k} \ge \mathrm{Thre}$, and $\mathrm{Res}_k = 0$ means $A'_{20k} < \mathrm{Thre}$; the data in $A'_{20}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of data in $X'_{20}$ are retained, giving $X''_{20}$,
where $K = k$, i.e., the first $k$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is the data set of party B held by server A after the final data cleaning is completed;
3) server B extracts the last-dimension data $A_{11}$ from the encrypted data set of party A that it holds;
server B sorts $A_{11}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{11} = \mathrm{Sort}(A_{11})$;
where $A_{11}$ denotes the last-dimension data in the encrypted data set of party A held by server B, $A'_{11}$ denotes $A_{11}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
taking $A_{11}$ as the reference, $X_{11}$ is sorted at the same time, i.e., the rows of $X_{11}$ are reordered according to the descending order of $A_{11}$, and the sorted result is denoted $X'_{11}$,
where $X'_{11}$ denotes the data set submitted by party A to server B after descending sorting with its last-dimension data $A_{11}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{11}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{11i}, \mathrm{Thre})$;
where $A'_{11i}$ denotes the $i$-th element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_i$ denotes the result of comparing $A'_{11i}$ with Thre: $\mathrm{Res}_i = 1$ means $A'_{11i} \ge \mathrm{Thre}$, and $\mathrm{Res}_i = 0$ means $A'_{11i} < \mathrm{Thre}$; the data in $A'_{11}$ are compared with Thre in sequence until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of data in $X'_{11}$ are retained, giving $X''_{11}$,
where $I = i$, i.e., the first $i$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is the data set of party A held by server B after the final data cleaning is completed;
4) server B extracts the last-dimension data $A_{21}$ from the encrypted data set of party B that it holds;
server B sorts $A_{21}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{21} = \mathrm{Sort}(A_{21})$;
where $A_{21}$ denotes the last-dimension data in the encrypted data set of party B held by server B, $A'_{21}$ denotes $A_{21}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
taking $A_{21}$ as the reference, $X_{21}$ is sorted at the same time, i.e., the rows of $X_{21}$ are reordered according to the descending order of $A_{21}$, and the sorted result is denoted $X'_{21}$,
where $X'_{21}$ denotes the data set submitted by party B to server B after descending sorting with its last-dimension data $A_{21}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, which represents the threshold of the AVF value within the normal range, and the data in $A'_{21}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{21k}, \mathrm{Thre})$;
where $A'_{21k}$ denotes the $k$-th element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() denotes the size-comparison algorithm in Yao's encryption algorithm, and $\mathrm{Res}_k$ denotes the result of comparing $A'_{21k}$ with Thre: $\mathrm{Res}_k = 1$ means $A'_{21k} \ge \mathrm{Thre}$, and $\mathrm{Res}_k = 0$ means $A'_{21k} < \mathrm{Thre}$; the data in $A'_{21}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of data in $X'_{21}$ are retained, giving $X''_{21}$,
where $K = k$, i.e., the first $k$ rows of data are retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is the data set of party B held by server B after the final data cleaning is completed;
5) the resulting $X''_{10}$, $X''_{11}$, $X''_{20}$ and $X''_{21}$ are the data sets after the final data cleaning is completed.
5. A machine learning system applying the data anomaly point cleaning method based on the secure multi-party computing technology according to any one of claims 1 to 4.
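For illustration only, the matrix format of claim 2 (M data columns followed by one AVF column) can be sketched in plaintext Python. The sketch assumes that AVF denotes the Attribute Value Frequency score, i.e., the mean number of times a record's attribute values occur in their respective columns; this section does not define it, so the formula is an assumption, and the name `append_avf` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def append_avf(records: Sequence[Sequence[Hashable]]) -> List[list]:
    """Return an N x (M+1) matrix: each record followed by its AVF score.

    Assumed definition: the AVF score of a record is the average, over its
    M attributes, of how often each attribute value occurs in its column.
    """
    if not records:
        return []
    m = len(records[0])
    # One frequency table per column.
    counts = [Counter(rec[j] for rec in records) for j in range(m)]
    return [
        list(rec) + [sum(counts[j][rec[j]] for j in range(m)) / m]
        for rec in records
    ]


# Hypothetical categorical records with M = 2 attributes.
d1 = append_avf([["a", "x"], ["a", "y"], ["b", "x"], ["a", "x"]])
```

Under this definition, records built from rare attribute values receive low scores, which is consistent with the cleaning step keeping only rows whose AVF value is at least Thre.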
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A CN109992977B (en) | 2019-03-01 | 2019-03-01 | Data anomaly point cleaning method based on safe multi-party computing technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109992977A (en) | 2019-07-09 |
CN109992977B (en) | 2022-12-16 |
Family
ID=67130167
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A Active CN109992977B (en) | 2019-03-01 | 2019-03-01 | Data anomaly point cleaning method based on safe multi-party computing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992977B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046409A (en) * | 2019-12-16 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Private data multi-party security calculation method and system |
CN111125735A (en) * | 2019-12-20 | 2020-05-08 | 支付宝(杭州)信息技术有限公司 | Method and system for model training based on private data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013102488A1 (en) * | 2012-01-02 | 2013-07-11 | Telecom Italia S.P.A. | Method and system for comparing images |
CN108712260A (en) * | 2018-05-09 | 2018-10-26 | 曲阜师范大学 | The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment |
CN108809628A (en) * | 2018-06-13 | 2018-11-13 | 哈尔滨工业大学深圳研究生院 | Based on the time series method for detecting abnormality and system under Secure |
Also Published As
Publication number | Publication date |
---|---|
CN109992977B (en) | 2022-12-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112906044B (en) | Multi-party security calculation method, device, equipment and storage medium | |
CN106341421B (en) | A kind of method for interchanging data based on block chain technology | |
CN111931250B (en) | Multiparty safe calculation integrated machine | |
CN107196926A (en) | A kind of cloud outsourcing privacy set comparative approach and device | |
Fan et al. | Identity Management Security Authentication Based on Blockchain Technologies. | |
CN109992977B (en) | Data anomaly point cleaning method based on safe multi-party computing technology | |
CN114548418A (en) | Secret sharing-based transverse federal IV algorithm | |
Qin et al. | Privacy-preserving wildcards pattern matching protocol for IoT applications | |
CN115913537A (en) | Data intersection method and system based on privacy protection and related equipment | |
CN114614970A (en) | Privacy data security processing method based on multi-calculator and homomorphic encryption | |
CN115664629A (en) | Homomorphic encryption-based data privacy protection method for intelligent Internet of things platform | |
CN111988260B (en) | Symmetric key management system, transmission method and device | |
Wen et al. | A Blockchain‐Based Privacy Preservation Scheme in Mobile Medical | |
Lv et al. | A review of big data security and privacy protection technology | |
CN117353912A (en) | Three-party privacy set intersection base number calculation method and system based on bilinear mapping | |
CN117499123A (en) | Distributed biometric authentication method and system integrating TEE and secret sharing | |
Adeniyi et al. | A systematic review on elliptic curve cryptography algorithm for internet of things: Categorization, application areas, and security | |
TWI782701B (en) | Non-interactive approval system for blockchain wallet and method thereof | |
CN116248247A (en) | Privacy set intersection method based on addition homomorphic encryption of national secret SM2 | |
Ashouri-Talouki et al. | Privacy-Preserving Attribute-Based Access Control with Non-Monotonic Access Structure | |
Gupta et al. | Cryptography approach for Secure Outsourced Data Storage in Cloud Environment | |
Feng et al. | Secure outsourced principal eigentensor computation for cyber-physical-social systems | |
CN110943833B (en) | Quantum trust model construction method and computer readable storage medium | |
Gao et al. | Research on" Cloud-Edge-End" Security Protection System of Internet of Things Based on National Secret Algorithm | |
Zhao et al. | An Efficient Privacy-Preserving Data Aggregation Scheme without Trusted Authority in Smart Grid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||