CN109992977B - Data anomaly point cleaning method based on secure multi-party computing technology
- Publication number
- CN109992977B (application CN201910156492.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention belongs to the technical field of information security and discloses a data anomaly point cleaning method based on secure multi-party computing technology, comprising the following steps: the data of two participants A and B are unified into a matrix format in which both data sets have the same dimensionality and the last dimension is the AVF value of the data; participant A and participant B encrypt their data matrices using the Yao's encryption algorithm in the secure multi-party computing algorithm ABY; and server A and server B perform data anomaly point cleaning on the encrypted data sets uploaded by the participants. The invention combines secure multi-party computing technology with the AVF outlier detection algorithm, uses the existing secure multi-party computing tool ABY to detect outliers in high-dimensional data efficiently, and uses the Yao's encryption algorithm to protect the privacy of each party's data to a considerable degree while maintaining a certain level of efficiency.
Description
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a data anomaly point cleaning method based on a secure multi-party computing technology.
Background
Currently, the closest prior art is as follows. Joint data sources refer to the situation in machine learning training where several participants hold data of the same type; fusing these data enlarges the training data set and improves the accuracy of the trained model. Because the quality of a model depends on the scale and quality of its data set, learning from joint data sources has become a major trend in the development of machine learning. However, along with the advantages of joint-data-source training comes a new problem: protecting the privacy of data from multiple sources. In some scenarios the data owned by each participant may be privacy-sensitive, for example commercial data or customers' private information such as medical or property records. Such data have extremely high privacy protection requirements and are naturally difficult to share freely.
With the increasing demand for data fusion, algorithms aimed at protecting data privacy are also emerging. For example, in the trusted-third-party approach, several participants jointly certify a trusted third party and upload their plaintext data to it, and the third party performs tasks such as data cleaning and training. The trusted third party is often a publicly trusted organization or a cloud computing provider offering paid services. The advantage of this method is that it protects data privacy among the participants and also achieves data fusion. However, it carries a security risk: a trusted third party is usually honest but curious, and if data leak unexpectedly while being collected and processed, or a malicious third party steals the data, the consequences are often serious.
With the integration of technologies across fields, ideas from cryptography have been applied to joint-data-source training: each participant's data are encrypted with a mature encryption algorithm, and the encrypted data are then gathered and sent to a trusted third party. The trusted third party holds no sensitive plaintext, only ciphertext that appears to have no practical meaning. The encryption algorithm is usually homomorphic encryption, in which an operation performed on the ciphertext is equivalent to performing the same operation on the plaintext. This makes training on ciphertext feasible and greatly protects data privacy. However, this approach also has a practical problem, the most significant being the trade-off between security and efficiency: existing homomorphic encryption algorithms usually consume a large amount of time and computing resources, so where privacy requirements are not especially high the approach is very inefficient and is not suitable for wide adoption.
The first prior art provides an algorithm that uses homomorphic encryption to clean anomaly points from multi-data-source joint data: each party's data are encrypted with a homomorphic encryption algorithm, and the AVF outlier detection algorithm is then used to screen and clean the anomaly points in the data set. However, because homomorphic encryption is inefficient, encryption and decryption require considerable time and computing resources, so the algorithm has relatively low computational efficiency and cannot meet the requirement of processing large amounts of data. The second prior art provides a privacy-preserving data cleaning scheme based on the LOF outlier detection algorithm. Because LOF decides whether a data point is an outlier from the density of the data distribution, outliers cannot be distinguished effectively by differences in density when the data dimensionality is high, so the second prior art has low processing efficiency on high-dimensional data sets.
In summary, the problems of the prior art are:
(1) The existing algorithm for cleaning the abnormal points of the multi-data-source combined data by using a homomorphic encryption algorithm is low in calculation efficiency and cannot meet the requirement of processing a large amount of data.
(2) The existing privacy protection data cleaning scheme based on the LOF abnormal point detection algorithm has the problem of low processing efficiency when facing a high-dimensional data set.
In view of the above problems, a new technique is needed that balances computational efficiency and security: it should improve on the low efficiency and high energy consumption of conventional homomorphic encryption algorithms while still meeting the necessary data privacy and security requirements, and, to suit practical deployments, it should also support the processing of high-dimensional data.
The significance of solving these technical problems is as follows:
once the above problems are addressed, the algorithm is better suited to real usage environments, its practical efficiency and implementability are improved, and the privacy of sensitive data is better protected.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a data anomaly point cleaning method based on secure multi-party computing technology.
The invention is realized as follows. A data anomaly point cleaning method based on secure multi-party computing technology comprises the following steps:
step one, unifying the data of the two participants A and B into a matrix format, wherein the data have the same dimensionality and the last dimension is the AVF value of the data;
step two, participant A and participant B encrypt the data matrix using the Yao's encryption algorithm in the secure multi-party computing algorithm ABY;
step three, server A and server B perform data anomaly point cleaning on the encrypted data sets uploaded by the participants.
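Further, in step one, participant A and participant B unify the formats of their own data sets as specified:
$$D_1 = \begin{bmatrix} a_{11} & \cdots & a_{1M} & avf_{a1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & avf_{aN} \end{bmatrix}, \qquad D_2 = \begin{bmatrix} b_{11} & \cdots & b_{1M} & avf_{b1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & avf_{bP} \end{bmatrix}$$
wherein $D_1$ represents the $N \times (M+1)$ data set matrix of participant A, $a_{ij}$ represents any data in the data set of participant A, $avf_{ai}$ represents the AVF value of the $i$-th piece of data of participant A, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ represents the $P \times (M+1)$ data set matrix of participant B, $b_{kj}$ represents any data in the data set of participant B, $avf_{bk}$ represents the AVF value of the $k$-th piece of data of participant B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$. The data dimensions of the two participants are the same.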
Further, in step two, the encryption of their own data sets by participant A and participant B as specified comprises:
1) Data set $D_1$ of participant A is encrypted with Enc, the Yao's encryption algorithm in the secure multi-party computing algorithm ABY. The encryption yields two parts, one part of the encrypted data set that is handed to server A and one part that is handed to server B, where $D_1$ represents the data set of participant A;
specifically, each element is encrypted in the same way: each datum $a_{ij}$ of participant A is encrypted into a part handed to server A and a part handed to server B, and the AVF value $avf_{ai}$ of the $i$-th piece of data of participant A is likewise encrypted into a part handed to server A and a part handed to server B;
2) The encrypted data set of participant A is represented by $X_{10}$ and $X_{11}$, wherein $X_{10}$ represents the encrypted data set of participant A held by server A, $X_{11}$ represents the encrypted data set of participant A held by server B, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
3) Data set $D_2$ of participant B is encrypted with Enc, the Yao's encryption algorithm in the secure multi-party computing algorithm ABY. The encryption yields two parts, one part of the encrypted data set that is handed to server A and one part that is handed to server B, where $D_2$ represents the data set of participant B;
specifically, each element is encrypted in the same way: each datum $b_{kj}$ of participant B is encrypted into a part handed to server A and a part handed to server B, and the AVF value $avf_{bk}$ of the $k$-th piece of data of participant B is likewise encrypted into a part handed to server A and a part handed to server B;
4) The encrypted data set of participant B is represented by $X_{20}$ and $X_{21}$, wherein $X_{20}$ represents the encrypted data set of participant B held by server A, $X_{21}$ represents the encrypted data set of participant B held by server B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
5) Participant A and participant B each upload the encrypted data to the corresponding servers.
Further, in step three, the data anomaly point cleaning performed by server A and server B on the encrypted data sets uploaded by the participants comprises:
1) Server A extracts the last-dimension data $A_{10}$ from the encrypted data set of participant A that it holds, and sorts $A_{10}$ using the sorting algorithm in the Yao's encryption algorithm of the secure multi-party computing algorithm ABY:
$A'_{10} = \mathrm{Sort}(A_{10})$;
wherein $A_{10}$ represents the last-dimension data in the encrypted data set of participant A held by server A, $A'_{10}$ represents $A_{10}$ sorted in descending order, and Sort() represents the sorting algorithm in the Yao's encryption algorithm;
taking $A_{10}$ as the reference, $X_{10}$ is sorted at the same time, that is, the rows of $X_{10}$ are reordered according to the descending order of $A_{10}$, giving $X'_{10}$,
wherein $X'_{10}$ represents the data set submitted by participant A to server A after descending sorting with the last-dimension data $A_{10}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{10}$ are compared with Thre in order:
$Res_i = \mathrm{Comp}(A'_{10i}, Thre)$;
wherein $A'_{10i}$ represents an element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() represents the comparison algorithm in the Yao's encryption algorithm, and $Res_i$ represents the result of comparing $A'_{10i}$ with Thre: $Res_i = 1$ means $A'_{10i} \ge Thre$, and $Res_i = 0$ means $A'_{10i} < Thre$. The data in $A'_{10}$ are compared with Thre in sequence until $Res_i = 0$, at which point the comparison stops and the first $I$ rows of $X'_{10}$ are retained as $X''_{10}$,
wherein $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is the data set of participant A held by server A after the final data cleaning is completed;
2) Server A extracts the last-dimension data $A_{20}$ from the encrypted data set of participant B that it holds, and sorts $A_{20}$ using the sorting algorithm in the Yao's encryption algorithm of the secure multi-party computing algorithm ABY:
$A'_{20} = \mathrm{Sort}(A_{20})$;
wherein $A_{20}$ represents the last-dimension data in the encrypted data set of participant B held by server A, $A'_{20}$ represents $A_{20}$ sorted in descending order, and Sort() represents the sorting algorithm in the Yao's encryption algorithm;
taking $A_{20}$ as the reference, $X_{20}$ is sorted at the same time, that is, the rows of $X_{20}$ are reordered according to the descending order of $A_{20}$, giving $X'_{20}$,
wherein $X'_{20}$ represents the data set submitted by participant B to server A after descending sorting with the last-dimension data $A_{20}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{20}$ are compared with Thre in order:
$Res_k = \mathrm{Comp}(A'_{20k}, Thre)$;
wherein $A'_{20k}$ represents an element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() represents the comparison algorithm in the Yao's encryption algorithm, and $Res_k$ represents the result of comparing $A'_{20k}$ with Thre: $Res_k = 1$ means $A'_{20k} \ge Thre$, and $Res_k = 0$ means $A'_{20k} < Thre$. The data in $A'_{20}$ are compared with Thre in sequence until $Res_k = 0$, at which point the comparison stops and the first $K$ rows of $X'_{20}$ are retained as $X''_{20}$,
wherein $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is the data set of participant B held by server A after the final data cleaning is completed;
3) Server B extracts the last-dimension data $A_{11}$ from the encrypted data set of participant A that it holds, and sorts $A_{11}$ using the sorting algorithm in the Yao's encryption algorithm of the secure multi-party computing algorithm ABY:
$A'_{11} = \mathrm{Sort}(A_{11})$;
wherein $A_{11}$ represents the last-dimension data in the encrypted data set of participant A held by server B, $A'_{11}$ represents $A_{11}$ sorted in descending order, and Sort() represents the sorting algorithm in the Yao's encryption algorithm;
taking $A_{11}$ as the reference, $X_{11}$ is sorted at the same time, that is, the rows of $X_{11}$ are reordered according to the descending order of $A_{11}$, giving $X'_{11}$,
wherein $X'_{11}$ represents the data set submitted by participant A to server B after descending sorting with the last-dimension data $A_{11}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{11}$ are compared with Thre in order:
$Res_i = \mathrm{Comp}(A'_{11i}, Thre)$;
wherein $A'_{11i}$ represents an element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() represents the comparison algorithm in the Yao's encryption algorithm, and $Res_i$ represents the result of comparing $A'_{11i}$ with Thre: $Res_i = 1$ means $A'_{11i} \ge Thre$, and $Res_i = 0$ means $A'_{11i} < Thre$. The data in $A'_{11}$ are compared with Thre in sequence until $Res_i = 0$, at which point the comparison stops and the first $I$ rows of $X'_{11}$ are retained as $X''_{11}$,
wherein $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is the data set of participant A held by server B after the final data cleaning is completed;
4) Server B extracts the last-dimension data $A_{21}$ from the encrypted data set of participant B that it holds, and sorts $A_{21}$ using the sorting algorithm in the Yao's encryption algorithm of the secure multi-party computing algorithm ABY:
$A'_{21} = \mathrm{Sort}(A_{21})$;
wherein $A_{21}$ represents the last-dimension data in the encrypted data set of participant B held by server B, $A'_{21}$ represents $A_{21}$ sorted in descending order, and Sort() represents the sorting algorithm in the Yao's encryption algorithm;
taking $A_{21}$ as the reference, $X_{21}$ is sorted at the same time, that is, the rows of $X_{21}$ are reordered according to the descending order of $A_{21}$, giving $X'_{21}$,
wherein $X'_{21}$ represents the data set submitted by participant B to server B after descending sorting with the last-dimension data $A_{21}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{21}$ are compared with Thre in order:
$Res_k = \mathrm{Comp}(A'_{21k}, Thre)$;
wherein $A'_{21k}$ represents an element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() represents the comparison algorithm in the Yao's encryption algorithm, and $Res_k$ represents the result of comparing $A'_{21k}$ with Thre: $Res_k = 1$ means $A'_{21k} \ge Thre$, and $Res_k = 0$ means $A'_{21k} < Thre$. The data in $A'_{21}$ are compared with Thre in sequence until $Res_k = 0$, at which point the comparison stops and the first $K$ rows of $X'_{21}$ are retained as $X''_{21}$,
wherein $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is the data set of participant B held by server B after the final data cleaning is completed;
5) The final $X''_{10}$, $X''_{11}$, $X''_{20}$, $X''_{21}$ are the data sets after the final data cleaning is completed.
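As a plaintext reference for the cleaning step described above, the following Python sketch sorts rows in descending order of the AVF value (last column), compares them with Thre in sequence, and retains the leading rows. It only illustrates the logic on unencrypted data and does not use ABY or any secure computation. The function name `clean_by_avf` and the sample values are hypothetical, and the sketch keeps the rows compared before the first Res = 0, which is one reading of "the first rows retained".

```python
def clean_by_avf(dataset, thre):
    """Return the rows retained after AVF-based cleaning (plaintext analogue).

    dataset: list of rows; the last entry of each row is its AVF value.
    thre:    threshold of AVF values considered normal.
    """
    # Plaintext analogue of Sort(): order rows by AVF value, descending.
    ordered = sorted(dataset, key=lambda row: row[-1], reverse=True)

    kept = 0
    for row in ordered:
        # Plaintext analogue of Comp(): 1 if AVF >= Thre, otherwise 0.
        res = 1 if row[-1] >= thre else 0
        if res == 0:
            break  # every later row has an even lower AVF value
        kept += 1

    return ordered[:kept]


if __name__ == "__main__":
    sample = [[1, 2, 0.5], [3, 4, 2.0], [5, 6, 1.5]]
    print(clean_by_avf(sample, thre=1.0))
```

Because the rows are sorted in descending order first, the comparison can stop at the first row below the threshold instead of testing every row.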
Another object of the present invention is to provide a machine learning system applying the data anomaly point cleaning method based on secure multi-party computing technology.
In summary, the advantages and positive effects of the invention are as follows: the invention combines secure multi-party computing technology with the AVF outlier detection algorithm, uses the existing secure multi-party computing tool ABY to detect outliers in high-dimensional data efficiently, and uses the Yao's encryption algorithm to protect the privacy of each party's data to a considerable degree while maintaining a certain level of efficiency.
TABLE 1 comparison of technical Properties
Drawings
Fig. 1 is a flowchart of a data anomaly point cleaning method based on a secure multi-party computing technology according to an embodiment of the present invention.
Fig. 2 is a scene schematic diagram of an embodiment provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention addresses two problems: the existing algorithm that uses homomorphic encryption to clean anomaly points from multi-data-source joint data has low computational efficiency and cannot meet the requirement of processing large amounts of data, and the existing privacy-preserving data cleaning scheme based on the LOF outlier detection algorithm has low processing efficiency on high-dimensional data sets. The invention is mainly used to implement a secure data anomaly point cleaning algorithm in a joint-data-source environment; based on secure multi-party computing technology, it performs data anomaly point cleaning under privacy protection for machine learning scenarios that combine multiple data sources.
The application of the principles of the present invention will now be described in detail with reference to the accompanying drawings.
As shown in Fig. 1, the data anomaly point cleaning method based on secure multi-party computing technology according to the embodiment of the present invention includes the following steps:
S101: participant A and participant B unify the formats of their own data sets as specified: the data of the two participants A and B are unified into a matrix format, wherein the data have the same dimensionality and the last dimension is the Attribute Value Frequency (AVF) value of the data;
S102: participant A and participant B encrypt their own data sets as specified: participant A and participant B encrypt the data matrix using the Yao's encryption algorithm in the secure multi-party computing algorithm ABY;
S103: server A and server B perform data anomaly point cleaning on the encrypted data sets uploaded by the participants.
The application of the principles of the present invention will now be described in further detail with reference to specific embodiments.
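The data anomaly point cleaning method based on secure multi-party computing technology provided by the embodiment of the invention specifically comprises the following steps:
Step one, participant A and participant B unify the formats of their own data sets as specified:
$$D_1 = \begin{bmatrix} a_{11} & \cdots & a_{1M} & avf_{a1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & avf_{aN} \end{bmatrix}, \qquad D_2 = \begin{bmatrix} b_{11} & \cdots & b_{1M} & avf_{b1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & avf_{bP} \end{bmatrix}$$
wherein $D_1$ represents the $N \times (M+1)$ data set matrix of participant A, $a_{ij}$ represents any data in the data set of participant A, $avf_{ai}$ represents the Attribute Value Frequency (AVF) value of the $i$-th piece of data of participant A, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ represents the $P \times (M+1)$ data set matrix of participant B, $b_{kj}$ represents any data in the data set of participant B, $avf_{bk}$ represents the Attribute Value Frequency (AVF) value of the $k$-th piece of data of participant B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$. The data dimensions (i.e., the value of $M$) of the two participants are the same.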
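The text does not spell out how the AVF value in the last column is obtained. A minimal sketch, assuming the commonly used definition of the AVF score (the average, over a row's attributes, of how often each attribute value occurs in its column), could build the $N \times (M+1)$ matrix as follows. The function names and the sample records are hypothetical.

```python
from collections import Counter


def avf_scores(rows):
    """Return one AVF score per row: the mean frequency of its attribute values."""
    if not rows:
        return []
    m = len(rows[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / m for row in rows]


def with_avf_column(rows):
    """Append each row's AVF score as the last column: N x M becomes N x (M + 1)."""
    return [list(row) + [score] for row, score in zip(rows, avf_scores(rows))]


if __name__ == "__main__":
    records = [
        ["red", "small"],
        ["red", "small"],
        ["red", "large"],
        ["blue", "small"],
    ]
    for row in with_avf_column(records):
        print(row)
```

Under this definition, rows made of rare attribute values receive low scores, which is consistent with the later step that retains only rows whose AVF value is at least Thre.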
Step two, participant A and participant B encrypt their own data sets as specified:
2a) Data set $D_1$ of participant A is encrypted with Enc, the Yao's encryption algorithm in the secure multi-party computing algorithm ABY. The encryption yields two parts, one part of the encrypted data set that is handed to server A and one part that is handed to server B, where $D_1$ represents the data set of participant A.
Specifically, each element is encrypted in the same way: each datum $a_{ij}$ of participant A is encrypted into a part handed to server A and a part handed to server B, and the AVF value $avf_{ai}$ of the $i$-th piece of data of participant A is likewise encrypted into a part handed to server A and a part handed to server B.
2b) The encrypted data set of participant A is represented by $X_{10}$ and $X_{11}$, wherein $X_{10}$ represents the encrypted data set of participant A held by server A, $X_{11}$ represents the encrypted data set of participant A held by server B, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$.
2c) Data set $D_2$ of participant B is encrypted with Enc, the Yao's encryption algorithm in the secure multi-party computing algorithm ABY. The encryption yields two parts, one part of the encrypted data set that is handed to server A and one part that is handed to server B, where $D_2$ represents the data set of participant B.
Specifically, each element is encrypted in the same way: each datum $b_{kj}$ of participant B is encrypted into a part handed to server A and a part handed to server B, and the AVF value $avf_{bk}$ of the $k$-th piece of data of participant B is likewise encrypted into a part handed to server A and a part handed to server B.
2d) The encrypted data set of participant B is represented by $X_{20}$ and $X_{21}$, wherein $X_{20}$ represents the encrypted data set of participant B held by server A, $X_{21}$ represents the encrypted data set of participant B held by server B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$.
2e) Participant A and participant B each upload the encrypted data to the corresponding servers.
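To illustrate how one data set can be split into a part for server A and a part for server B, the sketch below uses simple XOR (Boolean) secret sharing on 32-bit unsigned integers. This is a simplified stand-in chosen for illustration, not the Yao-based encryption of ABY and not the patent's exact procedure; the function names, the 32-bit width, and the sample matrix are all assumptions.

```python
import secrets

BITS = 32                 # assumed bit width of each value
MASK = (1 << BITS) - 1


def share_value(value):
    """Split one unsigned integer into two XOR shares."""
    if not 0 <= value <= MASK:
        raise ValueError("value must fit in 32 unsigned bits")
    r = secrets.randbits(BITS)   # uniformly random mask
    return r, value ^ r          # neither share alone reveals the value


def share_matrix(matrix):
    """Return (x0, x1): the share matrix for server A and the one for server B."""
    x0, x1 = [], []
    for row in matrix:
        pairs = [share_value(v) for v in row]
        x0.append([p[0] for p in pairs])
        x1.append([p[1] for p in pairs])
    return x0, x1


def reconstruct(x0, x1):
    """Recombine the two share matrices; requires both servers' parts."""
    return [[a ^ b for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]


if __name__ == "__main__":
    d1 = [[5, 7, 3], [1, 2, 2]]      # last column plays the role of the AVF value
    x10, x11 = share_matrix(d1)      # parts for server A and server B
    print(reconstruct(x10, x11) == d1)
```

Each share on its own is a uniformly random value, so a single server learns nothing about the underlying data from its part alone.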
Step three, server A and server B perform data anomaly point cleaning on the encrypted data sets uploaded by the participants:
3a) Server A extracts the last-dimension data $A_{10}$ from the encrypted data set of participant A that it holds, and sorts $A_{10}$ using the sorting algorithm in the Yao's encryption algorithm of the secure multi-party computing algorithm ABY:
$A'_{10} = \mathrm{Sort}(A_{10})$;
wherein $A_{10}$ represents the last-dimension data in the encrypted data set of participant A held by server A, $A'_{10}$ represents $A_{10}$ sorted in descending order, and Sort() represents the sorting algorithm in the Yao's encryption algorithm.
Taking $A_{10}$ as the reference, $X_{10}$ is sorted at the same time, that is, the rows of $X_{10}$ are reordered according to the descending order of $A_{10}$, giving $X'_{10}$,
wherein $X'_{10}$ represents the data set submitted by participant A to server A after descending sorting with the last-dimension data $A_{10}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$.
A fixed value Thre is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{10}$ are compared with Thre in order:
$Res_i = \mathrm{Comp}(A'_{10i}, Thre)$;
wherein $A'_{10i}$ represents an element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$, Comp() represents the comparison algorithm in the Yao's encryption algorithm, and $Res_i$ represents the result of comparing $A'_{10i}$ with Thre: $Res_i = 1$ means $A'_{10i} \ge Thre$, and $Res_i = 0$ means $A'_{10i} < Thre$. The data in $A'_{10}$ are compared with Thre in sequence until $Res_i = 0$, at which point the comparison stops and the first $I$ rows of $X'_{10}$ are retained as $X''_{10}$,
wherein $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is the data set of participant A held by server A after the final data cleaning is completed.
3b) Server A extracts the last-dimension data $A_{20}$ from the encrypted data set of participant B that it holds, and sorts $A_{20}$ using the sorting algorithm in the Yao's encryption algorithm of the secure multi-party computing algorithm ABY:
$A'_{20} = \mathrm{Sort}(A_{20})$;
wherein $A_{20}$ represents the last-dimension data in the encrypted data set of participant B held by server A, $A'_{20}$ represents $A_{20}$ sorted in descending order, and Sort() represents the sorting algorithm in the Yao's encryption algorithm.
Taking $A_{20}$ as the reference, $X_{20}$ is sorted at the same time, that is, the rows of $X_{20}$ are reordered according to the descending order of $A_{20}$, giving $X'_{20}$,
wherein $X'_{20}$ represents the data set submitted by participant B to server A after descending sorting with the last-dimension data $A_{20}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$.
A fixed value Thre (the same Thre as above) is defined, representing the threshold of AVF values within the normal range, and the data in $A'_{20}$ are compared with Thre in order:
$Res_k = \mathrm{Comp}(A'_{20k}, Thre)$;
wherein $A'_{20k}$ represents an element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$, Comp() represents the comparison algorithm in the Yao's encryption algorithm, and $Res_k$ represents the result of comparing $A'_{20k}$ with Thre: $Res_k = 1$ means $A'_{20k} \ge Thre$, and $Res_k = 0$ means $A'_{20k} < Thre$. The data in $A'_{20}$ are compared with Thre in sequence until $Res_k = 0$, at which point the comparison stops and the first $K$ rows of $X'_{20}$ are retained as $X''_{20}$,
wherein $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is the data set of participant B held by server A after the final data cleaning is completed.
3c) Server B extracts the last-dimension data, denoted $A_{11}$, from the encrypted data set of Party A that it holds.
Server B sorts $A_{11}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{11} = \mathrm{Sort}(A_{11});$
where $A_{11}$ denotes the last-dimension data in the encrypted data set of Party A held by Server B, $A'_{11}$ denotes $A_{11}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm.
Using $A_{11}$ as the reference, $X_{11}$ is sorted at the same time, i.e., the rows of $X_{11}$ are reordered to follow the descending order of $A_{11}$, which gives $X'_{11}$,
where $X'_{11}$ denotes the data set submitted by Party A to Server B after it has been sorted in descending order with its last-dimension data $A_{11}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$.
A fixed value Thre (the same Thre as above) is defined, representing the threshold for an AVF value to be within the normal range, and the data in $A'_{11}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{11i}, \mathrm{Thre});$
where $A'_{11i}$ denotes the $i$-th element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_i$ denotes the result of comparing $A'_{11i}$ with Thre. If $\mathrm{Res}_i = 1$, then $A'_{11i} \geq \mathrm{Thre}$; if $\mathrm{Res}_i = 0$, then $A'_{11i} < \mathrm{Thre}$. The data in $A'_{11}$ are compared with Thre in sequence until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{11}$ are retained, giving $X''_{11}$,
where $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is the data set of Party A held by Server B after the final data cleaning is complete.
3d) Server B extracts the last-dimension data, denoted $A_{21}$, from the encrypted data set of Party B that it holds.
Server B sorts $A_{21}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{21} = \mathrm{Sort}(A_{21});$
where $A_{21}$ denotes the last-dimension data in the encrypted data set of Party B held by Server B, $A'_{21}$ denotes $A_{21}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm.
Using $A_{21}$ as the reference, $X_{21}$ is sorted at the same time, i.e., the rows of $X_{21}$ are reordered to follow the descending order of $A_{21}$, which gives $X'_{21}$,
where $X'_{21}$ denotes the data set submitted by Party B to Server B after it has been sorted in descending order with its last-dimension data $A_{21}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$.
A fixed value Thre (the same Thre as above) is defined, representing the threshold for an AVF value to be within the normal range, and the data in $A'_{21}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{21k}, \mathrm{Thre});$
where $A'_{21k}$ denotes the $k$-th element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_k$ denotes the result of comparing $A'_{21k}$ with Thre. If $\mathrm{Res}_k = 1$, then $A'_{21k} \geq \mathrm{Thre}$; if $\mathrm{Res}_k = 0$, then $A'_{21k} < \mathrm{Thre}$. The data in $A'_{21}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{21}$ are retained, giving $X''_{21}$,
where $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is the data set of Party B held by Server B after the final data cleaning is complete.
3e) The resulting $X''_{10}$, $X''_{11}$, $X''_{20}$, and $X''_{21}$ are the final cleaned data sets.
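Steps 3a) to 3d) apply one routine, with the same Thre, to each of the four held data sets. A minimal Python sketch of that driver on plaintext toy values follows. It is a simulation only: the dictionary keys, the names `clean` and `clean_all`, and the sample rows are hypothetical, and built-in sorting and comparison stand in for the Sort() and Comp() algorithms of Yao's encryption algorithm.

```python
from typing import Dict, List, Tuple

Row = List[float]
Key = Tuple[str, str]  # (holding server, originating party); hypothetical labels


def clean(rows: List[Row], thre: float) -> List[Row]:
    """Sort rows by the last (AVF) column in descending order and keep rows
    with AVF >= thre. After a descending sort these rows form a prefix, so
    filtering matches stopping at the first failed comparison."""
    ordered = sorted(rows, key=lambda r: r[-1], reverse=True)
    return [r for r in ordered if r[-1] >= thre]


def clean_all(held: Dict[Key, List[Row]], thre: float) -> Dict[Key, List[Row]]:
    """Apply the same cleaning routine and threshold to every held data set."""
    return {key: clean(rows, thre) for key, rows in held.items()}


# Toy stand-ins for X_10, X_20, X_11, X_21 (one attribute column plus AVF).
held = {
    ("server_A", "party_A"): [[1.0, 0.9], [2.0, 0.1], [3.0, 0.6]],
    ("server_A", "party_B"): [[7.0, 0.4], [8.0, 0.8]],
    ("server_B", "party_A"): [[1.0, 0.9], [2.0, 0.1], [3.0, 0.6]],
    ("server_B", "party_B"): [[7.0, 0.4], [8.0, 0.8]],
}
cleaned = clean_all(held, thre=0.5)
for key, rows in cleaned.items():
    print(key, rows)
```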
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (5)
1. A data anomaly point cleaning method based on secure multi-party computing technology, characterized by comprising the following steps:
step one, unifying the data of two parties, A and B, into a matrix format in which both data sets have the same dimensionality and the last dimension is the AVF value of the data;
step two, Party A and Party B encrypt their data matrices using Yao's encryption algorithm in the secure multi-party computation algorithm ABY;
step three, Server A and Server B perform data anomaly point cleaning on the encrypted data sets uploaded by the parties.
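2. The data anomaly point cleaning method based on secure multi-party computing technology according to claim 1, characterized in that, in step one, Party A and Party B unify the formats of their own data sets as specified:
$D_1 = \begin{bmatrix} a_{11} & \cdots & a_{1M} & avf_{a1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & avf_{aN} \end{bmatrix}, \quad D_2 = \begin{bmatrix} b_{11} & \cdots & b_{1M} & avf_{b1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & avf_{bP} \end{bmatrix}$
where $D_1$ denotes the $N \times (M+1)$ data set matrix of Party A, $a_{ij}$ denotes any data item in the data set of Party A, $avf_{ai}$ denotes the AVF value of the $i$-th record of Party A, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ denotes the $P \times (M+1)$ data set matrix of Party B, $b_{kj}$ denotes any data item in the data set of Party B, $avf_{bk}$ denotes the AVF value of the $k$-th record of Party B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$; and the data dimensions of the two parties are the same.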
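As a non-limiting illustration of this format, the following Python sketch appends an AVF column to $M$ attribute columns, giving an $N \times (M+1)$ matrix. The claim does not state how AVF values are computed; the sketch assumes the common Attribute Value Frequency score (for each record, the mean over its attributes of how often each attribute value occurs in its column). The function name `with_avf_column` and the sample records are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def with_avf_column(records: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return an N x (M + 1) matrix: the M attribute values of each record
    followed by an assumed AVF score in the last column."""
    if not records:
        return []
    m = len(records[0])
    # Frequency of every value within each attribute column.
    counts = [Counter(rec[j] for rec in records) for j in range(m)]
    matrix: List[List[float]] = []
    for rec in records:
        avf = sum(counts[j][rec[j]] for j in range(m)) / m
        matrix.append([float(v) for v in rec] + [avf])
    return matrix


# Hypothetical Party A records with M = 2 integer-coded attributes.
d1 = with_avf_column([(1, 10), (1, 20), (1, 10), (2, 30)])
for row in d1:
    print(row)
```

Under this assumed score, records made of rare attribute values receive low AVF values, which is why a lower threshold on the last column separates them from the rest.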
3. The data anomaly point cleaning method based on secure multi-party computing technology according to claim 1, characterized in that, in step two, the encryption by Party A and Party B of the data sets they own specifically comprises:
1) the data set $D_1$ of Party A is encrypted with Enc, Yao's encryption algorithm in the secure multi-party computation algorithm ABY, which produces two parts of the encrypted data set: one part handed to Server A and one part handed to Server B, where $D_1$ denotes the data set of Party A;
each element is encrypted in the same way: every data item $a_{ij}$ of Party A yields a part handed to Server A and a part handed to Server B, and the AVF value $avf_{ai}$ of the $i$-th record of Party A likewise yields a part handed to Server A and a part handed to Server B;
2) the encrypted data set of Party A is denoted $X_{10}$ and $X_{11}$, where $X_{10}$ denotes the encrypted data set of Party A held by Server A, $X_{11}$ denotes the encrypted data set of Party A held by Server B, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
3) the data set $D_2$ of Party B is encrypted with Enc, Yao's encryption algorithm in the secure multi-party computation algorithm ABY, which produces two parts of the encrypted data set: one part handed to Server A and one part handed to Server B, where $D_2$ denotes the data set of Party B;
each element is encrypted in the same way: every data item $b_{kj}$ of Party B yields a part handed to Server A and a part handed to Server B, and the AVF value $avf_{bk}$ of the $k$-th record of Party B likewise yields a part handed to Server A and a part handed to Server B;
4) the encrypted data set of Party B is denoted $X_{20}$ and $X_{21}$, where $X_{20}$ denotes the encrypted data set of Party B held by Server A, $X_{21}$ denotes the encrypted data set of Party B held by Server B, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
5) Party A and Party B each upload the encrypted data to the corresponding servers.
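The claim does not spell out how Enc produces the two parts. Purely to illustrate the two-part hand-off, the Python sketch below uses additive secret sharing modulo $2^{32}$ as a stand-in. This is not Yao's encryption algorithm and not the ABY API; the names `share`, `share_matrix`, and `reconstruct` are hypothetical, and values are assumed to be non-negative integers (fractional AVF values would first need an integer encoding).

```python
import secrets
from typing import List, Tuple

MOD = 2 ** 32  # assumed share modulus

Matrix = List[List[int]]


def share(value: int) -> Tuple[int, int]:
    """Split one value into two random-looking parts that sum to it mod MOD."""
    part_a = secrets.randbelow(MOD)
    part_b = (value - part_a) % MOD
    return part_a, part_b


def share_matrix(d: Matrix) -> Tuple[Matrix, Matrix]:
    """Split every element of d; return the part for Server A and for Server B."""
    for_server_a: Matrix = []
    for_server_b: Matrix = []
    for row in d:
        pairs = [share(v) for v in row]
        for_server_a.append([p[0] for p in pairs])
        for_server_b.append([p[1] for p in pairs])
    return for_server_a, for_server_b


def reconstruct(part_a: Matrix, part_b: Matrix) -> Matrix:
    """Recombine the two parts element by element."""
    return [[(a + b) % MOD for a, b in zip(ra, rb)] for ra, rb in zip(part_a, part_b)]


# Hypothetical D_1: two attribute columns plus an integer-encoded AVF column.
d1 = [[5, 1, 20], [3, 2, 90], [4, 4, 60]]
x10, x11 = share_matrix(d1)
print(reconstruct(x10, x11) == d1)
```

In this stand-in neither part alone reveals the underlying values, so any sorting or comparison on the data would have to run as a joint protocol between the two servers.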
4. The data anomaly point cleaning method based on secure multi-party computing technology according to claim 1, characterized in that, in step three, the data anomaly point cleaning performed by Server A and Server B on the encrypted data sets uploaded by the parties specifically comprises:
1) Server A extracts the last-dimension data, denoted $A_{10}$, from the encrypted data set of Party A that it holds;
Server A sorts $A_{10}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{10} = \mathrm{Sort}(A_{10});$
where $A_{10}$ denotes the last-dimension data in the encrypted data set of Party A held by Server A, $A'_{10}$ denotes $A_{10}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
using $A_{10}$ as the reference, $X_{10}$ is sorted at the same time, i.e., the rows of $X_{10}$ are reordered to follow the descending order of $A_{10}$, which gives $X'_{10}$,
where $X'_{10}$ denotes the data set submitted by Party A to Server A after it has been sorted in descending order with its last-dimension data $A_{10}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold for an AVF value to be within the normal range, and the data in $A'_{10}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{10i}, \mathrm{Thre});$
where $A'_{10i}$ denotes the $i$-th element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_i$ denotes the result of comparing $A'_{10i}$ with Thre; if $\mathrm{Res}_i = 1$, then $A'_{10i} \geq \mathrm{Thre}$; if $\mathrm{Res}_i = 0$, then $A'_{10i} < \mathrm{Thre}$; the data in $A'_{10}$ are compared with Thre in sequence until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{10}$ are retained, giving $X''_{10}$,
where $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is the data set of Party A held by Server A after the final data cleaning is complete;
2) Server A extracts the last-dimension data, denoted $A_{20}$, from the encrypted data set of Party B that it holds;
Server A sorts $A_{20}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{20} = \mathrm{Sort}(A_{20});$
where $A_{20}$ denotes the last-dimension data in the encrypted data set of Party B held by Server A, $A'_{20}$ denotes $A_{20}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
using $A_{20}$ as the reference, $X_{20}$ is sorted at the same time, i.e., the rows of $X_{20}$ are reordered to follow the descending order of $A_{20}$, which gives $X'_{20}$,
where $X'_{20}$ denotes the data set submitted by Party B to Server A after it has been sorted in descending order with its last-dimension data $A_{20}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold for an AVF value to be within the normal range, and the data in $A'_{20}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{20k}, \mathrm{Thre});$
where $A'_{20k}$ denotes the $k$-th element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_k$ denotes the result of comparing $A'_{20k}$ with Thre; if $\mathrm{Res}_k = 1$, then $A'_{20k} \geq \mathrm{Thre}$; if $\mathrm{Res}_k = 0$, then $A'_{20k} < \mathrm{Thre}$; the data in $A'_{20}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{20}$ are retained, giving $X''_{20}$,
where $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is the data set of Party B held by Server A after the final data cleaning is complete;
3) Server B extracts the last-dimension data, denoted $A_{11}$, from the encrypted data set of Party A that it holds;
Server B sorts $A_{11}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{11} = \mathrm{Sort}(A_{11});$
where $A_{11}$ denotes the last-dimension data in the encrypted data set of Party A held by Server B, $A'_{11}$ denotes $A_{11}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
using $A_{11}$ as the reference, $X_{11}$ is sorted at the same time, i.e., the rows of $X_{11}$ are reordered to follow the descending order of $A_{11}$, which gives $X'_{11}$,
where $X'_{11}$ denotes the data set submitted by Party A to Server B after it has been sorted in descending order with its last-dimension data $A_{11}$ as the reference, $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold for an AVF value to be within the normal range, and the data in $A'_{11}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{11i}, \mathrm{Thre});$
where $A'_{11i}$ denotes the $i$-th element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_i$ denotes the result of comparing $A'_{11i}$ with Thre; if $\mathrm{Res}_i = 1$, then $A'_{11i} \geq \mathrm{Thre}$; if $\mathrm{Res}_i = 0$, then $A'_{11i} < \mathrm{Thre}$; the data in $A'_{11}$ are compared with Thre in sequence until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{11}$ are retained, giving $X''_{11}$,
where $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is the data set of Party A held by Server B after the final data cleaning is complete;
4) Server B extracts the last-dimension data, denoted $A_{21}$, from the encrypted data set of Party B that it holds;
Server B sorts $A_{21}$ using the sorting algorithm in Yao's encryption algorithm of the secure multi-party computation algorithm ABY:
$A'_{21} = \mathrm{Sort}(A_{21});$
where $A_{21}$ denotes the last-dimension data in the encrypted data set of Party B held by Server B, $A'_{21}$ denotes $A_{21}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
using $A_{21}$ as the reference, $X_{21}$ is sorted at the same time, i.e., the rows of $X_{21}$ are reordered to follow the descending order of $A_{21}$, which gives $X'_{21}$,
where $X'_{21}$ denotes the data set submitted by Party B to Server B after it has been sorted in descending order with its last-dimension data $A_{21}$ as the reference, $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
a fixed value Thre is defined, representing the threshold for an AVF value to be within the normal range, and the data in $A'_{21}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{21k}, \mathrm{Thre});$
where $A'_{21k}$ denotes the $k$-th element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_k$ denotes the result of comparing $A'_{21k}$ with Thre; if $\mathrm{Res}_k = 1$, then $A'_{21k} \geq \mathrm{Thre}$; if $\mathrm{Res}_k = 0$, then $A'_{21k} < \mathrm{Thre}$; the data in $A'_{21}$ are compared with Thre in sequence until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{21}$ are retained, giving $X''_{21}$,
where $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is the data set of Party B held by Server B after the final data cleaning is complete;
5) the resulting $X''_{10}$, $X''_{11}$, $X''_{20}$, and $X''_{21}$ are the final cleaned data sets.
5. A machine learning system applying the method for cleaning abnormal data points based on the secure multi-party computing technology as claimed in any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A CN109992977B (en) | 2019-03-01 | 2019-03-01 | Data anomaly point cleaning method based on safe multi-party computing technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A CN109992977B (en) | 2019-03-01 | 2019-03-01 | Data anomaly point cleaning method based on safe multi-party computing technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109992977A (en) | 2019-07-09 |
CN109992977B (en) | 2022-12-16 |
Family
ID=67130167
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A Active CN109992977B (en) | 2019-03-01 | 2019-03-01 | Data anomaly point cleaning method based on safe multi-party computing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992977B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046409B (en) * | 2019-12-16 | 2021-04-13 | 支付宝(杭州)信息技术有限公司 | Private data multi-party security calculation method and system |
CN111125735B (en) * | 2019-12-20 | 2021-11-02 | 支付宝(杭州)信息技术有限公司 | Method and system for model training based on private data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013102488A1 (en) * | 2012-01-02 | 2013-07-11 | Telecom Italia S.P.A. | Method and system for comparing images |
CN108712260A (en) * | 2018-05-09 | 2018-10-26 | 曲阜师范大学 | The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment |
CN108809628A (en) * | 2018-06-13 | 2018-11-13 | 哈尔滨工业大学深圳研究生院 | Based on the time series method for detecting abnormality and system under Secure |
Also Published As
Publication number | Publication date |
---|---|
CN109992977A (en) | 2019-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110557245B (en) | Method and system for SPDZ fault tolerant and secure multiparty computing | |
CN112906044B (en) | Multi-party security calculation method, device, equipment and storage medium | |
CN107196926A (en) | A kind of cloud outsourcing privacy set comparative approach and device | |
Fan et al. | Identity Management Security Authentication Based on Blockchain Technologies. | |
CN113518092A (en) | Set intersection method for realizing multi-party privacy | |
CN109992977B (en) | Data anomaly point cleaning method based on safe multi-party computing technology | |
CN114548418A (en) | Secret sharing-based transverse federal IV algorithm | |
Li et al. | Privacy-aware secure anonymous communication protocol in CPSS cloud computing | |
CN114614983B (en) | Feature fusion privacy protection method based on secure multiparty calculation | |
CN114614970A (en) | Privacy data security processing method based on multi-calculator and homomorphic encryption | |
CN115664629A (en) | Homomorphic encryption-based data privacy protection method for intelligent Internet of things platform | |
CN109409111B (en) | Encrypted image-oriented fuzzy search method | |
Hsu et al. | Private data preprocessing for privacy-preserving Federated Learning | |
CN115276986B (en) | Cloud agent pool shunting re-encryption sharing method under general scene | |
CN114866312B (en) | Shared data determining method and device for protecting data privacy | |
Wang et al. | General survey on massive data encryption | |
Feng et al. | Secure outsourced principal eigentensor computation for cyber-physical-social systems | |
Liu et al. | Power grid data sharing technology based on communication data fusion | |
CN110943833B (en) | Quantum trust model construction method and computer readable storage medium | |
CN116028969B (en) | Privacy calculation method based on data encryption technology | |
CN116582341B (en) | Abnormality detection method, abnormality detection device, abnormality detection apparatus, and storage medium | |
CN114257412B (en) | Privacy protection multi-party data cooperation box-separating method, system, equipment and terminal | |
Zhao et al. | An Efficient Privacy-Preserving Data Aggregation Scheme without Trusted Authority in Smart Grid | |
CN113132102B (en) | Quantum key protection method, device and system based on three layers of keys | |
CN116471091A (en) | Block chain enabled medical internet of things multi-keyword searchable encryption method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||