CN109992977A

CN109992977A - A data anomaly point cleaning method based on secure multi-party computing technology

Info

Publication number: CN109992977A
Application number: CN201910156492.6A
Authority: CN
Inventors: 刘雪峰; 杨烨; 裴庆祺
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-03-01
Filing date: 2019-03-01
Publication date: 2019-07-09
Anticipated expiration: 2039-03-01
Also published as: CN109992977B

Abstract

The invention belongs to the technical field of information security and discloses a method for cleaning abnormal data points based on secure multi-party computation technology. The method includes: unifying the data of two participants A and B into a matrix format with the same dimensions, where the last dimension is the AVF value of each record; participant A and participant B encrypting their data matrices with Yao's encryption algorithm in the secure multi-party computation algorithm ABY; and server A and server B cleaning abnormal data points from the encrypted data sets uploaded by the participants. By combining secure multi-party computation technology with the AVF outlier detection algorithm and using the existing secure multi-party computation tool ABY, the invention detects outliers in high-dimensional data efficiently and, while maintaining a certain level of efficiency, uses Yao's encryption algorithm to protect the data privacy of all parties.

Description

A data anomaly point cleaning method based on secure multi-party computing technology

Technical field

The invention belongs to the technical field of information security, and in particular relates to a method for cleaning abnormal data points based on secure multi-party computation technology.

Background

At present, the closest prior art is as follows. A joint data source means that, during machine learning training, multiple participants hold data of the same type; fusing these data enlarges the training data set and improves the accuracy of the trained model. In machine learning today, model quality depends largely on the scale and quality of the data set, so learning from joint data sources has become a major trend. However, the advantages of joint data source training bring a new problem of protecting data privacy across multiple data sources. In some scenarios the data held by each participant may be privacy-sensitive, such as commercial data or customers' private information (for example medical or financial records). Such data has extremely high privacy protection requirements and is difficult to share freely.

As the demand for data fusion grows, algorithms for protecting data privacy have emerged one after another. One approach adds a trusted third party: the participants jointly certify a trusted third party, upload their plaintext data to it, and let it perform tasks such as data cleaning and training. The trusted third party is typically an organization with public credibility or a cloud computing provider offering paid services. This approach is intended to protect data privacy while still achieving data fusion. However, it carries security risks: a trusted third party is often honest but curious, and an unforeseen data leak during collection and processing, or a malicious third party stealing the data, can have serious consequences.

As techniques from different fields converge, cryptographic ideas have been applied to joint data source training: a mature encryption algorithm encrypts each participant's data, and the encrypted data are then collected and sent to a trusted third party. The third party holds no sensitive plaintext, only ciphertext that appears meaningless. The encryption is usually homomorphic, meaning that an operation performed on the ciphertext is equivalent to the same operation performed on the plaintext. This makes training on ciphertext feasible and largely preserves data privacy. Such algorithms nevertheless have a practical problem, namely the trade-off between security and efficiency. Existing homomorphic encryption algorithms often need a great deal of time and computing resources to produce a result; in scenarios where privacy requirements are not that high, their efficiency is very low, and they are not suitable for wide adoption.

Prior art 1 proposes an algorithm that uses homomorphic encryption to clean outliers from the joint data of multiple data sources: each party's data is encrypted homomorphically, and the AVF outlier detection algorithm then screens and cleans the outliers in the data set. Because of the efficiency limits of homomorphic encryption itself, encryption and decryption require considerable time and computing resources, so the algorithm is computationally inefficient and cannot meet large-scale data processing needs. Prior art 2 proposes a privacy-preserving data cleaning scheme based on the LOF outlier detection algorithm. Because LOF decides whether a record is an outlier from the density of the data distribution, it cannot effectively distinguish outliers by density differences when the data dimension is high, so this technique handles high-dimensional data sets inefficiently.

To sum up, the problems in the prior art are:

(1) Existing algorithms that use homomorphic encryption to clean outliers from the joint data of multiple data sources are computationally inefficient and cannot meet large-scale data processing needs.

(2) Existing privacy-preserving data cleaning schemes based on the LOF outlier detection algorithm process high-dimensional data sets inefficiently.

To address these problems, a new technique is needed that balances computational efficiency and security: it should improve on the low efficiency and high energy consumption of traditional homomorphic encryption, still meet the necessary data privacy and security requirements, and, to suit practical deployments, support the processing of high-dimensional data.

Significance of solving the above technical problems:

Once these problems are addressed, the algorithm is better suited to real operating environments, is more efficient in practice, is easier to implement, and better protects the privacy of sensitive data.

Summary of the invention

In view of the problems in the prior art, the present invention provides a method for cleaning abnormal data points based on secure multi-party computation technology.

The present invention is implemented as follows. A method for cleaning abnormal data points based on secure multi-party computation technology includes:

In the first step, the data of the two participants A and B are unified into a matrix format with the same dimensions, where the last dimension is the AVF value of each record;

In the second step, participant A and participant B encrypt their data matrices with Yao's encryption algorithm in the secure multi-party computation algorithm ABY;

In the third step, server A and server B clean abnormal data points from the encrypted data sets uploaded by the participants.

Further, in the first step, participant A and participant B unify the format of their own data sets as specified, producing the data set matrices D₁ and D₂.

Here, D₁ denotes the N×(M+1) data set matrix of participant A, a_ij denotes an arbitrary element of participant A's data set, and avf_ai denotes the AVF value of participant A's i-th record, with i∈[1, N], j∈[1, M], and M, N∈N⁺. D₂ denotes the P×(M+1) data set matrix of participant B, b_kj denotes an arbitrary element of participant B's data set, and avf_bk denotes the AVF value of participant B's k-th record, with k∈[1, P], j∈[1, M], and M, P∈N⁺. The two participants' data have the same dimension.
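Read from these definitions, the two matrices presumably have the following layout. This is a reconstruction under that assumption (attribute columns first, AVF value in the last column), not the patent's own typesetting:

```latex
D_1 =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1M} & avf_{a1} \\
a_{21} & a_{22} & \cdots & a_{2M} & avf_{a2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NM} & avf_{aN}
\end{bmatrix},
\qquad
D_2 =
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1M} & avf_{b1} \\
b_{21} & b_{22} & \cdots & b_{2M} & avf_{b2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
b_{P1} & b_{P2} & \cdots & b_{PM} & avf_{bP}
\end{bmatrix}
```

Both matrices have M+1 columns; only the row counts N and P may differ.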

Further, in the second step, participant A and participant B encrypt their own data sets as specified, which specifically includes:

1) Participant A's data set D₁ is encrypted with Yao's encryption algorithm in the secure multi-party computation algorithm ABY.

Here, Enc denotes Yao's encryption algorithm and D₁ denotes participant A's data set; encryption yields two parts of the encrypted data set, one handed to server A and one handed to server B;

Each element is encrypted in the same way.

Here, a_ij denotes an arbitrary element of participant A's data, and its encrypted form has one part handed to server A and one part handed to server B; avf_ai denotes the AVF value of participant A's i-th record, and its encrypted form likewise has one part handed to server A and one part handed to server B;

2) The encrypted data set of participant A is represented by X₁₀ and X₁₁.

Here, X₁₀ denotes the encrypted data set of participant A held by server A, and X₁₁ denotes the encrypted data set of participant A held by server B, with i∈[1, N], j∈[1, M], and M, N∈N⁺;

3) Participant B's data set D₂ is encrypted with Yao's encryption algorithm in the secure multi-party computation algorithm ABY.

Here, Enc denotes Yao's encryption algorithm and D₂ denotes participant B's data set; encryption yields two parts of the encrypted data set, one handed to server A and one handed to server B;

Here, b_kj denotes an arbitrary element of participant B's data, and its encrypted form has one part handed to server A and one part handed to server B; avf_bk denotes the AVF value of participant B's k-th record, and its encrypted form likewise has one part handed to server A and one part handed to server B;

4) The encrypted data set of participant B is represented by X₂₀ and X₂₁.

Here, X₂₀ denotes the encrypted data set of participant B held by server A, and X₂₁ denotes the encrypted data set of participant B held by server B, with k∈[1, P], j∈[1, M], and M, P∈N⁺;

5) Participant A and participant B each upload the encrypted data to the corresponding servers.
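The way each value ends up as one part for server A and one part for server B can be illustrated with a much simpler stand-in. The sketch below uses plain XOR secret sharing, which is not Yao's encryption algorithm and does not call the ABY library; it only shows the "two parts, one per server" structure. It assumes values are already encoded as non-negative 32-bit integers, and `BITS`, `share`, `reconstruct`, and `share_matrix` are hypothetical names chosen for illustration.

```python
import secrets

BITS = 32  # assumed bit width of each encoded value


def share(value):
    """Split a non-negative integer into two XOR shares."""
    if not 0 <= value < (1 << BITS):
        raise ValueError("value does not fit in the assumed bit width")
    part_a = secrets.randbits(BITS)   # uniformly random mask
    part_b = value ^ part_a           # masked value
    return part_a, part_b


def reconstruct(part_a, part_b):
    """Recombine the two shares into the original value."""
    return part_a ^ part_b


def share_matrix(matrix):
    """Share every element; return (matrix for server A, matrix for server B)."""
    for_a, for_b = [], []
    for row in matrix:
        pairs = [share(v) for v in row]
        for_a.append([p[0] for p in pairs])
        for_b.append([p[1] for p in pairs])
    return for_a, for_b


# Toy stand-in for D1: two records, two attributes plus an integer-encoded AVF value.
d1 = [[3, 7, 12], [5, 1, 4]]
x10, x11 = share_matrix(d1)
```

Neither `x10` nor `x11` alone determines `d1`; only combining the two recovers it.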

Further, in the third step, server A and server B clean abnormal data points from the encrypted data sets uploaded by the participants, which specifically includes:

1) Server A extracts the last dimension of the encrypted data set of participant A that it holds, denoted A₁₀.

Server A sorts A₁₀ using the sorting algorithm of Yao's encryption algorithm in ABY:

A′₁₀ = Sort(A₁₀);

where A₁₀ denotes the last dimension of the encrypted data set of participant A held by server A, A′₁₀ denotes A₁₀ after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm;

X₁₀ is sorted at the same time with A₁₀ as the key, that is, the rows of X₁₀ are arranged in descending order of A₁₀, giving X′₁₀.

Here, X′₁₀ is the data set submitted by participant A to server A after descending sorting keyed on the last dimension of X₁₀, namely A₁₀, with i∈[1, N], j∈[1, M], and M, N∈N⁺;

A fixed value Thre is specified as the threshold for AVF values within the normal range, and the elements of A′₁₀ are compared with Thre in order:

Res_i = Comp(A′_10i, Thre);

where A′_10i denotes the i-th element of A′₁₀, i∈[1, N], N∈N⁺, Comp() denotes the comparison algorithm of Yao's encryption algorithm, and Res_i denotes the result of comparing A′_10i with Thre: Res_i = 1 means A′_10i ≥ Thre, and Res_i = 0 means A′_10i < Thre. The elements of A′₁₀ are compared with Thre in order until Res_i = 0, at which point the comparison stops and the first i rows of X′₁₀ are retained as X″₁₀.

Here, I = i is the number of leading rows retained after sorting, j∈[1, M], M∈N⁺, and X″₁₀ is the data set of participant A held by server A after the final data cleaning;
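The sort-and-threshold logic of item 1) can be sketched on plaintext values. This is a reference sketch only: in the method itself Sort() and Comp() operate on encrypted data, whereas here ordinary Python sorting and comparison stand in for them. The function name `clean` is hypothetical, and keeping exactly the rows that precede the first Res = 0 is an assumption about the intended boundary.

```python
def clean(matrix, thre):
    """Sort rows by the last column (AVF) in descending order, then keep
    the leading rows whose AVF value is at least thre."""
    # Stand-in for Sort(): order whole rows by their AVF value, largest first.
    ordered = sorted(matrix, key=lambda row: row[-1], reverse=True)
    kept = []
    for row in ordered:
        # Stand-in for Comp(): 1 if AVF >= thre, else 0.
        res = 1 if row[-1] >= thre else 0
        if res == 0:
            break  # all remaining rows are smaller, so stop comparing
        kept.append(row)
    return kept


rows = [
    [1, 2, 2.5],
    [3, 4, 1.0],
    [5, 6, 2.0],
]
cleaned = clean(rows, thre=2.0)
```

Because the rows are already in descending order, the first failed comparison is enough to discard every later row without comparing it.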

2) Server A extracts the last dimension of the encrypted data set of participant B that it holds, denoted A₂₀.

Server A sorts A₂₀ using the sorting algorithm of Yao's encryption algorithm in ABY:

A′₂₀ = Sort(A₂₀);

where A₂₀ denotes the last dimension of the encrypted data set of participant B held by server A, A′₂₀ denotes A₂₀ after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm;

X₂₀ is sorted at the same time with A₂₀ as the key, that is, the rows of X₂₀ are arranged in descending order of A₂₀, giving X′₂₀.

Here, X′₂₀ is the data set submitted by participant B to server A after descending sorting keyed on the last dimension of X₂₀, namely A₂₀, with k∈[1, P], j∈[1, M], and M, P∈N⁺;

A fixed value Thre is specified as the threshold for AVF values within the normal range, and the elements of A′₂₀ are compared with Thre in order:

Res_k = Comp(A′_20k, Thre);

where A′_20k denotes the k-th element of A′₂₀, k∈[1, P], P∈N⁺, Comp() denotes the comparison algorithm of Yao's encryption algorithm, and Res_k denotes the result of comparing A′_20k with Thre: Res_k = 1 means A′_20k ≥ Thre, and Res_k = 0 means A′_20k < Thre. The elements of A′₂₀ are compared with Thre in order until Res_k = 0, at which point the comparison stops and the first k rows of X′₂₀ are retained as X″₂₀.

Here, K = k is the number of leading rows retained after sorting, j∈[1, M], M∈N⁺, and X″₂₀ is the data set of participant B held by server A after the final data cleaning;
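Comp() is described only by its result: 1 if the element is at least Thre, 0 otherwise. Yao-style protocols evaluate Boolean circuits, so as an assumption about how such a comparison might be expressed, the sketch below builds a greater-or-equal comparator from bitwise AND, OR, and XOR on plaintext unsigned integers. It is not taken from the patent or from ABY; `to_bits`, `comp_ge`, `comp`, and the 16-bit width are hypothetical.

```python
def to_bits(value, width):
    """Little-endian list of the low `width` bits of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def comp_ge(a_bits, b_bits):
    """Return 1 if a >= b, else 0, using only single-bit AND/OR/XOR steps."""
    ge = 1  # with no bits examined, a == b, which counts as a >= b
    for a, b in zip(a_bits, b_bits):  # least significant bit first
        gt = a & (b ^ 1)      # a has 1 where b has 0: this bit says a > b
        eq = (a ^ b) ^ 1      # bits are equal: defer to the lower-order result
        ge = gt | (eq & ge)   # a higher bit overrides everything below it
    return ge


def comp(a, thre, width=16):
    """Plaintext analogue of Res = Comp(a, Thre)."""
    return comp_ge(to_bits(a, width), to_bits(thre, width))
```

For example, `comp(6, 5)` and `comp(5, 5)` both yield 1, while `comp(4, 5)` yields 0, matching the meaning of Res given in the text.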

3) Server B extracts the last dimension of the encrypted data set of participant A that it holds, denoted A₁₁.

Server B sorts A₁₁ using the sorting algorithm of Yao's encryption algorithm in ABY:

A′₁₁ = Sort(A₁₁);

where A₁₁ denotes the last dimension of the encrypted data set of participant A held by server B, A′₁₁ denotes A₁₁ after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm;

X₁₁ is sorted at the same time with A₁₁ as the key, that is, the rows of X₁₁ are arranged in descending order of A₁₁, giving X′₁₁.

Here, X′₁₁ is the data set submitted by participant A to server B after descending sorting keyed on the last dimension of X₁₁, namely A₁₁, with i∈[1, N], j∈[1, M], and M, N∈N⁺;

A fixed value Thre is specified as the threshold for AVF values within the normal range, and the elements of A′₁₁ are compared with Thre in order:

Res_i = Comp(A′_11i, Thre);

where A′_11i denotes the i-th element of A′₁₁, i∈[1, N], N∈N⁺, Comp() denotes the comparison algorithm of Yao's encryption algorithm, and Res_i denotes the result of comparing A′_11i with Thre: Res_i = 1 means A′_11i ≥ Thre, and Res_i = 0 means A′_11i < Thre. The elements of A′₁₁ are compared with Thre in order until Res_i = 0, at which point the comparison stops and the first i rows of X′₁₁ are retained as X″₁₁.

Here, I = i is the number of leading rows retained after sorting, j∈[1, M], M∈N⁺, and X″₁₁ is the data set of participant A held by server B after the final data cleaning;

4) Server B extracts the last dimension of the encrypted data set of participant B that it holds, denoted A₂₁.

Server B sorts A₂₁ using the sorting algorithm of Yao's encryption algorithm in ABY:

A′₂₁ = Sort(A₂₁);

where A₂₁ denotes the last dimension of the encrypted data set of participant B held by server B, A′₂₁ denotes A₂₁ after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm.

X₂₁ is sorted at the same time with A₂₁ as the key, that is, the rows of X₂₁ are arranged in descending order of A₂₁, giving X′₂₁.

Here, X′₂₁ is the data set submitted by participant B to server B after descending sorting keyed on the last dimension of X₂₁, namely A₂₁, with k∈[1, P], j∈[1, M], and M, P∈N⁺.

A fixed value Thre is specified as the threshold for AVF values within the normal range, and the elements of A′₂₁ are compared with Thre in order:

Res_k = Comp(A′_21k, Thre);

where A′_21k denotes the k-th element of A′₂₁, k∈[1, P], P∈N⁺, Comp() denotes the comparison algorithm of Yao's encryption algorithm, and Res_k denotes the result of comparing A′_21k with Thre: Res_k = 1 means A′_21k ≥ Thre, and Res_k = 0 means A′_21k < Thre. The elements of A′₂₁ are compared with Thre in order until Res_k = 0, at which point the comparison stops and the first k rows of X′₂₁ are retained as X″₂₁.

Here, K = k is the number of leading rows retained after sorting, j∈[1, M], M∈N⁺, and X″₂₁ is the data set of participant B held by server B after the final data cleaning;

5) The resulting X″₁₀, X″₁₁, X″₂₀, and X″₂₁ are the data sets after the final data cleaning.

Another object of the present invention is to provide a machine learning system that applies the method for cleaning abnormal data points based on secure multi-party computation technology.

In summary, the advantages and positive effects of the present invention are as follows: the invention combines secure multi-party computation technology with the AVF outlier detection algorithm and uses the existing secure multi-party computation tool ABY to detect outliers in high-dimensional data efficiently; while maintaining a certain level of efficiency, it uses Yao's encryption algorithm from secure multi-party computation to give the data privacy of all parties a considerable level of security.

Table 1. Technical performance comparison

Description of drawings

FIG. 1 is a flowchart of the method for cleaning abnormal data points based on secure multi-party computation technology provided by an embodiment of the present invention.

FIG. 2 is a schematic diagram of the application scenario of an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

Existing algorithms that use homomorphic encryption to clean outliers from the joint data of multiple data sources are computationally inefficient and cannot meet large-scale data processing needs, and existing privacy-preserving data cleaning schemes based on the LOF outlier detection algorithm process high-dimensional data sets inefficiently. The present invention is mainly used to implement a secure algorithm for cleaning abnormal data points in a joint data source environment; based on secure multi-party computation technology, it cleans abnormal data points in scenarios where multiple data sources jointly perform machine learning, under the premise of privacy protection.

The application principle of the present invention is described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, the method for cleaning abnormal data points based on secure multi-party computation technology provided by an embodiment of the present invention includes the following steps:

S101: Participant A and participant B unify the format of their own data sets as specified: the data of the two participants A and B are unified into a matrix format with the same dimensions, where the last dimension is the Attribute Value Frequency (AVF) value of each record;

S102: Participant A and participant B encrypt their own data sets as specified: participant A and participant B encrypt their data matrices with Yao's encryption algorithm in the secure multi-party computation algorithm ABY;

S103: Server A and server B clean abnormal data points from the encrypted data sets uploaded by the participants.

The application principle of the present invention is further described below with reference to specific embodiments.

The method for cleaning abnormal data points based on secure multi-party computation technology provided by an embodiment of the present invention specifically includes the following steps:

Step 1: participant A and participant B unify the format of their own data sets as specified, producing the data set matrices D₁ and D₂.

Here, D₁ denotes the N×(M+1) data set matrix of participant A, a_ij denotes an arbitrary element of participant A's data set, and avf_ai denotes the Attribute Value Frequency (AVF) value of participant A's i-th record, with i∈[1, N], j∈[1, M], and M, N∈N⁺. D₂ denotes the P×(M+1) data set matrix of participant B, b_kj denotes an arbitrary element of participant B's data set, and avf_bk denotes the Attribute Value Frequency (AVF) value of participant B's k-th record, with k∈[1, P], j∈[1, M], and M, P∈N⁺. The two participants' data have the same dimension (that is, the same value of M).
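The text states only that the last column holds each record's AVF value, not how that value is computed. A minimal sketch follows, assuming the commonly used definition in which a record's AVF score is the mean, over its attributes, of how often each of its attribute values occurs in the data set, and assuming each participant computes it locally on categorical data. `avf_scores` and `append_avf` are hypothetical names.

```python
from collections import Counter


def avf_scores(records):
    """Return one AVF score per record: the mean frequency of its attribute values."""
    if not records:
        return []
    m = len(records[0])
    # For each attribute column, count how often each value occurs.
    counts = [Counter(row[j] for row in records) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / m for row in records]


def append_avf(records):
    """Build the N x (M+1) layout: the M attributes followed by the AVF score."""
    return [list(row) + [score] for row, score in zip(records, avf_scores(records))]


data = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "huge"),
]
matrix = append_avf(data)
```

In this sample the last record shares no attribute value with any other record, so it receives the lowest score; records with rare values are the ones a threshold on AVF would flag.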

Step 2: participant A and participant B encrypt their own data sets as specified:

2a) Participant A's data set D₁ is encrypted with Yao's encryption algorithm in the secure multi-party computation algorithm ABY.

Here, Enc denotes Yao's encryption algorithm and D₁ denotes participant A's data set; encryption yields two parts of the encrypted data set, one handed to server A and one handed to server B.

For each element, a_ij denotes an arbitrary element of participant A's data, and its encrypted form has one part handed to server A and one part handed to server B; avf_ai denotes the AVF value of participant A's i-th record, and its encrypted form likewise has one part handed to server A and one part handed to server B.

2b) The encrypted data set of participant A is represented by X₁₀ and X₁₁.

Here, X₁₀ denotes the encrypted data set of participant A held by server A, and X₁₁ denotes the encrypted data set of participant A held by server B, with i∈[1, N], j∈[1, M], and M, N∈N⁺.

2c) Participant B's data set D₂ is encrypted with Yao's encryption algorithm in the secure multi-party computation algorithm ABY.

Here, Enc denotes Yao's encryption algorithm and D₂ denotes participant B's data set; encryption yields two parts of the encrypted data set, one handed to server A and one handed to server B.

For each element, b_kj denotes an arbitrary element of participant B's data, and its encrypted form has one part handed to server A and one part handed to server B; avf_bk denotes the AVF value of participant B's k-th record, and its encrypted form likewise has one part handed to server A and one part handed to server B.

2d) The encrypted data set of participant B is represented by X₂₀ and X₂₁.

Here, X₂₀ denotes the encrypted data set of participant B held by server A, and X₂₁ denotes the encrypted data set of participant B held by server B, with k∈[1, P], j∈[1, M], and M, P∈N⁺.

2e) Participant A and participant B each upload the encrypted data to the corresponding servers.

Step 3: server A and server B clean the abnormal data points from the encrypted data sets uploaded by the participants:

3a) Server A extracts the last dimension of the encrypted data set of participant A that it holds, namely:

A′₁₀ = Sort(A₁₀);

Here, A₁₀ denotes the last dimension of participant A's encrypted data set held by server A, A′₁₀ denotes A₁₀ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm.

Here, X′₁₀ is the data set that participant A submitted to server A, after sorting in descending order with the last dimension of X₁₀, namely A₁₀, as the key; i ∈ [1, N], j ∈ [1, M], and M, N ∈ N⁺.

Res_i = Comp(A′_10i, Thre);

Here, I = i, the first i rows of data retained after sorting; j ∈ [1, M], M ∈ N⁺, and X″₁₀ is the data set of participant A held by server A after the final data cleaning is completed.
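The text does not say which sorting algorithm Sort() uses. Circuit-based protocols generally need a sort whose sequence of comparisons does not depend on the data, and one simple option is a sorting network. The sketch below is a plaintext Python illustration under that assumption: an odd-even transposition sort that orders whole rows by their last (AVF) column in descending order, so that the sorted matrix follows the sorted AVF column. The function names and toy values are hypothetical.

```python
def compare_swap_desc(rows, i, j):
    """Put the row with the larger last-column value first.

    In a garbled circuit this would be one comparator plus multiplexers;
    here it is ordinary plaintext Python.
    """
    if rows[i][-1] < rows[j][-1]:
        rows[i], rows[j] = rows[j], rows[i]


def oblivious_sort_desc(rows):
    """Odd-even transposition sort over rows, keyed on the last column.

    The positions that get compared depend only on len(rows), never on
    the values, which is the property a fixed circuit needs.
    """
    rows = [list(r) for r in rows]  # work on a copy
    n = len(rows)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            compare_swap_desc(rows, i, i + 1)
    return rows


# Toy rows: two attribute columns followed by the AVF value.
X = [[3, 7, 0.25], [1, 4, 0.90], [2, 2, 0.61]]
X_sorted = oblivious_sort_desc(X)
```

A transposition network needs on the order of n² comparators; networks with fewer comparators exist, and the choice here is only for brevity.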

3b) Server A extracts the last dimension of the encrypted data set of participant B that it holds, namely:

A′₂₀ = Sort(A₂₀);

Here, A₂₀ denotes the last dimension of participant B's encrypted data set held by server A, A′₂₀ denotes A₂₀ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm.

Here, X′₂₀ is the data set that participant B submitted to server A, after sorting in descending order with the last dimension of X₂₀, namely A₂₀, as the key; k ∈ [1, P], j ∈ [1, M], and M, P ∈ N⁺.

A fixed value Thre (the same Thre as above) is specified as the threshold for AVF values within the normal range, and the data in A′₂₀ are compared with Thre in order:

Res_k = Comp(A′_20k, Thre);

Here, K = k, the first k rows of data retained after sorting; j ∈ [1, M], M ∈ N⁺, and X″₂₀ is the data set of participant B held by server A after the final data cleaning is completed.

3c) Server B extracts the last dimension of the encrypted data set of participant A that it holds, namely:

A′₁₁ = Sort(A₁₁);

Here, A₁₁ denotes the last dimension of participant A's encrypted data set held by server B, A′₁₁ denotes A₁₁ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm.

Here, X′₁₁ is the data set that participant A submitted to server B, after sorting in descending order with the last dimension of X₁₁, namely A₁₁, as the key; i ∈ [1, N], j ∈ [1, M], and M, N ∈ N⁺.

A fixed value Thre (the same Thre as above) is specified as the threshold for AVF values within the normal range, and the data in A′₁₁ are compared with Thre in order:

Res_i = Comp(A′_11i, Thre);

Here, A′_11i denotes an element of A′₁₁, i ∈ [1, N], N ∈ N⁺, Comp() denotes the magnitude comparison algorithm in the Yao's encryption algorithm, and Res_i denotes the result of comparing A′_11i with Thre. If Res_i is 1, then A′_11i ≥ Thre; if Res_i is 0, then A′_11i < Thre. The data in A′₁₁ are compared with Thre in order until Res_i = 0, at which point the comparison stops and the first i rows of X′₁₁ are retained:

Here, I = i, the first i rows of data retained after sorting; j ∈ [1, M], M ∈ N⁺, and X″₁₁ is the data set of participant A held by server B after the final data cleaning is completed.

3d) Server B extracts the last dimension of the encrypted data set of participant B that it holds, namely:

A′₂₁ = Sort(A₂₁);

A fixed value Thre (the same Thre as above) is specified as the threshold for AVF values within the normal range, and the data in A′₂₁ are compared with Thre in order:

Res_k = Comp(A′_21k, Thre);

Here, A′_21k denotes an element of A′₂₁, k ∈ [1, P], P ∈ N⁺, Comp() denotes the magnitude comparison algorithm in the Yao's encryption algorithm, and Res_k denotes the result of comparing A′_21k with Thre. If Res_k is 1, then A′_21k ≥ Thre; if Res_k is 0, then A′_21k < Thre. The data in A′₂₁ are compared with Thre in order until Res_k = 0, at which point the comparison stops and the first k rows of X′₂₁ are retained:

Here, K = k, the first k rows of data retained after sorting; j ∈ [1, M], M ∈ N⁺, and X″₂₁ is the data set of participant B held by server B after the final data cleaning is completed.

3e) The resulting X″₁₀, X″₁₁, X″₂₀, and X″₂₁ are the data sets after the final data cleaning is completed.
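Read in plaintext, the cleaning rule that step three applies to each of the four share sets amounts to: sort the rows by AVF in descending order, compare each AVF with Thre in order, stop at the first Res = 0, and keep the rows before it. The Python sketch below illustrates that logic on unencrypted toy data. It is not an MPC implementation, and the names `comp` and `clean_rows`, the threshold 0.5, and the sample rows are assumptions. It interprets the retained rows as those whose AVF is at least Thre.

```python
def comp(value, thre):
    """Plaintext stand-in for Comp(): 1 if value >= thre, else 0."""
    return 1 if value >= thre else 0


def clean_rows(rows, thre):
    """Return the rows whose last column (AVF) is >= thre, largest AVF first."""
    # Sort whole rows by the AVF column in descending order (X -> X').
    ordered = sorted(rows, key=lambda row: row[-1], reverse=True)
    kept = 0
    for row in ordered:
        if comp(row[-1], thre) == 0:
            break  # rows are sorted, so no later row can reach thre
        kept += 1
    return ordered[:kept]  # X' -> X''


# Toy data sets for participants A and B: two attributes plus an AVF column.
D1 = [[3, 7, 0.25], [1, 4, 0.90], [2, 2, 0.61]]
D2 = [[5, 1, 0.10], [6, 6, 0.75]]
THRE = 0.5  # assumed threshold

cleaned_a = clean_rows(D1, THRE)
cleaned_b = clean_rows(D2, THRE)
```

Because the rows are sorted first, the scan can stop at the first failed comparison instead of testing every row.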

The above describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for cleaning abnormal data points based on secure multi-party computing technology, characterized by comprising the following steps:

step one, unifying the data of two participants A and B into a matrix format with the same dimensionality, wherein the last dimension is the AVF value of each piece of data;

step two, participant A and participant B encrypting their data matrices using the Yao's encryption algorithm in the secure multi-party computation algorithm ABY;

step three, server A and server B cleaning the abnormal data points from the encrypted data sets uploaded by the participants.

2. The method for cleaning abnormal data points based on secure multi-party computing technology according to claim 1, wherein in step one participant A and participant B unify the formats of their own data sets as specified:

wherein D₁ denotes the N × (M+1) data set matrix of participant A, a_ij denotes any data element in participant A's data set, avf_ai denotes the AVF value of participant A's i-th record, i ∈ [1, N], j ∈ [1, M], M, N ∈ N⁺; D₂ denotes the P × (M+1) data set matrix of participant B, b_kj denotes any data element in participant B's data set, avf_bk denotes the AVF value of participant B's k-th record, k ∈ [1, P], j ∈ [1, M], M, P ∈ N⁺; and the data dimensions of both participants are the same.

3. The method for cleaning abnormal data points based on secure multi-party computing technology according to claim 1, wherein step two, in which participant A and participant B encrypt the data sets they own, specifically comprises:

1) encrypting participant A's data set D₁ using the Yao's encryption algorithm in the secure multi-party computation algorithm ABY:

wherein the two outputs of Enc(D₁) are the part of the encrypted data set handed to server A and the part handed to server B, Enc denotes the Yao's encryption algorithm, and D₁ denotes participant A's data set;

specifically, each element is encrypted according to the following formula:

wherein the two outputs of encrypting a_ij are the part of the encrypted data handed to server A and the part handed to server B, and a_ij denotes any data element of participant A; the two outputs of encrypting avf_ai are the parts of the encrypted AVF value of participant A's i-th record handed to server A and to server B, and avf_ai denotes the AVF value of participant A's i-th record;

2) the encrypted data set of participant A is represented by:

wherein X₁₀ denotes the encrypted data set of participant A held by server A, X₁₁ denotes the encrypted data set of participant A held by server B, i ∈ [1, N], j ∈ [1, M], M, N ∈ N⁺;

3) encrypting participant B's data set D₂ using the Yao's encryption algorithm in the secure multi-party computation algorithm ABY:

wherein the two outputs of Enc(D₂) are the part of the encrypted data set handed to server A and the part handed to server B, Enc denotes the Yao's encryption algorithm, and D₂ denotes participant B's data set;

specifically, each element is encrypted according to the following formula:

wherein the two outputs of encrypting b_kj are the part of the encrypted data handed to server A and the part handed to server B, and b_kj denotes any data element of participant B; the two outputs of encrypting avf_bk are the parts of the encrypted AVF value of participant B's k-th record handed to server A and to server B, and avf_bk denotes the AVF value of participant B's k-th record;

4) the encrypted data set of participant B is represented by:

wherein X₂₀ denotes the encrypted data set of participant B held by server A, X₂₁ denotes the encrypted data set of participant B held by server B, k ∈ [1, P], j ∈ [1, M], M, P ∈ N⁺;

5) participant A and participant B each upload their encrypted data to the corresponding servers.

4. The method for cleaning abnormal data points based on secure multi-party computing technology according to claim 1, wherein step three, in which server A and server B clean the abnormal data points from the encrypted data sets uploaded by the participants, specifically comprises:

1) server A extracts the last dimension of the encrypted data set of participant A that it holds:

server A sorts A₁₀ using the sorting algorithm in the Yao's encryption algorithm of the secure encryption algorithm ABY:

A′₁₀ = Sort(A₁₀);

wherein A₁₀ denotes the last dimension of participant A's encrypted data set held by server A, A′₁₀ denotes A₁₀ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;

taking A₁₀ as the key, X₁₀ is sorted at the same time, that is, X₁₀ is reordered according to the descending order of A₁₀; after sorting is finished:

wherein X′₁₀ is the data set that participant A submitted to server A, after sorting in descending order with the last dimension of X₁₀, namely A₁₀, as the key, i ∈ [1, N], j ∈ [1, M], M, N ∈ N⁺;

a fixed value Thre is specified as the threshold for AVF values within the normal range, and the data in A′₁₀ are compared with Thre in order:

Res_i = Comp(A′_10i, Thre);

wherein A′_10i denotes an element of A′₁₀, i ∈ [1, N], N ∈ N⁺, Comp() denotes the magnitude comparison algorithm in the Yao's encryption algorithm, and Res_i denotes the result of comparing A′_10i with Thre; if Res_i is 1, then A′_10i ≥ Thre; if Res_i is 0, then A′_10i < Thre; the data in A′₁₀ are compared with Thre in order until Res_i = 0, at which point the comparison stops and the first i rows of X′₁₀ are retained:

wherein I = i, the first i rows of data retained after sorting, j ∈ [1, M], M ∈ N⁺, and X″₁₀ is the data set of participant A held by server A after the final data cleaning is completed;

2) server A extracts the last dimension of the encrypted data set of participant B that it holds:

server A sorts A₂₀ using the sorting algorithm in the Yao's encryption algorithm of the secure encryption algorithm ABY:

A′₂₀ = Sort(A₂₀);

wherein A₂₀ denotes the last dimension of participant B's encrypted data set held by server A, A′₂₀ denotes A₂₀ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;

taking A₂₀ as the key, X₂₀ is sorted at the same time, that is, X₂₀ is reordered according to the descending order of A₂₀; after sorting is finished:

wherein X′₂₀ is the data set that participant B submitted to server A, after sorting in descending order with the last dimension of X₂₀, namely A₂₀, as the key, k ∈ [1, P], j ∈ [1, M], M, P ∈ N⁺;

a fixed value Thre is specified as the threshold for AVF values within the normal range, and the data in A′₂₀ are compared with Thre in order:

Res_k = Comp(A′_20k, Thre);

wherein A′_20k denotes an element of A′₂₀, k ∈ [1, P], P ∈ N⁺, Comp() denotes the magnitude comparison algorithm in the Yao's encryption algorithm, and Res_k denotes the result of comparing A′_20k with Thre; if Res_k is 1, then A′_20k ≥ Thre; if Res_k is 0, then A′_20k < Thre; the data in A′₂₀ are compared with Thre in order until Res_k = 0, at which point the comparison stops and the first k rows of X′₂₀ are retained:

wherein K = k, the first k rows of data retained after sorting, j ∈ [1, M], M ∈ N⁺, and X″₂₀ is the data set of participant B held by server A after the final data cleaning is completed;

3) server B extracts the last dimension of the encrypted data set of participant A that it holds:

server B sorts A₁₁ using the sorting algorithm in the Yao's encryption algorithm of the secure encryption algorithm ABY:

A′₁₁ = Sort(A₁₁);

wherein A₁₁ denotes the last dimension of participant A's encrypted data set held by server B, A′₁₁ denotes A₁₁ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;

taking A₁₁ as the key, X₁₁ is sorted at the same time, that is, X₁₁ is reordered according to the descending order of A₁₁; after sorting is finished:

wherein X′₁₁ is the data set that participant A submitted to server B, after sorting in descending order with the last dimension of X₁₁, namely A₁₁, as the key, i ∈ [1, N], j ∈ [1, M], M, N ∈ N⁺;

a fixed value Thre is specified as the threshold for AVF values within the normal range, and the data in A′₁₁ are compared with Thre in order:

Res_i = Comp(A′_11i, Thre);

wherein A′_11i denotes an element of A′₁₁, i ∈ [1, N], N ∈ N⁺, Comp() denotes the magnitude comparison algorithm in the Yao's encryption algorithm, and Res_i denotes the result of comparing A′_11i with Thre; if Res_i is 1, then A′_11i ≥ Thre; if Res_i is 0, then A′_11i < Thre; the data in A′₁₁ are compared with Thre in order until Res_i = 0, at which point the comparison stops and the first i rows of X′₁₁ are retained:

wherein I = i, the first i rows of data retained after sorting, j ∈ [1, M], M ∈ N⁺, and X″₁₁ is the data set of participant A held by server B after the final data cleaning is completed;

4) server B extracts the last dimension of the encrypted data set of participant B that it holds:

server B sorts A₂₁ using the sorting algorithm in the Yao's encryption algorithm of the secure encryption algorithm ABY:

A′₂₁ = Sort(A₂₁);

wherein A₂₁ denotes the last dimension of participant B's encrypted data set held by server B, A′₂₁ denotes A₂₁ after sorting in descending order, and Sort() denotes the sorting algorithm in the Yao's encryption algorithm;

taking A₂₁ as the key, X₂₁ is sorted at the same time, that is, X₂₁ is reordered according to the descending order of A₂₁; after sorting is finished:

wherein X′₂₁ is the data set that participant B submitted to server B, after sorting in descending order with the last dimension of X₂₁, namely A₂₁, as the key, k ∈ [1, P], j ∈ [1, M], M, P ∈ N⁺;

a fixed value Thre is specified as the threshold for AVF values within the normal range, and the data in A′₂₁ are compared with Thre in order:

Res_k = Comp(A′_21k, Thre);

wherein A′_21k denotes an element of A′₂₁, k ∈ [1, P], P ∈ N⁺, Comp() denotes the magnitude comparison algorithm in the Yao's encryption algorithm, and Res_k denotes the result of comparing A′_21k with Thre; if Res_k is 1, then A′_21k ≥ Thre; if Res_k is 0, then A′_21k < Thre; the data in A′₂₁ are compared with Thre in order until Res_k = 0, at which point the comparison stops and the first k rows of X′₂₁ are retained:

wherein K = k, the first k rows of data retained after sorting, j ∈ [1, M], M ∈ N⁺, and X″₂₁ is the data set of participant B held by server B after the final data cleaning is completed;

5) the resulting X″₁₀, X″₁₁, X″₂₀, and X″₂₁ are the data sets after the final data cleaning is completed.

5. A machine learning system applying the data anomaly point cleaning method based on the secure multi-party computing technology according to any one of claims 1 to 4.