CN109992977A - A data anomaly point cleaning method based on secure multi-party computing technology - Google Patents
- Publication number
- CN109992977A (application CN201910156492.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Abstract
The invention belongs to the technical field of information security and discloses a method for cleaning abnormal data points (outliers) based on secure multi-party computation. The method comprises: unifying the data of two participants, A and B, into matrices of the same dimension whose last dimension holds the AVF value of each record; having participant A and participant B encrypt their data matrices with Yao's encryption algorithm from the secure multi-party computation tool ABY; and having server A and server B clean the abnormal data points from the encrypted data sets uploaded by the participants. By combining secure multi-party computation with the AVF outlier detection algorithm and using the existing ABY tool, the invention detects outliers in high-dimensional data efficiently and, while maintaining a reasonable level of efficiency, uses Yao's encryption algorithm to protect the data privacy of all parties.
Description
Technical Field
The invention belongs to the technical field of information security, and in particular relates to a method for cleaning abnormal data points based on secure multi-party computation.
Background
At present, the closest prior art is as follows. A joint data source refers to the situation, in machine learning training, where multiple participants hold data of the same type; fusing these data enlarges the training data set and improves the accuracy of the trained model. As machine learning has developed, the quality of a model has come to depend largely on the scale and quality of the data set, so learning from joint data sources has become a major trend. With the advantages of joint training, however, comes a new problem: protecting the privacy and security of data from multiple sources. In some scenarios the data held by a participant may be privacy-sensitive, such as commercial data or customers' private information (medical or financial records, for example). Such data demand a very high level of privacy protection and cannot be shared freely.
As the demand for data fusion has grown, approaches to protecting data privacy have appeared one after another. One approach introduces a trusted third party: the participants jointly certify a trusted third party, upload their plaintext data to it, and let it perform tasks such as data cleaning and training. The trusted third party is usually an organization with public credibility or a cloud computing provider offering paid services. This achieves data fusion while offering a degree of privacy protection. However, the approach carries security risks. A trusted third party is often honest but curious; if data leak unexpectedly while being collected and processed, or if a malicious third party steals the data, the consequences can be serious.
As techniques from different fields have converged, cryptographic ideas have been applied to joint data source training: a mature encryption algorithm encrypts each participant's data, and the encrypted data are pooled and sent to the trusted third party. The third party then holds no sensitive plaintext, only ciphertext that appears meaningless. The encryption algorithm is usually homomorphic, meaning that performing an operation on the ciphertext is equivalent to performing the same operation on the plaintext. This makes training on ciphertext feasible and largely preserves data privacy. Such schemes also have a practical problem, chiefly the trade-off between security and efficiency: existing homomorphic encryption algorithms often need a great deal of time and computing resources to produce a result. Where privacy requirements are less strict, these algorithms are too inefficient to use and are not suitable for wide adoption.
Prior art 1 proposes cleaning outliers from the joint data of multiple sources using homomorphic encryption: each party's data are encrypted homomorphically, and the AVF outlier detection algorithm then screens and cleans the outliers in the data set. Because homomorphic encryption is itself inefficient, encryption and decryption take considerable time and computing resources, so the scheme is computationally inefficient and cannot handle large volumes of data. Prior art 2 proposes a privacy-preserving data cleaning scheme based on the LOF outlier detection algorithm. Because LOF decides whether a point is an outlier from the density of the data distribution, it cannot reliably distinguish outliers by density differences when the data dimension is high, so the scheme processes high-dimensional data sets inefficiently.
In summary, the problems with the prior art are:
(1) Existing schemes that use homomorphic encryption to clean outliers from the joint data of multiple sources are computationally inefficient and cannot handle large volumes of data.
(2) Existing privacy-preserving data cleaning schemes based on the LOF outlier detection algorithm process high-dimensional data sets inefficiently.
To address these problems, a new technique is needed that balances computational efficiency and security: it should improve on the low efficiency and high energy consumption of traditional homomorphic encryption while still meeting the necessary data privacy and security requirements, and, to suit practical deployments, it should support high-dimensional data.
Significance of solving the above technical problems:
Addressing these problems makes the algorithm better suited to real operating environments, improves its efficiency in practice, makes it easier to implement, and better protects the privacy of sensitive data.
Summary of the Invention
In view of the problems in the prior art, the present invention provides a method for cleaning abnormal data points based on secure multi-party computation.
The invention is implemented as a method for cleaning abnormal data points based on secure multi-party computation, comprising:
In the first step, the data of the two participants A and B are unified into matrix format with the same dimension, the last dimension being the AVF value of each record;
In the second step, participant A and participant B encrypt their data matrices using Yao's encryption algorithm from the secure multi-party computation tool ABY;
In the third step, server A and server B clean the abnormal data points from the encrypted data sets uploaded by the participants.
Further, in the first step, participant A and participant B unify the format of their own data sets as specified:
$$D_1 = \begin{bmatrix} a_{11} & \cdots & a_{1M} & avf_{a1} \\ \vdots & \ddots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NM} & avf_{aN} \end{bmatrix}, \qquad D_2 = \begin{bmatrix} b_{11} & \cdots & b_{1M} & avf_{b1} \\ \vdots & \ddots & \vdots & \vdots \\ b_{P1} & \cdots & b_{PM} & avf_{bP} \end{bmatrix}$$
where $D_1$ denotes participant A's $N \times (M+1)$ data set matrix, $a_{ij}$ denotes any element of participant A's data set, and $avf_{ai}$ denotes the AVF value of participant A's $i$-th record, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$; $D_2$ denotes participant B's $P \times (M+1)$ data set matrix, $b_{kj}$ denotes any element of participant B's data set, and $avf_{bk}$ denotes the AVF value of participant B's $k$-th record, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$. The two participants' data have the same dimension.
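The text does not say how each participant computes the AVF column. The sketch below assumes the commonly used definition of the Attribute Value Frequency score (the mean, over a record's attributes, of how often each of its values occurs in that attribute) and assumes categorical attribute values. The function name `append_avf_column` and the sample data are hypothetical.

```python
from collections import Counter


def append_avf_column(rows):
    """Return a copy of `rows` with each record's AVF score appended.

    Assumption: AVF(record) = mean over attributes j of the number of
    records that share this record's value in attribute j.
    """
    if not rows:
        return []
    m = len(rows[0])  # number of attributes, M in the text
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(m)]
    result = []
    for row in rows:
        avf = sum(counts[j][row[j]] for j in range(m)) / m
        result.append(list(row) + [avf])  # last dimension holds the AVF value
    return result


# Hypothetical data set for participant A: N = 4 records, M = 2 attributes.
d1 = append_avf_column([
    ["red", "small"],
    ["red", "small"],
    ["red", "large"],
    ["blue", "small"],
])
```

Under this definition, records made of rare attribute values receive low scores, which is why a lower bound on the AVF value can serve as the cleaning criterion later in the method.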
Further, in the second step, participant A and participant B encrypt their own data sets as specified, which specifically includes:
1) Encrypting participant A's data set $D_1$ with Yao's encryption algorithm from the ABY secure multi-party computation tool:
where the first output of Enc denotes the part of the encrypted data set handed to server A, the second output denotes the part handed to server B, Enc denotes Yao's encryption algorithm, and $D_1$ denotes participant A's data set;
Each element is encrypted as follows:
where, for each element $a_{ij}$ of participant A's data, the first output denotes the part of the encrypted element handed to server A and the second output denotes the part handed to server B; likewise, for the AVF value $avf_{ai}$ of participant A's $i$-th record, the first output denotes the part of the encrypted AVF value handed to server A and the second output denotes the part handed to server B;
2) The encrypted data set of participant A is represented as follows:
where $X_{10}$ denotes the encrypted data set of participant A held by server A and $X_{11}$ denotes the encrypted data set of participant A held by server B, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
3) Encrypting participant B's data set $D_2$ with Yao's encryption algorithm from the ABY secure multi-party computation tool:
where the first output of Enc denotes the part of the encrypted data set handed to server A, the second output denotes the part handed to server B, Enc denotes Yao's encryption algorithm, and $D_2$ denotes participant B's data set;
Each element is encrypted as follows:
where, for each element $b_{kj}$ of participant B's data, the first output denotes the part of the encrypted element handed to server A and the second output denotes the part handed to server B; likewise, for the AVF value $avf_{bk}$ of participant B's $k$-th record, the first output denotes the part of the encrypted AVF value handed to server A and the second output denotes the part handed to server B;
4) The encrypted data set of participant B is represented as follows:
where $X_{20}$ denotes the encrypted data set of participant B held by server A and $X_{21}$ denotes the encrypted data set of participant B held by server B, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
5) Participant A and participant B each upload the encrypted data to the corresponding servers.
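The encryption step above splits every element into one part for server A and one part for server B. The following sketch illustrates only that two-part structure, using plain XOR secret sharing as a stand-in. It is not Yao's encryption algorithm as implemented in ABY, which is based on garbled circuits and is considerably more involved. The names `share`, `reconstruct`, `share_matrix` and the 32-bit width are assumptions, and the values are assumed to be already encoded as non-negative integers (for example, fixed-point AVF values).

```python
import secrets

BITS = 32  # assumed fixed bit width for every encoded value


def share(value):
    """Split a non-negative integer below 2**BITS into two XOR shares."""
    mask = secrets.randbits(BITS)  # uniformly random share for server A
    return mask, value ^ mask      # (part for server A, part for server B)


def reconstruct(part_a, part_b):
    """Recombine the two shares; neither share alone reveals the value."""
    return part_a ^ part_b


def share_matrix(matrix):
    """Share a whole data set matrix element by element.

    Returns two matrices of the same shape: the one uploaded to
    server A and the one uploaded to server B.
    """
    for_a, for_b = [], []
    for row in matrix:
        pairs = [share(v) for v in row]
        for_a.append([p[0] for p in pairs])
        for_b.append([p[1] for p in pairs])
    return for_a, for_b


# Hypothetical integer-encoded data set for participant A (last column: AVF).
d1 = [[5, 7, 3], [1, 2, 2]]
x10, x11 = share_matrix(d1)
recovered = [
    [reconstruct(a, b) for a, b in zip(row_a, row_b)]
    for row_a, row_b in zip(x10, x11)
]
```

The point of the illustration is that each server receives a matrix with the same shape as the original data set, while the plaintext can only be recovered by combining both parts.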
Further, in the third step, server A and server B clean the abnormal data points from the encrypted data sets uploaded by the participants, which specifically includes:
1) Server A extracts the last dimension of the encrypted data set of participant A that it received:
Server A sorts $A_{10}$ using the sorting algorithm in Yao's encryption algorithm of the ABY tool:
$A'_{10} = \mathrm{Sort}(A_{10})$;
where $A_{10}$ denotes the last dimension of participant A's encrypted data set held by server A, $A'_{10}$ denotes $A_{10}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
$X_{10}$ is sorted at the same time, using $A_{10}$ as the key; that is, the rows of $X_{10}$ are arranged in descending order of $A_{10}$. After sorting:
where $X'_{10}$ is the data set submitted by participant A to server A after sorting in descending order of its last dimension, $A_{10}$, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
A fixed value Thre is specified as the threshold for an AVF value to be within the normal range, and the elements of $A'_{10}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{10i}, \mathrm{Thre})$;
where $A'_{10i}$ denotes an element of $A'_{10}$, $i \in [1, N]$, $N \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_i$ denotes the result of comparing $A'_{10i}$ with Thre. $\mathrm{Res}_i = 1$ means $A'_{10i} \ge \mathrm{Thre}$, and $\mathrm{Res}_i = 0$ means $A'_{10i} < \mathrm{Thre}$. The elements of $A'_{10}$ are compared with Thre in order until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{10}$ are retained:
where $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{10}$ is participant A's data set held by server A after the data cleaning is complete;
2) Server A extracts the last dimension of the encrypted data set of participant B that it received:
Server A sorts $A_{20}$ using the sorting algorithm in Yao's encryption algorithm of the ABY tool:
$A'_{20} = \mathrm{Sort}(A_{20})$;
where $A_{20}$ denotes the last dimension of participant B's encrypted data set held by server A, $A'_{20}$ denotes $A_{20}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
$X_{20}$ is sorted at the same time, using $A_{20}$ as the key; that is, the rows of $X_{20}$ are arranged in descending order of $A_{20}$. After sorting:
where $X'_{20}$ is the data set submitted by participant B to server A after sorting in descending order of its last dimension, $A_{20}$, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
A fixed value Thre is specified as the threshold for an AVF value to be within the normal range, and the elements of $A'_{20}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{20k}, \mathrm{Thre})$;
where $A'_{20k}$ denotes an element of $A'_{20}$, $k \in [1, P]$, $P \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_k$ denotes the result of comparing $A'_{20k}$ with Thre. $\mathrm{Res}_k = 1$ means $A'_{20k} \ge \mathrm{Thre}$, and $\mathrm{Res}_k = 0$ means $A'_{20k} < \mathrm{Thre}$. The elements of $A'_{20}$ are compared with Thre in order until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{20}$ are retained:
where $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{20}$ is participant B's data set held by server A after the data cleaning is complete;
3) Server B extracts the last dimension of the encrypted data set of participant A that it received:
Server B sorts $A_{11}$ using the sorting algorithm in Yao's encryption algorithm of the ABY tool:
$A'_{11} = \mathrm{Sort}(A_{11})$;
where $A_{11}$ denotes the last dimension of participant A's encrypted data set held by server B, $A'_{11}$ denotes $A_{11}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
$X_{11}$ is sorted at the same time, using $A_{11}$ as the key; that is, the rows of $X_{11}$ are arranged in descending order of $A_{11}$. After sorting:
where $X'_{11}$ is the data set submitted by participant A to server B after sorting in descending order of its last dimension, $A_{11}$, with $i \in [1, N]$, $j \in [1, M]$, $M, N \in \mathbb{N}^+$;
A fixed value Thre is specified as the threshold for an AVF value to be within the normal range, and the elements of $A'_{11}$ are compared with Thre in order:
$\mathrm{Res}_i = \mathrm{Comp}(A'_{11i}, \mathrm{Thre})$;
where $A'_{11i}$ denotes an element of $A'_{11}$, $i \in [1, N]$, $N \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_i$ denotes the result of comparing $A'_{11i}$ with Thre. $\mathrm{Res}_i = 1$ means $A'_{11i} \ge \mathrm{Thre}$, and $\mathrm{Res}_i = 0$ means $A'_{11i} < \mathrm{Thre}$. The elements of $A'_{11}$ are compared with Thre in order until $\mathrm{Res}_i = 0$, at which point the comparison stops and the first $i$ rows of $X'_{11}$ are retained:
where $I = i$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{11}$ is participant A's data set held by server B after the data cleaning is complete;
4) Server B extracts the last dimension of the encrypted data set of participant B that it received:
Server B sorts $A_{21}$ using the sorting algorithm in Yao's encryption algorithm of the ABY tool:
$A'_{21} = \mathrm{Sort}(A_{21})$;
where $A_{21}$ denotes the last dimension of participant B's encrypted data set held by server B, $A'_{21}$ denotes $A_{21}$ after sorting in descending order, and Sort() denotes the sorting algorithm in Yao's encryption algorithm;
$X_{21}$ is sorted at the same time, using $A_{21}$ as the key; that is, the rows of $X_{21}$ are arranged in descending order of $A_{21}$. After sorting:
where $X'_{21}$ is the data set submitted by participant B to server B after sorting in descending order of its last dimension, $A_{21}$, with $k \in [1, P]$, $j \in [1, M]$, $M, P \in \mathbb{N}^+$;
A fixed value Thre is specified as the threshold for an AVF value to be within the normal range, and the elements of $A'_{21}$ are compared with Thre in order:
$\mathrm{Res}_k = \mathrm{Comp}(A'_{21k}, \mathrm{Thre})$;
where $A'_{21k}$ denotes an element of $A'_{21}$, $k \in [1, P]$, $P \in \mathbb{N}^+$; Comp() denotes the comparison algorithm in Yao's encryption algorithm; and $\mathrm{Res}_k$ denotes the result of comparing $A'_{21k}$ with Thre. $\mathrm{Res}_k = 1$ means $A'_{21k} \ge \mathrm{Thre}$, and $\mathrm{Res}_k = 0$ means $A'_{21k} < \mathrm{Thre}$. The elements of $A'_{21}$ are compared with Thre in order until $\mathrm{Res}_k = 0$, at which point the comparison stops and the first $k$ rows of $X'_{21}$ are retained:
where $K = k$ is the number of leading rows retained after sorting, $j \in [1, M]$, $M \in \mathbb{N}^+$, and $X''_{21}$ is participant B's data set held by server B after the data cleaning is complete;
5) The resulting $X''_{10}$, $X''_{11}$, $X''_{20}$, and $X''_{21}$ are the data sets after the final data cleaning is complete.
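The sort-then-threshold logic that each of the four cases above repeats can be sketched on plaintext as follows. This is only an illustration of the control flow: in the method described, Sort() and Comp() run under Yao's protocol on encrypted data, which this sketch does not attempt to reproduce. The function name `clean_by_avf` is hypothetical, and the sketch assumes that the row at which the comparison first yields 0 is itself discarded, so that every retained row satisfies AVF ≥ Thre.

```python
def clean_by_avf(rows, thre):
    """Keep the rows whose last column (the AVF value) is at least `thre`.

    Mirrors the described procedure: sort rows in descending order of
    the AVF column, compare in order against the threshold, and stop at
    the first row that falls below it.
    """
    # Sort(): order whole rows by their last dimension, largest first.
    ordered = sorted(rows, key=lambda row: row[-1], reverse=True)
    kept = []
    for row in ordered:
        res = 1 if row[-1] >= thre else 0  # Comp(): 1 means AVF >= Thre
        if res == 0:
            break  # every later row is smaller still, so stop comparing
        kept.append(row)
    return kept


# Hypothetical data set: two attributes plus the AVF value in the last column.
x = [[1, 2, 2.0], [3, 4, 3.0], [5, 6, 1.0], [7, 8, 3.0]]
cleaned = clean_by_avf(x, thre=2.0)
```

Because the rows are in descending AVF order, stopping at the first failed comparison is equivalent to filtering every row against the threshold, while needing at most one comparison more than the number of retained rows.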
Another object of the present invention is to provide a machine learning system that applies the described method for cleaning abnormal data points based on secure multi-party computation.
In summary, the advantages and positive effects of the present invention are as follows: by combining secure multi-party computation with the AVF outlier detection algorithm and using the existing secure multi-party computation tool ABY, the invention detects outliers in high-dimensional data efficiently and, while maintaining a reasonable level of efficiency, uses Yao's encryption algorithm to protect the data privacy of all parties.
Table 1. Comparison of technical performance
Brief Description of the Drawings
FIG. 1 is a flowchart of the method for cleaning abnormal data points based on secure multi-party computation provided by an embodiment of the present invention.
FIG. 2 is a schematic diagram of the application scenario of an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. The specific embodiments described here serve only to explain the invention and do not limit it.
Existing schemes that use homomorphic encryption to clean outliers from the joint data of multiple sources are computationally inefficient and cannot handle large volumes of data, and existing privacy-preserving data cleaning schemes based on the LOF outlier detection algorithm process high-dimensional data sets inefficiently. The present invention addresses these problems by providing a secure outlier cleaning algorithm for a joint data source environment: based on secure multi-party computation, it cleans abnormal data points in a scenario where multiple data sources jointly train a machine learning model, while preserving privacy.
The application principle of the present invention is described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the method for cleaning abnormal data points based on secure multi-party computation provided by an embodiment of the present invention includes the following steps:
S101: Participant A and participant B unify the format of their own data sets as specified: the data of the two participants A and B are unified into matrix format with the same dimension, the last dimension being the Attribute Value Frequency (AVF) value of each record;
S102: Participant A and participant B encrypt their own data sets as specified: participant A and participant B encrypt their data matrices using Yao's encryption algorithm from the secure multi-party computation tool ABY;
S103: Server A and server B clean the abnormal data points from the encrypted data sets uploaded by the participants.
The application principle of the present invention is further described below with reference to a specific embodiment.
The data anomaly point cleaning method based on secure multi-party computing technology provided by the embodiment of the present invention specifically comprises the following steps:
Step 1: participant A and participant B unify the format of their own data sets as specified:
where D_1 denotes participant A's N×(M+1) data set matrix, a_ij denotes an arbitrary entry of participant A's data set, and avf_ai denotes the Attribute Value Frequency (AVF) value of participant A's i-th record, with i∈[1,N], j∈[1,M], M,N∈N+; D_2 denotes participant B's P×(M+1) data set matrix, b_kj denotes an arbitrary entry of participant B's data set, and avf_bk denotes the AVF value of participant B's k-th record, with k∈[1,P], j∈[1,M], M,P∈N+. The data of the two participants have the same dimension (that is, the same value of M).
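The matrix layout implied by these definitions can be written out as follows. This is a reconstruction from the symbol definitions above (one record per row, the AVF value in column M+1), not a copy of the original formula image.

```latex
\[
D_1 =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1M} & avf_{a1} \\
a_{21} & a_{22} & \cdots & a_{2M} & avf_{a2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NM} & avf_{aN}
\end{pmatrix},
\qquad
D_2 =
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1M} & avf_{b1} \\
b_{21} & b_{22} & \cdots & b_{2M} & avf_{b2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
b_{P1} & b_{P2} & \cdots & b_{PM} & avf_{bP}
\end{pmatrix}
\]
```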
Step 2: participant A and participant B encrypt their own data sets as specified:
2a) Encrypt participant A's data set D_1 with Yao's encryption algorithm from the ABY secure multi-party computation algorithm:
where the two outputs of the encryption denote, respectively, the part of the encrypted data set handed to server A and the part handed to server B, Enc denotes Yao's encryption algorithm, and D_1 denotes participant A's data set.
Specifically, each element is encrypted as follows:
where, for each entry a_ij of participant A, the encryption yields one part that is handed to server A and one part that is handed to server B; likewise, for the AVF value avf_ai of participant A's i-th record, the encryption yields one part that is handed to server A and one part that is handed to server B.
2b) The encrypted data set of participant A is represented as follows:
where X_10 denotes the encrypted data set of participant A held by server A, X_11 denotes the encrypted data set of participant A held by server B, i∈[1,N], j∈[1,M], M,N∈N+.
2c) Encrypt participant B's data set D_2 with Yao's encryption algorithm from the ABY secure multi-party computation algorithm:
where the two outputs of the encryption denote, respectively, the part of the encrypted data set handed to server A and the part handed to server B, Enc denotes Yao's encryption algorithm, and D_2 denotes participant B's data set.
Specifically, each element is encrypted as follows:
where, for each entry b_kj of participant B, the encryption yields one part that is handed to server A and one part that is handed to server B; likewise, for the AVF value avf_bk of participant B's k-th record, the encryption yields one part that is handed to server A and one part that is handed to server B.
2d) The encrypted data set of participant B is represented as follows:
where X_20 denotes the encrypted data set of participant B held by server A, X_21 denotes the encrypted data set of participant B held by server B, k∈[1,P], j∈[1,M], M,P∈N+.
2e) Participant A and participant B each upload the encrypted data to the corresponding servers.
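Step 2 splits every value into one part for server A and one part for server B. The sketch below illustrates only that two-part split, using plain XOR secret sharing over 32-bit integers as a stand-in. It is not Yao's encryption algorithm and does not use the ABY library; the helper names `share`, `reconstruct` and `share_matrix` are hypothetical, and the values are assumed to be encoded as non-negative integers below 2^32.

```python
import secrets

BITS = 32
MASK = (1 << BITS) - 1

def share(value):
    """Split a non-negative integer below 2**32 into two XOR shares."""
    if not 0 <= value <= MASK:
        raise ValueError("value out of range")
    s0 = secrets.randbits(BITS)  # uniformly random part, intended for server A
    s1 = value ^ s0              # complementary part, intended for server B
    return s0, s1

def reconstruct(s0, s1):
    """Recombine the two parts; neither part alone reveals the value."""
    return s0 ^ s1

def share_matrix(matrix):
    """Share every entry of a data matrix, returning the two per-server matrices."""
    x0, x1 = [], []
    for row in matrix:
        pairs = [share(v) for v in row]
        x0.append([p[0] for p in pairs])
        x1.append([p[1] for p in pairs])
    return x0, x1

# Hypothetical 2 x (M+1) matrix; the last column stands for an integer-encoded AVF value.
d1 = [[5, 7, 2], [1, 9, 3]]
x10, x11 = share_matrix(d1)
recovered = [[reconstruct(a, b) for a, b in zip(r0, r1)] for r0, r1 in zip(x10, x11)]
```

Here `x10` and `x11` play the roles of the matrices held by server A and server B; only their combination yields the original entries.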
Step 3: server A and server B clean data anomaly points from the encrypted data sets uploaded by the participants:
3a) Server A extracts the last dimension of the encrypted data set of participant A that it received, namely:
Server A sorts A_10 using the sorting algorithm of Yao's encryption algorithm in ABY:
A′_10 = Sort(A_10);
where A_10 denotes the last dimension of participant A's encrypted data set held by server A, A′_10 denotes A_10 after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm.
X_10 is reordered at the same time using A_10 as the key, that is, the rows of X_10 are arranged in descending order of A_10. After sorting:
where X′_10 is the data set submitted by participant A to server A after it has been sorted in descending order by the last dimension of X_10, that is, by A_10; i∈[1,N], j∈[1,M], M,N∈N+.
A fixed value Thre is specified as the threshold for an AVF value to be within the normal range, and the entries of A′_10 are compared with Thre in order:
Res_i = Comp(A′_10,i, Thre);
where A′_10,i denotes the i-th element of A′_10, i∈[1,N], N∈N+; Comp() denotes the comparison algorithm of Yao's encryption algorithm; and Res_i denotes the result of comparing A′_10,i with Thre: Res_i = 1 means A′_10,i ≥ Thre, and Res_i = 0 means A′_10,i < Thre. The entries of A′_10 are compared with Thre in order until Res_i = 0; the comparison then stops and the first i rows of X′_10 are retained:
where I = i is the number of leading rows retained after sorting, j∈[1,M], M∈N+, and X″_10 is the data set of participant A held by server A after data cleaning is complete.
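The sort-then-threshold procedure of step 3a can be sketched on plaintext values as follows. In the method described above, Sort() and Comp() run on encrypted data under Yao's encryption algorithm; the functions `comp` and `clean` here are hypothetical plaintext stand-ins. The sketch also assumes that the retained block consists of the rows compared before the first 0 result, which is one reading of the cut-off rule.

```python
def comp(value, thre):
    """Plaintext stand-in for Comp(): 1 if value >= thre, otherwise 0."""
    return 1 if value >= thre else 0

def clean(rows, thre):
    """Sort rows by their last column (AVF) in descending order and keep the
    leading rows whose AVF value is at least `thre`."""
    ordered = sorted(rows, key=lambda r: r[-1], reverse=True)
    kept = 0
    for row in ordered:
        if comp(row[-1], thre) == 0:
            break  # rows are in descending order, so every later row is also below thre
        kept += 1
    return ordered[:kept]

# Hypothetical data: two attributes followed by the AVF value.
x10 = [[1, 4, 2.0], [3, 3, 0.5], [2, 8, 3.5], [7, 1, 1.0]]
cleaned = clean(x10, thre=1.0)
```

Because the rows are sorted in descending order first, the scan can stop at the first comparison that returns 0 instead of comparing every row against the threshold.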
3b) Server A extracts the last dimension of the encrypted data set of participant B that it received, namely:
Server A sorts A_20 using the sorting algorithm of Yao's encryption algorithm in ABY:
A′_20 = Sort(A_20);
where A_20 denotes the last dimension of participant B's encrypted data set held by server A, A′_20 denotes A_20 after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm.
X_20 is reordered at the same time using A_20 as the key, that is, the rows of X_20 are arranged in descending order of A_20. After sorting:
where X′_20 is the data set submitted by participant B to server A after it has been sorted in descending order by the last dimension of X_20, that is, by A_20; k∈[1,P], j∈[1,M], M,P∈N+.
A fixed value Thre (the same Thre as above) is specified as the threshold for an AVF value to be within the normal range, and the entries of A′_20 are compared with Thre in order:
Res_k = Comp(A′_20,k, Thre);
where A′_20,k denotes the k-th element of A′_20, k∈[1,P], P∈N+; Comp() denotes the comparison algorithm of Yao's encryption algorithm; and Res_k denotes the result of comparing A′_20,k with Thre: Res_k = 1 means A′_20,k ≥ Thre, and Res_k = 0 means A′_20,k < Thre. The entries of A′_20 are compared with Thre in order until Res_k = 0; the comparison then stops and the first k rows of X′_20 are retained:
where K = k is the number of leading rows retained after sorting, j∈[1,M], M∈N+, and X″_20 is the data set of participant B held by server A after data cleaning is complete.
3c) Server B extracts the last dimension of the encrypted data set of participant A that it received, namely:
Server B sorts A_11 using the sorting algorithm of Yao's encryption algorithm in ABY:
A′_11 = Sort(A_11);
where A_11 denotes the last dimension of participant A's encrypted data set held by server B, A′_11 denotes A_11 after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm.
X_11 is reordered at the same time using A_11 as the key, that is, the rows of X_11 are arranged in descending order of A_11. After sorting:
where X′_11 is the data set submitted by participant A to server B after it has been sorted in descending order by the last dimension of X_11, that is, by A_11; i∈[1,N], j∈[1,M], M,N∈N+.
A fixed value Thre (the same Thre as above) is specified as the threshold for an AVF value to be within the normal range, and the entries of A′_11 are compared with Thre in order:
Res_i = Comp(A′_11,i, Thre);
where A′_11,i denotes the i-th element of A′_11, i∈[1,N], N∈N+; Comp() denotes the comparison algorithm of Yao's encryption algorithm; and Res_i denotes the result of comparing A′_11,i with Thre: Res_i = 1 means A′_11,i ≥ Thre, and Res_i = 0 means A′_11,i < Thre. The entries of A′_11 are compared with Thre in order until Res_i = 0; the comparison then stops and the first i rows of X′_11 are retained:
where I = i is the number of leading rows retained after sorting, j∈[1,M], M∈N+, and X″_11 is the data set of participant A held by server B after data cleaning is complete.
3d) Server B extracts the last dimension of the encrypted data set of participant B that it received, namely:
Server B sorts A_21 using the sorting algorithm of Yao's encryption algorithm in ABY:
A′_21 = Sort(A_21);
where A_21 denotes the last dimension of participant B's encrypted data set held by server B, A′_21 denotes A_21 after sorting in descending order, and Sort() denotes the sorting algorithm of Yao's encryption algorithm.
X_21 is reordered at the same time using A_21 as the key, that is, the rows of X_21 are arranged in descending order of A_21. After sorting:
where X′_21 is the data set submitted by participant B to server B after it has been sorted in descending order by the last dimension of X_21, that is, by A_21; k∈[1,P], j∈[1,M], M,P∈N+.
A fixed value Thre (the same Thre as above) is specified as the threshold for an AVF value to be within the normal range, and the entries of A′_21 are compared with Thre in order:
Res_k = Comp(A′_21,k, Thre);
where A′_21,k denotes the k-th element of A′_21, k∈[1,P], P∈N+; Comp() denotes the comparison algorithm of Yao's encryption algorithm; and Res_k denotes the result of comparing A′_21,k with Thre: Res_k = 1 means A′_21,k ≥ Thre, and Res_k = 0 means A′_21,k < Thre. The entries of A′_21 are compared with Thre in order until Res_k = 0; the comparison then stops and the first k rows of X′_21 are retained:
where K = k is the number of leading rows retained after sorting, j∈[1,M], M∈N+, and X″_21 is the data set of participant B held by server B after data cleaning is complete.
3e) The resulting X″_10, X″_11, X″_20 and X″_21 are the data sets after data cleaning is complete.
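Sub-steps 3a) to 3d) apply the same rule to each of the four encrypted data sets. One way to state that rule compactly is given below. It is a hedged reading, not the original formula: the generic index pair pq (with pq ranging over 10, 11, 20, 21), the row count R (N for participant A, P for participant B) and the notation X′_{pq}[1..I_{pq}] for the first I_{pq} rows are introduced here only for illustration, and the cut-off is taken to be the number of entries at or above Thre.

```latex
\[
A'_{pq} = \mathrm{Sort}(A_{pq}), \qquad
\mathit{Res}_r = \mathrm{Comp}\bigl(A'_{pq,r},\ \mathit{Thre}\bigr) =
\begin{cases}
1, & A'_{pq,r} \ge \mathit{Thre} \\
0, & A'_{pq,r} < \mathit{Thre}
\end{cases}
\qquad r \in [1, R]
\]
\[
I_{pq} = \bigl|\{\, r \in [1, R] : \mathit{Res}_r = 1 \,\}\bigr|, \qquad
X''_{pq} = X'_{pq}[1..I_{pq}]
\]
```

Since A′_{pq} is in descending order, all entries with Res_r = 1 precede all entries with Res_r = 0, so the retained rows always form a leading block of X′_{pq}.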
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A CN109992977B (en) | 2019-03-01 | 2019-03-01 | A data anomaly point cleaning method based on secure multi-party computing technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109992977A (en) | 2019-07-09 |
CN109992977B (en) | 2022-12-16 |
Family
ID=67130167
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156492.6A Active CN109992977B (en) | 2019-03-01 | 2019-03-01 | A data anomaly point cleaning method based on secure multi-party computing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992977B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046409A (en) * | 2019-12-16 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Private data multi-party security calculation method and system |
CN111125735A (en) * | 2019-12-20 | 2020-05-08 | 支付宝(杭州)信息技术有限公司 | Method and system for model training based on private data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013102488A1 (en) * | 2012-01-02 | 2013-07-11 | Telecom Italia S.P.A. | Method and system for comparing images |
CN108712260A (en) * | 2018-05-09 | 2018-10-26 | 曲阜师范大学 | The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment |
CN108809628A (en) * | 2018-06-13 | 2018-11-13 | 哈尔滨工业大学深圳研究生院 | Based on the time series method for detecting abnormality and system under Secure |
Also Published As
Publication number | Publication date |
---|---|
CN109992977B (en) | 2022-12-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |