TWI738333B - Method and device for multi-party joint feature evaluation for protecting privacy and safety - Google Patents

Method and device for multi-party joint feature evaluation for protecting privacy and safety

Info

Publication number
TWI738333B
Authority
TW
Taiwan
Prior art keywords
sample
encrypted
exchange information
sample set
feature
Prior art date
Application number
TW109115723A
Other languages
Chinese (zh)
Other versions
TW202123049A (en)
Inventor
陸夢倩
汲小溪
王維強
Original Assignee
大陸商支付寶(杭州)信息技術有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商支付寶(杭州)信息技術有限公司
Publication of TW202123049A
Application granted
Publication of TWI738333B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The embodiments of this specification provide a privacy-preserving method and apparatus for multi-party joint feature evaluation. The parties include at least a first device storing a first sample set and a second device storing a second sample set, and the method is applied to the first device. The method includes: encrypting the initial ID of each sample in the first sample set, and sending the resulting first-encrypted IDs and the labels of the first sample set to the second device; receiving from the second device the first-encrypted IDs of the second sample set and the identifiers of the bins in which those samples fall, together with the second-encrypted IDs and labels of the first sample set; encrypting the first-encrypted IDs of the second sample set to obtain the second-encrypted IDs of the second sample set; determining the shared samples from the second-encrypted IDs of the second sample set and the second-encrypted IDs of the first sample set; and computing the information value of a feature from the labels and bin identifiers of the shared samples, for use in feature selection for a machine learning model.

Description

Method and device for multi-party joint feature evaluation for protecting privacy and safety

One or more embodiments of this specification relate to the field of computer information processing, and in particular to a privacy-preserving method and apparatus for multi-party joint feature evaluation.

The data needed for machine learning often spans multiple domains. For example, in a machine-learning-based merchant classification scenario, an electronic payment platform holds merchants' transaction records, an e-commerce platform stores merchants' sales data, and a bank holds merchants' loan data. Data therefore tends to exist as isolated islands. Because of industry competition, data security, and user privacy, data integration meets strong resistance, and pooling data scattered across platforms to train a machine learning model is hard to achieve. Using multi-party data to jointly train a machine learning model without leaking that data has become a major challenge. Federated learning has been proposed to address it.

Training a machine learning model with a federated learning algorithm generally requires features that are correlated with the label, so the first step of federated learning is feature screening. A commonly used screening approach is to compute the information value (IV) of a feature and use it to evaluate the correlation between the feature and the label. Computing the information value of a feature requires both label data and feature data. In particular, computing the information value of a feature held by a non-label holder requires the label holder's label data, but the label holder is usually unwilling to disclose the correspondence between labels and users (that is, its blacklist and whitelist) directly to the non-label holder. Likewise, the non-label holder is unwilling to disclose its users and feature data to the label holder.

In addition, federated learning requires users that the platforms have in common in order to train jointly. Yet for every party, its users and the correspondence between users and labels (or features) are private data. A scheme is therefore needed that can compute the information value of a feature while no party learns the other parties' users and while label data and feature data remain isolated.

One or more embodiments of this specification describe a privacy-preserving method and apparatus for multi-party joint feature evaluation, which can compute the information value of features of the users shared by two parties while neither party learns the other's users and while label data and feature data remain isolated.

According to a first aspect, a privacy-preserving method for multi-party joint feature evaluation is provided. The parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, and the second device stores a second sample set. The method is applied to the first device and includes:
using a first key to encrypt the initial ID of each sample in the first sample set, obtaining a first-encrypted ID for each sample in the first sample set;
sending first exchange information to the second device, the first exchange information including at least the first-encrypted ID and the label of each sample in the first sample set;
receiving second exchange information and third exchange information from the second device, where the second exchange information includes, for each sample in the first sample set, the second-encrypted ID obtained by the second device re-encrypting the sample's first-encrypted ID with a second key, together with the corresponding label, the relative order of the samples in the second exchange information having been shuffled by the second device; and the third exchange information includes, for each sample in the second sample set, the first-encrypted ID obtained by the second device encrypting the sample's initial ID with the second key, together with the identifier of the first bin in which the sample falls, the first bins being obtained by the second device binning the second sample set by the value of a first feature of each sample;
using the first key to re-encrypt the first-encrypted ID of each sample in the third exchange information, obtaining a first encrypted set;
determining the samples shared by the first sample set and the second sample set from the second-encrypted IDs in the second exchange information and the second-encrypted IDs in the first encrypted set;
determining the information value of the first feature from the label of each shared sample and the identifier of the first bin in which it falls, for use in feature selection for a machine learning model.

In some embodiments, the method further includes:
before sending the first exchange information to the second device, dividing the first sample set into a plurality of second bins by the value of a second feature of each sample in the first sample set, and including in the first exchange information the identifier of the second bin in which each sample of the first sample set falls;
after obtaining the first encrypted set, shuffling the relative order of the samples of the second sample set to obtain fourth exchange information;
sending the fourth exchange information to the second device, so that the second device determines the shared samples from the second-encrypted IDs in the fourth exchange information and the second-encrypted IDs of the samples in a second encrypted set, and determines the information value of the second feature from the label of each shared sample and the identifier of the second bin in which it falls, where the second encrypted set is obtained by re-encrypting the first-encrypted IDs in the first exchange information with the second key.

In some embodiments, dividing the first sample set into a plurality of second bins by the value of the second feature includes dividing the first sample set into the second bins using any one of equal-frequency binning, equal-width binning, and chi-square binning.

In some embodiments, the initial IDs of the samples in the first sample set and in the second sample set are all positive integers, and before the initial IDs of the first sample set are encrypted with the first key, the method further includes:
determining a first prime that is greater than the largest initial ID in the first sample set and greater than the largest initial ID in the second sample set;
determining a first positive integer that is coprime with the first prime as the first key.

In some embodiments, encrypting the initial ID of each sample in the first sample set with the first key includes, for each sample in the first sample set, taking the remainder of the product of the sample's initial ID and the first key divided by the first prime as the sample's first-encrypted ID.

In some embodiments, the first sample set includes a plurality of samples with positive labels and a plurality of samples with negative labels, and determining the information value of the first feature includes:
determining a first ratio, namely the number of shared samples that fall into the first bin with a given identifier and have a positive label, relative to the total number of shared samples with a positive label;
determining a second ratio, namely the number of shared samples that fall into that first bin and have a negative label, relative to the total number of shared samples with a negative label;
determining the information value of the first feature over the shared samples from the first ratio and the second ratio of each first bin.

In some embodiments, the samples in the first sample set include user samples and the machine learning model is a user classification model; or the samples in the first sample set include business samples and the machine learning model is a business processing model.

According to a second aspect, a privacy-preserving method for multi-party joint feature evaluation is provided. The parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, and the second device stores a second sample set. The method is applied to the second device and includes:
receiving first exchange information from the first device, the first exchange information including at least, for each sample in the first sample set, the first-encrypted ID obtained by the first device encrypting the sample's initial ID with a first key, together with the corresponding label;
using a second key to re-encrypt the first-encrypted ID of each sample in the first exchange information, obtaining a second encrypted set, and then shuffling the relative order of the samples in the second encrypted set;
sending second exchange information to the first device, the second exchange information including the second-encrypted IDs and labels of the samples of the first sample set in shuffled order;
using the second key to encrypt the initial ID of each sample in the second sample set, obtaining the first-encrypted IDs of the second sample set;
dividing the second sample set into a plurality of first bins by the value of a first feature of each sample in the second sample set;
sending third exchange information to the first device, the third exchange information including the first-encrypted ID of each sample in the second sample set and the identifier of the first bin in which it falls, so that the first device re-encrypts the first-encrypted IDs in the third exchange information with the first key to obtain a first encrypted set, determines the samples shared by the first sample set and the second sample set from the second-encrypted IDs in the first encrypted set and the second-encrypted IDs in the second exchange information, and determines the information value of the first feature from the label of each shared sample and the identifier of the first bin in which it falls, for use in feature selection for a machine learning model.

In some embodiments, the first exchange information further includes the identifier of the second bin in which each sample of the first sample set falls, the second bins being obtained by the first device binning the first sample set by the value of a second feature of each sample. The method further includes:
receiving fourth exchange information from the first device, the fourth exchange information including the second-encrypted ID of each sample in the second sample set, the relative order of the samples in the fourth exchange information having been shuffled by the first device;
determining the samples shared by the first sample set and the second sample set from the second-encrypted IDs of the second encrypted set and the second-encrypted IDs in the fourth exchange information;
determining the information value of the second feature from the label of each shared sample and the identifier of the second bin in which it falls, for use in feature selection for a machine learning model.

According to a third aspect, a privacy-preserving apparatus for multi-party joint feature evaluation is provided. The parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, and the second device stores a second sample set. The apparatus is deployed in the first device and includes:
a first encryption unit, configured to use a first key to encrypt the initial ID of each sample in the first sample set, obtaining a first-encrypted ID for each sample in the first sample set;
a first sending unit, configured to send first exchange information to the second device, the first exchange information including at least the first-encrypted ID and the label of each sample in the first sample set;
a first receiving unit, configured to receive second exchange information and third exchange information from the second device, where the second exchange information includes, for each sample in the first sample set, the second-encrypted ID obtained by the second device re-encrypting the sample's first-encrypted ID with a second key, together with the corresponding label, the relative order of the samples in the second exchange information having been shuffled by the second device; and the third exchange information includes, for each sample in the second sample set, the first-encrypted ID obtained by the second device encrypting the sample's initial ID with the second key, together with the identifier of the first bin in which the sample falls, the first bins being obtained by the second device binning the second sample set by the value of a first feature of each sample;
a second encryption unit, configured to use the first key to re-encrypt the first-encrypted ID of each sample in the third exchange information, obtaining a first encrypted set;
a first determining unit, configured to determine the samples shared by the first sample set and the second sample set from the second-encrypted IDs in the second exchange information and the second-encrypted IDs in the first encrypted set;
a second determining unit, configured to determine the information value of the first feature from the label of each shared sample and the identifier of the first bin in which it falls, for use in feature selection for a machine learning model.

According to a fourth aspect, a privacy-preserving apparatus for multi-party joint feature evaluation is provided. The parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, and the second device stores a second sample set. The apparatus is deployed in the second device and includes:
a second receiving unit, configured to receive first exchange information from the first device, the first exchange information including at least, for each sample in the first sample set, the first-encrypted ID obtained by the first device encrypting the sample's initial ID with a first key, together with the corresponding label;
a third encryption unit, configured to use a second key to re-encrypt the first-encrypted ID of each sample in the first exchange information, obtaining a second encrypted set, and then to shuffle the relative order of the samples in the second encrypted set;
a second sending unit, configured to send second exchange information to the first device, the second exchange information including the second-encrypted IDs and labels of the samples of the first sample set in shuffled order;
a fourth encryption unit, configured to use the second key to encrypt the initial ID of each sample in the second sample set, obtaining the first-encrypted IDs of the second sample set;
a second binning unit, configured to divide the second sample set into a plurality of first bins by the value of a first feature of each sample in the second sample set;
the second sending unit being further configured to send third exchange information to the first device, the third exchange information including the first-encrypted ID of each sample in the second sample set and the identifier of the first bin in which it falls, so that the first device re-encrypts the first-encrypted IDs in the third exchange information with the first key to obtain a first encrypted set, determines the samples shared by the first sample set and the second sample set from the second-encrypted IDs in the first encrypted set and the second-encrypted IDs of the samples in the second exchange information, and determines the information value of the first feature from the label of each shared sample and the identifier of the first bin in which it falls, for use in feature selection for a machine learning model.

According to a fifth aspect, a computer-readable storage medium is provided on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect or the method of the second aspect.

According to a sixth aspect, a computing terminal is provided, including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method of the first aspect or the method of the second aspect is implemented.

The method and apparatus provided in the embodiments of this specification can compute the information value of features of the users shared by two parties while neither party learns the other's users and while label data and feature data remain isolated, and therefore offer relatively high security.
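The information-value calculation described above (a first ratio and a second ratio per bin, combined over all bins) can be sketched as follows. This is a minimal illustration that assumes the commonly used definition IV = sum over bins k of (P_k - N_k) * ln(P_k / N_k), where P_k and N_k are the first and second ratios of bin k. The specification points to a formula in its Fig. 3 that is not reproduced in this text, so that definition is an assumption; the function name `information_value` and the `eps` floor for empty bins are illustrative choices as well.

```python
import math
from collections import Counter


def information_value(records, eps=1e-9):
    """Compute IV from (label, bin_id) pairs of the shared samples.

    label is 1 for a positive sample and 0 for a negative sample.
    Assumes the common definition IV = sum_k (P_k - N_k) * ln(P_k / N_k).
    """
    records = list(records)
    pos_by_bin = Counter(b for label, b in records if label == 1)
    neg_by_bin = Counter(b for label, b in records if label == 0)
    pos_total = sum(pos_by_bin.values())
    neg_total = sum(neg_by_bin.values())
    if pos_total == 0 or neg_total == 0:
        raise ValueError("need at least one positive and one negative sample")

    iv = 0.0
    for b in set(pos_by_bin) | set(neg_by_bin):
        # First ratio: share of all positives that fall into bin b.
        p = max(pos_by_bin[b] / pos_total, eps)
        # Second ratio: share of all negatives that fall into bin b.
        n = max(neg_by_bin[b] / neg_total, eps)
        iv += (p - n) * math.log(p / n)
    return iv


# Example: bin "a" holds 3 positives and 1 negative, bin "b" the reverse.
example = [(1, "a")] * 3 + [(0, "a")] + [(1, "b")] + [(0, "b")] * 3
iv_example = information_value(example)
```

Only labels and bin identifiers enter the calculation, which is why the party computing it never needs the other party's raw feature values.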

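The solution provided in this specification is described below with reference to the drawings.

Fig. 1A shows the data held by data party A as disclosed in the embodiments of this specification, and Fig. 1B shows the data held by data party B. Each ID (Identity Document, an identity number) in Figs. 1A and 1B may be a numeric code that uniquely identifies a user, such as a mobile phone number. As shown in Figs. 1A and 1B, ID1, ID2, and ID3 are IDs shared by party A and party B. Each ID in Fig. 1A has a label and a value of feature Fa. For example, as shown in Fig. 1A, labels may be of two kinds, positive and negative. Each ID in Fig. 1B has a value of feature Fb.

In one example scenario, data party A may be an electronic payment platform (for example Alipay), and the label may mark a merchant as fraudulent or non-fraudulent. Feature Fa may be transaction record data. Data party B may be a bank, and feature Fb may be loan data. The feature value of the transaction record data or of the loan data for each ID can be computed through feature engineering; see the prior art for details, which are not repeated here.

In another example scenario, data party A may be an e-commerce platform (for example Taobao), the label may mark a buyer as normal or abnormal, and feature Fa may be sales data. Data party B may be a bank, and feature Fb may be loan data.

Jointly training a machine learning model across parties requires the features of the users shared by party A and party B. To train the model effectively, the correlation between features and labels must be evaluated.

Feature screening can be performed with the scheme shown in Fig. 2. The set of IDs held by party A is called set_A, and the set of IDs held by party B is called set_B. During the joint computation, party A can send set_A and the labels of set_A to party B. Party B can then determine the IDs shared by set_A and set_B and compute the information value of feature Fb over the shared IDs to evaluate the correlation between Fb and the label. Party B can send set_B to party A. Party A can then determine the IDs shared by set_A and set_B and compute the information value of feature Fa over the shared IDs to evaluate the correlation between Fa and the label. In this scheme the two parties must exchange plaintext IDs.

Another scheme for evaluating the correlation between features and labels is to build a trusted execution environment (for example with Intel SGX). The data of party A (set_A, the labels of set_A, feature Fa of set_A) and the data of party B (set_B, feature Fb of set_B) are each encrypted with a public key and passed into the trusted execution environment. Inside it, the data is decrypted with the private key, the information value of the features is computed, and the result is passed out of the trusted environment.

Yet another scheme is to send the data of party A (set_A, the labels of set_A, feature Fa of set_A) and the data of party B (set_B, feature Fb of set_B) to a third-party institution, which computes the information value of the features.

To further strengthen the security of private data, the embodiments of this specification provide a method for multi-party joint feature evaluation that can compute the information value of features of the users shared by two parties while neither party learns the other's users and while label data and feature data remain isolated. In one embodiment, the method may include the steps shown in Fig. 3. Note that although Fig. 3 shows steps 301a to 310a and steps 301b to 310b in sequence, this does not limit the order in which steps 300 to 310 are executed. In some examples, steps 301a to 310a and steps 301b to 310b may be executed in the order shown in Fig. 3. In some examples, they may be executed in an order different from that shown in Fig. 3. In some examples, two or more of these steps may be executed in parallel.

The privacy-preserving method for multi-party joint feature evaluation is now described by example with reference to Fig. 3.

Data party A and data party B may be apparatuses, devices, platforms, or device clusters with computing and processing capabilities, which cooperate to execute the method shown in Fig. 3.

In steps 300a and 300b, party A and party B cooperate to perform initialization. Specifically, each party determines the upper bound of the IDs it holds. Taking mobile phone numbers as IDs, each ID is an integer of 11 digits. The upper bound of a party's IDs is the numerically largest ID that party holds.

In one example, party A determines an integer C1 greater than or equal to its numerically largest ID. For example, with 11-digit mobile phone numbers as IDs, C1 may be a 12-digit integer. Party A sends its integer C1 to party B. Party B determines a prime P that is greater than party B's numerically largest ID and greater than C1, and sends P to party A.

In another example, party B determines an integer C2 greater than or equal to its numerically largest ID. For example, with 11-digit mobile phone numbers as IDs, C2 may be a 12-digit integer. Party B sends its integer C2 to party A. Party A determines a prime P that is greater than party A's numerically largest ID and greater than C2, and sends P to party B.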
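The initialization of steps 300a and 300b, together with the key generation that each party performs once the prime is agreed, can be sketched as follows. This is an illustrative sketch, not the patented implementation: the helper names (`is_prime`, `next_prime_above`, `negotiate_prime`, `generate_key`) are hypothetical, the trial-division primality test is only meant for small demonstration values, and drawing the key from 2 to P-1 is an assumption (the text only requires a positive integer coprime with P; excluding 1 avoids a key that leaves IDs unchanged).

```python
import math
import secrets


def is_prime(n):
    """Trial-division primality test; adequate for small demo values only."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def next_prime_above(n):
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def negotiate_prime(bound_from_peer, own_max_id):
    """Pick a prime above the peer's disclosed bound (C1 or C2) and above
    the local party's own largest ID."""
    return next_prime_above(max(bound_from_peer, own_max_id))


def generate_key(p):
    """Random key in [2, p - 1]; every such integer is coprime with prime p."""
    return secrets.randbelow(p - 2) + 2


# Party A discloses the bound C1 = 100; party B's largest ID is 120.
P = negotiate_prime(100, 120)
key_a = generate_key(P)
key_b = generate_key(P)
```

Disclosing a padded bound such as C1 instead of the exact largest ID lets a party agree on P without revealing any actual ID.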
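Party A may randomly generate a positive integer keyA that is coprime with the prime P; keyA may also be called the first key. Party B may randomly generate a positive integer keyB that is coprime with the prime P; keyB may also be called the second key.

In this way party A and party B complete initialization and obtain their respective keys. Next, each party encrypts its own IDs a first time with its own key, obtaining its first-encrypted IDs. Each party then sends its first-encrypted IDs to the other party, which encrypts them a second time with its own key. IDs with the same value still have the same value after the two encryptions, so party A and party B can each obtain the IDs they share without revealing unencrypted IDs (also called initial IDs) to the other. The details are as follows.

For convenience, the set of IDs held by party A, that is, the set of the IDs of the samples in party A's sample set, is called set_A. The set of IDs held by party B, that is, the set of the IDs of the samples in party B's sample set, is called set_B. Samples and IDs correspond one to one. Before the encryption described below, each ID in set_A and set_B may be called the initial ID of a sample.

In step 302a, party A uses keyA to encrypt each ID (initial ID) in set_A a first time, obtaining a first-encrypted ID. For example, for each ID in set_A, the first encryption computes the product of the ID and keyA and takes the remainder of dividing that product by the prime P as the first-encrypted ID corresponding to that ID. The first-encrypted ID may be written Encry(ID,keyA).

Specifically, as shown in Fig. 4, the ID to be encrypted may be any ID in set_A. The initialized p is the prime P described above, and max(ID) is the numerically largest ID of party A. The ID to be encrypted is multiplied by the key to obtain TMP. The remainder E of TMP modulo p (that is, the remainder of dividing TMP by p) is then taken as the encryption result of the ID.

Party A may bin set_A by the value of feature Fa, so as to assign the first-encrypted IDs of set_A to a plurality of bins. Referring to Fig. 3, feature Fa may be a feature set including several features such as Fa1 and Fa2, which may be collectively written Fai, where i may be 1, 2, and so on. Each sample has a value of feature Fai. For feature Fai, party A can bin by the Fai values of the IDs in set_A, assigning the first-encrypted IDs of set_A to the bins corresponding to Fai. Each bin has a bin identifier. For feature Fa1 the bin identifier may be written Fa1_bin, and for feature Fa2 it may be written Fa2_bin. Each first-encrypted ID can be associated with its Fa1_bin, Fa2_bin, and so on, written (Encry(ID,keyA),Fa1_bin,Fa2_bin,…). Fa1_bin, Fa2_bin, and so on may be collectively written Fai_bin, indicating that the ID was placed in bin Fai_bin according to its Fai value.

In one example, an equal-frequency binning algorithm may be used. In another example, an equal-width binning algorithm may be used. In yet another example, a chi-square binning algorithm may be used.

The first-encrypted ID of each sample in set_A, its label, and the identifiers of the bins in which it falls after binning by the Fai values can be associated to obtain an association record for that first-encrypted ID, written (Encry(ID,keyA),label,Fa1_bin,Fa2_bin,…). The association records of all first-encrypted IDs of set_A form the first exchange information. Party A can send the first exchange information to party B.

Each bin may contain multiple IDs, for example K IDs. The binning information about party A's features that party B obtains is therefore K-anonymized: for any ID, there are at least K IDs with the same feature binning information. It is thus hard for party B to infer the correspondence between IDs and feature information from the binning information.

In step 302b, party B uses keyB to encrypt each ID (initial ID) in set_B a first time, obtaining a first-encrypted ID. For example, for each ID in set_B, the first encryption computes the product of the ID and keyB and takes the remainder of dividing that product by the prime P as the first-encrypted ID corresponding to that ID. The first-encrypted ID may be written Encry(ID,keyB).

Party B may bin set_B by the value of feature Fb, so as to assign the first-encrypted IDs of set_B to a plurality of bins. Referring to Fig. 3, feature Fb may be a feature set including several features such as Fb1 and Fb2, which may be collectively written Fbi, where i may be 1, 2, and so on. Each sample has a value of feature Fbi, and set_B can be binned by the Fbi values. For details, see the description of step 302a above, which is not repeated here.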
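Steps 302a and 302b (first encryption by modular multiplication, binning, and assembling the association records) can be sketched as follows for a single feature. This is a minimal sketch under stated assumptions: the function names are hypothetical, equal-frequency binning is just one of the three binning options the text lists, and ties between equal feature values are broken by sort order, which a production binning routine would likely handle differently.

```python
def encrypt_id(sample_id, key, p):
    """Encry(ID, key): remainder of ID * key divided by the prime p."""
    return (sample_id * key) % p


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def build_first_exchange(ids, labels, feature_values, key_a, p, n_bins):
    """Records (Encry(ID, keyA), label, Fa1_bin) sent from party A to party B."""
    bins = equal_frequency_bins(feature_values, n_bins)
    return [
        (encrypt_id(i, key_a, p), label, b)
        for i, label, b in zip(ids, labels, bins)
    ]


# Toy data: four samples, prime p = 7, keyA = 3, two bins.
first_exchange = build_first_exchange(
    ids=[1, 2, 3, 4],
    labels=[1, 0, 1, 0],
    feature_values=[10, 40, 20, 30],
    key_a=3,
    p=7,
    n_bins=2,
)
```

Party B would build its own records the same way with keyB, omitting the label because it holds none.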
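The first-encrypted ID of each sample in set_B can be associated with the identifiers of the bins in which it falls after binning by the Fbi values, giving an association record for that first-encrypted ID, written (Encry(ID,keyB),Fb1_bin,Fb2_bin,…). The association records of all first-encrypted IDs of set_B form the third exchange information. Party B can send the third exchange information to party A.

In step 304a, after receiving the third exchange information, party A can use keyA to re-encrypt each first-encrypted ID of set_B in the third exchange information, obtaining the second-encrypted ID of each first-encrypted ID of set_B. Specifically, party A computes the product of the first-encrypted ID and keyA and takes the remainder of dividing that product by the prime P as the second-encrypted ID, written Encry(Encry(ID,keyB),keyA). Together with the bin identifiers, this can be written (Encry(Encry(ID,keyB),keyA),Fb1_bin,Fb2_bin,…), and this information forms the first encrypted set.

In step 306a, party A shuffles the relative order of the second-encrypted IDs of set_B and sends the shuffled second-encrypted IDs of set_B to party B as the fourth exchange information.

Note that the first-encrypted IDs of set_B in the third exchange information have a relative order, and after they are re-encrypted with the first key, the resulting second-encrypted IDs of set_B have the same relative order. If the second-encrypted IDs of set_B were sent to party B without shuffling, party B could use that order to determine the one-to-one correspondence between the second-encrypted IDs and the first-encrypted IDs of set_B, and from it obtain the first key and then determine the IDs in set_A, leaking party A's IDs and its blacklist and whitelist.

In addition, the fourth exchange information does not carry the identifiers of the bins in which the IDs of set_B fall. This prevents party B from using those bin identifiers to infer the correspondence between each sample's second-encrypted ID and its initial ID (or first-encrypted ID), from which it could obtain the first key and then determine the IDs in set_A, leaking party A's IDs and its blacklist and whitelist.

In step 304b, after receiving the first exchange information, party B can use keyB to re-encrypt each first-encrypted ID of set_A in the first exchange information, obtaining the second-encrypted ID corresponding to each first-encrypted ID of set_A. Specifically, party B computes the product of the first-encrypted ID and keyB and takes the remainder of dividing that product by the prime P as the second-encrypted ID, written Encry(Encry(ID,keyA),keyB). Together with the label and bin identifiers, this can be written (Encry(Encry(ID,keyA),keyB),label,Fa1_bin,Fa2_bin,…), and this information forms the second encrypted set.

In step 306b, party B shuffles the relative order of the second-encrypted IDs of set_A and sends the shuffled second-encrypted IDs of set_A, each with its label, to party A as the second exchange information. Shuffling the order in step 306b, and not sending party A the identifiers of the bins in which the IDs of set_A fall, prevents party A from inferring the second key.

Through the steps above, every initial ID in set_A and set_B has been encrypted twice. An initial ID in set_A is first encrypted by party A with the first key and then encrypted a second time by party B with the second key. An initial ID in set_B is first encrypted by party B with the second key and then encrypted a second time by party A with the first key. Party A and party B exchange their second-encryption results, so that both hold the second-encrypted IDs corresponding to the initial IDs of set_A and set_B. The first key and the second key are both coprime with the prime p, and both the first and the second encryption take the remainder of the product of the key and the ID divided by p as the encrypted ID. By the properties of residue systems, this encryption has the following properties:
Stackability: an ID has the same value range before and after encryption, so encryption can be applied multiple times.
Commutativity: the encryption is commutative. When the same ID is encrypted twice with two different keys, swapping the order of encryption gives the same ciphertext, that is, Encry(Encry(ID,keyA),keyB)=Encry(Encry(ID,keyB),keyA).
Hardness of decryption: when the encryption key is unknown, decryption is extremely hard.
Uniqueness: the encryption results of two IDs are equal if and only if the IDs (integers) are equal.

These properties are now proved using the properties of residue systems.

In the embodiments of this specification, x mod(y), read "x modulo y", denotes the remainder of dividing x by y. Residue systems have the following properties.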
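Any two numbers in a complete residue system modulo n have different remainders modulo n, and every positive integer has the same remainder modulo n as some number in the complete residue system modulo n. Within a complete residue system modulo n, the set of representatives that are coprime with n is called a reduced residue system modulo n.

For a prime p and any positive integer a coprime with p, multiply every element of the least reduced residue system modulo p, S={1,2,3,...,(p-1)}, by a to obtain the new set a*S={a,2a,3a,...,(p-1)a}. Then a*S mod(p)=S. The proof is as follows.

If x belongs to S, then by the properties of remainders a*x mod(p) is either 0 or an element of S. Suppose a*x mod(p)=0; then a*x is an integer multiple of p. Because p is prime and x is not divisible by p, a must be divisible by p, which contradicts the condition that a and p are coprime. The assumption therefore fails, a*x mod(p) is not 0, and a*x mod(p) belongs to S.

Suppose x1 and x2 both belong to S with x1>x2, and suppose a*x1 and a*x2 are congruent modulo p, that is, a*x1 mod(p) = a*x2 mod(p). Then a*x1-k1*p = a*x2-k2*p for some integers k1 and k2, which gives a*(x1-x2)=(k1-k2)*p. Because 0<x1-x2<p and p is prime, p does not divide x1-x2, so p would have to divide a, which contradicts the condition that a and p are coprime. Hence a*x1 and a*x2 are not congruent modulo p. It follows that the p-1 elements of a*S have remainders modulo p that are elements of S and are pairwise distinct, so every element of S is the remainder modulo p of some element of a*S. That is, the set a*S mod(p) equals S.

In the embodiments of this specification, max(ID)<p, so every ID belongs to S={1,2,3,...,(p-1)}, and stackability follows: an element of S still belongs to S after being encrypted in the manner provided by the embodiments of this specification, so the next encryption can be applied.

For a prime p and any positive integers a and b coprime with p, the commutative law b*(a*x mod(p)) mod(p) = a*(b*x mod(p)) mod(p) holds. The proof is as follows.

It is easy to show that x*y mod(z) = [(x mod(z)) * (y mod(z))] mod(z). Therefore b*(a*x mod(p)) mod(p) = {[b mod(p)] * [a*x mod(p)]} mod(p) = {[b mod(p)] * [a mod(p)] * [x mod(p)]} mod(p). Similarly, a*(b*x mod(p)) mod(p) = {[a mod(p)] * [b mod(p)] * [x mod(p)]} mod(p). Hence b*(a*x mod(p)) mod(p) = a*(b*x mod(p)) mod(p).

In the embodiments of this specification, when the same ID is encrypted twice with two different keys, swapping the order of encryption gives the same ciphertext, that is, Encry(Encry(ID, keyA), keyB) = Encry(Encry(ID, keyB), keyA). Commutativity is thus proved.

Given a prime p and the value v of a*x mod(p), where x belongs to {1,2,3,...,(p-1)} and a is a positive integer coprime with p, finding x is hard. Proof: there are two unknowns, a and x; a ranges from 1 to positive infinity and x ranges from 1 to p-1, so there are infinitely many candidate solutions and the value of x cannot be determined. That is, when the encryption key is unknown, decryption is extremely difficult. This establishes the difficulty of decryption.

For a prime p and any positive integer a coprime with p, if m and n are two different elements of S={1,2,3,...,(p-1)}, then a*m mod(p) is never equal to a*n mod(p). The proof is as follows.

Suppose a*m mod(p)=a*n mod(p). Then a*m-k1*p=a*n-k2*p, where k1 and k2 are integers, which gives a*(m-n)=(k1-k2)*p. Because a and p are coprime, m-n must be divisible by p. Because m and n both belong to S, the only possibility is m-n=0, that is, m=n, which contradicts the assumption that m and n are different. Therefore a*m mod(p) is not equal to a*n mod(p).

Therefore, with the encryption method provided in this specification, the encryption results of two IDs are the same if and only if the IDs are equal; when the IDs are not equal, their encryption results are always different.

From the above, when set_A and set_B contain the same ID, the result of encrypting that ID in set_A in the manner described above equals the result of encrypting that ID in set_B in the same manner.

Therefore, in step 308a, data party A can determine the IDs shared by set_A and set_B. The second exchange information carries the label of each ID, and from the third exchange information data party A can obtain the identifiers of the bins in which the shared IDs fall after binning by the feature values of features Fbi (Fb1, Fb2, and so on).

In step 310a, the information value of each feature Fbi can be calculated from the information obtained in step 308a, using the formula shown in FIG. 3. Here label=1 indicates a positive label and label=0 a negative label. For any feature Fbi, Precall_k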
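denotes the ratio of the number of IDs with positive labels in bin k to the total number of positively labeled samples among the shared samples, Nrecall_k denotes the ratio of the number of IDs with negative labels in bin k to the total number of negatively labeled samples among the shared samples, and IV denotes the information value.

In step 308b, data party B can determine the IDs shared by set_A and set_B. The first exchange information carries the label of each ID and the identifiers of the bins in which it is located, so the information value of each feature Fai can be calculated in step 310b.

The method provided by the embodiments of this specification completes the secure calculation of the information value of features while the data of each party remains isolated, without leaking any party's data. The details are as follows.

During the information value calculation, data party A obtains data party B's IDs encrypted with keyB, together with the corresponding Fb feature bins. This data is sufficiently private with respect to data party A, because: 1) the IDs obtained by data party A are encrypted with keyB, so data party A cannot learn the original IDs behind them and cannot link the Fb binning results to real IDs; 2) the binning information used to calculate the information value does not depend on the order of the bins, so the bin identifiers sent by data party B to data party A can be shuffled (this can be done when the order of the second-encrypted IDs is scrambled), or a bin identifier can be just a code name, so that data party A cannot learn the ordering of the feature values corresponding to the bins; 3) each bin of a feature contains K IDs, so the information data party A obtains about data party B's features is K-anonymized: for any ID, at least K IDs have the same information. Data party A also obtains the second-encrypted results of its own IDs. Because these encrypted IDs have been shuffled by B and carry no other identifying information, data party A knows only that they are the results of encrypting its own IDs, in one-to-one correspondence, but does not know which result corresponds to which ID. After obtaining the two data sets, data party A matches them, takes their intersection, and performs the calculation. These operations effectively take place in an encrypted ID space whose correspondence with the original space is unknown (the mapping can be known only with both keys, keyA and keyB), so the calculation is secure. Similarly, the data available to data party B is not sufficient for data party B to derive data party A's data.

Referring to FIG. 5, an embodiment of this specification provides a privacy-preserving method for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, the second device stores a second sample set, and the method is applied to the first device. The method includes the following steps.

Step 501: use a first key to encrypt the initial ID of each sample in the first sample set, obtaining the first-encrypted ID of each sample in the first sample set. For details, refer to the description of step 302a in FIG. 3 above, which is not repeated here.

Note that step 302a was described in terms of the remainder-based encryption algorithm. That algorithm requires little computation and offers high security, which makes it a preferred encryption algorithm. It is not the only possible one, however: any encryption algorithm that satisfies stackability, commutativity, and uniqueness can be used to encrypt the sample IDs in step 302a and step 302b. In the embodiments of this specification, data party A and data party B can agree on another encryption algorithm in advance. The algorithm can be any algorithm for which, when target data is encrypted with the same set of keys, the order in which the keys are used does not affect the encryption result. Besides the remainder-based algorithm described in the embodiment shown in FIG. 3, it can be any of the exclusive-or (XOR) algorithm, the DH algorithm, the ECC-DH algorithm, and so on.

Step 503: send first exchange information to the second device, which includes at least the first-encrypted ID and the label of each sample in the first sample set. For details, refer to the description of step 302a in FIG. 3 above, which is not repeated here.

Step 505: receive second exchange information and third exchange information from the second device. The second exchange information includes the second-encrypted IDs, obtained by the second device by using a second key to encrypt a second time the first-encrypted ID of each sample in the first sample set, together with the corresponding labels, and the relative order of the samples in the second exchange information has been scrambled by the second device. The third exchange information includes, for each sample in the second sample set, the first-encrypted ID obtained by the second device by encrypting the sample's initial ID with the second key, and the identifier of the first bin in which the sample is located; the identifiers of the first bins are obtained by the second device by binning based on the feature values of a first feature of the samples in the second sample set. For details, refer to the descriptions of steps 302b, 304b, and 306b in FIG. 3 above, which are not repeated here.

Step 507: use the first key to encrypt a second time the first-encrypted ID of each sample in the third exchange information, obtaining a first encrypted set. For details, refer to the description of step 304a in FIG. 3 above, which is not repeated here.

Step 509: determine the shared samples of the first sample set and the second sample set based on the second-encrypted IDs in the second exchange information and the second-encrypted IDs in the first encrypted set. For details, refer to the description of step 308a in FIG. 3 above, which is not repeated here.

Step 511: determine the information value of the first feature based on the labels of the shared samples and the identifiers of the first bins in which they are located, for feature selection for a machine learning model. For details, refer to the description of step 310a in FIG. 3 above, which is not repeated here.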
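Steps 501 to 509 can be traced in a small in-process simulation. This is only a sketch under stated assumptions: the second device is replaced by a hypothetical local stand-in (`make_second_device`), a single feature is used so each record carries one bin identifier, and the prime 101 and the keys 17 and 29 in the usage note are toy values.

```python
import random


def enc(value: int, key: int, p: int) -> int:
    """Remainder-based encryption: (value * key) mod p."""
    return (value * key) % p


def make_second_device(ids_b, bins_b, key_b, p):
    """Hypothetical in-process stand-in for the second device."""
    def respond(first_exchange):
        # Second encryption of A's IDs, labels kept, order scrambled.
        second = [(enc(c, key_b, p), label) for c, label in first_exchange]
        random.shuffle(second)
        # First encryption of B's own IDs together with their bin identifiers.
        third = [(enc(i, key_b, p), b) for i, b in zip(ids_b, bins_b)]
        return second, third
    return respond


def first_device_shared_samples(ids_a, labels_a, key_a, p, second_device):
    """Returns (label, first_bin_id) for every shared sample."""
    # Steps 501 and 503: first encryption, then send IDs with labels.
    first_exchange = [(enc(i, key_a, p), y) for i, y in zip(ids_a, labels_a)]
    # Step 505: receive the second and third exchange information.
    second_exchange, third_exchange = second_device(first_exchange)
    # Step 507: second encryption of B's first-encrypted IDs.
    first_encrypted_set = [(enc(c, key_a, p), b) for c, b in third_exchange]
    # Step 509: match double-encrypted IDs to find the shared samples.
    label_of = dict(second_exchange)
    return [(label_of[c], b) for c, b in first_encrypted_set if c in label_of]
```

For example, with `make_second_device([9, 20, 33], ["lo", "hi", "hi"], 29, 101)` and first-device inputs `[5, 9, 20]` with labels `[1, 0, 1]` and key 17, the shared samples are the two records for IDs 9 and 20, and the first device sees only their labels and bin identifiers. The result is the input to the information value calculation of step 511.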
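In some embodiments, the method further includes: before sending the first exchange information to the second device, dividing the first sample set into multiple second bins based on the feature values of a second feature of the samples in the first sample set, and including in the first exchange information the identifier of the second bin in which each sample of the first sample set is located; after obtaining the first encrypted set, scrambling the relative order of the samples of the second sample set to obtain fourth exchange information; and sending the fourth exchange information to the second device, so that the second device determines the shared samples based on the second-encrypted IDs in the fourth exchange information and the second-encrypted IDs in a second encrypted set, and determines the information value of the second feature based on the labels of the shared samples and the identifiers of the second bins in which they are located, where the second encrypted set is obtained by using the second key to encrypt a second time the first-encrypted IDs in the first exchange information. For details, refer to the descriptions of steps 302a, 306a, 308b, and 310b in FIG. 3 above, which are not repeated here.

In one example of this embodiment, dividing the first sample set into multiple second bins based on the feature values of the second feature includes: dividing the first sample set into the multiple second bins according to any one of equal-frequency binning, equal-width binning, and chi-square binning.

In some embodiments, the initial IDs of the samples in the first sample set and the initial IDs of the samples in the second sample set are all positive integers. Before the initial IDs of the samples in the first sample set are encrypted with the first key, the method further includes: determining a first prime number that is greater than the largest initial ID in the first sample set and greater than the largest initial ID in the second sample set; and determining a first positive integer coprime with the first prime number as the first key. For details, refer to the descriptions of step 300a and step 300b in FIG. 3 above, which are not repeated here.

In some embodiments, using the first key to encrypt the initial ID of each sample in the first sample set to obtain the first-encrypted IDs includes: for each sample in the first sample set, determining the remainder of dividing the product of the sample's initial ID and the first key by the first prime number as the sample's first-encrypted ID. For details, refer to the description of step 302 in FIG. 3 above, which is not repeated here.

In some embodiments, the first sample set includes multiple samples with positive labels and multiple samples with negative labels. Determining the information value of the first feature based on the labels of the shared samples and the identifiers of the first bins in which they are located includes: determining a first ratio of the number of shared samples that fall into the first bin having a first identifier and have positive labels to the total number of positively labeled shared samples; determining a second ratio of the number of shared samples that fall into that first bin and have negative labels to the total number of negatively labeled shared samples; and determining the information value of the first feature of the shared samples based on the first ratio and the second ratio corresponding to each first bin. For details, refer to the description of step 310a in FIG. 3 above, which is not repeated here.

In some embodiments, the samples in the first sample set include user samples and the machine learning model is a user classification model; or the samples in the first sample set include business samples and the machine learning model is a business processing model.

The method provided by the embodiments of this specification can calculate the information value of the features of the users shared by both parties, with high security, while neither party knows the other party's users and the label and feature data remain isolated.

Referring to FIG. 6, an embodiment of this specification provides a privacy-preserving method for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, the second device stores a second sample set, and the method is applied to the second device. As shown in FIG. 6, the method includes the following steps.

Step 601: receive first exchange information from the first device, which includes at least the first-encrypted IDs obtained by the first device by encrypting the initial ID of each sample in the first sample set with a first key, together with the corresponding labels. For details, refer to the description of step 302a in FIG. 3 above, which is not repeated here.

Step 603: use a second key to encrypt a second time the first-encrypted ID of each sample in the first exchange information, obtaining a second encrypted set, and then scramble the relative order of the samples in the second encrypted set. For details, refer to the descriptions of steps 304b and 306b in FIG. 3 above, which are not repeated here.

Step 605: send second exchange information to the first device; the second exchange information includes the second-encrypted IDs and labels of the samples in the first sample set, with their relative order scrambled. For details, refer to the description of step 306b in FIG. 3 above, which is not repeated here.

Step 607: use the second key to encrypt the initial ID of each sample in the second sample set, obtaining the first-encrypted IDs of the second sample set. For details, refer to the description of step 302b in FIG. 3 above, which is not repeated here.

Step 609: divide the second sample set into multiple first bins based on the feature values of a first feature of the samples in the second sample set. For details, refer to the description of step 302b in FIG. 3 above, which is not repeated here.

Step 611: send third exchange information to the first device; the third exchange information includes the first-encrypted ID of each sample in the second sample set and the identifier of the first bin in which it is located, so that the first device uses the first key to encrypt the first-encrypted IDs in the third exchange information to obtain a first encrypted set, determines the shared samples of the first sample set and the second sample set based on the second-encrypted IDs in the first encrypted set and the second-encrypted IDs in the second exchange information, and determines the information value of the first feature based on the labels of the shared samples and the identifiers of the first bins in which they are located, for feature selection for a machine learning model.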
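For details, refer to the description of step 302b in FIG. 3 above, which is not repeated here.

In some embodiments, the first exchange information further includes the identifier of the second bin in which each sample of the first sample set is located; the identifiers of the second bins are obtained by the first device by binning based on the feature values of a second feature of the samples in the first sample set. The method further includes: receiving fourth exchange information from the first device, where the fourth exchange information includes the second-encrypted IDs of the samples in the second sample set and the relative order of the samples in the fourth exchange information has been scrambled by the first device; determining the shared samples of the first sample set and the second sample set based on the second-encrypted IDs of the second encrypted set and the second-encrypted IDs in the fourth exchange information; and determining the information value of the second feature based on the labels of the shared samples and the identifiers of the second bins in which they are located, for feature selection for a machine learning model. For details, refer to the descriptions of steps 302a, 304a, 306a, 308b, and 310b in FIG. 3 above, which are not repeated here.

The method provided by the embodiments of this specification can calculate the information value of the features of the users shared by both parties, with high security, while neither party knows the other party's users and the label and feature data remain isolated.

Referring to FIG. 7, an embodiment of this specification provides a privacy-preserving apparatus 700 for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, the second device stores a second sample set, and the apparatus is configured in the first device. As shown in FIG. 7, the apparatus 700 includes:

a first encryption unit 710, configured to use a first key to encrypt the initial ID of each sample in the first sample set, obtaining the first-encrypted ID of each sample in the first sample set;

a first sending unit 720, configured to send first exchange information to the second device, which includes at least the first-encrypted ID and the label of each sample in the first sample set;

a first receiving unit 730, configured to receive second exchange information and third exchange information from the second device, where the second exchange information includes the second-encrypted IDs, obtained by the second device by using a second key to encrypt a second time the first-encrypted ID of each sample in the first sample set, together with the corresponding labels, and the relative order of the samples in the second exchange information has been scrambled by the second device; and the third exchange information includes, for each sample in the second sample set, the first-encrypted ID obtained by the second device by encrypting the sample's initial ID with the second key, and the identifier of the first bin in which the sample is located, the identifiers of the first bins being obtained by the second device by binning based on the feature values of a first feature of the samples in the second sample set;

a second encryption unit 740, configured to use the first key to encrypt a second time the first-encrypted ID of each sample in the third exchange information, obtaining the second-encrypted ID of each sample in the second sample set;

a first determining unit 750, configured to determine the shared samples of the first sample set and the second sample set based on the second-encrypted IDs of the samples in the first sample set and the second-encrypted IDs of the samples in the second sample set;

a second determining unit 760, configured to determine the information value of the first feature based on the labels of the shared samples and the identifiers of the first bins in which they are located, for feature selection for a machine learning model.

For the functions of the functional units of the apparatus 700, refer to the method embodiment shown in FIG. 5; they are not repeated here.

The apparatus provided by the embodiments of this specification can calculate the information value of the features of the users shared by both parties, with high security, while neither party knows the other party's users and the label and feature data remain isolated.

Referring to FIG. 8, an embodiment of this specification provides a privacy-preserving apparatus for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device; the first device stores a first sample set and the label of each sample in it, the second device stores a second sample set, and the apparatus is configured in the second device. The apparatus includes:

a second receiving unit 810, configured to receive first exchange information from the first device, which includes at least the first-encrypted IDs obtained by the first device by encrypting the initial ID of each sample in the first sample set with a first key, together with the corresponding labels;

a third encryption unit 820, configured to use a second key to encrypt a second time the first-encrypted ID of each sample in the first exchange information, obtaining a second encrypted set, and then scramble the relative order of the samples in the second encrypted set;

a second sending unit 830, configured to send second exchange information to the first device, the second exchange information including the second-encrypted IDs and labels of the samples in the first sample set, with their relative order scrambled;

a fourth encryption unit 840, configured to use the second key to encrypt the initial ID of each sample in the second sample set, obtaining the first-encrypted IDs of the second sample set;

a second binning unit 850, configured to divide the second sample set into multiple first bins based on the feature values of a first feature of the samples in the second sample set;

the second sending unit 830 being further configured to send third exchange information to the first device, the third exchange information including the first-encrypted ID of each sample in the second sample set and the identifier of the first bin in which it is located, so that the first device uses the first key to encrypt a second time the first-encrypted IDs in the third exchange information to obtain a first encrypted set, determines the shared samples of the first sample set and the second sample set based on the second-encrypted IDs in the first encrypted set and the second-encrypted IDs of the samples in the second exchange information, and determines the information value of the first feature based on the labels of the shared samples and the identifiers of the first bins in which they are located, for feature selection for a machine learning model.

For the functions of the functional units of the apparatus 800, refer to the method embodiment shown in FIG. 6; they are not repeated here.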
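The apparatus provided by the embodiments of this specification can calculate the information value of the features of the users shared by both parties, with high security, while neither party knows the other party's users and the label and feature data remain isolated.

In another aspect, the embodiments of this specification provide a computer-readable storage medium storing a computer program that, when executed in a computer, causes the computer to perform the method shown in FIG. 5 or the method shown in FIG. 6.

In another aspect, the embodiments of this specification provide a computing terminal including a memory and a processor. The memory stores executable code, and the processor implements the method shown in FIG. 5 or the method shown in FIG. 6 when executing the executable code.

Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

The following describes the solutions provided in this specification with reference to the drawings.

FIG. 1A shows the data owned by data party A as disclosed in the embodiments of this specification. FIG. 1B shows the data owned by data party B as disclosed in the embodiments of this specification. Each ID (Identity Document, identification number) in FIG. 1A and FIG. 1B can be a numeric code that uniquely identifies a user, such as a mobile phone number. As shown in FIG. 1A and FIG. 1B, ID1, ID2, and ID3 are IDs shared by data party A and data party B.

Each ID in FIG. 1A has a label and a feature value of feature Fa. For example, as shown in FIG. 1A, labels can be of two types, positive and negative. Each ID in FIG. 1B has a feature value of feature Fb.

In one exemplary scenario, data party A can be an electronic payment platform (for example, Alipay), and the label can mark a merchant as fraudulent or non-fraudulent. Feature Fa can be transaction flow data. Data party B can be a banking institution, and feature Fb can be loan data. The feature value of the transaction flow data or of the loan data corresponding to each ID can be calculated through feature engineering. For details, refer to the prior art, which is not repeated here.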
In another exemplary scenario, data party A can be an e-commerce platform (such as Taobao), the label can mark a buyer as normal or abnormal, and feature Fa can be sales data. Data party B can be a banking institution, and feature Fb can be loan data.

Multi-party joint training of a machine learning model uses the features of the users shared by data party A and data party B. To train the model effectively, the correlation between features and labels needs to be evaluated.

Feature screening can be performed with the scheme shown in FIG. 2. The set of IDs held by data party A is called set_A, and the set of IDs held by data party B is called set_B. In the joint calculation, data party A sends set_A and the labels of set_A to data party B. Data party B can then determine the IDs shared by set_A and set_B and calculate the information value of feature Fb over the shared IDs to evaluate the correlation between feature Fb and the label. Data party B sends set_B to data party A. Data party A can then determine the IDs shared by set_A and set_B and calculate the information value of feature Fa over the shared IDs to evaluate the correlation between feature Fa and the label. In this scheme, both parties have to exchange plaintext IDs.

Another scheme for evaluating the correlation between features and labels is to build a trusted execution environment (for example, with Intel's SGX technology). The data of data party A (set_A, the labels of set_A, and feature Fa of set_A) and the data of data party B (set_B and feature Fb of set_B) can be encrypted with a public key and transmitted to the trusted execution environment.
The data is decrypted with the private key inside the trusted execution environment, the information value of the features is calculated there, and the calculation result is transmitted out of the trusted environment.

A further scheme for evaluating the correlation between features and labels is to send the data of data party A (set_A, the labels of set_A, and feature Fa of set_A) and the data of data party B (set_B and feature Fb of set_B) to a third-party institution, which calculates the information value of the features.

To further strengthen the security of private data, the embodiments of this specification provide a method for multi-party joint feature evaluation that can calculate the information value of the features of the users shared by both parties while neither party knows the other party's users and the label and feature data remain isolated.

In one embodiment, the method can include the steps shown in FIG. 3. Note that although FIG. 3 shows step 301a to step 310a and step 301b to step 310b in sequence, this does not limit the order in which steps 300 to 310 are executed. In some examples, the steps can be performed in the order shown in FIG. 3. In some examples, they can be performed in a different order from that shown in FIG. 3. In some examples, two or more of the steps can be performed concurrently.

Next, the privacy-preserving method for multi-party joint feature evaluation provided in this specification is described by example with reference to FIG. 3. Data party A and data party B can each be an apparatus, device, platform, or device cluster with computing and processing capabilities, and can cooperate with each other to execute the method shown in FIG. 3.
In step 300a and step 300b, data party A and data party B can cooperate to perform initialization.

Specifically, each party can determine the upper limit of the values of the IDs it owns. Taking mobile phone numbers as IDs, each ID is an integer of 11 digits. The upper limit of a party's IDs is the largest ID value among the IDs that party owns.

In one example, data party A can determine an integer C1 that is greater than or equal to its largest ID. For example, when the IDs are 11-digit mobile phone numbers, C1 can be a 12-digit integer. Data party A can send C1 to data party B. Data party B can determine a prime number P that is greater than its own largest ID and greater than C1, and send P to data party A.

In another example, data party B can determine an integer C2 that is greater than or equal to its largest ID. For example, when the IDs are 11-digit mobile phone numbers, C2 can be a 12-digit integer. Data party B can send C2 to data party A. Data party A can determine a prime number P that is greater than its own largest ID and greater than C2, and send P to data party B.

Data party A can randomly generate a positive integer keyA that is coprime with the prime number P; keyA is also called the first key. Data party B can randomly generate a positive integer keyB that is coprime with the prime number P; keyB is also called the second key.
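The initialization can be sketched as follows. The function names are illustrative assumptions, the primality test is plain trial division (adequate for 12-digit bounds, not an optimized method), and restricting the key to the range 2 to P-1 is an assumption made here so that the key is automatically coprime with the prime P and is not the identity.

```python
import math
import secrets


def is_prime(n: int) -> bool:
    """Trial division; fine for bounds of about 12 decimal digits."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def choose_prime(peer_bound: int, own_max_id: int) -> int:
    """Smallest prime strictly greater than both the peer's bound
    (C1 or C2) and this party's own largest ID."""
    candidate = max(peer_bound, own_max_id) + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def generate_key(p: int) -> int:
    """Random key in [2, p-1]; every such integer is coprime with prime p."""
    return 2 + secrets.randbelow(p - 2)
```

For instance, if the peer's bound is 10000 and the local largest ID is 9973, `choose_prime` returns 10007. Each party then calls `generate_key` independently and keeps the result private.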
Through the above steps, data party A and data party B complete the initialization and obtain their respective keys. Next, each party encrypts its own IDs with its own key to obtain first-encrypted IDs and sends them to the other party, which encrypts them a second time with its own key. IDs with the same value still have the same value after the two encryptions, so data party A and data party B can each obtain the IDs shared by both parties without disclosing unencrypted IDs (also called initial IDs) to each other. The details are as follows.

For convenience, the set of IDs owned by data party A, that is, the IDs of the samples in data party A's sample set, is called set_A, and the set of IDs owned by data party B, that is, the IDs of the samples in data party B's sample set, is called set_B. Samples and IDs are in one-to-one correspondence. Before the encryption described below, each ID in set_A and set_B is referred to as the initial ID of a sample.

In step 302a, data party A uses keyA to encrypt each ID (initial ID) in set_A for the first time, obtaining a first-encrypted ID. For example, for each ID in set_A, the product of the ID and keyA is calculated, and the remainder of dividing the product by the prime number P is used as the first-encrypted ID corresponding to that ID. The first-encrypted ID can be written as Encry(ID, keyA).

As shown in FIG. 4, the ID to be encrypted can be any ID in set_A. The initialized p is the prime number P described above, and max(ID) is the largest ID value held by data party A. The ID to be encrypted is multiplied by the key to obtain TMP.
Then the remainder E of TMP modulo the prime number p (that is, the remainder obtained by dividing TMP by p) is used as the encryption result of the ID.

Data party A can perform feature binning on set_A according to the feature values of feature Fa, so that the first-encrypted IDs in set_A are divided into multiple bins. Referring to FIG. 3, feature Fa can be a feature set that includes multiple features such as feature Fa1 and feature Fa2, collectively referred to as Fai, where i can be 1, 2, and so on. Each sample has a feature value of feature Fai (also called the value of feature Fai). For feature Fai, data party A can bin the samples according to the Fai feature value corresponding to each ID in set_A, so that the first-encrypted IDs of set_A are divided into the multiple bins corresponding to feature Fai. Each bin has a bin identifier: for feature Fa1 it can be written as Fa1_bin, and for feature Fa2 as Fa2_bin. Each first-encrypted ID can be associated with its Fa1_bin, Fa2_bin, and so on, written as (Encry(ID, keyA), Fa1_bin, Fa2_bin, ...). Fa1_bin, Fa2_bin, and so on are collectively referred to as Fai_bin, which indicates that the ID was placed into bin Fai_bin according to its feature value of feature Fai.

In one example, an equal-frequency binning algorithm can be used for feature binning. In another example, an equal-width binning algorithm can be used. In yet another example, a chi-square binning algorithm can be used.
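The encryption of FIG. 4 and an equal-frequency binning can be sketched together as below. This is a minimal illustration for a single feature Fa1; the rank-based binning rule, the function names, and the toy numbers in the usage note are assumptions made for the example rather than details given in the specification.

```python
def encrypt_id(id_value: int, key: int, p: int) -> int:
    """TMP = ID * key; E = TMP mod p is the encrypted ID."""
    tmp = id_value * key
    return tmp % p


def equal_frequency_bins(values, n_bins: int):
    """Assign each value a bin index in 0..n_bins-1 by its rank, so that
    the bins hold (nearly) the same number of samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def build_records(ids, labels, fa1_values, key_a: int, p: int, n_bins: int):
    """Records of the form (Encry(ID, keyA), label, Fa1_bin)."""
    bins = equal_frequency_bins(fa1_values, n_bins)
    return [(encrypt_id(i, key_a, p), y, b)
            for i, y, b in zip(ids, labels, bins)]
```

With feature values `[5.0, 1.0, 3.0, 9.0]` and two bins, the two smallest values land in bin 0 and the two largest in bin 1. Only the bin index, not the raw feature value, is placed in the record.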
The first-encrypted ID of each sample in set_A, its label, and the identifiers of the bins into which it falls after binning by the feature values of feature Fai can be associated to obtain the associated information of that first-encrypted ID, written as (Encry(ID, keyA), label, Fa1_bin, Fa2_bin, ...). The associated information of all first-encrypted IDs of set_A constitutes the first exchange information, which data party A can send to data party B.

Each bin can contain multiple IDs, for example K IDs. The feature binning information of A obtained by B is therefore K-anonymized: for any ID, at least K IDs share the same feature binning information, so it is difficult for data party B to infer the correspondence between an ID and its feature information.

In step 302b, data party B uses keyB to encrypt each ID (initial ID) in set_B for the first time, obtaining a first-encrypted ID. For example, for each ID in set_B, the product of the ID and keyB is calculated, and the remainder of dividing the product by the prime number P is used as the first-encrypted ID corresponding to that ID. The first-encrypted ID can be written as Encry(ID, keyB).

Data party B can perform feature binning on set_B according to the feature values of feature Fb, so that the first-encrypted IDs in set_B are divided into multiple bins. Referring to FIG. 3, feature Fb can be a feature set that includes multiple features such as feature Fb1 and feature Fb2, collectively referred to as Fbi, where i can be 1, 2, and so on. Each sample has a feature value of feature Fbi, and set_B can be binned according to the feature values of feature Fbi.
For details, refer to the description of step 302a above, which is not repeated here.

The first-encrypted ID of each sample in set_B can be associated with the identifiers of the bins into which the sample falls after binning by the feature values of Fbi, to obtain the associated information of that first-encrypted ID, written as (Encry(ID, keyB), Fb1_bin, Fb2_bin, ...). The associated information of all first-encrypted IDs of set_B constitutes the third exchange information, which data party B can send to data party A.

In step 304a, after receiving the third exchange information, data party A can use keyA to encrypt each first-encrypted ID of set_B in the third exchange information a second time, obtaining the second-encrypted ID of each first-encrypted ID of set_B. Specifically, the product of the first-encrypted ID and keyA is calculated, and the remainder of dividing the product by the prime number P is used as the second-encrypted ID corresponding to that first-encrypted ID, written as Encry(Encry(ID, keyB), keyA). Together with the bin identifiers, this can be written as (Encry(Encry(ID, keyB), keyA), Fb1_bin, Fb2_bin, ...); this information constitutes the first encrypted set.

In step 306a, the relative order of the second-encrypted IDs of set_B is scrambled, and the scrambled second-encrypted IDs of set_B are sent to data party B as the fourth exchange information.

Note that the first-encrypted IDs of set_B in the third exchange information have a relative order, and after they are encrypted a second time with the first key, the resulting second-encrypted IDs of set_B have the same relative order as the first-encrypted IDs of set_B.
If the second-encrypted IDs of set_B were sent to data party B without scrambling their relative order, data party B could use that order to determine the one-to-one correspondence between the second-encrypted IDs and the first-encrypted IDs of set_B, obtain the first key from it, and then determine the IDs in set_A, which would leak data party A's IDs and its blacklist and whitelist.

In addition, the fourth exchange information does not carry the identifiers of the bins in which the IDs of set_B are located. This prevents data party B from using the bin identifiers of the second-encrypted IDs of set_B to infer the correspondence between each sample's second-encrypted ID and its initial ID (or first-encrypted ID), and thereby obtaining the first key and determining the IDs in set_A, which would leak data party A's IDs and its blacklist and whitelist.

In step 304b, after receiving the first exchange information, data party B can use keyB to encrypt each first-encrypted ID of set_A in the first exchange information a second time, obtaining the second-encrypted ID corresponding to each first-encrypted ID of set_A. Specifically, the product of the first-encrypted ID and keyB is calculated, and the remainder of dividing the product by the prime number P is used as the second-encrypted ID corresponding to that first-encrypted ID, written as Encry(Encry(ID, keyA), keyB). Together with the label and bin identifiers, this can be written as (Encry(Encry(ID, keyA), keyB), label, Fa1_bin, Fa2_bin, ...); this information constitutes the second encrypted set.
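The second encryption and the order scrambling can be sketched with one helper that also drops the fields that must not be returned. The helper name `reencrypt_and_scramble`, its `keep` parameter, and the toy prime and keys in the usage note are assumptions for illustration; `random.SystemRandom` is used here as one reasonable source of randomness, not one named by the specification.

```python
import random


def encrypt_id(value: int, key: int, p: int) -> int:
    """Remainder-based encryption: (value * key) mod p."""
    return (value * key) % p


def reencrypt_and_scramble(records, own_key: int, p: int, keep: int, rng=None):
    """records: tuples whose first field is a first-encrypted ID.

    Encrypts that field a second time with own_key, keeps only the first
    `keep` fields (keep=1: ID only, as in the fourth exchange information;
    keep=2: ID and label, as in the second exchange information), and
    shuffles the result so the received order cannot be matched.
    """
    rng = rng or random.SystemRandom()
    out = [(encrypt_id(r[0], own_key, p),) + tuple(r[1:keep]) for r in records]
    rng.shuffle(out)
    return out
```

As a usage example with P = 101, keyA = 17, and keyB = 29: if data party B sends `(encrypt_id(i, 29, 101), "bin0")` for each of its IDs, data party A's call with `own_key=17, keep=1` yields values equal, as a set, to what B would obtain by encrypting A-encrypted versions of the same IDs, which is the commutativity the protocol relies on.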
In step 306b, the relative order of the second-encrypted IDs of set_A is scrambled, and the scrambled second-encrypted IDs of set_A, together with their respective labels, are sent to data party A as the second exchange information. The order is scrambled, and the identifiers of the bins in which the IDs of set_A are located are not sent to data party A, so as to prevent data party A from inferring the second key.

Through the above steps, each initial ID in set_A and set_B is encrypted twice. An initial ID in set_A is first encrypted by data party A with the first key and then by data party B with the second key. An initial ID in set_B is first encrypted by data party B with the second key and then by data party A with the first key. Data parties A and B exchange the results of their second encryptions, so that both hold the second-encrypted IDs corresponding to the initial IDs in set_A and set_B. Both keys are coprime with the prime number p, and both encryptions take the remainder of dividing the product of the key and the ID by p as the encrypted ID. By the properties of residue systems, this encryption method has the following properties:

Stackability: an ID has the same value range before and after encryption, so encryption can be applied multiple times.

Commutativity: encryption obeys the commutative law. When the same ID is encrypted twice with two different keys, swapping the order of encryption gives the same ciphertext, that is, Encry(Encry(ID, keyA), keyB) = Encry(Encry(ID, keyB), keyA).

Difficulty of decryption:
When the encryption key is unknown, decryption is extremely difficult.

Uniqueness: the encryption results of two IDs are the same if and only if the IDs (integers) are equal.

Next, these properties of the encryption method described in the embodiments of this specification are proved using the properties of residue systems.

In the embodiments of this specification, x mod(y), read as x modulo y, denotes the remainder of dividing x by y. Residue systems have the following properties.

Any two numbers in a complete residue system modulo n have different remainders modulo n, and every positive integer has the same remainder modulo n as some number in the complete residue system modulo n. Within a complete residue system modulo n, the set of representatives that are coprime with n is called a reduced residue system modulo n.

For a prime p and any positive integer a coprime with p, multiply every element of the least reduced residue system modulo p, S={1,2,3,...,(p-1)}, by a to obtain the new set a*S={a,2a,3a,...,(p-1)a}. Then a*S mod(p)=S. The proof is as follows.

If x belongs to S, then by the properties of remainders a*x mod(p) is either 0 or an element of S. Suppose a*x mod(p)=0; then a*x is an integer multiple of p. Because p is prime and x is not divisible by p, a must be divisible by p, which contradicts the condition that a and p are coprime. The assumption therefore fails, a*x mod(p) is not 0, and a*x mod(p) belongs to S.

Suppose x1 and x2 both belong to S with x1>x2, and suppose a*x1 and a*x2 are congruent modulo p, that is, a*x1 mod(p) = a*x2 mod(p). Then a*x1-k1*p = a*x2-k2*p for some integers k1 and k2, which gives a*(x1-x2)=(k1-k2)*p. Because 0<x1-x2<p and p is prime, p does not divide x1-x2.
If a*(x1-x2)=(k1-k2)*p held, then, since p is prime and does not divide x1-x2, p would have to divide a, which contradicts the condition that a and p are coprime. Hence a*x1 and a*x2 are not congruent modulo p. It follows that the p-1 elements of a*S have remainders modulo p that are elements of S and are pairwise distinct, so every element of S is the remainder modulo p of some element of a*S. That is, the set a*S mod(p) equals S.

In the embodiments of this specification, max(ID)<p, so every ID belongs to S={1,2,3,...,(p-1)}, and stackability follows: an element of S still belongs to S after being encrypted in the manner provided by the embodiments of this specification, so the next encryption can be applied.

For a prime p and any positive integers a and b coprime with p, the commutative law b*(a*x mod(p)) mod(p) = a*(b*x mod(p)) mod(p) holds. The proof is as follows.

It is easy to show that x*y mod(z) = [(x mod(z)) * (y mod(z))] mod(z). Therefore b*(a*x mod(p)) mod(p) = {[b mod(p)] * [a*x mod(p)]} mod(p) = {[b mod(p)] * [a mod(p)] * [x mod(p)]} mod(p). Similarly, a*(b*x mod(p)) mod(p) = {[a mod(p)] * [b mod(p)] * [x mod(p)]} mod(p). Hence b*(a*x mod(p)) mod(p) = a*(b*x mod(p)) mod(p).

In the embodiments of this specification, when the same ID is encrypted twice with two different keys, swapping the order of encryption gives the same ciphertext, that is, Encry(Encry(ID, keyA), keyB) = Encry(Encry(ID, keyB), keyA). Commutativity is thus proved.

Given a prime p and the value v of a*x mod(p), where x belongs to the set {1,2,3,...,(p-1)} and a is a positive integer coprime with p, finding x is hard. Proof: there are two unknowns, a and x.
The value of a ranges from 1 to positive infinity and the value of x ranges from 1 to p-1, so there are infinitely many candidate solutions and the value of x cannot be determined. That is, when the encryption key is unknown, decryption is extremely difficult. This establishes the difficulty of decryption.

For a prime p and any positive integer a coprime with p, if m and n are two different elements of S={1,2,3,...,(p-1)}, then a*m mod(p) is never equal to a*n mod(p). The proof is as follows.

Suppose a*m mod(p)=a*n mod(p). Then a*m-k1*p=a*n-k2*p, where k1 and k2 are integers, which gives a*(m-n)=(k1-k2)*p. Because a and p are coprime, m-n must be divisible by p. Because m and n both belong to S, the only possibility is m-n=0, that is, m=n, which contradicts the assumption that m and n are different. Therefore a*m mod(p) is not equal to a*n mod(p).

Therefore, with the encryption method provided in this specification, the encryption results of two IDs are the same if and only if the IDs are equal; when the IDs are not equal, their encryption results are always different.

From the above, when set_A and set_B contain the same ID, the result of encrypting that ID in set_A in the manner described above equals the result of encrypting that ID in set_B in the same manner.

Therefore, in step 308a, data party A can determine the IDs shared by set_A and set_B. The second exchange information carries the label of each ID, and from the third exchange information data party A can obtain the identifiers of the bins in which the shared IDs fall after binning by the feature values of features Fbi (Fb1, Fb2, and so on).

In step 310a, the information value of each feature Fbi can be calculated from the information obtained in step 308a, using the formula shown in FIG. 3.
In the formula, label=1 indicates that the label is positive, and label=0 indicates that the label is negative. For any feature Fbi, Precall_k represents the ratio of the number of positively labeled IDs in bin k to the total number of positively labeled samples among the common samples, and Nrecall_k represents the ratio of the number of negatively labeled IDs in bin k to the total number of negatively labeled samples among the common samples. IV represents the information value.

In step 308b, data party B can determine the IDs shared by set_A and set_B. In addition, the first exchange information carries the label of each ID and the identifier of the bin where it is located. Therefore, in step 310b, the information value of each feature Fai can be calculated.

The method provided by the embodiment of this specification can securely calculate the information value of a feature without disclosing either party's data while the parties' data remain isolated from each other. The details are as follows.
In the process of calculating the information value, data party A obtains data party B's IDs encrypted by keyB together with the corresponding bins of feature Fb, but this data is sufficiently opaque to data party A because: 1) the IDs obtained by data party A are encrypted by keyB, so data party A cannot know the original IDs behind them and therefore cannot match the Fb binning results to real IDs; 2) the binning information used when calculating the information value is independent of the order of the bins, so the bin identifiers that data party B transmits to data party A can be out of order (this can be done when the order of the second encrypted IDs is shuffled), or the bin identifiers can be mere codes, so that data party A cannot know the order of feature magnitudes corresponding to the bins; 3) each bin of the feature contains K IDs, which is equivalent to K-anonymizing the information that data party A obtains about data party B's features: for any ID, at least K IDs carry the same information.

Data party A also obtains the result of the second encryption of its own IDs. These encrypted IDs have been shuffled by data party B and carry no additional identifying information. Data party A therefore knows only that these are the results of encrypting its own IDs and that a one-to-one correspondence exists, but not which encrypted ID corresponds to which original ID. Data party A performs matching, intersection, and calculation after obtaining these two pieces of information. These operations are in effect performed in an ID-encrypted space, and the mapping between this encrypted space and the original space is unknown (determining the mapping requires knowing both parties' keys, keyA and keyB), so the calculation is secure.
Similarly, the data available to data party B is not sufficient for data party B to derive data party A's data.

Referring to FIG. 5, an embodiment of this specification provides a privacy-preserving method for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device. The first device stores a first sample set and the label of each sample therein, the second device stores a second sample set, and the method is applied to the first device. As shown in FIG. 5, the method includes the following steps.

Step 501: Use the first key to encrypt the initial ID of each sample in the first sample set to obtain the first encrypted ID of each sample in the first sample set. For details, please refer to the above description of step 302a in FIG. 3, which will not be repeated here.

It should be understood that step 302a was described in conjunction with the remainder encryption algorithm. The remainder encryption algorithm requires little computation and offers high security, making it a preferred encryption algorithm. It should also be understood that the remainder encryption algorithm is not the only usable encryption algorithm: any encryption algorithm that satisfies superimposability, exchangeability, and uniqueness can be used to encrypt the sample IDs in step 302a and step 302b.

In the embodiment of this specification, data party A and data party B may negotiate another encryption algorithm in advance. The encryption algorithm here can be any algorithm for which, when target data is encrypted with the same set of keys, the order in which the keys are used does not affect the encryption result. In addition to the remainder encryption algorithm described in the embodiment shown in FIG. 3, the encryption algorithm here can also be any one of an exclusive OR (XOR) algorithm, a DH algorithm, an ECC-DH algorithm, and the like.
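The three properties required of the encryption algorithm can be checked by brute force for the remainder encryption algorithm on a small example. The following is a minimal sketch, not the embodiment's implementation: the function name encry and the values p=101, key_a=37, and key_b=58 are illustrative assumptions, and a real deployment would choose p larger than the largest ID held by either party.

```python
from math import gcd


def encry(x: int, key: int, p: int) -> int:
    """Remainder encryption described in the text: (key * x) mod p."""
    return (key * x) % p


# Illustrative values only (assumptions): a small prime and two keys coprime to it.
p = 101
key_a, key_b = 37, 58
assert gcd(key_a, p) == 1 and gcd(key_b, p) == 1

S = set(range(1, p))  # the set {1, 2, ..., p-1}

# Superimposability: every ciphertext stays inside S, so it can be encrypted again.
closed = all(encry(x, key_a, p) in S for x in S)

# Exchangeability: the order in which the two keys are applied does not matter.
commutes = all(
    encry(encry(x, key_a, p), key_b, p) == encry(encry(x, key_b, p), key_a, p)
    for x in S
)

# Uniqueness: distinct IDs give distinct ciphertexts, so encryption permutes S.
unique = {encry(x, key_a, p) for x in S} == S

print(closed, commutes, unique)
```

The exhaustive check is feasible only because p is tiny here; for realistic ID ranges the properties follow from the proofs given earlier rather than from enumeration.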
Step 503: Send first exchange information to the second device, which includes at least the first encrypted ID and the label of each sample in the first sample set. For details, please refer to the above description of step 302a in FIG. 3, which will not be repeated here.

Step 505: Receive second exchange information and third exchange information from the second device. The second exchange information includes the second encrypted ID obtained by the second device using a second key to perform secondary encryption on the first encrypted ID of each sample in the first sample set, together with the corresponding label, and the relative order of the samples in the second exchange information has been shuffled by the second device. The third exchange information includes, for each sample in the second sample set, the first encrypted ID obtained by the second device encrypting its initial ID based on the second key and the identifier of the first bin where the sample is located; the identifier of the first bin is obtained by the second device binning based on the feature value of the first feature of each sample in the second sample set. For details, please refer to the above description of steps 302b, 304b, and 306b in FIG. 3, which will not be repeated here.

Step 507: Use the first key to perform secondary encryption on the first encrypted ID of each sample in the third exchange information to obtain a first encrypted set. For details, please refer to the above description of step 304a in FIG. 3, which will not be repeated here.

Step 509: Based on the second encrypted IDs in the second exchange information and the second encrypted IDs in the first encrypted set, determine the common samples of the first sample set and the second sample set. For details, please refer to the above description of step 308a in FIG. 3, which will not be repeated here.
Step 511: Determine the information value of the first feature based on the label of each sample in the common samples and the identifier of the first bin where it is located, so as to perform feature selection for the machine learning model. For details, please refer to the above description of step 310a in FIG. 3, which will not be repeated here.

In some embodiments, the method further includes: before sending the first exchange information to the second device, dividing the first sample set into a plurality of second bins based on the feature value of the second feature of each sample in the first sample set, and including the identifier of the second bin where each sample in the first sample set is located in the first exchange information; after obtaining the first encrypted set, shuffling the relative order of the samples of the second sample set to obtain fourth exchange information; and sending the fourth exchange information to the second device, so that the second device determines the common samples based on the second encrypted IDs in the fourth exchange information and the second encrypted IDs in a second encrypted set, and determines the information value of the second feature based on the label of each sample in the common samples and the identifier of the second bin where it is located, where the second encrypted set is obtained by using the second key to perform secondary encryption on the first encrypted IDs in the first exchange information. For details, please refer to the above description of steps 302a, 306a, 308b, and 310b in FIG. 3, which will not be repeated here.
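Steps 501 to 511 above can be sketched end to end on toy data. This is a minimal sketch under stated assumptions rather than the embodiment's implementation: the second device is simulated in the same process, the names encry and information_value, the prime 101, the keys 37 and 58, and the sample data are all hypothetical, the information value uses the conventional definition (an assumption about the formula of FIG. 3), and bins with a zero count are skipped, which is a choice made here and not something the text specifies.

```python
import math
import random


def encry(x: int, key: int, p: int) -> int:
    """Remainder encryption described in the text: (key * x) mod p."""
    return (key * x) % p


def information_value(records):
    """records: (bin_identifier, label) pairs for the common samples, label in {0, 1}.

    Uses the conventional IV definition, which is an assumption about FIG. 3.
    """
    pos_total = sum(1 for _, y in records if y == 1)
    neg_total = sum(1 for _, y in records if y == 0)
    iv = 0.0
    for bin_id in {b for b, _ in records}:
        pos = sum(1 for b, y in records if b == bin_id and y == 1)
        neg = sum(1 for b, y in records if b == bin_id and y == 0)
        if pos == 0 or neg == 0:
            continue  # assumption: skip bins whose log term is undefined
        p_recall, n_recall = pos / pos_total, neg / neg_total
        iv += (p_recall - n_recall) * math.log(p_recall / n_recall)
    return iv


p, key_a, key_b = 101, 37, 58  # illustrative prime and keys (assumptions)

# First device: initial ID -> label. Second device: initial ID -> opaque bin code.
set_a = {1: 1, 2: 0, 3: 1, 4: 0, 5: 1, 6: 0, 7: 1, 8: 0}
set_b = {3: "x", 4: "x", 5: "y", 6: "y", 7: "x", 8: "y", 9: "x"}

# Steps 501 and 503: encrypt the first device's IDs with key_a, send with labels.
first_exchange = [(encry(i, key_a, p), y) for i, y in set_a.items()]

# Simulated second device: secondary encryption plus shuffle (second exchange
# information) and first encryption of its own IDs with bin codes (third).
second_exchange = [(encry(c, key_b, p), y) for c, y in first_exchange]
random.shuffle(second_exchange)
third_exchange = [(encry(i, key_b, p), b) for i, b in set_b.items()]

# Step 507: secondary encryption of the second device's IDs with key_a.
first_encrypted_set = {encry(c, key_a, p): b for c, b in third_exchange}

# Step 509: common samples are those whose second encrypted IDs match.
common = [
    (first_encrypted_set[c], y) for c, y in second_exchange if c in first_encrypted_set
]

# Step 511: information value of the first feature over the common samples.
iv = information_value(common)
print(len(common), round(iv, 4))
```

The matching in step 509 never touches an original ID: both lists hold doubly encrypted values, and exchangeability guarantees that a shared ID yields the same value on both sides regardless of which key was applied first.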
In an example of this embodiment, dividing the first sample set into a plurality of second bins based on the feature value of the second feature of each sample in the first sample set includes: dividing the first sample set into the plurality of second bins according to any one of equal-frequency binning, equal-distance binning, and chi-square binning.

In some embodiments, the initial ID of each sample in the first sample set and the initial ID of each sample in the second sample set are both positive integers; before the first key is used to encrypt the initial ID of each sample in the first sample set, the method further includes: determining a first prime number that is greater than the largest initial ID in the first sample set and greater than the largest initial ID in the second sample set; and determining a first positive integer that is relatively prime to the first prime number as the first key. For details, please refer to the above description of step 300a and step 300b in FIG. 3, which will not be repeated here.

In some embodiments, using the first key to encrypt the initial ID of each sample in the first sample set to obtain the first encrypted ID of each sample in the first sample set includes: for each sample in the first sample set, determining the remainder of the product of the sample's initial ID and the first key divided by the first prime number as the first encrypted ID of the sample. For details, please refer to the above description of step 302 in FIG. 3, which will not be repeated here.

In some embodiments, the first sample set includes a plurality of samples with positive labels and a plurality of samples with negative labels; determining the information value of the first feature based on the label of each sample in the common samples and the identifier of the first bin where it is located includes: determining a first ratio of the number of positively labeled common samples that fall into the first bin having a first identifier to the total number of positively labeled common samples; determining a second ratio of the number of negatively labeled common samples that fall into the first bin having the first identifier to the total number of negatively labeled common samples; and determining the information value of the first feature of the common samples based on the first ratio and the second ratio corresponding to each identified first bin. For details, please refer to the above description of step 310a in FIG. 3, which will not be repeated here.

In some embodiments, the samples in the first sample set include user samples, and the machine learning model is a user classification model; or, the samples in the first sample set include business samples, and the machine learning model is a business processing model.

The method provided by the embodiment of this specification can calculate the information value of the features of the users shared by both parties when neither party knows the other party's users and the label data and the feature data are isolated from each other, and it offers high security.

Referring to FIG. 6, an embodiment of this specification provides a privacy-preserving method for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device. The first device stores a first sample set and the label of each sample therein, the second device stores a second sample set, and the method is applied to the second device. As shown in FIG. 6, the method includes the following steps.
Step 601: Receive first exchange information from the first device, which includes at least the first encrypted ID obtained by the first device using the first key to encrypt the initial ID of each sample in the first sample set, together with the corresponding label. For details, please refer to the above description of step 302a in FIG. 3, which will not be repeated here.

Step 603: Use the second key to perform secondary encryption on the first encrypted ID of each sample in the first exchange information to obtain a second encrypted set, and then shuffle the relative order of the samples in the second encrypted set. For details, please refer to the above description of steps 304b and 306b in FIG. 3, which will not be repeated here.

Step 605: Send second exchange information to the first device, where the second exchange information includes the second encrypted IDs and labels of the samples in the first sample set whose relative order has been shuffled. For details, please refer to the above description of step 306b in FIG. 3, which will not be repeated here.

Step 607: Use the second key to encrypt the initial ID of each sample in the second sample set to obtain the first encrypted IDs of the second sample set. For details, please refer to the above description of step 302b in FIG. 3, which will not be repeated here.

Step 609: Based on the feature value of the first feature of each sample in the second sample set, divide the second sample set into a plurality of first bins. For details, please refer to the above description of step 302b in FIG. 3, which will not be repeated here.

Step 611: Send third exchange information to the first device. The third exchange information includes the first encrypted ID of each sample in the second sample set and the identifier of the first bin where it is located, so that the first device uses the first key to perform secondary encryption on the first encrypted IDs in the third exchange information to obtain a first encrypted set, determines the common samples of the first sample set and the second sample set based on the second encrypted IDs in the first encrypted set and the second encrypted IDs in the second exchange information, and determines the information value of the first feature based on the label of each sample in the common samples and the identifier of the first bin where it is located, for feature selection for the machine learning model. For details, please refer to the above description of step 302b in FIG. 3, which will not be repeated here.

In some embodiments, the first exchange information further includes the identifier of the second bin of each sample in the first sample set, and the identifier of the second bin is obtained by the first device binning based on the feature value of the second feature of each sample in the first sample set; the method further includes: receiving fourth exchange information from the first device, where the fourth exchange information includes the second encrypted ID of each sample in the second sample set, and the relative order of the samples in the fourth exchange information has been shuffled by the first device; determining the common samples of the first sample set and the second sample set based on the second encrypted IDs in the second encrypted set and the second encrypted IDs in the fourth exchange information; and determining the information value of the second feature based on the label of each sample in the common samples and the identifier of the second bin where it is located, for feature selection for the machine learning model. For details, please refer to the above description of steps 302a, 304a, 306a, 308b, and 310b in FIG. 3, which will not be repeated here.
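The second device's side of the exchange (steps 603 to 611) can be sketched as a single function. This is a minimal sketch under assumptions: equal-frequency binning is chosen from the binning options the description lists, the names encry, equal_frequency_bins, and second_device_round are hypothetical, and the prime, keys, and feature values are illustrative only. The sketch uses bin indices as identifiers for brevity; as the description notes, the identifiers could instead be arbitrary codes so that the order of feature magnitudes is not revealed.

```python
import random


def encry(x: int, key: int, p: int) -> int:
    """Remainder encryption described in the text: (key * x) mod p."""
    return (key * x) % p


def equal_frequency_bins(features: dict, n_bins: int) -> dict:
    """Map each ID to a bin index so that bins hold roughly equal numbers of samples."""
    ordered = sorted(features, key=features.get)  # IDs sorted by feature value
    size = -(-len(ordered) // n_bins)  # ceiling division: samples per bin
    return {sample_id: rank // size for rank, sample_id in enumerate(ordered)}


def second_device_round(first_exchange, features_b, key_b, p, n_bins=2):
    # Step 603: secondary encryption of the received first encrypted IDs, then shuffle.
    second_exchange = [(encry(c, key_b, p), label) for c, label in first_exchange]
    random.shuffle(second_exchange)
    # Step 609: bin the second device's own samples by feature value.
    bins = equal_frequency_bins(features_b, n_bins)
    # Steps 607 and 611: first encryption of own IDs, paired with bin identifiers.
    third_exchange = [(encry(i, key_b, p), bins[i]) for i in features_b]
    return second_exchange, third_exchange


p, key_a, key_b = 101, 37, 58  # illustrative prime and keys (assumptions)
features_b = {3: 0.9, 4: 0.1, 5: 0.5, 6: 0.7}  # initial ID -> value of the first feature
first_exchange = [(encry(1, key_a, p), 1), (encry(3, key_a, p), 0)]

second_exchange, third_exchange = second_device_round(first_exchange, features_b, key_b, p)
print(second_exchange)
print(third_exchange)
```

Labels pass through step 603 unchanged and stay attached to their encrypted IDs, which is what lets the first device recover label counts per bin after matching.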
The method provided by the embodiment of this specification can calculate the information value of the features of the users shared by both parties when neither party knows the other party's users and the label data and the feature data are isolated from each other, and it offers high security.

Referring to FIG. 7, an embodiment of this specification provides a privacy-preserving apparatus 700 for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device. The first device stores a first sample set and the label of each sample therein, the second device stores a second sample set, and the apparatus is configured in the first device. As shown in FIG. 7, the apparatus 700 includes: a first encryption unit 710, configured to use a first key to encrypt the initial ID of each sample in the first sample set to obtain the first encrypted ID of each sample in the first sample set; a first sending unit 720, configured to send first exchange information to the second device, which includes at least the first encrypted ID and the label of each sample in the first sample set; a first receiving unit 730, configured to receive second exchange information and third exchange information from the second device, where the second exchange information includes the second encrypted ID obtained by the second device using a second key to perform secondary encryption on the first encrypted ID of each sample in the first sample set, together with the corresponding label, and the relative order of the samples in the second exchange information has been shuffled by the second device; the third exchange information includes, for each sample in the second sample set, the first encrypted ID obtained by the second device encrypting its initial ID based on the second key and the identifier of the first bin where the sample is located, the identifier of the first bin being obtained by the second device binning based on the feature value of the first feature of each sample in the second sample set; a second encryption unit 740, configured to use the first key to perform secondary encryption on the first encrypted ID of each sample in the third exchange information to obtain the second encrypted ID of each sample in the second sample set; a first determining unit 750, configured to determine the common samples of the first sample set and the second sample set based on the second encrypted ID of each sample in the first sample set and the second encrypted ID of each sample in the second sample set; and a second determining unit 760, configured to determine the information value of the first feature based on the label of each sample in the common samples and the identifier of the first bin where it is located, for feature selection for the machine learning model.

The functions of each functional unit of the apparatus 700 can be implemented with reference to the method embodiment shown in FIG. 5, and details are not described herein again.

The apparatus provided by the embodiment of this specification can calculate the information value of the features of the users shared by both parties when neither party knows the other party's users and the label data and the feature data are isolated from each other, and it offers high security.

Referring to FIG. 8, an embodiment of this specification provides a privacy-preserving apparatus for multi-party joint feature evaluation. The multiple parties include at least a first device and a second device.
The first device stores a first sample set and the label of each sample therein, the second device stores a second sample set, and the apparatus is configured in the second device. The apparatus includes: a second receiving unit 810, configured to receive first exchange information from the first device, which includes at least the first encrypted ID obtained by the first device using the first key to encrypt the initial ID of each sample in the first sample set, together with the corresponding label; a third encryption unit 820, configured to use the second key to perform secondary encryption on the first encrypted ID of each sample in the first exchange information to obtain a second encrypted set, and then shuffle the relative order of the samples in the second encrypted set; a second sending unit 830, configured to send second exchange information to the first device, where the second exchange information includes the second encrypted IDs and labels of the samples in the first sample set whose relative order has been shuffled; a fourth encryption unit 840, configured to use the second key to encrypt the initial ID of each sample in the second sample set to obtain the first encrypted IDs of the second sample set; and a second binning unit 850, configured to divide the second sample set into a plurality of first bins based on the feature value of the first feature of each sample in the second sample set. The second sending unit 830 is further configured to send third exchange information to the first device, where the third exchange information includes the first encrypted ID of each sample in the second sample set and the identifier of the first bin where the sample is located, so that the first device uses the first key to perform secondary encryption on the first encrypted IDs in the third exchange information to obtain a first encrypted set, determines the common samples of the first sample set and the second sample set based on the second encrypted IDs in the first encrypted set and the second encrypted IDs in the second exchange information, and determines the information value of the first feature based on the label of each sample in the common samples and the identifier of the first bin where it is located, for feature selection for the machine learning model.

The functions of each functional unit of the apparatus 800 can be implemented with reference to the method embodiment shown in FIG. 6, and details are not described herein again.

The apparatus provided by the embodiment of this specification can calculate the information value of the features of the users shared by both parties when neither party knows the other party's users and the label data and the feature data are isolated from each other, and it offers high security.

In another aspect, the embodiments of this specification provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to execute the method shown in FIG. 5 or the method shown in FIG. 6.

In another aspect, the embodiments of this specification provide a computing terminal, including a memory and a processor. The memory stores executable code, and when the processor executes the executable code, the method shown in FIG. 5 or the method shown in FIG. 6 is implemented.

Those skilled in the art should be aware that in one or more of the above examples, the functions described in this specification can be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.

The specific embodiments described above further describe the purpose, technical solutions, and beneficial effects of the present invention in detail.
It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, etc. made on the basis of the technical solution of the present invention shall be included in the protection scope of the present invention.

300a, 300b, 302a, 302b, 304a, 304b, 306a, 306b, 308a, 308b, 310a, 310b, 501, 503, 505, 507, 509, 511: steps
700, 800: apparatuses
710: first encryption unit
720: first sending unit
730: first receiving unit
740: second encryption unit
750: first determining unit
760: second determining unit
810: second receiving unit
820: third encryption unit
830: second sending unit
840: fourth encryption unit
850: second binning unit

In order to explain the technical solutions of the embodiments of this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification; those of ordinary skill in the art can obtain other drawings from them without creative effort.
[FIG. 1A] is a schematic diagram of the data of data party A according to an embodiment;
[FIG. 1B] is a schematic diagram of the data of data party B according to an embodiment;
[FIG. 2] is a flowchart of jointly calculating the information value of a feature according to an embodiment;
[FIG. 3] is a flowchart of a privacy-preserving method for multi-party joint feature evaluation according to an embodiment;
[FIG. 4] is a flowchart of encrypting IDs according to an embodiment;
[FIG. 5] is a flowchart of a privacy-preserving method for multi-party joint feature evaluation according to an embodiment;
[FIG. 6] is a flowchart of a privacy-preserving method for multi-party joint feature evaluation according to an embodiment;
[FIG. 7] is a schematic block diagram of a privacy-preserving apparatus for multi-party joint feature evaluation according to an embodiment;
[FIG. 8] is a schematic block diagram of a privacy-preserving apparatus for multi-party joint feature evaluation according to an embodiment.

Claims (20)

A privacy-preserving method for multiple parties to jointly evaluate the correlation between features and labels, the multiple parties including at least a first device and a second device, the first device storing a first sample set and the label of each sample therein, and the second device storing a second sample set, the method being applied to the first device; the method includes: using a first key to encrypt the initial ID of each sample in the first sample set to obtain a first encrypted ID of each sample in the first sample set; sending first exchange information to the second device, which includes at least the first encrypted ID and the label of each sample in the first sample set; receiving second exchange information and third exchange information from the second device, wherein the second exchange information includes the second encrypted ID obtained by the second device using a second key to perform secondary encryption on the first encrypted ID of each sample in the first sample set, together with the corresponding label, and the relative order of the samples in the second exchange information has been shuffled by the second device; the third exchange information includes, for each sample in the second sample set, the first encrypted ID obtained by the second device encrypting its initial ID based on the second key and the identifier of the first bin where the sample is located, the identifier of the first bin being obtained by the second device binning based on the feature value of a first feature of each sample in the second sample set; using the first key to perform secondary encryption on the first encrypted ID of each sample in the third exchange information to obtain a first encrypted set; determining the common samples of the first sample set and the second sample set based on the second encrypted IDs in the second exchange information and the second encrypted IDs in the first encrypted set; and determining the information value of the first feature based on the label of each sample in the common samples and the identifier of the first bin where it is located, for feature selection for a machine learning model.
The method according to claim 1, wherein the method further includes: before sending the first exchange information to the second device, dividing the first sample set into a plurality of second bins based on the feature value of a second feature of each sample in the first sample set, and including the identifier of the second bin where each sample in the first sample set is located in the first exchange information; after obtaining the first encrypted set, shuffling the relative order of the samples of the second sample set to obtain fourth exchange information; and sending the fourth exchange information to the second device, so that the second device determines the common samples based on the second encrypted IDs in the fourth exchange information and the second encrypted IDs in a second encrypted set, and determines the information value of the second feature based on the label of each sample in the common samples and the identifier of the second bin where it is located, wherein the second encrypted set is obtained by using the second key to perform secondary encryption on the first encrypted IDs in the first exchange information.

The method according to claim 2, wherein dividing the first sample set into a plurality of second bins based on the feature value of the second feature of each sample in the first sample set includes: dividing the first sample set into the plurality of second bins according to any one of equal-frequency binning, equal-distance binning, and chi-square binning.
The method according to claim 1, wherein the initial ID of each sample in the first sample set and the initial ID of each sample in the second sample set are both positive integers; before the first key is used to encrypt the initial ID of each sample in the first sample set, the method further includes: determining a first prime number that is greater than the largest initial ID in the first sample set and greater than the largest initial ID in the second sample set; and determining a first positive integer that is relatively prime to the first prime number as the first key.

The method according to claim 4, wherein using the first key to encrypt the initial ID of each sample in the first sample set to obtain the first encrypted ID of each sample in the first sample set includes: for each sample in the first sample set, determining the remainder of the product of the sample's initial ID and the first key divided by the first prime number as the first encrypted ID of the sample.
The method according to claim 1, wherein the first sample set includes a plurality of samples with positive labels and a plurality of samples with negative labels; and determining the information value of the first feature based on the label of each common sample and the identifier of the first bin in which it is located comprises: determining a first ratio of the number of common samples that fall into the first bin having a first identifier and have a positive label to the total number of common samples with a positive label; determining a second ratio of the number of common samples that fall into the first bin having the first identifier and have a negative label to the total number of common samples with a negative label; and determining the information value of the first feature over the common samples based on the first ratio and the second ratio corresponding to the first bin of each identifier.
The method according to claim 1, wherein the samples in the first sample set include user samples and the machine learning model is a user classification model; or the samples in the first sample set include business samples and the machine learning model is a business processing model.
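The per-bin first ratio and second ratio recited above can be combined into an information value as in the following Python sketch. The claim text here does not spell out the combining formula, so the sketch assumes the commonly used weight-of-evidence form, IV = sum over bins of (first ratio - second ratio) * ln(first ratio / second ratio). The function name `information_value`, the convention that label 1 is positive, and the choice to skip bins where either ratio is zero are all assumptions made for illustration.

```python
import math
from collections import Counter


def information_value(labels, bin_ids):
    """IV over the common samples, assuming the usual WOE-based definition."""
    pos_total = sum(1 for y in labels if y == 1)
    neg_total = len(labels) - pos_total
    pos_in_bin = Counter(b for y, b in zip(labels, bin_ids) if y == 1)
    neg_in_bin = Counter(b for y, b in zip(labels, bin_ids) if y != 1)

    iv = 0.0
    for b in set(bin_ids):
        first_ratio = pos_in_bin[b] / pos_total    # share of all positives in this bin
        second_ratio = neg_in_bin[b] / neg_total   # share of all negatives in this bin
        if first_ratio == 0 or second_ratio == 0:
            continue  # assumption: skip bins where the logarithm is undefined
        iv += (first_ratio - second_ratio) * math.log(first_ratio / second_ratio)
    return iv


# Hypothetical common samples: label and bin identifier per sample.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
bin_ids = [0, 0, 1, 1, 1, 0, 1, 0]
iv = information_value(labels, bin_ids)

# A feature whose bins carry no label information yields an IV of zero.
iv_uninformative = information_value([1, 0, 1, 0], [0, 0, 1, 1])
```

In this example, bin 0 holds three of the four positives and one of the four negatives, and bin 1 holds the reverse, so each bin contributes 0.5 * ln 3 and the total is ln 3. The sketch also assumes that the common samples contain at least one positive and one negative label; otherwise the ratios are undefined.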
A privacy-preserving method for multiple parties to jointly evaluate the correlation between features and labels, the multiple parties including at least a first device and a second device, the first device storing a first sample set and the label of each sample therein, the second device storing a second sample set, the method being applied to the second device and comprising: receiving first exchange information from the first device, the first exchange information including at least, for each sample in the first sample set, a first-encryption ID obtained by the first device encrypting the sample's initial ID with a first key, and the corresponding label; using a second key to encrypt a second time the first-encryption ID of each sample in the first exchange information to obtain a second encrypted set, and then shuffling the relative order of the samples in the second encrypted set; sending second exchange information to the first device, the second exchange information including the second-encryption ID and label of each sample of the first sample set in the shuffled relative order; encrypting the initial ID of each sample in the second sample set with the second key to obtain the first-encryption IDs of the second sample set; dividing the second sample set into a plurality of first bins based on the feature value of a first feature of each sample in the second sample set; and sending third exchange information to the first device, the third exchange information including the first-encryption ID of each sample in the second sample set and the identifier of the first bin in which it is located, so that the first device uses the first key to encrypt a second time the first-encryption IDs in the third exchange information to obtain a first encrypted set, determines the common samples of the first sample set and the second sample set based on the second-encryption IDs in the first encrypted set and the second-encryption IDs in the second exchange information, and determines the information value of the first feature based on the label of each common sample and the identifier of the first bin in which it is located, for use in feature selection for a machine learning model.
The method according to claim 8, wherein the first exchange information further includes the identifier of the second bin in which each sample in the first sample set is located, the identifiers of the second bins being obtained by the first device binning the first sample set based on the feature value of a second feature of each sample in the first sample set; the method further comprising: receiving fourth exchange information from the first device, the fourth exchange information including the second-encryption ID of each sample in the second sample set, the relative order of the samples in the fourth exchange information having been shuffled by the first device; determining the common samples of the first sample set and the second sample set based on the second-encryption IDs of the second encrypted set and the second-encryption IDs in the fourth exchange information; and determining the information value of the second feature based on the label of each common sample and the identifier of the second bin in which it is located, for use in feature selection for a machine learning model.
A privacy-preserving apparatus for multiple parties to jointly evaluate the correlation between features and labels, the multiple parties including at least a first device and a second device, the first device storing a first sample set and the label of each sample therein, the second device storing a second sample set, the apparatus being deployed in the first device and comprising: a first encryption unit, configured to encrypt the initial ID of each sample in the first sample set with a first key to obtain the first-encryption ID of each sample in the first sample set; a first sending unit, configured to send first exchange information to the second device, the first exchange information including at least the first-encryption ID and label of each sample in the first sample set; a first receiving unit, configured to receive second exchange information and third exchange information from the second device, wherein the second exchange information includes the second-encryption ID obtained by the second device using a second key to encrypt a second time the first-encryption ID of each sample in the first sample set, and the corresponding label, the relative order of the samples in the second exchange information having been shuffled by the second device, and wherein the third exchange information includes, for each sample in the second sample set, the first-encryption ID obtained by the second device encrypting the sample's initial ID with the second key and the identifier of the first bin in which the sample is located, the identifiers of the first bins being obtained by the second device binning the second sample set based on the feature value of a first feature of each sample in the second sample set; a second encryption unit, configured to use the first key to encrypt a second time the first-encryption ID of each sample in the third exchange information to obtain a first encrypted set; a first determination unit, configured to determine the common samples of the first sample set and the second sample set based on the second-encryption IDs in the second exchange information and the second-encryption IDs in the first encrypted set; and a second determination unit, configured to determine the information value of the first feature based on the label of each common sample and the identifier of the first bin in which it is located, for use in feature selection for a machine learning model.
The apparatus according to claim 10, further comprising a first binning unit and a first shuffling unit; the first binning unit being configured to, before the first exchange information is sent to the second device, divide the first sample set into a plurality of second bins based on the feature value of a second feature of each sample in the first sample set, and include in the first exchange information the identifier of the second bin in which each sample in the first sample set is located; the first shuffling unit being configured to, after the first encrypted set is obtained, shuffle the relative order of the samples of the second sample set to obtain fourth exchange information; and the first sending unit being further configured to send the fourth exchange information to the second device, so that the second device determines the common samples based on the second-encryption IDs in the fourth exchange information and the second-encryption IDs in a second encrypted set, and determines the information value of the second feature based on the label of each common sample and the identifier of the second bin in which it is located, wherein the second encrypted set is obtained by using the second key to encrypt a second time the first-encryption IDs in the first exchange information.
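The exchange recited in the method and apparatus claims above can be traced end to end with the following Python sketch. It is a simplified single-process illustration, not the patented implementation: the shared prime `PRIME`, the helper `enc`, the variable names, and all sample data are hypothetical, and the remainder-based encryption of the dependent claims is assumed.

```python
import random

PRIME = 101  # assumption: a prime shared by both devices, larger than every initial ID


def enc(value, key):
    """Remainder-based encryption: (value * key) mod PRIME."""
    return (value * key) % PRIME


rng = random.Random(7)
key_1 = rng.randrange(2, PRIME)  # first device's key
key_2 = rng.randrange(2, PRIME)  # second device's key

# Hypothetical data. First device: initial ID -> label.
first_set = {11: 1, 12: 0, 13: 1, 14: 0}
# Second device: initial ID -> identifier of the first bin (from its first feature).
second_set = {12: "bin_a", 13: "bin_b", 14: "bin_a", 15: "bin_b"}

# First exchange information (first device -> second device).
first_info = [(enc(i, key_1), label) for i, label in first_set.items()]

# Second device: second encryption, then shuffle -> second exchange information.
second_info = [(enc(c, key_2), label) for c, label in first_info]
rng.shuffle(second_info)

# Second device: first encryption of its own IDs plus bin ids -> third exchange information.
third_info = [(enc(i, key_2), bin_id) for i, bin_id in second_set.items()]

# First device: second encryption of the third exchange information -> first encrypted set.
first_encrypted_set = {enc(c, key_1): bin_id for c, bin_id in third_info}

# First device: match second-encryption IDs to find the common samples.
common = [
    (label, first_encrypted_set[c])
    for c, label in second_info
    if c in first_encrypted_set
]
```

After matching, the first device holds a (label, bin identifier) pair for each of the three common samples without having seen the second device's initial IDs in the clear, and the shuffle prevents it from aligning the returned rows with its own input order. The pairs in `common` are the input to the information-value determination. The reverse direction, in which the second device evaluates the second feature from the fourth exchange information, follows the same pattern with the roles swapped.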
The apparatus according to claim 11, wherein the first binning unit is configured to divide the first sample set into the plurality of second bins using any one of equal-frequency binning, equal-width binning, and chi-square binning.
The apparatus according to claim 10, wherein the initial ID of each sample in the first sample set and the initial ID of each sample in the second sample set are positive integers; the apparatus further comprising a third determination unit and a fourth determination unit; the third determination unit being configured to determine a first prime number that is greater than the largest initial ID among the samples of the first sample set and greater than the largest initial ID among the samples of the second sample set; and the fourth determination unit being configured to determine, as the first key, a first positive integer that is coprime with the first prime number.
The apparatus according to claim 13, wherein the first encryption unit is further configured to, for each sample in the first sample set, determine, as the first-encryption ID of the sample, the remainder obtained by dividing the product of the sample's initial ID and the first key by the first prime number.
The apparatus according to claim 10, wherein the second determination unit is further configured to: determine a first ratio of the number of common samples that fall into the first bin having a first identifier and have a positive label to the total number of common samples with a positive label; determine a second ratio of the number of common samples that fall into the first bin having the first identifier and have a negative label to the total number of common samples with a negative label; and determine the information value of the first feature over the common samples based on the first ratio and the second ratio corresponding to the first bin of each identifier.
The apparatus according to claim 10, wherein the samples in the first sample set include user samples and the machine learning model is a user classification model; or the samples in the first sample set include business samples and the machine learning model is a business processing model.
A privacy-preserving apparatus for multiple parties to jointly evaluate the correlation between features and labels, the multiple parties including at least a first device and a second device, the first device storing a first sample set and the label of each sample therein, the second device storing a second sample set, the apparatus being deployed in the second device and comprising: a second receiving unit, configured to receive first exchange information from the first device, the first exchange information including at least, for each sample in the first sample set, a first-encryption ID obtained by the first device encrypting the sample's initial ID with a first key, and the corresponding label; a third encryption unit, configured to use a second key to encrypt a second time the first-encryption ID of each sample in the first exchange information to obtain a second encrypted set, and then shuffle the relative order of the samples of the first sample set; a second sending unit, configured to send second exchange information to the first device, the second exchange information including the second-encryption ID and label of each sample of the first sample set in the shuffled relative order; a fourth encryption unit, configured to encrypt the initial ID of each sample in the second sample set with the second key to obtain the first-encryption IDs of the second sample set; and a second binning unit, configured to divide the second sample set into a plurality of first bins based on the feature value of a first feature of each sample in the second sample set; the second sending unit being further configured to send third exchange information to the first device, the third exchange information including the first-encryption ID of each sample in the second sample set and the identifier of the first bin in which it is located, so that the first device uses the first key to encrypt a second time the first-encryption IDs in the third exchange information to obtain a first encrypted set, determines the common samples of the first sample set and the second sample set based on the second-encryption IDs in the first encrypted set and the second-encryption ID of each sample in the second exchange information, and determines the information value of the first feature based on the label of each common sample and the identifier of the first bin in which it is located, for use in feature selection for a machine learning model.
The apparatus according to claim 17, wherein the first exchange information further includes the identifier of the second bin in which each sample in the first sample set is located, the identifiers of the second bins being obtained by the first device binning the first sample set based on the feature value of a second feature of each sample in the first sample set; the apparatus further comprising a fifth unit and a sixth unit; the second receiving unit being configured to receive fourth exchange information from the first device, the fourth exchange information including the second-encryption ID of each sample in the second sample set, the relative order of the samples in the fourth exchange information having been shuffled by the first device; the fifth unit being configured to determine the common samples of the first sample set and the second sample set based on the second-encryption IDs of the second encrypted set and the second-encryption IDs in the fourth exchange information; and the sixth unit being configured to determine the information value of the second feature based on the label of each common sample and the identifier of the second bin in which it is located, for use in feature selection for a machine learning model.
A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method according to any one of claims 1-7 or the method according to any one of claims 8-9.
A computing device comprising a memory and a processor, the memory storing executable code, wherein the processor, when executing the executable code, implements the method according to any one of claims 1-7 or the method according to any one of claims 8-9.
TW109115723A 2019-12-11 2020-05-12 Method and device for multi-party joint feature evaluation for protecting privacy and safety TWI738333B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911269227.5 2019-12-11
CN201911269227.5A CN110990857B (en) 2019-12-11 2019-12-11 Multi-party combined feature evaluation method and device for protecting privacy and safety

Publications (2)

Publication Number Publication Date
TW202123049A TW202123049A (en) 2021-06-16
TWI738333B true TWI738333B (en) 2021-09-01

Family

ID=70092518

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109115723A TWI738333B (en) 2019-12-11 2020-05-12 Method and device for multi-party joint feature evaluation for protecting privacy and safety

Country Status (3)

Country Link
CN (1) CN110990857B (en)
TW (1) TWI738333B (en)
WO (1) WO2021114927A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990857B (en) * 2019-12-11 2021-04-06 支付宝(杭州)信息技术有限公司 Multi-party combined feature evaluation method and device for protecting privacy and safety
CN112667741B (en) * 2020-04-13 2022-07-08 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
CN111506485B (en) * 2020-04-15 2021-07-27 深圳前海微众银行股份有限公司 Feature binning method, device, equipment and computer-readable storage medium
CN111242244B (en) * 2020-04-24 2020-09-18 支付宝(杭州)信息技术有限公司 Characteristic value sorting method, system and device
CN111695675B (en) * 2020-05-14 2024-05-07 平安科技(深圳)有限公司 Federal learning model training method and related equipment
CN111539535B (en) * 2020-06-05 2022-04-12 支付宝(杭州)信息技术有限公司 Joint feature binning method and device based on privacy protection
CN111401572B (en) * 2020-06-05 2020-08-21 支付宝(杭州)信息技术有限公司 Supervision characteristic box dividing method and device based on privacy protection
CN111539009B (en) * 2020-06-05 2023-05-23 支付宝(杭州)信息技术有限公司 Supervised feature binning method and device for protecting private data
CN113824546B (en) * 2020-06-19 2024-04-02 百度在线网络技术(北京)有限公司 Method and device for generating information
CN112231768B (en) * 2020-10-27 2021-06-18 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN112711765A (en) * 2020-12-30 2021-04-27 深圳前海微众银行股份有限公司 Sample characteristic information value determination method, terminal, device and storage medium
CN112597525B (en) * 2021-03-04 2021-05-28 支付宝(杭州)信息技术有限公司 Data processing method and device based on privacy protection and server
CN113362048B (en) * 2021-08-11 2021-11-30 腾讯科技(深圳)有限公司 Data label distribution determining method and device, computer equipment and storage medium
CN113807415A (en) * 2021-08-30 2021-12-17 中国再保险(集团)股份有限公司 Federal feature selection method and device, computer equipment and storage medium
CN113722738B (en) * 2021-09-02 2023-08-08 脸萌有限公司 Data protection method, device, medium and electronic equipment
CN113591133B (en) * 2021-09-27 2021-12-24 支付宝(杭州)信息技术有限公司 Method and device for performing feature processing based on differential privacy
CN114398671B (en) * 2021-12-30 2023-07-11 翼健(上海)信息科技有限公司 Privacy calculation method, system and readable storage medium based on feature engineering IV value
CN114386336B (en) * 2022-03-22 2022-07-15 成都飞机工业(集团)有限责任公司 Joint training method based on multi-party 3D printing database
CN114401079B (en) * 2022-03-25 2022-06-14 腾讯科技(深圳)有限公司 Multi-party united information value calculation method, related equipment and storage medium
CN114611008B (en) * 2022-05-09 2022-07-22 北京淇瑀信息科技有限公司 User service strategy determination method and device based on federal learning and electronic equipment
CN115081004B (en) * 2022-08-22 2022-11-04 北京瑞莱智慧科技有限公司 Data processing method, related device and storage medium
CN115659381B (en) * 2022-12-26 2023-03-10 北京数牍科技有限公司 Federal learning WOE encoding method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201602830A (en) * 2014-07-02 2016-01-16 柯呈翰 A method and system for adding dynamic labels to a file and encrypting the file
US20160294788A1 (en) * 2013-11-08 2016-10-06 Mustbin, Inc. Bin enabled data object encryption and storage apparatuses, methods and systems
TW201740305A (en) * 2016-05-06 2017-11-16 Alibaba Group Services Ltd Data encryption method, data decryption method, device and system capable of ensuring the security of the key distribution process and flexibly using different keys for data encryption
CN110032878A (en) * 2019-03-04 2019-07-19 阿里巴巴集团控股有限公司 A kind of safe Feature Engineering method and apparatus

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101340282B (en) * 2008-05-28 2011-05-11 北京易恒信认证科技有限公司 Generation method of composite public key
CN106650314A (en) * 2016-11-25 2017-05-10 中南大学 Method and system for predicting amino acid mutation
CN108256348B (en) * 2017-11-30 2021-08-20 深圳大学 Ciphertext search result verification method and system
CN108764273B (en) * 2018-04-09 2023-12-05 中国平安人寿保险股份有限公司 Data processing method, device, terminal equipment and storage medium
CN109325357B (en) * 2018-08-10 2021-12-14 深圳前海微众银行股份有限公司 RSA-based information value calculation method, device and readable storage medium
CN109636482B (en) * 2018-12-21 2021-07-27 南京星云数字技术有限公司 Data processing method and system based on similarity model
CN109492420B (en) * 2018-12-28 2021-07-20 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federal learning
CN109886417B (en) * 2019-03-01 2024-05-03 深圳前海微众银行股份有限公司 Model parameter training method, device, equipment and medium based on federal learning
CN109858566A (en) * 2019-03-01 2019-06-07 成都新希望金融信息有限公司 A method of it being added to the scorecard of mould dimension based on multilayered model building
CN110276210B (en) * 2019-06-12 2021-04-23 深圳前海微众银行股份有限公司 Method and device for determining model parameters based on federal learning
CN110309923B (en) * 2019-07-03 2024-04-26 深圳前海微众银行股份有限公司 Transverse federal learning method, device, equipment and computer storage medium
CN110378487B (en) * 2019-07-18 2021-02-26 深圳前海微众银行股份有限公司 Method, device, equipment and medium for verifying model parameters in horizontal federal learning
CN110751291B (en) * 2019-10-29 2021-02-12 支付宝(杭州)信息技术有限公司 Method and device for realizing multi-party combined training neural network of security defense
CN111104731B (en) * 2019-11-19 2023-09-15 北京集奥聚合科技有限公司 Graphical model full life cycle modeling method for federal learning
CN110990857B (en) * 2019-12-11 2021-04-06 支付宝(杭州)信息技术有限公司 Multi-party combined feature evaluation method and device for protecting privacy and safety

Also Published As

Publication number Publication date
WO2021114927A1 (en) 2021-06-17
CN110990857B (en) 2021-04-06
TW202123049A (en) 2021-06-16
CN110990857A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
TWI738333B (en) Method and device for multi-party joint feature evaluation for protecting privacy and safety
US10547444B2 (en) Cloud encryption key broker apparatuses, methods and systems
Wang et al. FastGeo: Efficient geometric range queries on encrypted spatial data
CN110086817B (en) Reliable user service system and method
CN111539009B (en) Supervised feature binning method and device for protecting private data
CN114175028B (en) Cryptographic pseudonym mapping method, computer system, computer program and computer-readable medium
US11741242B2 (en) Cryptographic pseudonym mapping method, computer system computer program and computer-readable medium
CN113672949A (en) Data transmission method and system for protecting advertisement multiparty privacy
Ahamed et al. SMS encryption and decryption using modified vigenere cipher algorithm
Suthanthiramani et al. Secured data storage and retrieval using elliptic curve cryptography in cloud.
Ahmad et al. A secure network communication protocol based on text to barcode encryption algorithm
Varshney et al. Big data privacy breach prevention strategies
CN114491637A (en) Data query method and device, computer equipment and storage medium
KR100995123B1 (en) Methods and apparatuses for cipher indexing in order to effective search of ciphered-database
CN113965310B (en) Method for realizing mixed privacy calculation processing based on label capable of being controlled to be de-identified
CN112800479B (en) Multi-party combined data processing method and device by using trusted third party
Song et al. Traceable and privacy-preserving non-interactive data sharing in mobile crowdsensing
Sumaryanti et al. Improvement security in e-business systems using hybrid algorithm
Suganya et al. Data Communication Using Cryptography Encryption
Sinha et al. Image encryption using modified rubik’s cube algorithm
CN114500006B (en) Query request processing method and device
Varma et al. Secure Outsourced Association Rule Mining using Homomorphic Encryption
CN114338164B (en) Anonymous security comparison method and system
CN109670329A (en) A kind of safe lead-in and lead-out method of server data and server
CN113065156B (en) Multi-party combined data processing method and device for controlling time delay