WO2021249500A1 - Method and apparatus for clustering private data of multiple parties - Google Patents

Method and apparatus for clustering private data of multiple parties

Info

Publication number
WO2021249500A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
party
cluster
distance
private data
Prior art date
Application number
PCT/CN2021/099479
Other languages
French (fr)
Chinese (zh)
Inventor
陈超超
王力
周俊
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021249500A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Definitions

  • One or more embodiments of this specification relate to the computer field, and in particular to a method and device for clustering private data from multiple parties.
  • Clustering is a very common technique in machine learning. It is often applied to tasks such as community discovery and anomaly detection.
  • A typical clustering algorithm is an unsupervised learning algorithm whose purpose is to group similar objects into the same cluster. The more similar the objects within a cluster, the better the clustering result.
  • The main difference between clustering and classification is that the target categories of classification are known in advance, whereas those of clustering are not. The result has the same form as a classification, but the categories are not predefined.
  • the data is distributed horizontally across multiple parties.
  • the data possessed by the parties may be private data.
  • the private data possessed by one party cannot be disclosed to other parties.
  • One or more embodiments of this specification describe a method and device for clustering private data of multiple parties, which can prevent the leakage of private data when clustering private data of multiple parties.
  • a method for clustering private data of multiple parties is provided, the multiple parties including a first party and a second party, the first party having a first private data set, and the first private data set including a plurality of first private data;
  • the method is executed by the first party and includes multiple rounds of iteration, where any round of iteration includes: determining the first center data currently corresponding to each cluster in the first-type cluster set;
  • the second party has the second center data currently corresponding to each cluster in the second-type cluster set; the first-type cluster set and the second-type cluster set constitute a total cluster set; calculating the first plaintext distance between the first private data and each first center data.
  • jointly calculating with the second party, based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data includes: encrypting the first private data with the public key of the first party to obtain first encrypted data; sending the first encrypted data to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance; and receiving the first ciphertext distance from the second party.
  • decrypting the first ciphertext distance includes: using the private key of the first party to decrypt the first ciphertext distance to obtain the second plaintext distance;
  • the private key and the public key of the first party form a public-private key pair.
  • when the round of iteration is the first round, the first center data corresponding to each cluster in the first-type cluster set is randomly initialized data.
  • the second party has a second private data set, which includes a plurality of second private data; after the cluster corresponding to the shortest plaintext distance is selected as the cluster to which the first private data currently belongs, the method further includes: for the first private data set, determining a first average value of the first private data belonging to a first cluster, the first cluster being any cluster in the first-type cluster set; receiving from the second party a second average value, determined by the second party, of the second private data belonging to the first cluster; and updating the first center data corresponding to the first cluster according to the first average value and the second average value.
  • the method further includes: jointly judging with the second party whether the amount of change in the center data of each cluster in the total cluster set meets a preset iteration stop condition; if the amount of change does not meet the preset iteration stop condition, performing the next round of the multi-round iteration process.
  • the method further includes: if the amount of change in the center data of each cluster in the total cluster set meets the preset iteration stop condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.
  • the joint judgment includes: locally judging whether the amount of change in the center data of each cluster in the first-type cluster set meets the preset iteration stop condition, to obtain a first judgment result;
  • receiving a second judgment result from the second party, the second judgment result indicating whether the amount of change in the center data of each cluster in the second-type cluster set meets the preset iteration stop condition;
  • and comprehensively judging, according to the first judgment result and the second judgment result, whether the amount of change in the center data of each cluster in the total cluster set meets the preset iteration stop condition.
  • the second party has a second private data set, which includes a plurality of second private data; the method further includes: jointly calculating with the second party, based on homomorphic encryption, the second ciphertext distance between the second private data and each first center data, so that the second party, after decrypting the second ciphertext distance, obtains the second plaintext distance between the second private data and the first center data.
  • jointly calculating with the second party, based on homomorphic encryption, the second ciphertext distance between the second private data and each first center data includes: receiving second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the public key of the second party; homomorphically calculating the distance between the second encrypted data and the first center data to obtain the second ciphertext distance; and sending the second ciphertext distance to the second party.
  • an apparatus for clustering private data of multiple parties is provided, the multiple parties including a first party and a second party, the first party having a first private data set, and the first private data set including a plurality of first private data;
  • the device is set at the first party and is used to perform multiple rounds of iteration, and includes the following units for performing any round of iteration: a center determining unit, used to determine the first center data currently corresponding to each cluster in the first-type cluster set, where the second party has the second center data currently corresponding to each cluster in the second-type cluster set, and the first-type cluster set and the second-type cluster set constitute a total cluster set;
  • an independent calculation unit, used to calculate the first plaintext distance between the first private data and each first center data determined by the center determining unit; a first joint calculation unit, used to jointly calculate with the second party, based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data; a decryption unit, used to decrypt the first ciphertext distance obtained by the first joint calculation unit to obtain the second plaintext distance between the first private data and the second center data; and a cluster determining unit, used to select, according to each first plaintext distance and each second plaintext distance, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • a computing device including a memory and a processor, the memory stores executable code, and the processor implements the method of the first aspect when the executable code is executed by the processor.
  • the first party determines the first center data currently corresponding to each cluster in the first-type cluster set,
  • the second party determines the second center data currently corresponding to each cluster in the second-type cluster set, and the first-type cluster set and the second-type cluster set constitute a total cluster set; the first party then calculates the first plaintext distance between the first private data and each first center data determined by its own party,
  • and, based on homomorphic encryption, jointly calculates with the second party the first ciphertext distance between the first private data and each second center data determined by the second party; it decrypts the first ciphertext distance to obtain the second plaintext distance, and selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  • the whole process is based on homomorphic encryption, which can prevent the leakage of private data when clustering private data from multiple parties.
  • Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification
  • Fig. 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment
  • Fig. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to an embodiment.
  • Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
  • This implementation scenario involves clustering private data from multiple parties.
  • the above-mentioned multiple parties may be two parties or more than two parties, for example, three parties, four parties, and so on.
  • clustering of private data of two parties is taken as an example for description.
  • the first party 11 has privacy data 1, privacy data 2, privacy data 3, privacy data 4, and privacy data 5;
  • the second party 12 has privacy data 6, privacy data 7, privacy data 8, and privacy data 9.
  • the terms first party and second party merely distinguish the two parties; the first party may also be referred to as party A, the second party as party B, and so on.
  • the information covered by the private data is not limited; it can be any information that must not be disclosed, for example, users' personal information or business secrets.
  • for example, the private data is users' personal information, including name, age, income, and so on; for details, refer to Table 1, which lists the information contained in each private data.
  • Table 1 Correspondence table of information contained in each private data
  • referring to Table 1, the data of different rows may be distributed in different parties.
  • for example, private data 1 is held by the first party
  • and private data 8 is held by the second party.
  • distributing the rows of data among multiple parties in this way can be called horizontal segmentation.
  • FIG. 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment, and the method may be based on the implementation scenario shown in FIG. 1.
  • the multiple parties include a first party and a second party, the first party has a first private data set, and the first private data set includes a plurality of first private data. The method is executed by the first party and includes multiple rounds of iteration. As shown in Figure 2, any round of iteration includes the following steps: step 21, determine the first center data currently corresponding to each cluster in the first-type cluster set, where the second party has the second center data currently corresponding to each cluster in the second-type cluster set, and the first-type cluster set and the second-type cluster set constitute a total cluster set; step 22, calculate the first plaintext distance between the first private data and each first center data; step 23, based on homomorphic encryption, jointly calculate with the second party the first ciphertext distance between the first private data and each second center data; step 24, decrypt the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data; step 25, according to each first plaintext distance and each second plaintext distance, select the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  • in step 21, the first center data currently corresponding to each cluster in the first-type cluster set is determined; the second party has the second center data currently corresponding to each cluster in the second-type cluster set;
  • the first-type cluster set and the second-type cluster set constitute a total cluster set. It is understandable that a first number of clusters included in the first-type cluster set can be preset, a second number of clusters included in the second-type cluster set can be preset, or a third number of clusters included in the total cluster set can be preset.
  • the center data in the total cluster set is jointly determined by the first party and the second party: the first party determines only the first center data corresponding to each cluster in the first-type cluster set,
  • and the second party determines the second center data corresponding to each cluster in the second-type cluster set. Each of the two parties holds only part of the center data in the total cluster set, not all of it.
  • the first number and the second number may be equal or unequal, and they may be adjusted according to how the amount of data is distributed between the first party and the second party, so as to speed up the clustering process.
  • when the round of iteration is the first round, the first center data corresponding to each cluster in the first-type cluster set is randomly initialized data.
  • for example, if the third number is 6, the first number is 3, and the second number is 3, the first party randomly initializes 3 first center data, denoted (c_1, c_2, c_3); similarly, the second party randomly initializes 3 second center data, denoted (c_4, c_5, c_6).
  • each private data has a K-dimensional cluster vector, where K is the number of clusters (here K = 6);
  • the initial vector is all 0, that is, [0,0,0,0,0,0].
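  • The initialization described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the helper names, the two-dimensional feature space, the sampling range and the seeds are all assumptions introduced here.

```python
import random

def init_centers(num_centers, dim, low=0.0, high=1.0, seed=None):
    """Randomly initialise `num_centers` center vectors of length `dim`.
    The uniform range [low, high] is an assumption; the source only says 'random'."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(dim)] for _ in range(num_centers)]

def init_cluster_vector(k):
    """All-zero cluster vector with one slot per cluster in the total cluster set."""
    return [0] * k

# Each party initialises only its own share of the centers.
first_centers = init_centers(3, dim=2, seed=1)    # c_1, c_2, c_3 (first party)
second_centers = init_centers(3, dim=2, seed=2)   # c_4, c_5, c_6 (second party)

K = len(first_centers) + len(second_centers)      # third number = 6
cluster_vector = init_cluster_vector(K)
```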
  • in step 22, the first plaintext distance between the first private data and each first center data is calculated. It is understandable that, since the first party has both the first private data and the first center data, it can calculate the first plaintext distance independently.
  • for example, x1 represents the first private data, and c_1, c_2, and c_3 represent the respective first center data. The first plaintext distance between x1 and c_1 is obtained by calculating (c_1 - x1)^2 and is denoted dx1; the first plaintext distance between x1 and c_2 is obtained by calculating (c_2 - x1)^2 and is denoted dx2; the first plaintext distance between x1 and c_3 is obtained by calculating (c_3 - x1)^2 and is denoted dx3.
  • the first privacy data and the first center data are both in the form of vectors, and the process of calculating the first plaintext distance is a vector operation process.
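  • As a rough illustration of step 22, the squared distance (c - x1)^2 can be computed element by element over the vectors and summed. The sample values and the function name below are assumptions chosen for the example.

```python
def squared_distance(x, c):
    """Sum over vector elements of (c_i - x_i)^2."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    return sum((ci - xi) ** 2 for xi, ci in zip(x, c))

# Hypothetical values: x1 and the first party's own centers c_1..c_3.
x1 = [3, 5]
first_centers = {"c_1": [1, 9], "c_2": [3, 4], "c_3": [7, 7]}

# dx1, dx2, dx3 are computed locally, in plaintext, by the first party.
first_plaintext_distances = {
    name: squared_distance(x1, c) for name, c in first_centers.items()
}
```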
  • in step 23, based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data is jointly calculated with the second party. It is understandable that the first party has the first private data and the second party has the second center data, so the first party needs to calculate the first ciphertext distance jointly with the second party.
  • jointly calculating with the second party, based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data includes: encrypting the first private data with the public key of the first party to obtain first encrypted data; sending the first encrypted data to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance; and receiving the first ciphertext distance from the second party.
  • the first party can encrypt the first private data with its public key Pka and send the result to the second party.
  • the first encrypted data can be represented as [x1]a.
  • the second party homomorphically calculates the distance between the first encrypted data [x1]a and each of the second center data c_4, c_5, c_6. Here [x1]a and c_4, c_5, c_6 are all vectors, so the distance can be calculated element by element,
  • and the ciphertext of the distance can be obtained directly by homomorphic computation.
  • this ciphertext is the aforementioned first ciphertext distance, recorded as [dx4]a, [dx5]a, [dx6]a, and it is returned to the first party.
  • in step 24, the first ciphertext distance is decrypted to obtain the second plaintext distance between the first private data and the second center data. It is understandable that the first ciphertext distance needs to be decrypted so that distances calculated in the two different ways can be compared.
  • decrypting the first ciphertext distance includes: using the private key of the first party to decrypt the first ciphertext distance to obtain the second plaintext distance;
  • the private key of the first party and the public key of the first party form a public-private key pair.
  • for example, the first party (also referred to as party A) may have a public-private key pair (Pka, Ska), where Pka represents the public key of the first party and Ska represents the private key of the first party.
  • the first party decrypts each first ciphertext distance [dx4]a, [dx5]a, [dx6]a, and obtains each second plaintext distance dx4, dx5, dx6.
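  • Steps 23 and 24 can be illustrated with a toy sketch. The source does not name a homomorphic scheme, so the sketch below assumes an additively homomorphic Paillier-style scheme with deliberately tiny, insecure keys, integer-valued features, and one extra message not stated in the source: the first party also sends an encryption of the sum of x_i^2, so that the second party can assemble sum(x_i^2 - 2*c_i*x_i + c_i^2) under encryption. All function names are hypothetical.

```python
import math
import random

# Toy Paillier-style scheme (additively homomorphic). The primes are far too
# small for real use; this only illustrates the message flow.
P, Q = 2**31 - 1, 2**61 - 1  # two Mersenne primes

def keygen(p=P, q=Q):
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))  # public key n; private key (phi, phi^-1 mod n)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With generator g = n + 1: g^m = 1 + m*n (mod n^2)
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    phi, phi_inv = sk
    return (pow(c, phi, n * n) - 1) // n * phi_inv % n

def add(n, c1, c2):        # Enc(a) * Enc(b) = Enc(a + b)
    return c1 * c2 % (n * n)

def mul_plain(n, c, k):    # Enc(a) ^ k = Enc(k * a)
    return pow(c, k % n, n * n)

def first_party_encrypt(n, x):
    """[x1]a plus Enc(sum x_i^2). Sending the second value is an assumption."""
    return [encrypt(n, xi) for xi in x], encrypt(n, sum(xi * xi for xi in x))

def second_party_distance(n, enc_x, enc_sq, c):
    """Homomorphically compute Enc(sum (c_i - x_i)^2) without seeing x."""
    acc = add(n, enc_sq, encrypt(n, sum(ci * ci for ci in c)))
    for enc_xi, ci in zip(enc_x, c):
        acc = add(n, acc, mul_plain(n, enc_xi, -2 * ci))
    return acc

pka, ska = keygen()                                    # (Pka, Ska) of party A
x1 = [3, 5]                                            # first private data
second_centers = {"c_4": [1, 9], "c_5": [3, 4], "c_6": [10, 0]}

enc_x, enc_sq = first_party_encrypt(pka, x1)           # sent to the second party
cipher_distances = {                                   # [dx4]a, [dx5]a, [dx6]a
    name: second_party_distance(pka, enc_x, enc_sq, c)
    for name, c in second_centers.items()
}
# Step 24: the first party decrypts with Ska to get dx4, dx5, dx6.
plain_distances = {name: decrypt(pka, ska, ct) for name, ct in cipher_distances.items()}
```

  • In this sketch the second party only ever handles ciphertexts of x1, and the first party only ever sees distances, not the second center data themselves.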
  • in step 25, according to each first plaintext distance and each second plaintext distance, the cluster corresponding to the shortest plaintext distance is selected as the cluster to which the first private data currently belongs. It is understandable that the cluster to which the first private data belongs may differ between rounds of iteration.
  • for example, the first plaintext distances are dx1, dx2, dx3 and the second plaintext distances are dx4, dx5, dx6, and the shortest plaintext distance is dx2; that is, the first private data x1 is closest to the first center data c_2.
  • the cluster to which the first private data x1 currently belongs is therefore the cluster corresponding to c_2, and the cluster vector corresponding to x1 is updated to [0,1,0,0,0,0].
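  • Step 25 amounts to an argmin over the six plaintext distances followed by a one-hot update of the cluster vector. A minimal sketch, with hypothetical distance values and an assumed function name:

```python
def assign_cluster(first_distances, second_distances):
    """Return (index of nearest center, one-hot cluster vector).
    Centers are ordered c_1..c_3 (first party) then c_4..c_6 (second party)."""
    all_distances = list(first_distances) + list(second_distances)
    best = min(range(len(all_distances)), key=all_distances.__getitem__)
    cluster_vector = [0] * len(all_distances)
    cluster_vector[best] = 1
    return best, cluster_vector

# Hypothetical dx1..dx3 (computed locally) and dx4..dx6 (decrypted).
best, cluster_vector = assign_cluster([20, 1, 20], [20, 5, 74])
```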
  • the second party has a second private data set, which includes a plurality of second private data; after the cluster corresponding to the shortest plaintext distance is selected as the cluster to which the first private data currently belongs,
  • the method further includes: for the first private data set, determining a first average value of the first private data belonging to a first cluster, the first cluster being any cluster in the first-type cluster set; receiving from the second party a second average value, determined by the second party, of the second private data belonging to the first cluster; and updating the first center data corresponding to the first cluster according to the first average value and the second average value.
  • the first party and the second party update the central data c_1 to c_6 according to the cluster vector of all private data.
  • the update process is as follows:
  • the first party calculates the mean value of all the first private data whose cluster vector is [1,0,0,0,0,0], denoted as e1;
  • the second party calculates the mean value of all the second private data whose cluster vector is [1,0,0,0,0,0], denote it as e2, and sends e2 to the first party;
  • the first party calculates the average value of e1 and e2, and the average value is the new c_1.
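  • The update of c_1 described above can be sketched as follows. The sketch follows the text literally: the new center is the plain average of the two per-party means e1 and e2, not weighted by how many items each party holds. The data values and helper names are assumptions, and the case of a party with no members in the cluster is not handled.

```python
def members(data, cluster_vectors, target):
    """Private data items whose cluster vector equals `target`."""
    return [x for x, v in zip(data, cluster_vectors) if v == target]

def mean_vector(rows):
    """Element-wise mean; assumes `rows` is non-empty."""
    count = len(rows)
    return [sum(col) / count for col in zip(*rows)]

def update_center(e1, e2):
    """New center = average of the two parties' means, as in the text."""
    return [(a + b) / 2 for a, b in zip(e1, e2)]

target = [1, 0, 0, 0, 0, 0]                       # cluster of c_1

# Hypothetical first-party data and cluster vectors.
first_data = [[1, 2], [3, 4], [9, 9]]
first_vectors = [target, target, [0, 1, 0, 0, 0, 0]]
# Hypothetical second-party data and cluster vectors.
second_data = [[5, 6]]
second_vectors = [target]

e1 = mean_vector(members(first_data, first_vectors, target))    # first party, local
e2 = mean_vector(members(second_data, second_vectors, target))  # sent by second party
new_c_1 = update_center(e1, e2)
```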
  • the method further includes: jointly judging with the second party whether the amount of change in the center data of each cluster in the total cluster set meets a preset iteration stop condition; if the amount of change does not meet the preset iteration stop condition, performing the next round of the multi-round iteration process.
  • the method further includes: if the amount of change in the center data of each cluster in the total cluster set meets the preset iteration stop condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.
  • the joint judgment includes: locally judging whether the amount of change in the center data of each cluster in the first-type cluster set meets the preset iteration stop condition, to obtain a first judgment result;
  • receiving a second judgment result from the second party, the second judgment result indicating whether the amount of change in the center data of each cluster in the second-type cluster set meets the preset iteration stop condition;
  • and comprehensively judging, according to the first judgment result and the second judgment result, whether the amount of change in the center data of each cluster in the total cluster set meets the preset iteration stop condition.
  • the above stop iteration condition is
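  • A possible shape of the joint judgment is sketched below. The concrete stop condition is not recoverable from the text above, so the threshold `tol` on the squared movement of each center, and the use of a logical AND to combine the two results, are assumptions made only for illustration.

```python
def local_judgment(old_centers, new_centers, tol):
    """True if every center held by this party moved by at most `tol`
    (squared distance). The threshold form is an assumption."""
    return all(
        sum((n - o) ** 2 for o, n in zip(old, new)) <= tol
        for old, new in zip(old_centers, new_centers)
    )

def comprehensive_judgment(first_result, second_result):
    """Stop only when both parties report that their centers have settled."""
    return first_result and second_result

# First party judges its own centers; the second result arrives from party B.
first_result = local_judgment([[0.0, 0.0]], [[0.01, 0.0]], tol=1e-3)
second_result = local_judgment([[1.0, 1.0]], [[1.5, 1.0]], tol=1e-3)
stop = comprehensive_judgment(first_result, second_result)
```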
  • the processing from step 21 to step 25 mainly describes how the first party determines, for its own first private data, the cluster to which the first private data belongs.
  • for the second party's second private data, the first party also needs to cooperate with the second party, based on homomorphic encryption, to determine the cluster to which the second private data belongs.
  • the second party has a second private data set, which includes a plurality of second private data; the method further includes: jointly calculating with the second party, based on homomorphic encryption, the second ciphertext distance between the second private data and each first center data, so that the second party, after decrypting the second ciphertext distance, obtains the second plaintext distance between the second private data and the first center data.
  • jointly calculating with the second party, based on homomorphic encryption, the second ciphertext distance between the second private data and each first center data includes: receiving second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the public key of the second party; homomorphically calculating the distance between the second encrypted data and the first center data to obtain the second ciphertext distance; and sending the second ciphertext distance to the second party.
  • the second party (also called party B) may have a public-private key pair (Pkb, Skb), where Pkb represents the public key of the second party, and Skb represents the private key of the second party.
  • the second party can encrypt the second private data with Pkb to obtain the second encrypted data, and send the second encrypted data to the first party.
  • neither party alone determines the center data of every cluster. Instead, the first party determines the first center data currently corresponding to each cluster in the first-type cluster set, the second party determines the second center data currently corresponding to each cluster in the second-type cluster set, and the first-type cluster set and the second-type cluster set constitute the total cluster set.
  • the first party then calculates the first plaintext distance between the first private data and each first center data determined by its own party and, based on homomorphic encryption, jointly calculates with the second party the first ciphertext distance between the first private data and each second center data determined by the second party;
  • it decrypts the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data; finally, according to each first plaintext distance and each second plaintext distance, it selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  • the whole process is based on homomorphic encryption, which can prevent the leakage of private data when clustering private data from multiple parties.
  • a device for clustering private data of multiple parties is also provided, and the device is used to implement the method for clustering private data of multiple parties provided in the embodiment of this specification.
  • the multiple parties include a first party and a second party, the first party has a first private data set, the first private data set includes a plurality of first private data, and the device is set at the first party and is used to perform multiple rounds of iteration.
  • Fig. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to an embodiment. As shown in Fig. 3,
  • the device 300 includes the following units for performing any round of iteration: a center determining unit 31, configured to determine the first center data currently corresponding to each cluster in the first-type cluster set, where the second party has the second center data currently corresponding to each cluster in the second-type cluster set, and the first-type cluster set and the second-type cluster set constitute a total cluster set; an independent calculation unit 32, configured to calculate the first plaintext distance between the first private data and each first center data determined by the center determining unit 31; a first joint calculation unit 33, configured to jointly calculate with the second party, based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data; a decryption unit 34, configured to decrypt the first ciphertext distance obtained by the first joint calculation unit 33 to obtain the second plaintext distance between the first private data and the second center data; and a cluster determining unit 35, configured to select, according to each first plaintext distance obtained by the independent calculation unit 32 and each second plaintext distance obtained by the decryption unit 34, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  • the first joint calculation unit 33 includes: an encryption subunit, configured to encrypt the first private data with the public key of the first party to obtain the first encrypted data; a sending subunit, configured to send the first encrypted data obtained by the encryption subunit to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance; and a receiving subunit, configured to receive the first ciphertext distance from the second party.
  • the decryption unit 34 is specifically configured to use the private key of the first party to decrypt the first ciphertext distance to obtain the second plaintext distance; the private key of the first party and the public key of the first party form a public-private key pair.
  • when the round of iteration is the first round, the center determining unit 31 is specifically configured to determine that the first center data corresponding to each cluster in the first-type cluster set is randomly initialized data.
  • the second party has a second private data set, and the second private data set includes a plurality of second private data;
  • the device further includes: an average value determining unit, configured to, after the cluster determining unit 35 selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs, determine, for the first private data set, a first average value of the first private data belonging to a first cluster,
  • the first cluster being any cluster in the first-type cluster set;
  • a receiving unit, configured to receive from the second party a second average value, determined by the second party, of the second private data belonging to the first cluster; and an update unit, configured to update the first center data corresponding to the first cluster according to the first average value determined by the average value determining unit and the second average value received by the receiving unit.
  • the device further includes: a joint judging unit, configured to, after the update unit updates the first center data corresponding to the first cluster, jointly judge with the second party whether the amount of change in the center data of each cluster in the total cluster set meets a preset iteration stop condition; and an iteration trigger unit, configured to perform the next round of the multi-round iteration process if the judgment result of the joint judging unit is that the amount of change in the center data of each cluster in the total cluster set does not meet the preset iteration stop condition.
  • the device further includes: a final determination unit, configured to determine the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs, if the joint judging unit judges that the change in the center data of each cluster in the total cluster set meets the preset stop-iteration condition.
  • the joint judging unit includes: a local judging subunit, configured to judge locally whether the change in the center data of each cluster in the first cluster set meets the preset stop-iteration condition, to obtain a first judgment result; a receiving subunit, configured to receive a second judgment result from the second party, the second judgment result indicating whether the change in the center data of each cluster in the second cluster set meets the preset stop-iteration condition; a comprehensive judging subunit, configured to judge, based on the first judgment result obtained by the local judging subunit and the second judgment result received by the receiving subunit, whether the change in the center data of each cluster in the total cluster set meets the preset stop-iteration condition.
  • the second party has a second private data set, and the second private data set includes a plurality of second private data; the device further includes: a second joint computing unit, configured to calculate, jointly with the second party and based on homomorphic encryption, a second ciphertext distance between the second private data and each first center data, so that the second party decrypts the second ciphertext distance to obtain a second plaintext distance between the second private data and the first center data.
  • the second joint computing unit includes: a receiving subunit, configured to receive second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the public key of the second party; a computing subunit, configured to homomorphically compute the distance between the second encrypted data received by the receiving subunit and the first center data, to obtain the second ciphertext distance; a sending subunit, configured to send the second ciphertext distance obtained by the computing subunit to the second party.
  • the center determining unit 31 of the first party determines the first center data corresponding to each cluster in the first cluster set;
  • the second party determines the second center data corresponding to each cluster in the second cluster set, and the first cluster set and the second cluster set constitute the total cluster set;
  • the independent computing unit 32 of the first party then calculates the first plaintext distance between the first private data and each first center data determined by the first party;
  • the first joint computing unit 33 calculates, jointly with the second party and based on homomorphic encryption, a first ciphertext distance between the first private data and each second center data determined by the second party;
  • the decryption unit 34 decrypts the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data;
  • finally, the cluster determining unit 35 selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  • the whole process is based on homomorphic encryption, which
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
  • a computing device including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the present description provide a method and apparatus for clustering private data of multiple parties. In the method, a first party determines first center data currently corresponding to each cluster in a first cluster set, while a second party has second center data currently corresponding to each cluster in a second cluster set; the first cluster set and the second cluster set form a total cluster set. The first party calculates a first plaintext distance between first private data and each first center data; calculates, together with the second party and on the basis of homomorphic encryption, a first ciphertext distance between the first private data and each second center data; decrypts the first ciphertext distances to obtain second plaintext distances between the first private data and the second center data; and, according to the first plaintext distances and the second plaintext distances, selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs. In this way, leakage of private data can be prevented.

Description

Method and device for clustering private data of multiple parties
Technical field
One or more embodiments of this specification relate to the computer field, and in particular to a method and device for clustering private data of multiple parties.
Background
Clustering is a commonly used technique in machine learning. It is often applied to tasks such as community detection and anomaly detection. A typical clustering algorithm is an unsupervised learning algorithm whose purpose is to group similar objects into the same cluster. The more similar the objects within a cluster, the better the clustering. The biggest difference between clustering and classification is that in classification the target classes are known in advance, whereas in clustering they are not. Clustering produces the same kind of result as classification, except that the categories are not predefined.
In some scenarios, data is horizontally distributed across multiple parties. The data held by each party may be private data; that is, the private data held by one party cannot be disclosed to the other parties. In this case, an improved solution is desired that can prevent leakage of private data when clustering private data of multiple parties.
Summary of the invention
One or more embodiments of this specification describe a method and device for clustering private data of multiple parties that can prevent leakage of private data during the clustering.
In a first aspect, a method for clustering private data of multiple parties is provided. The multiple parties include a first party and a second party; the first party has a first private data set, and the first private data set includes a plurality of first private data. The method is executed by the first party and includes multiple rounds of iteration, where any round of iteration includes: determining the first center data currently corresponding to each cluster in a first cluster set, where the second party has the second center data currently corresponding to each cluster in a second cluster set, and the first cluster set and the second cluster set constitute a total cluster set; calculating a first plaintext distance between the first private data and each first center data; calculating, jointly with the second party and based on homomorphic encryption, a first ciphertext distance between the first private data and each second center data; decrypting the first ciphertext distance to obtain a second plaintext distance between the first private data and the second center data; and, according to the first plaintext distances and the second plaintext distances, selecting the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
In a possible implementation, calculating, jointly with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data includes: encrypting the first private data with the public key of the first party to obtain first encrypted data; sending the first encrypted data to the second party, so that the second party homomorphically computes the distance between the first encrypted data and the second center data to obtain the first ciphertext distance; and receiving the first ciphertext distance from the second party.
Further, decrypting the first ciphertext distance includes: decrypting the first ciphertext distance with the private key of the first party to obtain the second plaintext distance, where the private key of the first party and the public key of the first party form a public-private key pair.
In a possible implementation, the round of iteration is the first iteration, and the first center data currently corresponding to each cluster in the first cluster set is randomly initialized data.
In a possible implementation, the second party has a second private data set, and the second private data set includes a plurality of second private data. After selecting the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs, the method further includes: for the first private data set, determining a first average value of the first private data belonging to a first cluster, the first cluster being any cluster in the first cluster set; receiving from the second party a second average value, determined by the second party, of the second private data belonging to the first cluster; and updating the first center data corresponding to the first cluster according to the first average value and the second average value.
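The center update can be sketched as follows. The text only says the first center data is updated according to the first and second average values; weighting each average by the number of members it covers is an assumption made here, as is the idea that the member counts are exchanged along with the averages. The function name `update_center` and its parameters are hypothetical.

```python
from typing import List, Sequence


def update_center(local_mean: Sequence[float], local_count: int,
                  remote_mean: Sequence[float], remote_count: int) -> List[float]:
    """Combine two per-party averages into one center.

    Assumption: each average is weighted by its member count, which
    reproduces the average over the union of both parties' members.
    """
    total = local_count + remote_count
    if total == 0:
        raise ValueError("cluster has no members on either side")
    return [(local_count * a + remote_count * b) / total
            for a, b in zip(local_mean, remote_mean)]
```

With equal counts this reduces to the plain midpoint of the two averages; other combination rules are possible and are not excluded by the text.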
Further, after updating the first center data corresponding to the first cluster, the method further includes: judging, jointly with the second party, whether the change in the center data of each cluster in the total cluster set meets a preset stop-iteration condition; and, if the change does not meet the preset stop-iteration condition, performing the next iteration of the multiple rounds of iteration.
Further, the method further includes: if the change in the center data of each cluster in the total cluster set meets the preset stop-iteration condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.
Further, the joint judgment includes: judging locally whether the change in the center data of each cluster in the first cluster set meets the preset stop-iteration condition, to obtain a first judgment result; receiving a second judgment result from the second party, the second judgment result indicating whether the change in the center data of each cluster in the second cluster set meets the preset stop-iteration condition; and judging, based on the first judgment result and the second judgment result together, whether the change in the center data of each cluster in the total cluster set meets the preset stop-iteration condition.
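A minimal sketch of the joint judgment, under two assumptions not stated in the text: the stop-iteration condition is that every center moved by at most a tolerance `tol` (measured as squared Euclidean distance), and the comprehensive judgment is the logical AND of the two parties' results. The names `local_converged` and `joint_converged` are hypothetical.

```python
def local_converged(old_centers, new_centers, tol=1e-4):
    """Local judgment: True when every center held by this party
    changed by at most tol (assumed condition, squared distance)."""
    return all(
        sum((a - b) ** 2 for a, b in zip(old, new)) <= tol
        for old, new in zip(old_centers, new_centers)
    )


def joint_converged(first_result: bool, second_result: bool) -> bool:
    """Comprehensive judgment: stop only if both parties report
    convergence (assumed combination rule)."""
    return first_result and second_result
```

Only a single boolean crosses the party boundary in this sketch, so neither side learns the other's center data from the judgment itself.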
In a possible implementation, the second party has a second private data set, and the second private data set includes a plurality of second private data; the method further includes: calculating, jointly with the second party and based on homomorphic encryption, a second ciphertext distance between the second private data and each first center data, so that the second party decrypts the second ciphertext distance to obtain a second plaintext distance between the second private data and the first center data.
Further, calculating, jointly with the second party and based on homomorphic encryption, the second ciphertext distance between the second private data and each first center data includes: receiving second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the public key of the second party; homomorphically computing the distance between the second encrypted data and the first center data to obtain the second ciphertext distance; and sending the second ciphertext distance to the second party.
In a second aspect, a device for clustering private data of multiple parties is provided. The multiple parties include a first party and a second party; the first party has a first private data set, and the first private data set includes a plurality of first private data. The device is provided at the first party, is used to perform multiple rounds of iteration, and includes the following units for performing any round of iteration: a center determining unit, configured to determine the first center data currently corresponding to each cluster in a first cluster set, where the second party has the second center data currently corresponding to each cluster in a second cluster set, and the first cluster set and the second cluster set constitute a total cluster set; an independent computing unit, configured to calculate a first plaintext distance between the first private data and each first center data determined by the center determining unit; a first joint computing unit, configured to calculate, jointly with the second party and based on homomorphic encryption, a first ciphertext distance between the first private data and each second center data; a decryption unit, configured to decrypt the first ciphertext distance obtained by the first joint computing unit, to obtain a second plaintext distance between the first private data and the second center data; and a cluster determining unit, configured to select, according to the first plaintext distances obtained by the independent computing unit and the second plaintext distances obtained by the decryption unit, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
In a fourth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the method of the first aspect is implemented.
With the method and device provided in the embodiments of this specification, no single party determines the center data of all clusters. Instead, the first party determines the first center data currently corresponding to each cluster in the first cluster set, the second party determines the second center data currently corresponding to each cluster in the second cluster set, and the two sets together constitute the total cluster set. The first party then calculates the first plaintext distance between the first private data and each first center data it has determined; calculates, jointly with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data determined by the second party; decrypts the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data; and finally, according to the first plaintext distances and the second plaintext distances, selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs. The whole process is based on homomorphic encryption, so leakage of private data can be prevented when clustering private data of multiple parties.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;
Figure 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment;
Figure 3 shows a schematic block diagram of a device for clustering private data of multiple parties according to an embodiment.
Detailed description
The solutions provided in this specification are described below with reference to the drawings.
Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The scenario involves clustering private data of multiple parties. It can be understood that the multiple parties may be two or more parties, for example three or four. The embodiments of this specification are described using the clustering of two parties' private data as an example. Referring to Figure 1, the first party 11 has private data 1, private data 2, private data 3, private data 4, and private data 5; the second party 12 has private data 6, private data 7, private data 8, and private data 9. The terms first party and second party merely distinguish the two parties; the first party may also be called party A, the second party party B, and so on.
The embodiments of this specification do not limit the information covered by private data; it can be any information that must not be disclosed externally, for example a user's personal information or trade secrets. For instance, when the private data is a user's personal information, including the user's name, age, and income, see the correspondence shown in Table 1 between each private data and the information it contains.
Table 1: Correspondence between each private data and the information it contains
                  Name         Age (years)   Income (10,000 yuan)
Private data 1    Zhang Yi     25            1.5
Private data 2    Zhang Er     26            2.2
Private data 3    Zhang San    35            0.8
Private data 4    Zhao Yi      41            1.8
Private data 5    Zhao Er      19            0.6
Private data 6    Zhao San     28            3.5
Private data 7    Zhao Si      36            1.2
Private data 8    Wang Yi      29            1.3
Private data 9    Wang Er      30            2.2
As Table 1 shows, different rows of data may be held by different parties; for example, private data 1 is held by the first party and private data 8 by the second party. This way of distributing data across multiple parties by rows can be called horizontal partitioning.
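Horizontal partitioning of Table 1 can be illustrated with a small sketch. Representing each row by its numeric fields only (age, income) and dropping the name is an assumption made for illustration; the identifiers `TABLE_1` and `split_horizontally` are hypothetical.

```python
# (age in years, income in units of 10,000 yuan) for private data 1..9,
# taken from Table 1; names are omitted because they are not numeric.
TABLE_1 = {
    1: (25, 1.5), 2: (26, 2.2), 3: (35, 0.8), 4: (41, 1.8), 5: (19, 0.6),
    6: (28, 3.5), 7: (36, 1.2), 8: (29, 1.3), 9: (30, 2.2),
}


def split_horizontally(rows, first_party_ids):
    """Split rows between two parties: every row keeps all of its
    columns, and each row is held by exactly one party."""
    first = {i: v for i, v in rows.items() if i in first_party_ids}
    second = {i: v for i, v in rows.items() if i not in first_party_ids}
    return first, second


# Matches Figure 1: the first party holds rows 1-5, the second rows 6-9.
first_set, second_set = split_horizontally(TABLE_1, {1, 2, 3, 4, 5})
```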
The embodiments of this specification need to cluster the private data of multiple parties. Taking Figure 1 as an example, private data 1 through private data 9 are clustered, and private data held by different parties may be assigned to the same cluster. For example, private data 1, private data 3, private data 6, and private data 7 are assigned to cluster 1, while private data 2, private data 4, private data 5, private data 8, and private data 9 are assigned to cluster 2. The embodiments of this specification use homomorphic encryption to cluster the private data of multiple parties without leaking the private data.
Figure 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment; the method may be based on the implementation scenario shown in Figure 1. The multiple parties include a first party and a second party; the first party has a first private data set, and the first private data set includes a plurality of first private data. The method is executed by the first party and includes multiple rounds of iteration. As shown in Figure 2, any round of iteration includes the following steps: step 21, determining the first center data currently corresponding to each cluster in the first cluster set, where the second party has the second center data currently corresponding to each cluster in the second cluster set, and the first cluster set and the second cluster set constitute the total cluster set; step 22, calculating the first plaintext distance between the first private data and each first center data; step 23, calculating, jointly with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data; step 24, decrypting the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data; step 25, according to the first plaintext distances and the second plaintext distances, selecting the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs. The specific implementation of each step is described below.
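Step 25 amounts to taking the index of the smallest value over the first and second plaintext distances together. The sketch below assumes the first party's clusters are numbered before the second party's in the total cluster set and uses 0-based indices; `assign_cluster` is a hypothetical name.

```python
def assign_cluster(first_distances, second_distances):
    """Return the 0-based index, within the total cluster set, of the
    center with the shortest plaintext distance.

    first_distances:  distances to the first party's own centers (step 22)
    second_distances: decrypted distances to the second party's centers (step 24)
    """
    all_distances = list(first_distances) + list(second_distances)
    return min(range(len(all_distances)), key=all_distances.__getitem__)
```

For example, with distances to three first centers and three second centers, an index of 3 or more means the first private data is currently assigned to a cluster whose center is held by the second party.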
First, in step 21, the first center data currently corresponding to each cluster in the first cluster set is determined; the second party has the second center data currently corresponding to each cluster in the second cluster set; the first cluster set and the second cluster set constitute the total cluster set. It can be understood that a first number of clusters in the first cluster set, a second number of clusters in the second cluster set, or a third number of clusters in the total cluster set can be preset. For example, the first number is preset to 3, the second number to 3, and the third number to 6; or the first number to 1, the second to 2, and the third to 3; or the first to 2, the second to 5, and the third to 7.
In the embodiments of this specification, the center data in the total cluster set is determined jointly by the first party and the second party: the first party determines only the first center data currently corresponding to each cluster in the first cluster set, and the second party determines the second center data currently corresponding to each cluster in the second cluster set. Each of the two parties holds only part of the center data in the total cluster set, never all of it.
It should be noted that, once the third number is fixed, the first number and the second number may or may not be equal; they can be adjusted according to how the data volume is distributed between the first party and the second party, to speed up the clustering process.
In an example, the round of iteration is the first iteration, and the first center data currently corresponding to each cluster in the first cluster set is randomly initialized data.
For example, assuming the third number is 6, the first number is 3, and the second number is 3, the first party randomly initializes 3 first center data, denoted (c_1, c_2, c_3); correspondingly, the second party randomly initializes 3 second center data, denoted (c_4, c_5, c_6).
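Random initialization of one party's centers might look like the sketch below. The text does not say how the random values are drawn; sampling each coordinate uniformly from a fixed range is an assumption, and `init_centers` with its `low`, `high`, and `seed` parameters is hypothetical.

```python
import random


def init_centers(num_centers, dim, low=0.0, high=1.0, seed=None):
    """Draw num_centers random vectors of length dim, each coordinate
    uniform in [low, high] (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(dim)]
            for _ in range(num_centers)]


# Each party runs this independently: the first party obtains
# (c_1, c_2, c_3), the second party (c_4, c_5, c_6).
first_centers = init_centers(3, 2, seed=1)
```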
Further, for each first private data, the first party may initialize a K-dimensional cluster vector used to mark the cluster to which that first private data belongs, where K is the number of clusters. When K=6, a 6-dimensional cluster vector is initialized, for example as an all-zero vector, i.e. [0,0,0,0,0,0].
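The cluster vector can be sketched as below. The text specifies only the all-zero initialization; marking the assigned cluster by setting a single entry to 1 (a one-hot encoding) is an assumption, and both function names are hypothetical.

```python
def init_cluster_vector(k):
    """All-zero K-dimensional cluster vector: no cluster assigned yet."""
    return [0] * k


def mark_cluster(vector, index):
    """Return a copy of the vector with only position `index` set to 1
    (assumed one-hot marking of the current cluster)."""
    marked = [0] * len(vector)
    marked[index] = 1
    return marked
```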
Then, in step 22, the first plaintext distance between the first private data and each first center data is calculated. Because the first party holds both the first private data and the first center data, it can calculate the first plaintext distances independently.
In the embodiments of this specification, let x1 denote the first private data and c_1, c_2, c_3 denote the first center data. The first plaintext distance between x1 and c_1 can be obtained by calculating (c_1-x1)^2 and is denoted dx1; the first plaintext distance between x1 and c_2 can be obtained by calculating (c_2-x1)^2 and is denoted dx2; the first plaintext distance between x1 and c_3 can be obtained by calculating (c_3-x1)^2 and is denoted dx3.
The first private data and the first center data are typically vectors, so calculating the first plaintext distance is a vector operation.
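As a small worked illustration of step 22, the sketch below reads (c-x1)^2 as the sum of squared element-wise differences between two vectors. The function name and the sample values are assumptions chosen for the example.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between data vector x and center vector c."""
    return sum((ci - xi) ** 2 for xi, ci in zip(x, c))

x1 = [1, 2]                               # one first private data item
first_centers = [[0, 0], [1, 3], [4, 6]]  # stand-ins for c_1, c_2, c_3

# dx1, dx2, dx3: computed locally, with no interaction with the second party
dx = [squared_distance(x1, c) for c in first_centers]
print(dx)  # [5, 1, 25]
```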
Next, in step 23, the first ciphertext distance between the first private data and each second center data is calculated jointly with the second party, based on homomorphic encryption. The first party holds the first private data and the second party holds the second center data, so the first party needs to calculate the first ciphertext distances jointly with the second party.
In one example, jointly calculating, with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data includes: encrypting the first private data with the first party's public key to obtain first encrypted data; sending the first encrypted data to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance; and receiving the first ciphertext distance from the second party.
Let [] denote encryption and []a denote encryption with the first party's public key. The first party can encrypt the first private data with its public key Pka and send the result to the second party. If x1 denotes the first private data, the first encrypted data can be written as [x1]a.
The second party homomorphically calculates the distance between the first encrypted data [x1]a and each second center data c_4, c_5, c_6. Here [x1]a and c_4, c_5, c_6 are all vectors, and the distance is computed by operations on the vector elements, so the ciphertext of the distance can be obtained directly by homomorphic computation. These ciphertexts are the first ciphertext distances, denoted [dx4]a, [dx5]a, [dx6]a, and the second party returns them to the first party.
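The exchange in step 23 can be illustrated with a toy additively homomorphic scheme. The specification does not name a particular homomorphic encryption scheme, so everything in this sketch is an assumption: it uses a textbook Paillier construction with tiny, insecure primes, integer-valued data, and has the first party also send the encrypted sum of squares of x1, because an additive scheme cannot square a ciphertext. All function names are hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (illustration only, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # needs Python 3.8+
    return n, (lam, mu)                                # public key, private key

def encrypt(n, m):
    """Encrypt integer m under public key n, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """Recover the plaintext (mod n) from ciphertext c."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def party_b_distance(n, enc_x, enc_x_sq, center):
    """Second party's side: [d] = [sum x_j^2] + sum(-2*c_j*[x_j]) + [sum c_j^2]."""
    n2 = n * n
    acc = enc_x_sq
    for cx, cj in zip(enc_x, center):
        # Raising a ciphertext to a power multiplies its plaintext by a scalar.
        acc = acc * pow(cx, (-2 * cj) % n, n2) % n2
    # Multiplying ciphertexts adds their plaintexts.
    return acc * encrypt(n, sum(cj * cj for cj in center)) % n2

pka, ska = keygen()                                 # first party's key pair
x1 = [1, 2]
enc_x1 = [encrypt(pka, v) for v in x1]              # [x1]a, sent to party B
enc_x1_sq = encrypt(pka, sum(v * v for v in x1))    # assumed extra message

second_centers = [[3, 3], [0, 5], [1, 2]]           # stand-ins for c_4, c_5, c_6
enc_d = [party_b_distance(pka, enc_x1, enc_x1_sq, c) for c in second_centers]

# Back at the first party: decrypting gives dx4, dx5, dx6.
print([decrypt(pka, ska, c) for c in enc_d])  # [5, 10, 0]
```

In this sketch the second party only ever sees ciphertexts under the first party's key, and the first party only sees the resulting distances. Real-valued vectors would need a fixed-point encoding and production-size keys, neither of which is shown here.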
Then, in step 24, the first ciphertext distance is decrypted to obtain the second plaintext distance between the first private data and the second center data. The first ciphertext distances need to be decrypted so that the distances calculated in the different ways can be compared.
In one example, decrypting the first ciphertext distance includes: decrypting the first ciphertext distance with the first party's private key to obtain the second plaintext distance, where the first party's private key and the first party's public key form a public-private key pair.
For example, the first party (also called party A) has a public-private key pair (Pka, Ska), where Pka is the first party's public key and Ska is the first party's private key.
The first party decrypts the first ciphertext distances [dx4]a, [dx5]a, [dx6]a to obtain the second plaintext distances dx4, dx5, dx6.
Finally, in step 25, based on the first plaintext distances and the second plaintext distances, the cluster corresponding to the shortest plaintext distance is selected as the cluster to which the first private data currently belongs. The cluster to which the first private data belongs may differ from one iteration to the next.
For example, suppose the first plaintext distances are dx1, dx2, dx3, the second plaintext distances are dx4, dx5, dx6, and the shortest of them is dx2. The first private data x1 is then closest to the first center data c_2, so the cluster to which x1 currently belongs is the cluster corresponding to c_2, and the cluster vector of x1 is updated to [0,1,0,0,0,0].
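The selection in step 25 amounts to an argmin over the six plaintext distances followed by a one-hot update of the cluster vector. The sketch below assumes the distances are ordered dx1 to dx6 and uses made-up values in which dx2 is the smallest; the function name is hypothetical.

```python
def assign_cluster(first_plain, second_plain):
    """Pick the nearest center and return (index, one-hot cluster vector)."""
    distances = list(first_plain) + list(second_plain)  # dx1..dx6 in order
    nearest = min(range(len(distances)), key=distances.__getitem__)
    vector = [0] * len(distances)
    vector[nearest] = 1
    return nearest, vector

# dx1..dx3 were computed locally; dx4..dx6 were obtained by decryption.
nearest, cluster_vector = assign_cluster([7, 2, 9], [5, 10, 4])
print(nearest, cluster_vector)  # 1 [0, 1, 0, 0, 0, 0]
```

Index 1 corresponds to c_2, matching the example in the text. On ties, `min` returns the first smallest index, which is a choice of this sketch rather than something the specification states.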
In one example, the second party has a second private data set that includes a plurality of second private data. After the cluster corresponding to the shortest plaintext distance is selected as the cluster to which the first private data currently belongs, the method further includes: for the first private data set, determining a first mean of the first private data belonging to a first cluster, the first cluster being any cluster in the first cluster set; receiving, from the second party, a second mean, determined by the second party, of the second private data belonging to the first cluster; and updating the first center data corresponding to the first cluster according to the first mean and the second mean.
For example, the first party and the second party update the center data c_1 to c_6 according to the cluster vectors of all private data. Taking c_1 as an example, the update proceeds as follows:
The first party calculates the mean of all first private data whose cluster vector is [1,0,0,0,0,0], denoted e1;
The second party calculates the mean of all second private data whose cluster vector is [1,0,0,0,0,0], denoted e2, and sends e2 to the first party;
The first party calculates the mean of e1 and e2, which is the new c_1.
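The three update steps above can be sketched as follows. The helper names and sample points are assumptions, and the sketch assumes each party has at least one data item in the cluster.

```python
def mean_vector(points):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]

def update_center(e1, e2):
    """New center as the element-wise mean of the two parties' means."""
    return [(a + b) / 2 for a, b in zip(e1, e2)]

# First party: its private data currently marked [1,0,0,0,0,0]
e1 = mean_vector([[1.0, 2.0], [3.0, 4.0]])
# Second party: its private data marked the same way; only e2 is sent over
e2 = mean_vector([[5.0, 6.0]])

new_c1 = update_center(e1, e2)
print(e1, e2, new_c1)  # [2.0, 3.0] [5.0, 6.0] [3.5, 4.5]
```

The mean of e1 and e2 equals the mean of the pooled points only when both parties contribute the same number of points to the cluster. In this example the pooled mean would be [3.0, 4.0], so the two-party update is an approximation of the usual k-means centroid.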
Further, after the first center data corresponding to the first cluster is updated, the method further includes: jointly determining with the second party whether the change in the center data of the clusters in the total cluster set satisfies a preset stop-iteration condition; and, if the change does not satisfy the preset stop-iteration condition, performing the next iteration of the multi-round iterative process.
Further, the method also includes: if the change in the center data of the clusters in the total cluster set satisfies the preset stop-iteration condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data finally belongs.
Further, the joint determination includes: determining locally whether the change in the center data of the clusters in the first cluster set satisfies the preset stop-iteration condition, to obtain a first determination result; receiving a second determination result from the second party, the second determination result indicating whether the change in the center data of the clusters in the second cluster set satisfies the preset stop-iteration condition; and determining, from the first determination result and the second determination result together, whether the change in the center data of the clusters in the total cluster set satisfies the preset stop-iteration condition.
For example, the stop-iteration condition is |C(t)-C(t+1)|^2<delta, where delta may be a preset value, C(t) denotes the center data before the update, and C(t+1) denotes the center data after the update.
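One possible reading of the joint determination is sketched below. Two points are assumptions rather than statements of the specification: |C(t)-C(t+1)|^2 is taken as the squared Euclidean change summed over all of a party's centers, and the two local results are combined with a logical AND. The function names and values are hypothetical.

```python
def centers_converged(old_centers, new_centers, delta):
    """Local check: squared change over this party's centers is below delta."""
    change = sum(
        (o - n) ** 2
        for old, new in zip(old_centers, new_centers)
        for o, n in zip(old, new)
    )
    return change < delta

def joint_stop(first_result, second_result):
    """Combine both parties' local results (assumed here: both must hold)."""
    return first_result and second_result

delta = 0.05
# First party: its centers barely moved (squared change is about 0.01).
first_result = centers_converged([[0.0, 0.0], [1.0, 1.0]],
                                 [[0.0, 0.1], [1.0, 1.0]], delta)
# Second party: it only reports a boolean, not its centers.
second_result = centers_converged([[2.0, 2.0]], [[3.0, 2.0]], delta)

print(first_result, second_result, joint_stop(first_result, second_result))
# True False False
```

Because only the boolean results are exchanged, neither party has to reveal how far its own centers moved.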
In the embodiments of this specification, steps 21 to 25 above mainly describe how the first party determines the cluster to which its own first private data belongs. In addition, for the second party's second private data, the first party also needs to cooperate with the second party, based on homomorphic encryption, so that the second party can determine the cluster to which the second private data belongs.
In one example, the second party has a second private data set that includes a plurality of second private data; the method further includes: jointly calculating, with the second party and based on homomorphic encryption, a second ciphertext distance between the second private data and each first center data, so that the second party obtains, by decrypting the second ciphertext distance, the second plaintext distance between the second private data and the first center data.
Further, jointly calculating, with the second party and based on homomorphic encryption, the second ciphertext distance between the second private data and each first center data includes: receiving second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the second party's public key; homomorphically calculating the distance between the second encrypted data and the first center data to obtain the second ciphertext distance; and sending the second ciphertext distance to the second party.
For example, the second party (also called party B) may have a public-private key pair (Pkb, Skb), where Pkb is the second party's public key and Skb is the second party's private key.
Let [] denote encryption and []b denote encryption with the second party's public key. The second party can encrypt the second private data with Pkb to obtain the second encrypted data and send the second encrypted data to the first party.
In the method for clustering private data of multiple parties, the first party and the second party have equal status, and their processing does not differ in substance. The embodiments of this specification mainly describe the processing with the first party as the executing entity.
With the method provided by the embodiments of this specification, the center data of the clusters is not determined by any single party. Instead, the first party determines the first center data currently corresponding to each cluster in the first cluster set, the second party determines the second center data currently corresponding to each cluster in the second cluster set, and the first cluster set and the second cluster set together form the total cluster set. The first party then calculates the first plaintext distance between the first private data and each first center data that it determined itself; jointly calculates, with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data determined by the second party; decrypts the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data; and finally, based on the first plaintext distances and the second plaintext distances, selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs. The whole process is based on homomorphic encryption and can prevent private data from being leaked when the private data of multiple parties is clustered.
According to an embodiment of another aspect, an apparatus for clustering private data of multiple parties is also provided, the apparatus being configured to perform the method for clustering private data of multiple parties provided by the embodiments of this specification. The multiple parties include a first party and a second party, the first party has a first private data set that includes a plurality of first private data, and the apparatus is deployed at the first party and configured to perform a multi-round iterative process. FIG. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to one embodiment. As shown in FIG. 3, the apparatus 300 includes the following units for performing any one iteration: a center determining unit 31, configured to determine the first center data currently corresponding to each cluster in the first cluster set, where the second party has the second center data currently corresponding to each cluster in the second cluster set, and the first cluster set and the second cluster set form the total cluster set; an independent calculation unit 32, configured to calculate the first plaintext distance between the first private data and each first center data determined by the center determining unit 31; a first joint calculation unit 33, configured to jointly calculate, with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data; a decryption unit 34, configured to decrypt the first ciphertext distance obtained by the first joint calculation unit 33 to obtain the second plaintext distance between the first private data and the second center data; and a cluster determining unit 35, configured to select, based on the first plaintext distances obtained by the independent calculation unit 32 and the second plaintext distances obtained by the decryption unit 34, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
Optionally, as one embodiment, the first joint calculation unit 33 includes: an encryption subunit, configured to encrypt the first private data with the first party's public key to obtain first encrypted data; a sending subunit, configured to send the first encrypted data obtained by the encryption subunit to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance; and a receiving subunit, configured to receive the first ciphertext distance from the second party.
Further, the decryption unit 34 is specifically configured to decrypt the first ciphertext distance with the first party's private key to obtain the second plaintext distance, where the first party's private key and the first party's public key form a public-private key pair.
Optionally, as one embodiment, the given iteration is the first iteration, and the center determining unit 31 is specifically configured to determine that the first center data currently corresponding to each cluster in the first cluster set is randomly initialized data.
Optionally, as one embodiment, the second party has a second private data set that includes a plurality of second private data; the apparatus further includes: a mean determining unit, configured to determine, for the first private data set and after the cluster determining unit 35 selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs, a first mean of the first private data belonging to a first cluster, the first cluster being any cluster in the first cluster set; a receiving unit, configured to receive, from the second party, a second mean, determined by the second party, of the second private data belonging to the first cluster; and an updating unit, configured to update the first center data corresponding to the first cluster according to the first mean determined by the mean determining unit and the second mean received by the receiving unit.
Further, the apparatus also includes: a joint determination unit, configured to jointly determine with the second party, after the updating unit updates the first center data corresponding to the first cluster, whether the change in the center data of the clusters in the total cluster set satisfies a preset stop-iteration condition; and an iteration triggering unit, configured to perform the next iteration of the multi-round iterative process if the joint determination unit determines that the change in the center data of the clusters in the total cluster set does not satisfy the preset stop-iteration condition.
Further, the apparatus also includes: a final determining unit, configured to determine the cluster to which the first private data currently belongs as the cluster to which the first private data finally belongs, if the joint determination unit determines that the change in the center data of the clusters in the total cluster set satisfies the preset stop-iteration condition.
Further, the joint determination unit includes: a local determination subunit, configured to determine locally whether the change in the center data of the clusters in the first cluster set satisfies the preset stop-iteration condition, to obtain a first determination result; a receiving subunit, configured to receive a second determination result from the second party, the second determination result indicating whether the change in the center data of the clusters in the second cluster set satisfies the preset stop-iteration condition; and a comprehensive determination subunit, configured to determine, from the first determination result obtained by the local determination subunit and the second determination result received by the receiving subunit together, whether the change in the center data of the clusters in the total cluster set satisfies the preset stop-iteration condition.
Optionally, as one embodiment, the second party has a second private data set that includes a plurality of second private data; the apparatus further includes: a second joint calculation unit, configured to jointly calculate, with the second party and based on homomorphic encryption, a second ciphertext distance between the second private data and each first center data, so that the second party obtains, by decrypting the second ciphertext distance, the second plaintext distance between the second private data and the first center data.
Further, the second joint calculation unit includes: a receiving subunit, configured to receive second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the second party's public key; a calculation subunit, configured to homomorphically calculate the distance between the second encrypted data received by the receiving subunit and the first center data to obtain the second ciphertext distance; and a sending subunit, configured to send the second ciphertext distance obtained by the calculation subunit to the second party.
With the apparatus provided by the embodiments of this specification, the center data of the clusters is not determined by any single party. Instead, the center determining unit 31 of the first party determines the first center data currently corresponding to each cluster in the first cluster set, the second party determines the second center data currently corresponding to each cluster in the second cluster set, and the first cluster set and the second cluster set together form the total cluster set. The independent calculation unit 32 of the first party then calculates the first plaintext distance between the first private data and each first center data determined by the first party itself; the first joint calculation unit 33 jointly calculates, with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data determined by the second party; the decryption unit 34 decrypts the first ciphertext distance to obtain the second plaintext distance between the first private data and the second center data; and finally the cluster determining unit 35 selects, based on the first plaintext distances and the second plaintext distances, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs. The whole process is based on homomorphic encryption and can prevent private data from being leaked when the private data of multiple parties is clustered.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described in conjunction with FIG. 2.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method described in conjunction with FIG. 2.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (22)

  1. A method for clustering private data of multiple parties, the multiple parties including a first party and a second party, the first party having a first private data set that includes a plurality of first private data, the method being performed by the first party and including a multi-round iterative process, wherein any one iteration includes:
    determining first center data currently corresponding to each cluster in a first cluster set, wherein the second party has second center data currently corresponding to each cluster in a second cluster set, and the first cluster set and the second cluster set form a total cluster set;
    calculating a first plaintext distance between the first private data and each first center data;
    jointly calculating, with the second party and based on homomorphic encryption, a first ciphertext distance between the first private data and each second center data;
    decrypting the first ciphertext distance to obtain a second plaintext distance between the first private data and the second center data;
    selecting, based on the first plaintext distances and the second plaintext distances, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
  2. The method according to claim 1, wherein jointly calculating, with the second party and based on homomorphic encryption, the first ciphertext distance between the first private data and each second center data includes:
    encrypting the first private data with the first party's public key to obtain first encrypted data;
    sending the first encrypted data to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance;
    receiving the first ciphertext distance from the second party.
  3. The method according to claim 2, wherein decrypting the first ciphertext distance includes:
    decrypting the first ciphertext distance with the first party's private key to obtain the second plaintext distance, the first party's private key and the first party's public key forming a public-private key pair.
  4. The method according to claim 1, wherein the any one iteration is the first iteration, and the first center data currently corresponding to each cluster in the first cluster set is randomly initialized data.
  5. The method according to claim 1, wherein the second party has a second private data set, and the second private data set includes a plurality of second private data;
    after selecting the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs, the method further includes:
    for the first private data set, determining a first mean of the first private data belonging to a first cluster, the first cluster being any cluster in the first cluster set;
    receiving, from the second party, a second mean, determined by the second party, of the second private data belonging to the first cluster;
    updating the first center data corresponding to the first cluster according to the first mean and the second mean.
  6. The method according to claim 5, wherein after updating the first center data corresponding to the first cluster, the method further includes:
    jointly determining with the second party whether the change in the center data of the clusters in the total cluster set satisfies a preset stop-iteration condition;
    if the change in the center data of the clusters in the total cluster set does not satisfy the preset stop-iteration condition, performing the next iteration of the multi-round iterative process.
  7. The method according to claim 6, wherein the method further comprises:
    if the determination result is that the amount of change in the center data of the clusters in the total cluster set satisfies the preset iteration stop condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data finally belongs.
  8. The method according to claim 6, wherein the joint determination comprises:
    determining locally whether the amount of change in the center data of the clusters in the first cluster set satisfies the preset iteration stop condition, to obtain a first determination result;
    receiving a second determination result from the second party, the second determination result indicating whether the amount of change in the center data of the clusters in the second cluster set satisfies the preset iteration stop condition;
    making a comprehensive determination, according to the first determination result and the second determination result, of whether the amount of change in the center data of the clusters in the total cluster set satisfies the preset iteration stop condition.
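The joint stop check of claims 6 to 8 might look like the sketch below. The claims do not fix the stop condition or how the two results are combined; the Euclidean shift threshold `tol` and the logical AND used here are assumptions, and all function names are hypothetical.

```python
import math
from typing import Sequence


def center_shift(old: Sequence[float], new: Sequence[float]) -> float:
    """Euclidean distance between a center before and after the update."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))


def local_judgment(old_centers, new_centers, tol: float) -> bool:
    """True when every locally held center moved by at most tol (assumed condition)."""
    return all(center_shift(o, n) <= tol
               for o, n in zip(old_centers, new_centers))


def comprehensive_judgment(first_result: bool, second_result: bool) -> bool:
    """Assumption: iteration stops only when both parties report convergence."""
    return first_result and second_result


# First party's centers barely moved; the second party reports its own result.
first_result = local_judgment([[0.0, 0.0], [1.0, 1.0]],
                              [[0.0, 0.001], [1.0, 1.0]], tol=0.01)
stop = comprehensive_judgment(first_result, second_result=False)
```

Only a single boolean per party crosses the boundary in this sketch, so neither side learns the other's center values from the stop check itself.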
  9. The method according to claim 1, wherein the second party has a second private data set, and the second private data set includes a plurality of pieces of second private data;
    the method further comprises:
    jointly calculating with the second party, based on homomorphic encryption, second ciphertext distances between the second private data and each piece of first center data, so that the second party decrypts the second ciphertext distances to obtain the second plaintext distances between the second private data and the first center data.
  10. The method according to claim 9, wherein jointly calculating with the second party, based on homomorphic encryption, the second ciphertext distances between the second private data and each piece of first center data comprises:
    receiving second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the public key of the second party;
    homomorphically calculating the distance between the second encrypted data and the first center data to obtain the second ciphertext distance;
    sending the second ciphertext distance to the second party.
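The exchange in claim 10 can be illustrated with a toy additively homomorphic scheme. The claims name neither the cryptosystem nor the distance measure, so everything below is an assumption: a textbook Paillier construction with deliberately tiny parameters, integer-valued vectors, squared Euclidean distance, and a second party that also sends the encryption of its squared norm. All function names are hypothetical, and the code is not suitable for real use.

```python
import math
import secrets

# Toy parameters: two Mersenne primes. Far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)), with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))


def encrypt(n, m):
    """Paillier encryption of integer m under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def he_add(n, c1, c2):
    """Ciphertext of the sum of the two underlying plaintexts."""
    return c1 * c2 % (n * n)


def he_scale(n, c, k):
    """Ciphertext of k times the underlying plaintext (k may be negative)."""
    return pow(c, k % n, n * n)


def second_party_encrypt(n, x):
    """Second party: encrypt each coordinate plus the squared norm (assumed extra message)."""
    return [encrypt(n, xi) for xi in x], encrypt(n, sum(xi * xi for xi in x))


def first_party_distance(n, enc_x, enc_sq_norm, center):
    """First party: Enc(|x|^2 - 2 x.c + |c|^2) without ever seeing x."""
    acc = he_add(n, enc_sq_norm, encrypt(n, sum(c * c for c in center)))
    for cx, c in zip(enc_x, center):
        acc = he_add(n, acc, he_scale(n, cx, -2 * c))
    return acc


n, priv = keygen()                       # key pair of the second party
enc_x, enc_sq = second_party_encrypt(n, [3, 7])
cipher_dist = first_party_distance(n, enc_x, enc_sq, [1, 10])
plain_dist = decrypt(n, priv, cipher_dist)   # done by the second party
```

Real-valued data would need a fixed-point encoding before encryption, and a production system would use a vetted library with keys of appropriate size.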
  11. An apparatus for clustering private data of multiple parties, the multiple parties including a first party and a second party, the first party having a first private data set that includes a plurality of pieces of first private data, the apparatus being deployed at the first party, being configured to perform a multi-round iterative process, and comprising the following units for performing any round of iteration:
    a center determining unit, configured to determine the first center data currently corresponding to each cluster in a first cluster set, wherein the second party has second center data currently corresponding to each cluster in a second cluster set, and the first cluster set and the second cluster set together constitute a total cluster set;
    an independent calculation unit, configured to calculate first plaintext distances between the first private data and each piece of first center data determined by the center determining unit;
    a first joint calculation unit, configured to jointly calculate with the second party, based on homomorphic encryption, first ciphertext distances between the first private data and each piece of second center data;
    a decryption unit, configured to decrypt the first ciphertext distances obtained by the first joint calculation unit to obtain second plaintext distances between the first private data and the second center data;
    a cluster determining unit, configured to select, according to the first plaintext distances obtained by the independent calculation unit and the second plaintext distances obtained by the decryption unit, the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs.
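The selection performed by the cluster determining unit amounts to an argmin over the union of both distance lists. The sketch below assumes distances are keyed by cluster identifier and that the two cluster sets are disjoint; the identifiers and the function name `assign_cluster` are hypothetical.

```python
from typing import Dict, Hashable


def assign_cluster(first_plain: Dict[Hashable, float],
                   second_plain: Dict[Hashable, float]) -> Hashable:
    """Return the cluster id with the shortest plaintext distance.

    first_plain:  distances to the first party's own centers (computed locally).
    second_plain: distances to the second party's centers (obtained by decryption).
    """
    overlap = first_plain.keys() & second_plain.keys()
    if overlap:
        raise ValueError(f"cluster ids must be disjoint, got {sorted(overlap)}")
    merged = {**first_plain, **second_plain}
    return min(merged, key=merged.get)


# The nearest center happens to be one held by the second party.
chosen = assign_cluster({"a0": 4.0, "a1": 9.0}, {"b0": 2.5, "b1": 7.0})
```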
  12. The apparatus according to claim 11, wherein the first joint calculation unit comprises:
    an encryption subunit, configured to encrypt the first private data with the public key of the first party to obtain first encrypted data;
    a sending subunit, configured to send the first encrypted data obtained by the encryption subunit to the second party, so that the second party homomorphically calculates the distance between the first encrypted data and the second center data to obtain the first ciphertext distance;
    a receiving subunit, configured to receive the first ciphertext distance from the second party.
  13. The apparatus according to claim 12, wherein the decryption unit is specifically configured to decrypt the first ciphertext distance using the private key of the first party to obtain the second plaintext distance, the private key of the first party and the public key of the first party forming a public-private key pair.
  14. The apparatus according to claim 11, wherein the any round of iteration is the first iteration, and the center determining unit is specifically configured to determine that the first center data currently corresponding to each cluster in the first cluster set is randomly initialized data.
  15. The apparatus according to claim 11, wherein the second party has a second private data set, and the second private data set includes a plurality of pieces of second private data;
    the apparatus further comprises:
    a mean determining unit, configured to determine, for the first private data set and after the cluster determining unit selects the cluster corresponding to the shortest plaintext distance as the cluster to which the first private data currently belongs, a first mean of the pieces of first private data belonging to a first cluster, the first cluster being any cluster in the first cluster set;
    a receiving unit, configured to receive, from the second party, a second mean, determined by the second party, of the pieces of second private data belonging to the first cluster;
    an updating unit, configured to update the first center data corresponding to the first cluster according to the first mean determined by the mean determining unit and the second mean received by the receiving unit.
  16. The apparatus according to claim 15, wherein the apparatus further comprises:
    a joint determination unit, configured to jointly determine with the second party, after the updating unit updates the first center data corresponding to the first cluster, whether the amount of change in the center data of the clusters in the total cluster set satisfies a preset iteration stop condition;
    an iteration triggering unit, configured to perform the next iteration of the multi-round iterative process if the determination result of the joint determination unit is that the amount of change in the center data of the clusters in the total cluster set does not satisfy the preset iteration stop condition.
  17. The apparatus according to claim 16, wherein the apparatus further comprises:
    a final determining unit, configured to determine the cluster to which the first private data currently belongs as the cluster to which the first private data finally belongs, if the determination result of the joint determination unit is that the amount of change in the center data of the clusters in the total cluster set satisfies the preset iteration stop condition.
  18. The apparatus according to claim 16, wherein the joint determination unit comprises:
    a local determination subunit, configured to determine locally whether the amount of change in the center data of the clusters in the first cluster set satisfies the preset iteration stop condition, to obtain a first determination result;
    a receiving subunit, configured to receive a second determination result from the second party, the second determination result indicating whether the amount of change in the center data of the clusters in the second cluster set satisfies the preset iteration stop condition;
    a comprehensive determination subunit, configured to make a comprehensive determination, according to the first determination result obtained by the local determination subunit and the second determination result received by the receiving subunit, of whether the amount of change in the center data of the clusters in the total cluster set satisfies the preset iteration stop condition.
  19. The apparatus according to claim 11, wherein the second party has a second private data set, and the second private data set includes a plurality of pieces of second private data;
    the apparatus further comprises:
    a second joint calculation unit, configured to jointly calculate with the second party, based on homomorphic encryption, second ciphertext distances between the second private data and each piece of first center data, so that the second party decrypts the second ciphertext distances to obtain the second plaintext distances between the second private data and the first center data.
  20. The apparatus according to claim 19, wherein the second joint calculation unit comprises:
    a receiving subunit, configured to receive second encrypted data from the second party, the second encrypted data being obtained by the second party encrypting the second private data with the public key of the second party;
    a calculation subunit, configured to homomorphically calculate the distance between the second encrypted data received by the receiving subunit and the first center data to obtain the second ciphertext distance;
    a sending subunit, configured to send the second ciphertext distance obtained by the calculation subunit to the second party.
  21. A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method according to any one of claims 1-10.
  22. A computing device, comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method according to any one of claims 1-10.
PCT/CN2021/099479 2020-06-12 2021-06-10 Method and apparatus for clustering private data of multiple parties WO2021249500A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010536743.6A CN111444545B (en) 2020-06-12 2020-06-12 Method and device for clustering private data of multiple parties
CN202010536743.6 2020-06-12

Publications (1)

Publication Number Publication Date
WO2021249500A1 true WO2021249500A1 (en) 2021-12-16

Family

ID=71653621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099479 WO2021249500A1 (en) 2020-06-12 2021-06-10 Method and apparatus for clustering private data of multiple parties

Country Status (2)

Country Link
CN (1) CN111444545B (en)
WO (1) WO2021249500A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114696991A (en) * 2022-05-31 2022-07-01 蓝象智联(杭州)科技有限公司 Homomorphic encryption-based data clustering method and device
CN115577380A (en) * 2022-12-01 2023-01-06 武汉惠强新能源材料科技有限公司 Material data management method and system based on MES system
CN117808643A (en) * 2024-02-29 2024-04-02 四川师范大学 Teaching management system based on Chinese language
WO2024082515A1 (en) * 2022-10-18 2024-04-25 上海零数众合信息科技有限公司 Decentralized federated clustering learning method and apparatus, and device and medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444545B (en) * 2020-06-12 2020-09-04 支付宝(杭州)信息技术有限公司 Method and device for clustering private data of multiple parties
CN111738238B (en) * 2020-08-14 2020-11-13 支付宝(杭州)信息技术有限公司 Face recognition method and device
CN112101579B (en) * 2020-11-18 2021-02-09 杭州趣链科技有限公司 Federal learning-based machine learning method, electronic device, and storage medium
CN117194350B (en) * 2023-11-07 2024-03-15 广东云下汇金科技有限公司 Document storage method and system in engineering construction stage of data center

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145792A (en) * 2017-04-07 2017-09-08 哈尔滨工业大学深圳研究生院 Multi-user's secret protection data clustering method and system based on ciphertext data
CN108881204A (en) * 2018-06-08 2018-11-23 浙江捷尚人工智能研究发展有限公司 Secret protection cluster data mining method, electronic equipment, storage medium and system
CN109858269A (en) * 2019-02-20 2019-06-07 安徽师范大学 A kind of secret protection density peak clustering method based on homomorphic cryptography
CN110233730A (en) * 2019-05-22 2019-09-13 暨南大学 A kind of method for protecting privacy based on K mean cluster
CN111444545A (en) * 2020-06-12 2020-07-24 支付宝(杭州)信息技术有限公司 Method and device for clustering private data of multiple parties

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145791B (en) * 2017-04-07 2020-07-10 哈尔滨工业大学深圳研究生院 K-means clustering method and system with privacy protection function
CN110334757A (en) * 2019-06-27 2019-10-15 南京邮电大学 Secret protection clustering method and computer storage medium towards big data analysis
CN110609831B (en) * 2019-08-27 2020-07-03 浙江工商大学 Data link method based on privacy protection and safe multi-party calculation
CN111143865B (en) * 2019-12-26 2022-12-30 国网湖北省电力有限公司 User behavior analysis system and method for automatically generating label on ciphertext data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114696991A (en) * 2022-05-31 2022-07-01 蓝象智联(杭州)科技有限公司 Homomorphic encryption-based data clustering method and device
WO2024082515A1 (en) * 2022-10-18 2024-04-25 上海零数众合信息科技有限公司 Decentralized federated clustering learning method and apparatus, and device and medium
CN115577380A (en) * 2022-12-01 2023-01-06 武汉惠强新能源材料科技有限公司 Material data management method and system based on MES system
CN117808643A (en) * 2024-02-29 2024-04-02 四川师范大学 Teaching management system based on Chinese language
CN117808643B (en) * 2024-02-29 2024-05-28 四川师范大学 Teaching management system based on Chinese language

Also Published As

Publication number Publication date
CN111444545A (en) 2020-07-24
CN111444545B (en) 2020-09-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21822403

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21822403

Country of ref document: EP

Kind code of ref document: A1