WO2021249502A1 - Method and apparatus for clustering privacy data of multiple parties - Google Patents

Publication number
WO2021249502A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
cluster
party
center
Prior art date
Application number
PCT/CN2021/099485
Other languages
French (fr)
Chinese (zh)
Inventor
陈超超
周俊
王力
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021249502A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Definitions

  • One or more embodiments of the present disclosure relate to the computer field, and in particular, to a method and device for clustering private data from multiple parties.
  • Clustering is a very common technique in machine learning. It is often applied to tasks such as community discovery and anomaly detection.
  • The usual clustering algorithm is an unsupervised learning algorithm whose purpose is to group similar objects into the same cluster. The more similar the objects within a cluster, the better the clustering result.
  • The biggest difference between clustering and classification is that the target classes of classification are known in advance, whereas those of clustering are not. Clustering produces the same kind of result as classification, but the categories are not predefined.
  • The data is distributed horizontally across multiple parties.
  • The data owned by each party may be private data; that is, the private data owned by one party cannot be disclosed to the other parties.
  • The prior art does not provide a suitable clustering method.
  • One or more embodiments of the present disclosure describe a method and device for clustering private data of multiple parties, which can prevent the leakage of private data when clustering private data of multiple parties.
  • A method for clustering private data of multiple parties, in which the first party holds multiple first private data. The method is executed by the first party and includes multiple rounds of iteration, where any round of iteration includes: determining the first shard of each center data currently corresponding to each cluster, where the second party has the second shard of each center data and the sum of the first shard and the second shard of any center data equals that center data; and using each center data in turn as the target center data and, based on the local first private data and the first shard of the target center data, performing a first joint calculation, by means of secret sharing, with the second shard of the target center data held by the second party, to obtain the result
  • The first joint calculation includes: locally calculating a first distance between the first private data and the first shard of the target center data; multiplying, in secret-sharing mode, the difference between the first shard of the target center data and the first private data with the second shard of the target center data held by the second party, to obtain the first shard of the product; and determining, from the first distance and the first shard of the product, the first shard of the first target distance between the first private data and the target center data.
  • When the round of iteration is the first iteration, the first shards of the center data currently corresponding to the clusters are randomly initialized data.
  • The joint comparison includes: based on the first shards of any two of the first target distances, performing a joint comparison, by means of secret sharing, with the second shards of those two first target distances held by the second party, to determine which of the two first target distances is smaller; and determining, from the comparison results, the closest first target distance among the first target distances.
  • The method further includes: updating the first shard of the center data of a cluster according to the average value of the first private data belonging to that cluster.
  • The method further includes: judging whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition; and, if the amount of change does not satisfy the preset stop-iteration condition, performing the next round of iteration.
  • The method further includes: if the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.
  • Judging whether the amount of change in the center data of each cluster satisfies the preset stop-iteration condition includes: taking any one of the clusters as a target cluster and, according to the first shard of the center data of the target cluster before the update and the first shard of the center data of the target cluster after the update, performing a joint comparison, by means of secret sharing, with the second shard of the center data of the target cluster before the update and the second shard of the center data of the target cluster after the update, both held by the second party, to determine whether the amount of change in the center data of the target cluster satisfies the preset stop-iteration condition.
  • The second party has a second private data set, and the second private data set includes a plurality of second private data.
  • The method further includes: using each center data in turn as the target center data and, based on the local first shard of the target center data, performing a second joint calculation, by means of secret sharing, with the second private data and the second shard of the target center data held by the second party, to obtain the second shard of the second target distance between the second private data and the target center data; the second party has the first shard of the second target distance.
  • The second joint calculation includes: locally calculating the square of the first shard of the target center data; multiplying, in secret-sharing mode, the first shard of the target center data with the difference between the second shard of the target center data and the second private data held by the second party, to obtain the second shard of the product; and determining the second shard of the second target distance according to the square and the second shard of the product.
  • An apparatus for clustering private data of multiple parties, the multiple parties including a first party and a second party, the first party having a first private data set that includes a plurality of first private data. The device is set in the first party to perform multiple rounds of iteration and includes the following units for performing any round of iteration: a center determination unit, used to determine the first shard of each center data currently corresponding to each cluster, where the second party has the second shard of each center data and the sum of the first shard and the second shard of any center data equals that center data; and a first joint computing unit, used to take each center data determined by the center determination unit in turn as the target center data and, based on the local first private data and the first shard of the target center data, perform a first joint calculation, by means of secret sharing, with the second shard of the target center data held by the second party, to obtain the first shard of the first target distance between the first private data and the target center data.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • a computing device including a memory and a processor, the memory stores executable code, and the processor implements the method of the first aspect when the executable code is executed by the processor.
  • Neither party determines the center data of a cluster on its own. Instead, the first party determines the first shard of each center data currently corresponding to each cluster, and the second party determines the second shard of each center data; the sum of the first shard and the second shard of any center data equals that center data. Secret sharing is then used when determining the first target distance between the first private data and the target center data: the first party determines the first shard of the first target distance, and the second party determines the second shard of the first target distance. Secret sharing is also used when determining the closest first target distance among the first target distances. Finally, the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs.
  • the whole process is based on secret sharing, which can prevent the leakage of private data when clustering private data from multiple parties.
  • FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present disclosure
  • Fig. 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment
  • Fig. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to an embodiment.
  • FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present disclosure.
  • This implementation scenario involves clustering private data from multiple parties.
  • the above-mentioned multiple parties may be two parties or more than two parties, for example, three parties, four parties, and so on.
  • clustering of private data of two parties is taken as an example for description.
  • the first party 11 has privacy data 1, privacy data 2, privacy data 3, privacy data 4, and privacy data 5;
  • the second party 12 has privacy data 6, privacy data 7, privacy data 8, and privacy data 9.
  • the first party and the second party are only a distinction between the two parties.
  • the first party may also be referred to as party A
  • the second party may be referred to as party B, and so on.
  • The information covered by the private data is not limited; it may be any information that cannot be disclosed, for example, a user's personal information or business secrets.
  • For example, the private data is a user's personal information, including the user's name, age, income, and so on. For details, refer to Table 1, which lists the information contained in each private data.
  • Table 1 Correspondence table of information contained in each private data
  • The data in different rows of Table 1 may be distributed among different parties.
  • For example, private data 1 is held by the first party,
  • and private data 8 is held by the second party.
  • This kind of distribution, in which different rows of the data are held by different parties, can be called horizontal segmentation.
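  • As an illustration of horizontal segmentation, the following minimal sketch splits the rows of a table between two parties while every row keeps the same columns. The row values are hypothetical placeholders (the contents of Table 1 are not reproduced here); only the row-to-party assignment follows FIG. 1.

```python
# Hypothetical stand-in for Table 1: one row per private data record.
COLUMNS = ("name", "age", "income")
table = {i: ("user_%d" % i, 20 + i, 1000 * i) for i in range(1, 10)}

# Row ownership as described for FIG. 1: records 1-5 and records 6-9.
FIRST_PARTY_ROWS = [1, 2, 3, 4, 5]
SECOND_PARTY_ROWS = [6, 7, 8, 9]


def horizontal_partition(rows, row_ids):
    """Return the subset of rows held by one party (all columns kept)."""
    return {rid: rows[rid] for rid in row_ids}


first_party = horizontal_partition(table, FIRST_PARTY_ROWS)
second_party = horizontal_partition(table, SECOND_PARTY_ROWS)
```

  • Each party sees complete rows, but only its own rows; no row appears on both sides.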
  • FIG. 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment, and the method may be based on the implementation scenario shown in FIG. 1.
  • The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of first private data. The method is executed by the first party and includes multiple rounds of iteration. As shown in FIG. 2, any round of iteration includes the following steps: step 21, determine the first shard of each center data currently corresponding to each cluster, where the second party has the second shard of each center data and the sum of the first shard and the second shard of any center data equals that center data; step 22, use each center data in turn as the target center data and, based on the local first private data and the first shard of the target center data, perform a first joint calculation, by means of secret sharing, with the second shard of the target center data held by the second party, to obtain the first shard of the first target distance between the first private data and the target center data; step 23, based on the first shard of each first target distance, perform a joint comparison, by means of secret sharing, with the second shard of each first target distance held by the second party, to determine the closest first target distance; step 24, determine the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs.
  • First, in step 21, the first shard of each center data currently corresponding to each cluster is determined; the second party has the second shard of each center data, and the sum of the first shard and the second shard of any center data equals that center data.
  • the number of the above-mentioned respective clusters may be preset, for example, it is preset to divide the private data of multiple parties into two clusters or three clusters.
  • each center data is jointly determined by the first party and the second party.
  • the first party can only determine the first shard of each center data
  • the second party determines the second shard of each center data.
  • Neither the first party nor the second party can individually determine the central data.
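  • The property used in step 21, that two shards sum to the center data while neither shard alone reveals it, can be sketched with additive secret sharing over integers modulo a large number. The modulus `Q` and the helper names `share` and `reconstruct` are assumptions made for illustration; the source does not specify a particular encoding.

```python
import secrets

Q = 2**61 - 1  # hypothetical modulus for the shard arithmetic


def share(value):
    """Split an integer into two additive shards modulo Q."""
    shard_1 = secrets.randbelow(Q)   # uniformly random, reveals nothing alone
    shard_2 = (value - shard_1) % Q  # chosen so the two shards sum to value
    return shard_1, shard_2


def reconstruct(shard_1, shard_2):
    """Recombine two shards; results above Q // 2 decode as negative values."""
    v = (shard_1 + shard_2) % Q
    return v - Q if v > Q // 2 else v


c1_shard_1, c1_shard_2 = share(42)   # held by the first and second party
```

  • In the method itself the center data is never reconstructed in the clear; `reconstruct` is shown only to check that the shards are consistent.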
  • When the round of iteration is the first iteration, the first shards of the center data currently corresponding to the clusters are randomly initialized data.
  • For example, with two clusters, the first party randomly initializes the first shards of the two center data, denoted as (⟨c1⟩_1, ⟨c2⟩_1); accordingly, the second party randomly initializes the second shards of the two center data, denoted as (⟨c1⟩_2, ⟨c2⟩_2).
  • K is the number of clusters.
  • The initial cluster vector is all 0, that is, [0, 0].
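  • A minimal sketch of this initialization, assuming floating-point shards drawn from a hypothetical range and a hypothetical feature dimension `DIM`: each party draws its own shards independently, so the initial center data is defined only implicitly as the sum of the two.

```python
import random

K = 2    # number of clusters, preset in advance
DIM = 3  # hypothetical number of features per private data record


def init_center_shards(k, dim, rng):
    """Draw one random shard vector per cluster for a single party."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(k)]


# Each party uses its own random source; the seeds are fixed only for the demo.
shards_party_1 = init_center_shards(K, DIM, random.Random(1))
shards_party_2 = init_center_shards(K, DIM, random.Random(2))

# Cluster vector of a record before any assignment: all zeros, one slot per cluster.
initial_cluster_vector = [0] * K
```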
  • Each center data is used in turn as the target center data; based on the local first private data and the first shard of the target center data, the secret sharing method is used to interact with the second shard of the target center data held by the second party.
  • The first target distance between the first private data x1 and the target center data c1 can be expressed as (c1 - x1)^2, where ⟨c1⟩_1 denotes the first shard of the target center data and ⟨c1⟩_2 denotes the second shard of the target center data.
  • Since c1 = ⟨c1⟩_1 + ⟨c1⟩_2, the following derivation can be performed: (c1 - x1)^2 = (⟨c1⟩_1 + ⟨c1⟩_2 - x1)^2 = (⟨c1⟩_1 - x1)^2 + 2(⟨c1⟩_1 - x1)⟨c1⟩_2 + (⟨c1⟩_2)^2.
  • Solving (c1 - x1)^2 can thus be transformed into solving (⟨c1⟩_1 - x1)^2, (⟨c1⟩_1 - x1)⟨c1⟩_2 and (⟨c1⟩_2)^2.
  • The first joint calculation includes: locally calculating the first distance between the first private data and the first shard of the target center data; multiplying, in secret-sharing mode, the difference between the first shard of the target center data and the first private data with the second shard of the target center data held by the second party, to obtain the first shard of the product; and determining, from the first distance and the first shard of the product, the first shard of the first target distance between the first private data and the target center data.
  • The above first distance corresponds to (⟨c1⟩_1 - x1)^2 in the derivation result of the aforementioned formula.
  • The above product corresponds to (⟨c1⟩_1 - x1)⟨c1⟩_2 in the derivation result of the aforementioned formula.
  • The first shard of the product can be expressed as ⟨(⟨c1⟩_1 - x1)⟨c1⟩_2⟩_1.
  • The first party can sum (⟨c1⟩_1 - x1)^2 and ⟨(⟨c1⟩_1 - x1)⟨c1⟩_2⟩_1, the latter counted twice because the cross term carries a factor of 2 in the expansion of (c1 - x1)^2, to get the first shard of the first target distance.
  • The first shard of the first target distance can be expressed as ⟨x1c1⟩_1.
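  • The source does not say which protocol realizes the multiplication in secret-sharing mode. One common choice is a Beaver multiplication triple; the sketch below assumes such a triple is supplied by a trusted dealer, which is an assumption of this example rather than something stated in the text. All names (`Q`, `dealer_triple`, `shared_product`) are hypothetical.

```python
import secrets

Q = 2**61 - 1  # hypothetical modulus for the shard arithmetic


def split(value):
    """Additively split a value into two shards modulo Q."""
    s1 = secrets.randbelow(Q)
    return s1, (value - s1) % Q


def decode(value):
    """Map a residue modulo Q back to a signed integer."""
    value %= Q
    return value - Q if value > Q // 2 else value


def dealer_triple():
    """Assumed trusted dealer: shards of random u, v and of w = u * v."""
    u, v = secrets.randbelow(Q), secrets.randbelow(Q)
    return split(u), split(v), split(u * v % Q)


def shared_product(a, b):
    """a is known only to party 1, b only to party 2; returns shards of a * b."""
    (u1, u2), (v1, v2), (w1, w2) = dealer_triple()
    x1, x2 = a % Q, 0  # party 1 holds a in full, party 2 contributes 0
    y1, y2 = 0, b % Q  # party 2 holds b in full, party 1 contributes 0
    # Each party publishes its masked piece; only e and f become public.
    e = ((x1 - u1) + (x2 - u2)) % Q
    f = ((y1 - v1) + (y2 - v2)) % Q
    z1 = (w1 + e * v1 + f * u1 + e * f) % Q  # party 1's shard of the product
    z2 = (w2 + e * v2 + f * u2) % Q          # party 2's shard of the product
    return z1, z2


# Example: first shard 5, x1 = 9, so a = 5 - 9 = -4; second shard b = 7.
p1, p2 = shared_product(5 - 9, 7)
```

  • The opened values e and f are masked by the random u and v, so neither party learns the other's input from them.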
  • The second party can determine the second shard of the first target distance in the following manner: the second party uses each center data in turn as the target center data and, based on the local second shard of the target center data, performs a joint calculation, by means of secret sharing, with the first private data and the first shard of the target center data held by the first party, to obtain the second shard of the first target distance between the first private data and the target center data.
  • The aforementioned joint calculation includes: the second party locally calculates the square of the second shard of the target center data; multiplies, in secret-sharing mode, the second shard of the target center data with the difference between the first shard of the target center data and the first private data held by the first party, to obtain the second shard of the product; and determines the second shard of the first target distance according to the square and the second shard of the product.
  • The above square corresponds to (⟨c1⟩_2)^2 in the derivation result of the aforementioned formula.
  • The above product corresponds to (⟨c1⟩_1 - x1)⟨c1⟩_2 in the derivation result of the aforementioned formula.
  • The second shard of the above product can be expressed as ⟨(⟨c1⟩_1 - x1)⟨c1⟩_2⟩_2.
  • The second party can sum (⟨c1⟩_2)^2 and ⟨(⟨c1⟩_1 - x1)⟨c1⟩_2⟩_2, the latter counted twice to account for the factor of 2 on the cross term, to get the second shard of the first target distance.
  • The second shard of the first target distance can be expressed as ⟨x1c1⟩_2.
  • The distance between x1 and c2 can be determined in the same way as the distance between x1 and c1.
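  • Putting the two parties' contributions together, the sketch below checks that the two distance shards add up to (c1 - x1)^2. The secret-sharing multiplication is replaced by a hypothetical stand-in (`simulated_product_shards`) that simply splits the product at random; the factor 2 on the product shards comes from the binomial expansion and is made explicit here as an assumption about how the shards are combined.

```python
import random


def simulated_product_shards(a, b, rng):
    """Stand-in for the secret-sharing multiplication: random split of a * b."""
    p1 = rng.randint(-10**6, 10**6)
    return p1, a * b - p1


def first_distance_shards(c_shard_1, c_shard_2, x, rng):
    """Return both parties' shards of (c - x)^2, where c = c_shard_1 + c_shard_2."""
    a = c_shard_1 - x  # known to the first party only
    b = c_shard_2      # known to the second party only
    p1, p2 = simulated_product_shards(a, b, rng)
    d1 = a * a + 2 * p1  # first party: local first distance plus its product shard
    d2 = b * b + 2 * p2  # second party: local square plus its product shard
    return d1, d2


rng = random.Random(0)
d1, d2 = first_distance_shards(13, -3, 4, rng)  # c = 10, x = 4
```

  • Neither `d1` nor `d2` equals the distance on its own; only their sum does.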
  • Then, in step 23, based on the first shard of each first target distance, a joint comparison is performed, by means of secret sharing, with the second shard of each first target distance held by the second party, to determine the closest first target distance among the first target distances.
  • Each first target distance is the distance between the first private data and one of the center data, and the sum of the first shard and the second shard of a first target distance equals that first target distance.
  • By comparing the first target distances with one another, the closest first target distance among them can be determined. For example, compare x1c1 with x1c2; the cluster corresponding to the smaller value is the cluster to which x1 belongs. Assuming that x1c2 is the smaller value, x1 is closest to c2, and its cluster vector becomes [0, 1].
  • When the number of clusters is three or more, there are three or more center data and, correspondingly, three or more first target distances. Any two of the first target distances are compared in turn to determine the closest first target distance.
  • The joint comparison includes: based on the first shards of any two of the first target distances, performing a joint comparison, by means of secret sharing, with the second shards of those two first target distances held by the second party, to determine which of the two first target distances is smaller; and determining, from the comparison results, the closest first target distance among the first target distances.
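  • The pairwise comparison can be sketched as follows. This is a simplification: each party computes the difference of its two shards locally, and the sketch then opens the difference to read its sign. A real secure comparison protocol would reveal only which distance is smaller; the function names are hypothetical.

```python
def nearest_cluster(dist_shards_1, dist_shards_2):
    """Index of the smallest distance, given each party's list of distance shards."""
    best = 0
    for k in range(1, len(dist_shards_1)):
        diff_1 = dist_shards_1[k] - dist_shards_1[best]  # computed by party 1
        diff_2 = dist_shards_2[k] - dist_shards_2[best]  # computed by party 2
        # Simplification: the summed difference is opened instead of only its sign.
        if diff_1 + diff_2 < 0:
            best = k
    return best


def cluster_vector(index, k):
    """One-hot cluster vector, e.g. [0, 1] for the second of two clusters."""
    return [1 if i == index else 0 for i in range(k)]


# Shards of the distances 36, 9 and 25 to three hypothetical centers.
shards_1 = [100, -50, 7]
shards_2 = [-64, 59, 18]
winner = nearest_cluster(shards_1, shards_2)
```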
  • Finally, in step 24, the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs. Understandably, the cluster to which the first private data belongs may differ between rounds of iteration.
  • The method further includes: updating the first shard of the center data of a cluster according to the average value of the first private data belonging to that cluster.
  • The aforementioned cluster is any one of the aforementioned clusters.
  • The first party and the second party update the center data (c1 and c2) according to the cluster vectors of all private data.
  • The update process is as follows:
  • The first party calculates the mean value of all its private data whose cluster vector is [1, 0], denoted as ⟨c1⟩_1;
  • The second party calculates the mean value of all its private data whose cluster vector is [1, 0], denoted as ⟨c1⟩_2.
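  • The text says only that each party averages its own data in the cluster. For the two shards to sum to the mean over both parties' data, this sketch assumes that each party divides its local sum by the total number of records in the cluster and that this total count is known to both parties. That normalization is an assumption of the example, not a statement from the source.

```python
from fractions import Fraction


def update_center_shards(points_1, labels_1, points_2, labels_2, k):
    """Return (first shard, second shard) of the new center of cluster k."""
    # Assumption: the total member count of the cluster is public to both parties.
    n = labels_1.count(k) + labels_2.count(k)
    if n == 0:
        return None  # empty cluster: nothing to update
    local_sum_1 = sum(p for p, label in zip(points_1, labels_1) if label == k)
    local_sum_2 = sum(p for p, label in zip(points_2, labels_2) if label == k)
    return Fraction(local_sum_1, n), Fraction(local_sum_2, n)


# One-dimensional toy data; labels are cluster indices (0 stands for [1, 0]).
points_1, labels_1 = [1, 2, 10], [0, 0, 1]
points_2, labels_2 = [3, 12], [0, 1]
c1_shards = update_center_shards(points_1, labels_1, points_2, labels_2, 0)
c2_shards = update_center_shards(points_1, labels_1, points_2, labels_2, 1)
```

  • Exact fractions are used so that the shards sum to the mean without rounding error.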
  • The method further includes: judging whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition; and, if the amount of change does not satisfy the preset stop-iteration condition, performing the next round of iteration.
  • The method further includes: if the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.
  • Judging whether the amount of change in the center data of each cluster satisfies the preset stop-iteration condition includes: taking any one of the clusters as a target cluster and, according to the first shard of the center data of the target cluster before the update and the first shard of the center data of the target cluster after the update, performing a joint comparison, by means of secret sharing, with the second shard of the center data of the target cluster before the update and the second shard of the center data of the target cluster after the update, both held by the second party, to determine whether the amount of change in the center data of the target cluster satisfies the preset stop-iteration condition.
  • the above stop iteration condition is
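  • The exact stop-iteration condition is not reproduced in this text. One plausible form, assumed here purely for illustration, is that the squared change of each center data falls below a threshold `tol`. Because the change of the center data is the sum of the changes of its two shards, each party can compute its part locally; the sketch then opens the summed change, whereas a secure comparison would reveal only the yes/no result.

```python
def shard_change(old_shard, new_shard):
    """Per-coordinate change of one party's shard of a center."""
    return [new - old for old, new in zip(old_shard, new_shard)]


def has_converged(old_1, new_1, old_2, new_2, tol):
    """True if the squared change of the center (sum of shards) is at most tol."""
    delta_1 = shard_change(old_1, new_1)  # computed by the first party
    delta_2 = shard_change(old_2, new_2)  # computed by the second party
    # Simplification: the summed change is opened rather than compared securely.
    change = sum((d1 + d2) ** 2 for d1, d2 in zip(delta_1, delta_2))
    return change <= tol


# The shards move, but the center they sum to stays at [1.5, 2.5].
stable = has_converged([1.0, 2.0], [1.5, 2.0], [0.5, 0.5], [0.0, 0.5], 1e-9)
# Here the first coordinate of the center moves by 1.0.
moved = has_converged([1.0, 2.0], [1.5, 2.0], [0.5, 0.5], [1.0, 0.5], 0.01)
```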
  • The processing from step 21 to step 24 mainly describes how the first party determines, for its own first private data, the cluster to which the first private data belongs.
  • For the second private data of the second party, the first party also needs to cooperate with the second party to determine, by means of secret sharing, the cluster to which the second private data belongs.
  • The second party has a second private data set, and the second private data set includes a plurality of second private data.
  • The method further includes: the first party uses each center data in turn as the target center data and, based on the local first shard of the target center data, performs a second joint calculation, by means of secret sharing, with the second private data and the second shard of the target center data held by the second party, to obtain the second shard of the second target distance between the second private data and the target center data; the second party has the first shard of the second target distance.
  • The second joint calculation includes: locally calculating the square of the first shard of the target center data; multiplying, in secret-sharing mode, the first shard of the target center data with the difference between the second shard of the target center data and the second private data held by the second party, to obtain the second shard of the product; and determining the second shard of the second target distance according to the square and the second shard of the product.
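  • The second joint calculation mirrors the first, with the private data now on the second party's side. The sketch below works on vectors and sums the per-coordinate terms; as before, the secret-sharing multiplication is replaced by a hypothetical stand-in that splits the product at random, and the factor 2 on the product shards follows from the binomial expansion.

```python
import random


def simulated_product_shards(a, b, rng):
    """Stand-in for the secret-sharing multiplication: random split of a * b."""
    mask = rng.randint(-10**6, 10**6)
    return mask, a * b - mask


def second_distance_shards(c_shard_1, c_shard_2, y, rng):
    """Shards of the squared distance between center c and the second party's y."""
    held_by_first, held_by_second = 0, 0
    for s1, s2, y_j in zip(c_shard_1, c_shard_2, y):
        a = s1        # first party: its shard of the center coordinate
        b = s2 - y_j  # second party: its shard minus its own private value
        p_first, p_second = simulated_product_shards(a, b, rng)
        held_by_first += a * a + 2 * p_first    # local square + product shard
        held_by_second += b * b + 2 * p_second  # local square + product shard
    return held_by_first, held_by_second


# Center (3, 3) shared as (4, -2) + (-1, 5); second private data y = (0, 7).
part_1, part_2 = second_distance_shards([4, -2], [-1, 5], [0, 7], random.Random(3))
```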
  • In the method provided above, neither party determines the center data of a cluster on its own. The first party determines the first shard of each center data currently corresponding to each cluster, and the second party determines the second shard of each center data; the sum of the first shard and the second shard of any center data equals that center data.
  • Secret sharing is then used when determining the first target distance between the first private data and the target center data: the first party determines the first shard of the first target distance, and the second party determines the second shard of the first target distance.
  • Secret sharing is also used when determining the closest first target distance among the first target distances. Finally, the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs.
  • the whole process is based on secret sharing, which can prevent the leakage of private data when clustering private data from multiple parties.
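  • The rounds of iteration can be simulated end to end in a single process. The sketch below is a toy with one-dimensional data and makes several assumptions not taken from the source: the secret-sharing multiplication is a random split of the product, comparisons open the difference of two distances, the per-cluster member count is public, and the stop condition is a threshold on the squared change of the centers.

```python
from fractions import Fraction
import random


def product_shards(a, b, rng):
    """Stand-in for the secret-sharing multiplication."""
    mask = Fraction(rng.randint(-1000, 1000))
    return mask, a * b - mask


def assign(points, owner, shards_1, shards_2, rng):
    """Cluster index for each point of one party (owner is 1 or 2)."""
    labels = []
    for x in points:
        dist_1, dist_2 = [], []
        for s1, s2 in zip(shards_1, shards_2):
            a, b = (s1 - x, s2) if owner == 1 else (s1, s2 - x)
            p1, p2 = product_shards(a, b, rng)
            dist_1.append(a * a + 2 * p1)  # first party's distance shard
            dist_2.append(b * b + 2 * p2)  # second party's distance shard
        best = 0
        for k in range(1, len(dist_1)):
            if (dist_1[k] - dist_1[best]) + (dist_2[k] - dist_2[best]) < 0:
                best = k
        labels.append(best)
    return labels


def two_party_kmeans(points_1, points_2, shards_1, shards_2, tol, max_rounds, rng):
    """Iterate assignment and update until the centers stop moving."""
    for _ in range(max_rounds):
        labels_1 = assign(points_1, 1, shards_1, shards_2, rng)
        labels_2 = assign(points_2, 2, shards_1, shards_2, rng)
        new_1, new_2 = [], []
        for k in range(len(shards_1)):
            n = labels_1.count(k) + labels_2.count(k)
            if n == 0:  # empty cluster keeps its previous shards
                new_1.append(shards_1[k])
                new_2.append(shards_2[k])
                continue
            new_1.append(Fraction(sum(x for x, l in zip(points_1, labels_1) if l == k), n))
            new_2.append(Fraction(sum(x for x, l in zip(points_2, labels_2) if l == k), n))
        change = sum(((n1 - o1) + (n2 - o2)) ** 2
                     for n1, o1, n2, o2 in zip(new_1, shards_1, new_2, shards_2))
        shards_1, shards_2 = new_1, new_2
        if change <= tol:
            break
    return labels_1, labels_2, shards_1, shards_2


# Initial centers 0 and 9, shared as (5, -4) and (-5, 13).
result = two_party_kmeans([1, 2, 10], [3, 11, 12],
                          [Fraction(5), Fraction(-4)], [Fraction(-5), Fraction(13)],
                          0, 10, random.Random(0))
```

  • The random masks change from run to run, but the assignments and the reconstructed centers do not depend on them.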
  • An apparatus for clustering private data of multiple parties is provided; the apparatus is configured to execute the method for clustering private data of multiple parties provided in the embodiments of the present disclosure.
  • The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of first private data, and the device is set in the first party and used to perform multiple rounds of iteration.
  • Fig. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to an embodiment. As shown in FIG. 3, the device 300 includes the following units for performing any round of iteration:
  • a center determining unit 31, used to determine the first shard of each center data currently corresponding to each cluster, where the second party has the second shard of each center data and the sum of the first shard and the second shard of any center data equals that center data;
  • a first joint computing unit 32, configured to take each center data determined by the center determining unit 31 in turn as the target center data and, based on the local first private data and the first shard of the target center data, perform a first joint calculation, by means of secret sharing, with the second shard of the target center data held by the second party;
  • a joint comparison unit 33, configured to perform, by means of secret sharing and based on the first shards of the first target distances obtained by the first joint computing unit 32, a joint comparison with the second shards of the first target distances held by the second party, to determine the closest first target distance among the first target distances;
  • a cluster determining unit 34, configured to determine the cluster corresponding to the closest first target distance determined by the joint comparison unit 33 as the cluster to which the first private data currently belongs.
  • The first joint computing unit 32 includes: a local calculation subunit, configured to locally calculate the first distance between the first private data and the first shard of the target center data; a joint calculation subunit, used to multiply, in secret-sharing mode, the difference between the first shard of the target center data and the first private data with the second shard of the target center data held by the second party, to obtain the first shard of the product; and a determining subunit, used to determine the first shard of the first target distance between the first private data and the target center data based on the first distance obtained by the local calculation subunit and the first shard of the product obtained by the joint calculation subunit.
  • When the round of iteration is the first iteration, the center determining unit 31 is specifically configured to determine that the first shard of each center data currently corresponding to each cluster is randomly initialized data.
  • The joint comparison unit 33 includes: a joint comparison subunit, configured to perform, by means of secret sharing and based on the first shards of any two of the first target distances, a joint comparison with the second shards of those two first target distances held by the second party, to determine which of the two first target distances is smaller; and a determining subunit, used to determine the closest first target distance among the first target distances based on the comparison results determined by the joint comparison subunit.
  • The device further includes: an update unit, configured to update, after the cluster determining unit 34 determines the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs, the first shard of the center data of a cluster according to the average value of the first private data belonging to that cluster.
  • The device further includes: a judging unit, used to judge, after the update unit updates the first shard of the center data of the cluster, whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition; and an iteration triggering unit, used to perform the next round of iteration if the judgment result of the judging unit is that the amount of change in the center data of each cluster does not satisfy the preset stop-iteration condition.
  • The device further includes: a final determination unit, configured to determine, if the judgment result of the judging unit is that the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, the cluster to which the first private data currently belongs, as determined by the cluster determining unit 34, as the cluster to which the first private data ultimately belongs.
  • The judging unit is specifically configured to take any one of the clusters as a target cluster and, according to the first shard of the center data of the target cluster before the update and the first shard of the center data of the target cluster after the update, perform a joint comparison, by means of secret sharing, with the second shard of the center data of the target cluster before the update and the second shard of the center data of the target cluster after the update, both held by the second party, to determine whether the amount of change in the center data of the target cluster satisfies the preset stop-iteration condition.
  • The second party has a second private data set, and the second private data set includes a plurality of second private data.
  • The device further includes: a second joint computing unit, used to take each center data in turn as the target center data and, based on the local first shard of the target center data, perform a second joint calculation, by means of secret sharing, with the second private data and the second shard of the target center data held by the second party, to obtain the second shard of the second target distance between the second private data and the target center data; the second party has the first shard of the second target distance.
  • The second joint computing unit includes: a local calculation subunit, used to locally calculate the square of the first shard of the target center data; a joint calculation subunit, used to multiply, in secret-sharing mode, the first shard of the target center data with the difference between the second shard of the target center data and the second private data held by the second party, to obtain the second shard of the product; and a determining subunit, used to determine the second shard of the second target distance between the second private data and the target center data according to the square obtained by the local calculation subunit and the second shard of the product obtained by the joint calculation subunit.
  • the center determination unit 31 of the first party determines the first share of each center data currently corresponding to each cluster.
  • the second party determines the second share of each center data; the sum of the first share and the second share of any center data equals that center data; the first joint calculation unit 32 subsequently uses secret sharing when determining the first target distance between the first private data and the target center data.
  • the first party determines the first share of the first target distance, and the second party determines the second share of the first target distance.
  • the joint comparison unit 33 also uses secret sharing when determining the nearest first target distance among the first target distances; finally, the cluster determination unit 34 determines the cluster corresponding to the nearest first target distance to be the cluster to which the first private data currently belongs.
  • the whole process is based on secret sharing, which can prevent the leakage of private data when clustering private data from multiple parties.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
  • a computing device including a memory and a processor, wherein the memory stores executable code and the processor, when executing the executable code, implements the method described in conjunction with FIG. 2.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a method and apparatus for clustering privacy data of multiple parties. The method comprises: a first party determining first shares of pieces of center data currently respectively corresponding to clusters; respectively taking these pieces of center data as target center data, and on the basis of local first privacy data and the first shares of the target center data and by means of secret sharing, performing first joint calculation with second shares of target center data in a second party, so as to obtain first shares of first target distances between the first privacy data and the target center data; on the basis of the first shares of the first target distances and by means of secret sharing, performing joint comparison with second shares of the first target distances in the second party, so as to determine the nearest first target distance among the first target distances; and determining a cluster corresponding to the nearest first target distance to be a cluster to which the first privacy data currently belongs. The disclosure of privacy data can be prevented.

Description

Method and device for clustering private data of multiple parties
Technical field
One or more embodiments of the present disclosure relate to the computer field, and in particular to a method and device for clustering private data of multiple parties.
Background
Clustering is a widely used technique in machine learning, often applied to tasks such as community discovery and anomaly detection. A typical clustering algorithm is an unsupervised learning algorithm whose purpose is to group similar objects into the same cluster. The more similar the objects within a cluster, the better the clustering. The biggest difference between clustering and classification is that the target classes of classification are known in advance, whereas those of clustering are not. Clustering produces the same kind of result as classification, except that the categories are not predefined.
In some scenarios, data is horizontally distributed across multiple parties. The data held by each party may be private data; that is, the private data held by one party cannot be disclosed to the other parties. The prior art does not provide a suitable clustering method for this case.
An improved solution is therefore desired that prevents the leakage of private data when clustering the private data of multiple parties.
Summary of the invention
One or more embodiments of the present disclosure describe a method and device for clustering private data of multiple parties that can prevent leakage of the private data during clustering.
In a first aspect, a method for clustering private data of multiple parties is provided. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of first private data. The method is executed by the first party and includes a multi-round iterative process, in which any one round of iteration includes: determining the first share of each center data currently corresponding to each cluster, where the second party holds the second share of each center data and the sum of the first share and the second share of any center data equals that center data; taking each center data in turn as target center data and, based on the local first private data and the first share of the target center data, performing a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data, where the second party holds the second share of the first target distance; based on the first shares of the first target distances, performing a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest first target distance among the first target distances; and determining the cluster corresponding to the nearest first target distance to be the cluster to which the first private data currently belongs.
In a possible implementation, the first joint calculation includes: locally calculating a first distance between the first private data and the first share of the target center data; multiplying, under secret sharing, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain the first share of the product; and determining the first share of the first target distance between the first private data and the target center data according to the first distance and the first share of the product.
In a possible implementation, the round of iteration is the first iteration, and the first shares of the center data currently corresponding to the clusters are randomly initialized data.
In a possible implementation, the joint comparison includes: based on the first shares of any two of the first target distances, performing a joint comparison by means of secret sharing with the second shares of those two first target distances held by the second party, to determine which of the two first target distances is smaller; and determining the nearest first target distance among the first target distances according to the comparison results.
In a possible implementation, after the cluster corresponding to the nearest first target distance is determined to be the cluster to which the first private data currently belongs, the method further includes: updating the first share of the center data of a cluster according to the mean of the first private data belonging to that cluster.
Further, after the first share of the center data of the cluster is updated, the method further includes: judging whether the change in the center data of each cluster satisfies a preset stop-iteration condition; and, if the change in the center data of each cluster does not satisfy the preset stop-iteration condition, performing the next iteration of the multi-round iterative process.
Further, the method further includes: if the change in the center data of each cluster satisfies the preset stop-iteration condition, determining the cluster to which the first private data currently belongs to be the cluster to which the first private data ultimately belongs.
Further, judging whether the change in the center data of each cluster satisfies the preset stop-iteration condition includes: taking any one of the clusters as a target cluster and, based on the first share of the target cluster's center data before the update and the first share of its center data after the update, performing a joint comparison by means of secret sharing with the second share of the target cluster's pre-update center data and the second share of its updated center data held by the second party, to judge whether the change in the center data of the target cluster satisfies the preset stop-iteration condition.
In a possible implementation, the second party has a second private data set that includes a plurality of second private data, and the method further includes: taking each center data in turn as target center data and, based on the local first share of the target center data, performing a second joint calculation by means of secret sharing with the second private data and the second share of the target center data held by the second party, to obtain the second share of the second target distance between the second private data and the target center data, where the second party holds the first share of the second target distance.
Further, the second joint calculation includes: locally calculating the square of the first share of the target center data; multiplying, under secret sharing, the first share of the target center data by the difference between the second share of the target center data and the second private data held by the second party, to obtain the second share of the product; and determining the second share of the second target distance between the second private data and the target center data according to the square and the second share of the product.
In a second aspect, a device for clustering private data of multiple parties is provided. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of first private data. The device is arranged at the first party and is configured to perform a multi-round iterative process, and includes the following units for performing any one round of iteration: a center determination unit, configured to determine the first share of each center data currently corresponding to each cluster, where the second party holds the second share of each center data and the sum of the first share and the second share of any center data equals that center data; a first joint calculation unit, configured to take each center data determined by the center determination unit in turn as target center data and, based on the local first private data and the first share of the target center data, to perform a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data, where the second party holds the second share of the first target distance; a joint comparison unit, configured to perform, based on the first shares of the first target distances obtained by the first joint calculation unit, a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest first target distance among the first target distances; and a cluster determination unit, configured to determine the cluster corresponding to the nearest first target distance determined by the joint comparison unit to be the cluster to which the first private data currently belongs.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to execute the method of the first aspect.
In a fourth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and the processor implements the method of the first aspect when executing the executable code.
With the method and device provided by the embodiments of the present disclosure, no single party determines the center data of the clusters on its own. Instead, the first party determines the first share of each center data currently corresponding to each cluster, and the second party determines the second share of each center data; the sum of the first share and the second share of any center data equals that center data. Subsequently, secret sharing is used when determining the first target distance between the first private data and the target center data: the first party determines the first share of the first target distance, and the second party determines the second share. Secret sharing is also used when determining the nearest first target distance among the first target distances. Finally, the cluster corresponding to the nearest first target distance is determined to be the cluster to which the first private data currently belongs. The whole process is based on secret sharing and can prevent leakage of private data when clustering the private data of multiple parties.
Description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a flowchart of a method for clustering private data of multiple parties according to an embodiment;
FIG. 3 is a schematic block diagram of a device for clustering private data of multiple parties according to an embodiment.
Detailed description
The solution provided by the present disclosure is described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed herein. The scenario involves clustering private data of multiple parties. The multiple parties may be two parties or more, for example three or four parties. The embodiments of the present disclosure are described using the clustering of private data of two parties as an example. Referring to FIG. 1, the first party 11 has private data 1, private data 2, private data 3, private data 4, and private data 5; the second party 12 has private data 6, private data 7, private data 8, and private data 9. The terms "first party" and "second party" merely distinguish the two parties; the first party may also be called party A, the second party party B, and so on.
In the embodiments of the present disclosure, the information covered by the private data is not limited; it may be any information that must not be disclosed to outsiders, such as a user's personal information or trade secrets. For example, when the private data is a user's personal information, it includes the user's name, age, income, and so on, as shown in Table 1.
Table 1: Information contained in each private data
Private data     Name        Age (years)   Income (10,000 yuan)
Private data 1   Zhang Yi    25            1.5
Private data 2   Zhang Er    26            2.2
Private data 3   Zhang San   35            0.8
Private data 4   Zhao Yi     41            1.8
Private data 5   Zhao Er     19            0.6
Private data 6   Zhao San    28            3.5
Private data 7   Zhao Si     36            1.2
Private data 8   Wang Yi     29            1.3
Private data 9   Wang Er     30            2.2
As Table 1 shows, different rows may be held by different parties; for example, private data 1 is held by the first party and private data 8 by the second party. This way of distributing data, in which the rows are spread across multiple parties, is called horizontal partitioning.
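As a minimal illustration of horizontal partitioning, the sketch below splits the rows of Table 1 between two parties. The tuple layout, the helper name `horizontal_partition`, and the omission of the name column are assumptions made for this example only; they do not come from the disclosure.

```python
# Rows of Table 1 as (record id, age in years, income in units of 10,000 yuan).
# Names are left out because they are not numeric features (an assumption).
TABLE_1 = [
    (1, 25, 1.5), (2, 26, 2.2), (3, 35, 0.8), (4, 41, 1.8), (5, 19, 0.6),
    (6, 28, 3.5), (7, 36, 1.2), (8, 29, 1.3), (9, 30, 2.2),
]


def horizontal_partition(rows, first_party_ids):
    """Split rows between two parties; every row keeps all of its columns."""
    first = [row for row in rows if row[0] in first_party_ids]
    second = [row for row in rows if row[0] not in first_party_ids]
    return first, second


# The first party holds records 1 to 5, the second party records 6 to 9 (FIG. 1).
first_party_rows, second_party_rows = horizontal_partition(TABLE_1, {1, 2, 3, 4, 5})
```

Each party ends up with complete rows for a disjoint subset of records, which is what distinguishes this layout from a column-wise (vertical) split.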
The embodiments of the present disclosure need to cluster the private data of multiple parties. Taking FIG. 1 as an example, private data 1 through private data 9 are clustered, and private data held by different parties may be assigned to the same cluster; for example, private data 1, 3, 6, and 7 are assigned to cluster 1, and private data 2, 4, 5, 8, and 9 are assigned to cluster 2. The embodiments use secret sharing to cluster the private data of multiple parties without leaking the private data.
FIG. 2 is a flowchart of a method for clustering private data of multiple parties according to an embodiment; the method may be based on the implementation scenario shown in FIG. 1. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of first private data. The method is executed by the first party and includes a multi-round iterative process. As shown in FIG. 2, any one round of iteration includes the following steps: step 21, determining the first share of each center data currently corresponding to each cluster, where the second party holds the second share of each center data and the sum of the first share and the second share of any center data equals that center data; step 22, taking each center data in turn as target center data and, based on the local first private data and the first share of the target center data, performing a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data, where the second party holds the second share of the first target distance; step 23, based on the first shares of the first target distances, performing a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest first target distance among the first target distances; step 24, determining the cluster corresponding to the nearest first target distance to be the cluster to which the first private data currently belongs. The specific execution of each step is described below.
First, in step 21, the first share of each center data currently corresponding to each cluster is determined; the second party holds the second share of each center data, and the sum of the first share and the second share of any center data equals that center data. The number of clusters may be preset; for example, the private data of the multiple parties may be preset to be divided into two clusters or three clusters.
In the embodiments of the present disclosure, each center data is determined jointly by the first party and the second party. The first party can determine only the first share of each center data, and the second party determines the second share; neither party can determine the center data on its own.
In one example, the round of iteration is the first iteration, and the first shares of the center data currently corresponding to the clusters are randomly initialized data.
For example, assuming the number of clusters is 2, the first party randomly initializes the first shares of the two center data, denoted (<c1>1, <c2>1); correspondingly, the second party randomly initializes the second shares of the two center data, denoted (<c1>2, <c2>2).
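The random initialization can be sketched as follows. This is a simplified illustration, not the disclosed implementation: shares are plain floats drawn from an arbitrary range, whereas a practical secret-sharing scheme would typically work over a finite ring, and the helper name `init_center_shares` and its `low`/`high` parameters are hypothetical.

```python
import random

K = 2  # number of clusters, as in the example above


def init_center_shares(k, low=-1.0, high=1.0, rng=random):
    """Draw k random values locally; the other party never sees them."""
    return [rng.uniform(low, high) for _ in range(k)]


first_shares = init_center_shares(K)   # (<c1>1, <c2>1), held by the first party
second_shares = init_center_shares(K)  # (<c1>2, <c2>2), held by the second party

# For illustration only: the implied centers are the element-wise sums.
# In the protocol itself neither party ever computes this list.
implied_centers = [a + b for a, b in zip(first_shares, second_shares)]
```

Because both shares are random, the initial centers are random as well, yet neither party learns them from its own share alone.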
Further, the first party may initialize a K-dimensional cluster vector for each first private data to mark the cluster to which that first private data belongs, where K is the number of clusters. When K=2, a 2-dimensional cluster vector is initialized, for example as the all-zero vector [0,0].
Then, in step 22, each center data is taken in turn as target center data and, based on the local first private data and the first share of the target center data, a first joint calculation is performed by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data; the second party holds the second share of the first target distance. The sum of the first share and the second share of the target center data is the target center data.
In the embodiments of the present disclosure, suppose c1 denotes the target center data and x1 denotes the first private data; the first target distance between the first private data and the target center data can then be expressed as (c1-x1)^2. With <c1>1 denoting the first share of the target center data and <c1>2 denoting its second share, the following derivation applies:
(c1-x1)^2
= (<c1>1 + <c1>2 - x1)^2
= (<c1>1 - x1)^2 + 2(<c1>1 - x1)<c1>2 + (<c1>2)^2
According to this derivation, computing (c1-x1)^2 reduces to computing (<c1>1-x1)^2, (<c1>1-x1)<c1>2, and (<c1>2)^2.
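The expansion can be checked numerically. The sketch below uses small integers chosen purely for illustration (they are not taken from the disclosure) and compares the direct squared difference with the sum of the three terms.

```python
def expanded_distance(c_share_1, c_share_2, x):
    """Evaluate (c1 - x1)^2 term by term from the two shares of c1."""
    local_term = (c_share_1 - x) ** 2            # (<c1>1 - x1)^2
    cross_term = 2 * (c_share_1 - x) * c_share_2  # 2(<c1>1 - x1)<c1>2
    square_term = c_share_2 ** 2                  # (<c1>2)^2
    return local_term + cross_term + square_term


# Illustrative values: c1 = 7 + 23 = 30 and x1 = 25.
c_share_1, c_share_2, x1 = 7, 23, 25
direct = (c_share_1 + c_share_2 - x1) ** 2
expanded = expanded_distance(c_share_1, c_share_2, x1)
```

Only the cross term mixes values held by different parties, which is why it is the one term that requires a secret-shared multiplication.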
In one example, the first joint calculation includes: locally calculating a first distance between the first private data and the first share of the target center data; multiplying, under secret sharing, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain the first share of the product; and determining the first share of the first target distance between the first private data and the target center data according to the first distance and the first share of the product.
The first distance corresponds to (<c1>1-x1)^2 in the derivation above, and the product corresponds to (<c1>1-x1)<c1>2; the first share of the product can be written as <(<c1>1-x1)<c1>2>1. The first party can sum (<c1>1-x1)^2 and <(<c1>1-x1)<c1>2>1 to obtain the first share of the first target distance, which can be written as <x1c1>1.
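The text does not say which protocol performs the multiplication under secret sharing. One common choice is a Beaver multiplication triple, sketched below as an assumption rather than as the disclosed method. The trusted dealer, the modulus `MOD`, and all function names are hypothetical, and both parties are simulated inside one process.

```python
import random

MOD = 2 ** 61 - 1  # arbitrary ring size chosen for this sketch


def share(value, rng):
    """Split value into two additive shares modulo MOD."""
    s1 = rng.randrange(MOD)
    return s1, (value - s1) % MOD


def decode(value):
    """Map a ring element back to a signed integer."""
    return value - MOD if value > MOD // 2 else value


def beaver_multiply(u, v, rng):
    """Return additive shares of u*v; u is party 1's input, v is party 2's."""
    # Assumed trusted dealer: random a, b and c = a*b, handed out as shares.
    a, b = rng.randrange(MOD), rng.randrange(MOD)
    a1, a2 = share(a, rng)
    b1, b2 = share(b, rng)
    c1, c2 = share(a * b % MOD, rng)
    # Party 1 publishes u - a1 and party 2 publishes -a2; their sum is e = u - a.
    e = ((u - a1) + (0 - a2)) % MOD
    # Party 1 publishes -b1 and party 2 publishes v - b2; their sum is f = v - b.
    f = ((0 - b1) + (v - b2)) % MOD
    z1 = (c1 + e * b1 + f * a1 + e * f) % MOD  # kept by party 1
    z2 = (c2 + e * b2 + f * a2) % MOD          # kept by party 2
    return z1, z2


rng = random.Random(0)
u = 7 - 25  # (<c1>1 - x1), known only to the first party
v = 23      # <c1>2, known only to the second party
z1, z2 = beaver_multiply(u, v, rng)
product = decode((z1 + z2) % MOD)
```

The opened values e and f are masked by the random a and b, so neither party learns the other's input; only the sum of z1 and z2 equals the product.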
相应地,第二方可以通过如下方式确定第一目标距离的第二分片:第二方分别将所述各中心数据作为目标中心数据,基于本地的目标中心数据的第二分片,利用秘密共享的方式,与所述第一方中的第一隐私数据和目标中心数据的第一分片进行联合计算,得到所述第一隐私数据和所述目标中心数据的第一目标距离的第二分片。Correspondingly, the second party can determine the second segment of the first target distance in the following manner: the second party uses the respective center data as the target center data, based on the second segment of the local target center data, and uses the secret In a shared manner, joint calculation is performed with the first shard of the first private data in the first party and the target center data to obtain the second target distance between the first private data and the target center data. Fragmentation.
进一步地,上述联合计算包括:第二方本地计算所述目标中心数据的第二分片的平方;将所述目标中心数据的第二分片,与所述第一方中所述目标中心数据的第一分片和所述第一隐私数据的差值,进行秘密共享方式下的相乘运算,得到乘积的第二分片;根据所述平方和所述乘积的第二分片,确定所述第一隐私数据和所述目标中心数据的第一目标距离的第二分片。Further, the aforementioned joint calculation includes: the second party locally calculates the square of the second segment of the target center data; and the second segment of the target center data is combined with the target center data in the first party. The difference between the first fragment of and the first private data is multiplied in the secret sharing mode to obtain the second fragment of the product; the second fragment of the product is determined according to the square and the second fragment of the product The second segment of the first target distance of the first privacy data and the target center data.
It can be understood that the square corresponds to (<c1>2)^2 in the result of the foregoing derivation, and the product corresponds to (<c1>1-x1)<c1>2 in that result; the second share of the product can be written as <(<c1>1-x1)<c1>2>2. The second party sums (<c1>2)^2 and <(<c1>1-x1)<c1>2>2 to obtain the second share of the first target distance, which can be written as <x1c1>2.
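The share computation described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation. It assumes scalar integer data (for example, fixed-point encoded values), additive sharing modulo a prime `P`, and a simulated trusted dealer that supplies a Beaver triple for the secret-shared multiplication. It also assumes a squared Euclidean distance, so the cross term carries the factor 2 from expanding the square, applied locally to each product share. All function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # prime modulus for additive sharing (an assumption of this sketch)


def share(value):
    """Split an integer into two additive shares modulo P."""
    s1 = secrets.randbelow(P)
    return s1, (value - s1) % P


def reconstruct(s1, s2):
    """Recombine two shares and map the result back to a signed integer."""
    v = (s1 + s2) % P
    return v - P if v > P // 2 else v


def beaver_multiply(a, b):
    """Return additive shares of a*b, where party 1 holds a and party 2 holds b.

    The Beaver triple (u, v, w = u*v) comes from a simulated dealer here;
    a real deployment would generate it without a trusted third party.
    """
    u, v = secrets.randbelow(P), secrets.randbelow(P)
    u1, u2 = share(u)
    v1, v2 = share(v)
    w1, w2 = share(u * v % P)
    # Treat the private inputs as trivially shared: a = (a, 0), b = (0, b).
    a1, a2 = a % P, 0
    b1, b2 = 0, b % P
    # Both parties open the masked values d = a - u and e = b - v.
    d = (a1 - u1 + a2 - u2) % P
    e = (b1 - v1 + b2 - v2) % P
    z1 = (w1 + d * v1 + e * u1 + d * e) % P  # party 1's share of a*b
    z2 = (w2 + d * v2 + e * u2) % P          # party 2's share of a*b
    return z1, z2


def first_target_distance_shares(x1, c1_share1, c1_share2):
    """Return (<x1c1>1, <x1c1>2), additive shares of (c1 - x1)^2."""
    diff = c1_share1 - x1                        # local to party 1
    p1, p2 = beaver_multiply(diff, c1_share2)    # shares of the product
    d1 = (diff * diff + 2 * p1) % P              # party 1: first distance + product share
    d2 = (c1_share2 * c1_share2 + 2 * p2) % P    # party 2: square + product share
    return d1, d2
```

With `x1 = 7` and a center `c1 = 12` split as `20` and `-8`, the two returned shares recombine to `(12 - 7)^2 = 25`, although neither party sees the other's input in the clear.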
In the embodiments of the present disclosure, if c2 denotes another target center data other than c1 and x1 denotes the first private data, the distance between x1 and c2 can be determined in the same way as the distance between x1 and c1.
Next, in step 23, based on the first share of each first target distance, a joint comparison is performed by secret sharing with the second share of each first target distance held by the second party, to determine the closest first target distance among the first target distances. It can be understood that each first target distance is the distance between the first private data and one of the center data, and that the first share and the second share of a first target distance sum to that first target distance.
In the embodiments of the present disclosure, when there are two clusters there are two center data and, correspondingly, two first target distances; comparing these two first target distances determines the closest one. For example, x1c1 and x1c2 are compared, and the cluster corresponding to the smaller one is the cluster to which x1 belongs. Assuming here that x1c2 is the smaller value, x1 is closest to c2, and its cluster vector becomes [0,1].
When there are three or more clusters, there are three or more center data and, correspondingly, three or more first target distances; comparing these first target distances two at a time determines the closest first target distance among them.
In one example, the joint comparison includes: based on the first shares of any two first target distances among the first target distances, performing a joint comparison by secret sharing with the second shares of those two first target distances held by the second party, to determine which of the two distances is smaller; and determining, from the comparison results, the closest first target distance among the first target distances.
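The pairwise selection can be sketched as follows. The sketch shows only the data flow. The function `compare_shared` is a hypothetical placeholder: it recombines the difference of the shares in the clear, which a real secret-sharing comparison protocol would not do. The shares are plain integers for readability, and all names are assumptions of this illustration.

```python
def compare_shared(a_shares, b_shares):
    """Placeholder for a secret-sharing comparison; returns True if a < b.

    Each party subtracts its own shares locally. The final recombination is
    done in the clear here for illustration only; it is NOT secure.
    """
    diff1 = a_shares[0] - b_shares[0]  # local to party 1
    diff2 = a_shares[1] - b_shares[1]  # local to party 2
    return diff1 + diff2 < 0


def nearest_cluster(distance_shares):
    """Return the index of the smallest distance, using pairwise comparisons.

    distance_shares holds one (share1, share2) pair per cluster.
    """
    best = 0
    for k in range(1, len(distance_shares)):
        if compare_shared(distance_shares[k], distance_shares[best]):
            best = k
    return best


def cluster_vector(index, n_clusters):
    """One-hot cluster vector, e.g. [0, 1] for the second of two clusters."""
    return [1 if k == index else 0 for k in range(n_clusters)]
```

For two clusters whose distances are shared as `(30, -5)` and `(-10, 19)`, that is 25 and 9, the second distance is smaller, so the cluster vector is `[0, 1]`, matching the example in the text.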
Finally, in step 24, the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs. It can be understood that the first private data may belong to different clusters in different rounds of iteration.
In one example, after the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs, the method further includes: updating the first share of the center data of a cluster according to the mean of the first private data belonging to that cluster.
It can be understood that the cluster referred to here is any one of the foregoing clusters.
For example, the first party and the second party update the center data (c1 and c2) according to the cluster vectors of all private data. Taking c1 as an example, the update proceeds as follows:
The first party calculates the mean of all private data whose cluster vector is [1,0], denoted <c1>1;
The second party calculates the mean of all private data whose cluster vector is [1,0], denoted <c1>2.
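The local step of this update can be sketched as below. The sketch mirrors the text literally. It assumes that each party averages the private data it holds locally, and it leaves the empty-cluster case open because the text does not specify it. The function name and argument layout are hypothetical.

```python
def local_center_share(private_data, cluster_vectors, target_vector):
    """Mean of one party's private data whose cluster vector equals target_vector.

    private_data: list of feature vectors held by this party.
    cluster_vectors: one-hot cluster vector for each data item, same order.
    Returns None when the party has no data in the cluster (unspecified in the text).
    """
    members = [x for x, v in zip(private_data, cluster_vectors) if v == target_vector]
    if not members:
        return None
    dim = len(members[0])
    return [sum(x[d] for x in members) / len(members) for d in range(dim)]
```

Each party would call this once per cluster on its own data, for example with `target_vector=[1, 0]` to obtain its share of c1.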
Further, after the first share of the center data of the cluster is updated, the method further includes: determining whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition; and if the amount of change in the center data of each cluster does not satisfy the preset stop-iteration condition, performing the next iteration of the multi-round iterative process.
Further, the method further includes: if the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data finally belongs.
Further, determining whether the amount of change in the center data of each cluster satisfies the preset stop-iteration condition includes: taking any one of the clusters as a target cluster and, based on the first share of the target cluster's center data before the update and the first share of its center data after the update, performing a joint comparison by secret sharing with the second party's second share of the target cluster's center data before the update and second share of its center data after the update, to determine whether the amount of change in the target cluster's center data satisfies the preset stop-iteration condition.
For example, the stop-iteration condition is |C(t)-C(t+1)|^2<delta, where delta can be a preset value, C(t) denotes the center data before the update, and C(t+1) denotes the center data after the update.
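One way to read this condition in terms of shares is sketched below. The text does not spell out the decomposition, so the symbols \Delta_1 and \Delta_2 are assumptions introduced here for illustration. Each party first subtracts its own shares locally; the squared change then splits into two local terms and one cross term, the same structure as the target distance.

```latex
\begin{aligned}
\Delta_1 &= \langle C(t)\rangle_1 - \langle C(t+1)\rangle_1 \quad \text{(local to the first party)}\\
\Delta_2 &= \langle C(t)\rangle_2 - \langle C(t+1)\rangle_2 \quad \text{(local to the second party)}\\
\lvert C(t) - C(t+1)\rvert^2 &= \lvert \Delta_1 + \Delta_2 \rvert^2
  = \lvert\Delta_1\rvert^2 + 2\,\Delta_1\cdot\Delta_2 + \lvert\Delta_2\rvert^2
\end{aligned}
```

Under this reading, only the cross term \Delta_1\cdot\Delta_2 needs a secret-shared multiplication, and the comparison of the result with delta is the joint comparison.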
In the embodiments of the present disclosure, steps 21 to 24 above mainly describe how the first party determines the cluster to which its own first private data belongs. In addition, for the second party's second private data, the first party also needs to cooperate with the second party, by secret sharing, to determine the cluster to which the second private data belongs.
In one example, the second party has a second private data set that includes a plurality of second private data, and the method further includes: the first party taking each center data in turn as the target center data and, based on its local first share of the target center data, performing a second joint calculation by secret sharing with the second party's second private data and second share of the target center data, to obtain a second share of the second target distance between the second private data and the target center data; the second party has the first share of the second target distance.
Further, the second joint calculation includes: locally calculating the square of the first share of the target center data; multiplying, by secret sharing, the first share of the target center data by the difference between the second party's second share of the target center data and the second private data, to obtain a second share of the product;
and determining, from the square and the second share of the product, the second share of the second target distance between the second private data and the target center data.
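The second joint calculation mirrors the first. As a worked expansion, assuming a squared Euclidean distance and writing c for the target center data and x_2 for the second private data (symbols chosen here for illustration):

```latex
\begin{aligned}
(c - x_2)^2 &= \bigl(\langle c\rangle_1 + \langle c\rangle_2 - x_2\bigr)^2\\
            &= \bigl(\langle c\rangle_1\bigr)^2
             + 2\,\langle c\rangle_1\bigl(\langle c\rangle_2 - x_2\bigr)
             + \bigl(\langle c\rangle_2 - x_2\bigr)^2
\end{aligned}
```

The first term is the square the first party computes locally, the last term is local to the second party, and the middle term is the product computed by secret sharing. The factor 2 comes from the expansion and is an assumption about how the product shares are scaled.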
It can be understood that, in the method for clustering private data of multiple parties, the first party and the second party have equal status and their processing does not differ in substance. The embodiments of the present disclosure mainly describe the processing with the first party as the executing entity.
With the method provided by the embodiments of the present disclosure, no single party determines the center data of the clusters on its own. Instead, the first party determines the first share of each center data currently corresponding to each cluster, and the second party determines the second share of each center data; the first share and the second share of any center data sum to that center data. When the first target distance between the first private data and the target center data is subsequently determined, secret sharing is used: the first party determines the first share of the first target distance, and the second party determines the second share. Secret sharing is also used when determining the closest first target distance among the first target distances. Finally, the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs. The whole process is based on secret sharing, which prevents private data from being leaked when the private data of multiple parties is clustered.
According to an embodiment of another aspect, an apparatus for clustering private data of multiple parties is further provided, the apparatus being configured to perform the method for clustering private data of multiple parties provided by the embodiments of the present disclosure. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of first private data; and the apparatus is deployed at the first party and is configured to perform a multi-round iterative process. FIG. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to one embodiment. As shown in FIG. 3, the apparatus 300 includes the following units for performing any one round of iteration: a center determining unit 31, configured to determine the first share of each center data currently corresponding to each cluster, where the second party has the second share of each center data, and the first share and the second share of any center data sum to that center data; a first joint calculation unit 32, configured to take each center data determined by the center determining unit 31 in turn as the target center data and, based on the local first private data and the first share of the target center data, perform a first joint calculation by secret sharing with the second party's second share of the target center data, to obtain the first share of the first target distance between the first private data and the target center data, where the second party has the second share of the first target distance; a joint comparison unit 33, configured to perform, based on the first share of each first target distance obtained by the first joint calculation unit 32, a joint comparison by secret sharing with the second share of each first target distance held by the second party, to determine the closest first target distance among the first target distances; and a cluster determining unit 34, configured to determine the cluster corresponding to the closest first target distance determined by the joint comparison unit 33 as the cluster to which the first private data currently belongs.
Optionally, as one embodiment, the first joint calculation unit 32 includes: a local calculation subunit, configured to locally calculate a first distance between the first private data and the first share of the target center data; a joint calculation subunit, configured to multiply, by secret sharing, the difference between the first share of the target center data and the first private data by the second party's second share of the target center data, to obtain a first share of the product; and a determining subunit, configured to determine, from the first distance obtained by the local calculation subunit and the first share of the product obtained by the joint calculation subunit, the first share of the first target distance between the first private data and the target center data.
Optionally, as one embodiment, the round of iteration is the first iteration, and the center determining unit 31 is specifically configured to determine that the first share of each center data currently corresponding to each cluster is randomly initialized data.
Optionally, as one embodiment, the joint comparison unit 33 includes: a joint comparison subunit, configured to perform, based on the first shares of any two first target distances among the first target distances, a joint comparison by secret sharing with the second shares of those two first target distances held by the second party, to determine which of the two distances is smaller; and a determining subunit, configured to determine, from the comparison results determined by the joint comparison subunit, the closest first target distance among the first target distances.
Optionally, as one embodiment, the apparatus further includes: an updating unit, configured to update, after the cluster determining unit 34 determines the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs, the first share of the center data of a cluster according to the mean of the first private data belonging to that cluster.
Further, the apparatus further includes: a judging unit, configured to determine, after the updating unit updates the first share of the center data of the cluster, whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition; and an iteration triggering unit, configured to perform the next iteration of the multi-round iterative process if the judging unit determines that the amount of change in the center data of each cluster does not satisfy the preset stop-iteration condition.
Further, the apparatus further includes: a final determining unit, configured to determine, if the judging unit determines that the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, the cluster to which the first private data currently belongs, as determined by the cluster determining unit 34, as the cluster to which the first private data finally belongs.
Further, the judging unit is specifically configured to take any one of the clusters as a target cluster and, based on the first share of the target cluster's center data before the update and the first share of its center data after the update, perform a joint comparison by secret sharing with the second party's second share of the target cluster's center data before the update and second share of its center data after the update, to determine whether the amount of change in the target cluster's center data satisfies the preset stop-iteration condition.
Optionally, as one embodiment, the second party has a second private data set that includes a plurality of second private data, and the apparatus further includes: a second joint calculation unit, configured to take each center data in turn as the target center data and, based on the local first share of the target center data, perform a second joint calculation by secret sharing with the second party's second private data and second share of the target center data, to obtain a second share of the second target distance between the second private data and the target center data; the second party has the first share of the second target distance.
Further, the second joint calculation unit includes: a local calculation subunit, configured to locally calculate the square of the first share of the target center data; a joint calculation subunit, configured to multiply, by secret sharing, the first share of the target center data by the difference between the second party's second share of the target center data and the second private data, to obtain a second share of the product; and a determining subunit, configured to determine, from the square obtained by the local calculation subunit and the second share of the product obtained by the joint calculation subunit, the second share of the second target distance between the second private data and the target center data.
With the apparatus provided by the embodiments of the present disclosure, no single party determines the center data of the clusters on its own. Instead, the center determining unit 31 of the first party determines the first share of each center data currently corresponding to each cluster, and the second party determines the second share of each center data; the first share and the second share of any center data sum to that center data. When the first joint calculation unit 32 subsequently determines the first target distance between the first private data and the target center data, secret sharing is used: the first party determines the first share of the first target distance, and the second party determines the second share. Secret sharing is also used when the joint comparison unit 33 determines the closest first target distance among the first target distances. Finally, the cluster determining unit 34 determines the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs. The whole process is based on secret sharing, which prevents private data from being leaked when the private data of multiple parties is clustered.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described with reference to FIG. 2.
According to an embodiment of yet another aspect, a computing device is further provided, including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method described with reference to FIG. 2.
Those skilled in the art will appreciate that, in one or more of the foregoing examples, the functions described in the present invention can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (22)

  1. A method for clustering private data of multiple parties, the multiple parties comprising a first party and a second party, the first party having a first private data set that comprises a plurality of first private data, the method being performed by the first party and comprising a multi-round iterative process, wherein any one round of iteration comprises:
    determining a first share of each center data currently corresponding to each cluster, wherein the second party has a second share of each center data, and the first share and the second share of any center data sum to that center data;
    taking each center data in turn as target center data and, based on the local first private data and the first share of the target center data, performing a first joint calculation by secret sharing with the second share of the target center data held by the second party, to obtain a first share of a first target distance between the first private data and the target center data, wherein the second party has a second share of the first target distance;
    based on the first share of each first target distance, performing a joint comparison by secret sharing with the second share of each first target distance held by the second party, to determine the closest first target distance among the first target distances; and
    determining the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs.
  2. The method of claim 1, wherein the first joint calculation comprises:
    locally calculating a first distance between the first private data and the first share of the target center data;
    multiplying, by secret sharing, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain a first share of the product; and
    determining, from the first distance and the first share of the product, the first share of the first target distance between the first private data and the target center data.
  3. The method of claim 1, wherein the round of iteration is the first iteration, and the first share of each center data currently corresponding to each cluster is randomly initialized data.
  4. The method of claim 1, wherein the joint comparison comprises:
    based on the first shares of any two first target distances among the first target distances, performing a joint comparison by secret sharing with the second shares of those two first target distances held by the second party, to determine which of the two distances is smaller; and
    determining, from the comparison results, the closest first target distance among the first target distances.
  5. The method of claim 1, wherein, after the cluster corresponding to the closest first target distance is determined as the cluster to which the first private data currently belongs, the method further comprises:
    updating the first share of the center data of a cluster according to the mean of the first private data belonging to that cluster.
  6. The method of claim 5, wherein, after the first share of the center data of the cluster is updated, the method further comprises:
    determining whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition; and
    if the amount of change in the center data of each cluster does not satisfy the preset stop-iteration condition, performing the next iteration of the multi-round iterative process.
  7. The method of claim 6, wherein the method further comprises:
    if the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data finally belongs.
  8. The method of claim 6, wherein determining whether the amount of change in the center data of each cluster satisfies the preset stop-iteration condition comprises:
    taking any one of the clusters as a target cluster and, based on the first share of the target cluster's center data before the update and the first share of its center data after the update, performing a joint comparison by secret sharing with the second party's second share of the target cluster's center data before the update and second share of its center data after the update, to determine whether the amount of change in the target cluster's center data satisfies the preset stop-iteration condition.
  9. The method of claim 1, wherein the second party has a second private data set that comprises a plurality of second private data, and the method further comprises:
    taking each center data in turn as the target center data and, based on the local first share of the target center data, performing a second joint calculation by secret sharing with the second party's second private data and second share of the target center data, to obtain a second share of a second target distance between the second private data and the target center data, wherein the second party has a first share of the second target distance.
  10. The method according to claim 9, wherein the second joint calculation comprises:
    locally calculating the square of the first shard of the target center data;
    multiplying, under secret sharing, the first shard of the target center data by the difference between the second shard of the target center data and the second private data held by the second party, to obtain a second shard of the product;
    determining, based on the square and the second shard of the product, the second shard of the second target distance between the second private data and the target center data.
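The three steps of the second joint calculation can be sketched as follows, assuming one-dimensional integer data and a squared Euclidean distance. The claim only says the multiplication is done "under secret sharing"; the Beaver-triple protocol with a trusted dealer used here is a swapped-in, commonly used technique, not something the claim specifies. The modulus `P` and all function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus; an assumption, not from the claims


def share(value):
    """Split value into two additive shards whose sum is value modulo P."""
    shard_1 = secrets.randbelow(P)
    return shard_1, (value - shard_1) % P


def beaver_multiply(a, b):
    """Simulate multiplying a (known only to party 1) by b (known only to party 2).

    A trusted dealer (an assumption) gives u and w_1 to party 1 and v and w_2
    to party 2, where w_1 + w_2 = u * v. Only the masked values a - u and
    b - v are exchanged. Returns one additive shard of a * b per party.
    """
    u, v = secrets.randbelow(P), secrets.randbelow(P)
    w_1, w_2 = share(u * v % P)
    d = (a - u) % P  # sent by party 1 to party 2
    e = (b - v) % P  # sent by party 2 to party 1
    # a * b = (d + u) * (e + v) = d*e + d*v + u*e + u*v
    z_1 = (u * e + w_1) % P          # computed by party 1
    z_2 = (d * e + d * v + w_2) % P  # computed by party 2
    return z_1, z_2


def second_distance_shards(c_1, c_2, y):
    """Party 1 holds c_1; party 2 holds c_2 and its private value y.

    Uses (c_1 + c_2 - y)^2 = c_1^2 + 2 * c_1 * (c_2 - y) + (c_2 - y)^2.
    """
    square = c_1 * c_1 % P                      # local, party 1
    diff = (c_2 - y) % P                        # local, party 2
    prod_p1, prod_p2 = beaver_multiply(c_1, diff)
    dist_p1 = (square + 2 * prod_p1) % P        # party 1's shard of the distance
    dist_p2 = (diff * diff + 2 * prod_p2) % P   # party 2's shard of the distance
    return dist_p1, dist_p2


c_1, c_2 = share(10)                      # center 10, split between the parties
dist_p1, dist_p2 = second_distance_shards(c_1, c_2, 7)
print((dist_p1 + dist_p2) % P)            # 9, i.e. (10 - 7) ** 2
```

The term `(c_2 - y)^2` is added on the second party's side in this sketch; the claim itself only describes the first party's share of the work.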
  11. An apparatus for clustering private data of multiple parties, the multiple parties comprising a first party and a second party, the first party having a first private data set that comprises a plurality of pieces of first private data, the apparatus being provided at the first party, being configured to perform a multi-round iterative process, and comprising the following units for performing any one round of iteration:
    a center determining unit, configured to determine first shards of the center data currently corresponding to the respective clusters; the second party holds second shards of the center data; the sum of the first shard and the second shard of any center data equals that center data;
    a first joint calculation unit, configured to take each of the center data determined by the center determining unit in turn as target center data and, based on the local first private data and the first shard of the target center data, perform a first joint calculation by means of secret sharing with the second shard of the target center data held by the second party, to obtain a first shard of a first target distance between the first private data and the target center data; the second party holds a second shard of the first target distance;
    a joint comparison unit, configured to perform, based on the first shards of the first target distances obtained by the first joint calculation unit, a joint comparison by means of secret sharing with the second shards of the first target distances held by the second party, to determine the nearest of the first target distances;
    a cluster determining unit, configured to determine the cluster corresponding to the nearest first target distance determined by the joint comparison unit as the cluster to which the first private data currently belongs.
  12. The apparatus according to claim 11, wherein the first joint calculation unit comprises:
    a local calculation subunit, configured to locally calculate a first distance between the first private data and the first shard of the target center data;
    a joint calculation subunit, configured to multiply, under secret sharing, the difference between the first shard of the target center data and the first private data by the second shard of the target center data held by the second party, to obtain a first shard of the product;
    a determining subunit, configured to determine the first shard of the first target distance between the first private data and the target center data based on the first distance obtained by the local calculation subunit and the first shard of the product obtained by the joint calculation subunit.
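The way the three subunits fit together can be seen from the following expansion, which assumes the distance is a squared Euclidean distance. The symbols are introduced here for illustration only: x is the first private data, c = c_1 + c_2 is the target center data with c_1 held by the first party and c_2 by the second party, p_1 and p_2 are the two shards of the product, and d_1 and d_2 are the two shards of the first target distance. That the second party adds the term in c_2 alone is an inference from the algebra, not a statement in the claim.

```latex
\begin{aligned}
\lVert x - c \rVert^2
  &= \lVert (x - c_1) - c_2 \rVert^2 \\
  &= \lVert x - c_1 \rVert^2 + 2\,\langle c_1 - x,\; c_2 \rangle + \lVert c_2 \rVert^2, \\
\langle c_1 - x,\; c_2 \rangle &= p_1 + p_2, \\
d_1 &= \lVert x - c_1 \rVert^2 + 2\,p_1, \qquad
d_2 = \lVert c_2 \rVert^2 + 2\,p_2, \qquad
d_1 + d_2 = \lVert x - c \rVert^2 .
\end{aligned}
```

Only the cross term mixes values held by different parties, so it is the only part that needs the secret-shared multiplication.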
  13. The apparatus according to claim 11, wherein the any one round of iteration is the first iteration, and the center determining unit is specifically configured to determine that the first shards of the center data currently corresponding to the respective clusters are randomly initialized data.
  14. The apparatus according to claim 11, wherein the joint comparison unit comprises:
    a joint comparison subunit, configured to perform, based on the first shards of any two of the first target distances, a joint comparison by means of secret sharing with the second shards of those two first target distances held by the second party, to determine a comparison result indicating which of the two first target distances is nearer;
    a determining subunit, configured to determine the nearest of the first target distances based on the comparison results determined by the joint comparison subunit.
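How the determining subunit might combine pairwise results can be sketched as a linear scan that keeps the current nearest candidate. The names `nearest_cluster` and `make_simulated_comparison` are hypothetical, and the comparison below is a plain simulation that adds the two parties' shard differences in the clear. In the claimed scheme that step would be a secret-shared comparison revealing only the outcome.

```python
def nearest_cluster(num_clusters, is_closer):
    """Return the index of the nearest cluster from pairwise comparison results.

    is_closer(i, j) stands in for the joint comparison and returns True when
    distance i is less than or equal to distance j.
    """
    best = 0
    for candidate in range(1, num_clusters):
        if not is_closer(best, candidate):
            best = candidate
    return best


def make_simulated_comparison(shards_1, shards_2):
    """Simulation only: decide i vs. j from the sign of the shared difference."""
    def is_closer(i, j):
        # Each party can form its own shard of (distance_i - distance_j) locally.
        diff = (shards_1[i] - shards_1[j]) + (shards_2[i] - shards_2[j])
        return diff <= 0
    return is_closer


# Toy shards of three distances; the hidden distances are 9, 4 and 25.
shards_1 = [100, -50, 7]
shards_2 = [-91, 54, 18]
is_closer = make_simulated_comparison(shards_1, shards_2)
print(nearest_cluster(3, is_closer))  # 1
```

A scan of this kind needs one comparison fewer than the number of clusters, whereas comparing every pair would need a number quadratic in the cluster count; the claim permits either, since it only requires comparisons between "any two" distances.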
  15. The apparatus according to claim 11, further comprising:
    an updating unit, configured to, after the cluster determining unit determines the cluster corresponding to the nearest first target distance as the cluster to which the first private data currently belongs, update the first shard of the center data of a cluster based on the mean of the first private data belonging to that cluster.
  16. The apparatus according to claim 15, further comprising:
    a judging unit, configured to determine, after the updating unit updates the first shard of the center data of the cluster, whether the amount of change in the center data of each cluster satisfies a preset stop-iteration condition;
    an iteration triggering unit, configured to perform the next iteration of the multi-round iterative process if the judging unit determines that the amount of change in the center data of each cluster does not satisfy the preset stop-iteration condition.
  17. The apparatus according to claim 16, further comprising:
    a final determining unit, configured to, if the judging unit determines that the amount of change in the center data of each cluster satisfies the preset stop-iteration condition, determine the cluster to which the first private data currently belongs, as determined by the cluster determining unit, as the cluster to which the first private data finally belongs.
  18. The apparatus according to claim 16, wherein the judging unit is specifically configured to take any one of the clusters as a target cluster and, based on the first shard of the pre-update center data of the target cluster and the first shard of the post-update center data of the target cluster, perform a joint comparison by means of secret sharing with the second shard of the pre-update center data and the second shard of the post-update center data of the target cluster held by the second party, to determine whether the amount of change in the center data of the target cluster satisfies the preset stop-iteration condition.
  19. The apparatus according to claim 11, wherein the second party has a second private data set comprising a plurality of pieces of second private data, and the apparatus further comprises:
    a second joint calculation unit, configured to take each of the center data in turn as target center data and, based on the local first shard of the target center data, perform a second joint calculation by means of secret sharing with the second private data and the second shard of the target center data held by the second party, to obtain a second shard of a second target distance between the second private data and the target center data; the second party holds a first shard of the second target distance.
  20. The apparatus according to claim 19, wherein the second joint calculation unit comprises:
    a local calculation subunit, configured to locally calculate the square of the first shard of the target center data;
    a joint calculation subunit, configured to multiply, under secret sharing, the first shard of the target center data by the difference between the second shard of the target center data and the second private data held by the second party, to obtain a second shard of the product;
    a determining subunit, configured to determine the second shard of the second target distance between the second private data and the target center data based on the square obtained by the local calculation subunit and the second shard of the product obtained by the joint calculation subunit.
  21. A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method according to any one of claims 1-10.
  22. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method according to any one of claims 1-10.
PCT/CN2021/099485 2020-06-12 2021-06-10 Method and apparatus for clustering privacy data of multiple parties WO2021249502A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010536190.4A CN111444544B (en) 2020-06-12 2020-06-12 Method and device for clustering private data of multiple parties
CN202010536190.4 2020-06-12

Publications (1)

Publication Number Publication Date
WO2021249502A1 (en) 2021-12-16

Family

ID=71653625

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099485 WO2021249502A1 (en) 2020-06-12 2021-06-10 Method and apparatus for clustering privacy data of multiple parties

Country Status (2)

Country Link
CN (1) CN111444544B (en)
WO (1) WO2021249502A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116094844A (en) * 2023-04-10 2023-05-09 蓝象智联(杭州)科技有限公司 Address checking method for multiparty security calculation

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444544B (en) * 2020-06-12 2020-09-11 支付宝(杭州)信息技术有限公司 Method and device for clustering private data of multiple parties
CN112560107B (en) * 2021-02-20 2021-05-14 支付宝(杭州)信息技术有限公司 Method and device for processing private data
CN113257378B (en) * 2021-06-16 2021-09-28 湖南创星科技股份有限公司 Medical service communication method and system based on micro-service technology
CN114282076B (en) * 2022-03-04 2022-06-14 支付宝(杭州)信息技术有限公司 Sorting method and system based on secret sharing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809242A (en) * 2015-05-15 2015-07-29 成都睿峰科技有限公司 Distributed-structure-based big data clustering method and device
CN106874367A (en) * 2016-12-30 2017-06-20 江苏号百信息服务有限公司 A kind of sampling distribution formula clustering method based on public sentiment platform
CN110609831A (en) * 2019-08-27 2019-12-24 浙江工商大学 Data link method based on privacy protection and safe multi-party calculation
CN111159406A (en) * 2019-12-30 2020-05-15 内蒙古工业大学 Big data text clustering method and system based on parallel improved K-means algorithm
CN111444544A (en) * 2020-06-12 2020-07-24 支付宝(杭州)信息技术有限公司 Method and device for clustering private data of multiple parties

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996198B (en) * 2009-08-31 2016-06-29 中国移动通信集团公司 Cluster realizing method and system
CN105138923B (en) * 2015-08-11 2019-01-08 苏州大学 A kind of time series similarity calculation method for protecting privacy

Also Published As

Publication number Publication date
CN111444544B (en) 2020-09-11
CN111444544A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
WO2021249502A1 (en) Method and apparatus for clustering privacy data of multiple parties
WO2021249500A1 (en) Method and apparatus for clustering private data of multiple parties
US11275845B2 (en) Method and apparatus for clustering privacy data of plurality of parties
AU2015101194A4 (en) Semi-Supervised Learning Framework based on Cox and AFT Models with L1/2 Regularization for Patient’s Survival Prediction
US20200329063A1 (en) Method and device for determining data anomaly
CN112799708A (en) Method and system for jointly updating business model
Baker et al. Feature selection for data integration with mixed multiview data
JP2015162246A (en) efficient link management for graph clustering
Konstantinidis et al. Aspis: Robust detection for distributed learning
Grentzelos et al. A comparative study of methods to handle outliers in multivariate data analysis
Yu et al. Som 2 ce: Double self-organizing map based cluster ensemble framework and its application in cancer gene expression profiles
CN113537308A (en) Two-stage k-means clustering processing system and method based on localized differential privacy
Heveling et al. Existence, uniqueness, and algorithmic computation of general lilypond systems
CN114912627A (en) Recommendation model training method, system, computer device and storage medium
Zhu et al. Communication-optimal distributed dynamic graph clustering
CN107248929B (en) Strong correlation data generation method of multi-dimensional correlation data
CN110796189A (en) Clustering method of two-dimensional space points
WO2023216900A1 (en) Model performance evaluating method, apparatus, device, and storage medium
WO2023216899A1 (en) Model performance evaluation method and apparatus, device and medium
Wehner A new concept in advice complexity of job shop scheduling
Konstantinidis et al. Detection and Mitigation of Byzantine Attacks in Distributed Training
Zhang Statistical models for firearm and tool mark image comparisons based on the congruent matching cells (CMC) method
Hillebrand et al. Communication Cost Reduction for Subgraph Counting under Local Differential Privacy via Hash Functions
Bannick et al. Accounting for Inconsistent Use of Covariate Adjustment in Group Sequential Trials
Ngom et al. Strongly Consistent of Kullback-Leibler Divergence Estimator and Tests for Model Selection Based on a Bias Reduced Kernel Density Estimator

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21822901; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21822901; Country of ref document: EP; Kind code of ref document: A1)