WO2021249502A1

WO2021249502A1 - Method and apparatus for clustering privacy data of multiple parties

Info

Publication number: WO2021249502A1
Application number: PCT/CN2021/099485
Authority: WO
Inventors: Chen Chaochao (陈超超); Zhou Jun (周俊); Wang Li (王力)
Original assignee: Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Priority date: 2020-06-12
Filing date: 2021-06-10
Publication date: 2021-12-16
Also published as: CN111444544A; CN111444544B

Abstract

Provided are a method and apparatus for clustering private data of multiple parties. The method comprises: a first party determining first shares of the center data currently corresponding to each cluster; taking each piece of center data in turn as target center data and, on the basis of local first private data and the first share of the target center data, performing a first joint calculation by means of secret sharing with the second share of the target center data held by a second party, so as to obtain a first share of a first target distance between the first private data and the target center data; on the basis of the first shares of the first target distances and by means of secret sharing, performing a joint comparison with the second shares of the first target distances held by the second party, so as to determine the nearest of the first target distances; and determining the cluster corresponding to the nearest first target distance as the cluster to which the first private data currently belongs. Disclosure of the private data is thereby prevented.

Description

Method and device for clustering private data of multiple parties

Technical field

One or more embodiments of the present disclosure relate to the computer field, and in particular, to a method and device for clustering private data from multiple parties.

Background

Clustering is a very common technique in machine learning, often applied to tasks such as community discovery and anomaly detection. A clustering algorithm is usually an unsupervised learning algorithm whose purpose is to group similar objects into the same cluster; the more similar the objects within a cluster, the better the clustering. The main difference between clustering and classification is that in classification the target classes are known in advance, whereas in clustering they are not: like classification, clustering produces groups, but the groups are not predefined.

In some scenarios, data is horizontally distributed across multiple parties, and the data owned by each party may be private, that is, the private data owned by one party must not be disclosed to the other parties. The prior art does not provide a clustering method suited to this case.

An improved solution is therefore desired that prevents leakage of private data when private data from multiple parties is clustered.

Summary of the invention

One or more embodiments of the present disclosure describe a method and device for clustering private data of multiple parties that prevent leakage of the private data during clustering.

In a first aspect, a method for clustering private data of multiple parties is provided. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of pieces of first private data. The method is executed by the first party and includes multiple rounds of iteration, where any round of iteration includes: determining a first share of each piece of center data currently corresponding to each cluster, where the second party holds a second share of each piece of center data, and the sum of the first share and the second share of any piece of center data equals that center data; taking each piece of center data in turn as target center data and, based on the local first private data and the first share of the target center data, performing a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain a first share of a first target distance between the first private data and the target center data, where the second party holds the second share of the first target distance; based on the first shares of the first target distances, performing a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest of the first target distances; and determining the cluster corresponding to the nearest first target distance as the cluster to which the first private data currently belongs.

In a possible implementation, the first joint calculation includes: locally calculating a first distance, namely the squared distance between the first private data and the first share of the target center data; multiplying, in the secret sharing mode, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain a first share of the product; and determining, from the first distance and the first share of the product, the first share of the first target distance between the first private data and the target center data.

In a possible implementation, when the current round is the first iteration, the first shares of the center data currently corresponding to the clusters are randomly initialized.

In a possible implementation, the joint comparison includes: for any two of the first target distances, performing a joint comparison by means of secret sharing between the first party's first shares of the two distances and the second party's second shares of the same two distances, to determine which of the two is smaller; and determining, from these comparison results, the nearest of the first target distances.

In a possible implementation, after the cluster corresponding to the nearest first target distance is determined as the cluster to which the first private data currently belongs, the method further includes: updating the first share of the center data of each cluster according to the average of the pieces of first private data belonging to that cluster.

Further, after the first share of the center data of each cluster is updated, the method further includes: judging whether the change in the center data of each cluster satisfies a preset iteration stop condition; if it does not, performing the next round of the multiple rounds of iteration.

Further, the method further includes: if the change in the center data of each cluster satisfies the preset iteration stop condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.

Further, judging whether the change in the center data of each cluster satisfies the preset iteration stop condition includes: taking each cluster in turn as a target cluster and, based on the first share of the target cluster's center data before the update and the first share of its center data after the update, performing a joint comparison by means of secret sharing with the second party's second shares of the target cluster's center data before and after the update, to determine whether the change in the target cluster's center data satisfies the preset iteration stop condition.

In a possible implementation, the second party has a second private data set that includes a plurality of pieces of second private data, and the method further includes: taking each piece of center data in turn as target center data and, based on the local first share of the target center data, performing a second joint calculation by means of secret sharing with the second private data and the second share of the target center data held by the second party, to obtain a second share of a second target distance between the second private data and the target center data, where the second party holds the first share of the second target distance.

Further, the second joint calculation includes: locally calculating the square of the first share of the target center data; multiplying, in the secret sharing mode, the first share of the target center data by the difference between the second party's second share of the target center data and the second private data, to obtain a second share of the product; and determining, from the square and the second share of the product, the second share of the second target distance between the second private data and the target center data.

In a second aspect, an apparatus for clustering private data of multiple parties is provided. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of pieces of first private data. The apparatus is deployed at the first party to perform multiple rounds of iteration and includes the following units for performing any round of iteration: a center determining unit, configured to determine the first share of each piece of center data currently corresponding to each cluster, where the second party holds the second share of each piece of center data and the sum of the first share and the second share of any piece of center data equals that center data; a first joint computing unit, configured to take each piece of center data determined by the center determining unit in turn as target center data and, based on the local first private data and the first share of the target center data, perform a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data, where the second party holds the second share of the first target distance; a joint comparison unit, configured to perform, based on the first shares of the first target distances obtained by the first joint computing unit, a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest of the first target distances; and a cluster determining unit, configured to determine the cluster corresponding to the nearest first target distance determined by the joint comparison unit as the cluster to which the first private data currently belongs.

In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.

In a fourth aspect, a computing device is provided, including a memory and a processor, the memory stores executable code, and the processor implements the method of the first aspect when the executable code is executed by the processor.

With the method and device provided by the embodiments of the present disclosure, no single party determines the center data of any cluster. Instead, the first party determines the first share of each piece of center data currently corresponding to each cluster and the second party determines the second share, the two shares of any piece of center data summing to that center data. Secret sharing is then used to determine the first target distance between the first private data and the target center data: the first party obtains the first share of the first target distance and the second party obtains the second share. Secret sharing is also used to determine the nearest of the first target distances. Finally, the cluster corresponding to the nearest first target distance is determined as the cluster to which the first private data currently belongs. Because the whole process is based on secret sharing, private data is not leaked when private data from multiple parties is clustered.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the following will briefly introduce the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, without creative work, other drawings can be obtained from these drawings.

FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present disclosure;

Fig. 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment;

Fig. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to an embodiment.

Detailed description

The solution provided by the present disclosure will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present disclosure. This scenario involves clustering private data from multiple parties. The multiple parties may be two parties or more than two, for example three or four parties; the embodiments of the present disclosure take the clustering of two parties' private data as an example. As shown in FIG. 1, the first party 11 holds private data 1, 2, 3, 4, and 5, and the second party 12 holds private data 6, 7, 8, and 9. The terms "first party" and "second party" merely distinguish the two parties; they may also be called party A and party B, and so on.

The embodiments of the present disclosure do not limit what information the private data covers; it may be any information that must not be disclosed, for example users' personal information or trade secrets. For example, if the private data is users' personal information, such as name, age, and income, the information contained in each piece of private data is as shown in Table 1.

Table 1: Information contained in each piece of private data

Private data	Name	Age (years)	Income (10,000 yuan)
Private data 1	Zhang Yi	25	1.5
Private data 2	Zhang Er	26	2.2
Private data 3	Zhang San	35	0.8
Private data 4	Zhao Yi	41	1.8
Private data 5	Zhao Er	19	0.6
Private data 6	Zhao San	28	3.5
Private data 7	Zhao Si	36	1.2
Private data 8	Wang Yi	29	1.3
Private data 9	Wang Er	30	2.2

As Table 1 shows, different rows may be held by different parties; for example, private data 1 is held by the first party and private data 8 by the second party. Distributing data among multiple parties by rows in this way is called horizontal partitioning.

The embodiments of the present disclosure cluster the private data of multiple parties. Taking FIG. 1 as an example, private data 1 to 9 are clustered together, and private data held by different parties may be assigned to the same cluster. For example, private data 1, 3, 6, and 7 may be assigned to cluster 1, and private data 2, 4, 5, 8, and 9 to cluster 2. The embodiments of the present disclosure use secret sharing to cluster the private data of multiple parties without revealing it.

FIG. 2 shows a flowchart of a method for clustering private data of multiple parties according to an embodiment; the method may be based on the implementation scenario shown in FIG. 1. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of pieces of first private data. The method is executed by the first party and includes multiple rounds of iteration. As shown in FIG. 2, any round of iteration includes the following steps. Step 21: determine the first share of each piece of center data currently corresponding to each cluster; the second party holds the second share of each piece of center data, and the sum of the first share and the second share of any piece of center data equals that center data. Step 22: take each piece of center data in turn as target center data and, based on the local first private data and the first share of the target center data, perform a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data; the second party holds the second share of the first target distance. Step 23: based on the first shares of the first target distances, perform a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest of the first target distances. Step 24: determine the cluster corresponding to the nearest first target distance as the cluster to which the first private data currently belongs. The specific implementation of each step is described below.
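For orientation, the result that steps 22 to 24 compute under secret sharing is an ordinary nearest-center assignment. The minimal Python sketch below performs that assignment in the clear, with no privacy protection, so it only illustrates the target functionality and not the protocol itself. It assumes squared Euclidean distance, encodes private data 1 to 5 of Table 1 as (age, income × 10) so that all values are integers, and uses two hypothetical centers; the function names are illustrative.

```python
def squared_distance(x, c):
    """Squared Euclidean distance (c - x)^2, summed over attributes."""
    return sum((ci - xi) ** 2 for xi, ci in zip(x, c))


def assign_clusters(points, centers):
    """Return a one-hot cluster vector per point; the nearest center wins."""
    vectors = []
    for x in points:
        dists = [squared_distance(x, c) for c in centers]
        nearest = min(range(len(centers)), key=lambda k: dists[k])
        vectors.append([1 if k == nearest else 0 for k in range(len(centers))])
    return vectors


# Private data 1-5 of Table 1 as (age, income x 10), held by the first party.
points = [(25, 15), (26, 22), (35, 8), (41, 18), (19, 6)]
centers = [(24, 12), (38, 12)]  # hypothetical centers, for illustration only
print(assign_clusters(points, centers))
# [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
```

In the method itself, neither the centers nor the distances ever appear in the clear; each is split into two additive shares.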

First, in step 21, the first party determines the first share of each piece of center data currently corresponding to each cluster; the second party holds the second share of each piece of center data, and the first share and the second share of any piece of center data sum to that center data. The number of clusters may be preset; for example, the private data of the multiple parties may be divided into two or three clusters.

In the embodiments of the present disclosure, each piece of center data is determined jointly by the first party and the second party: the first party determines only the first share of each piece of center data and the second party only the second share, so neither party can determine any center data on its own.

In an example, when the current round is the first iteration, the first shares of the center data currently corresponding to the clusters are randomly initialized.

For example, if there are two clusters, the first party randomly initializes the first shares of the two pieces of center data, denoted (<c1>1, <c2>1); correspondingly, the second party randomly initializes the second shares of the two pieces of center data, denoted (<c1>2, <c2>2).

Further, the first party may initialize a K-dimensional cluster vector for each piece of first private data to mark the cluster to which it belongs, where K is the number of clusters. When K = 2, a 2-dimensional cluster vector is initialized, for example to all zeros, that is, [0, 0].
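A minimal sketch of this initialization, assuming integer-valued shares that each party draws independently from a hypothetical range [0, 25], K = 2 clusters, and two attributes per record. The names init_center_shares and init_cluster_vectors are illustrative, not from the source. The implied centers are reconstructed in the sketch only to show that each center is the sum of its two shares; in the method, neither party ever forms them.

```python
import random


def init_center_shares(K, dim, low, high, rng):
    """One party's random shares of K centers, each a dim-dimensional vector."""
    return [[rng.randint(low, high) for _ in range(dim)] for _ in range(K)]


def init_cluster_vectors(n, K):
    """An all-zero K-dimensional cluster vector for each of n local records."""
    return [[0] * K for _ in range(n)]


K, DIM = 2, 2
rng_first = random.Random(1)   # the first party's private randomness
rng_second = random.Random(2)  # the second party's private randomness

shares_first = init_center_shares(K, DIM, 0, 25, rng_first)    # <c1>1, <c2>1
shares_second = init_center_shares(K, DIM, 0, 25, rng_second)  # <c1>2, <c2>2
cluster_vectors = init_cluster_vectors(5, K)  # one per first private data

# Illustration only: center ck = <ck>1 + <ck>2, never formed in the method.
centers = [[a + b for a, b in zip(s1, s2)]
           for s1, s2 in zip(shares_first, shares_second)]
print(cluster_vectors[0])  # [0, 0]
```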

Then, in step 22, each piece of center data is taken in turn as target center data and, based on the local first private data and the first share of the target center data, the first party performs the first joint calculation by means of secret sharing with the second share of the target center data held by the second party, obtaining the first share of the first target distance between the first private data and the target center data; the second party obtains the second share of the first target distance. The first share and the second share of the target center data sum to the target center data.

In the embodiments of the present disclosure, let c1 denote the target center data and x1 the first private data. The first target distance between them can be expressed as (c1-x1)^2. With <c1>1 denoting the first share and <c1>2 the second share of the target center data, the distance can be expanded as follows:

(c1-x1)^2

= (<c1>1+<c1>2-x1)^2

= (<c1>1-x1)^2 + 2(<c1>1-x1)<c1>2 + (<c1>2)^2

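According to this expansion, computing (c1-x1)^2 reduces to computing three terms: (<c1>1-x1)^2, which the first party can compute locally; (<c1>2)^2, which the second party can compute locally; and the cross term (<c1>1-x1)<c1>2, which involves data of both parties and is therefore computed by means of secret sharing.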
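The expansion above is written for scalars. Records such as those in Table 1 have several attributes; assuming the distance is the squared Euclidean distance, the same three-term split holds attribute by attribute and the terms are summed. The short Python check below (hypothetical function names and share values) confirms numerically that the three terms add up to the direct distance.

```python
def direct_distance(x, c_first, c_second):
    """(c - x)^2 with c = <c>1 + <c>2, summed over attributes."""
    return sum((a + b - xi) ** 2 for xi, a, b in zip(x, c_first, c_second))


def three_term_distance(x, c_first, c_second):
    """The same value assembled from the three terms of the expansion."""
    local_first = sum((a - xi) ** 2 for xi, a in zip(x, c_first))        # (<c>1 - x)^2
    cross = sum((a - xi) * b for xi, a, b in zip(x, c_first, c_second))  # (<c>1 - x)<c>2
    local_second = sum(b ** 2 for b in c_second)                         # (<c>2)^2
    return local_first + 2 * cross + local_second


x1 = [25, 15]        # private data 1 as (age, income x 10)
c_first = [13, 4]    # hypothetical first share of the center
c_second = [17, 16]  # hypothetical second share; the center is [30, 20]
print(direct_distance(x1, c_first, c_second),
      three_term_distance(x1, c_first, c_second))
# 50 50
```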

In an example, the first joint calculation includes: locally calculating the first distance between the first private data and the first share of the target center data; multiplying, in the secret sharing mode, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain the first share of the product; and determining, from the first distance and the first share of the product, the first share of the first target distance between the first private data and the target center data.

The first distance corresponds to the term (<c1>1-x1)^2 in the expansion above, and the product corresponds to (<c1>1-x1)<c1>2; the first share of the product can be written <(<c1>1-x1)<c1>2>1. The first party adds (<c1>1-x1)^2 and twice <(<c1>1-x1)<c1>2>1, matching the coefficient 2 of the cross term, to obtain the first share of the first target distance, written <x1c1>1.

Correspondingly, the second party can determine the second share of the first target distance as follows: taking each piece of center data in turn as target center data and, based on its local second share of the target center data, the second party performs a joint calculation by means of secret sharing with the first private data and the first share of the target center data held by the first party, obtaining the second share of the first target distance between the first private data and the target center data.

Further, this joint calculation includes: the second party locally calculates the square of the second share of the target center data; multiplies, in the secret sharing mode, the second share of the target center data by the difference between the first party's first share of the target center data and the first private data, obtaining the second share of the product; and determines, from the square and the second share of the product, the second share of the first target distance between the first private data and the target center data.

The square corresponds to (<c1>2)^2 in the expansion above, and the product again corresponds to (<c1>1-x1)<c1>2; the second share of the product can be written <(<c1>1-x1)<c1>2>2. The second party adds (<c1>2)^2 and twice <(<c1>1-x1)<c1>2>2 to obtain the second share of the first target distance, written <x1c1>2. The two shares sum to the first target distance, since <x1c1>1 + <x1c1>2 = (<c1>1-x1)^2 + 2(<c1>1-x1)<c1>2 + (<c1>2)^2 = (c1-x1)^2.
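The source does not specify how the multiplication "in the secret sharing mode" is carried out. The Python sketch below substitutes one standard option, Beaver multiplication triples supplied by a trusted dealer, over the integers modulo 2^64; the dealer, the ring size, and all function names are assumptions made for illustration. Following the expansion above, each party adds its local square term to twice its share of the cross-term product. Real-valued data would additionally need a fixed-point encoding, which is omitted here.

```python
import random

Q = 2 ** 64  # assumed ring size; all share arithmetic is modulo Q


def to_signed(v):
    """Interpret a ring element as a signed integer."""
    v %= Q
    return v - Q if v >= Q // 2 else v


def dealer_triple(rng):
    """Hypothetical trusted dealer: shares of (u, v, w) with w = u * v mod Q."""
    u, v = rng.randrange(Q), rng.randrange(Q)
    w = u * v % Q
    u1, v1, w1 = rng.randrange(Q), rng.randrange(Q), rng.randrange(Q)
    return (u1, v1, w1), ((u - u1) % Q, (v - v1) % Q, (w - w1) % Q)


def secure_dot(a, b, rng):
    """Shares (z1, z2) of the inner product <a, b>.

    Party 1 privately holds vector a and party 2 privately holds vector b.
    Only the masked values e = a_i - u and f = b_i - v are opened.
    """
    z1 = z2 = 0
    for ai, bi in zip(a, b):
        (u1, v1, w1), (u2, v2, w2) = dealer_triple(rng)
        # Party 1 publishes (ai - u1, -v1); party 2 publishes (-u2, bi - v2).
        e = (ai - u1 - u2) % Q
        f = (bi - v1 - v2) % Q
        z1 = (z1 + w1 + e * v1 + f * u1 + e * f) % Q  # party 1's share
        z2 = (z2 + w2 + e * v2 + f * u2) % Q          # party 2's share
    return z1, z2


def first_target_distance_shares(x, c_first, c_second, rng):
    """Shares of (c - x)^2; party 1 holds x and <c>1, party 2 holds <c>2."""
    diff = [(a - xi) % Q for a, xi in zip(c_first, x)]  # <c>1 - x, party 1 only
    p1, p2 = secure_dot(diff, c_second, rng)            # shares of (<c>1 - x)<c>2
    d1 = (sum(d * d for d in diff) + 2 * p1) % Q        # <x1c1>1, party 1
    d2 = (sum(b * b for b in c_second) + 2 * p2) % Q    # <x1c1>2, party 2
    return d1, d2


rng = random.Random(0)
x1 = [25, 15]  # private data 1 as (age, income x 10)
c1 = [30, 20]  # a center that, in the method, no single party knows
c1_first = [rng.randrange(Q) for _ in c1]
c1_second = [(c - s) % Q for c, s in zip(c1, c1_first)]
d1, d2 = first_target_distance_shares(x1, c1_first, c1_second, rng)
print(to_signed(d1 + d2))  # 50 = (30 - 25)^2 + (20 - 15)^2
```

Each share on its own is a uniformly random-looking ring element; only their sum carries the distance.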

Likewise, if c2 denotes another piece of target center data, the distance between x1 and c2 is determined in the same way as the distance between x1 and c1.

Then, in step 23, based on the first shares of the first target distances, the first party performs a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest of the first target distances. Each first target distance is the distance between the first private data and one piece of center data, and its first share and second share sum to that distance.

When there are two clusters, there are two pieces of center data and correspondingly two first target distances, and comparing the two determines the nearest. For example, x1c1 and x1c2 are compared, and the cluster corresponding to the smaller one is the cluster to which x1 belongs. If x1c2 is smaller, x1 is closest to c2, and its cluster vector becomes [0, 1].

When there are three or more clusters, there are correspondingly three or more pieces of center data and first target distances; comparing the first target distances two at a time determines the nearest one.

In an example, the joint comparison includes: for any two of the first target distances, performing a joint comparison by means of secret sharing between the first party's first shares of the two distances and the second party's second shares of the same two distances, to determine which of the two is smaller; and determining, from these comparison results, the nearest of the first target distances.
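The source leaves the secure comparison protocol itself unspecified. In the Python sketch below it is replaced by an idealized oracle, ideal_secure_less, a hypothetical stand-in that reveals only whether one shared distance is smaller than another; a deployed system would use a real two-party comparison protocol instead. The sketch shows the part specific to step 23: each party subtracts its own shares locally to obtain shares of the difference of two distances, and K - 1 pairwise comparisons select the nearest center, whose index becomes the one-hot cluster vector. A linear scan is only one way to organize the pairwise comparisons; the source does not prescribe an order.

```python
import random

Q = 2 ** 64  # assumed ring size, matching additive shares modulo Q


def to_signed(v):
    """Interpret a ring element as a signed integer."""
    v %= Q
    return v - Q if v >= Q // 2 else v


def ideal_secure_less(da_first, da_second, db_first, db_second):
    """Hypothetical ideal comparison: is distance a smaller than distance b?

    Each party subtracts its own shares locally, yielding shares of a - b.
    A real protocol would output only the resulting bit; this oracle simply
    reconstructs the difference, so it is for illustration only.
    """
    diff_first = (da_first - db_first) % Q     # computed by party 1 alone
    diff_second = (da_second - db_second) % Q  # computed by party 2 alone
    return to_signed(diff_first + diff_second) < 0


def nearest_cluster_vector(dist_first, dist_second):
    """One-hot cluster vector of the nearest center, via K - 1 comparisons."""
    best = 0
    for k in range(1, len(dist_first)):
        if ideal_secure_less(dist_first[k], dist_second[k],
                             dist_first[best], dist_second[best]):
            best = k
    return [1 if k == best else 0 for k in range(len(dist_first))]


rng = random.Random(3)
distances = [50, 20, 35]  # example distances from x1 to three centers
dist_first = [rng.randrange(Q) for _ in distances]
dist_second = [(d - s) % Q for d, s in zip(distances, dist_first)]
print(nearest_cluster_vector(dist_first, dist_second))  # [0, 1, 0]
```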

Finally, in step 24, the cluster corresponding to the nearest first target distance is determined as the cluster to which the first private data currently belongs. The cluster to which a piece of first private data belongs may change from one round of iteration to the next.

In an example, after the cluster corresponding to the nearest first target distance is determined as the cluster to which the first private data currently belongs, the method further includes: updating the first share of the center data of each cluster according to the average of the pieces of first private data in that cluster.

The cluster in question may be any one of the clusters.

For example, the first party and the second party update the center data (c1 and c2) according to the cluster vectors of all the private data. Taking c1 as an example, the update proceeds as follows:

The first party calculates the mean value of all private data whose cluster vector is [1, 0], denoted as <c1>1;

The second party calculates the mean value of all private data whose cluster vector is [1, 0], denoted as <c1>2.
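Reading this example as each party averaging the records it holds that are marked with cluster vector [1, 0] (an interpretation: a party cannot average records it does not hold in plaintext), the local step can be sketched in Python as follows. Records are assumed to be numeric tuples, and the function name is hypothetical. The source does not say what happens when a party has no record in a cluster, so the sketch returns None in that case.

```python
def local_center_share(points, cluster_vectors, k):
    """Mean of the local records whose cluster vector marks cluster k."""
    members = [x for x, vec in zip(points, cluster_vectors) if vec[k] == 1]
    if not members:
        return None  # empty-cluster handling is not specified in the source
    dim = len(members[0])
    return [sum(x[d] for x in members) / len(members) for d in range(dim)]


# First party's records (age, income x 10) and their current cluster vectors.
points = [(25, 15), (26, 22), (35, 8), (41, 18), (19, 6)]
cluster_vectors = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
print(local_center_share(points, cluster_vectors, 1))  # [38.0, 13.0]
```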

For example, the preset iteration stop condition may be |C(t)-C(t+1)|^2 < delta, where delta is a preset threshold, C(t) denotes the center data before the update, and C(t+1) denotes the center data after the update.
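The stop condition can be illustrated in plain Python. The sketch assumes the condition is checked for each cluster separately and that iteration stops once every cluster satisfies it; it operates on reconstructed centers purely to show the arithmetic, whereas in the method the check is a joint comparison on the two parties' shares. The function name and the threshold value are illustrative.

```python
def should_stop(old_centers, new_centers, delta):
    """True if |C(t) - C(t+1)|^2 < delta holds for every cluster's center."""
    for old, new in zip(old_centers, new_centers):
        change = sum((a - b) ** 2 for a, b in zip(old, new))
        if change >= delta:
            return False
    return True


old = [[30.0, 20.0], [40.0, 10.0]]  # C(t), illustrative values
new = [[30.5, 20.0], [40.0, 10.0]]  # C(t+1)
print(should_stop(old, new, delta=1.0))                     # True  (0.25 < 1)
print(should_stop(old, [[32.0, 20.0], [40.0, 10.0]], 1.0))  # False (4.0 >= 1)
```

Because C(t) - C(t+1) equals (<C(t)>1 - <C(t+1)>1) + (<C(t)>2 - <C(t+1)>2), each party can compute its own share of the change locally. The squared norm of the change then splits into two local squares and one cross term, exactly like the distance expansion, so only the cross term and the final comparison with delta need secret sharing.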

In the embodiments of the present disclosure, steps 21 to 24 mainly describe how the first party determines the cluster to which its own first private data belongs. In addition, for the second party's second private data, the first party also cooperates with the second party to determine, by means of secret sharing, the cluster to which each piece of second private data belongs.

In an example, the second party has a second private data set that includes a plurality of pieces of second private data, and the method further includes: the first party takes each piece of center data in turn as target center data and, based on its local first share of the target center data, performs a second joint calculation by means of secret sharing with the second private data and the second share of the target center data held by the second party, to obtain the second share of the second target distance between the second private data and the target center data; the second party holds the first share of the second target distance.

Further, the second joint calculation includes: locally calculating the square of the first share of the target center data; multiplying, in the secret sharing mode, the first share of the target center data by the difference between the second party's second share of the target center data and the second private data, to obtain the second share of the product; and determining, from the square and the second share of the product, the second share of the second target distance between the second private data and the target center data.
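The second joint calculation mirrors the first with the roles of the local data swapped. Writing y for the second private data, (c - y)^2 = (<c>1)^2 + 2<c>1(<c>2 - y) + (<c>2 - y)^2, so the first party computes (<c>1)^2 locally, the second party computes (<c>2 - y)^2 locally, and only the cross term needs secret sharing. The Python sketch below again substitutes dealer-supplied Beaver triples over the integers modulo 2^64 for the unspecified secure multiplication; these choices and all names are assumptions. It returns the share held by the first party (called the second share of the second target distance in the text) and the share held by the second party.

```python
import random

Q = 2 ** 64  # assumed ring size; all share arithmetic is modulo Q


def to_signed(v):
    """Interpret a ring element as a signed integer."""
    v %= Q
    return v - Q if v >= Q // 2 else v


def dealer_triple(rng):
    """Hypothetical trusted dealer: shares of (u, v, w) with w = u * v mod Q."""
    u, v = rng.randrange(Q), rng.randrange(Q)
    w = u * v % Q
    u1, v1, w1 = rng.randrange(Q), rng.randrange(Q), rng.randrange(Q)
    return (u1, v1, w1), ((u - u1) % Q, (v - v1) % Q, (w - w1) % Q)


def secure_dot(a, b, rng):
    """Shares of <a, b> where party 1 holds a and party 2 holds b."""
    z1 = z2 = 0
    for ai, bi in zip(a, b):
        (u1, v1, w1), (u2, v2, w2) = dealer_triple(rng)
        e = (ai - u1 - u2) % Q  # opened value ai - u
        f = (bi - v1 - v2) % Q  # opened value bi - v
        z1 = (z1 + w1 + e * v1 + f * u1 + e * f) % Q
        z2 = (z2 + w2 + e * v2 + f * u2) % Q
    return z1, z2


def second_target_distance_shares(y, c_first, c_second, rng):
    """Shares of (c - y)^2; party 1 holds <c>1, party 2 holds <c>2 and y."""
    diff = [(b - yi) % Q for b, yi in zip(c_second, y)]        # <c>2 - y, party 2
    p1, p2 = secure_dot(c_first, diff, rng)                    # <c>1(<c>2 - y)
    share_first = (sum(a * a for a in c_first) + 2 * p1) % Q   # held by party 1
    share_second = (sum(d * d for d in diff) + 2 * p2) % Q     # held by party 2
    return share_first, share_second


rng = random.Random(5)
y = [29, 13]  # private data 8 as (age, income x 10), held by the second party
c = [26, 16]  # a center, known to neither party in the method
c_first = [rng.randrange(Q) for _ in c]
c_second = [(ci - s) % Q for ci, s in zip(c, c_first)]
s1, s2 = second_target_distance_shares(y, c_first, c_second, rng)
print(to_signed(s1 + s2))  # 18 = (26 - 29)^2 + (16 - 13)^2
```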

In the method for clustering private data of multiple parties, the first party and the second party have equal status, and their processing procedures do not differ substantially. The embodiments of the present disclosure mainly describe the processing with the first party as the executing entity.

With the method provided by the embodiments of the present disclosure, no single party determines the center data of any cluster. Instead, the first party determines the first share of each piece of center data currently corresponding to each cluster and the second party determines the second share, the two shares of any piece of center data summing to that center data. Secret sharing is then used to determine the first target distance between the first private data and the target center data: the first party obtains the first share of the first target distance and the second party obtains the second share. Secret sharing is also used to determine the nearest of the first target distances. Finally, the cluster corresponding to the nearest first target distance is determined as the cluster to which the first private data currently belongs. Because the whole process is based on secret sharing, private data is not leaked when private data from multiple parties is clustered.

According to another embodiment, an apparatus for clustering private data of multiple parties is also provided; the apparatus is configured to execute the method for clustering private data of multiple parties provided in the embodiments of the present disclosure. The multiple parties include a first party and a second party; the first party has a first private data set that includes a plurality of pieces of first private data. The apparatus is deployed at the first party and performs multiple rounds of iteration. Fig. 3 shows a schematic block diagram of an apparatus for clustering private data of multiple parties according to an embodiment.

As shown in FIG. 3, the apparatus 300 includes the following units for performing any round of iteration: a center determining unit 31, configured to determine the first share of each piece of center data currently corresponding to each cluster, where the second party holds the second share of each piece of center data and the sum of the first share and the second share of any piece of center data equals that center data; a first joint computing unit 32, configured to take each piece of center data determined by the center determining unit 31 in turn as target center data and, based on the local first private data and the first share of the target center data, perform a first joint calculation by means of secret sharing with the second share of the target center data held by the second party, to obtain the first share of the first target distance between the first private data and the target center data, where the second party holds the second share of the first target distance; a joint comparison unit 33, configured to perform, based on the first shares of the first target distances obtained by the first joint computing unit 32, a joint comparison by means of secret sharing with the second shares of the first target distances held by the second party, to determine the nearest of the first target distances; and a cluster determining unit 34, configured to determine the cluster corresponding to the nearest first target distance determined by the joint comparison unit 33 as the cluster to which the first private data currently belongs.

Optionally, as an embodiment, the first joint computing unit 32 includes: a local calculation subunit, configured to locally calculate a first distance between the first private data and the first share of the target center data; a joint calculation subunit, configured to multiply, by secret sharing, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain a first share of the product; and a determining subunit, configured to determine the first share of the first target distance between the first private data and the target center data according to the first distance obtained by the local calculation subunit and the first share of the product obtained by the joint calculation subunit.
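The first joint calculation can be read as an expansion of the squared Euclidean distance; the text does not name the metric, so squared Euclidean distance is an assumption here. With c = c1 + c2, ||x - c||^2 = ||x - c1||^2 + 2 (c1 - x) . c2 + ||c2||^2, where the first term is the locally computed first distance and the middle term is the secret-shared product. The minimal sketch below further assumes integer (fixed-point) data in the ring Z_{2^64}, a Beaver triple from a trusted dealer for each multiplication, and that the second party adds ||c2||^2 to its own share; the names `share`, `beaver_mul`, and `first_target_distance_shares` are hypothetical.

```python
import secrets

Q = 2**64  # assumed ring Z_{2^64}; real-valued data would be fixed-point encoded first


def share(v):
    """Split an integer into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return r, (v - r) % Q


def beaver_mul(u_sh, v_sh):
    """Multiply two secret-shared values with a Beaver triple.

    For illustration the triple (a, b, a*b) is generated here, as a trusted
    dealer would; the return value is the pair of shares of u * v.
    """
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % Q)
    # Each party masks its input shares; only the masked sums e and f are opened.
    e = (u_sh[0] - a_sh[0] + u_sh[1] - a_sh[1]) % Q
    f = (v_sh[0] - b_sh[0] + v_sh[1] - b_sh[1]) % Q
    z0 = (c_sh[0] + e * b_sh[0] + f * a_sh[0] + e * f) % Q  # party 1 adds e*f
    z1 = (c_sh[1] + e * b_sh[1] + f * a_sh[1]) % Q
    return z0, z1


def first_target_distance_shares(x, c1, c2):
    """Return (party 1 share, party 2 share) of ||x - (c1 + c2)||^2 mod Q.

    Party 1 holds its private point x and the first share c1;
    party 2 holds the second share c2.
    """
    # Party 1, locally: the "first distance" ||x - c1||^2.
    first_distance = sum((xi - ai) ** 2 for xi, ai in zip(x, c1)) % Q
    # Cross term (c1 - x) . c2 by secure multiplication, one coordinate at a time;
    # party 1's input enters as shares (u, 0), party 2's as (0, c2_j).
    p0 = p1 = 0
    for xi, ai, bi in zip(x, c1, c2):
        z0, z1 = beaver_mul(((ai - xi) % Q, 0), (0, bi % Q))
        p0, p1 = (p0 + z0) % Q, (p1 + z1) % Q
    # Party 2, locally (assumed): ||c2||^2.
    c2_sq = sum(bi * bi for bi in c2) % Q
    return (first_distance + 2 * p0) % Q, (c2_sq + 2 * p1) % Q
```

For example, with x = [3, 7] and a center [1, 2] split into random shares, each returned share looks random on its own, while their sum modulo Q is 29, i.e. 2^2 + 5^2.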

Optionally, as an embodiment, when the any round of iteration is the first iteration, the center determining unit 31 is specifically configured to determine randomly initialized data as the first share of the center data currently corresponding to each cluster.
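The text only states that the first shares are randomly initialized data. One hypothetical way to realize this, sketched below, is for each party to draw its own share of every center independently, so that the reconstructed center is random and neither party knows its exact value; the bounded integer range and the name `init_center_shares` are assumptions.

```python
import random


def init_center_shares(k, dim, lo, hi, rng=None):
    """Hypothetical random initialization of k centers in `dim` dimensions.

    Each party draws its own share of every center from [lo // 2, hi // 2],
    so each reconstructed center (first share + second share) lies in
    [2 * (lo // 2), 2 * (hi // 2)], and neither party knows its exact value.
    """
    rng = rng or random.Random()

    def draw():
        return [[rng.randint(lo // 2, hi // 2) for _ in range(dim)] for _ in range(k)]

    first_shares = draw()   # in practice drawn on the first party's side
    second_shares = draw()  # in practice drawn on the second party's side
    return first_shares, second_shares
```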

Optionally, as an embodiment, the joint comparison unit 33 includes: a joint comparison subunit, configured to perform, for any two first target distances among the first target distances, a joint comparison by secret sharing between the first shares of the two first target distances and the second shares of the two first target distances held by the second party, to determine which of the two first target distances is smaller; and a determining subunit, configured to determine the closest first target distance among the first target distances according to the comparison results determined by the joint comparison subunit.
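Pairwise comparison results can be combined into the index of the closest distance with k - 1 comparisons by a linear scan, as sketched below; an all-pairs tournament would also fit the text. A real deployment would use a secret-sharing comparison protocol that reveals only the comparison bit. Here `secure_less_than` is an explicit placeholder that reconstructs the values in the clear, and all names are hypothetical.

```python
import secrets

Q = 2**64  # assumed ring Z_{2^64}


def share(v):
    """Split an integer into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return r, (v - r) % Q


def secure_less_than(a_sh, b_sh):
    """PLACEHOLDER for a secret-sharing comparison protocol.

    It reconstructs both values in the clear purely for illustration; a real
    protocol would reveal only the comparison bit. Distances are assumed to be
    non-negative and smaller than Q // 2.
    """
    a = (a_sh[0] + a_sh[1]) % Q
    b = (b_sh[0] + b_sh[1]) % Q
    return a < b


def closest_cluster(first_shares, second_shares):
    """Index of the smallest shared distance, via k - 1 pairwise comparisons."""
    best = 0
    for j in range(1, len(first_shares)):
        if secure_less_than((first_shares[j], second_shares[j]),
                            (first_shares[best], second_shares[best])):
            best = j  # ties keep the earlier cluster
    return best
```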

Optionally, as an embodiment, the apparatus further includes: an update unit, configured to, after the cluster determining unit 34 determines the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs, update the first share of the center data of each cluster according to the average value of the pieces of first private data belonging to that cluster.
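The text does not specify how the second share is adjusted when the first share is updated from the cluster mean. The sketch below shows one hypothetical possibility, limited to the case where only the first party's data is averaged: the first party computes each cluster mean locally and re-shares it with a fresh random mask, which would be sent to the second party as its new second share. Floor division as a stand-in for fixed-point averaging, a zero center for an empty cluster, and the name `update_center_shares` are all assumptions.

```python
import secrets

Q = 2**64  # assumed ring Z_{2^64}


def update_center_shares(points, labels, k):
    """Hypothetical update of the center shares from per-cluster means.

    Returns (new first shares, new second shares). The first party keeps
    mean - mask; the mask would be sent to the second party, to which it
    looks uniformly random.
    """
    dim = len(points[0])
    first, second = [], []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            # Floor division stands in for fixed-point averaging.
            mean = [sum(col) // len(members) for col in zip(*members)]
        else:
            mean = [0] * dim  # assumed fallback for an empty cluster
        masks = [secrets.randbelow(Q) for _ in range(dim)]
        first.append([(m - r) % Q for m, r in zip(mean, masks)])
        second.append(masks)
    return first, second
```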

Further, the apparatus further includes: a judging unit, configured to judge, after the update unit updates the first shares of the center data of the clusters, whether the amount of change in the center data of each cluster meets a preset iteration stop condition; and an iteration triggering unit, configured to perform the next iteration of the multi-round iterative process if the judgment result of the judging unit is that the amount of change in the center data of the clusters does not meet the preset iteration stop condition.

Further, the apparatus further includes: a final determining unit, configured to, if the judgment result of the judging unit is that the amount of change in the center data of the clusters meets the preset iteration stop condition, determine the cluster to which the first private data currently belongs, as determined by the cluster determining unit 34, as the cluster to which the first private data ultimately belongs.

Further, the judging unit is specifically configured to take any one of the clusters as a target cluster and, based on the first share of the pre-update center data of the target cluster and the first share of the updated center data of the target cluster, perform a joint comparison by secret sharing with the second share of the pre-update center data of the target cluster and the second share of the updated center data of the target cluster held by the second party, to determine whether the amount of change in the center data of the target cluster meets the preset iteration stop condition.
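Each party can form shares of the change in a center without any communication, by subtracting its pre-update share from its updated share; only the threshold test then requires a joint comparison. The sketch below assumes the amount of change is the squared Euclidean norm compared against a preset threshold. `change_within_threshold` is a placeholder that reconstructs the change in the clear, standing in for a real secret-sharing comparison, and all names are hypothetical.

```python
Q = 2**64  # assumed ring Z_{2^64}


def change_shares(old_sh, new_sh):
    """Run locally by each party: its share of (updated center - old center)."""
    return [(n - o) % Q for n, o in zip(new_sh, old_sh)]


def signed(v):
    """Map a ring element back to a signed integer (assumes |v| < Q // 2)."""
    return v - Q if v >= Q // 2 else v


def change_within_threshold(delta1, delta2, threshold):
    """PLACEHOLDER for the joint comparison; reconstructs in the clear.

    A real protocol would compute shares of the squared change by secure
    multiplication and reveal only whether it is within the threshold.
    """
    delta = [signed((a + b) % Q) for a, b in zip(delta1, delta2)]
    return sum(d * d for d in delta) <= threshold
```

When every cluster passes this test, the stop condition is met; otherwise the next iteration runs.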

Optionally, as an embodiment, the second party has a second private data set, and the second private data set includes a plurality of pieces of second private data. The apparatus further includes: a second joint computing unit, configured to take each center data as the target center data in turn and, based on the local first share of the target center data, perform a second joint calculation by secret sharing with the second private data and the second share of the target center data held by the second party, to obtain a second share of the second target distance between the second private data and the target center data, where the second party has the first share of the second target distance.

Further, the second joint computing unit includes: a local calculation subunit, configured to locally calculate the square of the first share of the target center data; a joint calculation subunit, configured to multiply, by secret sharing, the first share of the target center data by the difference between the second share of the target center data held by the second party and the second private data, to obtain a second share of the product; and a determining subunit, configured to determine the second share of the second target distance between the second private data and the target center data according to the square obtained by the local calculation subunit and the second share of the product obtained by the joint calculation subunit.
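The second joint calculation follows from the same squared-distance expansion, grouped around the first share instead: with c = c1 + c2, ||y - c||^2 = ||c1||^2 + 2 c1 . (c2 - y) + ||c2 - y||^2. The first party computes ||c1||^2 locally and the cross term is secret-shared. The minimal sketch below assumes squared Euclidean distance, integer data in Z_{2^64}, dealer-generated Beaver triples, and that the second party adds ||c2 - y||^2 locally; `share`, `beaver_mul`, and `second_target_distance_shares` are hypothetical names.

```python
import secrets

Q = 2**64  # assumed ring Z_{2^64}


def share(v):
    """Split an integer into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return r, (v - r) % Q


def beaver_mul(u_sh, v_sh):
    """Shares of u * v from shares of u and v, using a dealer-made Beaver triple."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % Q)
    e = (u_sh[0] - a_sh[0] + u_sh[1] - a_sh[1]) % Q  # opened masked value
    f = (v_sh[0] - b_sh[0] + v_sh[1] - b_sh[1]) % Q  # opened masked value
    z0 = (c_sh[0] + e * b_sh[0] + f * a_sh[0] + e * f) % Q
    z1 = (c_sh[1] + e * b_sh[1] + f * a_sh[1]) % Q
    return z0, z1


def second_target_distance_shares(c1, c2, y):
    """Return (party 1 share, party 2 share) of ||y - (c1 + c2)||^2 mod Q.

    Party 1 holds only c1; party 2 holds c2 and its private point y.
    In the text, party 1's share is called the "second share" of the
    second target distance.
    """
    # Party 1, locally: square of its share of the center.
    c1_sq = sum(a * a for a in c1) % Q
    # Cross term c1 . (c2 - y); party 1 inputs (c1_j, 0), party 2 inputs (0, c2_j - y_j).
    p0 = p1 = 0
    for a, b, yj in zip(c1, c2, y):
        z0, z1 = beaver_mul((a % Q, 0), (0, (b - yj) % Q))
        p0, p1 = (p0 + z0) % Q, (p1 + z1) % Q
    # Party 2, locally (assumed): ||c2 - y||^2.
    w_sq = sum((b - yj) ** 2 for b, yj in zip(c2, y)) % Q
    return (c1_sq + 2 * p0) % Q, (w_sq + 2 * p1) % Q
```

For y = [4, -1] and a center [1, 2], the shares sum modulo Q to 18, i.e. 3^2 + 3^2.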

With the apparatus provided by the embodiments of the present disclosure, no single party determines the center data of the clusters on its own. Instead, the center determining unit 31 of the first party determines the first share of the center data currently corresponding to each cluster, and the second party determines the second share of each center data, where the first share and the second share of any center data sum to that center data. When the first joint computing unit 32 subsequently determines the first target distance between the first private data and the target center data, secret sharing is used, so that the first party obtains a first share of the first target distance and the second party obtains a second share of it. When the joint comparison unit 33 determines the closest first target distance among the first target distances, secret sharing is likewise used. Finally, the cluster determining unit 34 determines the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs. Because the whole process is carried out on secret shares, private data is not leaked when private data from multiple parties is clustered.

According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.

According to an embodiment of still another aspect, a computing device is further provided, including a memory and a processor, where executable code is stored in the memory, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.

Those skilled in the art should be aware that, in one or more of the foregoing examples, the functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium.

The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for clustering private data of multiple parties, wherein the multiple parties include a first party and a second party, the first party has a first private data set, and the first private data set includes a plurality of pieces of first private data; the method is executed by the first party and includes a multi-round iterative process, where any round of iteration includes:

determining the first share of the center data currently corresponding to each cluster, where the second party has the second share of each center data, and the first share and the second share of any center data sum to that center data;

taking each center data as the target center data in turn and, based on the local first private data and the first share of the target center data, performing a first joint calculation by secret sharing with the second share of the target center data held by the second party, to obtain a first share of the first target distance between the first private data and the target center data, where the second party has the second share of the first target distance;

based on the first shares of the first target distances, performing a joint comparison by secret sharing with the second shares of the first target distances held by the second party, to determine the closest first target distance among the first target distances; and

determining the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs.
2. The method according to claim 1, wherein the first joint calculation includes:

locally calculating a first distance between the first private data and the first share of the target center data;

multiplying, by secret sharing, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain a first share of the product; and

determining the first share of the first target distance between the first private data and the target center data according to the first distance and the first share of the product.
3. The method according to claim 1, wherein the any round of iteration is the first iteration, and the first shares of the center data currently corresponding to the clusters are randomly initialized data.
4. The method according to claim 1, wherein the joint comparison includes:

for any two first target distances among the first target distances, performing, based on the first shares of the two first target distances, a joint comparison by secret sharing with the second shares of the two first target distances held by the second party, to determine which of the two first target distances is smaller; and

determining the closest first target distance among the first target distances according to the comparison results.
5. The method according to claim 1, wherein after the determining the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs, the method further includes:

updating the first share of the center data of each cluster according to the average value of the pieces of first private data belonging to that cluster.
6. The method according to claim 5, wherein after the updating the first share of the center data of the cluster, the method further includes:

judging whether the amount of change in the center data of the clusters meets a preset iteration stop condition; and

if the judgment result is that the amount of change in the center data of the clusters does not meet the preset iteration stop condition, performing the next iteration of the multi-round iterative process.
7. The method according to claim 6, wherein the method further includes:

if the judgment result is that the amount of change in the center data of the clusters meets the preset iteration stop condition, determining the cluster to which the first private data currently belongs as the cluster to which the first private data ultimately belongs.
8. The method according to claim 6, wherein the judging whether the amount of change in the center data of the clusters meets a preset iteration stop condition includes:

taking any one of the clusters as a target cluster and, based on the first share of the pre-update center data of the target cluster and the first share of the updated center data of the target cluster, performing a joint comparison by secret sharing with the second share of the pre-update center data of the target cluster and the second share of the updated center data of the target cluster held by the second party, to determine whether the amount of change in the center data of the target cluster meets the preset iteration stop condition.
9. The method according to claim 1, wherein the second party has a second private data set, the second private data set includes a plurality of pieces of second private data, and the method further includes:

taking each center data as the target center data in turn and, based on the local first share of the target center data, performing a second joint calculation by secret sharing with the second private data and the second share of the target center data held by the second party, to obtain a second share of the second target distance between the second private data and the target center data, where the second party has the first share of the second target distance.
10. The method according to claim 9, wherein the second joint calculation includes:

locally calculating the square of the first share of the target center data;

multiplying, by secret sharing, the first share of the target center data by the difference between the second share of the target center data held by the second party and the second private data, to obtain a second share of the product; and

determining the second share of the second target distance between the second private data and the target center data according to the square and the second share of the product.
11. An apparatus for clustering private data of multiple parties, wherein the multiple parties include a first party and a second party, the first party has a first private data set, and the first private data set includes a plurality of pieces of first private data; the apparatus is deployed at the first party, is used to perform a multi-round iterative process, and includes the following units for performing any round of iteration:

a center determining unit, configured to determine the first share of the center data currently corresponding to each cluster, where the second party has the second share of each center data, and the first share and the second share of any center data sum to that center data;

a first joint computing unit, configured to take each center data determined by the center determining unit as the target center data in turn and, based on the local first private data and the first share of the target center data, perform a first joint calculation by secret sharing with the second share of the target center data held by the second party, to obtain a first share of the first target distance between the first private data and the target center data, where the second party has the second share of the first target distance;

a joint comparison unit, configured to perform, based on the first shares of the first target distances obtained by the first joint computing unit, a joint comparison by secret sharing with the second shares of the first target distances held by the second party, to determine the closest first target distance among the first target distances; and

a cluster determining unit, configured to determine the cluster corresponding to the closest first target distance determined by the joint comparison unit as the cluster to which the first private data currently belongs.
12. The apparatus according to claim 11, wherein the first joint computing unit includes:

a local calculation subunit, configured to locally calculate a first distance between the first private data and the first share of the target center data;

a joint calculation subunit, configured to multiply, by secret sharing, the difference between the first share of the target center data and the first private data by the second share of the target center data held by the second party, to obtain a first share of the product; and

a determining subunit, configured to determine the first share of the first target distance between the first private data and the target center data according to the first distance obtained by the local calculation subunit and the first share of the product obtained by the joint calculation subunit.
13. The apparatus according to claim 11, wherein the any round of iteration is the first iteration, and the center determining unit is specifically configured to determine randomly initialized data as the first shares of the center data currently corresponding to the clusters.
14. The apparatus according to claim 11, wherein the joint comparison unit includes:

a joint comparison subunit, configured to perform, for any two first target distances among the first target distances, a joint comparison by secret sharing between the first shares of the two first target distances and the second shares of the two first target distances held by the second party, to determine which of the two first target distances is smaller; and

a determining subunit, configured to determine the closest first target distance among the first target distances according to the comparison results determined by the joint comparison subunit.
15. The apparatus according to claim 11, wherein the apparatus further includes:

an update unit, configured to, after the cluster determining unit determines the cluster corresponding to the closest first target distance as the cluster to which the first private data currently belongs, update the first share of the center data of each cluster according to the average value of the pieces of first private data belonging to that cluster.
16. The apparatus according to claim 15, wherein the apparatus further includes:

a judging unit, configured to judge, after the update unit updates the first share of the center data of the cluster, whether the amount of change in the center data of the clusters meets a preset iteration stop condition; and

an iteration triggering unit, configured to perform the next iteration of the multi-round iterative process if the judgment result of the judging unit is that the amount of change in the center data of the clusters does not meet the preset iteration stop condition.
17. The apparatus according to claim 16, wherein the apparatus further includes:

a final determining unit, configured to, if the judgment result of the judging unit is that the amount of change in the center data of the clusters meets the preset iteration stop condition, determine the cluster to which the first private data currently belongs, as determined by the cluster determining unit, as the cluster to which the first private data ultimately belongs.
18. The apparatus according to claim 16, wherein the judging unit is specifically configured to take any one of the clusters as a target cluster and, based on the first share of the pre-update center data of the target cluster and the first share of the updated center data of the target cluster, perform a joint comparison by secret sharing with the second share of the pre-update center data of the target cluster and the second share of the updated center data of the target cluster held by the second party, to determine whether the amount of change in the center data of the target cluster meets the preset iteration stop condition.
19. The apparatus according to claim 11, wherein the second party has a second private data set, the second private data set includes a plurality of pieces of second private data, and the apparatus further includes:

a second joint computing unit, configured to take each center data as the target center data in turn and, based on the local first share of the target center data, perform a second joint calculation by secret sharing with the second private data and the second share of the target center data held by the second party, to obtain a second share of the second target distance between the second private data and the target center data, where the second party has the first share of the second target distance.
20. The apparatus according to claim 19, wherein the second joint computing unit includes:

a local calculation subunit, configured to locally calculate the square of the first share of the target center data;

a joint calculation subunit, configured to multiply, by secret sharing, the first share of the target center data by the difference between the second share of the target center data held by the second party and the second private data, to obtain a second share of the product; and

a determining subunit, configured to determine the second share of the second target distance between the second private data and the target center data according to the square obtained by the local calculation subunit and the second share of the product obtained by the joint calculation subunit.
21. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to execute the method according to any one of claims 1 to 10.
22. A computing device, including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method according to any one of claims 1 to 10 is implemented.