CN114118180A

CN114118180A - Clustering method and device, electronic equipment and storage medium

Info

Publication number: CN114118180A
Application number: CN202110360295.3A
Authority: CN
Inventors: 韩雨锦; 李怡欣; 陈晓霖; 王虎; 黄志翔; 彭南博
Original assignee: Jingdong Technology Holding Co Ltd
Current assignee: Jingdong Technology Holding Co Ltd
Priority date: 2021-04-02
Filing date: 2021-04-02
Publication date: 2022-03-01

Abstract

The application provides a clustering method and a clustering device. The method, which is applicable to a service node, includes: for each class cluster, generating a cluster vector of the class cluster based on the numbers of the target first samples belonging to the class cluster, encrypting the cluster vector, and sending the encrypted cluster vector to a participating node, wherein the cluster vector is used to represent the cluster center of the class cluster; acquiring, for each class cluster, a first difference matrix from the first samples of the service node to the cluster center and a second difference matrix from the second samples of the participating node to the cluster center; and updating the cluster center according to the first difference matrix and the second difference matrix of each class cluster, re-clustering the first samples using the updated cluster centers, taking each class cluster obtained by re-clustering as a class cluster of the next iteration, and returning to the above steps until iteration ends, so as to generate the final target class clusters. In the application, encrypting the exchanged data effectively ensures data security and confidentiality.

Description

Clustering method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data statistical analysis, and in particular, to a clustering method, an apparatus, an electronic device, and a storage medium.

Background

In the training process of a clustering algorithm model, the service node side has to synchronize the identity (ID) numbers of the samples in each cluster, together with the cluster centers, to the participating node side in every round of training. By the nature of clustering algorithms, the sample points that training finally places in the same cluster are highly similar, so the participating node side can easily infer the label information of all samples in a cluster from the label information of a single point in that cluster in the training output. As a result, the security of the data cannot be ensured during training, and there is a risk of leakage.

Disclosure of Invention

The present application is directed to solving, at least to some extent, one of the technical problems in the related art.

To this end, the present application provides a clustering method in a first aspect.

The second aspect of the present application also provides another clustering method.

The third aspect of the present application provides a clustering apparatus.

A fourth aspect of the present application provides another clustering apparatus.

A fifth aspect of the present application provides an electronic device.

A sixth aspect of the present application provides a computer-readable storage medium.

A seventh aspect of the present application proposes a computer program product.

In a first aspect of the present application, a clustering method is provided, which is applicable to a service node, and includes: for each class cluster, generating a cluster vector of the class cluster based on the number of a first target sample belonging to the class cluster, encrypting the cluster vector and sending the encrypted cluster vector to a participating node, wherein the cluster vector is used for representing the cluster center of the class cluster; acquiring a first difference matrix from a first sample of the service node corresponding to each class cluster to a cluster center, and a second difference matrix from a second sample of the participating node to the cluster center; updating a cluster center according to the first difference matrix and the second difference matrix of each cluster, re-clustering the first sample by using the updated cluster center, taking each cluster obtained after re-clustering as a cluster corresponding to the next iteration, and returning to execute the steps until the iteration is finished to generate a final target cluster.

In addition, the clustering method applied to the service node provided in the first aspect of the present application may further have the following additional technical features:

According to an embodiment of the present application, obtaining the first difference matrix includes: repeating the cluster vector of the current iteration in the row direction according to the number of samples to construct a cluster matrix of the current iteration; and multiplying the cluster matrix by the feature space matrix of the service node to obtain a multiplication matrix, and subtracting the multiplication matrix from the feature space matrix to generate the first difference matrix corresponding to the current iteration.
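The construction of the first difference matrix can be illustrated with a minimal plaintext sketch. The following assumptions are not stated in the text above: the cluster vector holds 1/m at the positions of the m samples in the class cluster and 0 elsewhere, the cluster matrix stacks one copy of the cluster vector per sample, and the names `matmul` and `first_difference_matrix` are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def first_difference_matrix(cluster_vector, features):
    """Return features - (cluster matrix @ features) for one class cluster."""
    n = len(features)
    # Assumption: the cluster matrix stacks n copies of the cluster vector,
    # so every row of the product equals the cluster center.
    cluster_matrix = [list(cluster_vector) for _ in range(n)]
    product = matmul(cluster_matrix, features)
    return [[x - c for x, c in zip(x_row, c_row)]
            for x_row, c_row in zip(features, product)]


# Four samples with two features each; samples 0 and 1 form the class cluster.
features = [[1, 2], [3, 4], [10, 10], [20, 0]]
cluster_vector = [Fraction(1, 2), Fraction(1, 2), 0, 0]  # assumed 1/m encoding
diff = first_difference_matrix(cluster_vector, features)
```

Under these assumptions the cluster center is (2, 3), and each row of `diff` is the offset of one sample from that center.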

According to an embodiment of the present application, for the first iteration, obtaining the second difference matrix includes: receiving an encrypted second difference matrix corresponding to the first iteration sent by the participating node, and decrypting the encrypted second difference matrix to obtain the second difference matrix, wherein the encrypted second difference matrix is determined by the participating node based on its own encrypted feature space matrix and the encrypted initial cluster vector.

According to an embodiment of the present application, for a non-first iteration, obtaining the second difference matrix includes: receiving an encrypted update difference matrix corresponding to the current iteration sent by the participating node, wherein the encrypted update difference matrix is used to represent the distance between the cluster center of the previous iteration and the cluster center of the current iteration, and is generated by the participating node based on the feature space matrix of the participating node, the encrypted cluster vector of the previous iteration, and the encrypted cluster vector of the current iteration; decrypting the encrypted update difference matrix to obtain an update difference matrix; and adding the second difference matrix corresponding to the previous iteration and the update difference matrix of the current iteration to obtain the second difference matrix of the current iteration.
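The incremental update for non-first iterations can be checked with a small plaintext sketch. Encryption and decryption are omitted, the helper names are hypothetical, and the sign of the update matrix is an assumption: it is taken here as the previous product minus the current product, so that adding it to the previous difference matrix reproduces the current one.

```python
def center_rows(cluster_vector, features):
    """Return a matrix whose every row is the cluster center (vector @ features)."""
    center = [sum(w * x for w, x in zip(cluster_vector, col)) for col in zip(*features)]
    return [list(center) for _ in features]


def subtract(a, b):
    """Element-wise a - b for two matrices of equal shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Participating-node features (plaintext stand-in for the encrypted computation).
features = [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
prev_vector = [1.0, 0.0, 0.0]  # previous iteration: class cluster = {sample 0}
curr_vector = [0.5, 0.5, 0.0]  # current iteration: class cluster = {samples 0, 1}

prev_diff = subtract(features, center_rows(prev_vector, features))
# Assumed sign convention: previous product minus current product.
update = subtract(center_rows(prev_vector, features), center_rows(curr_vector, features))
curr_diff = add(prev_diff, update)
```

With this convention only the change between two cluster centers has to be sent per iteration, and the service node never needs the participating node's feature space matrix again after the first iteration.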

According to an embodiment of the present application, the clustering method further includes: at the first iteration, randomly selecting a set number of first samples, and taking each of the selected first samples as an initial cluster center, wherein one initial cluster center corresponds to one initial class cluster; determining the first sample serving as the initial cluster center as the target first sample of the initial class cluster; and generating a cluster vector of the initial class cluster based on the number of the target first sample of the initial class cluster.
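A minimal sketch of this initialization follows. It assumes 0-based sample numbers and the encoding values 1 (for the single target first sample) and 0 (for every other position); the function name `init_clusters` and the `seed` parameter are hypothetical.

```python
import random


def init_clusters(num_samples, num_clusters, seed=None):
    """Pick distinct sample numbers as initial centers; return one cluster vector each."""
    rng = random.Random(seed)
    center_numbers = rng.sample(range(num_samples), num_clusters)
    vectors = []
    for number in center_numbers:
        vector = [0] * num_samples
        vector[number] = 1  # the single target first sample of this initial class cluster
        vectors.append(vector)
    return center_numbers, vectors


center_numbers, vectors = init_clusters(num_samples=6, num_clusters=2, seed=0)
```

Each initial cluster vector then selects exactly one row of the feature space matrix, namely the sample chosen as the initial cluster center.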

According to an embodiment of the present application, generating the cluster vector includes: determining the position of each target first sample in the cluster vector according to its number, and encoding the vector element at that position as a first encoding value; and encoding the vector elements at the remaining positions as a second encoding value, wherein the vector elements at the remaining positions correspond to the first samples that do not belong to the class cluster.

According to an embodiment of the present application, encoding the vector element at the position as a first encoding value includes: for any class cluster, acquiring the count of target first samples belonging to that class cluster, and determining the first encoding value according to the count.
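The encoding can be sketched as follows. The concrete values are assumptions: the first encoding value is taken as 1/m for a class cluster with m target first samples, the second encoding value as 0, and sample numbers are 0-based. The function name `encode_cluster_vector` is hypothetical.

```python
from fractions import Fraction


def encode_cluster_vector(target_numbers, num_samples):
    """Encode a non-empty class cluster as a vector with one element per sample."""
    members = set(target_numbers)
    first_value = Fraction(1, len(members))  # assumed: derived from the member count
    second_value = Fraction(0)               # assumed value for non-members
    return [first_value if i in members else second_value for i in range(num_samples)]


vector = encode_cluster_vector(target_numbers=[1, 3, 4], num_samples=6)
```

With this choice, the product of the cluster vector and a feature space matrix is the mean of the member rows, which is one way a vector of sample numbers can stand in for a cluster center without revealing feature values.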

According to an embodiment of the present application, the clustering method further includes: receiving an encrypted verification vector sent by the participating node, and decrypting the encrypted verification vector using the private key to obtain a decrypted verification vector, wherein the encrypted verification vector is generated from the encrypted cluster vector; and sending the decrypted verification vector to the participating node so that the participating node can perform security verification.

According to an embodiment of the present application, the clustering method further includes: performing sample alignment with the participating node based on the identification information of the samples, and numbering the aligned first samples sequentially; and generating the feature space matrix of the service node based on the aligned first samples.
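Sample alignment and numbering might look like the sketch below. It is a simplification: the two parties compare identifiers in the clear and numbers are assigned in sorted identifier order, neither of which is specified by the text above. The function name `align_samples` and the example identifiers are hypothetical.

```python
def align_samples(own_records, other_ids):
    """Keep records whose ID both parties hold, number them, build the feature matrix."""
    shared_ids = sorted(set(own_records) & set(other_ids))
    # Sequential numbers follow the sorted ID order, so both parties can derive
    # the same numbering independently.
    numbering = {sample_id: number for number, sample_id in enumerate(shared_ids)}
    feature_matrix = [list(own_records[sample_id]) for sample_id in shared_ids]
    return numbering, feature_matrix


service_records = {"u3": [0.2, 1.0], "u1": [0.9, 0.1], "u7": [0.5, 0.5]}
participant_ids = ["u1", "u2", "u3"]
numbering, feature_matrix = align_samples(service_records, participant_ids)
```

Row i of the resulting feature space matrix belongs to the sample numbered i on both nodes, which is what allows a cluster vector indexed by sample number to be applied to either node's matrix.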

According to an embodiment of the present application, the clustering method further includes: generating an encryption key comprising a public key and a private key, and sending the public key to the participating node.
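The text above does not name an encryption scheme. Because the participating node computes on encrypted cluster vectors, an additively homomorphic scheme is one plausible fit, and the following toy Paillier sketch is offered purely under that assumption. The primes are far too small for real use, and all function names are hypothetical.

```python
import math
import random


def generate_keypair(p=1789, q=1861):
    """Toy Paillier keys from two small primes (illustrative sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + message * n) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, ciphertext):
    """Recover the plaintext integer with the private key."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Ciphertext of the sum of two plaintexts."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Ciphertext of the plaintext multiplied by the plaintext integer k."""
    return pow(c, k, n * n)


public_n, private_key = generate_keypair()
c7, c5 = encrypt(public_n, 7), encrypt(public_n, 5)
total = decrypt(public_n, private_key, add_encrypted(public_n, c7, c5))
tripled = decrypt(public_n, private_key, scale_encrypted(public_n, c7, 3))
```

Adding ciphertexts and scaling them by plaintext values are the two operations needed to multiply an encrypted cluster vector by a plaintext feature space matrix. Fractional encoding values would additionally need a fixed-point integer encoding, which this sketch leaves out.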

The second aspect of the present application further provides a clustering method applicable to a participating node, including: receiving an encrypted cluster vector of each class cluster sent by a service node at each iteration, wherein the cluster vector is determined by the service node based on the numbers of the target first samples belonging to the class cluster, and the target first samples are among the first samples of the service node; and, for each class cluster, acquiring an encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster, and sending the encrypted target matrix to the service node, until iteration ends.

The clustering method provided by the second aspect of the present application may further have the following additional technical features:

According to an embodiment of the present application, at the first iteration, the encrypted target matrix is an encrypted difference matrix from the second samples on the participating node to the cluster center of the class cluster, and acquiring the encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster includes: repeating the encrypted cluster vector of the initial class cluster in the column direction according to the number of samples to construct a cluster matrix of the initial class cluster; and multiplying the cluster matrix of the initial class cluster by the feature space matrix of the participating node to obtain a multiplication matrix, and subtracting the multiplication matrix from the encrypted feature space matrix to generate the encrypted difference matrix corresponding to the first iteration.

According to an embodiment of the present application, at a non-first iteration, acquiring the encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster includes: repeating, according to the number of samples, the encrypted cluster vector corresponding to the previous iteration and the encrypted cluster vector corresponding to the current iteration in the column direction, respectively, to construct two encrypted cluster matrices; multiplying each of the two encrypted cluster matrices by the feature space matrix of the participating node to obtain two multiplication matrices; and subtracting the multiplication matrix corresponding to the previous iteration from the multiplication matrix corresponding to the current iteration to obtain the encrypted update difference matrix corresponding to the current iteration.

According to an embodiment of the application, at a first iteration, the corresponding encrypted cluster vector is determined based on the number of the target first sample belonging to an initial class cluster.

According to an embodiment of the present application, after receiving the encrypted cluster vector of each class cluster sent by the service node at each iteration, the method further includes: adding up the elements of the encrypted cluster vector across its dimensions to obtain verification data; and performing security verification on the service node based on the verification data.

According to an embodiment of the present application, performing security verification on the service node based on the verification data includes: randomly generating a first column vector and a second column vector, and encrypting the second column vector using the public key; performing an affine transformation using the first column vector and the encrypted second column vector based on the verification data to generate an encrypted verification vector; sending the encrypted verification vector to the service node, and receiving the decrypted verification vector returned by the service node; and, in response to the encrypted verification vector and the decrypted verification vector being consistent, determining that the service node passes the security verification.
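One possible reading of this verification is sketched below in plaintext, with encryption and decryption replaced by the identity. The affine form a * s + b, the expectation that an honest cluster vector sums to 1, and the names `build_challenge` and `check_response` are all assumptions made for illustration.

```python
import random
from fractions import Fraction


def build_challenge(verification_sum, length, rng):
    """Participating node: mask the summed cluster vector with a random affine map."""
    a = [rng.randrange(1, 1000) for _ in range(length)]  # first column vector
    b = [rng.randrange(1, 1000) for _ in range(length)]  # second column vector
    challenge = [ai * verification_sum + bi for ai, bi in zip(a, b)]
    return a, b, challenge


def check_response(a, b, response, expected_sum=1):
    """Participating node: the reply must equal the map applied to the expected sum."""
    return response == [ai * expected_sum + bi for ai, bi in zip(a, b)]


rng = random.Random(0)
honest_vector = [Fraction(1, 2), Fraction(1, 2), 0, 0]
a, b, challenge = build_challenge(sum(honest_vector), length=3, rng=rng)
honest_ok = check_response(a, b, challenge)  # stand-in for the decrypted reply

malformed_vector = [1, 1, 0, 0]  # sums to 2 instead of the assumed 1
a2, b2, challenge2 = build_challenge(sum(malformed_vector), length=3, rng=rng)
malformed_ok = check_response(a2, b2, challenge2)
```

In this reading the random masks keep the service node from learning anything beyond the masked value, while a cluster vector that does not sum to the expected value produces a reply the participating node can detect.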

According to an embodiment of the present application, before receiving the encrypted cluster vector of each class cluster sent by the service node at each iteration, the method further includes: performing sample alignment with the service node based on the identification information of the samples, and numbering the aligned second samples of the participating node sequentially; and generating the feature space matrix of the participating node based on the aligned second samples.

According to an embodiment of the present application, the clustering method further includes: receiving the public key sent by the service node.

A third aspect of the present application provides a clustering apparatus, including: the cluster vector generation module is used for generating a cluster vector of each class cluster based on the number of a first target sample belonging to the class cluster, encrypting the cluster vector and then sending the encrypted cluster vector to a participating node, wherein the cluster vector is used for representing the cluster center of the class cluster; a difference matrix obtaining module, configured to obtain a first difference matrix from a first sample of the service node to a cluster center corresponding to each class cluster, and a second difference matrix from a second sample of the participating node to the cluster center corresponding to the participating node; and the updating module is used for updating a cluster center according to the first difference matrix and the second difference matrix of each cluster, re-clustering the first sample by using the updated cluster center, taking each cluster obtained after re-clustering as a cluster corresponding to the next iteration, and returning to execute the steps until the iteration is finished to generate a final target cluster.

The clustering device provided by the third aspect of the present application may further have the following additional technical features:

According to an embodiment of the present application, the difference matrix obtaining module is further configured to: repeat the cluster vector of the current iteration in the column direction according to the number of samples to construct a cluster matrix of the current iteration; and multiply the cluster matrix by the feature space matrix of the service node to obtain a multiplication matrix, and subtract the multiplication matrix from the feature space matrix to generate the first difference matrix corresponding to the current iteration.

According to an embodiment of the present application, the difference matrix obtaining module is further configured to: for the first iteration, receive an encrypted second difference matrix corresponding to the first iteration sent by the participating node, and decrypt the encrypted second difference matrix to obtain the second difference matrix, wherein the encrypted second difference matrix is determined by the participating node based on its own encrypted feature space matrix and the encrypted initial cluster vector.

According to an embodiment of the present application, the difference matrix obtaining module is further configured to: for a non-first iteration, receive an encrypted update difference matrix corresponding to the current iteration sent by the participating node, wherein the encrypted update difference matrix is used to represent the distance between the cluster center of the previous iteration and the cluster center of the current iteration, and is generated by the participating node based on the feature space matrix of the participating node, the encrypted cluster vector of the previous iteration, and the encrypted cluster vector of the current iteration; decrypt the encrypted update difference matrix to obtain an update difference matrix; and add the second difference matrix corresponding to the previous iteration and the update difference matrix of the current iteration to obtain the second difference matrix of the current iteration.

According to an embodiment of the present application, the clustering apparatus further includes: a cluster center selection module, configured to randomly select a set number of first samples at the first iteration and take each selected first sample as an initial cluster center, wherein one initial cluster center corresponds to one initial class cluster; and a sample determining module, configured to determine the first sample serving as the initial cluster center as the target first sample of the initial class cluster; wherein the cluster vector generation module is further configured to generate a cluster vector of the initial class cluster according to the number of the target first sample of the initial class cluster.

According to an embodiment of the present application, the cluster vector generation module further includes: a coding unit, configured to determine the position of each target first sample in the cluster vector according to its number, encode the vector element at that position as a first encoding value, and encode the vector elements at the remaining positions as a second encoding value, wherein the vector elements at the remaining positions correspond to the first samples that do not belong to the class cluster.

According to an embodiment of the present application, the coding unit is further configured to: for any class cluster, acquire the count of target first samples belonging to that class cluster, and determine the first encoding value according to the count.

According to an embodiment of the present application, the clustering apparatus further includes: a verification module, configured to receive the encrypted verification vector sent by the participating node, decrypt the encrypted verification vector using the private key to obtain a decrypted verification vector, and send the decrypted verification vector to the participating node for security verification, wherein the encrypted verification vector is generated from the encrypted cluster vector.

According to an embodiment of the present application, the clustering apparatus further includes: a numbering module, configured to perform sample alignment with the participating node based on the identification information of the samples and to number the aligned first samples sequentially; and a matrix generation module, configured to generate the feature space matrix of the service node based on the aligned first samples.

According to an embodiment of the present application, the clustering apparatus further includes: a key generation module, configured to generate an encryption key comprising a public key and a private key, and to send the public key to the participating node.

A fourth aspect of the present application provides another clustering apparatus, including: a cluster vector receiving module, configured to receive an encrypted cluster vector for each class cluster sent by a service node for each iteration, where the cluster vector is determined by the service node based on a number of a target first sample belonging to the class cluster, and the target first sample belongs to a sample in a first sample of the service node; and the target matrix generation module is used for acquiring an encrypted target matrix corresponding to each class cluster based on the encrypted cluster vector of the class cluster and sending the encrypted target matrix to the service node until iteration is finished.

The clustering device provided in the fourth aspect of the present application may further have the following additional technical features:

According to an embodiment of the present application, the target matrix generation module is further configured to: at the first iteration, repeat the encrypted cluster vector of the initial class cluster in the column direction according to the number of samples to construct a cluster matrix of the initial class cluster; and multiply the cluster matrix of the initial class cluster by the feature space matrix of the participating node to obtain a multiplication matrix, and subtract the multiplication matrix from the encrypted feature space matrix to generate the encrypted difference matrix corresponding to the first iteration.

According to an embodiment of the present application, the target matrix generation module is further configured to: at a non-first iteration, repeat, according to the number of samples, the encrypted cluster vector corresponding to the previous iteration and the encrypted cluster vector corresponding to the current iteration in the column direction, respectively, to construct two encrypted cluster matrices; multiply each of the two encrypted cluster matrices by the feature space matrix of the participating node to obtain two multiplication matrices; and subtract the multiplication matrix corresponding to the previous iteration from the multiplication matrix corresponding to the current iteration to obtain the encrypted update difference matrix corresponding to the current iteration.

According to an embodiment of the present application, the clustering apparatus further includes: a security verification module, configured to add up the elements of the encrypted cluster vector across its dimensions to obtain verification data, and to perform security verification on the service node based on the verification data.

According to an embodiment of the application, the security verification module comprises:

a column vector generating unit, configured to randomly generate a first column vector and a second column vector and to encrypt the second column vector using the public key; a verification vector generation unit, configured to perform an affine transformation using the first column vector and the encrypted second column vector based on the verification data to generate an encrypted verification vector; a decryption unit, configured to send the encrypted verification vector to the service node and to receive the decrypted verification vector returned by the service node; and a verification unit, configured to determine, in response to the encrypted verification vector and the decrypted verification vector being consistent, that the service node passes the security verification.

According to an embodiment of the present application, the clustering apparatus further includes: a numbering module, configured to perform sample alignment with the service node based on the identification information of the samples and to number the aligned second samples of the participating node sequentially; and a matrix generation module, configured to generate the feature space matrix of the participating node based on the aligned second samples.

According to an embodiment of the present application, the clustering apparatus further includes: a receiving module, configured to receive the public key sent by the service node.

To achieve the above object, a fifth aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the clustering method set forth in any one of the first and second aspects.

A sixth aspect of the present application proposes a computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the clustering method proposed in any one of the first and second aspects.

A seventh aspect of the present application proposes a computer program product comprising a computer program which, when executed by a processor, implements the clustering method proposed according to any one of the first and second aspects described above.

The clustering method applicable to the service node generates a cluster vector of each class cluster based on the numbers of the target first samples, encrypts the cluster vector, and sends the encrypted cluster vector to the participating node; obtains the second difference matrix generated by the participating node and the first difference matrix generated by the service node; updates the cluster vector based on the first difference matrix and the second difference matrix; re-clusters the first samples using the updated cluster vectors; takes the updated class clusters as the class clusters of the next iteration; and continues iterating until all iterations are finished, so as to generate the final target class clusters. In the method, the sample data exchanged between the participating node and the service node is identified only by sample numbers and is also encrypted, so the participating node cannot obtain the feature data of the service node's actual samples through the data exchange. The data security of the samples is thus effectively guaranteed without sacrificing the clustering result, and the confidentiality of the sample data is strengthened.
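The overall iteration can be illustrated with a plaintext simulation in which both feature space matrices live in one process and encryption and message exchange are omitted. Several details are assumptions rather than statements from the text: the 1/m cluster vector encoding, the squared Euclidean distance computed from the two difference matrices, nearest-center reassignment, and the stopping rule. All function names are hypothetical.

```python
from fractions import Fraction


def cluster_vector(members, n):
    """Assumed encoding: 1/m for members of the class cluster, 0 otherwise."""
    return [Fraction(1, len(members)) if i in members else Fraction(0) for i in range(n)]


def difference_matrix(vector, features):
    """Each row is one sample minus the cluster center implied by the cluster vector."""
    center = [sum(w * x for w, x in zip(vector, col)) for col in zip(*features)]
    return [[x - c for x, c in zip(row, center)] for row in features]


def run_clustering(service_x, participant_x, initial_centers, max_iter=10):
    """Iterate until the class clusters stop changing or max_iter is reached."""
    n = len(service_x)
    clusters = [{number} for number in initial_centers]
    for _ in range(max_iter):
        distances = []
        for members in clusters:
            vec = cluster_vector(members, n)
            d1 = difference_matrix(vec, service_x)      # first difference matrix
            d2 = difference_matrix(vec, participant_x)  # second difference matrix
            distances.append([sum(v * v for v in r1 + r2) for r1, r2 in zip(d1, d2)])
        new_clusters = [set() for _ in clusters]
        for i in range(n):
            nearest = min(range(len(clusters)), key=lambda k: distances[k][i])
            new_clusters[nearest].add(i)
        # Stop when nothing changes, or keep the old clusters if one would empty.
        if new_clusters == clusters or any(not c for c in new_clusters):
            break
        clusters = new_clusters
    return clusters


service_x = [[0], [1], [10], [11]]    # features held by the service node
participant_x = [[0], [0], [5], [5]]  # features held by the participating node
result = run_clustering(service_x, participant_x, initial_centers=[0, 2])
```

Only cluster vectors indexed by sample number and per-cluster difference matrices cross between the two feature sets in this simulation, which mirrors the information flow described above.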

It should be understood that the description herein is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present application will become apparent from the following description.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow chart of a clustering method according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 3 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 4 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 5 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 6 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 7 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 8 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 9 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 10 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 11 is a schematic flow chart of a clustering method according to another embodiment of the present application;

FIG. 12 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 13 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 14 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 15 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 16 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 17 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 18 is a schematic flow chart diagram illustrating a clustering method according to another embodiment of the present application;

FIG. 19 is a schematic structural diagram of a clustering device according to an embodiment of the present application;

FIG. 20 is a schematic structural diagram of a clustering device according to another embodiment of the present application;

FIG. 21 is a schematic structural diagram of a clustering device according to another embodiment of the present application;

FIG. 22 is a schematic structural diagram of a clustering device according to another embodiment of the present application;

FIG. 23 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.

A clustering method, an apparatus, an electronic device, and a storage medium according to embodiments of the present application are described below with reference to the accompanying drawings.

FIG. 1 is a schematic flow chart of a clustering method according to an embodiment of the present application. The clustering method is applied to a service node and, as shown in FIG. 1, includes:

S101: for each class cluster, generating a cluster vector of the class cluster based on the numbers of the target first samples belonging to the class cluster, encrypting the cluster vector, and sending the encrypted cluster vector to a participating node, wherein the cluster vector is used to represent the cluster center of the class cluster.

Clustering is an unsupervised learning method that groups data by similarity: a large amount of data is classified according to its differing characteristics, so that the characteristics of each resulting class cluster can be identified. Data subclasses with similar characteristics are grouped together as far as possible, which improves clustering quality and approaches a locally optimal solution.

A class cluster can group private, concrete subclasses under a public, abstract superclass. That is, a class cluster can be understood as a common interface for a plurality of subclasses, and one class cluster contains subclasses capable of implementing a plurality of functions.

In the embodiment of the present application, the service node (Guest) may be understood as the initiator of the training of the clustering model, and a sample belonging to the service node is a first sample. The participating node (Host) may be understood as a participant and data provider, and a sample belonging to the participating node is a second sample. The first sample and the second sample are generated based on different features of the same object.

It should be noted that the service node holds one part of the information of user A, user B, user C, …, user N, and this part of the information is used as the first samples, for example the call record information of the users. The participating node holds another part of the information of user A, user B, user C, …, user N, and this part of the information is used as the second samples, for example the online shopping behavior information of the users.

For each class cluster, the service node determines the first samples belonging to the class cluster as target first samples, and each first sample has a unique, non-repeating number. A cluster vector of the class cluster may be generated based on the numbers of the target first samples. The cluster vector of each class cluster represents the cluster center of that class cluster, and in each round of iteration of model training each class cluster has exactly one cluster vector. In order to ensure the confidentiality and security of the feature data in the class cluster, the cluster vector needs to be encrypted before it is sent to the participating node.

Further, before the cluster vector is encrypted, a public/private key pair needs to be generated, which can be denoted as {p, p′}, where p is the public key used for encryption and p′ is the private key used for decryption.

The cluster vector may be encoded using a 0, 1 encoding rule: the sample point selected as the cluster center is encoded as 1 and all other sample points are encoded as 0, which yields a 1 × m cluster vector, where m is the number of samples. With k class clusters in total there are k cluster vectors, so k cluster vectors of dimension 1 × m are obtained after encoding. The k cluster vectors are then encrypted using the public key p, and the encrypted cluster vectors are sent to the participating node.

S102, acquiring, for each class cluster, a first difference matrix from the first samples of the service node to the cluster center and a second difference matrix from the second samples of the participating node to the cluster center.

In this embodiment of the application, the service node may form a difference matrix, which is a first difference matrix, based on a difference between each first sample and the cluster center. Optionally, the service node may construct a cluster matrix of each class cluster based on the cluster vector of the class cluster.

Further, the service node acquires the first difference matrix of each class cluster based on the feature space matrix of the service node and the cluster matrix of each class cluster.

Further, the participating node may construct a difference matrix based on the difference from each second sample to the corresponding cluster center, as the second difference matrix, and the service node may receive the second difference matrix sent by the participating node. To this end, the participating node obtains the encrypted cluster vector of each class cluster sent by the service node. For each class cluster, the participating node constructs the cluster matrix of the class cluster based on its cluster vector, and obtains the second difference matrix of the class cluster from the feature space matrix of the participating node and the cluster matrix of the class cluster. The participating node then sends the second difference matrix of each class cluster to the service node. In order to ensure the security of data transmission, the participating node can encrypt the second difference matrix before sending it. Optionally, the second difference matrix is a difference matrix encrypted with the public key p.

Correspondingly, the service node needs to decrypt the obtained second difference matrix by using the private key p', and then obtains a second difference matrix from a second sample of the participating node to the cluster center.

It should be noted that the service node forms its feature space matrix from the feature information of the first samples, the feature information of each first sample being one row of the feature space matrix of the service node. Likewise, the participating node forms its feature space matrix from the feature information of its second samples, the feature information of each second sample being one row of that feature space matrix.
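The exchange in S102 can be illustrated with a minimal sketch. It assumes an additively homomorphic scheme in the style of Paillier, which is not named in this application, uses toy primes that offer no security, and restricts itself to the 0, 1 encoding rule with integer features. The sign convention (sample minus cluster center) and all function names are assumptions made for illustration.

```python
import math
import random

# Toy Paillier key material (assumption: the application does not name a scheme).
P, Q = 1009, 1013                      # tiny primes, for illustration only
N = P * Q                              # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                   # private key: inverse of LAM modulo N


def encrypt(value):
    """Encrypt an integer with the public key; negatives are reduced modulo N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + (value % N) * N) * pow(r, N, N2) % N2


def decrypt(cipher):
    """Decrypt with the private key and map the result back to a signed integer."""
    value = (pow(cipher, LAM, N2) - 1) // N * MU % N
    return value - N if value > N // 2 else value


def encrypted_second_difference(enc_cluster_vector, H):
    """Participating node: compute <HD_i> from <A_i> and its own feature matrix H."""
    m, s = len(H), len(H[0])
    # Encrypted cluster center: sum over l of A_i[l] * H[l][c], on ciphertexts.
    enc_center = []
    for c in range(s):
        acc = encrypt(0)
        for l in range(m):
            acc = acc * pow(enc_cluster_vector[l], H[l][c], N2) % N2
        enc_center.append(acc)
    # Raising a ciphertext to N - 1 multiplies its plaintext by -1.
    neg_center = [pow(x, N - 1, N2) for x in enc_center]
    # Every row of the cluster matrix equals A_i, so all samples share one center.
    return [[encrypt(H[j][c]) * neg_center[c] % N2 for c in range(s)]
            for j in range(m)]


H = [[3, 7], [5, 1], [4, 4]]           # second samples held by the participating node
A_1 = [0, 1, 0]                        # the sample numbered 2 is the cluster center
enc_A_1 = [encrypt(a) for a in A_1]    # service node encrypts with the public key
enc_HD_1 = encrypted_second_difference(enc_A_1, H)
HD_1 = [[decrypt(x) for x in row] for row in enc_HD_1]   # service node decrypts
print(HD_1)
```

In this sketch the participating node never sees which position of the cluster vector holds the 1, and the service node only receives differences to the center rather than the raw feature values.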

S103, updating a cluster center according to the first difference matrix and the second difference matrix of each cluster, re-clustering the first sample by using the updated cluster center, taking each cluster obtained after re-clustering as a cluster corresponding to the next iteration, and returning to execute the steps until the iteration is finished to generate a final target cluster.

In this embodiment of the application, after acquiring the first difference matrix and the second difference matrix for each cluster, the service node may update a cluster center corresponding to the cluster based on the first difference matrix and the second difference matrix of the cluster, that is, re-determine a new cluster vector of the cluster. And clustering each first sample again by using the updated cluster center to generate a new class cluster, and taking the generated new class cluster as the class cluster corresponding to the next iteration.

For the i-th class cluster, let the first difference matrix be GD_i and the second difference matrix be HD_i. After obtaining the first difference matrix GD_i and the second difference matrix HD_i, the service node splices the two matrices. From the spliced matrix the service node can obtain the distance from each first sample to the cluster center of the i-th class cluster, and each first sample is clustered again based on these distances. After re-clustering, the target first samples belonging to the i-th class cluster are re-determined, and the cluster vector of the class cluster is updated according to the re-determined target first samples for the next iteration.

The distance from the first sample with the number i to the cluster center of the ith cluster can be calculated and obtained according to the following formula:
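A minimal sketch of the splicing and re-clustering step, assuming the distance is taken as the Euclidean norm of the corresponding row of the spliced matrix [GD_i, HD_i]. The choice of norm and the function names are assumptions for illustration.

```python
import math


def splice(GD_i, HD_i):
    """Concatenate the two difference matrices row by row: [GD_i | HD_i]."""
    return [g_row + h_row for g_row, h_row in zip(GD_i, HD_i)]


def distances_to_center(GD_i, HD_i):
    """Euclidean norm of each spliced row (assumed distance measure)."""
    return [math.sqrt(sum(x * x for x in row)) for row in splice(GD_i, HD_i)]


def recluster(GD, HD):
    """Assign every sample to the class cluster whose center is nearest.

    GD and HD hold one difference matrix per class cluster.
    Returns one 0-based cluster index per sample.
    """
    per_cluster = [distances_to_center(g, h) for g, h in zip(GD, HD)]
    m = len(per_cluster[0])
    return [min(range(len(per_cluster)), key=lambda i: per_cluster[i][j])
            for j in range(m)]


# Two class clusters, three samples, one feature on each node.
GD = [[[0], [3], [4]], [[4], [1], [0]]]
HD = [[[0], [4], [3]], [[3], [0], [0]]]
labels = recluster(GD, HD)
print(labels)
```

The service node only needs the difference matrices for this step, so the raw second samples of the participating node are not required.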

Further, in order to achieve the best effect of the clustering method, a condition for ending the iteration can be set. When the effect of the clustering output by a certain round of iteration, or the overall number of iterations, satisfies the set end condition, the iteration of the clustering method ends, and the class clusters generated by the last round of iteration are used as the final target class clusters.

The iteration stop condition may be set as a change condition of the cluster center, the number of iterations of the algorithm, the minimum loss function, and the like.

For example, the end condition of the iteration is set as iteration number N, when the iteration number of a certain round reaches N, the current round of iteration is the last round of iteration, and the class cluster generated by the iteration of the round is the final target class cluster.

For another example, the end condition of the iteration is set such that the aggregation degree of the sample points having feature A in the class clusters generated by a certain round of iteration reaches a set degree B. A detection program for the aggregation degree of the sample points having feature A is set up based on this end condition and stored at a set location. After each round of iteration, the detection program detects the aggregation degree of the sample points having feature A in the generated class clusters. When the aggregation degree after a certain round of iteration reaches degree B, the set iteration effect is judged to have been achieved and the iteration can end. That round is the last round of iteration, and the class clusters it generates are determined to be the final target class clusters.
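The stop conditions above can be combined in a small check. The sketch below covers the iteration count N and the change of the cluster centers. The tolerance parameter and the function name are hypothetical.

```python
def should_stop(iteration, max_iterations, old_centers, new_centers, tolerance=1e-6):
    """Return True when either stop condition is met.

    Condition 1: the iteration count has reached max_iterations (N in the text).
    Condition 2: no cluster center moved by more than `tolerance` in any coordinate.
    """
    if iteration >= max_iterations:
        return True
    shift = max(abs(a - b)
                for old, new in zip(old_centers, new_centers)
                for a, b in zip(old, new))
    return shift <= tolerance


print(should_stop(3, 10, [[1.0, 2.0]], [[1.0, 2.0]]))
```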

In the clustering method provided by the application, a cluster vector of each class cluster is generated based on the numbers of the target first samples, encrypted, and sent to the participating node. The second difference matrix generated by the participating node and the first difference matrix generated by the service node are obtained, the cluster vector is updated based on the two matrices, and the first samples are re-clustered using the updated cluster vector. The updated class clusters serve as the initial class clusters of the next iteration, and the iteration continues until all iterations are finished and the final target class clusters are generated. Because the sample data exchanged between the participating node and the service node is identified only by number and is encrypted, the participating node cannot obtain the feature data of the actual samples of the service node through the data interaction. The data security of the samples is thus effectively guaranteed while the clustering effect is preserved, and the confidentiality of the sample data is strengthened.

In order to better understand the above embodiment, the process of acquiring the first difference matrix proposed therein may be further understood with reference to fig. 2, as shown in fig. 2, fig. 2 is a schematic flow chart of a clustering method according to an embodiment of the present application, where the method includes:

S201, repeating the cluster vector of the current iteration along the column direction according to the number of samples to construct the cluster matrix of the current iteration.

The service node constructs the first difference matrix based on the difference from each first sample to the cluster center. Let the first samples on the service node have t-dimensional features and the second samples on the participating node have s-dimensional features.

In the embodiment of the present application, let the number of samples be m. A 1 × m cluster vector A_i of the i-th class cluster is generated for the current round of iteration. The cluster vector A_i is repeated m times along the column direction, which expands it into the m × m cluster matrix of the current iteration, each row of which is A_i.

Here the cluster vector A_i is a 1 × m row vector.

S202, multiplying the cluster matrix by the feature space matrix of the service node to obtain a multiplication matrix, and taking the difference between the feature space matrix and the multiplication matrix to generate the first difference matrix corresponding to the current iteration.

Let the feature information of the first samples on the service node form the feature space matrix G, which has dimension m × t. The m × m cluster matrix is multiplied by the feature space matrix G of the service node to obtain an m × t multiplication matrix, each row of which is the product A_i·G, that is, the cluster center of the i-th class cluster.

Specifically, the multiplication matrix is subtracted from the feature space matrix G to obtain the first difference matrix GD_i corresponding to the current iteration, so that the j-th row of GD_i is the difference between the j-th first sample and the cluster center.

According to the clustering method, the service node can acquire the first difference matrix corresponding to the current iteration based on the feature space matrix of the service node and the cluster vector corresponding to the class cluster.
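Steps S201 and S202 can be sketched in plain Python as follows. The sketch assumes the sign convention "feature space matrix minus multiplication matrix" and uses exact fractions so that the 0, 1/z encoding rule introduces no rounding. The helper names are illustrative.

```python
from fractions import Fraction


def cluster_matrix(A_i):
    """Repeat the 1 x m cluster vector m times, giving an m x m matrix."""
    return [list(A_i) for _ in range(len(A_i))]


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def first_difference_matrix(A_i, G):
    """GD_i = G - (cluster matrix) * G, i.e. each sample minus the cluster center."""
    product = matmul(cluster_matrix(A_i), G)
    return [[g - p for g, p in zip(g_row, p_row)] for g_row, p_row in zip(G, product)]


G = [[1, 2], [3, 4], [5, 9]]                       # m = 3 samples, t = 2 features
GD_first = first_difference_matrix([0, 1, 0], G)   # 0, 1 rule: sample 2 is the center
half = Fraction(1, 2)
GD_later = first_difference_matrix([half, 0, half], G)   # 0, 1/z rule with z = 2
print(GD_first)
print(GD_later)
```

With the 0, 1/z rule the product A_i·G is the mean of the member samples, so the same routine serves both the first and the later iterations.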

To better understand the above embodiments, further, regarding the generation process of the cluster vector proposed therein, with reference to fig. 3, fig. 3 is a schematic flow chart of a clustering method according to another embodiment of the present application, as shown in fig. 3, the method includes:

S301, determining the position of the target first sample in the cluster vector according to its number, and encoding the vector element at that position as a first encoded value.

In the embodiment of the application, each first sample has a unique non-repeated number, and the numbers exist in sequence. The position of the target first sample in the cluster vector may be determined according to the number, wherein a vector element corresponding to the position of the target first sample in the cluster vector may be encoded as the first encoded value.

Optionally, for the first iteration, the target first sample may be encoded using a 0, 1 encoding rule, and the vector element of the target first sample at the corresponding position in the cluster vector is encoded as 1. For example, if the sequence number of the target first sample in the cluster vector is set to 5, the vector element at the position of the 5 th sequence in the cluster vector is encoded to 1.

Optionally, for the non-primary iteration, the target first sample may be encoded using a 0, 1/z encoding rule, and the vector element of the target first sample at the corresponding position in the cluster vector is encoded as 1/z. The participating node may obtain, for any cluster, the number of target first samples belonging to any cluster, and determine the first coding value according to the number of target first samples. In implementation, a counting program may be set in a participating node, and for any cluster, the program may be executed to acquire the number z of target first samples in the cluster, so as to determine that the first code value is 1/z.

For example, if the sequence number of the first target sample in the cluster vector is set to 5 and the number of the first target samples in the cluster is 100, the vector element at the position of the 5 th sequence in the cluster vector is encoded to 1/100.

S302, encoding the vector elements at the remaining positions as second encoded values, wherein the vector elements at the remaining positions correspond to the first samples which do not belong to the class cluster.

In this embodiment of the present application, after determining the position of the vector element corresponding to the target first sample in the cluster vector, all the vector elements except the position are encoded into the second encoded value.

Still taking the above example as an example, the vector elements other than the 5 th vector element are all encoded as 0.

Optionally, for the first iteration, the target first sample is encoded using the 0, 1 encoding rule, and the vector elements at positions other than the position corresponding to the target first sample in the cluster vector are all encoded as 0. For example, if the sequence number of the target first sample in the cluster vector is 5, the vector element at the 5th position in the cluster vector is encoded as 1, the vector elements at the other positions are all encoded as 0, and the resulting encrypted cluster vector is <A₁> = (<0>, <0>, <0>, <0>, <1>, …, <0>).

Optionally, for a non-first iteration, the target first samples are likewise encoded using the 0, 1/z encoding rule, and the vector elements at positions other than those corresponding to the target first samples in the cluster vector are all encoded as 0. For example, if the sequence number of a target first sample in the cluster vector is 5, the vector element at the 5th position in the cluster vector is encoded as 1/z and the vector elements at the positions of samples outside the class cluster are encoded as 0. If, further, the number of target first samples is 100, the vector element at the 5th position is encoded as 1/100, and the resulting encrypted cluster vector is <A₁> = (<0>, <0>, <0>, <0>, <1/100>, …, <0>).

The cluster vector comprises a first sample, a second sample and a third sample, wherein the first sample is a target sample, the second sample is a cluster of a class, and the third sample is a cluster of a class.

For example, if the number of the sample selected as the cluster center of the first class cluster is 3, the corresponding encrypted cluster vector is <A₁> = (<0>, <0>, <1>, …, <0>). The cluster vector A₁ represents the cluster center of the first class cluster: only the sample numbered 3, which is selected as the center point, is encoded as 1, and the remaining samples are encoded as 0. The cluster vector A₁ of the first class cluster is encrypted with the public key p to generate <A₁>. Similarly, the cluster vectors of the second class cluster, the third class cluster, and so on are encrypted with the public key p to generate <A₂>, <A₃>, and so on. The service node thus obtains k encrypted cluster vectors <A₁>, <A₂>, <A₃>, …, <A_k> and sends them to the participating node.
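The two encoding rules of S301 and S302 can be sketched as below, before encryption. The function name and its parameters are illustrative, and sample numbers are taken to be 1-based as in the examples above.

```python
from fractions import Fraction


def encode_cluster_vector(target_numbers, m, first_iteration):
    """Build the 1 x m cluster vector from 1-based sample numbers.

    First iteration: the center sample is coded 1 (0, 1 rule).
    Later iterations: each of the z member samples is coded 1/z (0, 1/z rule).
    All remaining positions are coded 0.
    """
    z = len(target_numbers)
    first_value = Fraction(1) if first_iteration else Fraction(1, z)
    vector = [Fraction(0)] * m
    for number in target_numbers:
        vector[number - 1] = first_value
    return vector


A_first = encode_cluster_vector([5], m=8, first_iteration=True)
A_later = encode_cluster_vector([1, 3, 6], m=6, first_iteration=False)
print(A_first)
print(A_later)
```

In both cases the elements of the cluster vector sum to 1 when the first iteration uses a single center sample, which is the property the security check later relies on.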

The clustering method provided by the application specifies the generation process of the cluster vector and encodes the cluster vector according to different rules for the first iteration and for non-first iterations, so that the cluster vector can be effectively encrypted and the security of the sample data is ensured.

In the above embodiment, the proposed process for determining the initial cluster vector of the first sample can be further understood with reference to fig. 4, where fig. 4 is a schematic flow chart of a clustering method according to an embodiment of the present application, and as shown in fig. 4, the method includes:

S401, in the initial iteration, randomly selecting a set number of first samples and taking each of the selected first samples as an initial cluster center, wherein one initial cluster center corresponds to one initial class cluster.

S402, determining the first sample as the initial cluster center as the target first sample of the initial cluster.

In the embodiment of the application, for the first iteration, a certain number of initial cluster centers need to be set for a service node, and a random initial cluster is generated based on the set initial cluster centers.

The service node may randomly determine a certain number of first samples as initial cluster centers in the first samples, where each initial cluster center corresponds to one initial class cluster.

In the embodiment of the present application, in the first iteration, the first sample selected as the initial cluster center is used as the target first sample of the initial class cluster.

S403, generating a cluster vector of the initial cluster by the number of the target first sample of the initial cluster.

The target first sample in the initial class cluster has a unique non-repeated number, the numbers have a sequence, and the service node generates a cluster vector corresponding to the initial class cluster according to the number in the target first sample of the initial class cluster.

For the first iteration, a 0, 1 coding rule may be used to code the cluster vector, and the service node generates the cluster vector corresponding to the initial cluster based on the cluster vector generation method provided in the above embodiment.
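Steps S401 to S403 can be sketched as follows. The optional seed parameter exists only to make the illustration reproducible and is not part of the described method.

```python
import random


def initial_cluster_vectors(m, k, seed=None):
    """Pick k distinct sample numbers (1-based) at random and one-hot encode each."""
    rng = random.Random(seed)
    centers = rng.sample(range(1, m + 1), k)
    vectors = [[1 if number == center else 0 for number in range(1, m + 1)]
               for center in centers]
    return centers, vectors


centers, vectors = initial_cluster_vectors(m=10, k=3, seed=0)
for center, vector in zip(centers, vectors):
    print(center, vector)
```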

S404, starting from the initial cluster, encrypting the cluster vector of each iterated cluster and sending the encrypted cluster vector to the participating nodes, wherein the cluster vector is used for representing the cluster center of the cluster.

Step S404 may refer to the related details mentioned above, and will not be described herein again.

S405, a first difference matrix from a first sample of a service node corresponding to each class cluster to a cluster center and a second difference matrix from a second sample of a corresponding participating node to the cluster center are obtained.

Further, for the first iteration, the obtaining process of the second difference matrix may be understood in conjunction with fig. 5, as shown in fig. 5, including:

S501, receiving an encrypted second difference matrix corresponding to the first iteration sent by the participating node, and decrypting the encrypted second difference matrix to obtain the second difference matrix, wherein the encrypted second difference matrix is determined by the participating node based on its own encrypted feature space matrix and the encrypted initial cluster vector.

In the embodiment of the application, the service node may receive the encrypted second difference matrix corresponding to the initial iteration sent by the participating node, and perform decryption operation on the encrypted second difference matrix by using the private key p', so as to obtain the decrypted second difference matrix.

Let the encrypted second difference matrix received by the service node be <HD_i>. The service node decrypts <HD_i> using the private key p′ to obtain the decrypted second difference matrix HD_i:

HD_i = Decrypt(<HD_i>)

Further, for a non-first iteration, the participating node generates a difference matrix between the second samples and the corresponding cluster center as an update difference matrix, and the service node acquires the second difference matrix of the current iteration based on the acquired update difference matrix. This process may be understood with reference to fig. 6 and, as shown in fig. 6, includes:

S601, receiving an encrypted update difference matrix corresponding to the current iteration sent by the participating node, wherein the encrypted update difference matrix represents the distance between the cluster center of the previous iteration and the cluster center of the current iteration, and is generated by the participating node based on the feature space matrix of the participating node, the encrypted cluster vector of the previous iteration, and the encrypted cluster vector of the current iteration.

In order to realize the updating of the cluster center based on the previous iteration in the current iteration, the participating nodes need to generate an updating difference matrix, and the updating difference matrix can represent the distance between the cluster center of the previous iteration and the cluster center of the current iteration.

Further, after acquiring the update vector sent by the service node, the participating node may generate the update difference matrix based on the update vector that has passed the security check. The participating node determines the number of samples of the current iteration, and repeats the encrypted cluster vector of the previous iteration and the encrypted cluster vector of the current iteration along the column direction to construct the corresponding encrypted cluster matrices, the number of repetitions being equal to the number of samples. The two encrypted cluster matrices are each multiplied by the feature space matrix of the participating node, which yields a multiplication matrix corresponding to the encrypted cluster vector of the previous iteration and a multiplication matrix corresponding to the encrypted cluster vector of the current iteration. The two multiplication matrices are subtracted to obtain the update difference matrix corresponding to the current iteration. The update difference matrix, encrypted with the public key, is sent to the service node.

S602, the encrypted updating difference matrix is decrypted to obtain an updating difference matrix.

In the embodiment of the application, the service node receives the encrypted update difference matrix, and performs decryption operation on the encrypted update difference matrix, so as to obtain the decrypted update difference matrix.

Let the participating node synchronize the encrypted update difference matrix <hD_i> to the service node. The service node decrypts it using the private key p′ to obtain the decrypted update difference matrix hD_i:

hD_i = Decrypt(<hD_i>)

S603, adding the second difference matrix corresponding to the previous iteration and the update difference matrix of the current iteration to obtain the second difference matrix of the current iteration.

In the embodiment of the present application, let the second difference matrix of the current iteration be HD′_i and the second difference matrix of the previous iteration be HD_i; then:

HD′_i = HD_i + hD_i

The second difference matrix corresponding to the current iteration is thereby obtained.
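The update in S603 can be checked on plaintext values. The sketch below assumes that the second difference matrix is "sample minus cluster center" and that the update difference matrix is "previous center minus current center", so that their sum is the difference to the current center. Encryption is omitted and the helper names are illustrative.

```python
from fractions import Fraction


def center_of(cluster_vector, H):
    """Cluster center as the product (cluster vector) * H."""
    return [sum(a * h for a, h in zip(cluster_vector, col)) for col in zip(*H)]


def difference_to_center(cluster_vector, H):
    """Difference matrix: every sample minus the cluster center."""
    center = center_of(cluster_vector, H)
    return [[h - c for h, c in zip(row, center)] for row in H]


def update_difference(old_vector, new_vector, H):
    """hD_i: previous center minus current center, repeated for every sample."""
    old_c, new_c = center_of(old_vector, H), center_of(new_vector, H)
    row = [o - n for o, n in zip(old_c, new_c)]
    return [list(row) for _ in H]


def add(X, Y):
    """Element-wise matrix sum, used for HD'_i = HD_i + hD_i."""
    return [[x + y for x, y in zip(xr, yr)] for xr, yr in zip(X, Y)]


H = [[2, 0], [4, 2], [6, 4]]
A_old = [0, 1, 0]                               # previous iteration, 0, 1 rule
b_new = [Fraction(1, 2), Fraction(1, 2), 0]     # current iteration, 0, 1/z rule
HD_old = difference_to_center(A_old, H)
hD = update_difference(A_old, b_new, H)
HD_new = add(HD_old, hD)
print(HD_new)
```

Under these assumptions the service node obtains the same matrix it would get from a full recomputation, while the participating node only transmits the shift of the center.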

Further, the updated difference matrix GD′_i from the first samples of the service node to the cluster center can be obtained by the following formula:

By obtaining the updated difference matrix GD′_i from the first samples to the cluster center for the current round of iteration and the corresponding second difference matrix HD_i, the cluster center is updated, which completes the update of the class cluster.

S406, updating a cluster center according to the first difference matrix and the second difference matrix of each cluster, re-clustering the first sample by using the updated cluster center, using each cluster obtained after re-clustering as a cluster corresponding to the next iteration, and continuing the iteration until a final target cluster is generated.

Step S406 may refer to the related details mentioned above, and will not be described herein again.

The clustering method provided by the application specifies how the cluster vector of the initial class cluster is generated in the initial iteration, and encrypts the cluster vector before sending it to the participating node. It further specifies how the second difference matrix is acquired in the first iteration and in non-first iterations. The cluster vector is updated based on the acquired second difference matrix and first difference matrix, the first samples are re-clustered using the updated cluster vector, the updated class clusters serve as the initial class clusters of the next iteration, and the iteration continues until all iterations are finished and the final target class clusters are generated. Because the sample data exchanged between the participating node and the service node is identified only by number and is encrypted, the participating node cannot obtain the feature data of the actual samples of the service node through the data interaction. The data security of the samples is thus effectively guaranteed while the clustering effect is preserved, and the confidentiality of the sample data is strengthened.

In order to ensure data security of the service node and the participating node in the data interaction process, the service node needs to assist the participating node in performing security check on the received encrypted vector sent by the service node, as shown in fig. 7, fig. 7 is a schematic flow chart of a clustering method according to an embodiment of the present application, where the clustering method is applied to the service node, and includes:

In implementation, the clustering method generates the final target class clusters through the exchange of cluster vectors and update difference matrices between the service node and the participating node, so the confidentiality and security of the data held by both nodes need to be guaranteed. Because the cluster vector, which serves as the basic data, is generated by the service node and sent to the participating node, there is a possibility that the service node could, through improper operation, obtain the feature space data that the participating node needs to keep secret.

Specifically, if the service node replaces the cluster vector A_i (i = 1, 2, …, k) with a zero vector, homomorphically encrypts it, and sends it to the participating node, the participating node cannot recognize the ciphertext as an encryption of the zero vector. The cluster matrix built from a zero vector is a zero matrix, so the difference matrix returned by the participating node reduces, up to sign, to its own feature space matrix, and the service node can thereby illegally obtain the feature space data of the participating node according to the following formula:

For the i-th class cluster containing z sample points, an updated cluster vector b_i is constructed and encrypted using the public key p to obtain the encrypted cluster vector <b_i>, where the cluster vector b_i is constructed based on the sample numbers and encoded using the 0, 1/z rule. For example, for the 3rd class cluster, the encrypted cluster vector <b₃> can be

<b₃> = (<1/z>, <0>, <1/z>, <0>, <0>, <1/z>, …)

where the sample point corresponding to a vector element encoded as 1/z belongs to the 3rd class cluster, and the sample point corresponding to a vector element encoded as 0 does not belong to the 3rd class cluster. It follows from the encrypted cluster vector <b₃> that the sample points numbered 1, 3, 6, … belong to the 3rd class cluster.

Based on the method, k encrypted cluster vectors can be obtained<b_i>。

If the encrypted cluster vector <b_i> is replaced by an encryption of 0, the service node can likewise obtain the feature space data of the participating node according to the following formula:

Therefore, in each round of iteration, the participating node needs to perform a security check on the cluster vector sent by the service node to determine whether it is a zero vector. In order for the service node and the participating node to cooperate, the service node needs to assist the participating node in performing the security check on the data that the participating node has received from the service node. The specific method applicable to the service node comprises the following steps:

S701, receiving the encrypted verification vector sent by the participating node, and decrypting the encrypted verification vector based on the private key to obtain a decrypted verification vector, wherein the encrypted verification vector is generated according to the encrypted cluster vector.

A zero-knowledge proof can be understood as a proof that enables a verifier to determine that a certain conclusion is correct without being given any other useful information. In the embodiment of the application, a zero-knowledge proof method can be used, so that the participating node can perform the security check on the acquired data without the service node sending important key information to the participating node.

And the service node acquires a verification vector which is sent by the participating node and generated according to the encrypted cluster vector, and decrypts the verification vector according to the private key p' to acquire the decrypted verification vector.

For example, the participating node needs to check the security of the cluster vector it has obtained, so the encryption method of the cluster vector must allow such a check. When the service node sends a false cluster vector to the participating node, the participating node needs to be able to detect this from the encrypted cluster vector, so that the security of the feature space of the participating node is ensured while the data of the service node remains secure and confidential. The service node needs to prove to the participating node that the sum of the vector elements contained in the cluster vector it sends is 1. Let the encrypted verification vector sent by the participating node be <T>. The service node obtains the decrypted verification vector T′ by the following formula:

T′＝Decrypt₁(<T>) (11)

S702, sending the decrypted verification vector to the participating node for security verification.

In the embodiment of the application, the verification vector T' obtained after decryption is sent to the participating node, so that the participating node can complete security verification.

The clustering method provided by the application describes how the participating node performs security verification on the encrypted data sent by the service node. It keeps the feature data of the service node samples confidential while protecting the feature space data of the participating node, and thereby makes cooperation between the service node and the participating nodes easier to achieve.

The feature space matrix mentioned in the foregoing embodiments is the basis on which the clustering method is implemented. The generation process of the feature space matrix can be further understood in conjunction with fig. 8, which is a schematic flow diagram of the clustering method according to an embodiment of the present application. The method includes:

S801, aligning samples with the participating nodes based on the identification information of the samples, and numbering the aligned first samples in sequence.

In order to realize effective data interaction between the service node and the participating nodes, the samples of the service node need to be aligned with those of the participating nodes, and the aligned first samples belonging to the service node are numbered in sequence.

S802, generating a characteristic space matrix of the service node based on the aligned first sample.

In the embodiment of the application, after the first samples of the service nodes are aligned and numbered, the feature space of the service nodes is standardized, and then a corresponding feature space matrix is generated.
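The alignment, numbering and standardization described in S801 and S802 can be sketched as follows. This is a minimal illustration, not the method of the application: the function names are hypothetical, alignment is assumed to be a plain intersection of sample identifiers, numbering is assumed to follow sorted identifier order, and standardization is assumed to be a per-column z-score.

```python
from statistics import mean, pstdev

def align_and_number(service_records, participant_ids):
    """Keep only samples known to both nodes and number them in sequence.

    service_records maps a sample identifier to its feature list.
    Sorting the shared identifiers is an assumption made so that both
    nodes would arrive at the same numbering independently.
    """
    shared = sorted(set(service_records) & set(participant_ids))
    numbering = {sample_id: number for number, sample_id in enumerate(shared)}
    matrix = [service_records[sample_id] for sample_id in shared]
    return numbering, matrix

def standardize(matrix):
    """Z-score each feature column (assumed form of 'standardization')."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)]
            for row in matrix]

service_records = {
    "u3": [1.0, 10.0],
    "u1": [3.0, 30.0],
    "u2": [5.0, 50.0],
    "u9": [7.0, 70.0],  # not held by the participating node, so dropped
}
participant_ids = ["u1", "u2", "u3", "u5"]

numbering, raw_matrix = align_and_number(service_records, participant_ids)
feature_space_matrix = standardize(raw_matrix)
```

Row n of `feature_space_matrix` then holds the standardized features of the sample numbered n, which is the property the later cluster-vector steps rely on.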

The clustering method provided by the application specifies how the samples of the service node are aligned and numbered and how the feature space matrix is generated, which ensures that the subsequent operations of the clustering method can be executed normally.

In the clustering method provided in the foregoing embodiments, the data exchanged between the participating node and the service node needs to be encrypted and decrypted with a key. Fig. 9 is a schematic flow chart of the clustering method according to an embodiment of the present application. As shown in fig. 9, the method includes:

S901, generating an encryption key, wherein the encryption key comprises a public key and a private key, and sending the public key to the participating node.

In the embodiment of the present application, in order to ensure the security and confidentiality of data sent by a service node, the sent data needs to be encrypted. An encryption key may be generated, the encryption key including a public key and a private key.

The public key is used for encryption and can be disclosed to the participating nodes, and the private key is used for decryption and can be held only by the service node. The public key and the private key are mutually corresponding and unique, and the data encrypted by the public key can be decrypted only by using the corresponding private key.

After the service node generates the public key and the private key, the public key can be sent to the participating node, and the participating node encrypts the generated data by using the public key sent by the service node.

According to the clustering algorithm, the service node can generate the encryption key and send the public key to the participating node, and the encryption of the interactive data is realized between the service node and the participating node based on the public key, so that the sample data can be effectively kept secret, and the guarantee of the sample data safety is strengthened.
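The text does not name the encryption scheme. As an illustration only, the sketch below assumes an additively homomorphic scheme and uses textbook Paillier with tiny fixed primes, which is not secure and serves only to show the roles of the two keys. All function names are hypothetical.

```python
import math
import random

def generate_keypair(p=1009, q=1013):
    """Toy key generation with tiny fixed primes; illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, value, rng=random):
    """Encrypt an integer 0 <= value < n with the public key n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, value, n2) * pow(r, n, n2) % n2

def decrypt(n, private_key, ciphertext):
    """Decrypt with the private key, which only the service node holds."""
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return (x - 1) // n * mu % n

# Service node: generate the key pair and publish only the public key.
public_key, private_key = generate_keypair()

# Participating node: encrypt with the public key and combine ciphertexts.
c1 = encrypt(public_key, 20)
c2 = encrypt(public_key, 22)
c_sum = c1 * c2 % (public_key ** 2)  # ciphertext of 20 + 22
```

Multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, which is the property that lets a node add and scale encrypted values without seeing them.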

Corresponding to the clustering method applied to the service node provided in the foregoing embodiments, the present application also provides a clustering method applied to the participating node. Fig. 10 is a schematic flow diagram of the clustering method according to another embodiment of the present application. As shown in fig. 10, the method includes:

S1001, receiving the encrypted cluster vector of each class cluster sent by the service node in each iteration, where the cluster vector is determined by the service node based on the number of a first target sample belonging to the class cluster, and the first target sample is one of the first samples of the service node.

In the embodiment of the present application, based on the method provided in the above embodiment, in each round of iteration, the service node generates a cluster vector corresponding to each class cluster, and sends the cluster vector to the participating node after encryption, and the participating node can receive the encrypted cluster vector.

For the first iteration, the encrypted cluster vector is determined by the service node based on the number of the target first sample of the initial class cluster. For the specific process, reference may be made to the related details above, which are not repeated here.

S1002, for each class cluster, acquiring an encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster, and sending the encrypted target matrix to the service node, until the iteration is finished.

In the embodiment of the application, after acquiring the encrypted cluster vectors, the participating nodes generate corresponding target matrixes based on each cluster, encrypt the target matrixes and send the target matrixes to the service nodes.

For the first iteration, the target matrix is the second difference matrix from each second sample to the cluster center; for a non-first iteration, the target matrix is the update difference matrix from the cluster center of the previous iteration to the cluster center of the current iteration.

According to the clustering method, the participating nodes generate the target matrix based on the obtained encrypted cluster vectors and send it to the service node after encryption, so that the service node can combine it with the first difference matrix it generates itself to update the cluster centers and, in turn, the class clusters.

For the first iteration, the encryption target matrix generated by the participating node is the corresponding second difference matrix, where a generation method of the second difference matrix can be further understood with reference to fig. 11, and the method includes:

S1101, repeating the encrypted cluster vector of the initial class cluster in the column direction according to the number of samples to construct the cluster matrix of the initial class cluster.

In the embodiment of the application, the number of samples is determined, and the cluster vector of the initial class cluster is repeated in the column direction that many times, so that the cluster matrix of the initial class cluster is generated.

For example, let the number of samples be m. The service node generates k encrypted 1 × m cluster vectors, and the participating node needs to obtain the second difference matrix corresponding to the i-th cluster vector. After the participating node obtains the k encrypted 1 × m cluster vectors, it repeats the i-th cluster vector, a 1 × m row vector, m times in the column direction, expanding it into an m × m matrix, which is the cluster matrix of the initial class cluster.

S1102, multiplying the cluster matrix of the initial class cluster by the feature space matrix of the participating node to obtain a multiplication matrix, and subtracting the multiplication matrix from the encrypted feature space matrix to generate the encrypted difference matrix corresponding to the first iteration.

The embodiment of the application specifies how the participating node calculates the second difference matrix for the first iteration.

Suppose the participating node has obtained the cluster matrix of the initial class cluster, and encrypts its feature space matrix H with the public key p to obtain the encrypted feature space matrix <H>. The corresponding second difference matrix is then calculated according to the set formula, that is, by subtracting the product of the cluster matrix and H from <H> as in S1102, wherein H_ij is the value in the j-th dimension of the i-th sample point of the participating node.

The second difference matrices obtained in the first iteration are encrypted and then synchronized to the service node, and the results are HD_i (i＝1, 2, ..., k).

The clustering method provided by the application shows a generation process of the second difference matrix aiming at the first iteration, so that the service node can realize the updating of the cluster center and the cluster type based on the first difference matrix generated by the service node and the acquired second difference matrix, and further, the data can obtain effective security guarantee.
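The arithmetic of S1101 and S1102 can be sketched on plaintext values, leaving encryption aside. This is an illustrative assumption: in the method the same additions and multiplications are carried out on encrypted cluster vectors. The cluster vector is assumed to be one-hot, with a 1 at the number of the sample that serves as the cluster center, and the helper names are hypothetical.

```python
def cluster_matrix(cluster_vector):
    """Repeat a 1 x m cluster vector m times to form an m x m matrix."""
    m = len(cluster_vector)
    return [list(cluster_vector) for _ in range(m)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def second_difference_matrix(feature_matrix, cluster_vector):
    """Feature matrix minus (cluster matrix x feature matrix), as in S1102."""
    product = matmul(cluster_matrix(cluster_vector), feature_matrix)
    return [[h - p for h, p in zip(h_row, p_row)]
            for h_row, p_row in zip(feature_matrix, product)]

# Three aligned samples with two features each on the participating node.
H = [[1, 2],
     [4, 6],
     [7, 8]]
# One-hot cluster vector: the sample numbered 1 is the cluster center.
a_i = [0, 1, 0]

HD_i = second_difference_matrix(H, a_i)
```

Because every row of the cluster matrix selects the same sample, every row of the product equals the center's feature row, so row n of `HD_i` is the per-feature difference between sample n and the center. The participating node never needs to know which sample was selected when the vector is encrypted.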

For the non-primary iteration, the target matrix generated by the participating node according to the embodiment is an updated difference matrix between the cluster center of the current iteration and the cluster center of the previous iteration, where a generation method of the updated difference matrix can be further understood with reference to fig. 12, and the method includes:

S1201, repeating the encrypted cluster vector corresponding to the previous iteration and the encrypted cluster vector corresponding to the current iteration, respectively, in the column direction according to the number of samples, to construct two encrypted cluster matrices.

In the embodiment of the application, let the number of samples be m. The service node sends the k update vectors <b_i> obtained by updating after the previous iteration to the participating node. After acquiring <b_i>, the participating node repeats <b_i> m times in the column direction, expanding it into an m × m cluster matrix, and encrypts this cluster matrix with the public key p to obtain the encrypted cluster matrix of the current iteration. The m × m cluster matrix expanded from the cluster vector of the previous iteration is likewise encrypted with the public key p to obtain the encrypted cluster matrix of the previous iteration.

S1202, multiplying each of the two encrypted cluster matrices by the feature space matrix of the participating node to obtain two multiplication matrices.

The encrypted cluster matrix of the cluster vector of the previous iteration and the encrypted cluster matrix of the cluster vector of the current iteration are each multiplied by the feature space matrix H of the participating node, giving one multiplication matrix for the previous iteration and one for the current iteration.

S1203, subtracting the multiplication matrix corresponding to the previous iteration from the multiplication matrix corresponding to the current iteration to obtain the encrypted update difference matrix corresponding to the current iteration.

According to the formula, subtracting the multiplication matrix corresponding to the cluster vector of the previous iteration from the multiplication matrix corresponding to the cluster vector of the current iteration gives the encrypted update difference matrix <hD_i> corresponding to the current iteration.

The obtained encrypted update difference matrix is sent to the service node, and this is repeated until the iteration is finished.

The clustering method provided by the application shows a generation process of the update difference matrix aiming at non-primary iteration, so that a service node can generate a second difference matrix corresponding to current iteration based on a second difference matrix obtained in the last iteration in combination with the obtained update difference matrix, and meanwhile, the updating of a cluster center and a cluster type is realized in combination with a first difference matrix generated by the service node, and further, effective safety guarantee can be obtained for data.
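The update difference matrix of S1201 to S1203 can be sketched on plaintext values, under the same assumptions as any plaintext illustration of this method: one-hot cluster vectors, no encryption, hypothetical helper names. The sign convention is also an assumption. Here differences are taken as sample minus center, so the update (current product minus previous product) is subtracted from the previous second difference matrix; with center-minus-sample differences the same update would be added instead.

```python
def cluster_matrix(cluster_vector):
    """Repeat a 1 x m cluster vector m times to form an m x m matrix."""
    return [list(cluster_vector) for _ in range(len(cluster_vector))]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def subtract(a, b):
    """Element-wise a - b for two matrices of the same shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def update_difference_matrix(feature_matrix, previous_vector, current_vector):
    """Current multiplication matrix minus previous one, as in S1203."""
    previous_product = matmul(cluster_matrix(previous_vector), feature_matrix)
    current_product = matmul(cluster_matrix(current_vector), feature_matrix)
    return subtract(current_product, previous_product)

H = [[1, 2],
     [4, 6],
     [7, 8]]
previous_vector = [0, 1, 0]  # center was the sample numbered 1
current_vector = [0, 0, 1]   # center is now the sample numbered 2

hD_i = update_difference_matrix(H, previous_vector, current_vector)

# Service-node side: rebuild the current second difference matrix from the
# previous one without receiving the full matrix again.
previous_HD = subtract(H, matmul(cluster_matrix(previous_vector), H))
current_HD = subtract(previous_HD, hD_i)
direct_HD = subtract(H, matmul(cluster_matrix(current_vector), H))
```

Every row of `hD_i` is the same vector, the shift of the cluster center between the two iterations, so the update carries no per-sample information beyond that shift.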

In order to ensure the security of the feature space data of the participating node, after the encrypted cluster vector of each class cluster sent by the service node is acquired in each iteration, a security check needs to be performed on it.

In implementation, the security check of the cluster vector sent by the service node can be realized with the idea of zero-knowledge proof. The service node needs to show the participating node that the elements of each cluster vector a_i (i＝1, 2, ..., k) it sends sum to 1.

Optionally, exactly one element of the cluster vector sent by the service node is 1 and all other elements are 0. Such a cluster vector is valid and usable, and can be used in the data interaction to aggregate the class clusters.

Optionally, the cluster vector sent by the service node has several non-zero elements whose sum is 1. Such a cluster vector is invalid and unusable, but it does not threaten the security of the feature space data of the participating node.

Therefore, in the zero-knowledge proof framework, the service node only needs to show the participating node that the element values of the cluster vector it sends sum to 1, and the participating node only needs to verify this sum to determine whether the cluster vector is a zero vector. This realizes the security check of both the cluster vector and the update vector by the participating node. Taking the security check of the cluster vector as an example, fig. 13 is a schematic flow diagram of a clustering method according to another embodiment of the present application. As shown in fig. 13, the method includes:

S1301, adding up all dimensions of the encrypted cluster vector to obtain verification data.

In the embodiment of the application, let the encrypted cluster vector acquired by the participating node be <A_i>. Adding up all dimensions of <A_i> gives the verification data <R_i>, according to the following formula:

<R_i>＝<A_i1>+<A_i2>+...+<A_im>

wherein A_ij is the value of the i-th encrypted cluster vector A_i in the j-th dimension.

S1302, performing security verification on the service node based on the verification data.

In the embodiment of the application, the participating node can perform the security verification of the service node on the basis of the verification data, using a verification method based on zero-knowledge proof.

Further, a specific method of security verification of a participating node may be understood in conjunction with fig. 14, as shown in fig. 14, the method comprising:

S1401, randomly generating a first column vector and a second column vector, and encrypting the second column vector using the public key.

S1402, based on the verification data, performing an affine transformation on the first column vector and the encrypted second column vector to generate an encrypted verification vector.

S1403, sending the encrypted verification vector to the service node, and receiving the decrypted verification vector returned by the service node.

S1404, in response to the encryption verification vector and the decryption verification vector being consistent, determining that the service node passes the security verification.

For example, taking security verification of the encrypted cluster vector: after obtaining the verification data <R_i>, the participating node randomly generates two column vectors, a first column vector a and a second column vector b. With the number of samples set to m, a and b are both m × 1. The participating node encrypts b with the public key p to obtain the encrypted second column vector <b>, performs an affine transformation on a and <b> to generate the encrypted verification vector <T>, and sends <T> to the service node. The encrypted verification vector <T> is computed as follows:

<T>＝<R>a+<b> (16)

The service node decrypts the encrypted verification vector <T> sent by the participating node to obtain the decrypted verification vector T', and sends T' to the participating node.

In implementation, when the encrypted cluster vector sent by the service node to the participating node is correct and valid, <R_i>＝<1>, and the participating node can calculate the encryption verification vector T by the following formula:

T＝a+b (17)

The encryption verification vector T is an m × 1 column vector. By comparing each dimension of the decryption verification vector T' with the corresponding dimension of T, the participating node can determine whether the encrypted cluster vector sent by the service node passes the security verification.

If the vector elements of T' and T are consistent in every dimension, the security verification is passed. If they are inconsistent in any dimension, it can be determined that the encrypted cluster vector sent by the service node is invalid and fails the security verification.

Similarly, the update vector <b_i> can be security-verified in the same way.
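The exchange in S1301 and S1401 to S1404 can be sketched end to end. The encryption scheme is not named in the text; the sketch assumes an additively homomorphic scheme and uses textbook Paillier with tiny fixed primes, which is not secure. The function names and the range of the random vectors a and b are likewise illustrative assumptions.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys: public n, private (lam, mu). Illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, value, rng):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, value, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, private_key, c):
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n

def build_challenge(n, enc_cluster_vector, rng):
    """Participating node: form <R>, then <T> = <R>a + <b>."""
    n2 = n * n
    enc_r = 1
    for c in enc_cluster_vector:      # homomorphic sum of all dimensions
        enc_r = enc_r * c % n2
    m = len(enc_cluster_vector)
    a = [rng.randrange(1, 1000) for _ in range(m)]
    b = [rng.randrange(1, 1000) for _ in range(m)]
    enc_t = [pow(enc_r, a_j, n2) * encrypt(n, b_j, rng) % n2
             for a_j, b_j in zip(a, b)]
    expected_t = [a_j + b_j for a_j, b_j in zip(a, b)]  # T = a + b if R = 1
    return enc_t, expected_t

def run_check(cluster_vector, seed=0):
    rng = random.Random(seed)
    n, private_key = keygen()
    enc_vector = [encrypt(n, v, rng) for v in cluster_vector]   # service node
    enc_t, expected_t = build_challenge(n, enc_vector, rng)     # participant
    decrypted_t = [decrypt(n, private_key, c) for c in enc_t]   # service node
    return decrypted_t == expected_t                            # participant

honest = run_check([0, 0, 1, 0])
zero_vector = run_check([0, 0, 0, 0])
two_centers = run_check([1, 1, 0, 0])
```

With a vector whose elements sum to 1 the service node returns exactly a + b. With a zero vector it returns b, and with a sum of 2 it returns 2a + b, so both are rejected. The service node only ever sees the blinded values in T', not a or b individually.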

To better understand the security verification proposed in the above embodiments, refer to fig. 15, which shows the interactive process in which the participating node and the service node jointly complete the security verification, where:

S151, adding up the vector element values of all dimensions of the obtained encrypted vector, randomly generating a first column vector and a second column vector, and performing an affine transformation to generate an encrypted verification vector.

S152, sending the encrypted verification vector.

S153, decrypting the encrypted verification vector and sending the decrypted verification vector to the participating node.

S154, sending the decrypted verification vector.

S155, comparing the encrypted verification vector with the decrypted verification vector for consistency of the element values in each dimension, thereby completing the security verification.

The participating node adds up the vector element values of all dimensions of the obtained encrypted vector, randomly generates a first column vector and a second column vector, performs an affine transformation to obtain the encrypted verification vector, and sends it to the service node. The service node decrypts the encrypted verification vector and returns the result to the participating node. The participating node then compares the received decrypted verification vector with the encrypted verification vector and completes the security verification according to whether the vector element values agree in each dimension.

According to the clustering method provided by the application, the participating node performs security check on the acquired encrypted data sent by the service node, and whether the encrypted data sent by the service node is correct and safe is further judged according to the comparison result of the encryption verification vector and the decryption verification vector. In the application, the verification mode only needs the service node to decrypt the encrypted verification vector and then synchronize the decrypted verification vector to the participating node, effective information does not need to be sent to the participating node, the participating node can verify the safety of the data sent by the service node according to the comparison result of the encrypted verification vector and the decrypted verification vector, and the safety and the confidentiality of the data are guaranteed.

In order to better implement the clustering method proposed in the above embodiments, the samples need to be further processed. Fig. 16 is a schematic flow chart of the clustering method according to another embodiment of the present application. As shown in fig. 16, the method is applicable to a participating node and includes:

S1601, performing sample alignment with the service node based on the identification information of the samples, and numbering the aligned second samples of the participating node in sequence.

In the embodiment of the application, in order to implement data interaction between a participating node and a service node, the participating node needs to perform sample alignment with the service node, and randomly and sequentially number second samples belonging to the participating node.

S1602, generating a feature space matrix of the participating node based on the aligned second samples.

In the embodiment of the application, after the participating nodes are aligned and numbered based on the second sample, the feature space is standardized, and then a corresponding feature space matrix is generated.

The clustering method provided by the application specifies how the samples of the participating node are aligned and numbered and how the feature space matrix is generated, which ensures that the subsequent operations of the clustering method can be executed normally.

Fig. 17 is a schematic flow chart of a clustering method according to another embodiment of the present application, and as shown in fig. 17, the method is applicable to a participating node, and includes:

S1701, receiving the public key sent by the service node.

The participating node needs to receive a public key in an encryption key sent by the service node to encrypt data such as vectors and matrixes returned to the service node.

The clustering algorithm provided by the application realizes the encryption of the data of the participating nodes by using the public key, so that the data of the participating nodes can be encrypted, the security of sample data is ensured, and the confidentiality of the sample data is enhanced.

To better understand the clustering method proposed in the above embodiments, refer to fig. 18, which shows the process by which the participating node and the service node implement clustering through data interaction, where:

S181, aligning the samples, standardizing the feature space, and numbering the samples randomly and in sequence.

S182, generating a key, which includes a public key and a private key.

S183, sending the public key.

S184, receiving the public key.

S185, performing the first iteration to generate an encrypted cluster vector.

S186, sending the encrypted cluster vector.

S187, performing security verification on the encrypted cluster vector, generating a second difference matrix based on the cluster vector that passes the security verification, and encrypting the second difference matrix.

S188, sending the second difference matrix.

S189, decrypting to obtain the second difference matrix, calculating the first difference matrix, updating the cluster center and the class cluster based on the first difference matrix and the second difference matrix, generating an update vector, and encrypting it.

S1810, sending the update vector.

S1811, performing a security check on the update vector, calculating the update difference matrix, and encrypting it.

S1812, sending the update difference matrix.

S1813, decrypting the update difference matrix, updating the cluster center, updating the class cluster based on the updated cluster center, and repeating the above steps until the iteration stops.

The service node and the participating node perform sample alignment and determine the first samples and the second samples, each of which has a unique, non-repeating sequential number. The service node generates an encryption key comprising a public key and a private key, and sends the public key, which may be disclosed and is used for encryption, to the participating node.

For the first iteration, the service node randomly determines the initial cluster centers and initial class clusters from the first samples based on the numbers of the first samples, generates the cluster vector corresponding to each initial class cluster, and sends the encrypted cluster vector to the participating node. The participating node obtains the encrypted cluster vector, performs a security check on it, generates a second difference matrix based on the encrypted cluster vector that passes the security verification, and sends the second difference matrix to the service node after encryption. The service node decrypts the encrypted second difference matrix to obtain the second difference matrix, updates the cluster center based on the first difference matrix and the second difference matrix, and generates new class clusters. The updated cluster centers and class clusters serve as the initial cluster centers and initial class clusters of the next iteration; the service node generates the update vector for the next iteration and sends it to the participating node after encryption.

For a non-first iteration, the participating node acquires the encrypted update vector, performs security verification, calculates the update difference matrix based on the update vector that passes the security verification, and sends the update difference matrix to the service node after encryption. The service node combines the obtained update difference matrix with the second difference matrix of the previous iteration to obtain the second difference matrix of the current iteration, and calculates the first difference matrix of the current iteration. It then updates the cluster center and the class cluster based on the first difference matrix and the second difference matrix of the current iteration, generates the update vector for the next iteration, and sends it to the participating node after encryption.

After each iteration, whether the iteration stop condition is met is judged according to the updated class clusters. If not, the above steps continue until the iteration stops, at which point the final target class clusters are generated.

According to the clustering method, the sample characteristic data of the service node can be effectively kept secret by encrypting the interactive data between the participating nodes and the service node, the space characteristic data of the participating nodes can be guaranteed by safety verification, the safety of the data can be effectively guaranteed on the premise that the clustering method is realized, and the data confidentiality is enhanced.
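One plausible way for the service node to use the two difference matrices when re-clustering is sketched below. This section does not spell out the distance computation, so the sketch assumes squared Euclidean distance over the concatenated features of both nodes; the function name and the sample data are hypothetical.

```python
def assign_clusters(first_diffs, second_diffs):
    """Assign each sample to the class cluster with the nearest center.

    first_diffs[i] and second_diffs[i] are the decrypted m-row difference
    matrices of class cluster i from the service node and the participating
    node. Summing the squared entries of both rows gives the squared
    distance in the joint feature space (an assumed distance measure).
    """
    k = len(first_diffs)
    m = len(first_diffs[0])
    labels = []
    for s in range(m):
        distances = [
            sum(v * v for v in first_diffs[i][s])
            + sum(v * v for v in second_diffs[i][s])
            for i in range(k)
        ]
        labels.append(distances.index(min(distances)))
    return labels

# Two class clusters, three samples; the service node holds one feature,
# the participating node holds two.
first_diffs = [
    [[0], [1], [5]],       # differences to the center of class cluster 0
    [[-5], [-4], [0]],     # differences to the center of class cluster 1
]
second_diffs = [
    [[0, 0], [1, 0], [4, 4]],
    [[-4, -4], [-3, -4], [0, 0]],
]

labels = assign_clusters(first_diffs, second_diffs)
```

Only per-feature differences to the centers are needed for this step, which is why the participating node can contribute without revealing its raw feature space matrix.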

Corresponding to the clustering methods proposed in the above embodiments, an embodiment of the present application also proposes a clustering apparatus. Since the clustering apparatus corresponds to the clustering methods proposed in the above embodiments, the implementations of the clustering method also apply to the clustering apparatus and are not described in detail in the following embodiments.

Fig. 19 is a schematic structural diagram of a clustering apparatus according to an embodiment of the present application, where the apparatus is suitable for a service node, and as shown in fig. 19, the clustering apparatus 100 includes a cluster vector generation module 11, a difference matrix acquisition module 12, and an update module 13, where:

the cluster vector generation module 11 is configured to generate a cluster vector of each class cluster based on the number of the first target sample belonging to the class cluster, encrypt the cluster vector, and send the encrypted cluster vector to a participating node, where the cluster vector is used to represent a cluster center of the class cluster;

a difference matrix obtaining module 12, configured to obtain a first difference matrix from a first sample of a service node corresponding to each class cluster to a cluster center, and a second difference matrix from a second sample of a participating node to the cluster center;

and the updating module 13 is configured to update a cluster center according to the first difference matrix and the second difference matrix of each cluster, re-cluster the first sample with the updated cluster center, use each cluster obtained after re-clustering as a cluster corresponding to the next iteration, and return to execute the above steps until the iteration is finished to generate a final target cluster.

The clustering device generates cluster vectors of clusters based on the serial numbers of first target samples, encrypts the cluster vectors and sends the encrypted cluster vectors to participating nodes, obtains a second difference matrix generated by the participating nodes and a first difference matrix generated by service nodes, updates the cluster vectors based on the first difference matrix and the second difference matrix, re-clusters the first samples by the updated cluster vectors, uses the updated clusters as initial clusters of the next iteration, continues iteration until all iterations are finished, and generates final target clusters.

Fig. 20 is a schematic structural diagram of a clustering apparatus according to another embodiment of the present application, the apparatus is suitable for a service node, as shown in fig. 20, the clustering apparatus 200 includes a cluster vector generation module 21, a difference matrix acquisition module 22, an update module 23, a cluster center selection module 24, a sample determination module 25, a verification module 26, a numbering module 27, a matrix generation module 28, and a key generation module 29, where:

it should be noted that the cluster vector generation module 11, the difference matrix acquisition module 12, and the update module 13 have the same structure and function as the cluster vector generation module 21, the difference matrix acquisition module 22, and the update module 23.

In this embodiment of the application, the difference matrix obtaining module 22 is configured to:

repeating the cluster vector of the current iteration in the column direction according to the number of samples to construct a cluster matrix of the current iteration;

and multiplying the obtained cluster matrix and the characteristic space matrix of the service node to obtain a multiplication matrix, and subtracting the characteristic space matrix from the multiplication matrix to generate a first difference matrix corresponding to the current iteration.

In this embodiment of the application, the difference matrix obtaining module 22, for the first iteration, is further configured to:

and receiving an encrypted second difference matrix corresponding to the first iteration sent by the participating node, and decrypting the encrypted second difference matrix to obtain a second difference matrix, wherein the encrypted second difference matrix is determined by the participating node based on the own encrypted characteristic space matrix and the encrypted initial cluster vector.

In this embodiment of the application, for a non-first iteration, the difference matrix acquisition module 22 is further configured to:

receive an encrypted update difference matrix corresponding to the current iteration sent by the participating node, where the encrypted update difference matrix represents the distance between the cluster center of the previous iteration and the cluster center of the current iteration, and is generated by the participating node based on its feature space matrix, the encrypted cluster vector of the previous iteration and the encrypted cluster vector of the current iteration;

decrypt the encrypted update difference matrix to obtain an update difference matrix; and

add the second difference matrix corresponding to the previous iteration and the update difference matrix of the current iteration to obtain the second difference matrix of the current iteration.
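The addition step can be checked with a short derivation. The symbols are assumptions introduced here and do not appear in the source: $X_B$ is the feature space matrix of the participating node, $C_t$ is the cluster matrix of iteration $t$, $D_t$ is the second difference matrix, and $U_t$ is the update difference matrix. The sign convention (multiplication matrix minus feature space matrix) follows the description of the first difference matrix.

```latex
\begin{aligned}
D_{t-1} &= C_{t-1} X_B - X_B, \\
U_t &= C_t X_B - C_{t-1} X_B, \\
D_{t-1} + U_t &= C_t X_B - X_B = D_t .
\end{aligned}
```

Under these assumptions, adding the update difference matrix to the previous second difference matrix reproduces the second difference matrix of the current iteration, which is why only the change in the cluster center needs to be exchanged after the first iteration.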

In the embodiment of the present application, the clustering apparatus 200 further includes a cluster center selecting module 24 and a sample determining module 25, where:

the cluster center selecting module 24 is configured to randomly select a set number of first samples at the initial iteration, and use each of the selected first samples as an initial cluster center, where one initial cluster center corresponds to one initial class cluster;

the sample determining module 25 is configured to determine the first sample serving as an initial cluster center to be the target first sample of the corresponding initial class cluster; and

the cluster vector generating module 21 is further configured to generate a cluster vector of the initial class cluster according to the number of the target first sample of the initial class cluster.

In this embodiment of the application, the cluster vector generating module 21 further includes:

an encoding unit 211, configured to determine the position of the target first sample in the cluster vector according to its number, encode the vector element at that position as a first encoded value, and encode the vector elements at the remaining positions as a second encoded value, where the vector elements at the remaining positions correspond to the first samples that do not belong to the class cluster.

In this embodiment, the encoding unit 211 is further configured to: for any class cluster, acquire the count of target first samples belonging to that class cluster, and determine the first encoded value according to that count.
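The encoding can be illustrated with a minimal sketch. The concrete values are assumptions, since this passage only says that the first encoded value depends on the member count: here the first encoded value is taken as 1 / (member count) and the second encoded value as 0, so the vector averages the member samples when multiplied by a feature matrix. The function name `encode_cluster_vector` and the 0-based numbering are hypothetical.

```python
import numpy as np

def encode_cluster_vector(member_numbers, n_samples):
    """Build a length-n cluster vector from the numbers of the member samples."""
    members = sorted(set(member_numbers))
    if not members:
        raise ValueError("a class cluster needs at least one member sample")
    # Second encoded value (assumed to be 0) for samples outside the cluster.
    vector = np.zeros(n_samples)
    # First encoded value (assumed to be 1 / member count) at the member positions.
    vector[members] = 1.0 / len(members)
    return vector

# Samples numbered 0 and 3 belong to the cluster; there are 5 aligned samples.
v = encode_cluster_vector([0, 3], n_samples=5)
```

With this choice the elements of the vector sum to 1, and only the sample numbers, not the sample features, determine its content.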

In this embodiment of the application, the clustering apparatus 200 further includes a verification module 26, where:

the verification module 26 is configured to receive the encrypted verification vector sent by the participating node, decrypt the encrypted verification vector based on the private key to obtain a decrypted verification vector, and send the decrypted verification vector to the participating node for security verification, where the encrypted verification vector is generated from the encrypted cluster vector.

In this embodiment of the application, the clustering apparatus 200 further includes a numbering module 27 and a matrix generating module 28, where:

a numbering module 27, configured to perform sample alignment with the participating node based on the identification information of the samples, and sequentially number the aligned first samples;

and a matrix generating module 28, configured to generate a feature space matrix of the service node based on the aligned first samples.
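A minimal sketch of alignment, numbering and matrix construction is given below. It assumes the set of shared sample IDs is already known and obtains it with a plain set intersection; the source only states that alignment is based on identification information, so how the shared IDs are found without revealing the others is outside this sketch. The names `align_and_number`, `service_rows` and `participant_ids` are hypothetical.

```python
import numpy as np

def align_and_number(local_rows, shared_ids):
    """Number the shared samples sequentially and stack their local features.

    local_rows: dict mapping sample ID -> list of local feature values.
    shared_ids: IDs held by both the service node and the participating node.
    """
    # Sorting gives both parties the same order, hence the same numbering.
    ordered_ids = sorted(shared_ids)
    numbering = {sample_id: number for number, sample_id in enumerate(ordered_ids)}
    # Row i of the feature space matrix belongs to the sample numbered i.
    feature_matrix = np.array([local_rows[i] for i in ordered_ids], dtype=float)
    return numbering, feature_matrix

service_rows = {"u3": [3.0, 30.0], "u1": [1.0, 10.0], "u9": [9.0, 90.0]}
participant_ids = {"u1", "u3", "u7"}
shared = set(service_rows) & participant_ids
numbering, X_service = align_and_number(service_rows, shared)
```

After this step the two parties can refer to a sample by its number alone, which is what the cluster vector relies on.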

In this embodiment of the present application, the clustering apparatus 200 further includes a key generation module 29, where:

the key generation module 29 is configured to generate an encryption key, where the encryption key includes a public key and a private key, and to send the public key to the participating node.
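The source does not name the encryption scheme. As an illustration only, the sketch below assumes a Paillier-style additively homomorphic scheme, which would let the participating node combine encrypted cluster-vector elements with its own plaintext features. The primes are toy-sized and insecure, values are restricted to non-negative integers, and all function names are hypothetical.

```python
import math
import random

def generate_keys(p=1009, q=1013):
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With generator n + 1, (n + 1)^m mod n^2 equals 1 + n*m mod n^2.
    return ((1 + n * message) % n_sq) * pow(r, n, n_sq) % n_sq

def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Ciphertext of the sum of the two underlying plaintexts."""
    return c1 * c2 % (n * n)

def scale_encrypted(n, c, k):
    """Ciphertext of k times the underlying plaintext, k a plaintext integer."""
    return pow(c, k, n * n)

# Service node side: generate keys and encrypt integer weights.
public_key, private_key = generate_keys()
weights = [encrypt(public_key, w) for w in (1, 0, 1)]

# Participating node side: weighted sum of its plaintext features,
# computed on ciphertexts using only the public key.
features = (4, 5, 6)
total = encrypt(public_key, 0)
for c, x in zip(weights, features):
    total = add_encrypted(public_key, total, scale_encrypted(public_key, c, x))

# Service node side: only the private key holder can read the result.
result = decrypt(public_key, private_key, total)
```

A practical system would need a vetted library, large keys, and a fixed-point encoding for fractional weights, none of which is shown here.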

The clustering device generates a cluster vector for each class cluster based on the numbers of the first target samples, encrypts the cluster vectors and sends them to the participating node, obtains the second difference matrix generated by the participating node and the first difference matrix generated by the service node, updates the cluster vectors based on the first and second difference matrices, re-clusters the first samples using the updated cluster vectors, and uses the resulting clusters as the initial clusters of the next iteration, iterating until all iterations are finished and the final target clusters are generated. In the present application, the sample data exchanged between the participating node and the service node is identified only by number and is also encrypted, so the participating node cannot obtain the feature data of the service node's actual samples through the data interaction. This safeguards the security of the sample data and strengthens its confidentiality while preserving the clustering result.
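The overall iteration can be sketched in plaintext to show the arithmetic only. This is not the protocol itself: encryption, verification and the message exchange are omitted, both feature matrices are held in one process, and the distance used for re-clustering is assumed to be the squared Euclidean norm of the combined difference matrices. All names are hypothetical.

```python
import numpy as np

def difference_matrix(cluster_vector, features):
    """(cluster matrix @ features) - features, one row per sample."""
    n = features.shape[0]
    return np.tile(cluster_vector, (n, 1)) @ features - features

def run_clustering(X_service, X_participant, initial_centers, n_iterations=10):
    n = X_service.shape[0]
    # Initial iteration: each chosen sample is the only member of its cluster.
    vectors = []
    for number in initial_centers:
        v = np.zeros(n)
        v[number] = 1.0
        vectors.append(v)
    labels = None
    for _ in range(n_iterations):
        sq_dist = np.empty((n, len(vectors)))
        for k, v in enumerate(vectors):
            d1 = difference_matrix(v, X_service)      # first difference matrix
            d2 = difference_matrix(v, X_participant)  # second difference matrix
            sq_dist[:, k] = (d1 ** 2).sum(axis=1) + (d2 ** 2).sum(axis=1)
        new_labels = sq_dist.argmin(axis=1)           # re-cluster the samples
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Rebuild each cluster vector from the numbers of its members;
        # an empty cluster keeps its previous vector.
        for k in range(len(vectors)):
            members = labels == k
            if members.any():
                vectors[k] = members / members.sum()
    return labels

X_service = np.array([[0.0], [0.2], [5.0], [5.1]])
X_participant = np.array([[0.0], [0.1], [5.0], [4.9]])
labels = run_clustering(X_service, X_participant, initial_centers=[0, 2])
```

In this toy data the first two samples and the last two samples end up in separate clusters.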

In order to implement the clustering method proposed by the above embodiments, the present application also proposes a clustering device. Fig. 21 is a schematic structural diagram of the clustering device according to another embodiment of the present application. As shown in fig. 21, the clustering device 300 includes a cluster vector receiving module 31 and a target matrix generation module 32, where:

a cluster vector receiving module 31, configured to receive, in each iteration, an encrypted cluster vector of each class cluster sent by the service node, where the cluster vector is determined by the service node based on the number of a first target sample belonging to the class cluster, and the first target sample is one of the first samples of the service node;

and the target matrix generation module 32 is configured to, for each class cluster, obtain an encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster, and send the encrypted target matrix to the service node, until the iteration is finished.

With this clustering device, the participating node generates the corresponding encrypted target matrix based on the encrypted data received from the service node and sends the encrypted target matrix to the service node.

Fig. 22 is a schematic structural diagram of a clustering apparatus according to another embodiment of the present application. The apparatus is suitable for a participating node. As shown in fig. 22, the clustering apparatus 400 includes a cluster vector receiving module 41, a target matrix generation module 42, a security verification module 43, a numbering module 44, a matrix generating module 45, and a receiving module 46.

It should be noted that the cluster vector receiving module 31 and the target matrix generation module 32 have the same structure and function as the cluster vector receiving module 41 and the target matrix generation module 42, respectively.

In this embodiment of the present application, during the first iteration, the target matrix generation module 42 is configured to:

repeat the encrypted cluster vector of the initial class cluster in the column direction according to the number of samples to construct a cluster matrix of the initial class cluster; and

multiply the cluster matrix of the initial class cluster by the feature space matrix of the participating node to obtain a multiplication matrix, and subtract the encrypted feature space matrix from the multiplication matrix to generate an encrypted difference matrix corresponding to the first iteration.

In this embodiment of the application, in a non-first iteration, the target matrix generation module 42 is further configured to:

repeat the encrypted cluster vector corresponding to the previous iteration and the encrypted cluster vector corresponding to the current iteration, respectively, in the column direction according to the number of samples to construct two encrypted cluster matrices;

multiply each of the two encrypted cluster matrices by the feature space matrix of the participating node to obtain two multiplication matrices; and

subtract the multiplication matrix corresponding to the previous iteration from the multiplication matrix corresponding to the current iteration to obtain an encrypted update difference matrix corresponding to the current iteration.
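The three steps above can be sketched in plaintext NumPy. In the described scheme the cluster vectors would be ciphertexts and the arithmetic would be homomorphic; that part is left out here, so this only shows the structure of the computation. The function name `update_difference_matrix` and the example values are hypothetical.

```python
import numpy as np

def update_difference_matrix(prev_vector, curr_vector, features):
    """Product for the current iteration minus product for the previous one."""
    n = features.shape[0]
    # Two cluster matrices, each built by repeating one cluster vector n times.
    prev_product = np.tile(prev_vector, (n, 1)) @ features
    curr_product = np.tile(curr_vector, (n, 1)) @ features
    # Every row equals (current cluster center - previous cluster center).
    return curr_product - prev_product

X_b = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
prev = np.array([1.0, 0.0, 0.0])  # previous center: sample 0 alone
curr = np.array([0.5, 0.5, 0.0])  # current center: mean of samples 0 and 1
U = update_difference_matrix(prev, curr, X_b)
```

Because the participating node's own feature space matrix cancels out, the result depends only on how far the cluster center moved between the two iterations.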

In this embodiment of the application, in the first iteration, the encrypted cluster vector received by the clustering device 400 is determined based on the number of the target first sample belonging to the initial class cluster.

In this embodiment of the application, the clustering apparatus 400 further includes a security verification module 43, wherein:

the security verification module 43 is configured to add up the dimensions of the encrypted cluster vector to obtain verification data, and perform security verification of the service node based on the verification data.

In this embodiment, the security verification module 43 includes:

a column vector generation unit 431 for randomly generating a first column vector and a second column vector and encrypting the second column vector using a public key;

an authentication vector generation unit 432 configured to perform affine transformation on the first column vector and the encrypted second column vector based on the authentication data to generate an encrypted authentication vector;

a decryption unit 433, configured to send the encrypted verification vector to the service node, and receive a decrypted verification vector returned by the service node;

and the verifying unit 434 is configured to determine that the service node passes the security verification in response to the encrypted verification vector and the decrypted verification vector being consistent.

In this embodiment of the application, the clustering apparatus 400 further includes a numbering module 44 and a matrix generating module 45, where:

a numbering module 44, configured to perform sample alignment with the service node based on the identification information of the samples, and sequentially number the aligned second samples of the participating node;

and a matrix generating module 45, configured to generate a feature space matrix of the participating node based on the aligned second samples.

In this embodiment of the application, the clustering apparatus 400 further includes a receiving module 46, wherein:

and the receiving module 46 is configured to receive the public key sent by the service node.

To achieve the above embodiments, the present application also proposes an electronic device, a computer-readable storage medium, and a computer program product.

Fig. 23 shows a schematic block diagram of an example electronic device 2300 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 23, the apparatus 2300 includes a memory 231, a processor 232, and a computer program stored on the memory 231 and executable on the processor 232, and when the processor 232 executes the program instructions, the clustering method proposed by the above-mentioned embodiment is implemented.

In the electronic device provided by the embodiment of the application, the processor 232 executes the computer program stored in the memory 231. Encrypting the data exchanged between the participating node and the service node keeps the sample feature data of the service node confidential, and the security verification protects the feature space data of the participating node.

A computer-readable storage medium provided in an embodiment of the present application stores thereon a computer program, and when the computer program is executed by the processor 232, the clustering method provided in the above embodiment is implemented.

The computer-readable storage medium of the embodiment of the application stores a computer program to be executed by a processor. When the program is executed, encrypting the data exchanged between the participating node and the service node keeps the sample feature data of the service node confidential, and the security verification protects the feature space data of the participating node, so that the clustering method is carried out while the security and confidentiality of the data are effectively guaranteed.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a sequential list of executable instructions that may be thought of as being useful for implementing logical functions, may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that may fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with appropriate combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), etc.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A clustering method, adapted to a service node, the method comprising:

for each class cluster, generating a cluster vector of the class cluster based on the number of a first target sample belonging to the class cluster, encrypting the cluster vector and sending the encrypted cluster vector to a participating node, wherein the cluster vector is used for representing the cluster center of the class cluster;

acquiring a first difference matrix from a first sample of the service node corresponding to each class cluster to a cluster center and a second difference matrix from a second sample of the participating node to the cluster center;

updating a cluster center according to the first difference matrix and the second difference matrix of each cluster, re-clustering the first sample by using the updated cluster center, taking each cluster obtained after re-clustering as a cluster corresponding to the next iteration, and returning to execute the steps until the iteration is finished to generate a final target cluster.

2. The method of claim 1, wherein the obtaining of the first difference matrix comprises:

repeating the cluster vector of the current iteration in the column direction according to the number of samples to construct a cluster matrix of the current iteration;

and multiplying the cluster matrix by the feature space matrix of the service node to obtain a multiplication matrix, and subtracting the multiplication matrix from the feature space matrix to generate the first difference matrix corresponding to the current iteration.

3. The method according to claim 1 or 2, wherein for a first iteration, the obtaining of the second difference matrix comprises:

receiving an encrypted second difference matrix corresponding to the first iteration sent by the participating node, and decrypting the encrypted second difference matrix to obtain the second difference matrix, wherein the encrypted second difference matrix is determined by the participating node based on its own encrypted feature space matrix and an encrypted initial cluster vector.

4. The method according to claim 1 or 2, wherein for a non-primary iteration, the obtaining of the second difference matrix comprises:

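receiving an encrypted update difference matrix corresponding to the current iteration and sent by the participating node, wherein the encrypted update difference matrix represents the distance between the cluster center of the previous iteration and the cluster center of the current iteration and is generated by the participating node based on a feature space matrix of the participating node, the encrypted cluster vector of the previous iteration and the encrypted cluster vector of the current iteration;

decrypting the encrypted update difference matrix to obtain an update difference matrix;

and adding the second difference matrix corresponding to the previous iteration and the update difference matrix of the current iteration to obtain the second difference matrix of the current iteration.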

5. The method of claim 1, further comprising:

randomly selecting a set number of first samples at the initial iteration, and taking each of the selected first samples as an initial cluster center, wherein one initial cluster center corresponds to one initial class cluster;

determining the first sample serving as the initial cluster center to be the target first sample of the initial class cluster;

generating a cluster vector of the initial class cluster according to the number of the target first sample of the initial class cluster.

6. The method according to claim 1 or 5, wherein the generation process of the cluster vector comprises:

determining the position of the target first sample in the cluster vector according to its number, and encoding the vector element at the position as a first encoded value;

and encoding the vector elements at the remaining positions as a second encoded value, wherein the vector elements at the remaining positions correspond to the first samples which do not belong to the class cluster.

7. The method of claim 6, wherein encoding the vector element at the position into a first encoded value comprises:

for any class cluster, acquiring the count of the target first samples belonging to the class cluster, and determining the first encoded value according to the count of the target first samples.

8. The method of claim 1, further comprising:

receiving an encryption verification vector sent by the participating node, and decrypting the encryption verification vector based on the private key to obtain a decryption verification vector, wherein the encryption verification vector is generated according to the encrypted cluster vector;

sending the decryption verification vector to the participating node for security verification.

9. The method of claim 1, further comprising:

based on the identification information of the samples, carrying out sample alignment with the participating nodes, and carrying out sequential numbering on the aligned first samples;

and generating a feature space matrix of the service node based on the aligned first samples.

10. The method of claim 1, further comprising:

and generating an encryption key, wherein the encryption key comprises a public key and a private key, and sending the public key to the participating node.

11. A clustering method, adapted for participating nodes, the method comprising:

receiving, in each iteration, an encrypted cluster vector of each class cluster sent by a service node, wherein the cluster vector is determined by the service node based on the number of a target first sample belonging to the class cluster, and the target first sample is one of the first samples of the service node;

and aiming at each class cluster, acquiring an encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster, and sending the encrypted target matrix to the service node until iteration is finished.

12. The method according to claim 11, wherein at the first iteration the encrypted target matrix is an encrypted difference matrix from the second samples on the participating node to the cluster center of the class cluster, and the obtaining of the encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster comprises:

repeating the encrypted cluster vector of the initial class cluster in the column direction according to the number of samples to construct a cluster matrix of the initial class cluster;

and multiplying the cluster matrix of the initial class cluster by the feature space matrix of the participating node to obtain a multiplication matrix, and subtracting the multiplication matrix from the encrypted feature space matrix to generate the encrypted difference matrix corresponding to the first iteration.

13. The method according to claim 11, wherein at a non-first iteration the encrypted target matrix is an encrypted update difference matrix between the cluster center of the previous iteration and the cluster center of the current iteration, and the obtaining of the encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster comprises:

repeating the encrypted cluster vector corresponding to the previous iteration and the encrypted cluster vector corresponding to the current iteration, respectively, in the column direction according to the number of samples to construct two encrypted cluster matrices;

multiplying each of the two encrypted cluster matrices by the feature space matrix of the participating node to obtain two multiplication matrices;

and subtracting the multiplication matrix corresponding to the previous iteration from the multiplication matrix corresponding to the current iteration to obtain the encrypted update difference matrix corresponding to the current iteration.

14. The method of claim 11, wherein, at the first iteration, the corresponding encrypted cluster vector is determined based on the serial number of the target first sample belonging to an initial class cluster.

15. The method according to any of claims 11-14, wherein, after receiving the encrypted cluster vector of each class cluster sent by the service node at each iteration, the method further comprises:

summing the elements of the encrypted cluster vector over all dimensions to obtain verification data;

and performing security verification on the service node based on the verification data.

16. The method of claim 15, wherein the performing security verification on the service node based on the verification data comprises:

randomly generating a first column vector and a second column vector, and encrypting the second column vector by using a public key;

performing affine transformation on the first column vector and the encrypted second column vector based on the verification data to generate an encrypted verification vector;

sending the encrypted verification vector to the service node, and receiving a decrypted verification vector returned by the service node;

and in response to the encrypted verification vector and the decrypted verification vector being consistent, determining that the service node passes security verification.
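The verification steps above can be sketched with a toy additively homomorphic scheme. Everything specific here is an assumption rather than something the claims state: the use of Paillier encryption with tiny demonstration primes, the fixed-point scale `SCALE`, the premise that an honest cluster vector sums to `SCALE`, the reading of the affine transformation as a * sum + b, and the reading of "consistent" as the decrypted vector equalling the value the participating node expects.

```python
import math
import random

# Toy Paillier parameters (assumption: far too small for real use).
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse, Python 3.8+

SCALE = 1000  # hypothetical fixed-point scale; honest weights sum to SCALE


def encrypt(m):
    """Encrypt a non-negative integer below N with generator g = N + 1."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Service-node side: recover the plaintext with the private key."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def he_add(c1, c2):
    """Ciphertext of the sum of two plaintexts."""
    return c1 * c2 % N2


def he_scale(c, k):
    """Ciphertext of k times the plaintext."""
    return pow(c, k, N2)


def make_challenge(enc_cluster_vector, length=4):
    """Participating-node side: build the encrypted verification vector."""
    # Verification data: homomorphic sum over all dimensions.
    enc_sum = enc_cluster_vector[0]
    for c in enc_cluster_vector[1:]:
        enc_sum = he_add(enc_sum, c)
    first = [random.randrange(1, 1000) for _ in range(length)]   # first column vector
    second = [random.randrange(1, 1000) for _ in range(length)]  # second column vector
    enc_second = [encrypt(b) for b in second]
    # Assumed affine transformation: a * sum + b, evaluated on ciphertexts.
    enc_check = [he_add(he_scale(enc_sum, a), eb)
                 for a, eb in zip(first, enc_second)]
    # Value the participating node expects if the vector sums to SCALE.
    expected = [a * SCALE + b for a, b in zip(first, second)]
    return enc_check, expected


def service_node_decrypt(enc_check):
    """Service-node side: decrypt and return the verification vector."""
    return [decrypt(c) for c in enc_check]


# Hypothetical honest cluster vector: two members with weight SCALE / 2 each.
honest = [encrypt(w) for w in (500, 500, 0)]
enc_check, expected = make_challenge(honest)
passed = service_node_decrypt(enc_check) == expected
```

In this sketch a cluster vector whose elements do not sum to `SCALE` produces a decrypted vector that differs from the expected one, so the check fails.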

17. The method according to any of claims 11-14, wherein, before receiving the encrypted cluster vector of each class cluster sent by the service node at each iteration, the method further comprises:

performing sample alignment with the service node based on identification information of the samples, and sequentially numbering the aligned second samples of the participating node;

and generating a feature space matrix of the participating node based on the aligned second samples.
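A minimal sketch of alignment, numbering, and matrix construction follows. The claims do not say how alignment is carried out; the plain set intersection used here is only a stand-in, and sorting the shared identifiers so that both parties derive the same numbering is likewise an assumption. The function name and the records are hypothetical.

```python
def align_and_number(local_records, peer_ids):
    """Return (numbering, feature_matrix) for samples known to both parties.

    local_records: dict mapping sample id -> list of local feature values.
    peer_ids: iterable of sample ids held by the other party.
    """
    # Stand-in for the alignment step: keep only ids present on both sides.
    shared = sorted(set(local_records) & set(peer_ids))
    # Sequential numbering of the aligned samples, starting at 0.
    numbering = {sample_id: index for index, sample_id in enumerate(shared)}
    # Row i of the feature space matrix belongs to the sample numbered i.
    feature_matrix = [list(local_records[sample_id]) for sample_id in shared]
    return numbering, feature_matrix


# Hypothetical participating-node records and service-node id list.
local_records = {"u3": [0.3, 1.0], "u1": [0.1, 2.0], "u7": [0.7, 3.0]}
peer_ids = ["u1", "u3", "u9"]
numbering, feature_matrix = align_and_number(local_records, peer_ids)
# numbering == {"u1": 0, "u3": 1}
# feature_matrix == [[0.1, 2.0], [0.3, 1.0]]
```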

18. The method of claim 11, further comprising:

receiving a public key sent by the service node.

19. A clustering apparatus, comprising:

a cluster vector generation module, configured to generate, for each class cluster, a cluster vector of the class cluster based on the serial numbers of target first samples belonging to the class cluster, encrypt the cluster vector, and send the encrypted cluster vector to a participating node, wherein the cluster vector represents the cluster center of the class cluster;

a difference matrix obtaining module, configured to obtain, for each class cluster, a first difference matrix from the first samples of the service node to the cluster center and a second difference matrix from the second samples of the participating node to the cluster center;

and an updating module, configured to update the cluster center according to the first difference matrix and the second difference matrix of each class cluster, re-cluster the first samples using the updated cluster center, take each class cluster obtained after re-clustering as a class cluster for the next iteration, and return to the above steps until the iteration ends, so as to generate final target class clusters.

20. The apparatus of claim 19, wherein the difference matrix obtaining module is further configured to:

21. The apparatus of claim 19 or 20, wherein the difference matrix obtaining module is further configured to:

for the first iteration, receive an encrypted second difference matrix corresponding to the first iteration sent by the participating node, and decrypt the encrypted second difference matrix to obtain the second difference matrix, wherein the encrypted second difference matrix is determined by the participating node based on its own encrypted feature space matrix and the encrypted initial cluster vector.

22. The apparatus of claim 19 or 20, wherein the difference matrix obtaining module is further configured to:

for a non-initial iteration, receive an encrypted update difference matrix corresponding to the current iteration sent by the participating node, wherein the encrypted update difference matrix represents the distance between the cluster center of the previous iteration and the cluster center of the current iteration, and is generated by the participating node based on its own feature space matrix, the encrypted cluster vector of the previous iteration, and the encrypted cluster vector of the current iteration;

23. The apparatus of claim 19, further comprising:

a cluster center selection module, configured to randomly select a set number of first samples at the initial iteration and take each of the selected first samples as an initial cluster center, wherein each initial cluster center corresponds to one initial class cluster;

a sample determining module, configured to determine the first sample serving as the initial cluster center as the target first sample of the initial class cluster;

wherein the cluster vector generation module is further configured to generate a cluster vector of the initial class cluster according to the serial number of the target first sample of the initial class cluster.

24. The apparatus of claim 19 or 23, wherein the cluster vector generation module further comprises:

an encoding unit, configured to determine the position of each target first sample in the cluster vector according to its serial number, encode the vector element at that position as a first code value, and encode the vector elements at the remaining positions as a second code value, wherein the vector elements at the remaining positions correspond to first samples that do not belong to the class cluster.
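The encoding described above can be sketched as follows. The concrete code values are assumptions: the sketch defaults the first code value to 1/m, where m is the number of member samples, and the second code value to 0, so that the vector acts as an averaging weight over the members. The function name and its parameters are hypothetical.

```python
def encode_cluster_vector(member_numbers, n_samples,
                          first_value=None, second_value=0.0):
    """Build a cluster vector from the serial numbers of the member samples."""
    if not member_numbers:
        raise ValueError("a class cluster needs at least one member sample")
    if first_value is None:
        # Assumed default: equal weights that sum to 1 over the members.
        first_value = 1.0 / len(member_numbers)
    # Positions of non-member samples carry the second code value.
    vector = [second_value] * n_samples
    for number in member_numbers:
        # The serial number of a sample is its position in the vector.
        vector[number] = first_value
    return vector


# Hypothetical cluster containing samples 0 and 3 out of 5 aligned samples.
vector = encode_cluster_vector([0, 3], 5)
# vector == [0.5, 0.0, 0.0, 0.5, 0.0]
```

With a single member, as for an initial class cluster built around one initial cluster center, the same function yields a one-hot vector.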

25. The apparatus of claim 24, wherein the encoding unit is further configured to:

26. The apparatus of claim 19, further comprising:

a verification module, configured to receive an encrypted verification vector sent by the participating node, decrypt the encrypted verification vector based on a private key to obtain a decrypted verification vector, and send the decrypted verification vector to the participating node for security verification, wherein the encrypted verification vector is generated according to the encrypted cluster vector.

27. The apparatus of claim 19, further comprising:

a numbering module, configured to perform sample alignment with the participating node based on identification information of the samples, and sequentially number the aligned first samples;

and a matrix generation module, configured to generate a feature space matrix of the service node based on the aligned first samples.

28. The apparatus of claim 19, further comprising:

a key generation module, configured to generate an encryption key comprising a public key and a private key, and send the public key to the participating node.

29. A clustering apparatus, comprising:

a cluster vector receiving module, configured to receive, at each iteration, an encrypted cluster vector of each class cluster sent by a service node, wherein the cluster vector is determined by the service node based on the serial numbers of target first samples belonging to the class cluster, and the target first samples are among the first samples of the service node;

and a target matrix generation module, configured to obtain, for each class cluster, an encrypted target matrix corresponding to the class cluster based on the encrypted cluster vector of the class cluster, and send the encrypted target matrix to the service node, until the iteration ends.

30. The apparatus of claim 29, wherein the target matrix generation module is further configured to:

at the initial iteration, repeat the encrypted cluster vector of the initial class cluster in the column direction according to the number of samples to construct a cluster matrix of the initial class cluster;

31. The apparatus of claim 29, wherein the target matrix generation module is further configured to:

at a non-initial iteration, repeat the encrypted cluster vector corresponding to the previous iteration and the encrypted cluster vector corresponding to the current iteration in the column direction, respectively, according to the number of samples, to construct two encrypted cluster matrices;

32. The apparatus of claim 29, wherein, at the first iteration, the corresponding encrypted cluster vector is determined based on the serial number of the target first sample belonging to an initial class cluster.

33. The apparatus of any one of claims 29-32, further comprising:

a security verification module, configured to sum the elements of the encrypted cluster vector over all dimensions to obtain verification data, and perform security verification on the service node based on the verification data.

34. The apparatus of claim 33, wherein the security verification module comprises:

a column vector generating unit, configured to randomly generate a first column vector and a second column vector, and encrypt the second column vector using a public key;

a verification vector generation unit, configured to perform affine transformation on the first column vector and the encrypted second column vector based on the verification data to generate an encrypted verification vector;

a decryption unit, configured to send the encrypted verification vector to the service node and receive a decrypted verification vector returned by the service node;

and a verification unit, configured to determine, in response to the encrypted verification vector and the decrypted verification vector being consistent, that the service node passes the security verification.

35. The apparatus of any one of claims 29-32, further comprising:

a numbering module, configured to perform sample alignment with the service node based on identification information of the samples, and sequentially number the aligned second samples of the participating node;

and a matrix generation module, configured to generate a feature space matrix of the participating node based on the aligned second samples.

36. The apparatus of claim 29, further comprising:

a receiving module, configured to receive the public key sent by the service node.

37. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-18.

38. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-18.

39. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-18.