CN114494753A - Clustering method, clustering device, electronic equipment and computer-readable storage medium - Google Patents


Publication number
CN114494753A
CN114494753A (application CN202011271335.9A)
Authority
CN
China
Prior art keywords: node, clustering, feature, subgraph, nodes
Prior art date
Legal status
Pending
Application number
CN202011271335.9A
Other languages
Chinese (zh)
Inventor
张达峰
高艳君
戴彬
姜映映
金哲洙
文伟
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to CN202011271335.9A priority Critical patent/CN114494753A/en
Publication of CN114494753A publication Critical patent/CN114494753A/en
Pending legal-status Critical Current

Classifications

    • G06F18/23213 Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G06F18/22 Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/2323 Pattern recognition; clustering techniques; non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G06N3/045 Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/084 Neural networks; learning methods; backpropagation, e.g. using gradient descent

Abstract

The application provides a clustering method, a clustering apparatus, an electronic device and a computer-readable storage medium, relating to the fields of computer technology, artificial intelligence, communication and the like. The method comprises the following steps: acquiring a feature subgraph corresponding to target object features to be clustered; determining the attention weights respectively corresponding to the neighbor nodes of the feature subgraph, and performing feature extraction on the feature subgraph based on the attention weights to obtain a feature extraction result; and clustering the target object features according to the feature extraction result. By learning context relationships through an attention mechanism and assigning different weights to neighbor nodes of different relevance, the method can learn both the global features and the local features of the feature subgraph, effectively reduces the influence of noise nodes on the clustering result, enhances the completeness of the node feature representation, and improves the robustness of the algorithm.

Description

Clustering method, clustering device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a clustering method, a clustering device, an electronic device, and a computer-readable storage medium.
Background
A clustering algorithm is a process that aggregates individuals sharing the same physical or abstract properties into the same class by means of computer, analysis and data-processing techniques; each class generated by a clustering algorithm consists of individuals with similar abstract properties.
Taking face clustering as an example, a face clustering algorithm groups individuals having the same identity together. Face clustering is one of the important methods for face recognition and unlabeled data management, and is widely applied in fields such as photo album management and face recognition. For example, when an album contains photos of multiple persons, operations such as photo management (deletion, editing, sharing, and the like) and quickly searching for the photos of a specified person take a lot of time. If photos with the same identity are grouped together by a face clustering algorithm, the user's search time can be greatly reduced.
Current clustering generally adopts traditional algorithms, such as the similarity-metric-based K-means algorithm, which still need to be optimized.
Disclosure of Invention
In order to overcome the above technical problems or at least partially solve the above technical problems, the present application specifically proposes the following technical solutions:
in a first aspect, the present application provides a clustering method, including:
acquiring a feature subgraph corresponding to target object features to be clustered;
determining the attention weights respectively corresponding to the neighbor nodes of the feature subgraph, and performing feature extraction on the feature subgraph based on the attention weights to obtain a feature extraction result;
and clustering the target object features according to the feature extraction result.
In a second aspect, the present application provides a clustering apparatus, the apparatus comprising:
the acquisition module is used for acquiring a feature subgraph corresponding to the target object features to be clustered;
the determining and extracting module is used for determining the attention weights respectively corresponding to the neighbor nodes of the feature subgraph, and performing feature extraction on the feature subgraph based on the attention weights to obtain a feature extraction result;
and the clustering module is used for clustering the target object features according to the feature extraction result.
In a third aspect, the present application provides an electronic device comprising: a processor and a memory storing at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method as set forth in the first aspect of the application.
In a fourth aspect, the present application provides a computer readable storage medium for storing a computer instruction, program, code set or instruction set which, when run on a computer, causes the computer to perform a method as set forth in the first aspect of the present application.
According to the clustering method, the clustering apparatus, the electronic device and the computer-readable storage medium of the present application, context relationships are learned through an attention mechanism and different weights are given to neighbor nodes of different relevance; both the global features and the local features of the feature subgraph can be learned, the influence of noise nodes on the clustering result is effectively reduced, the completeness of the node feature representation is enhanced, and the robustness of the algorithm is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic diagram of a graph G provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of the adjacency matrix of graph G provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the degree matrix D of graph G provided in an embodiment of the present application;
fig. 4 is a schematic view of a workflow of a LearnClus clustering algorithm provided in an embodiment of the present application;
fig. 5 is a schematic flowchart of a LearnClus clustering algorithm provided in an embodiment of the present application;
FIG. 6 is a schematic workflow diagram of a linkage-gcn clustering algorithm provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart of a linkage-gcn clustering algorithm provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of a clustering algorithm provided in an embodiment of the present application;
fig. 9 is a schematic diagram of a module for extracting features of data to be clustered according to an embodiment of the present application;
fig. 10 is a schematic flowchart of subgraph construction provided in the embodiment of the present application;
fig. 11 is a schematic structural diagram of a CAGCN module according to an embodiment of the present application;
fig. 12 is a schematic diagram of GCN forward propagation and node aggregation provided in an embodiment of the present application;
FIG. 13 is a diagram illustrating weighted graph convolution forward propagation provided by an embodiment of the present application;
FIG. 14 is a diagram of a neural network model framework provided by an embodiment of the present application;
FIG. 15 is a simplified flow chart of an algorithm provided in an embodiment of the present application;
FIG. 16 is a schematic diagram of a post-processing algorithm provided by an embodiment of the present application;
FIG. 17 is a diagram illustrating an adjacency matrix of a subclass according to an embodiment of the present application;
FIG. 18 is a schematic diagram of a degree matrix of subclasses provided in an embodiment of the present application;
fig. 19 is a flowchart illustrating a clustering algorithm according to an embodiment of the present application;
FIG. 20 is a flow diagram illustrating cluster append logic according to an embodiment of the present application;
fig. 21 is a schematic clustering diagram provided in an embodiment of the present application;
FIG. 22 is a schematic diagram of BCubed F-score clustering provided in an embodiment of the present application;
fig. 23a is a first schematic diagram of a visualization result provided by an embodiment of the present application;
fig. 23b is a second schematic diagram of a visualization result provided by the embodiment of the present application;
fig. 23c is a third schematic diagram of a visualization result provided by the embodiment of the present application;
fig. 23d is a fourth schematic diagram of a visualization result provided by the embodiment of the present application;
fig. 24 is a schematic structural diagram of a clustering device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
(1) definition of the figure
The term "graph" in this application does not refer to an ordinary image but to the mathematical concept of a graph.
Specifically, a graph G = (V, E) is defined, where V denotes the set of nodes and E denotes the set of edges. Each node in the graph is represented by a vector of dimension D. For ease of calculation, the structure of a graph is typically represented using its adjacency matrix. As shown in FIG. 1, the structure can be represented by a graph G, with V representing the node set {0, 1, 2, 3, 4, 5} and E representing the edge set {a, b, c, e, f, g, h}; the number of nodes N is 6, and A is the vector representation of node 4 (the vector representation of A in the figure is merely an example), with dimension D = 4.
(2) Adjacency matrix
The adjacency matrix of a graph is often used in graph theory to represent the relationships between the nodes of the graph. As an example, fig. 2 is the adjacency matrix of graph G (an unweighted adjacency matrix); in fig. 2, 1 indicates that a connection exists between two nodes, and 0 indicates that there is no connection between two nodes.
(3) Degree matrix
The degree of a node in graph G indicates how many edges are incident to the node. As an example, fig. 3 is the degree matrix of graph G: node 0 has a degree of 3, indicating that it has 3 neighboring nodes {1, 2, 5}, and so on. The degree matrix has non-zero values only on its diagonal, where each diagonal entry is the degree of the corresponding node; all other positions are 0.
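As a concrete illustration of these two matrices, the following Python sketch builds the unweighted adjacency matrix and the degree matrix for a small undirected graph. The edge list is an assumption chosen only so that node 0 has the three neighbors {1, 2, 5} as in the example above; it is not the exact edge set of FIG. 1.

import numpy as np

# Illustrative undirected graph with 6 nodes; the edge list is an
# assumption chosen so that node 0 has the 3 neighbors {1, 2, 5}.
num_nodes = 6
edges = [(0, 1), (0, 2), (0, 5), (1, 2), (2, 3), (3, 4), (4, 5)]

# Unweighted adjacency matrix A: A[i, j] = 1 iff nodes i and j are connected.
A = np.zeros((num_nodes, num_nodes), dtype=np.float32)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Degree matrix D: diagonal matrix whose entry D[i, i] is the degree of node i.
D = np.diag(A.sum(axis=1))

print(A)
print(D)   # D[0, 0] == 3, matching the example (neighbors {1, 2, 5})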
(4) Graph convolution
The convolution operation on a graph is essentially an operation that represents each node in the graph by its own features together with the features of its neighbor nodes; as graph convolution proceeds, the differences between the features of nodes that are neighbors become smaller and smaller, while the differences between unrelated nodes become larger and larger, as shown in formula (1):

$X^{(l+1)} = \sigma(A X^{(l)} W^{(l)} + b^{(l)})$    Formula (1)

where $X^{(l)}$ denotes the node features of the l-th layer; $\sigma$ denotes a nonlinear transformation function; $A$ denotes the adjacency matrix; $W^{(l)}$ denotes the weights of the l-th layer; and $b^{(l)}$ denotes the bias of the l-th layer.
In practical applications, a network such as a GCN (Graph Convolutional Network) may be used to perform the graph convolution operation, and the GCN itself has a clustering effect.
As the GCN deepens, the feature values of the nodes become larger and larger, and gradients vanish or explode during backpropagation, which severely affects the effectiveness of the algorithm. Therefore, in practical applications the features are usually normalized, which does not change the characteristics of the GCN. Formula (2) is the normalized GCN formula:
$X^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X^{(l)} W^{(l)} + b^{(l)}\big)$    Formula (2)

where $X^{(l)}$ denotes the node features of the l-th layer; $\sigma$ denotes a nonlinear transformation function; $A$ denotes the adjacency matrix; $\tilde{A} = A + I$ adds self-loops; $\tilde{D}$ denotes the degree matrix corresponding to $\tilde{A}$; $W^{(l)}$ denotes the weights of the l-th layer; and $b^{(l)}$ denotes the bias of the l-th layer.
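For illustration, here is a minimal numpy sketch of one normalized graph-convolution layer per formula (2); the layer sizes, the random adjacency and the choice of ReLU for σ are assumptions for the example only.

import numpy as np

def gcn_layer(A, X, W, b):
    """One normalized GCN layer per formula (2):
    X' = sigma(D~^{-1/2} (A + I) D~^{-1/2} X W + b)."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # degrees of A~ (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))        # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ X @ W + b, 0.0)     # ReLU as the nonlinearity

# Example: 6 nodes with 4-dim features, one layer mapping 4 -> 8 dims.
rng = np.random.default_rng(0)
A = (rng.random((6, 6)) > 0.5).astype(np.float32)
A = np.triu(A, 1); A = A + A.T                    # symmetric, no self-loops
X = rng.normal(size=(6, 4)).astype(np.float32)
W = rng.normal(size=(4, 8)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
print(gcn_layer(A, X, W, b).shape)                # (6, 8)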
In the following, taking the target object as a face as an example, the existing clustering algorithm and the improvement direction of the present application are briefly introduced.
Face clustering enables specific functions, such as album management, to be realized better, and much excellent work has been done on it. Currently available clustering algorithms include the traditional ones: similarity-metric-based clustering algorithms, such as K-means, and density-based clustering algorithms, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
However, the complexity and effectiveness of traditional clustering algorithms are greatly influenced by the distribution and size of the data set. For example, K-means is an iterative cluster-analysis algorithm: a suitable K value and initial cluster centers must be selected through continuous iteration, so the algorithm has high complexity and low efficiency. Spectral clustering is also a commonly used algorithm, but it is strongly affected by the data distribution: it works well when the instances within each class are distributed uniformly, whereas the data distribution in real scenes is extremely disordered and non-uniform. Hierarchical clustering algorithms do not require the number of clusters to be predefined and perform very well on low-dimensional data sets, but generally cannot handle the enormous challenges of high-dimensional spaces.
It can be seen that traditional clustering algorithms perform acceptably on low-dimensional, low-complexity data sets, but perform very poorly on complex and large-scale data sets, especially when the data contains noise.
In order to have better performance on a large-scale data set with higher dimensionality, the embodiment of the application adopts a deep learning clustering algorithm to improve the clustering performance.
The graph convolutional neural network (GCN) has good modeling capability for graph-structured data of high complexity and high dimensionality. GCNs can be applied to many tasks with very good performance, for example: graph classification and node classification tasks; human key-point detection, where the GCN improves performance; and semantic segmentation, where the GCN improves segmentation quality. In the embodiment of the present application, each target object feature can be used as a node of a graph, and a GCN is used to perform the clustering task on a complex data set.
Continuing with the example in which the target object is a face, the clustering problem can be converted into a detection and segmentation problem on the graph. In a high-dimensional space, nodes of the same class have similar density; the range of each class of nodes is detected, and then the outliers and noise within each class are removed. Nodes that fall within the same range have the same identity. Alternatively, the clustering problem can be converted into a connection prediction problem: the GCN predicts whether a connection exists between two nodes, thereby judging whether two faces have the same identity; if the two nodes are connected, they have the same identity.
As an example, the LearnClus clustering algorithm converts the clustering problem into a detection and segmentation problem on a graph. As shown in fig. 4, this method mainly consists of three modules: a cluster proposal generator, a GCN-D (detection) detection network and a GCN-S (segmentation) segmentation network. First, an affinity graph is constructed according to the cosine distances between face features, and a large number of cluster boxes are generated on the graph; the nodes within each box can be considered to belong to the same class. Second, GCN-D predicts for each cluster box whether it is a complete subclass. Finally, GCN-S removes the outliers and noise nodes in each predicted cluster box (corresponding to the predicted outliers in the figure), completing the clustering. This method not only achieves high accuracy on the face clustering task but can also effectively improve face recognition performance. As shown in fig. 5, the specific process is as follows:
s101: extracting the characteristics of each aligned face photo by using a trained deep learning model, and constructing a characteristic set D by using the extracted characteristics of the data to be clustered;
s102: regarding the characteristics of each face as nodes, and calculating cosine distances (cosine distances) between every two nodes;
s103: searching for the K nodes nearest to each node as its neighbor nodes and connecting them to construct an affinity graph G, where G is represented by an adjacency matrix A: if a connection exists between two nodes, the corresponding entry of A is their cosine distance, and 0 otherwise;
s104: since constructing a cluster box for each node would produce a large number of repeated boxes, super-nodes are constructed to reduce repeated computation: connections whose cosine distance is below a set threshold T are removed from the affinity graph G, and node sets whose number of connections is smaller than a specified number are selected to construct smaller subgraphs, each such subgraph being taken as a super-node (a sketch of this step follows the list below);
s105: each sub-graph that constitutes a super-node is treated as an independent class, which constitutes a cluster box.
S106: taking the average value of the node characteristics in each super-node sub-graph as the characteristics of the super-node, repeating the steps S102-S105, and constructing the generated clustering frame into a clustering frame set;
s107: predicting the IoU and IoP values of each cluster box through GCN-D, as shown in FIG. 4, where IoU measures the overlap between the nodes in the cluster box and the ground-truth nodes (a larger IoU means the cluster box contains more accurate nodes) and IoP measures purity (a larger IoP means a purer class);
s108: screening the cluster boxes with large IoU and IoP as clustering results based on step S107; since noise nodes may still exist in each cluster, removing the noise nodes in each high-IoP cluster box through GCN-S;
s109: some classes in the cluster boxes may be subclasses of a certain larger class; duplicate subclasses are screened out through a deduplication algorithm, and the clustering is completed.
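For concreteness, the following sketch illustrates the super-node construction of step S104 under stated assumptions: edges whose affinity is below the threshold T are dropped, and every sufficiently small connected component of the pruned graph becomes a super-node. The size cap, the BFS implementation and the random affinity matrix are illustrative, not taken from the patent.

import numpy as np
from collections import deque

def build_supernodes(A, T=0.7, max_size=50):
    """S104 sketch: drop edges with affinity below T, then treat each
    connected component no larger than max_size as a super-node."""
    n = A.shape[0]
    keep = A >= T                       # prune weak connections
    seen = np.zeros(n, dtype=bool)
    supernodes = []
    for s in range(n):
        if seen[s]:
            continue
        comp, q = [], deque([s])        # BFS over the pruned graph
        seen[s] = True
        while q:
            u = q.popleft()
            comp.append(u)
            for v in np.nonzero(keep[u])[0]:
                if not seen[v]:
                    seen[v] = True
                    q.append(v)
        if len(comp) <= max_size:       # small components become super-nodes
            supernodes.append(comp)
    return supernodes

# Affinity matrix with cosine-distance weights on existing edges, 0 otherwise.
rng = np.random.default_rng(0)
A = rng.random((20, 20)).astype(np.float32)
A = np.triu(A, 1); A = A + A.T
print(len(build_supernodes(A)))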
As can be seen from the workflow shown in fig. 5, the algorithm has many complex steps, and many intermediate results need to be saved, which consumes considerable memory resources.
On this basis, the embodiment of the present application adopts an algorithm that converts the clustering problem into a connection prediction problem, performing inference on subgraphs constructed with each node as the central node. This effectively amortizes the time complexity and avoids constructing larger subgraphs, making the algorithm more efficient. Compared with traditional algorithms, it does not need to determine cluster centers and is more robust on complex data sets.
Face clustering is a very challenging task affected by many factors. For example: the face features of a person with the same identity can differ greatly under different poses, under different illumination and at different moments; the size, quality and shape of the photos also affect the clustering result; and data sets often contain noise and outliers. K-means and spectral clustering algorithms not only require the number of clusters to be specified manually, resulting in low efficiency, but also have difficulty handling the above factors. The prior art proposes an algorithm that replaces the purely absolute distance with a ranking-based distance; it exceeds the performance of traditional algorithms and performs particularly well when the data are unevenly distributed and noisy, but its computational complexity is very high at large data scales.
The ranking-based clustering algorithm judges whether a connection exists between two nodes from the information of their neighbor nodes. The existing linkage-gcn clustering algorithm predicts whether two nodes should be connected by learning the context relationships between nodes. As shown in fig. 6, the original features of the face image data are extracted through a convolutional neural network (CNN) (fig. 6 takes 3 features, ID-1, ID-2 and ID-3, as an example); each node is taken as a central node and connected with its neighbor nodes to construct a subgraph; then the context relationships between the nodes in the subgraph are learned through the GCN to predict whether two nodes should be connected. If two nodes have the same identity, they are connected, and the clustering result is thus obtained. As shown in fig. 7, the specific process is as follows:
s201: extracting the characteristics of each aligned face photo by using a trained deep learning model, and constructing a characteristic set D by using the extracted characteristics of the data to be clustered, wherein the characteristics of each face are 512-dimensional vectors;
s202: taking the characteristics of each face as nodes, and calculating the cosine distance between every two nodes;
s203: taking each node as a central node, and constructing a subgraph according to cosine distances and K neighbor nodes of the node;
s204: predicting the probability of connection between the central node and the neighbor nodes of each subgraph through GCN to form each connection pair;
s205: and traversing each connection pair, and if the probability of the connection between the two nodes is greater than a set threshold value, clustering the two nodes into one class.
However, the above method treats every neighbor node as equal: if noise nodes exist, their influence is hard to remove. Moreover, the global features learned by the GCN-based algorithm become excessively smooth as the number of network layers increases, so the features of associated nodes grow closer and closer, and clustering errors occur when noise nodes or outliers are present.
Based on this, the embodiment of the present application provides a clustering method that also converts the clustering problem into a connection prediction problem, but does not suffer from the over-smoothing phenomenon and achieves higher accuracy than existing clustering algorithms.
Specifically, as shown in fig. 8, the clustering method provided in the embodiment of the present application includes:
step S301: acquiring a feature subgraph corresponding to the target object features to be clustered;
in the embodiment of the application, the feature subgraph is constructed, with the target object feature as the central node, from the central node and its at least one corresponding neighbor node; the feature subgraph may also simply be called a subgraph.
The target object features to be clustered are features extracted from the target objects to be clustered; each target object feature can be processed according to the method introduced in this scheme, and the identical processing of each feature is not repeated herein.
In practical applications, the target object includes but is not limited to a human face, an animal, an object, a scene, text, voice, a language, a network terminal, a smart device, and the like. Taking the target object as a face as an example, this step acquires the feature subgraph corresponding to the face image features to be clustered.
In the embodiment of the present application, each target object feature may also be referred to as a node (or a feature node), and for convenience of description, the target object feature may also be referred to as a feature or a node feature hereinafter.
The feature subgraph is a subgraph constructed from the corresponding target object feature as the central node together with its neighbor nodes, and contains the context relationships between the neighbor nodes.
Step S302: determining the attention weights respectively corresponding to the neighbor nodes of the feature subgraph, and performing feature extraction on the feature subgraph based on the attention weights to obtain a feature extraction result;
in the embodiment of the application, different weights are assigned to the neighbor nodes through attention-mechanism learning: if a neighbor node contributes little to the central node, it is assigned a smaller weight, which effectively reduces the influence of noise nodes. Meanwhile, through attention learning, only the influence of specific neighbor nodes of the central node on the central node can be attended to, so the local information of the subgraph can be learned effectively.
And performing feature extraction on the feature subgraph based on each attention weight, so that the global features and the local features can be extracted.
Step S303: and clustering the target object characteristics according to the characteristic extraction result.
When the extracted feature extraction result is used for clustering the features of the target object, clustering errors caused by noise nodes can be reduced, the feature expression of sub-graph nodes can be enhanced, and the clustering effect is improved.
Therefore, according to the clustering method provided by the embodiment of the application, the context relationships are learned through the attention mechanism and different weights are given to neighbor nodes of different relevance; both the global features and the local features of the feature subgraph can be learned, the influence of noise nodes on the clustering result is effectively reduced, the completeness of the node feature representation is enhanced, and the robustness of the algorithm is improved.
In the embodiment of the application, the target object features to be clustered may be extracted in advance or may be extracted in real time. If the features of the data to be clustered are extracted in real time (i.e., the features of the target object to be clustered are extracted), target object detection and/or target object alignment can be performed before the features of the target object are extracted.
Taking face clustering as an example, as shown in fig. 9, features are extracted step by step through a face detection submodule, a face alignment submodule and a face feature extraction submodule. Face detection is an essential step in face clustering: in pictures of real scenes, the face region occupies only a small proportion of the whole picture, and if features were extracted from the original picture, the background information would greatly interfere with the features and in turn greatly affect the clustering accuracy. Therefore, face detection is performed as a preliminary step: the position of the face in the image is detected first and the face is cropped out, so that feature extraction is subsequently performed only on the face region, avoiding interference from the background and other irrelevant information.
The cropped face images may be in various poses, so the face alignment submodule can be used to rectify and align tilted faces.
The feature extraction submodule then performs feature extraction on the aligned and rectified face images, i.e., each image is represented by a D-dimensional vector.
In the embodiment of the application, the feature subgraph can be constructed in advance or in real time. Subgraph construction is one of the core parts of this application: because the subgraph contains the context information of the nodes, predictions can be made on the subgraph by learning this context information. When the feature subgraph is constructed, the neighbor nodes of each feature node are found through the similarity between feature nodes.
In one possible embodiment, the similarity between feature nodes may be measured by the cosine distance: the larger the cosine distance, the higher the similarity of the two nodes. Whereas the Euclidean distance represents the absolute difference between features in terms of the magnitude of their dimension values, the cosine distance distinguishes differences between features in terms of direction. For example, in face clustering, the face features of the same person may differ greatly under different poses, under different illumination and at different moments while still representing the same person; with the Euclidean distance they could be clustered as two persons. Therefore, using the cosine distance in the embodiment of the application can effectively improve the clustering accuracy.
Formula (3) gives the cosine similarity, where $X$ denotes the feature vector of face A, $Y$ denotes the feature vector of face B, and $\mathrm{sim}(X, Y)$ denotes the similarity between face A and face B:

$\mathrm{sim}(X, Y) = \dfrac{X \cdot Y}{\|X\| \, \|Y\|}$    Formula (3)
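Below is a small numpy sketch of this similarity measure and of selecting the K most similar neighbors for each node; the helper names and the value of K are illustrative assumptions, not from the patent.

import numpy as np

def cosine_sim(X, Y):
    """Formula (3): sim(X, Y) = X.Y / (||X|| * ||Y||)."""
    return float(X @ Y / (np.linalg.norm(X) * np.linalg.norm(Y)))

def knn_by_cosine(feats, K):
    """For each feature node, return the indices of its K most similar nodes."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T                 # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)          # exclude the node itself
    return np.argsort(-sim, axis=1)[:, :K]  # descending similarity

feats = np.random.default_rng(0).normal(size=(10, 512)).astype(np.float32)  # 512-dim face features
print(knn_by_cosine(feats, K=3))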
In the embodiment of the application, with the target object feature as the central node, the feature subgraph is constructed from the central node and its at least two corresponding neighbor nodes, where the neighbor nodes may be first-order only or multi-order.
In a possible implementation manner, each feature subgraph is constructed by taking each target object feature to be clustered as a central node and multiple-order neighbor nodes thereof, so that the feature subgraph can fully contain the context relationship between the neighbor nodes. The inventor of the application finds that the connection prediction by using the subgraph formed by the multi-order neighbor nodes is very effective.
The first-order neighbor nodes are neighbor nodes searched according to the similarity with the central node, the second-order nodes are neighbor nodes searched according to the similarity with the first-order neighbor nodes, and the like.
Specifically, each target object feature f to be clustered, taken as a central node, constructs a feature subgraph together with its multi-order neighbor nodes. First, the first-order k1 nearest neighbor nodes of the instance node f are found, where $N_f$ denotes the set of all neighbor node features and $V_f$ denotes the neighbor nodes. Second, all the neighbor node features are normalized. Finally, the r nearest neighbor nodes of each node are selected and connected to construct the feature subgraph $G_f(V_f, E_f)$, where $G_f$ does not include the central node f. In this embodiment, the adjacency matrix $A_f$ is used to represent $G_f$, and $E_f$ is the set of edges of $G_f$.
In this embodiment of the application, the step of constructing the feature subgraph in real time may be before step S301, and specifically, the following steps may be included:
taking the target object feature as the central node, determining at least one first-order neighbor node of the central node among the other target object features;
determining, for each first-order neighbor node in turn, at least one neighbor node with the largest cosine distance to that first-order neighbor node as a next-order neighbor node;
and constructing the feature subgraph from the central node and the determined neighbor nodes of at least two orders.
The step of determining at least one first-order neighbor node of the central node may include:
determining, as the first-order neighbor nodes of the central node, at least one neighbor node with the largest cosine distance to the central node, at least one neighbor node with the smallest cosine distance to the central node, and at least one other neighbor node at an arbitrary intermediate cosine distance.
Taking the example of constructing each feature subgraph according to each feature node and its two-stage neighbor nodes together, as shown in fig. 10, the algorithm flow for constructing the feature subgraph may include:
s1101: calculating cosine distances between all nodes;
s1102: taking each target object feature as a central node P, and finding the m1 neighbor nodes with the largest cosine distance, the m2 neighbor nodes with the smallest cosine distance, and m3 neighbor nodes at random intermediate cosine distances; the number of (first-order) neighbor nodes of P is k1 = m1 + m2 + m3;
s1103: for each neighbor node of P, finding the n nodes with the largest cosine distance to it (i.e., second-order neighbor nodes), so that k2 = k1 × n.
Through the above steps, the number of first-order neighbor nodes of the central node P is k1 and the number of second-order neighbor nodes is k2, so the theoretical total number of neighbor nodes of P is mp = k1 + k2; in practice, nodes may share the same neighbors, so the actual total number of neighbor nodes of P is less than or equal to mp;
s1104: selecting k neighbor nodes for each central node P and connecting them to construct the feature subgraph, representing the feature subgraph by an adjacency matrix A, and computing the degree matrix D of the feature subgraph;
for each target object image feature to be clustered, the corresponding feature subgraph can be constructed according to the above steps, i.e., steps S1102 to S1104 are repeated until subgraph construction has been completed with every node as the central node (a sketch of this procedure follows).
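Under stated assumptions, the following Python sketch follows this two-order construction; the m1/m2/m3 split sizes, the value of n and the final top-k wiring rule are illustrative choices, as the patent leaves their concrete values open.

import numpy as np

def build_subgraph(feats, p, m1=4, m2=2, m3=2, n=2, k=8):
    """Two-order feature-subgraph construction around central node p (S1102-S1104).

    Returns (node_ids, A, D): the selected neighbor nodes, the adjacency
    matrix of the subgraph, and its degree matrix.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)

    # S1102: first-order neighbors = m1 most similar + m2 least similar
    #        + m3 random intermediate ones (k1 = m1 + m2 + m3).
    order = np.argsort(-sim[p])
    order = order[order != p]
    rng = np.random.default_rng(0)
    mid = order[m1:len(order) - m2]
    first = np.concatenate([order[:m1], order[-m2:],
                            rng.choice(mid, size=m3, replace=False)])

    # S1103: for each first-order neighbor, its n most similar nodes
    #        become second-order neighbors (k2 = k1 * n).
    second = np.concatenate([np.argsort(-sim[q])[:n] for q in first])

    # Duplicates may occur, so the actual neighbor count is <= k1 + k2.
    node_ids = np.unique(np.concatenate([first, second]))
    node_ids = node_ids[node_ids != p]

    # S1104: connect each subgraph node to its k most similar subgraph nodes.
    sub = sim[np.ix_(node_ids, node_ids)]
    A = np.zeros_like(sub)
    kk = min(k, len(node_ids) - 1)
    for i in range(len(node_ids)):
        A[i, np.argsort(-sub[i])[:kk]] = 1.0
    A = np.maximum(A, A.T)          # make the adjacency symmetric
    D = np.diag(A.sum(axis=1))      # degree matrix of the subgraph
    return node_ids, A, D

feats = np.random.default_rng(1).normal(size=(100, 512)).astype(np.float32)
ids, A, D = build_subgraph(feats, p=0)
print(len(ids), A.shape, D.shape)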
Further, in step S303, the connection probability between the central node of the feature subgraph and each of its corresponding first-order neighbor nodes is determined according to the feature extraction result, and the central node and those neighbor nodes whose connection probability is greater than a third threshold are grouped into one class. In practical applications, a person skilled in the art can set the value of the third threshold according to the actual situation, which is not limited herein in the embodiment of the present application.
Specifically, the probability that a connection exists between the central node and each of its first-order neighbor nodes can be predicted by a fully connected neural network. If the connection probability between two nodes is greater than the third threshold, the two nodes belong to the same class. All connection pairs output by the neural network prediction are traversed, and any two nodes whose connection probability is greater than the third threshold are merged into one class.
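Because merging all high-probability pairs is transitive, a disjoint-set (union-find) structure is a natural way to realize this traversal; the sketch below is an illustrative implementation of that idea, not code from the patent.

def cluster_pairs(num_nodes, pairs, threshold=0.5):
    """Union-find merge: pairs is a list of (i, j, prob) connection pairs.

    Nodes i and j are merged into one class whenever prob > threshold;
    returns a cluster label per node.
    """
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j, prob in pairs:
        if prob > threshold:                # the "third threshold"
            parent[find(i)] = find(j)

    roots = {}
    labels = []
    for x in range(num_nodes):
        r = find(x)
        labels.append(roots.setdefault(r, len(roots)))
    return labels

# Example: 5 nodes; the pairs (0,1) and (1,2) exceed the threshold.
print(cluster_pairs(5, [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.3)]))  # [0, 0, 0, 1, 2]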
In the embodiment of the present application, a feasible implementation manner is provided for step S302, and specifically, the implementation manner includes the steps of:
step S3021: determining attention weights corresponding to all neighbor nodes of the feature subgraph respectively through an attention module, and carrying out weighting processing on the feature subgraph through all the attention weights to obtain a first feature;
step S3022: performing feature extraction on the feature subgraph through at least one graph convolution network layer to obtain a second feature;
step S3023: and fusing the first characteristic and the second characteristic to obtain a characteristic extraction result.
In this embodiment of the present application, the above steps may be executed by a network module as shown in fig. 11. This network module may be referred to as a Context Attention Graph Convolutional Network module (CAGCN); it can extract not only the global features of a subgraph but also its local features, and it enhances the expressive power of the features through the fusion of global and local features, thereby improving the clustering accuracy.
The CAGCN module mainly consists of a module that learns GLOBAL features (the at least one graph convolutional network of step S3022) and a module that learns LOCAL features (the attention module of step S3021), called GCN-GLOBAL and GCN-LOCAL respectively.
W in fig. 11 may represent the weighting process of GCN-LOCAL, e.g., matrix multiplication of the learned attention adjacency matrix with the feature matrix; its main purpose is to assign different aggregation weights (attention weights) to the neighbor nodes according to the different relevance of each neighbor node when they represent the features of the central node.
Data such as digital images, voice and text are Euclidean-space data: they usually have fixed dimensions, and feature extraction can be performed with neural networks such as CNNs or Recurrent Neural Networks (RNNs).
Unstructured data such as social networks, by contrast, do not have fixed dimensions, and GCNs may be used to process such unstructured data. In the CAGCN module of the embodiment of the present application, GCN-GLOBAL may be a stack of multiple GCNs or a single GCN module. A large body of work has demonstrated that GCNs have very strong modeling power for such data. GCNs can be divided into spectral methods and spatial methods: the convolution of the spectral methods is based on the Fourier transform of the graph, whereas the spatial methods operate directly on the adjacency matrix of the graph. The embodiment of the application uses the spatial-domain GCN method in the clustering task, essentially as an applied extension of connection prediction.
As the network structure deepens, the range over which the GCN aggregates node features gradually expands, giving it a global receptive field over the subgraph and hence the ability to extract the subgraph's global features. As can be seen from formula (2), this is because the features of a node in the next layer of a GCN are obtained by aggregating the previous-layer features of its neighbor nodes. The spatial-domain GCN method is used to extract the subgraph features: each stacked GCN layer is equivalent to one aggregation of first-order neighbor node features, and the aggregation radius keeps growing as the number of layers increases, so the GCN can effectively learn the global information of the subgraph (corresponding to the second feature of step S3022).
GCN-LOCAL assigns different weights to each neighbor node through attention-mechanism learning; if a neighbor node contributes little to the central node, it is assigned a smaller weight, which effectively reduces the influence of noise nodes. Since the attention module attends only to the influence of specific nodes, the local information of the subgraph (corresponding to the first feature of step S3021) can be learned effectively.
As an example, since the clustering problem is converted into a connection prediction problem in the embodiment of the present application, when connection prediction is performed using the feature subgraph formed of multi-order neighbor nodes, whether the central node and a first-order neighbor node belong to the same class can be determined by predicting whether a connection exists between them; the features of the first-order neighbor nodes are therefore the more important ones. The attention module can thus attend only to the influence of a node's first-order neighbor nodes on that node, so the local information of the subgraph can be learned effectively.
GCN-LOCAL can avoid the problem of inaccurate clustering caused by the differences between node features within a subgraph becoming smaller and smaller as the network deepens. When predicting the probability that a connection exists between a specific neighbor node and the central node (for example, predicting the probability of a connection between a first-order neighbor node and the central node when a feature subgraph formed of multi-order neighbor nodes is used for connection prediction), the local information of the subgraph is also very valuable. While GCN-GLOBAL learns the global features of the subgraph, i.e., its context relationships, GCN-LOCAL learns the local information, effectively exploiting the importance of the first-order neighbor node features.
In one possible implementation, GCN-LOCAL extracts the local features (first features) of the feature subgraph with a one-layer attention-based graph convolutional neural network, and GCN-GLOBAL extracts the global features (second features) of the feature subgraph with at least two layers of graph convolutional neural networks. Further, after GCN-GLOBAL and GCN-LOCAL have extracted the global and local features, the features are fused through step S3023 (corresponding to "+" in fig. 11); specifically, the local and global features may be fused by point-wise addition or the like.
In the embodiment of the application, the improved GCN (GCN-GLOBAL) can be used to learn the global features of the subgraph. As can be seen from formula (2), the GCN expresses the features of the current node by aggregating the features of its neighbor nodes during forward propagation, learning a feature representation for each node; as the number of GCN layers increases, the range of feature aggregation grows, the receptive field widens, and the global features of the subgraph are gradually learned. Meanwhile, each subgraph contains a large number of useful context relationships, which can be learned through the GCN; when the feature subgraph formed of multi-order neighbor nodes is used for connection prediction, the probability that a link exists between the central node and its first-order neighbor nodes is predicted through these context relationships. However, the traditional GCN follows a transductive learning paradigm and, for example, cannot handle the task of clustering unknown target objects well. The embodiment of the present application therefore improves the traditional GCN, making it more robust to unknown data and improving the clustering effect.
Transductive learning: when training the GCN network model, both the training data and the test data are known, i.e., the training data includes the test data.
Inductive learning: when training the GCN network model, the training data are known and the test data are unknown.
Because transductive learning sees the test data during training, it cannot generalize to unknown data, i.e., it performs very poorly on unknown data sets. The traditional GCN, however, is transductive: all nodes are required to participate in training to obtain correct node representations, and in real scenes it is difficult to obtain the feature information of all nodes. Although the traditional GCN has strong feature extraction capability on graph-structured data, these defects lead to very poor predictions on unknown data and finally to wrong clustering results.
Inductive learning, on the other hand, can generate feature representations for unknown samples from the feature information of known nodes, and performs better on unknown data sets. When the graph convolutional network is applied to tasks such as clustering unknown faces, inductive learning is better suited to the tasks to be executed in the embodiment of the application.
Considering that the GCN has very strong feature extraction capability on non-Euclidean data sets, the extracted node features are comparable to the effect of node embedding algorithms obtained through complex training, even with randomly initialized weight parameters. The embodiment of the application trains the improved GCN to learn the node embeddings of the subgraph.
Specifically, when the graph convolutional network layer is at least two layers, starting from the second graph convolutional network layer, the neighbor nodes of the current layer are obtained by aggregating the same neighbor node of the previous layer and the neighbor nodes of the same neighbor node.
That is, the corresponding central node of the next layer may contain the neighbor nodes of the previous layer.
The GCN-GLOBAL algorithm flow provided in the embodiment of the present application may correspond to the following formula (4):

$X^{(l+1)} = \sigma\big(\big[X^{(l)} \,\|\, g(A, X^{(l)})\big] W^{(l)} + b^{(l)}\big)$    Formula (4)

where $X^{(l)} \in \mathbb{R}^{N \times D_{in}}$ is the feature matrix of the input subgraph (for the first layer, the input feature matrix is $N_f$); $X^{(l+1)} \in \mathbb{R}^{N \times D_{out}}$ is the feature representation of the next layer; $N$ denotes the number of nodes; and $D_{in}$/$D_{out}$ denote the input/output feature dimensions. $g(A, X^{(l)})$ is the feature representation after aggregating the neighbor nodes, $g(A, X^{(l)}) = \Lambda^{-1/2} A \Lambda^{-1/2} \cdot X^{(l)}$; in this case, normalization processing need not be performed on $A$. Here $g(\cdot)$ operates on $A$ and $X^{(l)}$, $A$ is the adjacency matrix of the subgraph excluding self-loops, and $\Lambda$ is the diagonal matrix with $\Lambda_{ii} = \sum_j A_{ij}$. The operator $\|$ denotes that the aggregated neighbor-node features and the node's own features are concatenated along the feature dimension. $W^{(l)}$ and $b^{(l)}$ denote the weight and bias of the l-th layer respectively. $\sigma$ is a nonlinear activation function; in the embodiment of the present application, the ReLU activation function may be used.
Illustratively, as shown in fig. 12, for the central node C, its next-layer feature C1 is obtained by aggregating the features of the central node C of the previous layer and its first-order neighbor nodes, and the feature h1 is likewise obtained by aggregating the previous-layer features of the corresponding node and its first-order neighbor nodes. As the number of network layers deepens, the range of feature aggregation of the central node becomes larger and larger, i.e., the receptive field of the graph convolution grows, and the global features of the subgraph can be extracted.
For example, the central node C2 at layer L+2 is aggregated from the feature C1 at layer L+1 and its first-order neighbor nodes (including h1), while the features of h1 are themselves aggregated from the features at layer L. It can be seen that the aggregated features in C2 include the features of the neighbor nodes of h1.
The improved GCN can extract global features.
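As a sketch, one GCN-GLOBAL layer per formula (4) can be written as follows in numpy; the assumption that the concatenation doubles the input width before the linear map, and all dimensions, are illustrative.

import numpy as np

def gcn_global_layer(A, X, W, b):
    """One GCN-GLOBAL layer per formula (4):
    X' = ReLU([X || g(A, X)] W + b), with g(A, X) = L^-1/2 A L^-1/2 X."""
    lam = A.sum(axis=1)                              # Lambda_ii = sum_j A_ij
    lam_inv_sqrt = np.where(lam > 0, 1.0 / np.sqrt(lam), 0.0)
    g = (lam_inv_sqrt[:, None] * A * lam_inv_sqrt[None, :]) @ X
    h = np.concatenate([X, g], axis=1)               # "||": concat own + aggregated
    return np.maximum(h @ W + b, 0.0)                # ReLU nonlinearity

# Example: N = 8 nodes, D_in = 16, D_out = 32; W maps 2*D_in -> D_out
# because of the concatenation.
rng = np.random.default_rng(0)
A = (rng.random((8, 8)) > 0.6).astype(np.float32)
A = np.triu(A, 1); A = A + A.T                       # symmetric, no self-loops
X = rng.normal(size=(8, 16)).astype(np.float32)
W = rng.normal(size=(32, 32)).astype(np.float32)     # (2*16) -> 32
b = np.zeros(32, dtype=np.float32)
print(gcn_global_layer(A, X, W, b).shape)            # (8, 32)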
The embodiment of the application realizes the feature representation of unknown nodes by learning the node aggregation pattern; this is an inductive learning mode and yields a more robust clustering effect for unknown objects, such as unknown faces, in tasks like face clustering. Compared with simply stacking multiple GCN layers (e.g., 4 layers) to extract subgraph features, it enhances the expressiveness of the extracted features to some extent.
As described above, GCN-GLOBAL can extract the global features of the feature subgraph and learn its context relationships, thereby improving the clustering accuracy. In the adjacency matrix A, 1 indicates that a connection exists between two nodes and 0 indicates that there is none, i.e., when node features are aggregated, all aggregated nodes play the same role. In real scenes, however, noise nodes are often introduced when the subgraph is constructed; if the weight of every node is 1, then as the network deepens the noise nodes have a very bad influence on the feature representation of the central node and finally cause clustering errors.
In the embodiment of the application, when connection prediction is performed using the feature subgraph formed of multi-order neighbor nodes, clustering is performed by predicting whether connections exist between the central node and its first-order neighbor nodes, i.e., by focusing on the first-order neighbor nodes of the subgraph. The features of the central node are then characterized by the features of its first-order neighbor nodes, which can be regarded as extracting the local features of the subgraph. Therefore, in the embodiment of the application, the local features of the subgraph can be extracted through the GCN-LOCAL neural network module.
In order to reduce the influence of noise nodes on the node feature representation, the GCN-LOCAL module of the embodiment of the present application assigns, through learning, different weights to each aggregated feature, pays more attention to node features with stronger relevance, and reduces the influence of weakly relevant node features on the subsequent feature representation, thereby improving the accuracy of the whole clustering process.
Specifically, the GCN-LOCAL algorithm flow proposed in the embodiment of the present application may correspond to the following formula (5):

$X^{(l+1)} = \sigma\big(\big[X^{(l)} \,\|\, (a \cdot X^{(l)})\big] W^{(l)} + b^{(l)}\big)$    Formula (5)

where $X^{(l)} \in \mathbb{R}^{N \times D_{in}}$ is the feature matrix of the input subgraph (for the first layer, the input is $N_f$); $X^{(l+1)} \in \mathbb{R}^{N \times D_{out}}$ is the feature representation of the next layer; $N$ denotes the number of nodes; and $D_{in}$/$D_{out}$ denote the input/output feature dimensions. $a \cdot X^{(l)}$ is the feature representation after aggregating the neighbor nodes, where $a$ is the adjacency matrix of the subgraph after attention-mechanism learning, excluding self-loops. The operator $\|$ denotes that the aggregated neighbor-node features and the node's own features are concatenated along the feature dimension. $W^{(l)}$ and $b^{(l)}$ denote the weight and bias of the l-th layer respectively, and $\sigma$ is the ReLU nonlinear activation function.
The subgraph is constructed by using a neighbor matrix A, and a is the neighbor matrix of the subgraph after attention mechanism learning, wherein elements in a represent connection weights between each neighbor node and the central node. The formula of a is as the following formula (6):
$$a_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}(e_{ij})\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}(e_{ik})\right)} \tag{6}$$

wherein $e_{ij}$ is the connection weight coefficient between node i and node j; $a_{ij}$, the element in row i and column j of the adjacency matrix a, is the activated connection weight between node i and node j; $N_i$ is the set of neighbor nodes of node i; and LeakyReLU is a nonlinear activation function.
$e_{ij}$ is the connection weight coefficient between node i and node j, where i is the central node and j is a neighbor of the central node. In the embodiment of the application, $e_{ij}$ is learned in a self-supervised manner, as shown in formula (7):

$$e_{ij} = \alpha\left(W x_i \,\Vert\, W x_j\right) \tag{7}$$

wherein $e_{ij}$ is learned through the learnable parameter $\alpha$, W is a learnable weight, and $\Vert$ denotes concatenating (connecting) the linearly transformed features of the node itself and of its neighbor along the feature dimension. If the element in row i and column j of A is 0, then $e_{ij}$ is 0.
According to formulas (5), (6) and (7), the GCN-LOCAL neural network module assigns different weights to different neighbor nodes and pays more attention to the nodes that contribute more to the central node, which effectively suppresses the contribution of noise nodes to the subgraph features and improves clustering accuracy.
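To make formulas (5)–(7) concrete, the following NumPy sketch computes the attention-weighted adjacency matrix a and applies one GCN-LOCAL layer. It is a minimal illustration under assumed parameter shapes (the names `W_att`, `att_vec`, `W`, `b` are hypothetical), not the patented implementation; the dense double loop is for clarity only.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_adjacency(X, A, W_att, att_vec):
    """Formulas (6)/(7): X is (N, D) node features, A is a 0/1 adjacency
    matrix without self-loops, W_att is the shared transform W of shape
    (D, D'), and att_vec is the learnable vector alpha of shape (2*D',)."""
    H = X @ W_att                         # W x_i for every node i
    N = X.shape[0]
    e = np.full((N, N), -np.inf)          # e_ij masked out where A_ij == 0
    for i in range(N):
        for j in range(N):
            if A[i, j] == 1:              # formula (7): e_ij = alpha(W x_i || W x_j)
                e[i, j] = att_vec @ np.concatenate([H[i], H[j]])
    a = np.zeros((N, N))
    for i in range(N):                    # formula (6): softmax over the neighbor set N_i
        nbrs = A[i] == 1
        if nbrs.any():
            act = leaky_relu(e[i, nbrs])
            act = np.exp(act - act.max())
            a[i, nbrs] = act / act.sum()
    return a

def gcn_local_layer(X, a, W, b):
    """Formula (5): concatenate each node's own features with its
    attention-aggregated neighbor features, then apply linear + ReLU.
    W has shape (2*D, D_out), b has shape (D_out,)."""
    agg = a @ X                           # a . X^{(l)}
    return np.maximum(np.concatenate([X, agg], axis=1) @ W + b, 0.0)
```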
As shown in fig. 12 and fig. 13, since a node at the current layer is aggregated from the features of the previous-layer node and its neighbor nodes, the attention-based graph convolution network no longer treats every neighbor node equally; instead, it pays more attention to the neighbors that contribute most to the node itself. A neighbor that plays a larger role is assigned a higher weight during aggregation — in fig. 13, for example, weights a1–a4 are assigned to the central node and its first-order neighbors — while noise or outlier nodes are assigned smaller weights through learning, or even eliminated.
While the GCN-LOCAL neural network module focuses more on LOCAL features and GCN-GLOBAL learns the GLOBAL features of the subgraph, the context-attention graph convolution module provided in the embodiment of the application fuses the global and local features. This alleviates the problems above, enhances the feature expression of subgraph nodes, improves the clustering effect, avoids over-smoothing as the number of network layers increases, and prevents the clustering errors that arise when the influence of noise nodes is amplified with network depth.
In the embodiment of the present application, a possible implementation manner is provided for step S3023, and specifically, the implementation manner includes the steps of:
fusing the first characteristic and the second characteristic to obtain a fused characteristic;
and adjusting the fused features based on each neighbor node to obtain a feature extraction result.
Wherein, the fused features are adjusted based on each neighbor node through any one of the following neural networks: a single-layer graph convolution network; a single layer attention-based graph convolution network; a single layer graph convolution network based on a contextual attention mechanism.
As can be seen from the description above, the CAGCN improves the expressiveness of subgraph features by extracting and fusing the global and local features, that is, by fusing the first feature and the second feature — for example, by point-by-point addition or by concatenation along the feature dimension — to obtain the fused features. Considering that this way of fusing the GCN-LOCAL and GCN-GLOBAL outputs is fairly simple, and in order to avoid introducing further noise and to make the fusion more complete, the embodiment of the present application provides a refinement module to further improve the feature expression capability of the subgraph.
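A minimal sketch of the two fusion modes mentioned above; the function name and signature are illustrative assumptions.

```python
import numpy as np

def fuse_features(global_feat, local_feat, mode="add"):
    """Fuse the first (global) and second (local) features.
    mode="add": point-by-point addition (shapes must match);
    mode="concat": concatenation along the feature dimension."""
    if mode == "add":
        return global_feat + local_feat
    if mode == "concat":
        return np.concatenate([global_feat, local_feat], axis=-1)
    raise ValueError(f"unknown fusion mode: {mode}")
```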
Specifically, the relevance of each neighbor node to the central node of the feature subgraph can be determined, and each neighbor node is assigned an adjustment weight corresponding to its relevance. For example, the refinement module may adjust the fused features with a single-layer attention-based graph convolution network, using the attention mechanism to suppress the contribution of weakly correlated nodes to the features. As an example, when connection prediction is performed on a feature subgraph composed of multi-order neighbor nodes, only the relation between the central node and its first-order neighbors needs to be predicted, so the feature expression of nodes other than the first-order neighbors in the fused features can be attenuated through the single-layer attention-based graph convolution network.
In this embodiment, the refinement module may use a single-layer graph convolution network, a single-layer attention-based graph convolution network, a single-layer graph convolution network based on the context attention mechanism (a CAGCN module), or the like; in other embodiments, other neural networks may be used.
Combining the CAGCN module and the refinement module, as shown in fig. 14, the neural network model provided in the embodiment of the present application is built from a feature extraction and fusion module (i.e., the CAGCN above) and the refinement module. The specific algorithm flow may refer to formulas (4) and (5) above: when the local features are extracted, the feature subgraph is weighted by the attention weights determined by the attention mechanism (corresponding to w in fig. 14); the global and local features of the feature subgraph are then fused (corresponding to f in fig. 14); and the refinement module performs finer adjustment on the fused features. First, the feature subgraph obtained in S301 is taken as the input of the neural network model; second, the features of the feature subgraph are extracted and fused by the CAGCN and finely adjusted (for example, by graph convolution) through the refinement module; finally, the probability that a connection exists between the central node and a specific neighbor node is predicted through two fully-connected layers, and connection pairs are output. For example, when a feature subgraph composed of multi-order neighbor nodes is used for connection prediction, the probability of a connection between the central node and each of its first-order neighbor nodes is predicted through the two fully-connected layers, and connection pairs are output.
Data to be tested is predicted by the neural network to obtain the connection probability between each central node and its first-order neighbor nodes. Each prediction result is then traversed: if the connection probability is lower than a set threshold, no connection exists between the central node and the neighbor node; if it is higher than the set threshold, the two nodes are connected. Finally, each node is traversed with a breadth-first algorithm to find the nodes connected to it, yielding the clustering results.
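The thresholding and breadth-first traversal just described can be sketched as follows; the `(center, neighbor, probability)` input format and the default threshold value are assumptions, since the text only requires "a set threshold".

```python
from collections import deque

def clusters_from_predictions(edges, num_nodes, threshold=0.5):
    """edges: iterable of (i, j, p) triples, where p is the predicted
    probability of a connection between center i and neighbor j.
    Returns the clusters as connected components of the kept edges."""
    adj = [[] for _ in range(num_nodes)]
    for i, j, p in edges:
        if p > threshold:                 # keep only confident connection pairs
            adj[i].append(j)
            adj[j].append(i)
    visited = [False] * num_nodes
    clusters = []
    for start in range(num_nodes):
        if visited[start]:
            continue
        visited[start] = True
        queue, component = deque([start]), [start]
        while queue:                      # breadth-first search over kept edges
            u = queue.popleft()
            for v in adj[u]:
                if not visited[v]:
                    visited[v] = True
                    component.append(v)
                    queue.append(v)
        clusters.append(component)
    return clusters
```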
The embodiment of the application learns the context relationship between nodes through the GCN to predict whether two nodes should be connected: if the identities of the two nodes are the same, the two nodes are connected. A feature subgraph composed of multi-order neighbor nodes is very effective for connection prediction. In summary, the algorithm provided by the embodiment of the present application for converting the clustering problem into a connection prediction problem is feasible.
In the embodiment of the application, features are extracted from the feature subgraph by a neural network built from three layers of context-attention graph convolution modules (CAGCN) and finely adjusted by the refinement module; finally, the probability that a connection exists between the central node and each first-order neighbor node is predicted through two fully-connected layers and a softmax layer, the network is optimized with a cross-entropy loss function, and connection pairs are output. During training, a corresponding subgraph is constructed for each node for forward and backward propagation; traversing all nodes once constitutes one round (epoch, i.e., all training samples trained once), and after N epochs, unknown data can be predicted.
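The training procedure described above might look like the following PyTorch sketch. The stand-in model, tensor shapes and optimizer settings are assumptions made for illustration; only the cross-entropy objective, the per-node subgraph updates and the epoch structure come from the text.

```python
import torch
import torch.nn as nn

# Stand-in for the CAGCN + refinement + two-fully-connected-layer network:
# the real model consumes a feature subgraph, this toy consumes flat features.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()           # cross-entropy, softmax applied inside

def train_epoch(subgraphs):
    """One epoch = one pass over all nodes' subgraphs. `subgraphs` yields
    (feats, labels): feats is a (k1, 512) float tensor of first-order
    neighbor features, labels a (k1,) long tensor of 0/1 connection labels."""
    for feats, labels in subgraphs:
        logits = model(feats)             # connection logits per first-order neighbor
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```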
The attention-based graph convolution network provided by the embodiment of the application performs very well on complex data sets: the model learns not only the global features of the input subgraph but also, through the attention mechanism, its local features, effectively reducing the influence of noise nodes on the clustering result. By fusing the global and local features, the context relationship of the subgraph is learned, node feature expression is effectively enhanced, and clustering accuracy is improved.
The attention-based graph convolution network provided by the embodiment of the application can adjust the depth, width and the like of the network according to the application scenario, so as to adapt to computation on multiple platforms.
As shown in fig. 15, each node to be clustered is taken as a central node and its neighbor nodes are found by cosine distance, so that one subgraph is constructed per central node; the probability of a connection between the central node and each first-order neighbor node of each subgraph is predicted by the GCN neural network; if the predicted connection probability between two nodes is larger than the threshold, the two nodes are connected; finally, clustering is completed by traversing each connection pair.
As shown in fig. 15, each category may still contain isolated nodes after clustering is completed, which may be noise or outlier nodes and which reduce clustering purity. Therefore, the embodiment of the application provides an efficient post-processing algorithm based on the degree matrix to enhance the stability of the clustering result.
For the embodiment of the present application, after step S303, the efficient post-processing algorithm based on the degree matrix is executed in step S304, so that the stability of the clustering process is effectively improved, and the interference of noise nodes is eliminated. Specifically, step S304 may include the steps of:
step S3041: respectively constructing class subgraphs based on each class obtained by clustering;
after the clustering is completed, each subclass may form a sub-graph, and the specific construction method refers to the introduction of constructing a feature sub-graph above, which is not described herein again.
Step S3042: and determining a degree matrix of the corresponding class subgraph for each class, and determining a noise node in the class according to the degree matrix and each node in the class.
The method for calculating the degree matrix of each class subgraph is described above, and is not described herein again.
Further, for each class, determining a noise node in the class according to the degree matrix and each node in the class includes:
determining each isolated node in the corresponding class subgraph according to the degree matrix; in one possible implementation, each node whose degree in the degree matrix is less than the first threshold may be determined to be an isolated node of the corresponding class subgraph. In practical applications, a person skilled in the art may set the value of the first threshold according to the actual situation, which is not limited in the embodiment of the present application. As an example, a node with a degree of 1 in the degree matrix is an isolated node.
For each isolated node, the average feature of the other nodes excluding the isolated nodes is determined; whether each isolated node is a noise node is then determined from the node and its corresponding average feature. Specifically, the cosine distance between each isolated node and its corresponding average feature may be calculated, and when the cosine distance is smaller than a second threshold, the corresponding isolated node is determined to be a noise node. In practical applications, a person skilled in the art may set the value of the second threshold according to the actual situation, which is not limited in the embodiment of the present application.
In a possible implementation manner, the feature average value of the remaining nodes is calculated as the cluster center of the subclass excluding the isolated node, the cosine distance between the isolated node and the cluster center is calculated, if the distance is smaller than a set second threshold, the node is a noise node, and if the distance is larger than the set second threshold, the node belongs to the subclass.
In one example, as shown in FIG. 16, A, B, C, D, E are nodes of the same class, and node F is connected only to node A. Fig. 17 is the adjacency matrix of the subclass, and fig. 18 is its degree matrix. From the degree matrix, node F has a degree of 1 and can be regarded as an isolated node. The feature average of nodes A, B, C, D, E is then computed as the cluster center S of the subclass; if the cosine distance between S and node F is greater than the set second threshold, they belong to the same class, otherwise node F is a noise node.
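A sketch of this degree-matrix post-processing follows. It assumes, consistently with the surrounding text, that "cosine distance" is used in the sense of cosine similarity (larger means more similar), so a value below the second threshold marks a noise node; both threshold defaults are illustrative.

```python
import numpy as np

def find_noise_nodes(features, A, degree_threshold=2, sim_threshold=0.5):
    """features: (N, D) features of the nodes in one predicted class;
    A: (N, N) adjacency matrix of the class subgraph.
    Returns the indices of nodes judged to be noise."""
    degrees = A.sum(axis=1)                    # diagonal of the degree matrix
    noise = []
    for i in np.where(degrees < degree_threshold)[0]:   # isolated nodes
        others = np.delete(features, i, axis=0)
        center = others.mean(axis=0)           # cluster center S without node i
        sim = features[i] @ center / (np.linalg.norm(features[i]) * np.linalg.norm(center))
        if sim < sim_threshold:                # far from the class center -> noise
            noise.append(int(i))
    return noise
```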
The efficient post-processing algorithm based on the degree matrix can effectively improve the stability of the clustering process and eliminate the interference of noise nodes.
Combining the above embodiments, the embodiments of the present application provide an attention-based clustering algorithm: the clustering problem is converted into a node (connection) prediction problem; the global and local features of the subgraphs are extracted and fused by the attention-based graph convolution network; the fused features are finely adjusted; and the probability that a connection exists between nodes is predicted by a fully-connected classification network, an existing connection indicating that the two nodes share the same identity. Finally, the interference of noise nodes is eliminated by a post-processing algorithm.
The attention-based graph neural network module enhances the feature expression of the subgraph by fusing global and local feature information, improving clustering accuracy. During feature propagation, the attention module aggregates the features of the more strongly correlated nodes; if a noise node exists in the subgraph, learning the context relationship assigns it a smaller weight, effectively improving the purity of the clustering result. Noise and outliers in the clustering result can also be effectively removed by the degree-matrix-based post-processing algorithm.
Specifically, as shown in fig. 19, the main algorithm flow includes:
s401: extracting the characteristics of each aligned target object by using a trained deep learning model, and constructing a characteristic set D from the extracted characteristics of the data to be clustered;
s402: taking the characteristics of each target object as nodes, and calculating the cosine distance between every two nodes;
s403: taking each node as a central node, and constructing a feature subgraph from its k1 neighbor nodes and the k2 neighbor nodes of those k1 neighbors;
specifically, each node can be taken as a central node P, and m1 neighbor nodes with the largest cosine distance, m2 neighbor nodes with the smallest cosine distance, and m3 neighbor nodes at random intermediate cosine distances are found; the number of first-order neighbor nodes of P is k1 = m1 + m2 + m3;

for each neighbor node of P, the n nodes with the largest cosine distance to that node are found, where k2 = k1 × n. Through the above steps, the central node P has k1 first-order neighbor nodes and k2 second-order neighbors, so the theoretical total number of neighbor nodes of P is k = k1 + k2; in practice, first-order nodes may share the same neighbors, so the total number of neighbor nodes of P is less than or equal to k;
k neighbor nodes are selected for each central node P to construct a feature subgraph, the feature subgraph is represented by an adjacency matrix A, and its degree matrix D is computed;
the above steps are repeated until every node has served as a central node, completing feature subgraph construction.
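The subgraph construction of steps S402–S403 can be sketched as follows. The defaults for m1, m2, m3 and n are illustrative, "cosine distance" is again treated as cosine similarity (largest = nearest), and the node count is assumed large enough for all samples to be drawn.

```python
import numpy as np

def build_subgraph_nodes(features, center, m1=8, m2=2, m3=2, n=4):
    """Return the node ids of the feature subgraph centered on `center`."""
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = F @ F[center]                  # cosine similarity to the center
    sims[center] = -np.inf                # exclude the center itself
    order = np.argsort(-sims)[:-1]        # most similar first; center dropped
    first = list(order[:m1])              # m1 largest-similarity neighbors
    first += list(order[-m2:])            # m2 smallest-similarity neighbors
    middle = order[m1:len(order) - m2]
    first += list(np.random.choice(middle, size=m3, replace=False))  # m3 random
    # k1 = m1 + m2 + m3 first-order neighbors of P
    second = set()
    for j in first:                       # n most similar nodes per first-order neighbor
        sj = F @ F[int(j)]
        sj[int(j)] = -np.inf
        second.update(int(v) for v in np.argsort(-sj)[:n])
    nodes = {int(center)} | {int(v) for v in first} | second
    return sorted(nodes)                  # shared neighbors merge, so len <= 1 + k1 + k1*n
```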
S404: extracting and fusing global and local characteristics of each subgraph through a graph convolution network based on an attention mechanism, finely adjusting the fused characteristics, predicting the probability of connection between a central node and a first-order neighbor node of the central node through a full-connection classification network, and forming each connection pair;
the attention-based graph convolution network extracts and fuses the global and local feature information of each subgraph, enhancing each subgraph's feature expression; the fine adjustment further improves the reliability of the fused features.
S405: and if the probability of the connection between the two nodes is greater than a set third threshold value, the two nodes are grouped into one class.
S406: each class is constructed into a class subgraph; for each node whose degree is smaller than 2 (merely one example — other thresholds can also be used), the cosine distance between the node's feature and the class center is calculated, and if the distance is smaller than the set second threshold, the node is determined to be a noise node. This efficient degree-matrix-based post-processing can effectively improve the stability of the clustering process and eliminate the interference of noise nodes.
The embodiment of the application provides a clustering algorithm based on a graph attention machine mechanism. And (4) converting the clustering problem into a node prediction problem, predicting the probability of connection between nodes through GCN, and if connection exists, indicating that the two nodes are the same identity. The clustering algorithm provided by the embodiment of the application can learn the global and local context relationship of the graph, and improve the integrity of feature representation and the robustness of the algorithm.
The clustering algorithm based on the graph attention machine mechanism provided by the embodiment of the application has the following beneficial effects:
(1) compared with the conventional GCN clustering algorithm, the clustering algorithm based on the attention mechanism has higher accuracy;
(2) the provided algorithm exhibits no over-smoothing, improving clustering accuracy;
(3) the attention mechanism learns to aggregate the features of highly correlated nodes, reducing the influence of noise nodes;
(4) compared with traditional unsupervised clustering algorithms, this is a supervised learning algorithm and performs better on large-scale, higher-dimensional data sets;
(5) compared with the prior GCN method which only learns the global information of the subgraph, the method can learn the global information and the local information of the subgraph at the same time, and fuse the local and global characteristics to improve the clustering effect;
(6) the post-processing algorithm based on the degree matrix is provided, so that noise nodes can be effectively removed, and the clustering purity is improved;
(7) the time complexity of the clustering algorithm can be amortized and larger subgraphs need not be constructed, making it very suitable for clustering on mobile terminals.
Based on the foregoing embodiments, the embodiments of the present application further provide additional logic. The clustering process nominally clusters all target objects to be clustered at once, but on small devices with limited computing resources, such as mobile phones, an initial clustering pass followed by later incremental clustering is generally adopted. When a user takes photos with a mobile phone, the photos are not all taken at once but accumulate over time, so the incremental mode lets the user see clustering results quickly and enhances the user experience.
In a possible implementation manner, the clustering method further includes:
(1) acquiring new target object characteristics to be clustered;
(2) respectively acquiring a preset number of historical target object characteristics aiming at the clustered categories;
(3) the historical target object features and the new target object features are used as target object features to be clustered, and clustering is performed through the clustering method provided by the embodiments, namely, the historical target object features and the new target object features are clustered through the attention module and the graph convolution network to obtain new clustering results;
(4) and determining the category corresponding to the new target object characteristic based on the new clustering result and the clustered category.
Specifically, taking face images as the target objects to be clustered, as shown in fig. 20: at the initial clustering stage, a specified number of faces is selected for clustering, yielding n face categories (the existing clusters in fig. 20). When new photos are detected, face detection, face alignment and face feature extraction are performed on them to obtain the feature of each face; n2 face images are randomly selected from each of the n existing classes as representatives of the class, merged with the new face images, and clustered through the attention-based GCN. For each new cluster, the proportion of its photos that come from an existing cluster is calculated: if the proportion is greater than 0.5, the new photos are merged into that existing cluster; if it is less than 0.2, the new photos form a new category; if it lies between 0.2 and 0.5, no operation is performed and the new photos are re-clustered the next time clustering runs.
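A sketch of the merge decision just described; interpreting "proportion" as the share of already-clustered representatives inside each new cluster is an assumption, while the 0.5 and 0.2 thresholds are the ones given in the text.

```python
def assign_new_cluster(new_cluster, existing_labels):
    """new_cluster: node ids of one cluster from the merged re-clustering;
    existing_labels: dict mapping previously clustered node id -> class id
    (new photos are absent). Returns a class id, "new", or "defer"."""
    old = [existing_labels[i] for i in new_cluster if i in existing_labels]
    ratio = len(old) / len(new_cluster)   # share of already-clustered photos
    if ratio > 0.5:                       # mostly an existing class -> merge into it
        return max(set(old), key=old.count)
    if ratio < 0.2:                       # mostly new photos -> open a new class
        return "new"
    return "defer"                        # ambiguous -> re-cluster next time
```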
For the present embodiment, the following evaluation indexes can be adopted:
(1)Pairwise F-score
wherein:
$$\mathrm{Pairwise\ Precision} = \frac{TP}{TP+FP}, \qquad \mathrm{Pairwise\ Recall} = \frac{TP}{TP+FN}, \qquad \mathrm{Pairwise\ F\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
As an example, as shown in FIG. 21, x, o and the square symbol each represent one class of documents; perfect clustering should place each kind of graphic into its own class. TP means a pair of documents that are correctly grouped together, FP means a pair that is grouped together but should not be, and FN means a pair that should be grouped together but is wrongly separated. Pairwise Precision represents the proportion of pairs in the clustering result that are correctly clustered together, and Pairwise Recall represents the proportion of pairs that should be clustered together that actually are.
TP + FP = C(2,6) + C(2,6) + C(2,5) = 15 + 15 + 10 = 40

TP = C(2,5) + C(2,4) + C(2,3) + C(2,2) = 10 + 6 + 3 + 1 = 20

FP = 40 − 20 = 20

TP + FN = C(2,8) + C(2,5) + C(2,4) = 28 + 10 + 6 = 44

Pairwise Precision = 20/40 = 50%

Pairwise Recall = 20/44 ≈ 45.5%

Pairwise F-score = 2 × Precision × Recall / (Precision + Recall) ≈ 0.476
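The worked numbers above can be reproduced with a short routine; the equal-length label-list input format is an assumption.

```python
from math import comb
from collections import Counter

def pairwise_f_score(pred, truth):
    """pred/truth: cluster and class labels per item; pairs counted as C(2, n)."""
    def pair_count(labels):
        return sum(comb(c, 2) for c in Counter(labels).values())

    tp = pair_count(list(zip(pred, truth)))   # pairs sharing cluster AND class
    tp_fp = pair_count(pred)                  # pairs sharing a cluster
    tp_fn = pair_count(truth)                 # pairs sharing a class
    precision, recall = tp / tp_fp, tp / tp_fn
    return 2 * precision * recall / (precision + recall)
```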
(2)BCubed F-score
$$\mathrm{BCubed\ F\text{-}score} = \frac{2 \cdot \mathrm{BCubed\ Precision} \cdot \mathrm{BCubed\ Recall}}{\mathrm{BCubed\ Precision} + \mathrm{BCubed\ Recall}}$$
Taking face clustering as an example, BCubed recall indicates whether all photos of one person are gathered into the same category, corresponding to completeness; BCubed precision indicates whether all photos within one class belong to the same person, corresponding to homogeneity.
e represents a face image, c(e) represents the predicted cluster containing e, and l(e) represents its true class.
$$\mathrm{BCubed\ Precision} = \frac{1}{N}\sum_{e} \frac{|c(e) \cap l(e)|}{|c(e)|}, \qquad \mathrm{BCubed\ Recall} = \frac{1}{N}\sum_{e} \frac{|c(e) \cap l(e)|}{|l(e)|}$$

where the sums run over all N face images e.
As shown in fig. 22, the BCubed precision and BCubed recall for node e are 4/5 (precision(e) in the figure) and 4/6 (recall(e) in the figure), respectively.
Average BCubed Precision = (4/5 × 4 + 1/5 + 2/6 × 2 + 4/6 × 4 + 2/2 × 2)/13 ≈ 0.6718

Average BCubed Recall = (4/6 × 4 + 1/5 + 2/6 × 2 + 4/5 × 4 + 2/2 × 2)/13 ≈ 0.6718

Average BCubed F-score = 2 × Precision × Recall / (Precision + Recall) ≈ 0.6718
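Likewise, a sketch of the averaged BCubed metrics under the same assumed input format.

```python
def bcubed_f_score(pred, truth):
    """Average BCubed precision/recall over all elements e, where c(e) is
    e's predicted cluster and l(e) its true class, per the formulas above."""
    n = len(pred)
    precision = recall = 0.0
    for e in range(n):
        cluster = [i for i in range(n) if pred[i] == pred[e]]
        cls = [i for i in range(n) if truth[i] == truth[e]]
        correct = sum(1 for i in cluster if truth[i] == truth[e])
        precision += correct / len(cluster)
        recall += correct / len(cls)
    precision, recall = precision / n, recall / n
    return 2 * precision * recall / (precision + recall)
```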
The clustering method provided by the embodiment of the application can achieve the following experimental effects:
The effectiveness of the novel clustering algorithm provided by the application is verified by face clustering on public data sets such as MS-Celeb-1M, YouTube Faces and IJB-B; tests show that the scheme of the application surpasses all current face clustering schemes. The MS-Celeb-1M data set contains 100K face identities and 10M face images in total; one tenth of the data is randomly selected for testing. IJB-B includes three subsets of 512, 1024 and 1845 face identities, corresponding to 18171, 36575 and 68195 images. The YouTube Faces data set contains 1595 face identities and 155282 face images in total; 1436 face identities, 140629 images in total, were randomly selected for testing. For a fair comparison, the same evaluation method and the same data are used as for the other methods, and comparison of all results shows that the scheme of the application is currently the best-performing face clustering method. The comparison schemes mainly include:
(1) K-Means, one of the most common clustering algorithms, but with higher time complexity
(2) HAC, the method adopts a bottom-up mode, and carries out hierarchical combination on closed clusters according to a certain criterion
(3) DBSCAN, extracting clusters according to designed density standard, and reserving sparse background as noise
(4) ARO, clustering using approximate nearest neighbor search and modified distance metrics
(5) CDP, graph-based clustering algorithm, which utilizes more robust pairwise relationships to achieve clustering
(6) Link-GCN, using GCN to predict node-to-link connection using graph context to realize clustering
(7) local + long-range, a density-aware feature embedding method, which learns robust feature embeddings using local and non-local information and then performs clustering using a conventional clustering algorithm
(8) LearnCluster, which converts the clustering problem into the detection and segmentation problem to finally realize clustering
(9) LCFCCE, which uses GCN to estimate connectivity based on vertex confidence estimates and obtains clusters by connecting each vertex to the most connected neighbor in its candidate set
(10) the scheme of the embodiment of the application, which adopts the attention mechanism and a GCN that fuses the global and local features of nodes to enhance node feature expression, and achieves clustering through node-pair connection prediction. Here, (w/o RF) indicates that only one CAGCN module is used for feature extraction and fusion, and (w/ RF) indicates that the neural network constructed from the CAGCN and the refinement module is used.
Table 1: test results of BCubed F-score on the IJB-B data set (table content provided as an image in the original and not reproduced here)
Table 1 shows the results of testing on the IJB-B data set using the BCubed F-score evaluation method. The results show that deep-learning algorithms such as Linkage-GCN, local + long-range and the scheme of the application greatly outperform the traditional algorithms. In particular, the performance of the proposed scheme on the three IJB-B subsets exceeds the current best scheme.
Table 2: test results of Pairwise F-score on the YTB data set (table content provided as an image in the original and not reproduced here)
Table 2 shows the results of testing on the YTB data set using the Pairwise F-score evaluation method. The experimental results show that the F-score of the scheme proposed in this application reaches 95.36 on the YTB data set.
To verify the effectiveness of the methods of the present application, a series of ablation experiments were performed on the IJB-B and YTB data sets. The influence of the attention mechanism and of multi-feature fusion on the feature extraction capability of the neural network was explored on the IJB-B data set, and the clustering results on the YTB data set were visualized to analyze the performance of the algorithm.
Ablation experiment 1
Table 3 shows the results of multiple ablation experiments performed on the IJB-B data set using the BCubed F-score and NMI indices. GCN-M is the result of prediction using a 4-layer GCN.

2-GCN adopts the same structure as CAGCN, with Attention replaced by GCN. The comparison of GCN-M and 2-GCN shows that multi-feature fusion can effectively improve the feature expression capability and clustering performance of a neural network. GCN-A, GAT and CAGCN use learnable attention aggregation for feature extraction: GCN-A introduces an attention mechanism on top of GCN-M, and GAT denotes a one-layer network of another attention mechanism. Compared with GCN-M, attention focuses more on the features most relevant to the node itself, which improves feature extraction. GAT attends only to the features of first-order neighbor nodes and extracts the local features of the subgraph, while CAGCN extracts global and local features simultaneously. The comparison of GCN-M, GAT and CAGCN shows that the global and local features of the subgraph are both very important: their fusion enhances feature expression and improves clustering accuracy.
Table 3: test results of BCubed F-score and NMI on the IJB-B data set (table content provided as an image in the original and not reproduced here)
Ablation experiment 2
916 face images with 10 identities were randomly selected from the YTB data set for visual analysis. Figs. 23a to 23d show the result of reducing the 512-dimensional face features for visualization with the T-SNE algorithm. FIG. 23a shows the ground-truth labels of all nodes, and FIGS. 23b, 23c and 23d show the predicted clusters of Linkage-GCN, CAGCN×1 and CAGCN×3, respectively. The nodes in the larger circle belong to the same person, but Linkage-GCN divides them into four classes and CAGCN×1 predicts three classes, while CAGCN×3 predicts correctly. These results show that the method of the present application performs better. The nodes in the smaller circles also represent a single category, but as the visualization in fig. 23a shows, the nodes in the two smaller circles are distributed very far apart, so it is reasonable for fig. 23d to classify them into two categories. Inspecting the original pictures reveals that the face quality in the smaller circles is very poor and the images are very dark. This indicates that picture brightness and face quality have an obvious influence on the clustering result.
Ablation experiment 3
Feature fusion by point-by-point pixel addition and by feature-dimension concatenation were both used in the CAGCN module, and the influence of a single-layer GCN, a single-layer attention-based GCN, and CAGCN modules as the refinement module on the final clustering result was examined. In the table below, "+" denotes the point-by-point pixel addition fusion mode and "concat" denotes feature-dimension concatenation; "w/o RF" denotes no refinement module, while "w/RF(GCN*1)", "w/RF(A*1)" and "w/RF(block*2)" denote adding a single-layer GCN, a single-layer attention-based GCN, and CAGCN modules, respectively, as the refinement module. Using the refinement model can improve the clustering effect to some extent but correspondingly increases the running time of the algorithm, and as the table shows the improvement is not obvious, so the refinement module is not adopted in actual use. Comparing experiments c, d, e and f, the CAGCN module provided by the application performs better, which again demonstrates its effectiveness. Experiments a, b, c, d, e and f show that without the refinement module, point-by-point pixel addition fuses features better, while with the refinement module, concat fusion clusters better: concat increases the feature dimensionality, and without a refinement module no information is exchanged between the concatenated features, so experiment b is worse than a; after the refinement module is added, information is exchanged between the features and their expressiveness improves, so experiments d and f are better than c and e.
Method | Fusion | Refinement | IJB-B 512 | IJB-B 1024 | MS-Celeb-1M
a | + | w/o RF | 0.8361 | 0.8346 | 0.9624
b | concat | w/o RF | 0.8341 | 0.8340 | 0.9608
c | + | w/RF(GCN*1) | 0.8336 | 0.8321 | 0.9593
d | concat | w/RF(GCN*1) | 0.8340 | 0.8356 | 0.9646
e | + | w/RF(A*1) | 0.8310 | 0.8325 | 0.9608
f | concat | w/RF(A*1) | 0.8337 | 0.8359 | 0.9652
g | + | w/RF(block*2) | 0.8389 | 0.8374 | 0.9670
An embodiment of the present application further provides a clustering apparatus, as shown in fig. 24, the clustering apparatus 50 may include: an acquisition module 501, a determination and extraction module 502, and a clustering module 503, wherein,
the obtaining module 501 is configured to obtain a feature subgraph corresponding to a feature of a target object to be clustered;
the determining and extracting module 502 is configured to determine attention weights corresponding to each neighboring node of the feature sub-graph, and perform feature extraction on the feature sub-graph based on each attention weight to obtain a feature extraction result;
the clustering module 503 is configured to cluster the target object features according to the feature extraction result.
In an optional implementation manner, the determining and extracting module 502 is specifically configured to, when configured to determine attention weights corresponding to respective neighboring nodes of the feature sub-graph, and perform feature extraction on the feature sub-graph based on the attention weights to obtain a feature extraction result:
determining attention weights corresponding to all neighbor nodes of the feature subgraph respectively through an attention module, and carrying out weighting processing on the feature subgraph through all the attention weights to obtain a first feature;
performing feature extraction on the feature subgraph through at least one graph convolution network layer to obtain a second feature;
and fusing the first characteristic and the second characteristic to obtain a characteristic extraction result.
In an optional implementation manner, when the graph convolutional network layer is at least two layers, starting from the second graph convolutional network layer, the neighbor nodes of the current layer are obtained by aggregating the same neighbor node of the previous layer and the neighbor nodes of the same neighbor node.
In an optional implementation manner, when the determining and extracting module 502 is configured to fuse the first feature and the second feature to obtain the feature extraction result, specifically configured to:
fusing the first characteristic and the second characteristic to obtain a fused characteristic;
and adjusting the fused features based on each neighbor node to obtain a feature extraction result.
In an optional implementation manner, when the determining and extracting module 502 is used to adjust the fused features based on each neighboring node, it is specifically configured to:
adjusting the fused features based on each neighbor node through any one of the following neural networks:
a single-layer graph convolution network;
a single layer attention-based graph convolution network;
a single layer graph convolution network based on a contextual attention mechanism.
In an optional implementation manner, the clustering apparatus 50 further includes a post-processing module, configured to, after the clustering module 503 clusters the target object features according to the feature extraction result:
respectively constructing class subgraphs based on each class obtained by clustering;
and determining a degree matrix of the corresponding class subgraph for each class, and determining a noise node in the class according to the degree matrix and each node in the class.
In an optional implementation manner, when the post-processing module is configured to determine the noise node in the class according to the degree matrix and each node in the class, the post-processing module is specifically configured to:
determining each isolated node in the corresponding class subgraph according to the degree matrix;
respectively determining the average characteristics of other nodes except the isolated nodes aiming at each isolated node;
and determining whether each isolated node is a noise node or not according to each isolated node and the corresponding average characteristic thereof.
In an optional implementation manner, when the post-processing module is configured to determine each isolated node in the corresponding class subgraph according to the degree matrix, the post-processing module is specifically configured to:
and determining each node in the degree matrix with the degree being less than the first threshold value as each isolated node in the corresponding class subgraph.
In an optional implementation manner, when determining whether each isolated node is a noise node according to each isolated node and its corresponding average feature, the post-processing module is specifically configured to:
calculating the cosine distance between each isolated node and the corresponding average characteristic;
and when the cosine distance is smaller than a second threshold value, determining the corresponding isolated node as a noise node.
In an optional implementation manner, when the clustering module 503 is configured to cluster the target object features according to the feature extraction result, specifically configured to:
determining the connection probability between the central node of the characteristic subgraph and each corresponding first-order neighbor node according to the characteristic extraction result;
and the central nodes with the connection probability larger than the third threshold value and the corresponding neighbor nodes are grouped into one type.
In an optional implementation manner, before the obtaining module 501 is configured to obtain a feature subgraph corresponding to a feature of a target object to be clustered, the obtaining module is further configured to:
taking the target object characteristics as a central node, and determining at least one first-order neighbor node of the central node in other target object characteristics;
determining at least one neighbor node with the largest cosine distance with the first-order neighbor node as a next-order neighbor node for each first-order neighbor node in sequence;
and constructing a characteristic subgraph according to the central node and the determined at least two-order neighbor nodes.
In an optional implementation manner, when the obtaining module 501 is configured to determine at least one first-order neighbor node of the central node, specifically, to:
and determining at least one neighbor node with the largest cosine distance from the central node, at least one neighbor node with the smallest cosine distance from the central node and at least one other neighbor node with any cosine distance as a first-order neighbor node of the central node.
In an alternative implementation, the clustering apparatus 50 further includes an additional clustering module, configured to:
acquiring new target object characteristics to be clustered;
respectively acquiring a preset number of historical target object characteristics aiming at the clustered classes;
clustering the historical target object characteristics and the new target object characteristics through an attention module and a graph convolution network to obtain a new clustering result;
and determining a class corresponding to the new target object characteristics based on the new clustering result and the clustered class.
It can be clearly understood by those skilled in the art that the implementation principle and technical effect of the clustering apparatus provided in the embodiment of the present application are the same as those of the foregoing method embodiments. For convenience and brevity of description, where a part of the apparatus embodiment is not mentioned, reference may be made to the corresponding content in the foregoing method embodiments, which is not repeated here.
The clustering apparatus provided in the embodiment of the present application can implement at least one of the above modules through an AI model. Functions associated with artificial intelligence (AI) may be performed by the non-volatile memory, the volatile memory, and the processor.
The processor may include one or more processors. The one or more processors may be general-purpose processors, such as a central processing unit (CPU) or an application processor (AP), graphics-dedicated processors, such as a graphics processing unit (GPU) or a vision processing unit (VPU), and/or AI-dedicated processors, such as a neural processing unit (NPU).
The one or more processors control the processing of the input data according to predefined operating rules or AI models stored in the non-volatile memory and the volatile memory. Predefined operating rules or artificial intelligence models are provided through training or learning.
Here, the provision by learning means that a predefined operation rule or an AI model having a desired characteristic is obtained by applying a learning algorithm to a plurality of learning data. This learning may be performed in the device itself in which the AI according to the embodiment is performed, and/or may be implemented by a separate server/system.
The AI model may include a plurality of neural network layers. Each layer has a plurality of weight values, and the calculation of one layer is performed from the calculation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), bidirectional recurrent deep neural networks (BRDNNs), generative adversarial networks (GANs), and deep Q-networks.
A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of learning data to make, allow, or control the target device to make a determination or prediction. Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
An embodiment of the present application further provides an electronic device, including: a processor and a memory, the memory storing at least one instruction, at least one program, set of codes or set of instructions, which is loaded and executed by the processor to implement the respective content of the aforementioned method embodiments.
Optionally, the electronic device may further comprise a transceiver. The processor is coupled to the transceiver, such as via a bus. It should be noted that the transceiver in practical application is not limited to one, and the structure of the electronic device does not constitute a limitation to the embodiments of the present application.
The processor may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like.
A bus may include a path that transfers information between the above components. The bus may be a PCI bus or an EISA bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The embodiment of the present application further provides a computer-readable storage medium, which is used for storing computer instructions, and when the computer instructions are executed on a computer, the computer can be used for executing the corresponding contents in the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless explicitly stated herein. Moreover, at least a portion of the steps in the flow chart of the figure may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed alternately or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (16)

1. A clustering method, comprising:
acquiring a characteristic subgraph corresponding to the characteristics of a target object to be clustered;
determining attention weights corresponding to all neighbor nodes of the feature subgraph respectively, and extracting features of the feature subgraph based on all the attention weights to obtain a feature extraction result;
and clustering the target object features according to the feature extraction result.
2. The clustering method according to claim 1, wherein the determining attention weights corresponding to respective neighboring nodes of the feature subgraph, and performing feature extraction on the feature subgraph based on the respective attention weights to obtain a feature extraction result comprises:
determining attention weights corresponding to all neighbor nodes of the feature subgraph respectively through an attention module, and carrying out weighting processing on the feature subgraph through all the attention weights to obtain a first feature;
performing feature extraction on the feature subgraph through at least one graph convolution network layer to obtain a second feature;
and fusing the first characteristic and the second characteristic to obtain a characteristic extraction result.
3. The clustering method according to claim 2, wherein when the graph convolutional network layer is at least two layers, starting from a second graph convolutional network layer, the neighbor nodes of a current layer are aggregated from the same neighbor node of a previous layer and the neighbor nodes of the same neighbor node.
4. The clustering method according to claim 2, wherein the fusing the first feature and the second feature to obtain a feature extraction result comprises:
fusing the first feature and the second feature to obtain a fused feature;
and adjusting the fused features based on each neighbor node to obtain a feature extraction result.
5. The clustering method according to claim 4, wherein the adjusting the merged features based on each neighboring node comprises:
adjusting the fused features based on each neighbor node through any one of the following neural networks:
a single-layer graph convolution network;
a single layer attention-based graph convolution network;
a single layer graph convolution network based on a contextual attention mechanism.
6. The clustering method according to any one of claims 1 to 5, wherein after clustering the target object features according to the feature extraction result, the method further comprises:
respectively constructing class subgraphs based on each class obtained by clustering;
and determining a degree matrix of the corresponding class subgraph for each class, and determining a noise node in the class according to the degree matrix and each node in the class.
7. The clustering method according to claim 6, wherein the determining the noise node in the class according to the degree matrix and each node in the class comprises:
determining each isolated node in the corresponding class subgraph according to the degree matrix;
respectively determining the average characteristics of other nodes except the isolated nodes aiming at each isolated node;
and determining whether each isolated node is a noise node or not according to each isolated node and the corresponding average characteristic thereof.
8. The clustering method according to claim 7, wherein the determining each isolated node in the corresponding class subgraph according to the degree matrix comprises:
and determining each node in the degree matrix with the degree being less than a first threshold value as each isolated node in the corresponding class subgraph.
9. The clustering method according to claim 7, wherein the determining whether each isolated node is a noise node according to each isolated node and its corresponding average feature comprises:
calculating the cosine distance between each isolated node and the corresponding average characteristic;
and when the cosine distance is smaller than a second threshold value, determining the corresponding isolated node as a noise node.
10. The clustering method according to any one of claims 1 to 9, wherein clustering the target object features according to the feature extraction result comprises:
determining the connection probability between the central node of the characteristic subgraph and each corresponding first-order neighbor node according to the characteristic extraction result;
and the central nodes with the connection probability larger than the third threshold value and the corresponding neighbor nodes are grouped into one type.
11. The clustering method according to any one of claims 1 to 10, wherein before the obtaining of the feature subgraph corresponding to the features of the target objects to be clustered, the method further comprises:
taking the target object characteristics as a central node, and determining at least one first-order neighbor node of the central node in other target object characteristics;
determining at least one neighbor node with the largest cosine distance with the first-order neighbor node as a next-order neighbor node for each first-order neighbor node in sequence;
and constructing the characteristic subgraph according to the central node and the determined at least two-stage neighbor nodes.
12. The clustering method of claim 11, wherein determining at least one first order neighbor node of the central node comprises:
and determining at least one neighbor node with the largest cosine distance from the central node, at least one neighbor node with the smallest cosine distance from the central node and at least one other neighbor node with any cosine distance as a first-order neighbor node of the central node.
13. The clustering method according to any one of claims 1 to 12, further comprising:
acquiring new target object characteristics to be clustered;
respectively acquiring a preset number of historical target object characteristics aiming at the clustered classes;
clustering the historical target object characteristics and the new target object characteristics through an attention module and a graph convolution network to obtain a new clustering result;
and determining a class corresponding to the new target object characteristics based on the new clustering result and the clustered class.
14. A clustering apparatus, comprising:
the acquisition module is used for acquiring a characteristic subgraph corresponding to the characteristics of the target object to be clustered;
the determining and extracting module is used for determining attention weights corresponding to all neighbor nodes of the feature subgraph respectively, and extracting features of the feature subgraph based on all the attention weights to obtain a feature extraction result;
and the clustering module is used for clustering the target object characteristics according to the characteristic extraction result.
15. An electronic device, comprising: a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of any of claims 1-13.
16. A computer-readable storage medium for storing a computer instruction, a program, a set of codes, or a set of instructions, which, when run on a computer, causes the computer to perform the method of any one of claims 1-13.
CN202011271335.9A 2020-11-13 2020-11-13 Clustering method, clustering device, electronic equipment and computer-readable storage medium Pending CN114494753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011271335.9A CN114494753A (en) 2020-11-13 2020-11-13 Clustering method, clustering device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011271335.9A CN114494753A (en) 2020-11-13 2020-11-13 Clustering method, clustering device, electronic equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114494753A true CN114494753A (en) 2022-05-13

Family

ID=81490706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011271335.9A Pending CN114494753A (en) 2020-11-13 2020-11-13 Clustering method, clustering device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114494753A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935027A (en) * 2023-01-19 2023-04-07 北京百度网讯科技有限公司 Data processing method of target object topological graph and training method of graph classification model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination