CN111814923A

CN111814923A - Image clustering method, system, device and medium

Info

Publication number: CN111814923A
Application number: CN202010944285.XA
Authority: CN
Inventors: 周曦; 姚志强; 凌英剑; 田国栋
Original assignee: Shanghai Yunconghuilin Artificial Intelligence Technology Co ltd
Current assignee: Shanghai Yunconghuilin Artificial Intelligence Technology Co ltd
Priority date: 2020-09-10
Filing date: 2020-09-10
Publication date: 2020-10-23
Anticipated expiration: 2040-09-10
Also published as: CN111814923B

Abstract

The invention provides an image clustering method, system, device and medium. The method comprises: acquiring a spanning tree according to the image features of an input image and the corresponding neighbor image features; scoring each edge in the spanning tree; and cutting edges in the spanning tree according to the scoring result to obtain clusters. By clustering through a tree structure and scoring the tree edges in combination with context subgraphs, the invention improves the robustness of the algorithm and the quality of image clustering.

Description

Image clustering method, system, device and medium

Technical Field

The present invention relates to the field of image processing, and in particular, to an image clustering method, system, device, and medium.

Background

In scenarios such as security monitoring and assisted labeling, a given batch of unlabeled face data needs to be clustered. Clustering means grouping the unlabeled face data so that, as far as possible, images of the same person fall in the same class and images of different people fall in different classes. Traditional clustering algorithms usually assume the number of image categories in advance based on prior knowledge, for example N categories, and then produce N clusters.

However, face data is very complex and easily affected by environmental factors; image noise and similar disturbances can cause face data to be clustered into the wrong pre-assumed category, degrading the clustering result. How to perform clustering without relying on prior assumptions is therefore an urgent problem.

Disclosure of Invention

In view of the problems in the prior art, the invention provides an image clustering method, system, device and medium, which mainly solve the problems that the traditional clustering algorithm is poor in effect and is easily influenced by environmental factors.

In order to achieve the above and other objects, the present invention adopts the following technical solutions.

An image clustering method, comprising:

acquiring a spanning tree according to the image characteristics of the input image and the corresponding adjacent image characteristics;

and scoring each edge in the spanning tree, and cutting the edge in the spanning tree according to a scoring result to obtain a clustering cluster.

Optionally, scoring each edge in the spanning tree includes: traversing each edge of the spanning tree, acquiring an edge-context-based subgraph according to each edge and its corresponding nodes in the spanning tree, and scoring the edges of the spanning tree according to the subgraph.

Optionally, acquiring the spanning tree according to the image features of the input image and the corresponding neighbor image features includes:

acquiring one or more neighbor image characteristics corresponding to a certain image characteristic through a neighbor retrieval algorithm;

and acquiring a maximum spanning tree according to the image characteristics and the corresponding adjacent image characteristics.

Optionally, the obtaining the edge context based sub-graph includes:

taking the nodes corresponding to each edge in the spanning tree as first-class nodes, and acquiring one or more neighbor image features of each first-class node as second-class nodes;

acquiring one or more neighbor image features of each second-class node as third-class nodes;

and constructing the subgraph of the corresponding image from the first-class nodes, the second-class nodes and the third-class nodes.

Optionally, a union of the first class node, the second class node and the third class node is used as a node of the subgraph; and taking the intersection of the neighbor node corresponding to a certain node in the subgraph and other nodes in the subgraph as the edge of the corresponding node.

Optionally, the neighbor search algorithm comprises hash search, IndexIVFFlat, IndexIVFPQ, IndexHNSWFlat; the method for obtaining the spanning tree of the corresponding image comprises one of the Kruskal algorithm and the Prim algorithm.

Optionally, the scoring the edges of the spanning tree according to the subgraph comprises: and inputting the subgraph into a graph convolution neural network, and scoring each edge in the spanning tree through the graph convolution neural network.

Optionally, the graph convolutional neural network includes at least one batch normalization layer, two convolutional layers, two feature splicing layers, and two full-connection layers.

Optionally, a loss function is constructed according to the probability that the two nodes connected by an edge in the spanning tree correspond to the same image; the graph convolution neural network is trained according to the loss function, and the score of each edge in the spanning tree is obtained through the trained graph convolution neural network.

Optionally, a score threshold is set, edges with scores lower than the score threshold are cut, and the remaining connected nodes in the spanning tree are used as the cluster.

Optionally, when the number of nodes in the cluster is greater than a preset value, clustering the cluster again to obtain a plurality of new clusters.

An image clustering system, comprising:

the tree generating module is used for acquiring a spanning tree according to the image characteristics of the input image and the corresponding adjacent image characteristics;

and the clustering module is used for scoring each edge in the spanning tree, cutting the edge in the spanning tree according to a scoring result and obtaining a clustering cluster.

Optionally, the system comprises a subgraph generation module, wherein one end of the subgraph generation module is connected with the tree generation module to obtain the spanning tree, each edge of the spanning tree is traversed, and a subgraph based on the edge context is obtained according to each edge and a corresponding node in the spanning tree; the other end of the subgraph generation module is connected with the clustering module, and the clustering module scores the edges of the spanning tree according to the subgraph output by the subgraph generation module.

Optionally, the tree generation module includes a neighbor feature obtaining unit, configured to acquire one or more neighbor image features corresponding to a given image feature through a neighbor retrieval algorithm,

and to acquire the spanning tree according to the image features and the corresponding neighbor image features.

Optionally, the subgraph generation module includes a subgraph node construction unit which, in obtaining the edge-context-based subgraph,

takes the nodes in the spanning tree as first-class nodes, and acquires one or more neighbor image features of each first-class node as second-class nodes.

Optionally, the clustering module includes a scoring unit, configured to input the subgraph into a graph convolution neural network, and score each edge in the spanning tree through the graph convolution neural network.

Optionally, the clustering module further includes a clipping unit, the clipping unit is connected to the scoring unit, and the clipping unit is configured to set a score threshold, clip an edge with a score lower than the score threshold, and use remaining connected nodes in the spanning tree as the clustering cluster.

An apparatus, comprising:

one or more processors; and

one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the image clustering method.

One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the image clustering method.

As described above, the image clustering method, system, device, and medium of the present invention have the following advantageous effects.

A spanning tree is constructed from the image features, yielding the connection relations between image features (that is, the edges of the spanning tree); clusters are then obtained by scoring the edges. The method does not rely on prior knowledge of the image category distribution, which helps improve the clustering quality and robustness.

Drawings

Fig. 1 is a flowchart of an image clustering method according to an embodiment of the present invention.

FIG. 2 is a block diagram of an image clustering system according to an embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a terminal device in an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a terminal device in another embodiment of the present invention.

Fig. 5 is a schematic diagram of a subgraph in an embodiment of the invention.

Fig. 6 is a schematic diagram of the network architecture of a graph convolutional neural network according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

Referring to FIG. 1, the present invention provides an image clustering method, which includes steps S01-S02.

In step S01, image features are acquired, and a spanning tree is acquired from the image features:

in one embodiment, the recognition model may be pre-trained using conventional neural network architectures such as convolutional neural network, long-short term memory neural network, deep neural network, and the like. And performing feature extraction on the input image through the identification model to obtain image features. Taking face recognition as an example, a face image can be input into an unsupervised convolutional neural network, a face recognition model is trained through the unmarked face image, and the face features of the face image to be recognized are extracted through the face recognition model. And human body images, animal images, vehicle images or other object images and the like can be collected for training the image recognition model, and different types of image features can be extracted.

In one embodiment, a neighbor retrieval algorithm may be applied to the acquired image features. The image feature corresponding to a given image can be designated as the retrieval target object, and one or more neighbor image features corresponding to the retrieval target object are obtained. The neighbor retrieval algorithm may include: hash search, IndexIVFFlat (inverted file with exact post-verification), IndexIVFPQ (coarse quantization, splitting into subspaces, and subspace-by-subspace comparison, converted to an inverted index), IndexHNSWFlat (hierarchical graph search), and the like.

The neighbor retrieval algorithm mainly computes the similarity between the retrieval target object and other image features. A similarity threshold can be set, and the image features that reach the threshold are taken as the neighbor image features of the retrieval target object. For example, in a hash search, image features may be converted into binary codes through a hash function; an exclusive-or operation is performed on the binary code of the retrieval target object and the binary codes of the other image features; the distance between two image features is determined by the number of 1s in the result; similarity is expressed through this distance; and the similarity threshold is then applied to obtain the neighbor image features of the retrieval target object.
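
The hash search described above can be sketched as follows. This is only a minimal illustration: the description does not specify a hash function, so sign-based random projection is assumed here, and the names `make_hyperplanes`, `hash_code`, `hamming_distance`, and `hash_neighbors`, as well as the `max_distance` parameter, are hypothetical. A smaller Hamming distance is treated as a higher similarity, so the similarity threshold appears as an upper bound on the distance.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw random hyperplanes (assumed hash function; not specified in the source)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(feature, hyperplanes):
    """Convert a feature vector into an integer whose bits form the binary code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(f * p for f, p in zip(feature, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """XOR the two binary codes and count the 1 bits in the result."""
    return bin(code_a ^ code_b).count("1")


def hash_neighbors(target, codes, max_distance):
    """Return indices of codes within max_distance of the target's code."""
    return [
        i for i, code in enumerate(codes)
        if i != target and hamming_distance(codes[target], code) <= max_distance
    ]
```

In use, every image feature would be hashed once with the same hyperplanes, and `hash_neighbors` would then be called for each retrieval target object.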

In one embodiment, the spanning tree of the retrieval target object may be obtained from the retrieval target object and its corresponding neighbor image features. The image features of the same retrieval target object in different images are connected through a tree structure. Taking a face as an example, if image A contains face feature a1 of the retrieval target object, and images B, C, and D contain face features a2, a3, and a4 of the retrieval target object in different scenes, respectively, then face features a1, a2, a3, and a4 may be connected by a spanning tree algorithm to form a tree structure, yielding a spanning tree. The spanning tree algorithm may be, but is not limited to, Prim's algorithm or Kruskal's algorithm.

In an embodiment, so that the spanning tree includes as many similar image features as possible, the maximum spanning tree of the retrieval target object may be obtained; that is, with the image features as nodes, the spanning tree connecting the nodes is chosen so that the sum of its edge weights is maximal. Taking Prim's algorithm as an example, it searches a connected graph for a subset of edges that forms a tree with the minimum sum of edge weights, that is, a minimum spanning tree. In practice, to obtain the maximum spanning tree, the negated edge weights may be input into Prim's algorithm.
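
The weight-negation approach described above can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the function name `maximum_spanning_tree`, the integer node ids, and the `(u, v, similarity)` edge format are assumptions made for the example.

```python
import heapq


def maximum_spanning_tree(n_nodes, weighted_edges):
    """Run Prim's algorithm on negated weights to get a maximum spanning tree.

    weighted_edges: iterable of (u, v, similarity) for an undirected graph with
    nodes 0..n_nodes-1. Returns a list of (u, v, similarity) tree edges. If the
    graph is disconnected, one tree is grown per connected component.
    """
    adjacency = {u: [] for u in range(n_nodes)}
    for u, v, w in weighted_edges:
        adjacency[u].append((-w, v))  # negate so the min-heap pops the largest weight
        adjacency[v].append((-w, u))

    visited = set()
    tree = []
    for start in range(n_nodes):
        if start in visited:
            continue
        visited.add(start)
        heap = [(neg_w, start, v) for neg_w, v in adjacency[start]]
        heapq.heapify(heap)
        while heap:
            neg_w, u, v = heapq.heappop(heap)
            if v in visited:
                continue
            visited.add(v)
            tree.append((u, v, -neg_w))
            for next_neg_w, nxt in adjacency[v]:
                if nxt not in visited:
                    heapq.heappush(heap, (next_neg_w, v, nxt))
    return tree
```

Here the edge weights would be the similarities produced by the neighbor retrieval step, so the resulting tree keeps the most similar connections.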

In an embodiment, the same batch of image data generally contains images of multiple categories. When the batch is unlabeled, a specified number of image features are randomly selected as retrieval target objects, and a maximum spanning tree is constructed for each retrieval target object. During neighbor retrieval, retrieval target objects can be merged or added according to the similarity results.

In an embodiment, after the spanning tree is obtained, an edge-context subgraph can be obtained from the edges and nodes of the spanning tree. The edge context of a given edge in the spanning tree consists of that edge, the edges connected to it, and the neighbor image features of the neighboring nodes. Specifically, the spanning tree may be traversed, and a given edge together with its two nodes is taken as the first-class nodes, denoted pivot nodes. One or more neighbor image features of the pivot nodes are acquired through the neighbor retrieval algorithm and taken as second-class nodes, denoted one-hop nodes. One or more neighbor image features of the one-hop nodes are then acquired through the neighbor retrieval algorithm and taken as third-class nodes, denoted two-hop nodes. Referring to Fig. 5, the largest nodes in the graph are the pivot nodes, the medium-sized nodes are the one-hop nodes, and the smallest nodes are the two-hop nodes. The union of the pivot nodes, one-hop nodes, and two-hop nodes is taken as the node set of the connected subgraph. The nodes of the subgraph are then traversed, and for each subgraph node, the intersection of its neighbor image features with the other nodes in the subgraph is taken as the edges of that node.
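
The subgraph construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `neighbors` dictionary stands in for the output of the neighbor retrieval algorithm, node ids are integers, and subgraph edges are stored as undirected pairs. The function name `build_edge_subgraph` is hypothetical.

```python
def build_edge_subgraph(edge, neighbors):
    """Build the context subgraph for one spanning-tree edge.

    edge: (u, v) pair of node ids.
    neighbors: dict mapping node id -> list of neighbor node ids (assumed to
    come from the neighbor retrieval step).
    Returns (sorted node list, sorted undirected edge list).
    """
    pivots = set(edge)

    one_hop = set()
    for pivot in pivots:
        one_hop.update(neighbors.get(pivot, []))

    two_hop = set()
    for node in one_hop:
        two_hop.update(neighbors.get(node, []))

    # Node set: union of pivot, one-hop and two-hop nodes.
    nodes = pivots | one_hop | two_hop

    # Edge set: each node's neighbors intersected with the other subgraph nodes.
    edges = set()
    for node in nodes:
        for other in set(neighbors.get(node, [])) & (nodes - {node}):
            edges.add((min(node, other), max(node, other)))  # store each edge once

    return sorted(nodes), sorted(edges)
```

Taking the union means a node reached by several routes, such as a pivot that is also a one-hop neighbor of the other pivot, appears only once in the subgraph.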

An adjacency matrix and a feature matrix for the corresponding spanning-tree edge are formed from the nodes and edges of the subgraph. The adjacency matrix represents the connections between nodes in the subgraph; the feature matrix represents the image features corresponding to the nodes.
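
As an illustration of this step, the sketch below turns a subgraph's nodes and edges into the two matrices. It assumes an undirected subgraph, a binary (0/1) adjacency matrix, and a hypothetical `features` dictionary mapping each node id to its feature vector; none of these details are fixed by the description.

```python
def build_matrices(nodes, edges, features):
    """Return (adjacency, feature_matrix) as nested lists, rows ordered as `nodes`.

    nodes: list of node ids in the subgraph.
    edges: list of (u, v) pairs between subgraph nodes.
    features: dict mapping node id -> feature vector (hypothetical input).
    """
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)

    adjacency = [[0] * size for _ in range(size)]
    for u, v in edges:
        adjacency[index[u]][index[v]] = 1
        adjacency[index[v]][index[u]] = 1  # assumed undirected

    feature_matrix = [list(features[node]) for node in nodes]
    return adjacency, feature_matrix
```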

In step S02, each edge in the spanning tree is scored, and the edges in the spanning tree are clipped according to the scoring result, so as to obtain a cluster.

In one embodiment, the adjacency matrix and the feature matrix obtained in step S01 may be input into a graph convolution neural network, and each edge in the spanning tree may be scored by the graph convolution neural network.

Referring to Fig. 6, in an embodiment, the graph convolutional neural network model may be trained in advance. The model includes at least one batch normalization layer (batchnormaize), two convolutional layers (GraphCatConv/graphreconv), two feature concatenation layers (Concat), and two fully connected layers (Classifier). The number of network layers can be adjusted according to actual conditions; for example, the convolutional part may use three layers and the fully connected part may use four layers.

Specifically, the propagation formula of the first convolutional layer can be expressed as:

where σ denotes a nonlinear activation function; A is the adjacency matrix; $\tilde{A}$ represents A + I, where I is the identity matrix; H is the feature matrix of each layer; $\tilde{D}$ is the degree matrix, which normalizes the adjacency matrix; and W is the weight matrix.
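
For reference, one propagation rule consistent with the symbol definitions above is the widely used graph convolution rule shown below. Whether the embodiment uses exactly this form is an assumption: the layer index l, the symmetric normalization, and the row-sum definition of the degree matrix are not stated in the text.

```latex
H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```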

The second convolution layer can be expressed as

In an embodiment, a loss function of the graph convolution neural network can be constructed according to the probability that the two nodes connected by an edge in the spanning tree correspond to the same image. The loss function can be expressed as:

where NLLLoss is the negative log-likelihood function; the node symbols in the formula denote the pivot node corresponding to the edge and the i-th one-hop node, respectively; and the constant coefficients in the formula can be set according to actual conditions.

Training data is obtained by performing feature extraction, neighbor retrieval, spanning tree construction, subgraph construction, and related operations on a batch of labeled images. The training data is input into the graph convolution neural network, with an edge of the spanning tree treated as a positive example if its two nodes correspond to the same image and as a negative example otherwise, and the graph convolution neural network model is obtained by training.
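
The labeling of training edges can be sketched as follows. This is a minimal illustration that assumes each labeled image carries a class label (for example a person identity) and interprets two nodes corresponding to the same image as two nodes sharing that label; the `identity` mapping and the function name `label_tree_edges` are hypothetical.

```python
def label_tree_edges(tree_edges, identity):
    """Label each spanning-tree edge as positive (1) or negative (0).

    tree_edges: iterable of (u, v) node-id pairs.
    identity: dict mapping node id -> label of the image (assumed annotation).
    An edge is positive when both endpoints share the same label.
    """
    return [(u, v, 1 if identity[u] == identity[v] else 0) for u, v in tree_edges]
```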

In one embodiment, the score for each edge in the spanning tree may be obtained by a graph convolution neural network model. Further, a score threshold value can be set, edges with scores lower than the score threshold value are selected for cutting, and the remaining connected nodes in the spanning tree are used as clustering clusters.
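
The threshold-based cutting step can be sketched as follows. This is a minimal sketch, assuming the edge scores have already been produced by the trained model; it uses a union-find structure to collect the nodes that remain connected, which is one possible way to do so and is not prescribed by the description. The function name `cut_and_cluster` is hypothetical.

```python
def cut_and_cluster(n_nodes, scored_edges, threshold):
    """Cut edges scoring below `threshold` and return the remaining clusters.

    scored_edges: iterable of (u, v, score) for spanning-tree edges over
    nodes 0..n_nodes-1. Returns a sorted list of clusters (lists of node ids).
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, score in scored_edges:
        if score >= threshold:  # edge survives the cut: merge its endpoints
            parent[find(u)] = find(v)

    clusters = {}
    for node in range(n_nodes):
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())
```

Because the input is a tree, removing k edges from a single tree yields exactly k + 1 clusters, and isolated nodes form clusters of size one.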

In an embodiment, a batch of image data may yield multiple clusters. All clusters are traversed, and when the number of nodes in a cluster exceeds a preset value, that cluster is clustered again using steps S01-S02 above, and each resulting smaller cluster is taken as a new cluster.
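
The re-clustering pass can be sketched as follows. In this minimal illustration, `recluster` is a hypothetical callable standing in for steps S01-S02 applied to the nodes of one cluster; the function name `refine_clusters` and the `max_size` parameter are likewise assumptions.

```python
def refine_clusters(clusters, max_size, recluster):
    """Re-cluster every cluster whose node count exceeds `max_size`.

    clusters: list of clusters, each a list of node ids.
    recluster: hypothetical callable standing in for steps S01-S02; it takes a
    list of node ids and returns a list of smaller clusters.
    """
    refined = []
    for cluster in clusters:
        if len(cluster) > max_size:
            refined.extend(recluster(cluster))  # each sub-cluster becomes a new cluster
        else:
            refined.append(cluster)
    return refined
```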

Referring to fig. 2, the present embodiment provides an image clustering system for implementing the image clustering method in the foregoing method embodiments. Since the technical principle of the system embodiment is similar to that of the method embodiment, repeated description of the same technical details is omitted.

In an embodiment, the image clustering system includes a tree generating module 10 and a clustering module 11, the tree generating module 10 is configured to assist in performing step S01 described in the foregoing method embodiment, and the clustering module 11 is configured to assist in performing step S02 described in the foregoing method embodiment.

In one embodiment, the system includes a subgraph generation module. One end of the subgraph generation module is connected with the tree generation module to obtain the spanning tree; each edge of the spanning tree is traversed, and an edge-context-based subgraph is obtained according to each edge and its corresponding nodes in the spanning tree. The other end of the subgraph generation module is connected with the clustering module, and the clustering module scores the edges of the spanning tree according to the subgraph output by the subgraph generation module.

In one embodiment, the tree generation module includes a neighbor feature obtaining unit, configured to acquire the neighbor image features corresponding to the image features of the input image.

In one embodiment, the subgraph generation module includes a subgraph node construction unit for obtaining the edge-context-based subgraph.

In one embodiment, the clustering module includes a scoring unit configured to input the subgraph into a graph convolutional neural network, and score each edge in the spanning tree through the graph convolutional neural network.

In an embodiment, the clustering module further includes a clipping unit, the clipping unit is connected to the scoring unit, the clipping unit is configured to set a score threshold, clip edges with scores lower than the score threshold, and use remaining connected nodes in the spanning tree as clustering clusters.

An embodiment of the present application further provides a device, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform the method of Fig. 1. In practical applications, the device may serve as a terminal device or as a server. Examples of the terminal device include: a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.

The present embodiment also provides a non-volatile readable storage medium in which one or more modules (programs) are stored; when the one or more modules are applied to a device, the device can execute the instructions of the steps included in the image clustering method of Fig. 1.

Fig. 3 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.

Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.

Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.

In this embodiment, the processor of the terminal device includes functions for executing each module of the image clustering system; for the specific functions and technical effects, refer to the above embodiments, which are not repeated here.

Fig. 4 is a schematic hardware structure diagram of a terminal device according to another embodiment of the present application. Fig. 4 is a specific embodiment of fig. 3 in an implementation process. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.

The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.

The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.

Optionally, the second processor 1201 is provided in the processing component 1200. The terminal device may further include: a communication component 1203, a power component 1204, a multimedia component 1205, a voice component 1206, an input/output interface 1207, and/or a sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.

The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the method illustrated in fig. 1 described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.

The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.

The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.

The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.

The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.

The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.

As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 4 can be implemented as the input device in the embodiment of fig. 3.

In summary, the image clustering method, system, device and medium of the present invention score the tree edges according to the distribution context of the pictures through the graph convolution neural network based on the clustering process flow of the tree structure; the maximum spanning tree comprises a large number of reliable edges, a large number of uncertain links are eliminated, and the error rate of the algorithm is reduced; the subgraph expands a large number of adjacent nodes on the basis of edge nodes, connects the adjacent nodes and contains the context information of the nodes; the graph convolution network scores through the edge-based context subgraphs, so that the robustness of the algorithm to errors and the scoring quality are improved, and the clustering effect is further optimized. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. An image clustering method, comprising:

acquiring a spanning tree according to the image characteristics of an input image and the adjacent image characteristics corresponding to the image characteristics of the input image; wherein, the image characteristics of the input image and the adjacent image characteristics are taken as nodes of the spanning tree; the neighbor image features are image features of which the similarity with the image features of the input image reaches a similarity threshold;

2. The image clustering method according to claim 1, wherein scoring each edge in the spanning tree comprises: traversing each edge of the spanning tree, acquiring an edge-context-based subgraph according to each edge and the corresponding nodes in the spanning tree, and scoring the edges of the spanning tree according to the subgraph.

3. The image clustering method according to claim 1, wherein the obtaining a spanning tree from the image feature of the input image and a neighboring image feature corresponding to the image feature of the input image includes:

and acquiring a maximum spanning tree according to the image characteristics and the adjacent image characteristics corresponding to the image characteristics of the input image.

4. The image clustering method according to claim 2, wherein obtaining the edge context based sub-graph comprises:

5. The image clustering method according to claim 4, characterized in that the union of the first class node, the second class node and the third class node is used as a node of the subgraph; and taking the intersection of the neighbor node corresponding to a certain node in the subgraph and other nodes in the subgraph as the edge of the corresponding node.

6. The image clustering method according to claim 3, wherein the neighbor search algorithm comprises Hash search, IndexIVFFlat, IndexIVFPQ and IndexHNSWFlat; the method for obtaining the spanning tree of the corresponding image comprises one of Kruskal's algorithm and Prim's algorithm.

7. The image clustering method of claim 2, wherein the scoring the edges of the spanning tree from the subgraph comprises: inputting the subgraph into a graph convolution neural network, and scoring each edge in the spanning tree through the graph convolution neural network.

8. The image clustering method according to claim 7, wherein the graph convolutional neural network comprises at least one batch normalization layer, two convolutional layers, two feature splicing layers and two fully connected layers.

9. The image clustering method according to claim 7, characterized in that neighboring image feature extraction is performed on labeled training image data, a spanning tree and a corresponding sub-graph are obtained according to the image and neighboring image features, the graph convolution neural network is trained by combining object attributes of two nodes of an edge of a sub-graph, and a score of a corresponding edge in the spanning tree is obtained through the trained graph convolution neural network.

10. The image clustering method according to claim 9, wherein a score threshold is set, edges with scores lower than the score threshold are clipped, and the remaining connected nodes in the spanning tree are taken as the cluster.

11. The image clustering method according to any one of claims 1 to 10, characterized in that when the number of nodes in the cluster is greater than a preset value, the cluster is clustered again to obtain a plurality of new clusters.

12. An image clustering system, comprising:

a tree generating module, configured to acquire a spanning tree according to the image characteristics of an input image and the adjacent image characteristics corresponding to the image characteristics of the input image; wherein, the image characteristics of the input image and the adjacent image characteristics are taken as nodes of the spanning tree; the neighbor image features are image features of which the similarity with the image features of the input image reaches a similarity threshold;

13. The image clustering system according to claim 12, comprising a subgraph generation module, wherein one end of the subgraph generation module is connected to the tree generation module to obtain the spanning tree, traverse each edge of the spanning tree, and obtain an edge context-based subgraph according to each edge and corresponding node in the spanning tree; the other end of the subgraph generation module is connected with the clustering module, and the clustering module scores the edges of the spanning tree according to the subgraph output by the subgraph generation module.

14. The image clustering system according to claim 13, wherein the tree generating module comprises a neighboring feature obtaining unit, configured to, when obtaining a spanning tree according to the image features of the input image and the corresponding neighboring image features,

15. The image clustering system according to claim 14, wherein the subgraph generation module comprises a subgraph node construction unit for obtaining the edge context based subgraph,

16. The image clustering system of claim 13, wherein the clustering module comprises a scoring unit configured to input the subgraph into a graph convolution neural network through which each edge in the spanning tree is scored.

17. The image clustering system of claim 16, wherein the clustering module further comprises a clipping unit, the clipping unit is connected to the scoring unit, and the clipping unit is configured to set a score threshold, clip edges with scores lower than the score threshold, and use the remaining connected nodes in the spanning tree as the clustering clusters.

18. An apparatus, comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-12.

19. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-12.