WO2022088390A1

WO2022088390A1 - Image incremental clustering method and apparatus, electronic device, storage medium and program product

Info

Publication number: WO2022088390A1
Application number: PCT/CN2020/134074
Authority: WO
Inventors: 刘凯鉴; 余世杰; 陈浩彬; 陈大鹏; 赵瑞
Original assignee: 浙江商汤科技开发有限公司
Priority date: 2020-10-30
Filing date: 2020-12-04
Publication date: 2022-05-05
Also published as: JP2023502863A; CN112257801B; CN112257801A; KR20220070482A; TW202217597A

Abstract

An image incremental clustering method and apparatus, an electronic device, a storage medium and a program product. Said method comprises: acquiring a first clustering cluster of a first image data set (S21); dividing the first clustering cluster into M first sub-clusters, and acquiring a first clustering center corresponding to each first sub-cluster among the M first sub-clusters, M being an integer greater than or equal to 1 (S22); and acquiring a second image data set, and combining the second image data set with the first clustering cluster by using the first clustering center (S23).

Description

Incremental clustering method, apparatus, electronic device, storage medium and program product for images

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is based on a Chinese patent application with an application number of 202011185911.8 and an application date of October 30, 2020, and claims the priority of the Chinese patent application, the entire contents of which are hereby incorporated by reference into the present disclosure.

Technical field

The embodiments of the present disclosure relate to the technical field of computer vision, and in particular, to a method and apparatus for incremental clustering of images, an electronic device, a storage medium, and a program product.

Background technique

The development of deep learning has greatly promoted the progress of image processing technology. Taking face recognition as an example, the face recognition model obtained through supervised learning has made a qualitative leap in recognition accuracy. When labeling image data, how to classify it accurately and quickly is still an issue worthy of discussion and research.

SUMMARY OF THE INVENTION

In view of the above problems, the present disclosure provides an incremental clustering method and apparatus for images, an electronic device, a storage medium and a program product, which help solve the problem that the clustering effect is degraded by drift of the cluster center during incremental clustering.

To achieve the above purpose, a first aspect of the embodiments of the present disclosure provides an incremental clustering method for images, the method comprising:

obtaining a first cluster of a first image data set; dividing the first cluster into M first sub-clusters, and obtaining a first cluster center corresponding to each of the M first sub-clusters, where M is an integer greater than or equal to 1; and obtaining a second image data set, and merging the second image data set with the first cluster by using the first cluster center.

With reference to the first aspect, in a possible implementation manner, the first cluster includes a first cluster A, a first cluster B, and a first cluster C; and merging the second image data set with the first cluster by using the first cluster center includes:

when the second image data set includes a plurality of image data, clustering the plurality of image data to obtain isolated image data and a second cluster; merging the isolated image data with the first cluster A by using the first cluster center; and merging the second cluster with the first cluster B by using the first cluster center; or, when the second image data set includes only a single image data, merging the single image data with the first cluster C by using the first cluster center.

In this way, the plurality of image data in the second image data set are clustered to obtain isolated image data and a second cluster, and the isolated image data, the second cluster, or the single image data are merged with the first cluster A, the first cluster B, and the first cluster C, respectively, so that an existing cluster can both absorb a single sample and be merged with another cluster.

With reference to the first aspect, in a possible implementation manner, the first cluster has a corresponding second cluster center; and before the second image data set is merged with the first cluster by using the first cluster center, the method further includes:

K first clusters are determined from the first clusters by using the second cluster centers.

With reference to the first aspect, in a possible implementation manner, the second cluster has a corresponding third cluster center; and determining K first clusters from the first clusters by using the second cluster center includes:

obtaining a first similarity between the isolated image data and the second cluster center, sorting the first clusters in descending order of the first similarity to obtain a first cluster sequence, and selecting the top K first clusters in the first cluster sequence; and obtaining a second similarity between the third cluster center and the second cluster center, sorting the first clusters in descending order of the second similarity to obtain a second cluster sequence, and selecting the top K first clusters in the second cluster sequence; or, obtaining a third similarity between the single image data and the second cluster center, sorting the first clusters in descending order of the third similarity to obtain a third cluster sequence, and selecting the top K first clusters in the third cluster sequence.

In this way, the first clusters are screened by using the calculated similarity between the second cluster center and each of the isolated image data, the third cluster center, and the single image data, which helps identify the first clusters whose cluster category is closer to that of the image data in the second image data set.

With reference to the first aspect, in a possible implementation manner, using the first cluster center to combine the isolated image data with the first cluster A includes:

obtaining a fourth similarity between the isolated image data and a first cluster center D, the first cluster center D being the first cluster center corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determining a first number of first cluster centers D in that first cluster whose fourth similarity is greater than a first threshold; determining the first cluster with the largest first number among the K first clusters as the first cluster A; and merging the isolated image data with the first cluster A.

In this way, the first cluster A is the cluster that contains the most first sub-clusters similar to the isolated image data, so merging the isolated image data into the first cluster A can make the clustering result more accurate.

With reference to the first aspect, in a possible implementation manner, using the first cluster center to merge the second cluster with the first cluster B includes:

dividing the second cluster into N second sub-clusters, and obtaining a fourth cluster center corresponding to each of the N second sub-clusters, N being an integer greater than or equal to 1; obtaining a fifth similarity between the fourth cluster center and a first cluster center E, the first cluster center E being the first cluster center corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determining a second number of first cluster centers E in that first cluster whose fifth similarity is greater than a second threshold; determining the first cluster with the largest second number among the K first clusters as the first cluster B; and merging the second cluster with the first cluster B.

In this way, the first cluster with the largest second number among the K first clusters is determined as the first cluster B; that is, the first cluster B contains the most first sub-clusters that are close to the second sub-clusters of the second cluster, so merging the second cluster into the first cluster B can make the clustering result more accurate.

With reference to the first aspect, in a possible implementation manner, the use of the first cluster center to combine the single image data with the first cluster C includes:

obtaining a sixth similarity between the single image data and a first cluster center F, the first cluster center F being the first cluster center corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determining a third number of first cluster centers F in that first cluster whose sixth similarity is greater than a third threshold; determining the first cluster with the largest third number among the K first clusters as the first cluster C; and merging the single image data with the first cluster C.

In this way, the first cluster C is the cluster that contains the most first sub-clusters similar to the single image data, so merging the single image data into the first cluster C can make the clustering result more accurate.

With reference to the first aspect, in a possible implementation manner, M is less than or equal to a fourth threshold; and after the second image data set is merged with the first cluster by using the first cluster center, the method further includes:

dividing the merged first cluster into R third sub-clusters, and obtaining a fifth cluster center of each of the R third sub-clusters, R being an integer greater than or equal to 1; when R is less than or equal to the fourth threshold, retaining the R third sub-clusters, and updating the first cluster center by using the fifth cluster centers corresponding to the R third sub-clusters; and when R is greater than the fourth threshold, obtaining a fourth number of image data in each of the R third sub-clusters, sorting the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, selecting the top P third sub-clusters in the fourth cluster sequence, and updating the first cluster center by using the fifth cluster centers corresponding to the P third sub-clusters, P being less than or equal to the fourth threshold.

In this way, when there are many sub-clusters, the number of sub-centers can be limited by retaining the sub-clusters with more image data, and the influence of outlier image data can be eliminated, so that a good clustering effect is still obtained in the case of quantitative clustering.

With reference to the first aspect, in a possible implementation manner, the first cluster is obtained by clustering the image data in the first image data set; and dividing the first cluster into the M first sub-clusters includes:

Obtain the seventh similarity between the image data in the first cluster to obtain a similarity matrix; and divide the first cluster into the M first sub-clusters based on the similarity matrix.

In this way, the first cluster can be divided into the M first sub-clusters by using the similarity matrix.

With reference to the first aspect, in a possible implementation manner, the dividing the first cluster into the M first sub-clusters based on the similarity matrix includes:

obtaining a connected graph whose vertices are the image data in the first cluster; querying the similarity matrix to obtain the seventh similarity between the vertices in the connected graph; and dividing a plurality of vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, so as to obtain the M first sub-clusters.

In this way, the plurality of vertices with the seventh similarity greater than the fifth threshold can be divided into a first sub-cluster by using the connectivity graph.

A second aspect of the embodiments of the present disclosure provides an apparatus for incremental clustering of images, and the apparatus includes:

a first obtaining module, configured to obtain a first cluster of a first image data set; a first segmentation module, configured to divide the first cluster into M first sub-clusters and obtain a first cluster center corresponding to each of the M first sub-clusters, M being an integer greater than or equal to 1; and a merging module, configured to obtain a second image data set and merge the second image data set with the first cluster by using the first cluster center.

A third aspect of the embodiments of the present disclosure provides an electronic device. The electronic device includes an input device and an output device, and further includes a processor adapted to implement one or more instructions, and a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the steps in any implementation of the first aspect above.

A fourth aspect of the embodiments of the present disclosure provides a computer storage medium, where the computer storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by a processor to perform the steps in any implementation of the first aspect above.

A fifth aspect of the embodiments of the present disclosure provides a computer program product, where the computer program product includes one or more instructions, and the one or more instructions are suitable for being loaded by a processor to perform the steps in any implementation of the first aspect above.

It can be seen that, in the embodiments of the present disclosure, a first cluster of a first image data set is obtained; the first cluster is divided into M first sub-clusters, and a first cluster center corresponding to each of the M first sub-clusters is obtained, M being an integer greater than or equal to 1; and a second image data set is obtained and merged with the first cluster by using the first cluster center. In this way, the first cluster is divided into a plurality of first sub-clusters, and the second image data set is merged with the first cluster based on the first cluster centers of the first sub-clusters. Maintaining a plurality of first cluster centers (that is, sub-centers) addresses the problem that, as image data accumulates, the cluster center of the first cluster (that is, the main center) is affected by new image data and drifts, which helps make the clustering result more accurate and improves the clustering effect. In addition, during clustering, the second image data set does not need to be compared for similarity against the entire first image data set, which helps reduce computational complexity.

Description of drawings

FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of a method for incremental clustering of images according to an embodiment of the present disclosure;

FIG. 3A is a schematic diagram of a connected graph of a first cluster according to an embodiment of the present disclosure;

FIG. 3B is a schematic diagram of dividing a first cluster into first sub-clusters according to an embodiment of the present disclosure;

FIG. 4A is a schematic diagram of a clustering result of a second image data set according to an embodiment of the present disclosure;

FIG. 4B is a schematic diagram of merging isolated image data with a first cluster according to an embodiment of the present disclosure;

FIG. 4C is a schematic diagram of merging a second cluster with a first cluster according to an embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of updating a first cluster center according to an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of another method for incremental clustering of images according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of an apparatus for incremental clustering of images according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Implementation

In order to enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present disclosure.

The appearances of the terms "comprising" and "having" and any variations thereof in this disclosure, the claims, and the drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but in some embodiments of the present disclosure also includes unlisted steps or units, or Other steps or units inherent to these processes, methods, products or devices are also included in some embodiments of the present disclosure. In addition, the terms "first", "second", "third", etc. are used to distinguish different objects and not to describe a specific order.

In actual scenarios, such as social media, security, etc., images are often generated incrementally, so incremental clustering has a wide range of applications in solving classification problems. Traditional incremental clustering needs to maintain some first clusters. However, different clusters have different degrees of sparseness. With the continuous progress of incremental clustering, the possibility of cluster center drift increases, and the clustering effect decreases.

An embodiment of the present disclosure proposes an incremental clustering method for image data, which can be implemented based on the application environment shown in FIG. 1. As shown in FIG. 1, the application environment mainly includes an image processing center 101 and an image acquisition device 102. The image processing center 101 includes, but is not limited to, a server 1011, a terminal and a database. In some scenarios, the image acquisition device 102 may be a camera deployed in scenes such as gate passages, shopping malls, and residential areas, and is used to collect images such as face images and video surveillance images. The image processing center 101 may be a monitoring center, and may introduce a video cloud node (Video Cloud Node, VCN) 1012 to manage the video monitoring, for example, by displaying the images on the display 1013 and storing the images in the database 1014 after clustering. In other scenarios, the image acquisition device 102 may be a user terminal, the images it collects may be photos taken by the user, for example, photos posted by the user on social media, and the image processing center may be the processing back end of the social media platform. The image acquisition device 102 can upload the collected images to the image processing center 101, and the image processing center 101 performs operations such as feature extraction, cluster classification, and face recognition. Images on the image acquisition device side are generated incrementally every day, and incremental clustering needs to maintain some clusters. As image data continues to grow and incremental clustering continues, the cluster centers of the originally maintained clusters are at risk of drifting, which gradually degrades the clustering effect. The server 1011 can therefore be used to execute the incremental clustering method proposed by the embodiments of the present disclosure, so as to solve the problem that the clustering effect is affected by drift of the cluster center during incremental clustering. The server 1011 may be an independent physical server, a server cluster or a distributed system, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, big data, and artificial intelligence platforms.

The incremental clustering method for images provided by the embodiments of the present disclosure will be described in detail below with reference to the related drawings.

FIG. 2 is a schematic flowchart of an image incremental clustering method provided by an embodiment of the present disclosure. The image incremental clustering method is applied to a server, as shown in FIG. 2 , including steps S21 to S23:

S21: Acquire the first cluster of the first image data set.

The first image data set refers to an image data set that was clustered into multiple clusters before the current batch of image data arrived. For example, if a newly uploaded batch of face image data is the current batch, the face image data uploaded to the server before it constitutes the first image data set. The first cluster is a cluster obtained by clustering the image data in the first image data set, and the clustering algorithm used may be a K-means clustering algorithm. It should be understood that each cluster has a corresponding cluster center, that is, the second cluster center.

S22: Divide the first cluster into M first sub-clusters, and obtain a first cluster center corresponding to each of the M first sub-clusters, where M is an integer greater than or equal to 1.

FIG. 3A is a schematic diagram of a connected graph of a first cluster according to an embodiment of the present disclosure. As shown in FIG. 3A, the diagram includes a first cluster 301 and a second cluster center 302, where the first cluster 301 is a cluster obtained by clustering the image data in the first image data set, and the second cluster center 302 is the cluster center corresponding to that cluster.

FIG. 3B is a schematic diagram of dividing a first cluster into first sub-clusters according to an embodiment of the present disclosure. As shown in FIG. 3B, the diagram includes the first cluster 301, the second cluster center 302, first sub-clusters 303 and first cluster centers 304, where each first sub-cluster 303 is a sub-cluster obtained by dividing the first cluster 301, and each first cluster center 304 is the cluster center of a first sub-cluster.

The first sub-clusters are the sub-clusters obtained by dividing the first cluster. For each first cluster in the first image data set, the similarity between the image data in the first cluster (that is, the seventh similarity) is obtained to form a similarity matrix, and a connected graph whose vertices are the image data in the first cluster is then obtained, as shown in FIG. 3A. For every two vertices in the connected graph, their similarity is looked up in the similarity matrix. If the threshold used when clustering the first image data set is X, that is, the fifth threshold, the multiple image data whose similarity is greater than X are divided into a more compact first sub-cluster, so that M first sub-clusters are obtained. As shown in FIG. 3B, the first cluster shown in FIG. 3A is divided into M first sub-clusters through analysis of the connected graph. After the M first sub-clusters are obtained, the cluster center of each of the M first sub-clusters, that is, the first cluster center, is obtained, so that each first cluster can be described by one main cluster center and M sub-cluster centers. Describing the first cluster with more compact sub-clusters helps solve the problem that the expressive power of a single main cluster center weakens as new image data is incorporated.
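The division in S22 can be illustrated with a minimal Python sketch. It is not the disclosed implementation: it assumes cosine similarity as the similarity measure, reads "analysis of the connected graph" as taking connected components over edges whose similarity exceeds the threshold, and uses the mean feature vector as the sub-cluster center. The names `cosine`, `split_into_subclusters` and `center` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def split_into_subclusters(features, threshold):
    """Split one cluster into sub-clusters.

    Assumption: a sub-cluster is a connected component of the graph whose
    edges join vertices with similarity greater than `threshold`.
    """
    n = len(features)
    # Pairwise similarity matrix (the "seventh similarity" in the text).
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    seen, subclusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        subclusters.append(sorted(component))
    return subclusters

def center(features, members):
    """Mean feature vector of the given members (assumed definition of a center)."""
    dim = len(features[0])
    return [sum(features[m][d] for m in members) / len(members) for d in range(dim)]

# Toy cluster with two visibly distinct groups of features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
subclusters = split_into_subclusters(features, threshold=0.9)
centers = [center(features, s) for s in subclusters]
```

With these toy features the cluster splits into two sub-clusters, each with its own sub-center, while the main center of the whole cluster would lie between them.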

S23: Acquire a second image data set, and combine the second image data set with the first cluster by using the first cluster center.

FIG. 4A is a schematic diagram of a clustering result of a second image data set provided by an embodiment of the present disclosure. As shown in FIG. 4A, the diagram includes a second image data set 401, a second cluster 402, isolated image data 403 and a third cluster center 404, where the second image data set 401 is the data set of the current batch of images uploaded by the image acquisition device; the second cluster 402 is a cluster obtained by clustering the image data in the second image data set; the isolated image data 403 is image data that was not assigned to any cluster; and the third cluster center 404 is the cluster center corresponding to each second cluster.

FIG. 4B is a schematic diagram of merging isolated image data with a first cluster according to an embodiment of the present disclosure. As shown in FIG. 4B, the diagram includes a first cluster A 405 and the isolated image data 403, where the first cluster A 405 is the first cluster A determined from the first clusters.

FIG. 4C is a schematic diagram of merging a second cluster with a first cluster according to an embodiment of the present disclosure. As shown in FIG. 4C, the diagram includes a first cluster B 406 and a second cluster 407, where the first cluster B 406 and the second cluster 407 belong to the same cluster category.

The second image data set is the data set of the current batch of images, obtained from the images uploaded by the image acquisition device. The first cluster includes a first cluster A, a first cluster B, and a first cluster C. When the second image data set includes multiple image data, the multiple image data are clustered to obtain a clustering result. The clustering result includes unclustered isolated image data and several second clusters, and each of the second clusters has a corresponding cluster center, that is, the third cluster center; see FIG. 4A. For the isolated image data, the first cluster A is determined from the first clusters, and the isolated image data is merged with the first cluster A by using the first cluster center. That is, as shown in FIG. 4B, the isolated image data is absorbed into the first cluster A, and the first cluster A and the isolated image data belong to the same cluster category. For each second cluster, the first cluster B is determined from the first clusters, and the second cluster is merged with the first cluster B by using the first cluster center. That is, as shown in FIG. 4C, clusters are merged with each other, and the first cluster B and the second cluster belong to the same cluster category. Similarly to the isolated image data, when the second image data set contains only a single image data, that is, the newly added image data is a single item, no clustering operation needs to be performed on the second image data set. The first cluster C is determined from the first clusters, the single image data is merged with the first cluster C by using the first cluster center, and the first cluster C and the single image data belong to the same cluster category.

In a possible implementation manner, before using the first cluster center to combine the second image data set with the first cluster, the method further includes:

Before the second image data set is merged with the first cluster, all the first clusters need to be preliminarily screened by using their second cluster centers, so that K first clusters are determined from all the first clusters; the above-mentioned first cluster A and first cluster B, or first cluster C, are then selected from the K first clusters. It should be noted that the K first clusters may be the top K after all the first clusters are sorted by using the second cluster centers, for example, the top 20 of 100 sorted first clusters; the K first clusters may also be all of the sorted first clusters, for example, all 100 first clusters are still selected after sorting. Using the second cluster center to preliminarily screen the first clusters helps identify the first clusters whose cluster category is closer to that of the image data in the second image data set, such as the above-mentioned first cluster A, first cluster B and first cluster C.

In a possible implementation manner, the determining K first clusters from the first clusters by using the second cluster center includes:

When the second image data set is clustered to obtain isolated image data and multiple second clusters, the first similarity between the isolated image data and the second cluster center of each first cluster is calculated; for each second cluster, the second similarity between its third cluster center and the second cluster center of each first cluster is calculated. All the first clusters are sorted in descending order of the first similarity and of the second similarity, respectively, to obtain the corresponding first cluster sequence and second cluster sequence, and the top K first clusters are then selected from each of the two sequences. When the second image data set includes only a single image data, the third similarity between the single image data and the second cluster center of each first cluster is calculated, the first clusters are sorted in descending order of the third similarity to obtain a third cluster sequence, and the top K first clusters are then selected from the third cluster sequence.
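The preliminary screening can be sketched as a simple ranking step. The snippet below is an illustration only; it assumes cosine similarity and feature vectors stored as plain lists, and the names `cosine`, `top_k_clusters` and `main_centers` are hypothetical. The query may be the feature of an isolated or single image, or the third cluster center of a second cluster.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_clusters(query, main_centers, k):
    """Rank existing clusters by similarity of their main center to `query`.

    main_centers maps a cluster id to its main (second) cluster center.
    Returns the ids of the k most similar clusters, most similar first.
    """
    ranked = sorted(
        main_centers,
        key=lambda cid: cosine(query, main_centers[cid]),
        reverse=True,
    )
    return ranked[:k]

# Toy main centers of three existing clusters.
main_centers = {"c1": [1.0, 0.0], "c2": [0.7, 0.7], "c3": [0.0, 1.0]}
candidates = top_k_clusters([0.9, 0.2], main_centers, k=2)
```

Setting `k` to the total number of clusters keeps every cluster as a candidate, which corresponds to the case in which all sorted first clusters are selected.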

In a possible implementation manner, the using the first cluster center to combine the isolated image data with the first cluster A includes:

To merge the isolated image data, the first cluster A needs to be determined from the selected top K first clusters. It should be noted that the top K first clusters may be all of the sorted first clusters. First, the similarity between the isolated image data and the cluster center of each first sub-cluster of each of the K first clusters (that is, the first cluster center D) is calculated and taken as the fourth similarity. The K first clusters are then analyzed to determine, for each first cluster, the number of first cluster centers D whose fourth similarity is greater than the first threshold, which is taken as the first number. The first cluster with the largest first number is determined as the first cluster A. For example, among the K first clusters, the first cluster 1 has 20 such first cluster centers D, the first cluster 2 has 18 such first cluster centers D, ..., and the first cluster K has 15 such first cluster centers D; the first cluster 1 has the largest number, so it is determined as the first cluster A. That is to say, the first cluster A contains the most first sub-clusters similar to the isolated image data, and merging the isolated image data into the first cluster A can make the clustering result more accurate.
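The selection of the first cluster A amounts to a vote over sub-centers, which the following Python sketch illustrates. It assumes cosine similarity; the names `cosine`, `pick_cluster_by_votes` and `sub_centers` are hypothetical. Returning `None` when no sub-center passes the threshold is also an assumption, since the text does not describe that case.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_cluster_by_votes(query, sub_centers, threshold):
    """Choose the candidate cluster with the most sub-centers similar to `query`.

    sub_centers maps a cluster id to the list of its sub-cluster centers.
    Returns (best_cluster_id, votes); best_cluster_id is None when no
    sub-center exceeds the threshold (assumption, not stated in the source).
    """
    votes = {
        cid: sum(1 for c in centers if cosine(query, c) > threshold)
        for cid, centers in sub_centers.items()
    }
    if not votes:
        return None, votes
    best = max(votes, key=votes.get)
    return (best if votes[best] > 0 else None), votes

# Two candidate clusters that survived the top-K screening.
sub_centers = {
    "c1": [[1.0, 0.0], [0.9, 0.3], [0.8, 0.6]],
    "c2": [[0.6, 0.8], [0.0, 1.0]],
}
best, votes = pick_cluster_by_votes([1.0, 0.1], sub_centers, threshold=0.9)
```

Here two sub-centers of `c1` exceed the threshold and none of `c2` do, so the isolated image would be absorbed into `c1`. The same routine applies unchanged to a single image and the third threshold.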

In a possible implementation manner, merging the second cluster with the first cluster B by using the first cluster center includes:

To merge clusters with each other, the first cluster B needs to be determined from the selected top K first clusters. It should be noted that the top K first clusters may be all of the sorted first clusters. First, each second cluster is divided into N second sub-clusters in the same way as the first cluster is divided, and the cluster center of each second sub-cluster, that is, the fourth cluster center, is calculated. The similarity between the fourth cluster center and the cluster center of each first sub-cluster of each of the K first clusters (that is, the first cluster center E) is then calculated and taken as the fifth similarity. The K first clusters are then analyzed to determine, for each first cluster, the number of first cluster centers E whose fifth similarity is greater than the second threshold, which is taken as the second number. The first cluster with the largest second number is determined as the first cluster B. For example, among the K first clusters, the first cluster 1 has 30 such first cluster centers E, the first cluster 2 has 15 such first cluster centers E, ..., and the first cluster K has 40 such first cluster centers E; the first cluster K has the largest number, so it is determined as the first cluster B. That is to say, the first cluster B contains the most first sub-clusters similar to the second sub-clusters of the second cluster, and merging the second cluster into the first cluster B can make the clustering result more accurate.
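Cluster-to-cluster matching can be sketched in the same spirit. The text leaves open exactly how the second number is counted when a second cluster has several sub-centers; the sketch below assumes that an existing sub-center counts once if it is similar to at least one sub-center of the new cluster. Cosine similarity and the names `cosine`, `pick_cluster_for_new_cluster`, `new_sub_centers` and `sub_centers` are likewise assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_cluster_for_new_cluster(new_sub_centers, sub_centers, threshold):
    """Choose the existing cluster that best matches a newly formed cluster.

    new_sub_centers: sub-centers of the new (second) cluster.
    sub_centers: cluster id -> sub-centers of each candidate existing cluster.
    Assumption: an existing sub-center is counted once if its similarity to
    any new sub-center exceeds `threshold`.
    """
    votes = {}
    for cid, centers in sub_centers.items():
        votes[cid] = sum(
            1
            for c in centers
            if any(cosine(n, c) > threshold for n in new_sub_centers)
        )
    if not votes:
        return None, votes
    best = max(votes, key=votes.get)
    # Returning None when nothing matches is an assumption, not from the source.
    return (best if votes[best] > 0 else None), votes

new_sub_centers = [[0.0, 1.0], [0.2, 1.0]]
sub_centers = {
    "c1": [[1.0, 0.0], [0.9, 0.3]],
    "c2": [[0.1, 1.0], [0.3, 0.9], [1.0, 0.2]],
}
best, votes = pick_cluster_for_new_cluster(new_sub_centers, sub_centers, threshold=0.95)
```

Only sub-centers are compared, so the cost depends on the number of sub-centers rather than on the number of images already clustered.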

In a possible implementation manner, merging the single image data with the first cluster C by using the first cluster center includes:

To merge single image data, the first cluster C needs to be determined from the selected top K first clusters. It should be noted that the top K first clusters may be all of the sorted first clusters. First, the similarity between the single image data and the cluster center (i.e., the first cluster center F) of each first sub-cluster of each first cluster in the K first clusters is calculated and determined as the sixth similarity. Then, for each of the K first clusters, the number of first cluster centers F whose sixth similarity is greater than the third threshold is determined as the third number, and the first cluster with the largest third number is determined as the first cluster C. In other words, the first cluster C contains the most first sub-clusters similar to the single image data, and merging the single image data into the first cluster C makes the clustering result more accurate.

In a possible implementation manner, M is less than or equal to a fourth threshold; after the second image data set and the first cluster are merged by using the first cluster center, as shown in FIG. 5, the method further includes:

S51, divide the merged first cluster into R third sub-clusters, and obtain the fifth cluster center of each third sub-cluster in the R third sub-clusters; R is an integer greater than or equal to 1;

S52, in the case that R is less than or equal to the fourth threshold, retain the R third sub-clusters, and update the first cluster center using the fifth cluster centers corresponding to the R third sub-clusters;

S53, in the case that R is greater than the fourth threshold, obtain the fourth number of image data in each of the R third sub-clusters;

S54, sort the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, select the first P third sub-clusters in the fourth cluster sequence, and update the first cluster center using the fifth cluster centers corresponding to the P third sub-clusters; P is less than or equal to the fourth threshold.

After the isolated image data, the second cluster, or the single image data is merged into a first cluster, the original first cluster contains newly clustered image data, so the sub-centers of the first cluster need to be updated. Specifically, the merged first cluster is divided into R third sub-clusters using the same method used to divide the first cluster, the fifth cluster center of each third sub-cluster is calculated, and the number R of third sub-clusters is checked. If the number of third sub-clusters is less than or equal to the fourth threshold (for example, 20), the R third sub-clusters are retained, and their fifth cluster centers are used as the new sub-centers of the merged first cluster to update the original first cluster centers. The merged first cluster is then described by the second cluster center and the R fifth cluster centers.

In addition, if the number of third sub-clusters is greater than the fourth threshold, the R third sub-clusters are sorted in descending order of the number of image data in each third sub-cluster (that is, the fourth number) to obtain the fourth cluster sequence, and the first P third sub-clusters are selected and retained. For example, only the first 20 third sub-clusters are kept and the remaining third sub-clusters are discarded. The fifth cluster centers of the P third sub-clusters are used as the new sub-centers of the merged first cluster, and the merged first cluster is then described by the second cluster center and the P fifth cluster centers. It should be understood that each time a cluster is divided into sub-clusters, only a preset number of sub-clusters are retained; therefore, both M and N are less than or equal to the fourth threshold. When there are many sub-clusters, retaining the sub-clusters with more image data limits the number of sub-centers and eliminates the influence of outlier image data, which is not only easy to maintain but also gives a good clustering effect in long-term, large-scale incremental clustering scenarios.
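Steps S51 to S54 can be illustrated with the short sketch below. It assumes the merged cluster has already been divided into third sub-clusters, and it assumes that a sub-center is the mean of the features in its sub-cluster; the text above does not fix how the fifth cluster center is computed. The function names and the default of 20 for the fourth threshold (taken from the example in the text) are illustrative.

```python
from typing import List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_sub_centers(
    third_sub_clusters: List[List[Sequence[float]]],
    fourth_threshold: int = 20,
) -> List[List[float]]:
    """Return the new sub-centers of a merged cluster.

    If there are at most fourth_threshold sub-clusters, all are kept (S52).
    Otherwise only the largest fourth_threshold sub-clusters are kept
    (S53, S54), which drops small, outlier-like sub-clusters.
    """
    non_empty = [sc for sc in third_sub_clusters if sc]
    # Sort by the number of image features, largest first. sorted() is
    # stable, so equally sized sub-clusters keep their original order.
    ranked = sorted(non_empty, key=len, reverse=True)
    kept = ranked[:fourth_threshold]
    return [mean_vector(sc) for sc in kept]
```

Here P is simply taken equal to the fourth threshold whenever R exceeds it; the text only requires that P not exceed the fourth threshold.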

Please refer to FIG. 6. FIG. 6 is a schematic flowchart of another image incremental clustering method provided by an embodiment of the present disclosure, as shown in FIG. 6, including steps S61 to S66:

S61, obtaining the first cluster of the first image data set;

S62, divide the first cluster into M first sub-clusters, and obtain a first cluster center corresponding to each first sub-cluster in the M first sub-clusters; M is an integer greater than or equal to 1;

S63, obtaining a second image data set;

S64, in the case that the second image data set includes multiple image data, perform clustering on the multiple image data to obtain isolated image data and a second cluster;

S65, using the first cluster center to merge the isolated image data with the first cluster A; and using the first cluster center to merge the second cluster with the first cluster B;

S66, in the case that there is only a single image data in the second image data set, use the first cluster center to combine the single image data with the first cluster C.

The implementations of the above steps S61 to S66 have been described in the embodiments shown in FIG. 2 to FIG. 5 , and can achieve the same or similar beneficial effects, and will not be repeated here.

Breakthroughs in deep learning research continue to advance face recognition technology, and face recognition models obtained through supervised learning keep improving. However, when faced with a large amount of unlabeled face data, how to classify it accurately and quickly is a problem of enormous economic and social value.

In practical scenarios such as social media and security, the amount of image data is often large and new data is generated incrementally every day, so incremental clustering methods have greater practical value. An incremental clustering method needs to maintain a set of clusters during the clustering process. Traditional clustering algorithms describe a cluster with a single cluster center, for example by taking the mean of all sample features in the cluster. However, different clusters have different degrees of sparseness, so a single mean-based cluster center easily loses the rich sample information inside the cluster, and as incremental clustering continues, the clustering effect gradually degrades.

In practical face clustering, the facial features of different people are distributed differently in the feature space: the samples in some clusters are relatively compact, while the samples in other clusters may be relatively loose. If a single center is used to describe a cluster, the internal information of the cluster is lost. As incremental clustering proceeds, the influence of the existing samples keeps decreasing, and the risk of center drift increases.

An image incremental clustering method provided by an embodiment of the present disclosure includes the steps of:

S67, calculate the similarity between the samples of a cluster, and divide the cluster into several more compact sub-clusters;

By calculating the similarity between the samples of a cluster, the similarity matrix S can be obtained. Assuming that the threshold used for clustering is λ, a higher threshold λ' needs to be set, that is, λ' > λ, so that the cluster is split into several tighter sub-clusters.

Clusters can be analyzed using connected-graph analysis to obtain a multi-center description of each cluster. A similarity matrix is calculated for the cluster. By using a similarity threshold higher than the one used for clustering, the cluster can be divided into several more compact sub-clusters, which yields multiple sub-cluster centers; together with the center of the whole cluster, which serves as the main center, they constitute the multi-center description of the cluster.

Here, obtaining multiple sub-centers through the connected-graph-based multi-center design includes: for each cluster, setting a higher threshold (which needs to be higher than the clustering threshold) so that the cluster is broken up into several more compact connected sub-graphs, and calculating a sub-center for each connected sub-graph, so that multiple sub-centers are obtained.
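The splitting of one cluster into connected sub-graphs under the higher threshold λ' can be sketched as follows. This is a minimal sketch under stated assumptions: the similarity matrix S is given as a symmetric list of lists, two samples are linked when their similarity is strictly greater than λ', and the connected components are found with a union-find structure, which is one common technique and not necessarily the one used in the disclosure. The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def split_into_sub_clusters(
    sim: Sequence[Sequence[float]],
    higher_threshold: float,
) -> List[List[int]]:
    """Group sample indices of one cluster into connected sub-graphs.

    sim[i][j] is the similarity between samples i and j of the cluster.
    An edge exists between i and j when sim[i][j] > higher_threshold,
    where higher_threshold plays the role of λ' (λ' > λ).
    """
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        # Path-halving find: follow parents up to the root.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > higher_threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Each returned index list is one sub-cluster; a sub-center can then be computed per list, for instance as the mean feature of its members (an assumption, since the text does not fix the formula).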

S68, during the incremental clustering process, whenever a new batch of data is added, cluster the new data once to generate a number of clusters and unclustered isolated samples;

S69, merge the generated clusters and unclustered isolated samples with the existing clustering results obtained in step S67.

A multi-center incremental clustering method based on a single main center and multiple sub-centers: once the main center and the multiple sub-centers have been obtained, during incremental clustering the main centers are first used together with the new data to perform a TopK search as a coarse screening, and the multiple sub-centers are then used to further determine whether to absorb new samples or other clusters.

Cluster merging involves both merging between clusters and absorbing individual isolated samples into clusters. To absorb an isolated sample point under the multi-center design, a lower threshold is first set and the main centers are used to retrieve the TopK clusters; it is then checked whether the similarity between each sub-center and the sample point meets the clustering threshold λ. Multiple clusters may meet this requirement for the same isolated sample point, in which case the cluster with the largest number of qualifying sub-centers is used as the target cluster. When merging between clusters, a lower threshold is likewise used to screen and retrieve the TopK clusters, and it is then checked whether there are sub-center pairs between the clusters that meet the threshold requirement. When multiple clusters meet the requirement, the cluster with the largest number of qualifying sub-centers is used as the target cluster.

With the multi-center-based incremental clustering architecture, the single main center and the multiple sub-centers of the multi-center mechanism are used together. During the TopK nearest neighbor search, the main center is used to calculate similarity; the similarity between the multiple sub-centers and the single sample or cluster to be processed is then calculated to further determine whether to absorb the single sample or merge the clusters. This architecture takes full advantage of the multi-center representation and can improve the clustering effect without adding much computational complexity.
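The coarse TopK stage that uses only the main centers can be sketched as below. It is an illustrative sketch, assuming cosine similarity, a brute-force scan instead of an approximate nearest neighbor index, and hypothetical names (`top_k_clusters`, `coarse_threshold`); the lower screening threshold corresponds to the "lower threshold" mentioned in the text.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_clusters(
    query: Sequence[float],
    main_centers: Dict[str, Sequence[float]],
    k: int,
    coarse_threshold: float = 0.0,
) -> List[str]:
    """Return up to k cluster ids whose main center is most similar to query.

    query is either the feature of a single sample or the main center of a
    newly formed cluster. Clusters at or below coarse_threshold are dropped.
    """
    scored = [(cosine(query, center), cid) for cid, center in main_centers.items()]
    passed = [item for item in scored if item[0] > coarse_threshold]
    # Stable sort by similarity, highest first.
    ranked = sorted(passed, key=lambda item: item[0], reverse=True)
    return [cid for _, cid in ranked[:k]]
```

Only the clusters returned by this stage need to have their sub-centers compared with the query, which keeps the number of sub-center comparisons bounded by K times the maximum number of sub-centers per cluster.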

When clusters are merged or new samples are added, the sub-centers need to be updated. To simplify the calculation, the update can be modeled as a clustering of the sub-centers, so that the sub-centers are merged and updated. At the same time, to prevent the number of sub-centers from growing too large, the sub-centers can be sorted in descending order of the number of sample points they represent, and, for example, at most the first 20 sub-centers are kept.

With the incremental update method for cluster multi-centers, as the amount of data keeps growing in practical scenarios, merging and updating the sub-centers and limiting their number prevents the number of sub-centers from increasing continuously, which would impose an excessive computational and storage burden, and also reduces the influence of outlier interference points.

In the embodiments of the present disclosure, the complex situation of face clustering under large-scale data is fully considered.

Firstly, a multi-center construction method for face clusters is proposed, which can be used to obtain a description of a face cluster consisting of a single main center and multiple sub-centers. It addresses the problem that describing a cluster by maintaining a single cluster center ignores the information of compact sub-clusters inside the cluster. As data keeps being added, a single cluster center is continuously influenced by new samples, which creates a risk of center drift, while the influence of the existing samples in the cluster keeps weakening, reducing the expressive power of the center. A single cluster center also loses the sample information inside the cluster during incremental clustering. In incremental clustering, a single cluster center is usually maintained for each cluster; as data is continuously added, the cluster center is used to calculate the similarity with new samples or clusters in order to merge and update the clusters, and the cluster center itself is continuously updated. As data keeps being added, a single cluster center gradually loses the rich sample information within the cluster and is also prone to drift, which degrades the clustering effect over time.

Secondly, a multi-center-based incremental clustering architecture is proposed. With this architecture, the computational complexity and the clustering accuracy of incremental clustering using a multi-center representation can be well balanced when merging samples and clusters. This solves the problem that, in large-scale data scenarios, the multi-center setting of the prior art greatly affects the computing speed and storage of clustering.

Finally, a multi-center incremental update method is proposed. Through merged updates between sub-centers and a limit on the number of sub-centers, this method achieves a good clustering effect in long-term, large-scale incremental clustering scenarios. Based on this method, the growth in the number of centers can be limited and the influence of outliers can be eliminated at the same time. This solves the problems in the prior art that, because face image features generally have high dimensionality, maintaining multiple centers multiplies the memory pressure and multiplies the additional computation during the TopK nearest neighbor search.

Based on the description of the method embodiment shown in FIG. 2 or FIG. 6, an embodiment of the present disclosure further provides an apparatus for incremental clustering of images. Please refer to FIG. 7. FIG. 7 is a schematic structural diagram of an image incremental clustering apparatus provided by an embodiment of the present disclosure. As shown in FIG. 7, the apparatus includes:

The first acquisition module 71 is configured to acquire the first cluster of the first image data set;

A first dividing module 72, configured to divide the first cluster into M first sub-clusters, and obtain a first cluster center corresponding to each first sub-cluster in the M first sub-clusters; M is an integer greater than or equal to 1;

The merging module 73 is configured to obtain a second image data set, and use the first cluster center to merge the second image data set and the first cluster.

In a possible implementation manner, the first cluster includes a first cluster A, a first cluster B and a first cluster C; in terms of merging the second image data set with the first cluster by using the first cluster center, the merging module 73 is configured to: in the case that the second image data set includes a plurality of image data, cluster the plurality of image data to obtain isolated image data and a second cluster; merge the isolated image data with the first cluster A by using the first cluster center; and merge the second cluster with the first cluster B by using the first cluster center; in the case that there is only a single image data in the second image data set, merge the single image data with the first cluster C by using the first cluster center.

In a possible implementation manner, the first cluster has a corresponding second cluster center; before the second image data set is merged with the first cluster by using the first cluster center, the merging module 73 is further configured to: determine K first clusters from the first clusters by using the second cluster center.

In a possible implementation manner, the second cluster has a corresponding third cluster center; in terms of determining K first clusters from the first clusters by using the second cluster center, the merging module 73 is configured to: obtain a first similarity between the isolated image data and the second cluster center; sort the first clusters in descending order of the first similarity to obtain a first cluster sequence, and select the top K first clusters in the first cluster sequence; and obtain a second similarity between the third cluster center and the second cluster center; sort the first clusters in descending order of the second similarity to obtain a second cluster sequence, and select the top K first clusters in the second cluster sequence; or, obtain a third similarity between the single image data and the second cluster center; sort the first clusters in descending order of the third similarity to obtain a third cluster sequence, and select the top K first clusters in the third cluster sequence.

In a possible implementation manner, in terms of merging the isolated image data with the first cluster A by using the first cluster center, the merging module 73 is configured to: obtain a fourth similarity between the isolated image data and the first cluster center D; the first cluster center D is the first cluster center corresponding to each first sub-cluster of each first cluster in the K first clusters; for each first cluster in the K first clusters, determine the first number of first cluster centers D whose fourth similarity is greater than the first threshold; determine the first cluster with the largest first number among the K first clusters as the first cluster A; merge the isolated image data with the first cluster A.

In a possible implementation manner, in terms of merging the second cluster with the first cluster B by using the first cluster center, the merging module 73 is configured to: divide the second cluster into N second sub-clusters, and obtain the fourth cluster center corresponding to each second sub-cluster in the N second sub-clusters; N is an integer greater than or equal to 1; obtain a fifth similarity between the fourth cluster center and the first cluster center E; the first cluster center E is the first cluster center corresponding to each first sub-cluster of each first cluster in the K first clusters; for each first cluster in the K first clusters, determine the second number of first cluster centers E whose fifth similarity is greater than the second threshold; determine the first cluster with the largest second number among the K first clusters as the first cluster B; merge the second cluster with the first cluster B.

In a possible implementation manner, in terms of merging the single image data with the first cluster C by using the first cluster center, the merging module 73 is configured to: obtain a sixth similarity between the single image data and the first cluster center F; the first cluster center F is the first cluster center corresponding to each first sub-cluster of each first cluster in the K first clusters; for each first cluster in the K first clusters, determine the third number of first cluster centers F whose sixth similarity is greater than the third threshold; determine the first cluster with the largest third number among the K first clusters as the first cluster C; merge the single image data with the first cluster C.

In a possible implementation manner, M is less than or equal to a fourth threshold; the first dividing module 72 is further configured to: divide the merged first cluster into R third sub-clusters, and obtain the fifth cluster center of each third sub-cluster in the R third sub-clusters; R is an integer greater than or equal to 1; in the case that R is less than or equal to the fourth threshold, retain the R third sub-clusters, and update the first cluster center using the fifth cluster centers corresponding to the R third sub-clusters; in the case that R is greater than the fourth threshold, obtain the fourth number of image data in each of the R third sub-clusters; sort the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, select the first P third sub-clusters in the fourth cluster sequence, and update the first cluster center using the fifth cluster centers corresponding to the P third sub-clusters; P is less than or equal to the fourth threshold.

In a possible implementation manner, in terms of dividing the first cluster into M first sub-clusters, the first dividing module 72 is configured to: obtain the seventh similarity between the image data in the first cluster to obtain a similarity matrix; divide the first cluster into the M first sub-clusters based on the similarity matrix.

In a possible implementation manner, in terms of dividing the first cluster into the M first sub-clusters based on the similarity matrix, the first dividing module 72 is configured to: obtain a connected graph whose vertices are the image data in the first cluster; obtain the seventh similarity between the vertices in the connected graph by querying the similarity matrix; divide the vertices whose seventh similarity is greater than the fifth threshold into one first sub-cluster, to obtain the M first sub-clusters.

According to an embodiment of the present disclosure, the units of the image incremental clustering apparatus shown in FIG. 7 may be separately or entirely combined into one or several other units, or one or more of the units may be further divided into multiple functionally smaller units, which can realize the same operations without affecting the technical effects of the embodiments of the present disclosure. The above units are divided based on logical functions. In practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present disclosure, the image incremental clustering apparatus may also include other units, and in practical applications these functions may also be implemented with the assistance of other units and through the cooperation of multiple units.

According to another embodiment of the present disclosure, a computer program (including program code) capable of executing the steps of the corresponding method shown in FIG. 2 or FIG. 6 may be run on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), so as to construct the image incremental clustering apparatus shown in FIG. 7 and to implement the image incremental clustering method of the embodiments of the present disclosure. The computer program can be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and executed therein.

Based on the descriptions of the foregoing method embodiments and apparatus embodiments, the embodiments of the present disclosure further provide an electronic device. Referring to FIG. 8 , the electronic device includes at least a processor 81 , an input device 82 , an output device 83 and a computer storage medium 84 . The processor 81 , the input device 82 , the output device 83 and the computer storage medium 84 in the electronic device may be connected through a bus or other means.

The computer storage medium 84 may be stored in the memory of the electronic device. The computer storage medium 84 is configured to store a computer program including program instructions, and the processor 81 is configured to execute the program instructions stored in the computer storage medium 84. The processor 81 (or CPU, Central Processing Unit) is the computing core and control core of the electronic device; it is suitable for implementing one or more instructions, and in particular for loading and executing one or more instructions to implement the corresponding method flow or function.

In one embodiment, the processor 81 of the electronic device provided by the embodiment of the present disclosure may be configured to perform incremental clustering processing of a series of images:

obtaining the first cluster of the first image data set;

dividing the first cluster into M first sub-clusters, and obtaining the first cluster center corresponding to each first sub-cluster in the M first sub-clusters, M being an integer greater than or equal to 1; obtaining a second image data set, and merging the second image data set with the first cluster by using the first cluster center.

In yet another embodiment, the first cluster includes a first cluster A, a first cluster B and a first cluster C; the processor 81 merging the second image data set with the first cluster by using the first cluster center includes: in the case that the second image data set includes a plurality of image data, clustering the plurality of image data to obtain isolated image data and a second cluster; merging the isolated image data with the first cluster A by using the first cluster center; and merging the second cluster with the first cluster B by using the first cluster center; in the case that there is only a single image data in the second image data set, merging the single image data with the first cluster C by using the first cluster center.

In yet another embodiment, the first cluster has a corresponding second cluster center; before merging the second image data set with the first cluster by using the first cluster center, the processor 81 is further configured to perform: determining K first clusters from the first clusters by using the second cluster center.

In yet another embodiment, the second cluster has a corresponding third cluster center; the processor 81 determining K first clusters from the first clusters by using the second cluster center includes: obtaining a first similarity between the isolated image data and the second cluster center; sorting the first clusters in descending order of the first similarity to obtain a first cluster sequence, and selecting the top K first clusters in the first cluster sequence; and obtaining a second similarity between the third cluster center and the second cluster center; sorting the first clusters in descending order of the second similarity to obtain a second cluster sequence, and selecting the top K first clusters in the second cluster sequence; or, obtaining a third similarity between the single image data and the second cluster center; sorting the first clusters in descending order of the third similarity to obtain a third cluster sequence, and selecting the top K first clusters in the third cluster sequence.

In yet another embodiment, the processor 81 merging the isolated image data with the first cluster A by using the first cluster center includes: obtaining a fourth similarity between the isolated image data and the first cluster center D; the first cluster center D is the first cluster center corresponding to each first sub-cluster of each first cluster in the K first clusters; for each first cluster in the K first clusters, determining the first number of first cluster centers D whose fourth similarity is greater than the first threshold; determining the first cluster with the largest first number among the K first clusters as the first cluster A; merging the isolated image data with the first cluster A.

In yet another embodiment, the processor 81 merging the second cluster with the first cluster B by using the first cluster center includes: dividing the second cluster into N second sub-clusters, and obtaining the fourth cluster center corresponding to each second sub-cluster in the N second sub-clusters; N is an integer greater than or equal to 1; obtaining a fifth similarity between the fourth cluster center and the first cluster center E; the first cluster center E is the first cluster center corresponding to each first sub-cluster of each first cluster in the K first clusters; for each first cluster in the K first clusters, determining the second number of first cluster centers E whose fifth similarity is greater than the second threshold; determining the first cluster with the largest second number among the K first clusters as the first cluster B; merging the second cluster with the first cluster B.

In yet another embodiment, the processor 81 merging the single image data with the first cluster C by using the first cluster center includes: obtaining a sixth similarity between the single image data and the first cluster center F; the first cluster center F is the first cluster center corresponding to each first sub-cluster of each first cluster in the K first clusters; for each first cluster in the K first clusters, determining the third number of first cluster centers F whose sixth similarity is greater than the third threshold; determining the first cluster with the largest third number among the K first clusters as the first cluster C; merging the single image data with the first cluster C.

In yet another embodiment, M is less than or equal to a fourth threshold; after merging the second image data set with the first cluster by using the first cluster center, the processor 81 is further configured to perform: dividing the merged first cluster into R third sub-clusters, and obtaining the fifth cluster center of each third sub-cluster in the R third sub-clusters; R is an integer greater than or equal to 1; in the case that R is less than or equal to the fourth threshold, retaining the R third sub-clusters, and updating the first cluster center using the fifth cluster centers corresponding to the R third sub-clusters; in the case that R is greater than the fourth threshold, obtaining the fourth number of image data in each of the R third sub-clusters; sorting the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, selecting the first P third sub-clusters in the fourth cluster sequence, and updating the first cluster center using the fifth cluster centers corresponding to the P third sub-clusters; P is less than or equal to the fourth threshold.

In yet another embodiment, the first cluster is obtained by clustering the image data in the first image data set; the processor 81 dividing the first cluster into M first sub-clusters includes: obtaining the seventh similarity between the image data in the first cluster to obtain a similarity matrix; dividing the first cluster into the M first sub-clusters based on the similarity matrix.

In yet another embodiment, when dividing the first cluster into the M first sub-clusters based on the similarity matrix, the processor 81 is configured to: obtain a connected graph whose vertices are the image data in the first cluster; obtain the seventh similarity between vertices in the connected graph by querying the similarity matrix; and divide a plurality of vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, so as to obtain the M first sub-clusters.
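By way of a non-limiting illustration, one possible reading of the division based on the similarity matrix is sketched below: vertices joined, directly or through a chain of other vertices, by similarities greater than the threshold are placed in the same sub-cluster. Treating the sub-clusters as connected components of the thresholded graph is an assumption of this sketch, and the name `split_into_subclusters` is hypothetical.

```python
import numpy as np

def split_into_subclusters(sim, threshold):
    """Group the items of one cluster into sub-clusters.

    sim: symmetric (n, n) similarity matrix for the items of the cluster.
    Two items land in the same sub-cluster if they are linked by a chain
    of pairs whose similarity is greater than `threshold`.
    Returns a sorted list of sub-clusters, each a list of item indices.
    """
    n = sim.shape[0]
    parent = list(range(n))  # union-find forest over the vertices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Look up each pair in the similarity matrix and union similar pairs.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical similarity matrix for a cluster of four images.
sim = np.array([[1.00, 0.90, 0.10, 0.20],
                [0.90, 1.00, 0.15, 0.10],
                [0.10, 0.15, 1.00, 0.85],
                [0.20, 0.10, 0.85, 1.00]])
subclusters = split_into_subclusters(sim, 0.8)
```

Here the four images split into two sub-clusters, so M is 2; raising the threshold above every off-diagonal similarity would leave each image in its own sub-cluster.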

Exemplarily, the above-mentioned electronic device may be a computer, a computer host, a server, a cloud server, a server cluster, or the like. The electronic device may include, but is not limited to, a processor 81, an input device 82, an output device 83, and a computer storage medium 84. The input device 82 may be a keyboard, a touch screen, or the like, and the output device 83 may be a speaker, a display, a radio frequency transmitter, or the like. Those skilled in the art can understand that the schematic diagram is merely an example of an electronic device and does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine some components, or use different components.

It should be noted that, since the processor 81 of the electronic device implements the steps in the above-mentioned incremental image clustering method when executing the computer program, the above-mentioned embodiments of the incremental image clustering method are all applicable to the electronic device, and can achieve the same or similar beneficial effects.

Embodiments of the present disclosure further provide a computer program product which, when executed by a processor, implements any one of the methods in the foregoing embodiments. The computer program product can be implemented in hardware, software or a combination thereof. In some embodiments of the present disclosure, the computer program product is embodied as a computer storage medium, and in other embodiments of the present disclosure, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).

Embodiments of the present disclosure also provide a computer storage medium (Memory), which is a memory device in an electronic device and is configured to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides storage space, which stores the operating system of the terminal. In addition, one or more instructions suitable for being loaded and executed by the processor 81 are also stored in the storage space, and these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; in some embodiments of the present disclosure, it may also be at least one computer storage medium located remotely from the aforementioned processor 81. In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 81 to implement the corresponding steps of the above-mentioned method for incremental clustering of images.

Exemplarily, the computer program of the computer storage medium includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.

It should be noted that, since the computer program of the computer storage medium, when executed by the processor, implements the steps in the above-mentioned incremental image clustering method, all the embodiments of the above-mentioned incremental image clustering method are applicable to the computer storage medium, and can achieve the same or similar beneficial effects.

The embodiments of the present disclosure have been introduced in detail above, and specific examples are used in this document to describe the principles and implementations of the present disclosure. The descriptions of the above embodiments are only intended to help understand the methods and core ideas of the present disclosure; at the same time, persons of ordinary skill in the art may, according to the idea of the present disclosure, make changes to the implementation manner and application scope. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Industrial Applicability

In this embodiment, the first cluster is divided into a plurality of first sub-clusters, and the first cluster is merged with the second image data set based on the first cluster centers of the first sub-clusters. Using the first cluster centers addresses the problem that, as image data increases, the cluster center drifts under the influence of newly added image data, which helps make the clustering result more accurate and improves the clustering effect.

Claims

A method for incremental clustering of images, the method comprising:

obtaining the first cluster of the first image data set;

dividing the first cluster into M first sub-clusters, and obtaining a first cluster center corresponding to each of the M first sub-clusters, M being an integer greater than or equal to 1; and

acquiring a second image data set, and combining the second image data set with the first cluster by using the first cluster center.
The method according to claim 1, wherein the first cluster includes a first cluster A, a first cluster B and a first cluster C; and the combining the second image data set with the first cluster by using the first cluster center comprises:

in a case where the second image data set includes a plurality of image data, clustering the plurality of image data to obtain isolated image data and a second cluster;

combining the isolated image data with the first cluster A by using the first cluster center, and combining the second cluster with the first cluster B by using the first cluster center; and

in a case where the second image data set includes only a single image data, combining the single image data with the first cluster C by using the first cluster center.
The method according to claim 2, wherein the first cluster has a corresponding second cluster center; and before the combining the second image data set with the first cluster by using the first cluster center, the method further comprises:

determining K first clusters from the first clusters by using the second cluster centers.
The method according to claim 3, wherein the second cluster has a corresponding third cluster center; and the determining K first clusters from the first clusters by using the second cluster centers comprises:

obtaining a first similarity between the isolated image data and the second cluster center;

sorting the first clusters in descending order of the first similarity to obtain a first cluster sequence, and selecting the top K first clusters in the first cluster sequence; and,

obtaining a second similarity between the third cluster center and the second cluster center;

sorting the first clusters in descending order of the second similarity to obtain a second cluster sequence, and selecting the top K first clusters in the second cluster sequence; or,

obtaining a third similarity between the single image data and the second cluster center;

sorting the first clusters in descending order of the third similarity to obtain a third cluster sequence, and selecting the top K first clusters in the third cluster sequence.
The method according to claim 3, wherein the combining the isolated image data with the first cluster A by using the first cluster center comprises:

obtaining a fourth similarity between the isolated image data and first cluster centers D, the first cluster centers D being the first cluster centers corresponding to each first sub-cluster of each first cluster among the K first clusters;

for each first cluster among the K first clusters, determining a first number of first cluster centers D in that first cluster whose fourth similarity is greater than a first threshold;

determining the first cluster with the largest first number among the K first clusters as the first cluster A; and

merging the isolated image data with the first cluster A.
The method according to claim 3, wherein the combining the second cluster with the first cluster B by using the first cluster center comprises:

dividing the second cluster into N second sub-clusters, and obtaining a fourth cluster center corresponding to each of the N second sub-clusters, N being an integer greater than or equal to 1;

obtaining a fifth similarity between the fourth cluster center and first cluster centers E, the first cluster centers E being the first cluster centers corresponding to each first sub-cluster of each first cluster among the K first clusters;

for each first cluster among the K first clusters, determining a second number of first cluster centers E in that first cluster whose fifth similarity is greater than a second threshold;

determining the first cluster with the largest second number among the K first clusters as the first cluster B; and

merging the second cluster with the first cluster B.
The method according to claim 3, wherein the combining the single image data with the first cluster C using the first cluster center comprises:

obtaining a sixth similarity between the single image data and first cluster centers F, the first cluster centers F being the first cluster centers corresponding to each first sub-cluster of each first cluster among the K first clusters;

for each first cluster among the K first clusters, determining a third number of first cluster centers F in that first cluster whose sixth similarity is greater than a third threshold;

determining the first cluster with the largest third number among the K first clusters as the first cluster C; and

merging the single image data with the first cluster C.
The method according to any one of claims 1 to 7, wherein M is less than or equal to a fourth threshold; and after the combining the second image data set with the first cluster by using the first cluster center, the method further comprises:

dividing the merged first cluster into R third sub-clusters, and obtaining a fifth cluster center of each of the R third sub-clusters, R being an integer greater than or equal to 1;

in a case where R is less than or equal to the fourth threshold, retaining the R third sub-clusters, and updating the first cluster center with the fifth cluster centers corresponding to the R third sub-clusters;

in a case where R is greater than the fourth threshold, acquiring a fourth number of image data in each of the R third sub-clusters; and

sorting the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, selecting the first P third sub-clusters in the fourth cluster sequence, and updating the first cluster center with the fifth cluster centers corresponding to the P third sub-clusters, P being less than or equal to the fourth threshold.
The method according to any one of claims 1 to 7, wherein the first cluster is obtained by clustering image data in the first image data set; and the dividing the first cluster into M first sub-clusters comprises:

obtaining a seventh similarity between the image data in the first cluster to obtain a similarity matrix; and

dividing the first cluster into the M first sub-clusters based on the similarity matrix.
The method according to claim 9, wherein the dividing the first cluster into the M first sub-clusters based on the similarity matrix comprises:

obtaining a connected graph whose vertices are the image data in the first cluster;

obtaining the seventh similarity between vertices in the connected graph by querying the similarity matrix; and

dividing a plurality of vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, so as to obtain the M first sub-clusters.
A device for incremental clustering of images, the device comprising:

a first acquisition module, configured to acquire the first cluster of the first image data set;

a first dividing module, configured to divide the first cluster into M first sub-clusters, and obtain a first cluster center corresponding to each of the M first sub-clusters, M being an integer greater than or equal to 1; and

a merging module, configured to acquire a second image data set, and merge the second image data set with the first cluster by using the first cluster center.
The apparatus according to claim 11, wherein the first cluster includes a first cluster A, a first cluster B and a first cluster C; the merging module includes:

a clustering submodule, configured to perform clustering on the plurality of image data when the second image data set includes a plurality of image data to obtain isolated image data and a second cluster;

a first merging submodule configured to use the first cluster center to merge the isolated image data with the first cluster A; and,

a second merging submodule, configured to use the first cluster center to merge the second cluster with the first cluster B;

a third merging submodule, configured to merge the single image data with the first cluster C by using the first cluster center when the second image data set includes only a single image data.
The device according to claim 12, wherein the first cluster has a corresponding second cluster center; the merging module further comprises:

a first determination submodule, configured to determine K first clusters from the first clusters by using the second cluster centers.
The device according to claim 13, wherein the second cluster has a corresponding third cluster center; the first determination submodule comprises:

a first obtaining unit, configured to obtain a first similarity between the isolated image data and the second cluster center;

a first sorting unit, configured to sort the first clusters in descending order of the first similarity to obtain a first cluster sequence, and select the top K first clusters in the first cluster sequence; and,

a second obtaining unit, configured to obtain a second similarity between the third cluster center and the second cluster center;

a second sorting unit, configured to sort the first clusters in descending order of the second similarity to obtain a second cluster sequence, and select the top K first clusters in the second cluster sequence; or,

a third obtaining unit, configured to obtain a third similarity between the single image data and the second cluster center;

a third sorting unit, configured to sort the first clusters in descending order of the third similarity to obtain a third cluster sequence, and select the top K first clusters in the third cluster sequence.
The apparatus according to claim 13, wherein the first merging submodule comprises:

a fourth obtaining unit, configured to obtain a fourth similarity between the isolated image data and first cluster centers D, the first cluster centers D being the first cluster centers corresponding to each first sub-cluster of each first cluster among the K first clusters;

a first determining unit, configured to, for each first cluster among the K first clusters, determine a first number of first cluster centers D in that first cluster whose fourth similarity is greater than a first threshold;

a second determining unit, configured to determine the first cluster with the largest first number among the K first clusters as the first cluster A; and

a first merging unit, configured to merge the isolated image data with the first cluster A.
The apparatus according to claim 13, wherein the second merging submodule comprises:

a first dividing unit, configured to divide the second cluster into N second sub-clusters, and obtain a fourth cluster center corresponding to each of the N second sub-clusters, N being an integer greater than or equal to 1;

a fifth obtaining unit, configured to obtain a fifth similarity between the fourth cluster center and first cluster centers E, the first cluster centers E being the first cluster centers corresponding to each first sub-cluster of each first cluster among the K first clusters;

a third determination unit, configured to, for each first cluster among the K first clusters, determine a second number of first cluster centers E in that first cluster whose fifth similarity is greater than a second threshold;

a fourth determination unit, configured to determine the first cluster with the largest second number among the K first clusters as the first cluster B; and

a second merging unit, configured to merge the second cluster with the first cluster B.
The apparatus according to claim 13, wherein the third merging submodule comprises:

a sixth obtaining unit, configured to obtain a sixth similarity between the single image data and first cluster centers F, the first cluster centers F being the first cluster centers corresponding to each first sub-cluster of each first cluster among the K first clusters;

a fifth determining unit, configured to, for each first cluster among the K first clusters, determine a third number of first cluster centers F in that first cluster whose sixth similarity is greater than a third threshold;

a sixth determining unit, configured to determine the first cluster with the largest third number among the K first clusters as the first cluster C; and

a third merging unit, configured to merge the single image data with the first cluster C.
The apparatus according to any one of claims 11 to 17, wherein the M is less than or equal to a fourth threshold; the apparatus further comprises:

a second dividing module, configured to divide the merged first cluster into R third sub-clusters, and obtain a fifth cluster center of each of the R third sub-clusters, R being an integer greater than or equal to 1;

a first update module, configured to retain the R third sub-clusters when R is less than or equal to the fourth threshold, and update the first cluster center with the fifth cluster centers corresponding to the R third sub-clusters;

a second acquisition module, configured to acquire a fourth number of image data in each of the R third sub-clusters when R is greater than the fourth threshold; and

a second update module, configured to sort the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, select the first P third sub-clusters in the fourth cluster sequence, and update the first cluster center with the fifth cluster centers corresponding to the P third sub-clusters, P being less than or equal to the fourth threshold.
The apparatus according to any one of claims 11 to 17, wherein the first cluster is obtained by clustering image data in the first image data set; and the first dividing module comprises:

an obtaining submodule, configured to obtain a seventh similarity between the image data in the first cluster to obtain a similarity matrix; and

a segmentation submodule, configured to divide the first cluster into the M first sub-clusters based on the similarity matrix.
The apparatus according to claim 19, wherein the segmentation submodule comprises:

a seventh obtaining unit, configured to obtain a connected graph whose vertices are the image data in the first cluster;

a query unit, configured to query the similarity matrix to obtain the seventh similarity between vertices in the connected graph; and

a second dividing unit, configured to divide a plurality of vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, so as to obtain the M first sub-clusters.
An electronic device, comprising an input device and an output device, and further comprising:

a processor adapted to implement one or more instructions; and,

a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to perform the method of any one of claims 1 to 10.
A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the method of any one of claims 1 to 10.
A computer program product comprising one or more instructions adapted to be loaded by a processor and to perform the method of any one of claims 1 to 10.