WO2022088390A1 - Image incremental clustering method and apparatus, electronic device, storage medium and program product - Google Patents


Info

Publication number
WO2022088390A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
clusters
image data
center
similarity
Application number
PCT/CN2020/134074
Other languages
French (fr)
Chinese (zh)
Inventor
刘凯鉴
余世杰
陈浩彬
陈大鹏
赵瑞
Original Assignee
浙江商汤科技开发有限公司
Application filed by 浙江商汤科技开发有限公司
Priority to JP2022524182A (patent JP2023502863A)
Priority to KR1020227013791A (patent KR20220070482A)
Publication of WO2022088390A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • The present disclosure is based on, and claims priority to, Chinese patent application No. 202011185911.8, filed on October 30, 2020, the entire contents of which are incorporated herein by reference.
  • the embodiments of the present disclosure relate to the technical field of computer vision, and in particular, to a method and apparatus for incremental clustering of images, an electronic device, a storage medium, and a program product.
  • The present disclosure provides an incremental clustering method and apparatus for images, an electronic device, a storage medium, and a program product, which help solve the problem that drift of cluster centers during incremental clustering degrades the clustering effect.
  • a first aspect of the embodiments of the present disclosure provides an incremental clustering method for images, the method comprising:
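  • obtaining a first cluster of a first image data set; dividing the first cluster into M first sub-clusters, and obtaining a first cluster center corresponding to each of the M first sub-clusters, where M is an integer greater than or equal to 1; and obtaining a second image data set, and merging the second image data set with the first cluster by using the first cluster centers.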
  • the first clusters include a first cluster A, a first cluster B, and a first cluster C;
  • merging the second image data set with the first clusters by using the first cluster centers includes:
  • in a case where the second image data set includes a plurality of image data, clustering the plurality of image data to obtain isolated image data and a second cluster;
  • merging the isolated image data with the first cluster A, and merging the second cluster with the first cluster B, by using the first cluster centers;
  • in a case where the second image data set includes a single image data, merging the single image data with the first cluster C by using the first cluster centers.
  • Clustering the plurality of image data in the second image data set into isolated image data and second clusters, and merging these with the first cluster A and the first cluster B, respectively, allows the clusters both to absorb single samples and to merge with one another.
  • each first cluster has a corresponding second cluster center; before the second image data set is merged with the first clusters by using the first cluster centers, the method further includes:
  • K first clusters are determined from the first clusters by using the second cluster centers.
  • each second cluster has a corresponding third cluster center; determining K first clusters from the first clusters by using the second cluster centers includes:
  • Preliminarily screening the first clusters in this way helps identify the first clusters whose cluster categories are most similar to those of the image data in the second image data set.
  • merging the isolated image data with the first cluster A by using the first cluster centers includes:
  • obtaining a fourth similarity between the isolated image data and each first cluster center D, where a first cluster center D is the first cluster center corresponding to a first sub-cluster of one of the K first clusters; for each of the K first clusters, determining a first number, namely the number of first cluster centers D in that first cluster whose fourth similarity is greater than a first threshold;
  • determining the first cluster with the largest first number among the K first clusters as the first cluster A; and merging the isolated image data with the first cluster A.
  • merging the second cluster with the first cluster B by using the first cluster centers includes:
  • dividing the second cluster into N second sub-clusters, and obtaining a fourth cluster center corresponding to each of the N second sub-clusters, where N is greater than or equal to 1; obtaining a fifth similarity between the fourth cluster centers and each first cluster center E, where a first cluster center E is the first cluster center corresponding to a first sub-cluster of one of the K first clusters; for each of the K first clusters, determining a second number, namely the number of first cluster centers E in that first cluster whose fifth similarity is greater than a second threshold; determining the first cluster with the largest second number among the K first clusters as the first cluster B; and merging the second cluster with the first cluster B.
  • Because the first cluster B contains the largest number of first sub-clusters close to the second sub-clusters of the second cluster, merging the second cluster into the first cluster B makes the clustering result more accurate.
  • merging the single image data with the first cluster C by using the first cluster centers includes:
  • obtaining a sixth similarity between the single image data and each first cluster center F, where a first cluster center F is the first cluster center corresponding to a first sub-cluster of one of the K first clusters; for each of the K first clusters, determining a third number, namely the number of first cluster centers F in that first cluster whose sixth similarity is greater than a third threshold;
  • determining the first cluster with the largest third number among the K first clusters as the first cluster C; and merging the single image data with the first cluster C.
  • M is less than or equal to a fourth threshold; after the second image data set is merged with the first clusters by using the first cluster centers, the method further includes:
  • dividing the merged first cluster into R third sub-clusters, and obtaining a fifth cluster center corresponding to each of the R third sub-clusters; when R is less than or equal to the fourth threshold, updating the first cluster centers with the fifth cluster centers corresponding to the R third sub-clusters; when R is greater than the fourth threshold, obtaining a fourth number, namely the number of image data in each of the R third sub-clusters, sorting the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, selecting the first P third sub-clusters in the fourth cluster sequence, and updating the first cluster centers with the fifth cluster centers corresponding to the P third sub-clusters, where P is less than or equal to the fourth threshold.
  • the first cluster is obtained by clustering the image data in the first image data set; dividing the first cluster into M first sub-clusters includes:
  • obtaining a seventh similarity between the image data in the first cluster to obtain a similarity matrix, and dividing the first cluster into the M first sub-clusters based on the similarity matrix.
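  • dividing the first cluster into the M first sub-clusters based on the similarity matrix includes:
  • constructing a connectivity graph with the image data in the first cluster as vertices, querying the seventh similarity between every two vertices from the similarity matrix, and dividing a plurality of vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster.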
  • a second aspect of the embodiments of the present disclosure provides an apparatus for incremental clustering of images, and the apparatus includes:
  • a first obtaining module, configured to obtain a first cluster of a first image data set;
  • a first segmentation module, configured to divide the first cluster into M first sub-clusters and obtain a first cluster center corresponding to each of the M first sub-clusters, where M is an integer greater than or equal to 1; and
  • a merging module, configured to obtain a second image data set and merge the second image data set with the first cluster by using the first cluster centers.
  • A third aspect of the embodiments of the present disclosure provides an electronic device, including an input device, an output device, a processor adapted to implement one or more instructions, and a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the steps in any implementation of the first aspect.
  • A fourth aspect of the embodiments of the present disclosure provides a computer storage medium storing one or more instructions adapted to be loaded by a processor to perform the steps in any implementation of the first aspect.
  • A fifth aspect of the embodiments of the present disclosure provides a computer program product including one or more instructions adapted to be loaded by a processor to perform the steps of the method in any implementation of the first aspect.
  • In the embodiments of the present disclosure, a first cluster of a first image data set is obtained; the first cluster is divided into M first sub-clusters, and a first cluster center corresponding to each of the M first sub-clusters is obtained; and a second image data set is obtained and merged with the first cluster by using the first cluster centers.
  • Because the first cluster is divided into multiple first sub-clusters and the second image data set is merged with the first cluster based on the first cluster centers of those sub-clusters, the merge does not rely solely on the cluster center of the first cluster (the main center), which new image data can cause to drift. This helps make the clustering results more accurate and improves the clustering effect.
  • In addition, the second image data set does not need to be compared for similarity against the whole first image data set, which helps reduce computational complexity.
  • FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for incremental clustering of images according to an embodiment of the present disclosure
  • FIG. 3A is a schematic diagram of a connectivity graph of a first cluster according to an embodiment of the present disclosure
  • FIG. 3B is a schematic diagram of dividing a first cluster into first sub-clusters according to an embodiment of the present disclosure
  • FIG. 4A is a schematic diagram of a clustering result of a second image data set according to an embodiment of the present disclosure
  • FIG. 4B is a schematic diagram of merging isolated image data with a first cluster according to an embodiment of the present disclosure
  • FIG. 4C is a schematic diagram of merging a second cluster with a first cluster according to an embodiment of the present disclosure
  • FIG. 5 is a schematic flowchart of updating a first cluster center according to an embodiment of the present disclosure
  • FIG. 6 is a schematic flowchart of another method for incremental clustering of images according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of an apparatus for incremental clustering of images according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • An embodiment of the present disclosure proposes an incremental clustering method for image data, which can be implemented based on the application environment shown in FIG. 1 .
  • the application environment mainly includes an image processing center 101 and an image acquisition device 102 .
  • The image processing center 101 includes, but is not limited to, a server 1011, a terminal, and a database.
  • The image acquisition device 102 may be a camera or video camera deployed in scenes such as gate passages, shopping malls, and residential areas, and is used to collect images such as face images and video surveillance images. The image processing center 101 may be a monitoring center, which can introduce a video cloud node (Video Cloud Node, VCN) 1012 to manage video surveillance, for example, by displaying images on the display 1013 and storing images in the database 1014 after clustering.
  • The image acquisition device 102 may also be a user terminal, in which case the collected images may be photos taken by the user, for example, photos posted on social media, and the image processing center may be the back end of the social media platform.
  • The image acquisition device 102 can upload the collected images to the image processing center 101, which performs operations such as feature extraction, cluster classification, and face recognition. Because images on the acquisition side are generated incrementally every day, incremental clustering has to maintain a set of clusters. As image data keep growing and incremental clustering proceeds, the cluster centers of the originally maintained clusters risk drifting, which gradually degrades the clustering effect. The server 1011 can therefore execute the incremental clustering method proposed in the embodiments of the present disclosure to solve the problem that cluster-center drift in incremental clustering degrades the clustering effect.
  • The server 1011 may be an independent physical server, a server cluster, or a distributed system, or it may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
  • FIG. 2 is a schematic flowchart of an image incremental clustering method provided by an embodiment of the present disclosure.
  • the image incremental clustering method is applied to a server, as shown in FIG. 2 , including steps S21 to S23:
  • S21: The first image data set refers to image data that have already been clustered into multiple clusters before the current batch of image data arrives. For example, if a newly uploaded batch of face images is the current batch of data, the face image data uploaded to the server before it form the first image data set.
  • A first cluster is a cluster obtained by clustering the image data in the first image data set; the clustering algorithm may be, for example, the K-means clustering algorithm. Each cluster has a corresponding cluster center, namely the second cluster center.
  • S22: Divide the first cluster into M first sub-clusters, and obtain a first cluster center corresponding to each of the M first sub-clusters, where M is an integer greater than or equal to 1.
  • FIG. 3A is a schematic diagram of a connectivity graph of a first cluster according to an embodiment of the present disclosure.
  • As shown in FIG. 3A, the connectivity graph of the first cluster includes a first cluster 301 and a second cluster center 302, where the first cluster 301 is a cluster obtained by clustering the image data in the first image data set, and the second cluster center 302 is the cluster center of that cluster.
  • FIG. 3B is a schematic diagram of dividing a first cluster into first sub-clusters according to an embodiment of the present disclosure.
  • As shown in FIG. 3B, dividing the first cluster into first sub-clusters involves the first cluster 301, the second cluster center 302, first sub-clusters 303, and first cluster centers 304, where each first sub-cluster 303 is a sub-cluster obtained by dividing the first cluster 301, and each first cluster center 304 is the cluster center of a first sub-cluster.
  • A first sub-cluster is a sub-cluster obtained by dividing a first cluster. For each first cluster in the first image data set, the similarity between the image data in the first cluster (the seventh similarity) is obtained to form a similarity matrix, and a connectivity graph is then built with the image data in the first cluster as vertices, as shown in FIG. 3A. For every two vertices in the graph, their similarity is queried from the similarity matrix. If the threshold used when clustering the first image data set is X (the fifth threshold), multiple image data whose similarity is greater than X are grouped into a more compact first sub-cluster, yielding M first sub-clusters.
  • As shown in FIG. 3B, the first cluster of FIG. 3A is thus divided into M first sub-clusters by analyzing the connectivity graph.
  • The cluster center of each of the M first sub-clusters, namely the first cluster center, is then obtained, so each first cluster can be described by one main cluster center and M sub-cluster centers. Describing the first cluster with more compact sub-clusters helps address the problem that the expressive power of a single main cluster center weakens as new image data are incorporated.
  • S23: Acquire a second image data set, and merge the second image data set with the first cluster by using the first cluster centers.
  • FIG. 4A is a schematic diagram of a clustering result of a second image dataset provided by an embodiment of the present disclosure.
  • As shown in FIG. 4A, the clustering result of the second image data set includes the second image data set 401, second clusters 402, isolated image data 403, and third cluster centers 404, where the second image data set 401 is the current batch of images uploaded by the image acquisition device; each second cluster 402 is a cluster obtained by clustering the image data in the second image data set; the isolated image data 403 are image data that were not clustered into any cluster; and each third cluster center 404 is the cluster center of a second cluster.
  • FIG. 4B is a schematic diagram of merging isolated image data with a first cluster according to an embodiment of the present disclosure. As shown in FIG. 4B, the merge involves a first cluster A 405 and the isolated image data 403, where the first cluster A 405 is the first cluster A determined from the first clusters.
  • FIG. 4C is a schematic diagram of merging a second cluster with a first cluster according to an embodiment of the present disclosure. As shown in FIG. 4C, the merge involves a first cluster B 406 and a second cluster 407, which belong to the same cluster category.
  • The second image data set is the current batch of images uploaded by the image acquisition device.
  • The first clusters include a first cluster A, a first cluster B, and a first cluster C. When the second image data set includes multiple image data, these image data are clustered to obtain a clustering result.
  • The clustering result includes unclustered isolated image data and several second clusters, each of which has a corresponding cluster center, namely the third cluster center, as shown in FIG. 4A.
  • For the isolated image data, a first cluster A is determined from the first clusters, and the isolated image data are merged with the first cluster A by using the first cluster centers. That is, as shown in FIG. 4B, the isolated image data are absorbed into the first cluster A, and the first cluster A and the isolated image data belong to the same cluster category.
  • For each second cluster, a first cluster B is determined from the first clusters, and the second cluster is merged with the first cluster B by using the first cluster centers, as shown in FIG. 4C.
  • The first cluster B and the second cluster belong to the same cluster category. When the second image data set contains only a single image data, that is, only one new image is added, no clustering operation on the second image data set is needed.
  • Similar to the isolated image data, a first cluster C is determined, and the single image data are merged with the first cluster C by using the first cluster centers; the first cluster C and the single image data belong to the same cluster category.
  • Before the second image data set is merged with the first clusters by using the first cluster centers, the method further includes:
  • K first clusters are determined from the first clusters by using the second cluster centers.
  • All the first clusters are first screened by using their second cluster centers to determine K first clusters, and the above-mentioned first cluster A and first cluster B, or the first cluster C, are then selected from these K first clusters.
  • The K first clusters may be the top K first clusters after all the first clusters are sorted by using the second cluster centers, for example, the top 20 of 100 sorted first clusters.
  • The K first clusters may also be all of the sorted first clusters; for example, all 100 first clusters are retained after sorting.
  • Preliminarily screening the first clusters by using the second cluster centers helps identify the first clusters whose cluster categories are more similar to those of the image data in the second image data set, such as the above-mentioned first cluster A, first cluster B, and first cluster C.
  • the determining K first clusters from the first clusters by using the second cluster center includes:
  • When the second image data set is clustered into isolated image data and multiple second clusters, the first similarity between the isolated image data and the second cluster center of each first cluster is calculated;
  • for each second cluster, the second similarity between its third cluster center and the second cluster center of each first cluster is calculated. All the first clusters are then sorted in descending order of the first similarity and of the second similarity, respectively, to obtain a corresponding first cluster sequence and second cluster sequence, and the top K first clusters are selected from the first cluster sequence and from the second cluster sequence, respectively.
  • When the second image data set contains a single image data, the third similarity between the single image data and the second cluster center of each first cluster is calculated, the first clusters are sorted in descending order of the third similarity to obtain a corresponding third cluster sequence, and the top K first clusters are selected from the third cluster sequence.
  • For the merging of isolated image data, the first cluster A needs to be determined from the selected top K first clusters; note that the top K first clusters may be all of the sorted first clusters. First, the similarity between the isolated image data and the cluster center of each first sub-cluster of each of the K first clusters (that is, the first cluster center D) is calculated and taken as the fourth similarity.
  • The K first clusters are then analyzed: for each first cluster, the number of first cluster centers D whose fourth similarity is greater than the first threshold is counted as the first number, and the first cluster with the largest first number is determined as the first cluster A. For example, among the K first clusters, if first cluster 1 has 20 such first cluster centers D, first cluster 2 has 18, ..., and first cluster K has 15, then first cluster 1, which has the largest number, is determined as the first cluster A. In other words, the first cluster A contains the most first sub-clusters similar to the isolated image data, so merging the isolated image data into the first cluster A makes the clustering result more accurate.
  • For the merging of a second cluster, the first cluster B needs to be determined from the selected top K first clusters.
  • The top K first clusters may be all of the sorted first clusters. First, each second cluster is divided into N second sub-clusters in the same way as the first cluster is divided, and the cluster center of each second sub-cluster (the fourth cluster center) is calculated. Then the similarity between each fourth cluster center and the cluster center of each first sub-cluster of each of the K first clusters (that is, the first cluster center E) is calculated and taken as the fifth similarity.
  • The K first clusters are then analyzed: for each first cluster, the number of first cluster centers E whose fifth similarity is greater than the second threshold is counted as the second number, and the first cluster with the largest second number is determined as the first cluster B. For example, among the K first clusters, first cluster 1 has 30 such first cluster centers E, first cluster 2 has 15 such first cluster centers E, ..., first cluster K has
  • For the merging of single image data, the first cluster C needs to be determined from the selected top K first clusters; note that the top K first clusters may be all of the sorted first clusters. First, the similarity between the single image data and the cluster center of each first sub-cluster of each of the K first clusters (that is, the first cluster center F) is calculated and taken as the sixth similarity.
  • M is less than or equal to a fourth threshold. After the second image data set is merged with the first clusters by using the first cluster centers, as shown in FIG. 5, the method further includes:
  • The merged first cluster is divided into R third sub-clusters in the same way as the first cluster is divided, the fifth cluster center of each third sub-cluster is calculated, and the number R of third sub-clusters is checked.
  • If R is less than or equal to the fourth threshold (for example, 20), the R third sub-clusters are retained and their fifth cluster centers are used as the new sub-centers of the merged first cluster, replacing the original first cluster centers; the merged first cluster is then described by the second cluster center and the R fifth cluster centers.
  • when R is greater than the fourth threshold, the R third sub-clusters are sorted in descending order of the number of image data in each third sub-cluster (that is, the fourth number) to obtain a fourth cluster sequence.
  • the first P third sub-clusters in the fourth cluster sequence are kept (for example, only the first 20), the remaining third sub-clusters are discarded, and the fifth cluster centers of the P kept third sub-clusters are used as the new sub-centers of the merged first cluster; the merged first cluster is then described by the second cluster center and the P fifth cluster centers. Each time a cluster is divided into sub-clusters, only a preset number of sub-clusters is retained.
  • both M and N are less than or equal to the fourth threshold, so that when there are many sub-clusters.
  • the embodiment of the present disclosure obtains the first cluster of the first image data set, divides the first cluster into M first sub-clusters, and obtains the first cluster center corresponding to each of the M first sub-clusters.
  • the first cluster is divided into a plurality of first sub-clusters, and the second image data set is merged with the first cluster based on the first cluster centers of the first sub-clusters.
  • this avoids the problem that the cluster center (the cluster center of the first cluster, that is, the main center) is affected by new image data and drifts, which helps make the clustering results more accurate and improves the clustering effect.
  • the second image data set does not need to be compared for similarity with the first image data set as a whole, which helps reduce computational complexity.
  • FIG. 6 is a schematic flowchart of another image incremental clustering method provided by an embodiment of the present disclosure, as shown in FIG. 6, including steps S61 to S66:
  • S62: Divide the first cluster into M first sub-clusters, and obtain the first cluster center corresponding to each of the M first sub-clusters; M is an integer greater than or equal to 1;
  • the incremental clustering method needs to maintain some clusters in the clustering process.
  • traditional clustering algorithms describe a cluster with a single cluster center, for example the mean of all sample features in the cluster. However, clusters differ in how sparse they are, so a single mean center easily loses the rich sample information inside the cluster, and as incremental clustering proceeds, the clustering effect gradually degrades.
  • computing the pairwise similarities within a cluster yields the similarity matrix S. A threshold higher than the clustering threshold is then set, and the cluster is split at that higher threshold into several tighter sub-clusters.
  • clusters can be analyzed with connectivity-graph analysis to obtain a multi-center description of each cluster.
  • a similarity matrix is calculated for each cluster.
  • a cluster can be divided into several more compact sub-clusters, yielding multiple sub-cluster centers; together with the cluster center serving as the main center, they form the multi-center description of the cluster.
  • obtaining multiple sub-centers through connected-graph analysis works as follows: for each cluster, a higher threshold (which must exceed the clustering threshold) is set to break the cluster into several more compact connected subgraphs, and a sub-center is computed for each connected subgraph, giving multiple sub-centers.
  • step S69 generates a number of clusters and unclustered isolated samples, which are merged with the existing clustering results obtained in step S67.
  • multi-center incremental clustering based on a single main center and multiple sub-centers: once the main center and the sub-centers are obtained, each incremental step first uses the main center and the new data to run a TopK coarse screening, and then uses the multiple sub-centers to decide whether to absorb new samples or other clusters.
  • cluster merging covers both merging between clusters and absorbing individual isolated samples into clusters. To absorb an isolated sample, the multi-center design first sets a lower threshold and uses the main center to retrieve the TopK clusters, then checks whether each sub-center and the sample meet the clustering threshold. Several clusters may meet this requirement for the same isolated sample; the cluster with the largest number of qualifying sub-centers is taken as the target cluster.
  • for merging between clusters, a lower threshold is likewise used to filter and retrieve the TopK clusters, and the check is then whether any pair of sub-centers across the two clusters meets the threshold; when several clusters qualify, the cluster with the largest number of qualifying sub-centers is used as the target cluster.
  • the multi-center mechanism makes combined use of a single main center and multiple sub-centers.
  • the main center is used first in the similarity calculation, and the similarity between the multiple sub-centers and the pending single sample or cluster is then calculated to decide whether to absorb the single sample or merge the clusters.
  • this architecture makes full use of the multi-center representation and can improve the clustering effect without adding much computational complexity.
  • sub-centers can be sorted in descending order of the number of sample points they represent, keeping at most the first 20, for example.
  • a multi-center construction method for face clusters is proposed, which yields a description of a face cluster by a single main center and multiple sub-centers. It addresses the problem that describing a cluster by maintaining a single cluster center ignores the compact sub-cluster information inside the cluster; moreover, as data keeps growing, the single maintained center is continually influenced by new samples and risks center drift, while the influence of the existing samples in the cluster keeps weakening, reducing the expressive power of the center.
  • a single cluster center will lose the sample information inside the cluster during the incremental clustering process.
  • a single cluster center is usually maintained for each cluster while data is continuously added.
  • the cluster center is used to compute similarity with new samples or clusters in order to merge and update the clusters, and the cluster center itself is continually updated.
  • a single center gradually loses the rich sample information within the cluster and is prone to drift, which degrades the clustering effect over time.
  • an incremental clustering architecture based on multiple centers is proposed. It balances computational complexity against clustering accuracy when a multi-center representation is used for the incremental merging of samples and clusters, solving the problem that the prior-art multi-center setting greatly affects clustering speed and storage in large-scale data scenarios.
  • an embodiment of the present disclosure further provides an apparatus for incremental clustering of images. Please refer to FIG. 7 .
  • FIG. 7 is a schematic structural diagram of an image incremental clustering apparatus according to an embodiment of the present disclosure; as shown in FIG. 7, the apparatus includes:
  • the first acquisition module 71 is configured to acquire the first cluster of the first image data set;
  • a first segmentation module 72 configured to segment the first cluster into M first sub-clusters, and obtain a first cluster center corresponding to each first sub-cluster in the M first sub-clusters;
  • M is an integer greater than or equal to 1;
  • the merging module 73 is configured to obtain a second image data set, and use the first cluster center to merge the second image data set and the first cluster.
  • the first cluster includes a first cluster A, a first cluster B and a first cluster C;
  • the merging module 73 is configured to: when the second image data set includes a plurality of image data, cluster the plurality of image data to obtain isolated image data and a second cluster; merge the isolated image data with the first cluster A using the first cluster center; and merge the second cluster with the first cluster B using the first cluster center; when the second image data set contains only a single image data, merge the single image data with the first cluster C using the first cluster center.
  • the first cluster has a corresponding second cluster center; before the second image data set is merged with the first cluster using the first cluster center, the merging module 73 is further configured to determine K first clusters from the first clusters using the second cluster center.
  • the second cluster has a corresponding third cluster center; to determine the K first clusters from the first clusters using the second cluster center, the merging module 73 is configured to: obtain a first similarity between the isolated image data and the second cluster center, sort the first clusters by the first similarity from high to low to obtain a first cluster sequence, and select the top K first clusters in the first cluster sequence; and obtain a second similarity between the third cluster center and the second cluster center, sort the first clusters by the second similarity from high to low to obtain a second cluster sequence, and select the top K first clusters in the second cluster sequence; or, obtain a third similarity between the single image data and the second cluster center, sort the first clusters by the third similarity from high to low to obtain a third cluster sequence, and select the top K first clusters in the third cluster sequence.
  • the merging module 73 is configured to: obtain a fourth similarity between the isolated image data and first cluster centers D, where the first cluster centers D are the first cluster centers corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determine the first number of first cluster centers D in that cluster whose fourth similarity is greater than the first threshold; determine the first cluster with the largest first number among the K first clusters as the first cluster A; and merge the isolated image data with the first cluster A.
  • the merging module 73 is configured to: divide the second cluster into N second sub-clusters, and obtain the fourth cluster center corresponding to each of the N second sub-clusters, N being an integer greater than or equal to 1; obtain the fifth similarity between the fourth cluster centers and first cluster centers E, where the first cluster centers E are the first cluster centers corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determine the second number of first cluster centers E in that cluster whose fifth similarity is greater than the second threshold; determine the first cluster with the largest second number among the K first clusters as the first cluster B; and merge the second cluster with the first cluster B.
  • the merging module 73 is configured to: obtain a sixth similarity between the single image data and first cluster centers F, where the first cluster centers F are the first cluster centers corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determine the third number of first cluster centers F in that cluster whose sixth similarity is greater than the third threshold; determine the first cluster with the largest third number among the K first clusters as the first cluster C; and merge the single image data with the first cluster C.
  • M is less than or equal to a fourth threshold; the first dividing module 72 is further configured to: divide the merged first cluster into R third sub-clusters, and obtain the fifth cluster center of each of the R third sub-clusters, R being an integer greater than or equal to 1; when R is less than or equal to the fourth threshold, keep the R third sub-clusters and update the first cluster center with the fifth cluster centers of the R third sub-clusters; when R is greater than the fourth threshold, obtain the fourth number of image data in each of the R third sub-clusters, sort the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, select the first P third sub-clusters in the fourth cluster sequence, and update the first cluster center with the fifth cluster centers of those P third sub-clusters; P is less than or equal to the fourth threshold.
  • the first dividing module 72 is configured to: acquire the seventh similarity between the image data in the first cluster to obtain a similarity matrix; and divide the first cluster into the M first sub-clusters based on the similarity matrix.
  • the first dividing module 72 is configured to: obtain a connected graph whose vertices are the image data in the first cluster; obtain the seventh similarity between vertices of the connected graph by querying the similarity matrix; and group multiple vertices whose seventh similarity is greater than the fifth threshold into one first sub-cluster, thereby obtaining the M first sub-clusters.
  • each unit of the image incremental clustering apparatus shown in FIG. 7 may be separately or entirely combined into one or several other units, or some unit(s) may be further divided into multiple functionally smaller units; this achieves the same operations without affecting the technical effects of the embodiments of the present disclosure.
  • the above units are divided by logical function. In practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present disclosure, the image incremental clustering apparatus may also include other units, and in practical applications these functions may be implemented with the assistance of other units or by multiple units in cooperation.
  • a general-purpose computing device such as a computer
  • processing and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM).
  • running, on such a device, a computer program capable of executing the steps of the method shown in FIG. 2 or FIG. 6 constructs the image incremental clustering apparatus shown in FIG. 7 and implements the image incremental clustering method of the embodiments of the present disclosure.
  • the computer program can be recorded on, for example, a computer-readable recording medium, and loaded in the above-mentioned computing device through the computer-readable recording medium, and executed therein.
  • the embodiments of the present disclosure further provide an electronic device.
  • the electronic device includes at least a processor 81 , an input device 82 , an output device 83 and a computer storage medium 84 .
  • the processor 81 , the input device 82 , the output device 83 and the computer storage medium 84 in the electronic device may be connected through a bus or other means.
  • the computer storage medium 84 may be stored in the memory of the electronic device and is configured to store a computer program comprising program instructions; the processor 81 is configured to execute the program instructions stored in the computer storage medium 84.
  • the processor 81 (also called a CPU, central processing unit) is the computing and control core of the electronic device. It is suited to implementing one or more instructions, in particular to loading and executing one or more instructions to carry out the corresponding method flow or function.
  • the processor 81 of the electronic device provided by the embodiment of the present disclosure may be configured to perform incremental clustering processing of a series of images:
  • the first cluster includes a first cluster A, a first cluster B and a first cluster C; the processor 81 merging the second image data set with the first cluster using the first cluster center includes: when the second image data set includes a plurality of image data, clustering the plurality of image data to obtain isolated image data and a second cluster; merging the isolated image data with the first cluster A using the first cluster center; and merging the second cluster with the first cluster B using the first cluster center; when the second image data set contains only a single image data, merging the single image data with the first cluster C using the first cluster center.
  • the first cluster has a corresponding second cluster center; before using the first cluster center to merge the second image data set and the first cluster, The processor 81 is further configured to perform: using the second cluster center to determine K first clusters from the first clusters.
  • the second cluster has a corresponding third cluster center; the processor 81 determining the K first clusters from the first clusters using the second cluster center includes: obtaining a first similarity between the isolated image data and the second cluster center, sorting the first clusters by the first similarity from high to low to obtain a first cluster sequence, and selecting the top K first clusters in the first cluster sequence; and obtaining the second similarity between the third cluster center and the second cluster center.
  • the processor 81 merging the isolated image data with the first cluster A using the first cluster center includes: obtaining the fourth similarity between the isolated image data and first cluster centers D, where the first cluster centers D are the first cluster centers corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determining the first number of first cluster centers D in that cluster whose fourth similarity is greater than a first threshold; determining the first cluster with the largest first number among the K first clusters as the first cluster A; and merging the isolated image data with the first cluster A.
  • the processor 81 merging the second cluster with the first cluster B using the first cluster center includes: dividing the second cluster into N second sub-clusters and obtaining the fourth cluster center corresponding to each of the N second sub-clusters, N being an integer greater than or equal to 1; and obtaining the fifth similarity between the fourth cluster centers and the first cluster centers E.
  • the processor 81 merging the single image data with the first cluster C using the first cluster center includes: obtaining the sixth similarity between the single image data and first cluster centers F, where the first cluster centers F are the first cluster centers corresponding to each first sub-cluster of each of the K first clusters; for each of the K first clusters, determining the third number of first cluster centers F in that cluster whose sixth similarity is greater than a third threshold; determining the first cluster with the largest third number among the K first clusters as the first cluster C; and merging the single image data with the first cluster C.
  • M is less than or equal to a fourth threshold; after merging the second image data set with the first cluster using the first cluster center, the processor 81 is further configured to: divide the merged first cluster into R third sub-clusters, and obtain the fifth cluster center of each of the R third sub-clusters, R being an integer greater than or equal to 1; when R is less than or equal to the fourth threshold, keep the R third sub-clusters and update the first cluster center with the fifth cluster centers of the R third sub-clusters; when R is greater than the fourth threshold, obtain the fourth number of image data in each of the R third sub-clusters, sort the R third sub-clusters in descending order of the fourth number to obtain a fourth cluster sequence, select the first P third sub-clusters in the fourth cluster sequence, and update the first cluster center with the fifth cluster centers of those P third sub-clusters; P is less than or equal to the fourth threshold.
  • the first cluster is obtained by clustering the image data in the first image data set; the processor 81 dividing the first cluster into M first sub-clusters includes: acquiring the seventh similarity between the image data in the first cluster to obtain a similarity matrix; and dividing the first cluster into the M first sub-clusters based on the similarity matrix.
  • the processor 81 dividing the first cluster into the M first sub-clusters based on the similarity matrix includes: obtaining a connected graph whose vertices are the image data in the first cluster; obtaining the seventh similarity between vertices of the connected graph by querying the similarity matrix; and grouping multiple vertices whose seventh similarity is greater than the fifth threshold into one first sub-cluster, thereby obtaining the M first sub-clusters.
  • the above-mentioned electronic devices may be computers, computer hosts, servers, cloud servers, server clusters, etc.
  • the electronic devices may include, but are not limited to, a processor 81, an input device 82, an output device 83, and a computer storage medium 84.
  • the input device 82 may be a keyboard, a touch screen, etc.
  • the output device 83 can be a speaker, a display, a radio frequency transmitter, and the like.
  • the schematic diagram is merely an example of an electronic device and does not limit it; the device may include more or fewer components than shown, combine some components, or use different components.
  • when executing the computer program, the processor 81 of the electronic device implements the steps of the incremental image clustering method above; all embodiments of that method apply to the electronic device and achieve the same or similar beneficial effects.
  • Embodiments of the present disclosure further provide a computer program product, which implements any one of the methods in the foregoing embodiments when the computer program product is executed by a processor.
  • the computer program product can be implemented in hardware, software or a combination thereof.
  • in some embodiments of the present disclosure, the computer program product is embodied as a computer storage medium, and in other embodiments of the present disclosure it is embodied as a software product, such as a software development kit (SDK).
  • Embodiments of the present disclosure also provide a computer storage medium (Memory), where the computer storage medium is a memory device in an electronic device and is configured to store programs and data.
  • the computer storage medium here may include both a built-in storage medium in the terminal, and certainly also an extended storage medium supported by the terminal.
  • the computer storage medium provides storage space, and the storage space stores the operating system of the terminal.
  • one or more instructions suitable for being loaded and executed by the processor 81 are also stored in the storage space, and these instructions may be one or more computer programs (including program codes).
  • the computer storage medium here may be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; in some embodiments of the present disclosure, it may also be at least one disk memory.
  • the computer program of the computer storage medium includes computer program code, which may be in source code form, object code form, executable file or some intermediate form, and the like.
  • the computer-readable medium may include any entity or device capable of carrying the computer program code: a recording medium, USB flash drive, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signal, telecommunication signal, software distribution medium, etc.
  • the first cluster is divided into a plurality of first sub-clusters, and the second image data set is merged with the first cluster based on the first cluster centers of the first sub-clusters.
  • using the first cluster centers addresses the problem that, as image data grows, the cluster center drifts under the influence of newly added image data, which helps make the clustering results more accurate and improves the clustering effect.
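The multi-center construction described above can be sketched in Python with NumPy. It splits a cluster at a threshold higher than the clustering threshold, computes one sub-center per connected subgraph, and keeps at most a preset number of the largest sub-clusters (e.g. 20). This is a minimal illustration, not the patented implementation, and it rests on assumptions the source does not state. Features are compared by cosine similarity after L2 normalization. The main center and each sub-center are the normalized mean of their member features. The names `build_multi_center`, `split_threshold` and `max_sub_centers` (standing in for the fourth threshold) are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors (last axis) to unit length; assumed similarity is cosine."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def connected_components(adj):
    """Label the connected components of a boolean adjacency matrix (iterative DFS)."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            v = stack.pop()
            for u in np.flatnonzero(adj[v]):
                if labels[u] < 0:
                    labels[u] = current
                    stack.append(u)
        current += 1
    return labels


def build_multi_center(features, split_threshold, max_sub_centers=20):
    """Hypothetical helper: return (main_center, sub_centers, sub_sizes) for one cluster.

    split_threshold is assumed to be higher than the clustering threshold, so the
    cluster breaks into tighter connected subgraphs.
    """
    feats = l2_normalize(features)
    sim = feats @ feats.T                      # similarity matrix of the cluster
    adj = sim > split_threshold                # edges only between very similar images
    labels = connected_components(adj)
    sizes = np.bincount(labels)
    # Keep at most max_sub_centers sub-clusters, largest first (the "top P" rule).
    order = np.argsort(-sizes, kind="stable")[:max_sub_centers]
    sub_centers = np.stack([l2_normalize(feats[labels == k].mean(axis=0)) for k in order])
    main_center = l2_normalize(feats.mean(axis=0))   # assumed: mean as the main center
    return main_center, sub_centers, sizes[order]
```

Rerunning the same routine on a merged cluster refreshes its sub-centers. Slicing to `max_sub_centers` covers both cases in the text: when R is at most the threshold, all sub-clusters are kept; otherwise only the P largest are kept.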

Abstract

An image incremental clustering method and apparatus, an electronic device, a storage medium and a program product. Said method comprises: acquiring a first clustering cluster of a first image data set (S21); dividing the first clustering cluster into M first sub-clusters, and acquiring a first clustering center corresponding to each first sub-cluster among the M first sub-clusters, M being an integer greater than or equal to 1 (S22); and acquiring a second image data set, and combining the second image data set with the first clustering cluster by using the first clustering center (S23).

Description

图像的增量聚类方法、装置、电子设备、存储介质及程序产品Incremental clustering method, apparatus, electronic device, storage medium and program product for images
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本公开基于申请号为202011185911.8、申请日为2020年10月30日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以引入方式并入本公开。The present disclosure is based on a Chinese patent application with an application number of 202011185911.8 and an application date of October 30, 2020, and claims the priority of the Chinese patent application, the entire contents of which are hereby incorporated by reference into the present disclosure.
技术领域technical field
本公开实施例涉及计算机视觉技术领域,尤其涉及一种图像的增量聚类方法及装置、电子设备、存储介质及程序产品。The embodiments of the present disclosure relate to the technical field of computer vision, and in particular, to a method and apparatus for incremental clustering of images, an electronic device, a storage medium, and a program product.
背景技术Background technique
深度学习的发展极大地推动了图像处理技术的进步,以人脸识别为例,通过有监督学习得到的人脸识别模型在识别精度上有了质的飞跃,然而在面对爆炸式增长的无标签图像数据时,如何准确而快速地进行分类,仍是一个值得讨论和研究的问题。The development of deep learning has greatly promoted the progress of image processing technology. Taking face recognition as an example, the face recognition model obtained through supervised learning has made a qualitative leap in recognition accuracy. When labeling image data, how to classify it accurately and quickly is still an issue worthy of discussion and research.
发明内容SUMMARY OF THE INVENTION
针对上述问题,本公开提供了一种图像的增量聚类方法、装置、电子设备、存储介质及程序产品,有利于解决增量式聚类中因聚类中心发生漂移影响聚类效果的问题。In view of the above problems, the present disclosure provides an incremental clustering method, device, electronic device, storage medium and program product for images, which are beneficial to solve the problem that the clustering effect is affected by the drift of the clustering center in the incremental clustering .
为实现上述目的,本公开实施例第一方面提供了一种图像的增量聚类方法,该方法包括:To achieve the above purpose, a first aspect of the embodiments of the present disclosure provides an incremental clustering method for images, the method comprising:
获取第一图像数据集的第一聚类簇;将所述第一聚类簇分割为M个第一子簇,并获取所述M个第一子簇中每个第一子簇对应的第一聚类中心;所述M为大于或等于1的整数;获取第二图像数据集,利用所述第一聚类中心将所述第二图像数据集与所述第一聚类簇合并。Obtain the first cluster of the first image data set; divide the first cluster into M first sub-clusters, and obtain the first sub-cluster corresponding to each of the M first sub-clusters. a cluster center; the M is an integer greater than or equal to 1; a second image data set is obtained, and the first cluster center is used to combine the second image data set and the first cluster.
结合第一方面,在一种可能的实施方式中,所述第一聚类簇包括第一聚类簇A、第一聚类簇B和第一聚类簇C;所述利用所述第一聚类中心将所述第二图像数据集与所述第一聚类簇合并,包括:With reference to the first aspect, in a possible implementation manner, the first cluster includes a first cluster A, a first cluster B, and a first cluster C; The cluster center merges the second image data set with the first cluster, including:
在所述第二图像数据集中包括多个图像数据的情况下,对所述多个图像数据进行聚类,得到孤立图像数据和第二聚类簇;利用所述第一聚类中心将所述孤立图像数据与所述第一聚类簇A合并;以及,利用所述第一聚类中心将所述第二聚类簇与所述第一聚类簇B合并;在所述第二图像数据集中只存在单个图像数据的情况下,利用所述第一聚类中心将所述单个图像数据与所述第一聚类簇C合并。When the second image data set includes a plurality of image data, cluster the plurality of image data to obtain isolated image data and a second cluster; Merging the isolated image data with the first cluster A; and merging the second cluster with the first cluster B using the first cluster center; in the second image data When only a single image data exists in the set, the single image data is merged with the first cluster C by using the first cluster center.
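The source does not say which algorithm clusters the second image data set into isolated image data and second clusters. The following sketch makes an assumption: it groups a batch of new feature vectors into connected components of a graph whose edges join pairs with cosine similarity above the clustering threshold (single linkage, via union-find). Singleton components play the role of isolated image data, and larger components play the role of second clusters. `split_new_batch` is a hypothetical name.

```python
import numpy as np


def split_new_batch(features, cluster_threshold):
    """Hypothetical helper: return (isolated_indices, clusters) for a batch of new images.

    Assumes cosine similarity and single-linkage grouping at the clustering threshold.
    """
    feats = np.asarray(features, dtype=np.float64)
    feats = feats / np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-12)
    n = len(feats)
    adj = (feats @ feats.T) > cluster_threshold

    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.nonzero(np.triu(adj, k=1))   # each similar pair once
    for i, j in zip(rows.tolist(), cols.tolist()):
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    isolated = sorted(g[0] for g in groups.values() if len(g) == 1)
    clusters = sorted((g for g in groups.values() if len(g) > 1), key=lambda g: g[0])
    return isolated, clusters
```

A caller following the routing above would send a batch of exactly one image to the single-image (first cluster C) path. Otherwise, it would send the isolated indices to the first cluster A path and each returned cluster to the first cluster B path.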
这样,对第二图像数据集中的多个图像数据进行聚类,利用得到孤立图像数据和第二聚类簇分别与第一聚类簇中包括的第一聚类簇A、第一聚类簇B和第一聚类簇C进行合并,可以实现聚类簇吸收单个样本和聚类簇间的合并。In this way, the plurality of image data in the second image data set is clustered, and the isolated image data and the second cluster are obtained by using the obtained isolated image data and the first cluster A and the first cluster included in the first cluster respectively. By merging B and the first cluster C, the cluster can absorb a single sample and merge between the clusters.
结合第一方面,在一种可能的实施方式中,所述第一聚类簇存在对应的第二聚类中心;在利用所述第一聚类中心将所述第二图像数据集与所述第一聚类簇合并之前,所述方法还包括:With reference to the first aspect, in a possible implementation manner, the first cluster has a corresponding second cluster center; when using the first cluster center to combine the second image data set with the Before the first cluster is merged, the method further includes:
利用所述第二聚类中心从所述第一聚类簇中确定出K个第一聚类簇。K first clusters are determined from the first clusters by using the second cluster centers.
结合第一方面,在一种可能的实施方式中,所述第二聚类簇存在对应的第三聚类中心;所述利用所述第二聚类中心从所述第一聚类簇中确定出K个第一聚类簇,包括:With reference to the first aspect, in a possible implementation manner, the second cluster has a corresponding third cluster center; the second cluster center is determined from the first cluster by using the second cluster center Get K first clusters, including:
Obtaining first similarities between the isolated image data and the second cluster centers, sorting the first clusters in descending order of first similarity to obtain a first cluster sequence, and selecting the top K first clusters in that sequence. Also obtaining second similarities between the third cluster center and the second cluster centers, sorting the first clusters in descending order of second similarity to obtain a second cluster sequence, and selecting the top K first clusters in that sequence. Alternatively, obtaining third similarities between the single piece of image data and the second cluster centers, sorting the first clusters in descending order of third similarity to obtain a third cluster sequence, and selecting the top K first clusters in that sequence.
In this way, the first clusters are screened using the similarities between their second cluster centers and, respectively, the isolated image data, the third cluster centers and the single piece of image data. This helps identify the first clusters whose category is closest to that of the image data in the second image data set.
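Here is a minimal sketch of the top-K screening. It assumes that image data are feature vectors and that cosine similarity is the similarity measure; the text does not name one. The same function covers all three cases, and only the query vector changes. `top_k_candidates` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumed here as the similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_candidates(query: Sequence[float],
                     main_centers: List[Sequence[float]],
                     k: int) -> List[int]:
    """Indices of the k first clusters whose main (second) cluster center
    is most similar to `query`, in descending order of similarity.

    `query` is the isolated image feature, a third cluster center, or the
    single new image feature, depending on the case.
    """
    order = sorted(range(len(main_centers)),
                   key=lambda i: cosine(query, main_centers[i]),
                   reverse=True)
    return order[:k]
```

Setting `k` to the total number of clusters reproduces the variant in which all sorted first clusters are kept.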
With reference to the first aspect, in a possible implementation, merging the isolated image data with the first cluster A by using the first cluster centers includes:
Obtaining fourth similarities between the isolated image data and first cluster centers D, where the first cluster centers D are the first cluster centers of the first sub-clusters of each of the K first clusters. For each of the K first clusters, determining a first number, namely how many of its first cluster centers D have a fourth similarity greater than a first threshold. Determining the first cluster with the largest first number among the K first clusters as the first cluster A, and merging the isolated image data with the first cluster A.
In this way, the first cluster A contains the largest number of first sub-clusters that are close to the isolated image data. Merging the isolated image data into the first cluster A therefore makes the clustering result more accurate.
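The sub-center vote that selects the first cluster A can be sketched as below, under the same cosine-similarity assumption. `candidates` maps each of the K screened first clusters to its first cluster centers (the centers D). The text does not say what happens when no center exceeds the first threshold, so returning `None` in that case is an assumption of this sketch. The same vote, using the sixth similarity and the third threshold, would select the first cluster C for a single image.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_by_subcenter_votes(query: Vector,
                            candidates: Dict[str, List[Vector]],
                            threshold: float) -> Optional[str]:
    """Return the candidate first cluster with the most sub-centers whose
    similarity to `query` exceeds `threshold` (the "first number").

    Returns None if no sub-center of any candidate passes the threshold.
    Ties keep the candidate seen first.
    """
    best_id: Optional[str] = None
    best_count = 0
    for cluster_id, sub_centers in candidates.items():
        count = sum(1 for c in sub_centers if cosine(query, c) > threshold)
        if count > best_count:
            best_id, best_count = cluster_id, count
    return best_id
```

Counting sub-centers, rather than comparing against one main center, lets a large or elongated cluster win when many of its compact parts agree with the new image.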
With reference to the first aspect, in a possible implementation, merging the second cluster with the first cluster B by using the first cluster centers includes:
Dividing the second cluster into N second sub-clusters and obtaining a fourth cluster center for each of the N second sub-clusters, where N is an integer greater than or equal to 1. Obtaining fifth similarities between the fourth cluster centers and first cluster centers E, where the first cluster centers E are the first cluster centers of the first sub-clusters of each of the K first clusters. For each of the K first clusters, determining a second number, namely how many of its first cluster centers E have a fifth similarity greater than a second threshold. Determining the first cluster with the largest second number among the K first clusters as the first cluster B, and merging the second cluster with the first cluster B.
In this way, the first cluster with the largest second number among the K first clusters is determined as the first cluster B. In other words, the first cluster B contains the largest number of first sub-clusters that are close to the second sub-clusters of the second cluster. Merging the second cluster into the first cluster B therefore makes the clustering result more accurate.
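For the first cluster B, the vote compares the first cluster centers E against the fourth cluster centers of the second sub-clusters. This sketch assumes one reading of the "second number": a center E counts once if its similarity to at least one fourth cluster center exceeds the second threshold. The text does not specify how the second cluster is divided into N second sub-clusters, so the sketch takes the fourth cluster centers as input. Cosine similarity is again an assumption, and `pick_cluster_b` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_cluster_b(fourth_centers: List[Vector],
                   candidates: Dict[str, List[Vector]],
                   threshold: float) -> Optional[str]:
    """Pick the candidate first cluster whose sub-centers (centers E) best
    match the sub-centers of the incoming second cluster.

    A center E is counted if it is more similar than `threshold` to at
    least one fourth cluster center (an assumed reading of the text).
    Returns None if no candidate has any matching center.
    """
    best_id: Optional[str] = None
    best_count = 0
    for cluster_id, centers_e in candidates.items():
        count = sum(
            1 for e in centers_e
            if any(cosine(e, f) > threshold for f in fourth_centers)
        )
        if count > best_count:
            best_id, best_count = cluster_id, count
    return best_id
```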
With reference to the first aspect, in a possible implementation, merging the single piece of image data with the first cluster C by using the first cluster centers includes:
Obtaining sixth similarities between the single piece of image data and first cluster centers F, where the first cluster centers F are the first cluster centers of the first sub-clusters of each of the K first clusters. For each of the K first clusters, determining a third number, namely how many of its first cluster centers F have a sixth similarity greater than a third threshold. Determining the first cluster with the largest third number among the K first clusters as the first cluster C, and merging the single piece of image data with the first cluster C.
In this way, the first cluster C contains the largest number of first sub-clusters that are close to the single piece of image data. Merging that image data into the first cluster C therefore makes the clustering result more accurate.
With reference to the first aspect, in a possible implementation, M is less than or equal to a fourth threshold. After the second image data set is merged with the first clusters by using the first cluster centers, the method further includes:
Dividing the merged first cluster into R third sub-clusters and obtaining a fifth cluster center for each of the R third sub-clusters, where R is an integer greater than or equal to 1. When R is less than or equal to the fourth threshold, retaining all R third sub-clusters and updating the first cluster centers with their fifth cluster centers. When R is greater than the fourth threshold, obtaining a fourth number, namely the number of pieces of image data in each of the R third sub-clusters. The R third sub-clusters are then sorted in descending order of the fourth number to obtain a fourth cluster sequence, the top P third sub-clusters in that sequence are selected, and the first cluster centers are updated with the fifth cluster centers of these P third sub-clusters. Here P is less than or equal to the fourth threshold.
In this way, when there are many sub-clusters, the number of sub-centers is capped by retaining the sub-clusters that contain the most image data, which also removes the influence of outlier image data. The clusters stay easy to maintain, and clustering quality remains good in long-running, large-scale incremental clustering scenarios.
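Here is a sketch of the sub-center refresh after a merge. It assumes that a cluster center is the arithmetic mean of its members. It also assumes that P defaults to the fourth threshold, because the text only requires P to be no larger than that threshold. `refresh_sub_centers` is a hypothetical name.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Arithmetic mean, assumed here as the definition of a cluster center."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def refresh_sub_centers(sub_clusters: List[List[Vector]],
                        max_centers: int,
                        keep: Optional[int] = None) -> List[List[float]]:
    """Recompute the first cluster centers of a merged first cluster.

    sub_clusters: the R third sub-clusters, each a list of feature vectors.
    max_centers:  the fourth threshold.
    keep:         P (defaults to the fourth threshold; must not exceed it).
    """
    if keep is None:
        keep = max_centers
    if keep > max_centers:
        raise ValueError("P must be <= the fourth threshold")
    if len(sub_clusters) <= max_centers:
        kept = sub_clusters  # R <= threshold: keep every sub-cluster
    else:
        # R > threshold: keep the P largest sub-clusters; the small ones
        # are treated as likely outliers and dropped from the center set.
        kept = sorted(sub_clusters, key=len, reverse=True)[:keep]
    return [mean_vector(c) for c in kept]
```

Sorting by size is what discards small, probably noisy sub-clusters. As a result, the number of stored sub-centers per cluster stays bounded by the fourth threshold no matter how many batches are merged.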
With reference to the first aspect, in a possible implementation, the first clusters are obtained by clustering the image data in the first image data set. Dividing a first cluster into M first sub-clusters includes:
Obtaining seventh similarities between the pieces of image data in the first cluster to form a similarity matrix, and dividing the first cluster into the M first sub-clusters based on the similarity matrix.
In this way, the first cluster can be divided into the M first sub-clusters by using the similarity matrix.
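Here is a minimal sketch of building the similarity matrix for one first cluster. It assumes that image data are feature vectors and that the seventh similarity is cosine similarity; the text does not fix the metric. `similarity_matrix` is a hypothetical name.

```python
import math
from typing import List, Sequence


def similarity_matrix(features: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between the images of one first cluster."""
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    n = len(features)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # symmetric: compute each pair once
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            s = dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
            matrix[i][j] = matrix[j][i] = s
    return matrix
```

The matrix is built once per first cluster, which costs O(n^2) in the cluster size n rather than in the size of the whole first image data set.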
With reference to the first aspect, in a possible implementation, dividing the first cluster into the M first sub-clusters based on the similarity matrix includes:
Obtaining a connected graph whose vertices are the pieces of image data in the first cluster; looking up the seventh similarities between the vertices of the connected graph in the similarity matrix; and grouping vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, thereby obtaining the M first sub-clusters.
In this way, the connected graph is used to group vertices whose seventh similarity is greater than the fifth threshold into first sub-clusters.
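One way to read "grouping vertices whose seventh similarity exceeds the fifth threshold into one first sub-cluster" is to keep only the edges above the threshold and take the connected components of the resulting graph. This sketch assumes that reading. It takes a precomputed similarity matrix and returns each first sub-cluster as a sorted list of vertex indices. `split_by_threshold` is a hypothetical name.

```python
from typing import List


def split_by_threshold(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Connected components of the graph whose edges are pairs with
    similarity > threshold; each component is one first sub-cluster."""
    n = len(sim)
    seen = [False] * n
    groups: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if not seen[v] and v != u and sim[u][v] > threshold:
                    seen[v] = True
                    stack.append(v)
        groups.append(sorted(component))
    return groups
```

Components are transitive. Two images can therefore share a sub-cluster through an intermediate image even when their own similarity is below the threshold. A stricter reading would require cliques or a linkage criterion instead.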
A second aspect of the embodiments of the present disclosure provides an apparatus for incremental clustering of images. The apparatus includes:
A first obtaining module, configured to obtain first clusters of a first image data set. A first division module, configured to divide each first cluster into M first sub-clusters and obtain a first cluster center for each of the M first sub-clusters, where M is an integer greater than or equal to 1. A merging module, configured to obtain a second image data set and merge it with the first clusters by using the first cluster centers.
A third aspect of the embodiments of the present disclosure provides an electronic device. The device includes an input device, an output device, a processor adapted to implement one or more instructions, and a computer storage medium. The computer storage medium stores one or more instructions adapted to be loaded by the processor to perform the steps of any implementation of the first aspect.
A fourth aspect of the embodiments of the present disclosure provides a computer storage medium. The medium stores one or more instructions adapted to be loaded by a processor to perform the steps of any implementation of the first aspect.
A fifth aspect of the embodiments of the present disclosure provides a computer program product. The product includes one or more instructions adapted to be loaded by a processor to perform the steps of any implementation of the first aspect.
In summary, the embodiments of the present disclosure obtain first clusters of a first image data set. Each first cluster is divided into M first sub-clusters, and a first cluster center is obtained for each of the M first sub-clusters, where M is an integer greater than or equal to 1. A second image data set is then obtained and merged with the first clusters by using the first cluster centers. Merging is therefore performed against the centers of the first sub-clusters, so each first cluster keeps several first cluster centers (sub-centers). As image data accumulates, newly added data pulls the single center of a first cluster (the main center) and makes it drift; maintaining sub-centers addresses this problem, makes the clustering result more accurate and improves clustering quality. In addition, the second image data set no longer needs to be compared for similarity against the entire first image data set, which reduces computational complexity.
Description of Drawings
FIG. 1 is a schematic diagram of an application environment according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an incremental clustering method for images according to an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of a connected graph of a first cluster according to an embodiment of the present disclosure;
FIG. 3B is a schematic diagram of dividing a first cluster into first sub-clusters according to an embodiment of the present disclosure;
FIG. 4A is a schematic diagram of a clustering result of a second image data set according to an embodiment of the present disclosure;
FIG. 4B is a schematic diagram of merging isolated image data with a first cluster according to an embodiment of the present disclosure;
FIG. 4C is a schematic diagram of merging a second cluster with a first cluster according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of updating first cluster centers according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of another incremental clustering method for images according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an apparatus for incremental clustering of images according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Embodiments
To help those skilled in the art better understand the solutions of the present disclosure, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present disclosure.
The terms "comprising" and "having" and any variations of them in the description, claims and drawings of the present disclosure are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units. Some embodiments of the present disclosure also include steps or units that are not listed, or other steps or units inherent to such a process, method, product or device. In addition, the terms "first", "second", "third" and so on distinguish different objects and do not describe a particular order.
In real-world scenarios such as social media and security surveillance, images are often produced incrementally, so incremental clustering is widely used for classification. Traditional incremental clustering maintains a set of existing clusters, but different clusters have different degrees of sparsity. The longer incremental clustering runs, the more likely the cluster centers are to drift, and clustering quality degrades as a result.
An embodiment of the present disclosure proposes an incremental clustering method for image data that can run in the application environment shown in FIG. 1. The environment mainly includes an image processing center 101 and an image acquisition device 102. The image processing center 101 includes, but is not limited to, a server 1011, terminals and a database. In some scenarios, the image acquisition device 102 is a camera deployed at a gate passage, in a shopping mall, in a residential community or at a similar location, and it captures images such as face images and video surveillance images. The image processing center 101 may then be a monitoring center, which may introduce a Video Cloud Node (VCN) 1012 to manage video surveillance, for example by showing images on a display 1013 and storing them in a database 1014 after clustering. In other scenarios, the image acquisition device 102 is a user terminal that captures photos taken by the user, such as photos the user posts on social media, and the image processing center is the social media platform's back end.
The image acquisition device 102 uploads the captured images to the image processing center 101, which performs operations such as feature extraction, clustering and face recognition. Images on the acquisition side are produced incrementally every day, and incremental clustering must maintain a set of clusters. As image data keeps growing and incremental clustering keeps running, the centers of the originally maintained clusters risk drifting, and clustering quality gradually degrades. The server 1011 can therefore run the incremental clustering method proposed in the embodiments of the present disclosure to prevent cluster-center drift from degrading clustering quality. The server 1011 may be an independent physical server, a server cluster or a distributed system. It may also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
The incremental clustering method for images provided by the embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.
FIG. 2 is a schematic flowchart of an incremental clustering method for images according to an embodiment of the present disclosure. The method is applied to a server and, as shown in FIG. 2, includes steps S21 to S23:
S21: Obtain the first clusters of a first image data set.
The first image data set is the image data that had already been clustered into multiple clusters before the current batch of image data arrived. For example, suppose an image acquisition device uploads a batch of face image data, such as face features, at a given moment; that batch is the current batch. The face image data uploaded to the server before that moment then forms the first image data set. The first clusters are obtained by clustering the image data in the first image data set, for example with the K-means clustering algorithm. Each such cluster has a corresponding cluster center, namely the second cluster center.
S22: Divide each first cluster into M first sub-clusters, and obtain a first cluster center for each of the M first sub-clusters, where M is an integer greater than or equal to 1.
FIG. 3A is a schematic diagram of a connected graph of a first cluster according to an embodiment of the present disclosure. As shown in FIG. 3A, the connected graph includes a first cluster 301 and a second cluster center 302. The first cluster 301 is a cluster obtained by clustering the image data in the first image data set, and the second cluster center 302 is that cluster's center.
FIG. 3B is a schematic diagram of dividing a first cluster into first sub-clusters according to an embodiment of the present disclosure. As shown in FIG. 3B, the figure includes a first cluster 301, a second cluster center 302, first sub-clusters 303 and first cluster centers 304. The first sub-clusters 303 are obtained by dividing the first cluster 301, and each first cluster center 304 is the center of one first sub-cluster.
A first sub-cluster is a sub-cluster obtained by dividing a first cluster. For each first cluster of the first image data set, the similarities between its pieces of image data (the seventh similarities) are computed to form a similarity matrix. A connected graph whose vertices are the image data in the first cluster is then built, as shown in FIG. 3A, and the similarity of every pair of vertices is looked up in the similarity matrix. Let X (the fifth threshold) be the threshold used when the first image data set was clustered. Image data whose similarities exceed X are grouped into a tighter first sub-cluster, which yields M first sub-clusters. As shown in FIG. 3B, analysis of the connected graph divides the first cluster of FIG. 3A into M first sub-clusters. The center of each of the M first sub-clusters, namely the first cluster center, is then computed, so that each first cluster is described by one main cluster center and M sub-cluster centers. A single main cluster center loses representational power as new image data is merged in, and describing each first cluster with more compact sub-clusters helps address this problem.
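The per-cluster bookkeeping that results, one main center plus M sub-centers, can be sketched as a small record type. The mean-of-members definition of a center, the `ClusterRecord` type and the `build_record` function are assumptions made for illustration. `sub_groups` would come from a division step such as the threshold-graph split described above.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass
class ClusterRecord:
    """Hypothetical record for one first cluster."""
    main_center: List[float]        # second cluster center
    sub_centers: List[List[float]]  # first cluster centers, one per sub-cluster


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Arithmetic mean, assumed here as the definition of a center."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def build_record(members: List[Vector], sub_groups: List[List[int]]) -> ClusterRecord:
    """Describe a first cluster by its main center and M sub-centers.

    members:    feature vectors of the first cluster.
    sub_groups: the M first sub-clusters, as index lists into `members`.
    """
    return ClusterRecord(
        main_center=mean_vector(members),
        sub_centers=[mean_vector([members[i] for i in g]) for g in sub_groups],
    )
```

In this sketch, only the records need to be kept for later merges, not every stored image, which is consistent with the complexity argument made for the method.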
S23: Obtain a second image data set, and merge the second image data set with the first clusters by using the first cluster centers.
FIG. 4A is a schematic diagram of a clustering result of a second image data set according to an embodiment of the present disclosure. As shown in FIG. 4A, the clustering result includes a second image data set 401, second clusters 402, isolated image data 403 and third cluster centers 404. The second image data set 401 is the data set of the current batch of images uploaded by the image acquisition device. The second clusters 402 are obtained by clustering the image data in the second image data set. The isolated image data 403 is image data that was not assigned to any cluster, and each third cluster center 404 is the center of one second cluster.
FIG. 4B is a schematic diagram of merging isolated image data with a first cluster according to an embodiment of the present disclosure. As shown in FIG. 4B, the figure includes a first cluster A 405 and isolated image data 403. The first cluster A 405 is the first cluster A selected from among the first clusters.
FIG. 4C is a schematic diagram of merging a second cluster with a first cluster according to an embodiment of the present disclosure. As shown in FIG. 4C, the figure includes a first cluster B 406 and a second cluster 407, and the two clusters belong to the same cluster category.
The second image data set is the data set of the current batch of images uploaded by the image acquisition device. The first clusters include a first cluster A, a first cluster B and a first cluster C. When the second image data set contains multiple pieces of image data, they are clustered. The result consists of unclustered isolated image data and several second clusters, each with its own center, namely the third cluster center (see FIG. 4A). For the isolated image data, a first cluster A is selected from among the first clusters, and the isolated image data is merged with it by using the first cluster centers. That is, as shown in FIG. 4B, the isolated image data is absorbed into the first cluster A, which belongs to the same cluster category. For each second cluster, a first cluster B is selected from among the first clusters, and the second cluster is merged with it by using the first cluster centers. That is, as shown in FIG. 4C, two clusters of the same cluster category are merged.
The case of a single piece of image data is handled like isolated image data. When the second image data set contains only one piece of image data, that is, only one new image has been added, the second image data set does not need to be clustered. A first cluster C is selected from among the first clusters, and the single piece of image data is merged with it by using the first cluster centers; the first cluster C belongs to the same cluster category as that image data.
In a possible implementation, before the second image data set is merged with the first clusters by using the first cluster centers, the method further includes:
Determining K first clusters from among the first clusters by using the second cluster centers.
Before the second image data set is merged with the first clusters, all first clusters are first screened using their second cluster centers to obtain K first clusters. The first cluster A and first cluster B, or the first cluster C, described above are then selected from these K clusters. The K first clusters may be the top K after all first clusters are sorted using the second cluster centers, for example the top 20 of 100 sorted first clusters. They may also be all of the sorted first clusters, for example all 100 of 100. Screening the first clusters with the second cluster centers helps identify the first clusters whose category is closest to that of the image data in the second image data set, such as the first clusters A, B and C above.
In a possible implementation, determining K first clusters from among the first clusters by using the second cluster centers includes:
Obtaining first similarities between the isolated image data and the second cluster centers, sorting the first clusters in descending order of first similarity to obtain a first cluster sequence, and selecting the top K first clusters in that sequence. Also obtaining second similarities between the third cluster center and the second cluster centers, sorting the first clusters in descending order of second similarity to obtain a second cluster sequence, and selecting the top K first clusters in that sequence. Alternatively, obtaining third similarities between the single piece of image data and the second cluster centers, sorting the first clusters in descending order of third similarity to obtain a third cluster sequence, and selecting the top K first clusters in that sequence.
Suppose clustering the second image data set yields isolated image data and multiple second clusters. For each isolated image, the first similarity between it and the second cluster center of every first cluster is computed. For each second cluster, the second similarity between its third cluster center and the second cluster center of every first cluster is computed. All first clusters are then sorted in descending order by the first similarity and by the second similarity, respectively. This gives the first and second cluster sequences, and the top K first clusters are selected from each. If the second image data set contains only a single image, the third similarity between that image and the second cluster center of every first cluster is computed. All first clusters are sorted by the third similarity in descending order to obtain the third cluster sequence, and the top K first clusters are selected from it.
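The coarse top-K screening above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that features are plain Python lists and that similarity is cosine similarity, which the text does not specify. The function names `cosine` and `top_k_clusters` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_clusters(query: Sequence[float],
                   main_centers: List[Sequence[float]],
                   k: int) -> List[int]:
    """Rank existing (first) clusters by the similarity between `query` and
    each cluster's main (second) cluster center, highest first, and return
    the indices of the top k.

    `query` may be an isolated sample, a single image, or the third cluster
    center of a newly formed second cluster. Passing k >= len(main_centers)
    keeps every cluster, only sorted.
    """
    scores = [cosine(query, c) for c in main_centers]
    ranked = sorted(range(len(main_centers)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Usage: three existing clusters; the query is closest to cluster 0, then 2.
centers = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k_clusters([1.0, 0.0], centers, k=2))  # [0, 2]
```

Only one vector per existing cluster (the main center) takes part in this step. That is what keeps the coarse search cheap before the finer sub-center comparison.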
In a possible implementation, merging the isolated image data with first cluster A by using the first cluster centers includes:
obtaining a fourth similarity between the isolated image data and each first cluster center D, where the first cluster centers D are the first cluster centers of every first sub-cluster of every first cluster among the K first clusters; for each of the K first clusters, determining a first number, namely the number of its first cluster centers D whose fourth similarity is greater than a first threshold; determining the first cluster with the largest first number among the K first clusters as first cluster A; and merging the isolated image data into first cluster A.
To merge isolated sample image data, first cluster A must be determined from the top K first clusters selected above; the top K first clusters may also be all of the ranked first clusters.

First, the fourth similarity is computed between the isolated image data and each first cluster center D, that is, the cluster center of each first sub-cluster of each of the K first clusters. Next, for each of the K first clusters, the number of its first cluster centers D whose fourth similarity exceeds the first threshold is counted; this is the first number. The first cluster with the largest first number is taken as first cluster A.

For example, suppose first cluster 1 has 20 such first cluster centers D, first cluster 2 has 18, ..., and first cluster K has 15. First cluster 1 has the most and is therefore taken as first cluster A. In other words, first cluster A contains the most first sub-clusters close to the isolated image data, so merging the isolated image data into it makes the clustering result more accurate.
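The voting rule for choosing first cluster A can be sketched as below. This is a minimal illustration that assumes cosine similarity and list-based features; the function name is hypothetical. Two behaviours the text leaves open are assumptions here: returning `(None, 0)` when no sub-center passes the threshold, and breaking ties in favour of the candidate ranked higher in the coarse screening.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pick_absorbing_cluster(sample: Sequence[float],
                           candidates: List[int],
                           sub_centers: List[List[Sequence[float]]],
                           threshold: float) -> Tuple[Optional[int], int]:
    """For each top-K candidate cluster, count its sub-centers (first cluster
    centers D) whose similarity to `sample` exceeds `threshold`, and return
    (index of the cluster with the largest count, that count).

    Assumptions: (None, 0) means no cluster qualifies, so the sample stays
    isolated; ties keep the earlier candidate, i.e. the one ranked higher
    by main-center similarity.
    """
    best, best_count = None, 0
    for idx in candidates:
        count = sum(1 for c in sub_centers[idx] if cosine(sample, c) > threshold)
        if count > best_count:
            best, best_count = idx, count
    return best, best_count


# Usage: cluster 1 has three sub-centers near the sample, cluster 0 has two.
subs = [
    [[1.0, 0.0], [0.95, 0.05]],
    [[1.0, 0.01], [0.99, 0.0], [0.98, 0.02]],
    [[0.0, 1.0]],
]
print(pick_absorbing_cluster([1.0, 0.0], [0, 1, 2], subs, 0.9))  # (1, 3)
```

The same routine applies unchanged to choosing first cluster C for a single image, with the sixth similarity and the third threshold.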
In a possible implementation, merging the second cluster with first cluster B by using the first cluster centers includes:
dividing the second cluster into N second sub-clusters and obtaining the fourth cluster center of each of the N second sub-clusters, N being an integer greater than or equal to 1; obtaining a fifth similarity between the fourth cluster centers and each first cluster center E, where the first cluster centers E are the first cluster centers of every first sub-cluster of every first cluster among the K first clusters; for each of the K first clusters, determining a second number, namely the number of its first cluster centers E whose fifth similarity is greater than a second threshold; determining the first cluster with the largest second number among the K first clusters as first cluster B; and merging the second cluster into first cluster B.
To merge two clusters, first cluster B must be determined from the top K first clusters selected above; the top K first clusters may also be all of the ranked first clusters.

First, each second cluster is divided into N second sub-clusters in the same way as the first clusters, and the center of each second sub-cluster (the fourth cluster center) is computed. Next, the fifth similarity is computed between the fourth cluster centers and each first cluster center E, that is, the cluster center of each first sub-cluster of each of the K first clusters. For each of the K first clusters, the number of its first cluster centers E whose fifth similarity exceeds the second threshold is counted; this is the second number. The first cluster with the largest second number is taken as first cluster B.

For example, suppose first cluster 1 has 30 such first cluster centers E, first cluster 2 has 15, ..., and first cluster K has 40. First cluster K has the most and is therefore taken as first cluster B. In other words, first cluster B contains the most first sub-clusters close to the second sub-clusters of the second cluster, so merging the second cluster into it makes the clustering result more accurate.
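Choosing first cluster B can be sketched in the same style, assuming cosine similarity; the function name is hypothetical.

The text does not say whether a first cluster center E counts once if it matches any fourth cluster center, or once per matching pair. This sketch assumes the former. It also assumes the new cluster's sub-centers (the fourth cluster centers) have already been computed with the same splitting procedure as the first clusters.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pick_merge_target(new_sub_centers: List[Sequence[float]],
                      candidates: List[int],
                      sub_centers: List[List[Sequence[float]]],
                      threshold: float) -> Tuple[Optional[int], int]:
    """Count, for each top-K candidate cluster, the sub-centers E that exceed
    `threshold` against at least one fourth cluster center of the new
    (second) cluster; return the candidate with the largest count.

    (None, 0) means no candidate qualifies (an assumption about the
    fallback); ties keep the earlier, higher-ranked candidate.
    """
    best, best_count = None, 0
    for idx in candidates:
        count = sum(
            1 for e in sub_centers[idx]
            if any(cosine(q, e) > threshold for q in new_sub_centers)
        )
        if count > best_count:
            best, best_count = idx, count
    return best, best_count


# Usage: the new cluster has two sub-centers; cluster 1 matches three of its own.
subs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]],
    [[-1.0, 0.0]],
]
print(pick_merge_target([[1.0, 0.0], [0.0, 1.0]], [0, 1, 2], subs, 0.9))  # (1, 3)
```

Comparing sub-center against sub-center costs at most M x N similarity evaluations per candidate. Both M and N are capped by the fourth threshold, so this cost stays bounded.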
In a possible implementation, merging the single image data with first cluster C by using the first cluster centers includes:
obtaining a sixth similarity between the single image data and each first cluster center F, where the first cluster centers F are the first cluster centers of every first sub-cluster of every first cluster among the K first clusters; for each of the K first clusters, determining a third number, namely the number of its first cluster centers F whose sixth similarity is greater than a third threshold; determining the first cluster with the largest third number among the K first clusters as first cluster C; and merging the single image data into first cluster C.
To merge single image data, first cluster C must be determined from the top K first clusters selected above; the top K first clusters may also be all of the ranked first clusters. First, the sixth similarity is computed between the single image and each first cluster center F, that is, the cluster center of each first sub-cluster of each of the K first clusters. Next, for each of the K first clusters, the number of its first cluster centers F whose sixth similarity exceeds the third threshold is counted; this is the third number. The first cluster with the largest third number is taken as first cluster C. In other words, first cluster C contains the most first sub-clusters close to the single image, so merging the single image into it makes the clustering result more accurate.
In a possible implementation, M is less than or equal to a fourth threshold. After the second image data set is merged with the first cluster by using the first cluster centers, as shown in FIG. 5, the method further includes:
S51: dividing the merged first cluster into R third sub-clusters, and obtaining the fifth cluster center of each of the R third sub-clusters, R being an integer greater than or equal to 1;
S52: when R is less than or equal to the fourth threshold, retaining the R third sub-clusters and updating the first cluster centers with the fifth cluster centers of the R third sub-clusters;
S53: when R is greater than the fourth threshold, obtaining a fourth number, namely the number of image data items in each of the R third sub-clusters;
S54: sorting the R third sub-clusters by the fourth number in descending order to obtain a fourth cluster sequence, selecting the top P third sub-clusters in the fourth cluster sequence, and updating the first cluster centers with the fifth cluster centers of these P third sub-clusters, P being less than or equal to the fourth threshold.
After isolated image data and a second cluster, or a single image, are merged into a first cluster, that cluster has absorbed new image data, so its sub-centers must be updated. The merged first cluster is divided into R third sub-clusters in the same way as the first clusters, and the fifth cluster center of each third sub-cluster is computed. If R is less than or equal to the fourth threshold (for example, 20), all R third sub-clusters are retained. Their fifth cluster centers replace the original first cluster centers as the new sub-centers of the merged cluster. The merged first cluster is then described by its second cluster center plus the R fifth cluster centers.
If R is greater than the fourth threshold, the R third sub-clusters are sorted in descending order of the number of image data items they contain (the fourth number), giving the fourth cluster sequence. Only the top P third sub-clusters are kept (for example, the top 20) and the rest are discarded. The fifth cluster centers of these P third sub-clusters replace the original first cluster centers as the new sub-centers. The merged first cluster is then described by its second cluster center plus the P fifth cluster centers.

Only a preset number of sub-clusters is retained each time a cluster is divided, so both M and N are less than or equal to the fourth threshold. When there are many sub-clusters, keeping those with the most image data limits the number of sub-centers and removes the influence of outlier image data. This simplifies maintenance and preserves good clustering performance in long-running, large-scale incremental clustering.
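Steps S52 to S54 can be sketched as below. This is a minimal illustration, not the patented implementation. It assumes the R third sub-clusters have already been produced by the splitting step (S51) and are given as lists of feature vectors, and that each fifth cluster center is the mean of its members. It also sets P equal to the fourth threshold, whereas the text only requires P to be at most that threshold. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refresh_sub_centers(third_sub_clusters: List[List[Sequence[float]]],
                        fourth_threshold: int = 20) -> List[Tuple[List[float], int]]:
    """Return (fifth cluster center, member count) pairs for the merged cluster.

    S52: if R <= fourth_threshold, keep all R sub-clusters.
    S53/S54: otherwise rank sub-clusters by member count (descending) and
    keep the top P, with P = fourth_threshold here (an assumption).
    """
    if len(third_sub_clusters) <= fourth_threshold:
        kept = third_sub_clusters
    else:
        kept = sorted(third_sub_clusters, key=len, reverse=True)[:fourth_threshold]
    return [(mean_vector(members), len(members)) for members in kept]


# Usage: three sub-clusters with 1, 2 and 3 members; cap of 2 drops the singleton.
subs = [[[1.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]], [[2.0, 2.0], [4.0, 4.0], [0.0, 0.0]]]
print(refresh_sub_centers(subs, fourth_threshold=2))  # [([2.0, 2.0], 3), ([0.0, 2.0], 2)]
```

Dropping the smallest sub-clusters first is what removes outlier images: a stray sample usually forms a sub-cluster of size one.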
In summary, the embodiments of the present disclosure proceed in three steps. They obtain the first cluster of a first image data set. They divide the first cluster into M first sub-clusters and obtain the first cluster center of each, M being an integer greater than or equal to 1. They then obtain a second image data set and merge it with the first cluster by using the first cluster centers.

Because the second image data set is merged based on the first cluster centers of the sub-clusters, multiple first cluster centers (sub-centers) are maintained for each cluster. This addresses the problem that, as image data accumulates, the cluster center of the first cluster (the main center) drifts under the influence of newly added image data. The clustering results are therefore more accurate. In addition, the second image data set no longer needs to be compared against the entire first image data set, which reduces computational complexity.
FIG. 6 is a schematic flowchart of another image incremental clustering method provided by an embodiment of the present disclosure. As shown in FIG. 6, the method includes steps S61 to S66:
S61: obtaining the first cluster of a first image data set;
S62: dividing the first cluster into M first sub-clusters, and obtaining the first cluster center of each of the M first sub-clusters, M being an integer greater than or equal to 1;
S63: obtaining a second image data set;
S64: when the second image data set includes multiple image data items, clustering them to obtain isolated image data and a second cluster;
S65: merging the isolated image data with first cluster A by using the first cluster centers, and merging the second cluster with first cluster B by using the first cluster centers;
S66: when the second image data set contains only a single image data item, merging it with first cluster C by using the first cluster centers.
Steps S61 to S66 have been described in the embodiments shown in FIG. 2 to FIG. 5 and achieve the same or similar beneficial effects, so they are not repeated here.
Breakthroughs in deep learning research continue to advance face recognition technology, and face recognition models trained through supervised learning keep improving. However, classifying large amounts of unlabeled face data accurately and quickly remains a problem of great economic and social value.
In real-world settings such as social media and security, image data volumes are often large and new data arrives every day, so incremental clustering has greater practical value. Incremental clustering must maintain a set of clusters throughout the process. Traditional clustering algorithms describe each cluster with a single center, for example the mean of the features of all samples in the cluster. However, clusters differ in sparsity, so a single mean-based center easily loses the rich sample information inside a cluster. As incremental clustering proceeds, clustering quality gradually degrades.
In practical face clustering, the face features of different people are distributed differently in the feature space: some clusters are compact, while others may be loose. Describing a cluster with a single center discards this internal information. As incremental clustering proceeds, the influence of existing samples keeps decreasing, and each new sample increases the risk that the cluster center drifts.
An image incremental clustering method provided by an embodiment of the present disclosure includes the following steps:
S67: computing the similarities between the samples of a cluster, and splitting the cluster into several tighter sub-clusters.
Computing the similarities between the samples of a cluster yields a similarity matrix S. If the threshold used for clustering is λ, a higher threshold λ' with λ' > λ is set and used to split the cluster into several tighter sub-clusters.
A cluster's multiple centers can be obtained through connected-graph analysis. A similarity matrix is computed for the cluster, and a similarity threshold higher than the clustering threshold splits the cluster into several more compact sub-clusters. This yields multiple sub-cluster centers. Together with the overall cluster center, which serves as the main center, they form the multi-center description of the cluster.
Specifically, the connected-graph-based multi-center design obtains the sub-centers in three steps. First, for each cluster, a threshold higher than the clustering threshold is set. This breaks the cluster into several more compact connected subgraphs, and a sub-center is computed for each subgraph. The main center is still obtained in the conventional way, as the mean over the whole cluster.
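The connected-graph split can be sketched as follows. This is one possible reading, not the patent's code. It assumes cosine similarity, joins two samples with an edge when their similarity exceeds λ' (`lam_high`), and takes each connected component's mean as its sub-center. The all-pairs loop is O(n²) and is meant only for illustration. Components are found with a union-find structure, a swapped-in standard technique for connected components; the text does not say how the components are computed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def multi_center(features: List[Sequence[float]],
                 lam_high: float) -> Tuple[List[float], List[List[float]]]:
    """Return (main_center, sub_centers) for one cluster.

    Samples whose similarity exceeds lam_high (chosen > clustering threshold
    lambda) are linked; each connected component becomes one sub-cluster.
    """
    n = len(features)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) > lam_high:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    components: Dict[int, List[Sequence[float]]] = {}
    for i in range(n):
        components.setdefault(find(i), []).append(features[i])

    sub_centers = [mean_vector(members) for members in components.values()]
    main_center = mean_vector(features)  # conventional mean over the whole cluster
    return main_center, sub_centers


# Usage: two tight groups inside one cluster give two sub-centers.
feats = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]]
main, subs = multi_center(feats, lam_high=0.95)
print(len(subs))  # 2
```

Raising `lam_high` produces more, smaller sub-clusters. The cap on the number of retained sub-centers described in steps S52 to S54 keeps that growth bounded.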
S68: during incremental clustering, whenever a new batch of data arrives, the new data is first clustered on its own, producing a number of clusters and some unclustered isolated samples.
S69: merging the clusters and unclustered isolated samples produced in S68 with the existing clustering results obtained in step S67.
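Step S68 and the three-way routing it feeds (single image, new clusters, isolated samples) can be sketched as below. The text does not specify how the new batch itself is clustered. As a stand-in, this sketch uses simple leader clustering: each sample joins the first group whose first member has cosine similarity above `lam`, otherwise it starts a new group. The function name and the returned dictionary keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def split_new_batch(batch: List[Sequence[float]], lam: float) -> Dict[str, object]:
    """Cluster a new batch on its own and separate multi-member clusters
    (to be merged via the first-cluster-B path) from isolated samples
    (first-cluster-A path). A one-image batch goes to the first-cluster-C path.
    """
    if len(batch) == 1:
        return {"single": batch[0], "clusters": [], "isolated": []}

    groups: List[List[Sequence[float]]] = []
    for x in batch:
        for g in groups:
            if cosine(x, g[0]) > lam:  # compare with the group's leader
                g.append(x)
                break
        else:  # no group accepted x: it starts a new group
            groups.append([x])

    return {
        "single": None,
        "clusters": [g for g in groups if len(g) > 1],
        "isolated": [g[0] for g in groups if len(g) == 1],
    }


# Usage: two near-identical samples form a cluster; the other two stay isolated.
out = split_new_batch([[1.0, 0.0], [0.98, 0.02], [0.0, 1.0], [-1.0, 0.0]], lam=0.9)
print(len(out["clusters"]), len(out["isolated"]))  # 1 2
```

Any threshold-based batch clusterer could replace the leader step. What S69 needs is only the split into new clusters and leftover isolated samples.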
Multi-center incremental clustering based on a single main center and multiple sub-centers: once the main center and sub-centers are available, incremental clustering first runs a coarse top-K search between the main centers and the new data. It then uses the sub-centers to decide whether to absorb new samples or other clusters.
Cluster merging involves two cases: merging clusters with each other, and letting a cluster absorb a single isolated sample. To absorb an isolated sample point, a lower threshold is first set and the main centers are used to retrieve the top K clusters. The decision then depends on whether the sub-centers and the sample point satisfy the clustering threshold λ. Several clusters may satisfy this requirement, so the cluster with the largest number of qualifying sub-centers is chosen as the target. Merging clusters works the same way: a lower threshold is used to retrieve the top K candidates. The decision then depends on whether any pair of sub-centers across the two clusters meets the threshold. When several clusters qualify, the one with the largest number of qualifying sub-centers is chosen as the target.
The multi-center incremental clustering architecture combines the single main center with the multiple sub-centers. The main center takes part in the similarity computation during the top-K nearest-neighbor search. Similarities between the sub-centers and the single sample or cluster to be clustered then decide whether the sample is absorbed or the clusters are merged. This architecture exploits the advantages of the multi-center representation and improves clustering quality without adding much computational complexity.
When clusters are merged or new samples are added, the sub-centers must be updated. To simplify computation, this update can be modeled as clustering the sub-centers themselves, so that sub-centers are merged and updated. To keep the number of sub-centers from growing too large, the sub-centers can be sorted in descending order of the number of sample points each represents, keeping, for example, at most the top 20.
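Modeling the sub-center update as "clustering of sub-centers" can be sketched as below. The merge rule is an assumption, since the text only says the update is modeled as a clustering. Each sub-center is processed in descending order of sample count and folded into the first already-merged sub-center with cosine similarity above `threshold`, using a count-weighted mean. The result is then sorted by count and truncated to `max_centers` (20 in the text's example). The greedy, order-dependent merge and the function name are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def merge_sub_centers(entries: List[Tuple[Sequence[float], int]],
                      threshold: float,
                      max_centers: int = 20) -> List[Tuple[List[float], int]]:
    """entries: (sub-center, number of samples it represents) pairs, e.g. the
    old sub-centers of a cluster plus those contributed by absorbed data.
    Returns at most max_centers merged (center, count) pairs, largest first.
    """
    merged: List[list] = []  # each item: [center, count]
    for center, count in sorted(entries, key=lambda e: e[1], reverse=True):
        for m in merged:
            if cosine(center, m[0]) > threshold:
                total = m[1] + count
                # Count-weighted mean keeps large sub-centers stable.
                m[0] = [(a * m[1] + b * count) / total for a, b in zip(m[0], center)]
                m[1] = total
                break
        else:
            merged.append([list(center), count])
    merged.sort(key=lambda m: m[1], reverse=True)
    return [(c, n) for c, n in merged[:max_centers]]


# Usage: two identical sub-centers fuse (10 + 5); the weakest one is dropped by the cap.
entries = [([1, 0], 10), ([0, 1], 3), ([1, 0], 5), ([0, -1], 1)]
print(merge_sub_centers(entries, 0.9, max_centers=2))  # [([1.0, 0.0], 15), ([0, 1], 3)]
```

Weighting by count means a large, well-established sub-center moves only slightly when a small one is folded into it. This is consistent with the text's goal of limiting drift and outlier influence.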
The multiple centers of each cluster are updated incrementally. In real scenarios data volume keeps growing. Merging and updating sub-centers, and capping their number, keeps the sub-center count from growing without bound and imposing excessive computation and storage costs. It also reduces the influence of outlier points.
The embodiments of the present disclosure fully take into account the complexity of face clustering on large-scale data:
First, a method for constructing multiple centers for face clusters is proposed. It describes each face cluster by a single main center plus multiple sub-centers, which addresses the following problems with a single maintained center.

(1) Describing a cluster by one center ignores the compact sub-clusters inside it.

(2) As data keeps growing, a single center is continually affected by new samples and risks drifting. Meanwhile, the influence of the cluster's existing samples keeps weakening, so the center represents the cluster less well.

(3) A single center loses the cluster's internal sample information during incremental clustering. Incremental clustering usually keeps one center per cluster, and as data arrives it compares that center with new samples or clusters to merge and update clusters, updating the center each time. As data accumulates, the single center gradually loses the rich sample information within the cluster and is prone to drift, so clustering quality degrades over time.
Second, a multi-center incremental clustering architecture is proposed. It balances computational complexity and clustering accuracy when a multi-center representation is used for incremental clustering. It supports both clusters absorbing single samples and clusters merging with each other. This solves the prior-art problem that multi-center settings severely slow clustering and increase storage in large-scale data scenarios.
Finally, a multi-center incremental update method is proposed. By merging and updating sub-centers and capping their number, it maintains good clustering performance in long-running, large-scale incremental clustering. It limits the growth in the number of centers while removing the influence of outliers. This solves a prior-art problem: face image features are typically high-dimensional, so maintaining multiple centers multiplies the memory needed during clustering and the computation needed for top-K nearest-neighbor search.
Based on the method embodiments shown in FIG. 2 or FIG. 6, an embodiment of the present disclosure further provides an image incremental clustering apparatus. FIG. 7 is a schematic structural diagram of this apparatus. As shown in FIG. 7, the apparatus includes:
a first acquisition module 71, configured to acquire the first cluster of a first image data set;
a first segmentation module 72, configured to divide the first cluster into M first sub-clusters and obtain the first cluster center of each of the M first sub-clusters, M being an integer greater than or equal to 1;
a merging module 73, configured to acquire a second image data set and merge it with the first cluster by using the first cluster centers.
In a possible implementation, the first clusters include first cluster A, first cluster B and first cluster C. For merging the second image data set with the first clusters by using the first cluster centers, the merging module 73 is configured as follows. When the second image data set includes multiple image data items, it clusters them to obtain isolated image data and a second cluster. It then merges the isolated image data with first cluster A, and the second cluster with first cluster B, by using the first cluster centers. When the second image data set contains only a single image data item, it merges that item with first cluster C by using the first cluster centers.
In a possible implementation, each first cluster has a corresponding second cluster center. Before merging the second image data set with the first clusters by using the first cluster centers, the merging module 73 is further configured to determine K first clusters from the first clusters by using the second cluster centers.
In a possible implementation, each second cluster has a corresponding third cluster center. For determining K first clusters from the first clusters by using the second cluster centers, the merging module 73 is configured to: obtain a first similarity between the isolated image data and each second cluster center, sort the first clusters by the first similarity in descending order to obtain a first cluster sequence, and select the top K first clusters in the first cluster sequence; and obtain a second similarity between the third cluster center and each second cluster center, sort the first clusters by the second similarity in descending order to obtain a second cluster sequence, and select the top K first clusters in the second cluster sequence; or obtain a third similarity between the single image data and each second cluster center, sort the first clusters by the third similarity in descending order to obtain a third cluster sequence, and select the top K first clusters in the third cluster sequence.
In a possible implementation, to merge the isolated image data with first cluster A using the first cluster centers, the merging module 73 is configured to:
- obtain a fourth similarity between the isolated image data and each first cluster center D, where the first cluster centers D are the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
- for each of the K first clusters, determine a first number, namely the number of its first cluster centers D whose fourth similarity exceeds a first threshold;
- determine the first cluster with the largest first number among the K first clusters as first cluster A;
- merge the isolated image data with first cluster A.

In a possible implementation, to merge the second cluster with first cluster B using the first cluster centers, the merging module 73 is configured to:
- divide the second cluster into N second sub-clusters and obtain a fourth cluster center for each of the N second sub-clusters, where N is an integer greater than or equal to 1;
- obtain a fifth similarity between the fourth cluster centers and each first cluster center E, where the first cluster centers E are the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
- for each of the K first clusters, determine a second number, namely the number of its first cluster centers E whose fifth similarity exceeds a second threshold;
- determine the first cluster with the largest second number among the K first clusters as first cluster B;
- merge the second cluster with first cluster B.

In a possible implementation, to merge the single image data with first cluster C using the first cluster centers, the merging module 73 is configured to:
- obtain a sixth similarity between the single image data and each first cluster center F, where the first cluster centers F are the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
- for each of the K first clusters, determine a third number, namely the number of its first cluster centers F whose sixth similarity exceeds a third threshold;
- determine the first cluster with the largest third number among the K first clusters as first cluster C;
- merge the single image data with first cluster C.
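The three voting rules above all have the same shape, which the sketch below captures under stated assumptions:
- the fourth similarity with the first threshold selects first cluster A;
- the fifth similarity with the second threshold selects first cluster B;
- the sixth similarity with the third threshold selects first cluster C.

Cosine similarity is assumed. In the second-cluster case there are N fourth cluster centers, and a first cluster center is counted if its similarity to any fourth cluster center exceeds the threshold. That reading of "the number of first cluster centers E whose fifth similarity exceeds the second threshold" is an interpretation, not something the text states. Returning `None` when no center passes the threshold is also an assumption, because the text does not cover that case. `select_target_cluster` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_target_cluster(queries, candidate_centers, threshold):
    """Pick the candidate first cluster whose sub-cluster centers best match the queries.

    queries: a non-empty list of vectors, either one feature (isolated or
        single image) or the N fourth cluster centers of a second cluster.
    candidate_centers: maps each of the K candidate first-cluster ids to the
        list of its first cluster centers (one per first sub-cluster).
    Returns (cluster_id, count); cluster_id is None if no center passes.
    """
    best_id, best_count = None, 0
    for cluster_id, centers in candidate_centers.items():
        # A sub-cluster center votes if it is similar enough to any query vector.
        count = sum(
            1 for center in centers
            if max(cosine(q, center) for q in queries) > threshold
        )
        if count > best_count:  # ties keep the earlier candidate
            best_id, best_count = cluster_id, count
    return best_id, best_count


candidates = {"A": [[1.0, 0.0], [0.9, 0.3]], "B": [[0.0, 1.0]]}
print(select_target_cluster([[1.0, 0.05]], candidates, threshold=0.9))  # ('A', 2)
```

Counting votes over many sub-cluster centers, rather than comparing against one averaged center, is what lets a cluster with several distinct modes still attract a matching image.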
In a possible implementation, M is less than or equal to a fourth threshold. The first dividing module 72 is further configured to do the following:
- Divide the merged first cluster into R third sub-clusters, where R is an integer greater than or equal to 1, and obtain a fifth cluster center for each of them.
- When R is less than or equal to the fourth threshold, retain all R third sub-clusters and update the first cluster centers with their fifth cluster centers.
- When R is greater than the fourth threshold, obtain a fourth number for each third sub-cluster, namely the number of pieces of image data it contains. Sort the R third sub-clusters by fourth number in descending order to obtain a fourth cluster sequence, and select the first P third sub-clusters in that sequence, where P is less than or equal to the fourth threshold. Then update the first cluster centers with the fifth cluster centers of those P third sub-clusters.
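The center-maintenance rule can be sketched as follows. The sketch makes three assumptions:
- each third sub-cluster is represented as a list of image indices, paired with its fifth cluster center;
- P defaults to the fourth threshold, since the text only requires P to be at most that threshold;
- only the maintained centers are pruned and the merged cluster's images are left untouched, because the text describes only the update of the first cluster centers.

`refresh_centers` is a hypothetical name.

```python
def refresh_centers(sub_clusters, centers, max_centers, p=None):
    """Return the first cluster centers to keep after re-splitting a merged cluster.

    sub_clusters: the R third sub-clusters, each a list of image indices.
    centers: the R matching fifth cluster centers.
    max_centers: the fourth threshold.
    p: how many centers to keep when R exceeds the threshold (defaults to max_centers).
    """
    if len(sub_clusters) != len(centers):
        raise ValueError("each third sub-cluster needs exactly one center")
    if len(sub_clusters) <= max_centers:
        return list(centers)  # R <= threshold: keep every fifth cluster center
    p = max_centers if p is None else p
    if p > max_centers:
        raise ValueError("P must not exceed the fourth threshold")
    # Sort sub-clusters by image count (the fourth number), largest first.
    order = sorted(range(len(sub_clusters)), key=lambda i: len(sub_clusters[i]), reverse=True)
    return [centers[i] for i in order[:p]]


subs = [[0], [1, 2, 3], [4, 5]]
print(refresh_centers(subs, ["c0", "c1", "c2"], max_centers=2))  # ['c1', 'c2']
```

Capping the number of centers bounds the cost of the later voting step, which compares each query against every maintained center of the K candidate clusters.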
In a possible implementation, to divide the first cluster into M first sub-clusters, the first dividing module 72 is configured to: obtain a seventh similarity between the pieces of image data in the first cluster to form a similarity matrix; and divide the first cluster into the M first sub-clusters based on the similarity matrix.
In a possible implementation, to divide the first cluster into the M first sub-clusters based on the similarity matrix, the first dividing module 72 is configured to: obtain a connected graph whose vertices are the image data in the first cluster; look up the seventh similarity between vertices of the connected graph in the similarity matrix; and group vertices whose seventh similarity exceeds a fifth threshold into the same first sub-cluster, thereby obtaining the M first sub-clusters.
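The splitting step of the two paragraphs above can be sketched as follows. The sketch rests on three illustrative assumptions:
- "group vertices whose seventh similarity exceeds the fifth threshold into one first sub-cluster" is read as taking the connected components of the graph that keeps only edges above the threshold, found here with a small union-find;
- similarity is cosine similarity;
- each first cluster center is the mean feature of its sub-cluster.

`split_cluster` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_cluster(features, threshold):
    """Split one cluster into sub-clusters and return (sub_clusters, centers).

    features: one feature vector per image in the cluster.
    Sub-clusters are connected components of the graph whose edges join images
    with pairwise similarity above `threshold`; centers are component means.
    """
    n = len(features)
    # Pairwise (seventh) similarity matrix.
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:
                parent[find(i)] = find(j)  # join the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    sub_clusters = list(groups.values())
    dim = len(features[0]) if features else 0
    centers = [
        [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        for members in sub_clusters
    ]
    return sub_clusters, centers


feats = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
subs, centers = split_cluster(feats, threshold=0.9)
print(subs)  # [[0, 1], [2, 3]]
```

Here M is `len(subs)`. Building the full matrix costs O(n²) similarity evaluations per cluster, which the similarity-matrix formulation implies.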
According to an embodiment of the present disclosure, the units of the image incremental clustering apparatus shown in FIG. 7 may be combined, separately or all together, into one or more other units, or one or more of the units may be further split into multiple functionally smaller units. This achieves the same operations without affecting the technical effects of the embodiments of the present disclosure. The units are divided by logical function; in practice, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present disclosure, the image incremental clustering apparatus may also include other units, and in practice these functions may be implemented with the assistance of other units or cooperatively by multiple units.
According to another embodiment of the present disclosure, the image incremental clustering apparatus shown in FIG. 7 may be constructed, and the image incremental clustering method of the embodiments of the present disclosure implemented, by running a computer program (including program code) on a general-purpose computing device such as a computer. The program must be capable of executing the steps of the corresponding method shown in FIG. 2 or FIG. 6. The computing device includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the computing device through that medium, and executed there.
Based on the foregoing method and apparatus embodiments, the embodiments of the present disclosure further provide an electronic device. Referring to FIG. 8, the electronic device includes at least a processor 81, an input device 82, an output device 83, and a computer storage medium 84, which may be connected by a bus or by other means.
The computer storage medium 84 may reside in the memory of the electronic device and is configured to store a computer program comprising program instructions; the processor 81 is configured to execute the program instructions stored in the computer storage medium 84. The processor 81 (also called a CPU, central processing unit) is the computing and control core of the electronic device; it is adapted to load and execute one or more instructions so as to implement the corresponding method flow or function.
In one embodiment, the processor 81 of the electronic device provided by the embodiments of the present disclosure may be configured to perform the following incremental clustering operations on images:
obtaining a first cluster of a first image data set;
dividing the first cluster into M first sub-clusters and obtaining a first cluster center for each of the M first sub-clusters, where M is an integer greater than or equal to 1; and obtaining a second image data set and merging it with the first cluster using the first cluster centers.
In yet another embodiment, the first clusters include a first cluster A, a first cluster B and a first cluster C, and the processor 81 merges the second image data set with the first clusters using the first cluster centers by: when the second image data set contains multiple pieces of image data, clustering them to obtain isolated image data and a second cluster; merging the isolated image data with first cluster A using the first cluster centers, and merging the second cluster with first cluster B using the first cluster centers; and when the second image data set contains only a single piece of image data, merging that image data with first cluster C using the first cluster centers.
In yet another embodiment, each first cluster has a corresponding second cluster center, and before merging the second image data set with the first clusters using the first cluster centers, the processor 81 is further configured to determine K first clusters from the first clusters using the second cluster centers.
In yet another embodiment, the second cluster has a corresponding third cluster center. The processor 81 determines the K first clusters using the second cluster centers as follows:
- It obtains a first similarity between the isolated image data and each second cluster center, sorts the first clusters by first similarity in descending order to obtain a first cluster sequence, and selects the top K first clusters in that sequence.
- It also obtains a second similarity between the third cluster center and each second cluster center, sorts the first clusters by second similarity in descending order to obtain a second cluster sequence, and selects the top K first clusters in that sequence.
- Alternatively, it obtains a third similarity between the single image data and each second cluster center, sorts the first clusters by third similarity in descending order to obtain a third cluster sequence, and selects the top K first clusters in that sequence.
In yet another embodiment, the processor 81 merges the isolated image data with first cluster A using the first cluster centers by:
- obtaining a fourth similarity between the isolated image data and each first cluster center D, where the first cluster centers D are the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
- for each of the K first clusters, determining a first number, namely the number of its first cluster centers D whose fourth similarity exceeds a first threshold;
- determining the first cluster with the largest first number among the K first clusters as first cluster A;
- merging the isolated image data with first cluster A.

In yet another embodiment, the processor 81 merges the second cluster with first cluster B using the first cluster centers by:
- dividing the second cluster into N second sub-clusters and obtaining a fourth cluster center for each of the N second sub-clusters, where N is an integer greater than or equal to 1;
- obtaining a fifth similarity between the fourth cluster centers and each first cluster center E, where the first cluster centers E are the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
- for each of the K first clusters, determining a second number, namely the number of its first cluster centers E whose fifth similarity exceeds a second threshold;
- determining the first cluster with the largest second number among the K first clusters as first cluster B;
- merging the second cluster with first cluster B.

In yet another embodiment, the processor 81 merges the single image data with first cluster C using the first cluster centers by:
- obtaining a sixth similarity between the single image data and each first cluster center F, where the first cluster centers F are the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
- for each of the K first clusters, determining a third number, namely the number of its first cluster centers F whose sixth similarity exceeds a third threshold;
- determining the first cluster with the largest third number among the K first clusters as first cluster C;
- merging the single image data with first cluster C.
In yet another embodiment, M is less than or equal to a fourth threshold. After merging the second image data set with the first cluster using the first cluster centers, the processor 81 is further configured to do the following:
- Divide the merged first cluster into R third sub-clusters, where R is an integer greater than or equal to 1, and obtain a fifth cluster center for each of them.
- When R is less than or equal to the fourth threshold, retain all R third sub-clusters and update the first cluster centers with their fifth cluster centers.
- When R is greater than the fourth threshold, obtain a fourth number for each third sub-cluster, namely the number of pieces of image data it contains. Sort the R third sub-clusters by fourth number in descending order to obtain a fourth cluster sequence, and select the first P third sub-clusters in that sequence, where P is less than or equal to the fourth threshold. Then update the first cluster centers with the fifth cluster centers of those P third sub-clusters.

In yet another embodiment, the first cluster is obtained by clustering the image data in the first image data set, and the processor 81 divides the first cluster into M first sub-clusters by: obtaining a seventh similarity between the pieces of image data in the first cluster to form a similarity matrix; and dividing the first cluster into the M first sub-clusters based on the similarity matrix.

In yet another embodiment, the processor 81 divides the first cluster into the M first sub-clusters based on the similarity matrix by: obtaining a connected graph whose vertices are the image data in the first cluster; looking up the seventh similarity between vertices of the connected graph in the similarity matrix; and grouping vertices whose seventh similarity exceeds a fifth threshold into the same first sub-cluster, thereby obtaining the M first sub-clusters.
Exemplarily, the electronic device may be a computer, a host computer, a server, a cloud server, a server cluster, or the like. It may include, but is not limited to, the processor 81, the input device 82, the output device 83, and the computer storage medium 84. The input device 82 may be a keyboard, a touch screen, or the like, and the output device 83 may be a speaker, a display, a radio-frequency transmitter, or the like. Those skilled in the art will understand that the schematic diagram is merely an example of an electronic device and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components.
It should be noted that, because the processor 81 of the electronic device implements the steps of the image incremental clustering method described above when executing the computer program, all embodiments of that method apply to the electronic device and achieve the same or similar beneficial effects.
The embodiments of the present disclosure further provide a computer program product that, when executed by a processor, implements any of the methods of the foregoing embodiments. The computer program product may be implemented in hardware, software, or a combination thereof. In some embodiments of the present disclosure, the computer program product is embodied as a computer storage medium; in other embodiments, it is embodied as a software product, such as a software development kit (SDK).
The embodiments of the present disclosure further provide a computer storage medium (memory), which is a memory device in the electronic device configured to store programs and data. This computer storage medium may include both a built-in storage medium of the terminal and an extended storage medium supported by the terminal. It provides storage space that holds the operating system of the terminal. The same space also holds one or more instructions adapted to be loaded and executed by the processor 81; these instructions may be one or more computer programs (including program code). The computer storage medium may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. In some embodiments of the present disclosure, it may also be at least one computer storage medium located remotely from the processor 81. In one embodiment, the processor 81 may load and execute one or more instructions stored in the computer storage medium to implement the corresponding steps of the image incremental clustering method described above.
Exemplarily, the computer program in the computer storage medium includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code. Examples include a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium.
It should be noted that, because the computer program in the computer storage medium implements the steps of the image incremental clustering method described above when executed by a processor, all embodiments of that method apply to the computer storage medium and achieve the same or similar beneficial effects.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure. The description of the above embodiments is intended only to help understand the method of the present disclosure and its core idea. Meanwhile, those of ordinary skill in the art may make changes to the specific implementations and the scope of application based on the idea of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.
Industrial Applicability
In this embodiment, the first cluster is divided into multiple first sub-clusters, and the second image data set is merged into the first cluster based on the first cluster centers of those sub-clusters. Maintaining multiple first cluster centers addresses the problem that, as image data accumulates, the cluster center drifts under the influence of newly added image data. This makes the clustering result more accurate and improves clustering quality.

Claims (23)

  1. An image incremental clustering method, the method comprising:
    obtaining a first cluster of a first image data set;
    dividing the first cluster into M first sub-clusters, and obtaining a first cluster center corresponding to each of the M first sub-clusters, where M is an integer greater than or equal to 1; and
    obtaining a second image data set, and merging the second image data set with the first cluster using the first cluster centers.
  2. The method according to claim 1, wherein the first cluster comprises a first cluster A, a first cluster B and a first cluster C, and merging the second image data set with the first cluster using the first cluster centers comprises:
    when the second image data set includes multiple pieces of image data, clustering the multiple pieces of image data to obtain isolated image data and a second cluster;
    merging the isolated image data with first cluster A using the first cluster centers, and merging the second cluster with first cluster B using the first cluster centers; and
    when the second image data set contains only a single piece of image data, merging the single image data with first cluster C using the first cluster centers.
  3. The method according to claim 2, wherein the first cluster has a corresponding second cluster center, and before merging the second image data set with the first cluster using the first cluster centers, the method further comprises:
    determining K first clusters from the first clusters using the second cluster centers.
  4. The method according to claim 3, wherein the second cluster has a corresponding third cluster center, and determining the K first clusters from the first clusters using the second cluster centers comprises:
    obtaining a first similarity between the isolated image data and the second cluster centers;
    sorting the first clusters by the first similarity in descending order to obtain a first cluster sequence, and selecting the top K first clusters in the first cluster sequence; and
    obtaining a second similarity between the third cluster center and the second cluster centers;
    sorting the first clusters by the second similarity in descending order to obtain a second cluster sequence, and selecting the top K first clusters in the second cluster sequence; or
    obtaining a third similarity between the single image data and the second cluster centers;
    sorting the first clusters by the third similarity in descending order to obtain a third cluster sequence, and selecting the top K first clusters in the third cluster sequence.
  5. The method according to claim 3, wherein merging the isolated image data with first cluster A using the first cluster centers comprises:
    obtaining a fourth similarity between the isolated image data and first cluster centers D, the first cluster centers D being the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
    for each of the K first clusters, determining a first number of its first cluster centers D whose fourth similarity is greater than a first threshold;
    determining the first cluster with the largest first number among the K first clusters as first cluster A; and
    merging the isolated image data with first cluster A.
  6. The method according to claim 3, wherein merging the second cluster with first cluster B using the first cluster centers comprises:
    dividing the second cluster into N second sub-clusters, and obtaining a fourth cluster center corresponding to each of the N second sub-clusters, where N is an integer greater than or equal to 1;
    obtaining a fifth similarity between the fourth cluster centers and first cluster centers E, the first cluster centers E being the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
    for each of the K first clusters, determining a second number of its first cluster centers E whose fifth similarity is greater than a second threshold;
    determining the first cluster with the largest second number among the K first clusters as first cluster B; and
    merging the second cluster with first cluster B.
  7. The method according to claim 3, wherein merging the single image data with first cluster C using the first cluster centers comprises:
    obtaining a sixth similarity between the single image data and first cluster centers F, the first cluster centers F being the first cluster centers corresponding to the first sub-clusters of each of the K first clusters;
    for each of the K first clusters, determining a third number of its first cluster centers F whose sixth similarity is greater than a third threshold;
    determining the first cluster with the largest third number among the K first clusters as first cluster C; and
    merging the single image data with first cluster C.
  8. The method according to any one of claims 1 to 7, wherein M is less than or equal to a fourth threshold, and after merging the second image data set with the first clusters by using the first cluster centers, the method further comprises:
    dividing the merged first cluster into R third sub-clusters, and obtaining a fifth cluster center for each of the R third sub-clusters, R being an integer greater than or equal to 1;
    when R is less than or equal to the fourth threshold, retaining the R third sub-clusters and updating the first cluster centers with the fifth cluster centers corresponding to the R third sub-clusters;
    when R is greater than the fourth threshold, obtaining a fourth number, namely the number of image data items in each of the R third sub-clusters; and
    sorting the R third sub-clusters by the fourth number in descending order to obtain a fourth cluster sequence, selecting the first P third sub-clusters in the fourth cluster sequence, and updating the first cluster centers with the fifth cluster centers corresponding to the P third sub-clusters, P being less than or equal to the fourth threshold.
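The update step of claim 8 bounds how many sub-cluster centers are stored per cluster. A sketch follows, assuming a sub-cluster is a list of feature vectors, its center is the member mean, and P is simply set equal to the fourth threshold (the claim only requires P to be at most that threshold). `update_centers` and `max_centers` are hypothetical names.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_center(members: List[Vector]) -> List[float]:
    """Assumed center definition: component-wise mean of the members."""
    dim = len(members[0])
    return [sum(m[i] for m in members) / len(members) for i in range(dim)]


def update_centers(third_sub_clusters: List[List[Vector]], max_centers: int) -> List[List[float]]:
    """Return the fifth cluster centers that replace the stored first cluster centers.

    R <= max_centers: keep every sub-cluster.
    R >  max_centers: keep the P = max_centers largest sub-clusters by member count.
    """
    kept = third_sub_clusters
    if len(third_sub_clusters) > max_centers:
        # sorted() is stable, so equal-sized sub-clusters keep their input order
        kept = sorted(third_sub_clusters, key=len, reverse=True)[:max_centers]
    return [mean_center(sc) for sc in kept]


subs = [[[0.0, 0.0]], [[1.0, 1.0]] * 3, [[2.0, 2.0]] * 2]  # sizes 1, 3, 2
print(update_centers(subs, max_centers=2))  # [[1.0, 1.0], [2.0, 2.0]]
print(update_centers(subs, max_centers=5))  # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
```

Keeping the most populous sub-clusters means the retained centers cover as many images as possible while the per-cluster comparison cost in claims 5 to 7 stays bounded.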
  9. The method according to any one of claims 1 to 7, wherein the first cluster is obtained by clustering the image data in the first image data set, and dividing the first cluster into M first sub-clusters comprises:
    obtaining a seventh similarity between the image data in the first cluster to obtain a similarity matrix; and
    dividing the first cluster into the M first sub-clusters based on the similarity matrix.
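Claim 9 only requires pairwise similarities between the members of a first cluster, collected into a matrix. A minimal sketch, assuming cosine similarity on feature vectors; the helper `similarity_matrix` is hypothetical and not taken from the source.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(features: List[Vector]) -> List[List[float]]:
    """n x n matrix of "seventh similarities"; symmetric because cosine is."""
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # fill the upper triangle and mirror it
            sim[i][j] = sim[j][i] = cosine(features[i], features[j])
    return sim


print(similarity_matrix([[1.0, 0.0], [0.0, 1.0]]))  # [[1.0, 0.0], [0.0, 1.0]]
```

Plain lists keep the sketch dependency-free; for real feature sets a vectorized library would normally compute the same matrix as one normalized matrix product.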
  10. The method according to claim 9, wherein dividing the first cluster into the M first sub-clusters based on the similarity matrix comprises:
    obtaining a connected graph whose vertices are the image data in the first cluster;
    querying the similarity matrix for the seventh similarity between the vertices of the connected graph; and
    grouping vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, thereby obtaining the M first sub-clusters.
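One reading of claim 10: keep an edge between two images only when their seventh similarity exceeds the fifth threshold, and take each connected component of the remaining graph as a first sub-cluster. The sketch below implements that reading with union-find. The claim could also be read more strictly (for example, requiring every pair inside a sub-cluster to pass the threshold), and the function name `split_by_threshold` is hypothetical. It takes the similarity matrix as input and returns sub-clusters as lists of member indices.

```python
from typing import Dict, List


def split_by_threshold(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Group vertex indices into connected components of the thresholded graph."""
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:  # edge kept only above the fifth threshold
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
]
print(split_by_threshold(sim, threshold=0.5))  # [[0, 1], [2, 3]], so M = 2
```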
  11. An incremental clustering apparatus for images, the apparatus comprising:
    a first acquisition module, configured to acquire first clusters of a first image data set;
    a first division module, configured to divide the first cluster into M first sub-clusters and obtain a first cluster center corresponding to each of the M first sub-clusters, M being an integer greater than or equal to 1; and
    a merging module, configured to acquire a second image data set and merge the second image data set with the first clusters by using the first cluster centers.
  12. The apparatus according to claim 11, wherein the first clusters include first cluster A, first cluster B and first cluster C, and the merging module comprises:
    a clustering submodule, configured to, when the second image data set includes a plurality of image data, cluster the plurality of image data to obtain isolated image data and a second cluster;
    a first merging submodule, configured to merge the isolated image data with first cluster A by using the first cluster centers; and
    a second merging submodule, configured to merge the second cluster with first cluster B by using the first cluster centers;
    a third merging submodule, configured to, when only a single image data exists in the second image data set, merge the single image data with first cluster C by using the first cluster centers.
  13. The apparatus according to claim 12, wherein each first cluster has a corresponding second cluster center, and the merging module further comprises:
    a first determination submodule, configured to determine K first clusters from the first clusters by using the second cluster centers.
  14. The apparatus according to claim 13, wherein the second cluster has a corresponding third cluster center, and the first determination submodule comprises:
    a first obtaining unit, configured to obtain a first similarity between the isolated image data and the second cluster centers;
    a first sorting unit, configured to sort the first clusters by the first similarity in descending order to obtain a first cluster sequence, and select the first K first clusters in the first cluster sequence; and
    a second obtaining unit, configured to obtain a second similarity between the third cluster center and the second cluster centers;
    a second sorting unit, configured to sort the first clusters by the second similarity in descending order to obtain a second cluster sequence, and select the first K first clusters in the second cluster sequence; or
    a third obtaining unit, configured to obtain a third similarity between the single image data and the second cluster centers;
    a third sorting unit, configured to sort the first clusters by the third similarity in descending order to obtain a third cluster sequence, and select the first K first clusters in the third cluster sequence.
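Claim 14 narrows the search to K candidate clusters before any voting: each first cluster keeps one second cluster center, and the query (the isolated image, the single image, or the third cluster center of a new second cluster) is ranked against those centers. A sketch assuming image data and centers are feature vectors compared by cosine similarity; `top_k_clusters` and `cluster_centers` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_clusters(query: Vector, cluster_centers: Dict[str, Vector], k: int) -> List[str]:
    """Rank first clusters by the similarity of their second cluster center
    to `query` (highest first) and return the ids of the first K."""
    ranked = sorted(
        cluster_centers,
        key=lambda cid: cosine(query, cluster_centers[cid]),
        reverse=True,
    )
    return ranked[:k]


centers = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.7, 0.7]}
print(top_k_clusters([1.0, 0.0], centers, k=2))  # ['A', 'C']
```

When K is much smaller than the number of stored clusters, `heapq.nlargest(k, ...)` with the same key avoids sorting every cluster.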
  15. The apparatus according to claim 13, wherein the first merging submodule comprises:
    a fourth obtaining unit, configured to obtain a fourth similarity between the isolated image data and first cluster center D, where first cluster center D denotes the first cluster center corresponding to each first sub-cluster of each of the K first clusters;
    a first determination unit, configured to, for each of the K first clusters, determine a first number, namely the number of first cluster centers D in that first cluster whose fourth similarity is greater than a first threshold;
    a second determination unit, configured to determine the first cluster having the largest first number among the K first clusters as first cluster A; and
    a first merging unit, configured to merge the isolated image data with first cluster A.
  16. The apparatus according to claim 13, wherein the second merging submodule comprises:
    a first division unit, configured to divide the second cluster into N second sub-clusters and obtain a fourth cluster center corresponding to each of the N second sub-clusters, N being an integer greater than or equal to 1;
    a fifth obtaining unit, configured to obtain a fifth similarity between the fourth cluster centers and first cluster center E, where first cluster center E denotes the first cluster center corresponding to each first sub-cluster of each of the K first clusters;
    a third determination unit, configured to, for each of the K first clusters, determine a second number, namely the number of first cluster centers E in that first cluster whose fifth similarity is greater than a second threshold;
    a fourth determination unit, configured to determine the first cluster having the largest second number among the K first clusters as first cluster B; and
    a second merging unit, configured to merge the second cluster with first cluster B.
  17. The apparatus according to claim 13, wherein the third merging submodule comprises:
    a sixth obtaining unit, configured to obtain a sixth similarity between the single image data and first cluster center F, where first cluster center F denotes the first cluster center corresponding to each first sub-cluster of each of the K first clusters;
    a fifth determination unit, configured to, for each of the K first clusters, determine a third number, namely the number of first cluster centers F in that first cluster whose sixth similarity is greater than a third threshold;
    a sixth determination unit, configured to determine the first cluster having the largest third number among the K first clusters as first cluster C; and
    a third merging unit, configured to merge the single image data with first cluster C.
  18. The apparatus according to any one of claims 11 to 17, wherein M is less than or equal to a fourth threshold, and the apparatus further comprises:
    a second division module, configured to divide the merged first cluster into R third sub-clusters and obtain a fifth cluster center for each of the R third sub-clusters, R being an integer greater than or equal to 1;
    a first update module, configured to, when R is less than or equal to the fourth threshold, retain the R third sub-clusters and update the first cluster centers with the fifth cluster centers corresponding to the R third sub-clusters;
    a second acquisition module, configured to, when R is greater than the fourth threshold, acquire a fourth number, namely the number of image data items in each of the R third sub-clusters; and
    a second update module, configured to sort the R third sub-clusters by the fourth number in descending order to obtain a fourth cluster sequence, select the first P third sub-clusters in the fourth cluster sequence, and update the first cluster centers with the fifth cluster centers corresponding to the P third sub-clusters, P being less than or equal to the fourth threshold.
  19. The apparatus according to any one of claims 11 to 17, wherein the first cluster is obtained by clustering the image data in the first image data set, and the first division module comprises:
    an obtaining submodule, configured to obtain a seventh similarity between the image data in the first cluster to obtain a similarity matrix; and
    a division submodule, configured to divide the first cluster into the M first sub-clusters based on the similarity matrix.
  20. The apparatus according to claim 19, wherein the division submodule comprises:
    a seventh obtaining unit, configured to obtain a connected graph whose vertices are the image data in the first cluster;
    a query unit, configured to query the similarity matrix for the seventh similarity between the vertices of the connected graph; and
    a second division unit, configured to group vertices whose seventh similarity is greater than a fifth threshold into one first sub-cluster, thereby obtaining the M first sub-clusters.
  21. An electronic device, comprising an input device and an output device, and further comprising:
    a processor adapted to implement one or more instructions; and
    a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the method according to any one of claims 1 to 10.
  22. A computer storage medium storing one or more instructions adapted to be loaded by a processor to perform the method according to any one of claims 1 to 10.
  23. A computer program product comprising one or more instructions adapted to be loaded by a processor to perform the method according to any one of claims 1 to 10.
PCT/CN2020/134074 2020-10-30 2020-12-04 Image incremental clustering method and apparatus, electronic device, storage medium and program product WO2022088390A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022524182A JP2023502863A (en) 2020-10-30 2020-12-04 Image incremental clustering method and apparatus, electronic device, storage medium and program product
KR1020227013791A KR20220070482A (en) 2020-10-30 2020-12-04 Image incremental clustering method, apparatus, electronic device, storage medium and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011185911.8 2020-10-30
CN202011185911.8A CN112257801B (en) 2020-10-30 2020-10-30 Incremental clustering method and device for images, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022088390A1 (en) 2022-05-05

Family

ID=74268958

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134074 WO2022088390A1 (en) 2020-10-30 2020-12-04 Image incremental clustering method and apparatus, electronic device, storage medium and program product

Country Status (5)

Country Link
JP (1) JP2023502863A (en)
KR (1) KR20220070482A (en)
CN (1) CN112257801B (en)
TW (1) TW202217597A (en)
WO (1) WO2022088390A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152543A (en) * 2023-10-30 2023-12-01 山东浪潮科学研究院有限公司 Image classification method, device, equipment and storage medium
CN117333926A (en) * 2023-11-30 2024-01-02 深圳须弥云图空间科技有限公司 Picture aggregation method and device, electronic equipment and readable storage medium

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN113327195A (en) * 2021-04-09 2021-08-31 中科创达软件股份有限公司 Image processing method and device, image processing model training method and device, and image pattern recognition method and device
CN113743533B (en) * 2021-09-17 2023-08-01 重庆紫光华山智安科技有限公司 Picture clustering method and device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20110103700A1 (en) * 2007-12-03 2011-05-05 National University Corporation Hokkaido University Image classification device and image classification program
CN102129451A (en) * 2011-02-17 2011-07-20 上海交通大学 Method for clustering data in image retrieval system
CN110866555A (en) * 2019-11-11 2020-03-06 广州国音智能科技有限公司 Incremental data clustering method, device and equipment and readable storage medium
CN111242040A (en) * 2020-01-15 2020-06-05 佳都新太科技股份有限公司 Dynamic face clustering method, device, equipment and storage medium

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
WO2012140315A1 (en) * 2011-04-15 2012-10-18 Nokia Corporation Method, apparatus and computer program product for providing incremental clustering of faces in digital images
CN103718190B (en) * 2011-07-29 2017-05-24 惠普发展公司,有限责任合伙企业 incremental image clustering
CN103886048B (en) * 2014-03-13 2017-04-26 浙江大学 Cluster-based increment digital book recommendation method
US11176206B2 (en) * 2015-12-01 2021-11-16 International Business Machines Corporation Incremental generation of models with dynamic clustering
CN107798354B (en) * 2017-11-16 2022-11-01 腾讯科技(深圳)有限公司 Image clustering method and device based on face image and storage equipment
CN109886311B (en) * 2019-01-25 2021-08-20 北京奇艺世纪科技有限公司 Incremental clustering method and device, electronic equipment and computer readable medium
CN111062407B (en) * 2019-10-15 2023-12-19 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN110781957B (en) * 2019-10-24 2023-05-30 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN111460153B (en) * 2020-03-27 2023-09-22 深圳价值在线信息科技股份有限公司 Hot topic extraction method, device, terminal equipment and storage medium

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN117152543A (en) * 2023-10-30 2023-12-01 山东浪潮科学研究院有限公司 Image classification method, device, equipment and storage medium
CN117333926A (en) * 2023-11-30 2024-01-02 深圳须弥云图空间科技有限公司 Picture aggregation method and device, electronic equipment and readable storage medium
CN117333926B (en) * 2023-11-30 2024-03-15 深圳须弥云图空间科技有限公司 Picture aggregation method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
JP2023502863A (en) 2023-01-26
CN112257801B (en) 2022-04-29
CN112257801A (en) 2021-01-22
KR20220070482A (en) 2022-05-31
TW202217597A (en) 2022-05-01

Similar Documents

Publication Publication Date Title
WO2022088390A1 (en) Image incremental clustering method and apparatus, electronic device, storage medium and program product
Han et al. Semisupervised feature selection via spline regression for video semantic recognition
CN106528874B (en) The CLR multi-tag data classification method of big data platform is calculated based on Spark memory
WO2021109464A1 (en) Personalized teaching resource recommendation method for large-scale users
CN104392250A (en) Image classification method based on MapReduce
WO2019080908A1 (en) Image processing method and apparatus for implementing image recognition, and electronic device
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN112529068B (en) Multi-view image classification method, system, computer equipment and storage medium
Yao et al. Spatio-temporal information for human action recognition
Yadav et al. Vid-win: Fast video event matching with query-aware windowing at the edge for the internet of multimedia things
WO2023155508A1 (en) Graph convolutional neural network and knowledge base-based paper correlation analysis method
Bartolini et al. A general framework for real-time analysis of massive multimedia streams
Ye et al. Efficient point cloud segmentation with geometry-aware sparse networks
Liu et al. Application of gcForest to visual tracking using UAV image sequences
CN109934852B (en) Video description method based on object attribute relation graph
KR102039244B1 (en) Data clustering method using firefly algorithm and the system thereof
Ła̧giewka et al. Distributed image retrieval with colour and keypoint features
Lu et al. Video person re-identification using key frame screening with index and feature reorganization based on inter-frame relation
Cao et al. A parallel Adaboost-backpropagation neural network for massive image dataset classification
Han et al. Real-time adversarial GAN-based abnormal crowd behavior detection
Wang et al. MIC-KMeans: a maximum information coefficient based high-dimensional clustering algorithm
Tan et al. A novel image matting method using sparse manual clicks
Yan et al. Alpha matting with image pixel correlation
Dhoot et al. Efficient Dimensionality Reduction for Big Data Using Clustering Technique
Hanif et al. Re-ranking person re-identification using distance aggregation of k-nearest neighbors hierarchical tree

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022524182

Country of ref document: JP

Kind code of ref document: A

Ref document number: 20227013791

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20959549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20959549

Country of ref document: EP

Kind code of ref document: A1
