WO2021082426A1

WO2021082426A1 - Human face clustering method and apparatus, computer device, and storage medium

Info

Publication number: WO2021082426A1
Application number: PCT/CN2020/093348
Authority: WO
Inventors: 蔡中印; 陆进; 陈斌; 宋晨
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-10-29
Filing date: 2020-05-29
Publication date: 2021-05-06
Also published as: CN110889433A

Abstract

A human face clustering method and apparatus, a computer device, and a storage medium, relating to the field of artificial intelligence clustering. The human face clustering method comprises: acquiring clustered facial images, and performing feature extraction on the clustered facial images by means of a feature extraction model to acquire facial feature vectors (S201); performing central divisive clustering on the facial feature vectors in order to uniformly divide the face feature vectors and acquire original clustering class clusters (S202); performing inter-class connection clustering on the original clustering class clusters to acquire a connected clustering class cluster (S203); and performing intra-class trimming on the connected clustering class cluster to acquire a target clustering class cluster (S204), so as to improve the accuracy of human face clustering.

Description

Face clustering method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 201911037526.6, filed with the Chinese Patent Office on October 29, 2019 and entitled "Face clustering method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of artificial intelligence clustering, in particular to a face clustering method, device, computer equipment and storage medium.

Background

Traditional face clustering algorithms include, but are not limited to, the K-means clustering algorithm, the rank-order clustering method based on shared neighbors, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise, a density-based clustering algorithm). The inventors realized that when these traditional algorithms are used to cluster large-scale face images, the clustering results are inaccurate. For example, the rank-order clustering method based on shared neighbors tends to split the face feature vectors of the same person into multiple clusters, so its clustering results are not accurate; the time complexity of the DBSCAN clustering algorithm is too high to support clustering of face feature vectors at the scale of tens of millions; and the K-means clustering algorithm has difficulty clustering face feature vectors accurately and dividing large-scale face feature vectors evenly, which affects the accuracy of the clustering results.

Summary of the invention

The embodiments of the present application provide a face clustering method, device, computer equipment, and storage medium to solve the problem of inaccurate face clustering.

A face clustering method, including:

Acquiring a clustered face image, and using a feature extraction model to perform feature extraction on the clustered face image to acquire a face feature vector;

Performing center split clustering on the face feature vector to obtain original clustering clusters;

Performing inter-class connected clustering on the original clusters to obtain connected clusters;

Performing intra-class pruning on the connected clusters to obtain target clusters.

A face clustering device includes:

The face feature vector acquiring module is used to acquire clustered face images, and use a feature extraction model to perform feature extraction on the clustered face images to acquire face feature vectors;

The original clustering cluster module is used to perform center split clustering on the face feature vector to obtain the original clustering cluster;

The connected cluster cluster module is used to perform inter-class connected clustering on the original cluster cluster to obtain connected cluster clusters;

The target cluster cluster module is used to perform intra-class pruning on the connected cluster cluster to obtain the target cluster cluster.

A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:

One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:

The details of one or more embodiments of the present application are set forth in the following drawings and description, and other features and advantages of the present application will become apparent from the description, drawings, and claims.

The aforementioned face clustering method, device, computer equipment, and storage medium acquire clustered face images, use a feature extraction model to perform feature extraction on the clustered face images, and perform center split clustering on the acquired face feature vectors, so that the face feature vectors are divided evenly and the original clusters are obtained quickly, which improves the clustering speed. Inter-class connected clustering is performed on the original clusters to obtain connected clusters, so that the face feature vectors belonging to the same user are divided into the same cluster, which ensures the accuracy of clustering. Intra-class pruning is performed on the connected clusters to obtain target clusters, which ensures that the target clusters contain no interfering face feature vectors and thereby ensures the accuracy of face clustering.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments of the present application. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative labor.

FIG. 1 is a schematic diagram of an application environment of a face clustering method in an embodiment of the present application;

Fig. 2 is a flowchart of a face clustering method in an embodiment of the present application;

Fig. 3 is a flowchart of a face clustering method in an embodiment of the present application;

Fig. 4 is a flowchart of a face clustering method in an embodiment of the present application;

Fig. 5 is a flowchart of a face clustering method in an embodiment of the present application;

Fig. 6 is a flowchart of a face clustering method in an embodiment of the present application;

FIG. 7 is a flowchart of a face clustering method in an embodiment of the present application;

Fig. 8 is a functional block diagram of a face clustering device in an embodiment of the present application;

Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The face clustering method provided by the embodiments of the present application can be used to divide large-scale face images, so that face images belonging to the same user are divided into one target cluster. It can also be applied to face recognition: for example, a face image to be recognized can be captured from a camera, and the face clustering method of this embodiment can be used to cluster the face image to be recognized together with the known face images in an image database, to determine whether the face image to be recognized exists in the image database. This realizes recognition of the face image, for example in a missing-person tracking scenario.

The face clustering method can be applied to the application environment shown in Fig. 1. Specifically, the face clustering method is applied in a face clustering system. The face clustering system includes a client and a server, as shown in Fig. 1, and is used to perform center split clustering on large-scale face images so as to divide the face feature vectors evenly, and then to perform connected clustering on the evenly divided face feature vectors to obtain target clusters, which ensures the accuracy of face clustering. Here, the client, also called the user side, is the program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, various personal computers, laptops, smart phones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.

In an embodiment, as shown in FIG. 2, a face clustering method is provided. The method is applied to the server in FIG. 1 as an example for description, and includes the following steps:

S201: Obtain a clustered face image, and use a feature extraction model to perform feature extraction on the clustered face image to obtain a face feature vector.

Here, clustered face images are the face images that need to be clustered. The clustered face images in this embodiment include clustered face images of at least two users, and each user has at least two clustered face images.

As an example, during the model training process of the face clustering method, multiple clustered face images of the same user may carry the same user identification, so that it can later be verified whether the clustering result is accurate. The user identification uniquely identifies a user. For example, the user identification may be the user's name or ID card number.

As an example, in the face recognition process of the face clustering method, clustered face images of unknown identities that do not carry user identifications can be clustered with clustered face images of known identities that carry user identifications. According to the clustering result, the identity of the clustered face image of the unknown identity that does not carry the user identity is determined.

The feature extraction model is a pre-trained model for feature extraction of clustered face images, and the feature extraction model may be a face feature extraction model based on convolutional neural network training.

The face feature vector refers to the vector obtained after feature extraction of the clustered face images using the feature extraction model. For example, the feature extraction model can be used to extract the 512-dimensional feature vector of each clustered face image, or the feature extraction model can be used to extract the 128-dimensional feature vector of each clustered face image.

As an example, the clustered face images are input into the pre-trained feature extraction model for feature extraction, so that the 512-dimensional feature vector of each clustered face image can be obtained quickly. In this embodiment, the face feature vector refers to the vector corresponding to the facial features in the face image.

S202: Perform center split clustering on the face feature vector to obtain original clusters.

Here, an original cluster is a cluster obtained after the first clustering of the face feature vectors, and the first clustering here is center split clustering. Center split clustering is a process of clustering face feature vectors based on their similarity. As an example, center split clustering can divide the face feature vectors corresponding to multiple clustered face images evenly, so that the original clusters are obtained quickly.

Specifically, the server gathers the face feature vectors extracted from all clustered face images, generates an initial face matrix from all face feature vectors, and calculates the initial feature average vector of the initial face matrix. For example, the mean function of Python's numpy library can be used to quickly obtain the initial feature average vector. The server then obtains cluster centers based on the initial feature average vector and splits the face feature vectors into clusters according to the cluster centers to obtain the original clusters. Here, the initial face matrix is the matrix formed by gathering all face feature vectors, and the initial feature average vector is the average vector corresponding to the initial face matrix. As an example, the mean function of the numpy library collapses the rows of the initial face matrix and averages each column to obtain a row vector, which is the initial feature average vector. For example, if the initial face matrix, with one face feature vector per row and m rows, is

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & a_{m4} & a_{m5} \end{bmatrix}$$

then the initial feature average vector is

$$\bar{a} = \begin{bmatrix} \bar{a}_1 & \bar{a}_2 & \bar{a}_3 & \bar{a}_4 & \bar{a}_5 \end{bmatrix}$$

where

$$\bar{a}_i = \frac{1}{m} \sum_{j=1}^{m} a_{ji}$$

and i takes the values 1, 2, 3, 4, and 5. The cluster center refers to a face feature vector selected from the initial face matrix, based on the initial feature average vector, to be used for clustering.
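The column-wise averaging described above can be illustrated with a minimal sketch. The matrix values below are made up for illustration, and the variable names are assumptions rather than names used in the application; only the use of the numpy mean function comes from the text.

```python
import numpy as np

# Hypothetical initial face matrix: 3 face feature vectors (rows), 5 dimensions (columns).
initial_face_matrix = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
    [5.0, 6.0, 7.0, 8.0, 9.0],
])

# axis=0 collapses the rows and averages each column, giving one row vector.
initial_feature_average = np.mean(initial_face_matrix, axis=0)
print(initial_feature_average)  # [3. 4. 5. 6. 7.]
```

In practice each row would be a 128- or 512-dimensional vector produced by the feature extraction model; the five columns here only mirror the five-component example in the text.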

Traditional clustering algorithms such as the K-means clustering algorithm randomly select K cluster centers for clustering, so the face feature vectors are divided randomly. The time complexity of the K-means clustering algorithm is n², where n is the number of face feature vectors. For large-scale face feature vector clustering this time complexity is very large, and both clustering accuracy and efficiency are low. In contrast, this embodiment obtains the cluster centers based on the initial feature average vector so as to divide the face feature vectors evenly, and its time complexity is l*log(n), where n is the number of face feature vectors and l is the number of center split clusters. For clustering large-scale face feature vectors, center split clustering therefore has a lower time complexity than traditional clustering algorithms such as the K-means clustering algorithm, and can effectively improve the speed of face clustering. Here, time complexity refers to the running time of face clustering.

S203: Perform inter-class connected clustering on the original clusters to obtain connected clusters.

Among them, inter-class connected clustering refers to a method of gathering any two original clusters with high similarity into one cluster, so as to divide the facial feature vectors belonging to the same user into the same cluster.

Specifically, center split clustering divides the face feature vectors evenly based on the initial feature average vector in order to improve the clustering speed, but an original cluster formed in this way may contain the face feature vectors of different users. Therefore, inter-class connected clustering needs to be performed on the original clusters to divide the face feature vectors belonging to the same user into the same cluster. This helps to improve the accuracy of the clustering results, and solves the problem in traditional clustering algorithms that the face feature vectors of the same user are easily divided into different clusters.

S204: Perform intra-class pruning on the connected clusters to obtain target clusters.

Here, intra-class pruning is a process of pruning each connected cluster to eliminate the erroneous face feature vectors in it. An erroneous face feature vector is a face feature vector in a connected cluster that does not belong to the same user as the other face feature vectors in that cluster.

As an example, the server may calculate the intra-class similarity within each connected cluster, sort the intra-class similarities, determine the face feature vectors with low intra-class similarity to be erroneous face feature vectors, and eliminate them. This achieves intra-class pruning of the connected clusters and ensures the accuracy of face clustering.
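The pruning idea above can be sketched as follows. The text does not specify how the intra-class similarity is computed or how low counts as "low", so this sketch assumes cosine similarity to the cluster's mean direction and a hypothetical `keep_threshold` parameter; `prune_cluster` is an illustrative helper name, and all input vectors are assumed to be non-zero.

```python
import numpy as np

def prune_cluster(vectors, keep_threshold):
    """Return (kept, removed) row indices of one connected cluster.

    Assumption: intra-class similarity is the cosine similarity between each
    vector and the mean direction of the cluster.
    """
    vectors = np.asarray(vectors, dtype=float)
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    center = unit.mean(axis=0)
    center /= np.linalg.norm(center)
    sims = unit @ center
    order = np.argsort(sims)[::-1]  # sort from most to least similar
    kept = [int(i) for i in order if sims[i] >= keep_threshold]
    removed = [int(i) for i in order if sims[i] < keep_threshold]
    return kept, removed

# Three vectors pointing in nearly the same direction and one outlier.
cluster = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
kept, removed = prune_cluster(cluster, keep_threshold=0.8)
print(sorted(kept), removed)  # [0, 1, 2] [3]
```

A fixed threshold is only one possible rule; dropping a fixed fraction of the lowest-ranked vectors would fit the description equally well.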

In the face clustering method provided in this embodiment, clustered face images are acquired, a feature extraction model is used to perform feature extraction on them, and center split clustering is performed on the acquired face feature vectors, so that the face feature vectors are divided evenly and the original clusters are obtained quickly, which improves the clustering speed. Inter-class connected clustering is performed on the original clusters to obtain connected clusters, so that the face feature vectors belonging to the same user are divided into the same cluster, which ensures accurate clustering. Intra-class pruning is performed on the connected clusters to obtain target clusters, which ensures that the target clusters contain no interfering face feature vectors and thereby ensures the accuracy of face clustering.

In one embodiment, after step S202, that is, after performing center split clustering on the face feature vector to obtain the original clusters, the face clustering method further includes: if the number of face feature vectors in an original cluster is greater than a first number threshold, using a program interface to allocate the face feature vectors in that original cluster to at least two GPUs for processing.

Here, the first number threshold is a preset number threshold used to judge whether the face feature vectors in an original cluster need to be divided. The program interface is an interface for distributing the face feature vectors of the original clusters to different GPUs for processing, and includes but is not limited to the MPI interface. The GPU (Graphics Processing Unit) is the core of the entire graphics card. The graphics card is one of the most basic components of a computer: it converts the display information required by the computer system to drive the display and provides line scanning signals to the display so as to control its correct output, and it is an important component connecting the display and the motherboard of a personal computer. The MPI interface is an application program interface for message passing, including protocol and semantic descriptions.

As an example, when the number of face feature vectors in any original cluster is greater than the first number threshold, the server allocates the face feature vectors of that original cluster to at least two GPUs for processing, specifically for similarity calculation, which speeds up the similarity calculation and therefore the face clustering. Each GPU can send its calculated similarity results back to the server, so that the server can divide the face feature vectors according to those results and carry out inter-class connected clustering. Understandably, the GPUs can also be used to calculate the initial similarity and the first similarity described later, to speed up clustering.

Specifically, if the number of face feature vectors in an original cluster is greater than the first number threshold, that number is divided by the maximum number of face feature vectors one GPU can handle and the result is rounded up to obtain the number of GPUs required for the original cluster. This is the smallest number of GPUs that can still process all face feature vectors in the subsequent steps, so that at least two GPUs handle the follow-up processing, which improves processing efficiency while saving GPU resources. The program interface then distributes all face feature vectors in the original cluster to the at least two GPUs for processing, so as to quickly calculate the similarity between face feature vectors.
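The round-up rule above amounts to a ceiling division. The sketch below shows only that arithmetic; the function name, the parameter names, and the behavior for clusters at or below the threshold (staying on one device) are assumptions, and the actual distribution over MPI is not shown.

```python
import math

def gpus_needed(num_vectors, first_number_threshold, max_vectors_per_gpu):
    """Number of GPUs for one original cluster (hypothetical helper)."""
    if num_vectors <= first_number_threshold:
        # Assumption: a cluster at or below the threshold is not distributed.
        return 1
    # Round up so every vector is covered; the text requires at least two GPUs
    # once the threshold is exceeded, hence the lower bound of 2.
    return max(2, math.ceil(num_vectors / max_vectors_per_gpu))

print(gpus_needed(250_000, 100_000, 100_000))  # 3
print(gpus_needed(120_000, 100_000, 200_000))  # 2
print(gpus_needed(50_000, 100_000, 100_000))   # 1
```

The lower bound of 2 only matters when the per-GPU capacity is larger than the first number threshold, as in the second call.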

In one embodiment, as shown in FIG. 3, step S202, that is, performing central split clustering on the face feature vector to obtain the original cluster cluster includes:

S301: Generate an initial face matrix based on the face feature vector, and calculate an initial feature average vector corresponding to the initial face matrix.

Among them, the initial face matrix refers to a matrix loaded with all face feature vectors. The initial feature average vector refers to the average vector of the initial face matrix.

Specifically, all face feature vectors are loaded to generate the initial face matrix, and the initial feature average vector corresponding to the initial face matrix is calculated. For example, the mean function of MATLAB can be used, or Python's numpy library can be used, to quickly calculate the initial feature average vector corresponding to the initial face matrix, so that cluster centers can subsequently be obtained based on the initial feature average vector for center split clustering.

S302: Cluster the initial face matrix based on the initial feature average vector to obtain the first cluster.

Specifically, the initial face matrix is clustered according to the initial feature average vector. That is, cluster centers are obtained according to the initial feature average vector, and clustering is performed according to the cluster centers to obtain the first cluster, so that the face feature vectors are divided evenly and quickly. A cluster center is a face feature vector selected from the initial face matrix to be used for clustering.

S303: If the number of face feature vectors in the first cluster cluster is less than the second number threshold, determine the first cluster cluster as the original cluster cluster.

Here, the second number threshold is a preset number threshold used to determine whether the face feature vectors in the first cluster need to be divided further. It is obtained through testing, to ensure that the number of face feature vectors in each original cluster is appropriate and to avoid dividing the face feature vectors of one user into too many original clusters.

Specifically, the cluster centers are obtained based on the initial feature average vector, and the initial face matrix is clustered based on the cluster centers to obtain the first cluster. If the number of face feature vectors in the first cluster is less than the second number threshold, the first cluster is determined as an original cluster. Understandably, if the number of face feature vectors in the first cluster is not less than the second number threshold, the first cluster needs to be divided further according to S301-S303, so that only clusters whose number of face feature vectors is less than the second number threshold are determined as original clusters. This ensures that the number of face feature vectors in each original cluster is appropriate and avoids dividing the face feature vectors of one user into too many original clusters.

Further, after the cluster centers are obtained based on the initial feature average vector and the initial face matrix is clustered based on the cluster centers, the degree of dispersion of the face feature vectors in the resulting first cluster can also be judged. If the degree of dispersion of the face feature vectors in the first cluster is less than a preset dispersion threshold, the first cluster is determined as an original cluster. Understandably, if the degree of dispersion is not less than the preset dispersion threshold, the first cluster needs to be divided further according to S301-S303, so that only clusters whose degree of dispersion is less than the preset dispersion threshold are determined as original clusters. This ensures that the number of face feature vectors in each original cluster is appropriate and avoids dividing the face feature vectors of one user into too many original clusters. Here, the degree of dispersion refers to the degree of difference between the face feature vectors in the same cluster, and is used to measure the level of risk. In this embodiment, the degree of dispersion of the face feature vectors in the first cluster can be expressed by their range, average difference, and standard deviation, and is then compared with the preset dispersion threshold. The preset dispersion threshold is a preset value used to determine whether the face feature vectors in the first cluster need to be divided further.
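As a minimal sketch of the dispersion check, the standard deviation named in the text can be reduced to a single number per cluster and compared with the threshold. Averaging the per-dimension standard deviations is an assumption made here for illustration, as are the helper names `dispersion` and `is_compact`; the range or the average difference could be used in the same way.

```python
import numpy as np

def dispersion(vectors):
    """Mean of the per-dimension standard deviations (one assumed measure)."""
    vectors = np.asarray(vectors, dtype=float)
    return float(np.mean(np.std(vectors, axis=0)))

def is_compact(vectors, dispersion_threshold):
    """True when the cluster can be accepted as an original cluster."""
    return dispersion(vectors) < dispersion_threshold

tight = [[1.0, 1.0], [1.0, 1.0]]   # identical vectors, dispersion 0.0
spread = [[0.0, 0.0], [2.0, 2.0]]  # per-dimension std is 1.0
print(is_compact(tight, 0.5), is_compact(spread, 0.5))  # True False
```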

In the face clustering method provided in this embodiment, an initial face matrix is generated based on the face feature vectors, and the initial feature average vector corresponding to the initial face matrix is calculated, so that center split clustering can subsequently be performed based on the initial feature average vector. Cluster centers are obtained based on the initial feature average vector and the initial face matrix is clustered, so that the face feature vectors are divided evenly and quickly and the first cluster is obtained. If the number of face feature vectors in the first cluster is less than the second number threshold, the first cluster is determined as an original cluster, which ensures that the number of face feature vectors in each original cluster is appropriate and avoids dividing the face feature vectors of one user into too many original clusters.
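The repeat-until-small behavior of S301-S303 can be sketched as a recursion. The sketch below assumes a pluggable `split_once` function standing in for S302; the toy splitter shown simply halves a cluster by similarity to its mean vector and is not the splitting rule of the application. All function and parameter names are illustrative.

```python
import numpy as np

def split_until_small(x, indices, second_number_threshold, split_once):
    """Recursively split until every cluster has fewer vectors than the threshold."""
    if len(indices) < second_number_threshold:
        return [indices]                      # S303: accept as an original cluster
    parts = split_once(x, indices)            # S301-S302 on this subset
    if len(parts) < 2:                        # cannot split further; stop here
        return [indices]
    clusters = []
    for part in parts:
        clusters.extend(
            split_until_small(x, part, second_number_threshold, split_once))
    return clusters

def halve_by_mean_similarity(x, indices):
    """Toy stand-in for S302: order by dot product with the mean, cut in half."""
    sub = x[indices]
    sims = sub @ sub.mean(axis=0)
    order = indices[np.argsort(sims)]
    mid = len(order) // 2
    return [p for p in (order[:mid], order[mid:]) if len(p) > 0]

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))                  # 10 made-up feature vectors
clusters = split_until_small(x, np.arange(10), 4, halve_by_mean_similarity)
print([len(c) for c in clusters])             # [2, 3, 2, 3]
```

Passing the splitter as an argument keeps the stopping rule separate from the clustering rule, so a dispersion-based stopping test could be swapped in without touching the splitter.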

In one embodiment, as shown in FIG. 4, step S302, clustering the initial face matrix based on the initial feature average vector to obtain the first cluster cluster includes:

S401: Calculate the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, sort the initial similarities, and obtain the sorting result.

Here, the initial similarity is a value indicating the degree of similarity between a face feature vector in the initial face matrix and the initial feature average vector. Understandably, each face feature vector in the initial face matrix corresponds to one initial similarity. The sorting result is the result of sorting the initial similarities in ascending or descending order.

Specifically, after obtaining the initial feature average vector, the server calculates the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector. The dot function of the numpy library or the cosine similarity formula can be used for this calculation. The initial similarities are then sorted from small to large or from large to small to obtain the sorting result, so that the cluster centers used for clustering can be obtained quickly from the sorting result.

S402: Extract z initial similarities from the sorting result at intervals, and extract the z cluster centers corresponding to the z initial similarities from the initial face matrix.

Specifically, z initial similarities (z being a positive integer) are quickly extracted from the sorting result at a preset interval. As an example, for a sorting result formed by 100 initial similarities and a preset interval of 5, the 5th, 10th, 15th, ..., and 5z-th initial similarities in the sorting result are extracted, and the face feature vectors corresponding to the extracted initial similarities are used as cluster centers for clustering, which speeds up clustering. Traditional clustering algorithms select cluster centers randomly, whereas determining z cluster centers according to similarity, as in this embodiment, ensures that the face feature vectors are divided evenly, which speeds up clustering, improves clustering accuracy, and reduces the probability that the same user is divided into different clusters.

S403: Cluster all face feature vectors in the initial face matrix according to z cluster centers to generate a first cluster cluster.

Specifically, all face feature vectors are clustered according to the z cluster centers to generate z first clusters. Understandably, since the cluster centers are extracted according to the similarity between each face feature vector in the initial face matrix and the initial feature average vector, the face feature vectors can be divided evenly to generate the first clusters.

In the face clustering method provided in this embodiment, the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector is calculated, and the initial similarities are sorted to obtain the sorting result, so that cluster centers can subsequently be obtained according to the initial similarity. Extracting z initial similarities from the sorting result at intervals, and extracting the z cluster centers corresponding to them from the initial face matrix, ensures that the face feature vectors are divided evenly and reduces the probability that the same user is divided into different clusters. All face feature vectors in the initial face matrix are then clustered according to the z cluster centers, so that the face feature vectors are divided evenly to generate the first clusters.
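Steps S401 to S403 can be sketched in a few lines of numpy. This is a sketch under stated assumptions, not the implementation of the application: cosine similarity is used for both the initial similarity and the assignment to centers, the sort is ascending, each vector is assigned to its most similar center, and the function name `center_split` and its fallback for very small inputs are illustrative. All input vectors are assumed to be non-zero.

```python
import numpy as np

def center_split(face_matrix, interval):
    """One level of center split clustering; returns (center row indices, labels)."""
    x = np.asarray(face_matrix, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)

    # S401: similarity of every vector to the average vector, then sort.
    mean_vec = unit.mean(axis=0)
    mean_vec /= np.linalg.norm(mean_vec)
    initial_sims = unit @ mean_vec
    order = np.argsort(initial_sims)

    # S402: take every `interval`-th entry of the sorting result as a center
    # (the 5th, 10th, 15th, ... entries when interval is 5).
    center_idx = order[interval - 1::interval]
    if center_idx.size == 0:       # assumption: fall back to a single center
        center_idx = order[-1:]

    # S403: assign each vector to its most similar center.
    labels = np.argmax(unit @ unit[center_idx].T, axis=1)
    return center_idx, labels

# Six made-up 2-D "feature vectors" spread between the two axes.
faces = [[1.0, 0.0], [0.98, 0.17], [0.94, 0.34],
         [0.34, 0.94], [0.17, 0.98], [0.0, 1.0]]
center_idx, labels = center_split(faces, interval=3)
print(len(center_idx), labels.shape)  # 2 (6,)
```

Because the centers are sampled at regular positions of the sorted similarities rather than at random, they are spread over the whole range of similarities to the average vector, which is the property the text relies on for an even division.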

In an embodiment, as shown in FIG. 5, step S203, that is, performing inter-class connected clustering on the original cluster clusters to obtain connected clusters includes:

S501: Calculate the first feature average vector corresponding to each original cluster, and use a similarity algorithm to determine the first similarity between the first feature average vector and each face feature vector in that original cluster.

Here, the first feature average vector is the average of the face feature vectors in the original cluster; it is calculated in the same way as the initial feature average vector in step S202, so the calculation is not repeated here. The similarity algorithm is an algorithm for calculating the similarity of variables or points in space; in this embodiment it includes, but is not limited to, the cosine similarity algorithm and the Euclidean distance algorithm. The first similarity is a value indicating how similar a face feature vector in the original cluster is to the first feature average vector of that cluster.

Specifically, the original clusters are produced by central split clustering. If the face feature vectors were not divided accurately, an original cluster may contain face feature vectors that do not belong to the same user. Therefore, the first feature average vector corresponding to each original cluster is calculated, and the similarity algorithm is used to calculate the similarity between each face feature vector in the original cluster and the first feature average vector, so that the first feature average vector can be used to determine whether the original cluster contains face feature vectors that do not belong to the same user.
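As an illustration of step S501, the following minimal sketch computes an average vector for one original cluster and the similarity of every member to it. Cosine similarity is assumed here (the embodiment also allows Euclidean distance), and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def first_similarities(original_cluster):
    """Return the cluster's average vector and each member's similarity to it."""
    average = mean_vector(original_cluster)
    return average, [cosine_similarity(vec, average) for vec in original_cluster]
```

A member whose similarity to the average vector is low is a candidate for belonging to a different user, which is what the threshold in step S502 tests.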

S502: Connect the face feature vectors with the first similarity greater than the first connected cluster threshold in the original cluster clusters into a second cluster cluster.

Wherein, the first connected cluster threshold is a preset value used to determine whether the face feature vectors in any original cluster cluster belong to the same user.

Specifically, when the first similarity of a face feature vector in the original cluster is greater than the first connected cluster threshold, that face feature vector is highly similar to the first feature average vector and most likely belongs to the same user as the rest of the cluster. Therefore, the face feature vectors whose first similarity is greater than the first connected cluster threshold are connected into a second cluster, which excludes face feature vectors that do not belong to the same user and provides technical support for clustering the face feature vectors of the same user together. For example, an original cluster includes face feature vectors a, b, c, d, e, and f, and may contain face feature vectors that do not belong to the same user. The first similarity between each face feature vector and the first feature average vector is 0.89 for a, 0.88 for b, 0.95 for c, 0.75 for d, 0.53 for e, and 0.85 for f. If the first connected cluster threshold is 0.7, then e is deleted, and a, b, c, d, and f are connected to obtain the second cluster.
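The worked example above can be reproduced with a few lines of code. This is only a sketch of the thresholding in step S502; the helper name `connect` and the dictionary layout are illustrative assumptions, while the similarity values and the threshold are those given in the example.

```python
FIRST_CONNECTED_THRESHOLD = 0.7  # threshold used in the example

# First similarity of each face feature vector to the first feature average vector.
first_similarity = {"a": 0.89, "b": 0.88, "c": 0.95, "d": 0.75, "e": 0.53, "f": 0.85}

def connect(similarities, threshold):
    """Split members into those above the threshold (kept) and the rest (removed)."""
    kept = [name for name, s in similarities.items() if s > threshold]
    removed = [name for name, s in similarities.items() if s <= threshold]
    return kept, removed

second_cluster, excluded = connect(first_similarity, FIRST_CONNECTED_THRESHOLD)
print(second_cluster)  # ['a', 'b', 'c', 'd', 'f']
print(excluded)        # ['e']
```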

S503: Perform inter-class clustering on all second clusters to obtain connected clusters.

Specifically, after the face feature vectors belonging to the same user in each original cluster have been connected into a second cluster, it is still necessary to determine whether any two second clusters belong to the same user. Therefore, inter-class clustering is performed on all second clusters to gather the face feature vectors belonging to one user together, which effectively prevents the face feature vectors of one user from being divided into different clusters and improves the accuracy of face clustering.

In the face clustering method provided in this embodiment, the first feature average vector corresponding to each original cluster is calculated, and the similarity algorithm is used to determine the first similarity between the first feature average vector and each face feature vector in the corresponding original cluster. The face feature vectors in the original cluster whose first similarity is greater than the first connected cluster threshold are connected into a second cluster, so that the second cluster excludes face feature vectors that do not belong to the same user, which provides technical support for clustering the face feature vectors of the same user together. Inter-class clustering is then performed on all second clusters to obtain connected clusters, so that the face feature vectors belonging to the same user are clustered together and the accuracy of face clustering is improved.

In one embodiment, as shown in FIG. 6, step S503, performing inter-class clustering on all second cluster clusters to obtain connected cluster clusters, includes:

S601: Calculate the second feature average vector of each second cluster cluster, and determine the second similarity corresponding to any two second cluster clusters based on the second feature average vector.

Here, the second feature average vector is the average of the face feature vectors in the second cluster. It is calculated in the same way as the initial feature average vector in step S202, so the calculation is not repeated here. The second similarity is a value indicating how similar any two second clusters are.

Specifically, the mean function of MATLAB, or the mean function of the Python NumPy library, can be used to average all face feature vectors of a second cluster and obtain its second feature average vector. The similarity algorithm is then applied to the second feature average vectors of any two second clusters to quickly obtain the second similarity of those two clusters, so that the face feature vectors of the same user can be gathered into the same cluster according to the second similarity.

S602: If the second similarity is greater than the second connected cluster threshold, merge the two second clusters into one connected cluster.

The second connected cluster threshold refers to a value used to determine whether any two second cluster clusters belong to the same user.

Specifically, the second feature average vector of each second cluster is calculated, and the second similarity of any two second clusters is then calculated from their second feature average vectors. When the second similarity is greater than the second connected cluster threshold, the two second clusters are highly similar and belong to the same user. Therefore, second clusters whose second similarity is greater than the second connected cluster threshold are gathered into the same connected cluster, which improves clustering accuracy and prevents the face feature vectors of one user from being divided into different clusters. Understandably, when the second similarity is less than the second connected cluster threshold, the two second clusters are not similar enough and do not belong to the same user, so they are not merged.

In the face clustering method provided in this embodiment, the second feature average vector of each second cluster is calculated, and the second similarity corresponding to any two second clusters is determined based on their second feature average vectors. Two second clusters whose second similarity is greater than the second connected cluster threshold are merged into a connected cluster, so that the face feature vectors belonging to the same user are gathered into one connected cluster and the accuracy of face clustering is improved.
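Steps S601 and S602 can be sketched as follows. The text only states that two second clusters whose second similarity exceeds the threshold are merged; merging transitively with a union-find structure, the use of cosine similarity, and all function names are assumptions made for this illustration.

```python
import math
from itertools import combinations

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_second_clusters(second_clusters, threshold):
    """Merge clusters whose average vectors are more similar than the threshold."""
    averages = [mean_vector(cluster) for cluster in second_clusters]
    parent = list(range(len(second_clusters)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(second_clusters)), 2):
        if cosine_similarity(averages[i], averages[j]) > threshold:
            parent[find(j)] = find(i)  # assumption: merges are transitive

    connected = {}
    for index, cluster in enumerate(second_clusters):
        connected.setdefault(find(index), []).extend(cluster)
    return list(connected.values())
```

Comparing every pair of clusters is quadratic in the number of second clusters, so a practical system would likely restrict the comparison in some way; the sketch keeps the full pairwise loop for clarity.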

In an embodiment, as shown in FIG. 7, step S204, performing intra-class pruning on the connected cluster clusters to obtain the target cluster cluster includes:

S701: Perform average calculation on connected clusters, and obtain a third feature average vector corresponding to each connected cluster.

Here, the third feature average vector is the average of the face feature vectors in the connected cluster. It is calculated in the same way as the initial feature average vector in step S202, so the calculation is not repeated here.

Specifically, the connected clusters are generated by clustering on the second feature average vectors of any two second clusters, but the second feature average vectors of two merged second clusters are not equal, so a connected cluster may contain face feature vectors that differ considerably in similarity. Therefore, to ensure clustering accuracy, the mean function of the Python NumPy library is used to average the face feature vectors of each connected cluster and quickly obtain the corresponding third feature average vector, so that face feature vectors with large differences in similarity can subsequently be excluded based on the third feature average vector.

S702: Calculate the third similarity between any face feature vector in the connected cluster cluster and the third feature average vector.

Here, the third similarity is a value indicating how similar a face feature vector in the connected cluster is to the third feature average vector.

Specifically, the cosine similarity formula is used to calculate the third similarity between each face feature vector in the connected cluster and the third feature average vector, so that face feature vectors that do not belong to the same user can then be excluded based on the third feature average vector to improve clustering accuracy.

S703: Cut from each connected cluster the face feature vectors whose third similarity is less than the third connected cluster threshold to obtain the target cluster.

Wherein, the third connected cluster threshold is a preset value used to determine whether the connected cluster contains face feature vectors that do not belong to the user represented by that connected cluster.

Specifically, when the third similarity between a face feature vector in the connected cluster and the third feature average vector is less than the third connected cluster threshold, that face feature vector is deleted, so as to exclude face feature vectors that do not belong to the user represented by the connected cluster and obtain the target cluster. That is, the target cluster is the connected cluster with the face feature vectors of other users removed, which ensures that each class is clean.

In the face clustering method provided in this embodiment, the face feature vectors of each connected cluster are averaged to obtain the third feature average vector corresponding to that connected cluster, and the third similarity between each face feature vector in the connected cluster and the third feature average vector is calculated, so that face feature vectors that do not belong to the connected cluster can be excluded based on the third feature average vector and clustering accuracy is improved. The face feature vectors whose third similarity is less than the third connected cluster threshold are cut from the connected cluster to obtain the target cluster, which excludes face feature vectors that do not belong to the user represented by the cluster and ensures that each class is clean.
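The intra-class pruning of steps S701 to S703 can be sketched in the same style. The function names are hypothetical and pure Python is used in place of the NumPy mean function mentioned in the text; vectors whose third similarity is below the threshold are dropped, and the rest form the target cluster.

```python
import math

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prune_connected_cluster(connected_cluster, threshold):
    """Keep only vectors whose similarity to the cluster average reaches the threshold."""
    average = mean_vector(connected_cluster)  # third feature average vector
    return [vec for vec in connected_cluster
            if cosine_similarity(vec, average) >= threshold]
```

Note that the average is computed once, before any vector is removed; recomputing it after pruning would be a different design choice that the text does not describe.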

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution. The execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

In one embodiment, a face clustering device is provided, and the face clustering device corresponds to the face clustering method in the above-mentioned embodiment one-to-one. As shown in FIG. 8, the face clustering device includes a face feature vector acquisition module 801, an original cluster cluster module 802, a connected cluster cluster module 803, and a target cluster cluster module 804. The detailed description of each functional module is as follows:

The face feature vector obtaining module 801 is used to obtain clustered face images, and use a feature extraction model to perform feature extraction on the clustered face images to obtain face feature vectors.

The original cluster cluster module 802 is used to perform center split clustering on the face feature vector to obtain the original cluster cluster.

The connected cluster cluster module 803 is used to perform inter-category connected clustering on the original cluster cluster to obtain connected cluster clusters.

The target cluster cluster module 804 is used to perform intra-class pruning on the connected cluster clusters to obtain the target cluster cluster.

Preferably, after the original clustering cluster module 802, the face clustering apparatus further includes: a GPU processing module.

The GPU processing module is configured to, if the number of face feature vectors in the original cluster cluster is greater than the first number threshold, use a program interface to allocate the face feature vectors in the original cluster cluster to at least two GPUs for processing.

Preferably, the original cluster cluster module 802 includes: an initial feature average vector calculation unit, a first cluster cluster acquisition unit, and a first judgment unit.

The initial feature average vector calculation unit is used to generate an initial face matrix based on the face feature vector, and calculate the initial feature average vector corresponding to the initial face matrix.

The first cluster cluster acquiring unit is configured to cluster the initial face matrix based on the initial feature average vector to acquire the first cluster cluster.

The first judging unit is configured to determine the first cluster cluster as the original cluster cluster if the number of face feature vectors in the first cluster cluster is less than the second number threshold.

Preferably, the first cluster cluster acquiring unit includes: a sorting result acquiring subunit, a cluster center acquiring subunit, and a first cluster cluster acquiring subunit.

The sorting result obtaining subunit is used to calculate the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, sort the initial similarity, and obtain the sorting result.

The cluster center acquisition subunit is used to extract z initial similarities from the sorting result interval, and extract z cluster centers corresponding to the z initial similarities from the initial face matrix.

The first cluster cluster acquisition subunit is used to cluster all face feature vectors in the initial face matrix according to z cluster centers to generate the first cluster cluster.

Preferably, the connected clustering cluster module 803 includes: a first similarity calculation unit, a connected unit, and an inter-cluster clustering unit.

The first similarity calculation unit is used to calculate the first feature average vector corresponding to each original cluster cluster, and to use the similarity algorithm to determine the first similarity between the first feature average vector and any face feature vector in the corresponding original cluster cluster.

The connected unit is used to connect the face feature vectors with the first similarity greater than the first connected cluster threshold in the original cluster clusters into the second cluster cluster.

The inter-class clustering unit is used to perform inter-class clustering on all second clusters to obtain connected clusters.

Preferably, the inter-class clustering unit includes: a second feature average vector calculation subunit and a second judgment unit.

The second feature average vector calculation subunit is used to calculate the second feature average vector of each second cluster cluster, and determine the second similarity corresponding to any two second cluster clusters based on the second feature average vector.

The second judgment unit is configured to merge the two second cluster clusters into connected cluster clusters if the second similarity is greater than the second connected cluster threshold.

Preferably, the target cluster cluster module 804 includes: a third feature average vector calculation unit, a third similarity calculation unit, and a third judgment unit.

The third feature average vector calculation unit is used to calculate the average value of the connected cluster clusters, and obtain the third feature average vector corresponding to each connected cluster cluster.

The third similarity calculation unit is used to calculate the third similarity between any face feature vector in the connected cluster cluster and the third feature average vector.

The third judging unit is used to crop the face feature vectors of the connected clusters whose third similarity is less than the third connected cluster threshold to obtain the target clusters.

For the specific limitations of the face clustering device, please refer to the above limitations on the face clustering method, which are not repeated here. Each module in the aforementioned face clustering device can be implemented in whole or in part by software, hardware, or a combination thereof. The above-mentioned modules may be embedded in, or independent of, the processor in the computer equipment in the form of hardware, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
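When the modules are implemented in software, the device can be pictured as four stages called in sequence. The sketch below is only an illustration of that structure: the class name, field names, and type aliases are hypothetical, and each stage is supplied by the caller rather than implemented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]
Cluster = List[Vector]

@dataclass
class FaceClusteringDevice:
    """Chains four caller-supplied stages corresponding to modules 801 to 804."""
    extract_features: Callable[[Sequence[object]], List[Vector]]  # module 801
    center_split: Callable[[List[Vector]], List[Cluster]]         # module 802
    connect_clusters: Callable[[List[Cluster]], List[Cluster]]    # module 803
    prune_clusters: Callable[[List[Cluster]], List[Cluster]]      # module 804

    def run(self, images: Sequence[object]) -> List[Cluster]:
        vectors = self.extract_features(images)      # face feature vectors
        original = self.center_split(vectors)        # original clusters
        connected = self.connect_clusters(original)  # connected clusters
        return self.prune_clusters(connected)        # target clusters
```

Keeping each stage as an injected callable mirrors the statement that the modules may be realized by software, hardware, or a combination, since any stage can be swapped without touching the others.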

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 9. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium. The database of the computer device is used to store the generated target clusters. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by the processor to realize a face clustering method. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor. When the processor executes the computer-readable instructions, the steps of the face clustering method in the foregoing embodiments are implemented, such as steps S201-S204 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 7, which are not repeated here. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units in the foregoing embodiment of the face clustering device are implemented, such as the functions of the face feature vector acquisition module 801, the original cluster cluster module 802, the connected cluster cluster module 803, and the target cluster cluster module 804 shown in FIG. 8, which are not repeated here.

In an embodiment, one or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by a processor, the steps of the face clustering method in the foregoing embodiments are implemented, such as steps S201-S204 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 7, which are not repeated here. Alternatively, when the computer-readable instructions are executed by the processor, the functions of the modules/units in the foregoing embodiment of the face clustering device are implemented, such as the functions of the face feature vector acquisition module 801, the original cluster cluster module 802, the connected cluster cluster module 803, and the target cluster cluster module 804 shown in FIG. 8, which are not repeated here. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned method embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed they may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should be included within the scope of protection of this application.

Claims

A face clustering method, which includes:

Acquiring a clustered face image, and using a feature extraction model to perform feature extraction on the clustered face image to acquire a face feature vector;

Performing center split clustering on the face feature vector to obtain original clustering clusters;

Perform inter-class connected clustering on the original clusters to obtain connected clusters;

Perform intra-class pruning on the connected clusters to obtain target clusters.
The face clustering method according to claim 1, wherein, after the central split clustering is performed on the face feature vector to obtain the original cluster clusters, the face clustering method further comprises:

If the number of the face feature vectors in the original cluster is greater than the first number threshold, using a program interface to allocate the face feature vectors in the original cluster to at least two GPUs for processing.
The face clustering method according to claim 1, wherein said performing center split clustering on said face feature vector to obtain original cluster clusters comprises:

Generating an initial face matrix based on the face feature vector, and calculating an initial feature average vector corresponding to the initial face matrix;

Clustering the initial face matrix based on the initial feature average vector to obtain a first cluster cluster;

If the number of face feature vectors in the first cluster is less than the second number threshold, the first cluster is determined as the original cluster.
The face clustering method according to claim 3, wherein the clustering the initial face matrix based on the initial feature average vector to obtain the first cluster cluster comprises:

Calculating the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, sorting the initial similarity, and obtaining a sorting result;

Extract z initial similarities from the sorting result interval, and extract z cluster centers corresponding to the z initial similarities from the initial face matrix;

Clustering all face feature vectors in the initial face matrix according to the z cluster centers to generate a first cluster cluster.
The face clustering method according to claim 1, wherein the performing inter-class connected clustering on the original clusters to obtain connected clusters includes:

Calculating the first feature average vector corresponding to each of the original cluster clusters, and using a similarity algorithm to determine the first similarity between the first feature average vector and any one of the face feature vectors in the original cluster clusters;

Connecting the face feature vectors with the first similarity greater than the first connected clustering threshold in the original clustering clusters into a second clustering cluster;

Perform inter-class clustering on all the second clusters to obtain connected clusters.
The face clustering method according to claim 5, wherein said performing inter-class clustering on all said second cluster clusters to obtain connected cluster clusters comprises:

Calculating a second feature average vector of each of the second cluster clusters, and determining the second similarity corresponding to any two second cluster clusters based on the second feature average vector;

If the second degree of similarity is greater than the second connected clustering threshold, the two second clusters are merged into connected clusters.
The face clustering method according to claim 1, wherein said performing intra-class pruning on said connected clusters to obtain target clusters comprises:

Performing average calculation on the connected clusters to obtain a third feature average vector corresponding to each of the connected clusters;

Calculating a third degree of similarity between any one of the face feature vectors and the third feature average vector in the connected clusters;

The face feature vectors of the connected clusters whose third similarity is less than the third connected cluster threshold are cropped to obtain the target clusters.
A face clustering device, which includes:

The face feature vector acquiring module is used to acquire clustered face images, and use a feature extraction model to perform feature extraction on the clustered face images to acquire face feature vectors;

The original clustering cluster module is used to perform center split clustering on the face feature vector to obtain the original clustering cluster;

The connected cluster cluster module is used to perform inter-category connected clustering on the original cluster cluster to obtain connected cluster clusters;

The target cluster cluster module is used to perform intra-class pruning on the connected cluster cluster to obtain the target cluster cluster.
A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when the processor executes the computer-readable instructions:

Acquiring a clustered face image, and using a feature extraction model to perform feature extraction on the clustered face image to acquire a face feature vector;

Performing center split clustering on the face feature vector to obtain original clustering clusters;

Perform inter-class connected clustering on the original clusters to obtain connected clusters;

Perform intra-class pruning on the connected clusters to obtain target clusters.
The computer device according to claim 9, wherein, after the central split clustering is performed on the face feature vector to obtain the original cluster clusters, the processor further implements the following steps when executing the computer-readable instructions:

If the number of the face feature vectors in the original cluster is greater than the first number threshold, using a program interface to allocate the face feature vectors in the original cluster to at least two GPUs for processing.
The computer device according to claim 9, wherein said performing center split clustering on said face feature vector to obtain original clusters comprises:

Generating an initial face matrix based on the face feature vector, and calculating an initial feature average vector corresponding to the initial face matrix;

Clustering the initial face matrix based on the initial feature average vector to obtain a first cluster cluster;

If the number of face feature vectors in the first cluster is less than the second number threshold, the first cluster is determined as the original cluster.
The computer device according to claim 11, wherein the clustering the initial face matrix based on the initial feature average vector to obtain the first cluster cluster comprises:

Calculating the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, sorting the initial similarity, and obtaining a sorting result;

Extract z initial similarities from the sorting result interval, and extract z cluster centers corresponding to the z initial similarities from the initial face matrix;

Clustering all face feature vectors in the initial face matrix according to the z cluster centers to generate a first cluster cluster.
The computer device according to claim 10, wherein said performing inter-class connected clustering on said original clusters to obtain connected clusters comprises:

Calculating the first feature average vector corresponding to each of the original cluster clusters, and using a similarity algorithm to determine the first similarity between the first feature average vector and any one of the face feature vectors in the original cluster clusters;

Connecting the face feature vectors with the first similarity greater than the first connected clustering threshold in the original clustering clusters into a second clustering cluster;

Perform inter-class clustering on all the second clusters to obtain connected clusters.
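
As a hedged illustration of the connection step above, the following Python sketch keeps the members of one original cluster whose similarity to the cluster's average vector exceeds the first connected clustering threshold. Cosine similarity is an assumption, and `connect_within` is a hypothetical name. The claim does not state how vectors at or below the threshold are handled, so the sketch simply omits them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def connect_within(original_cluster, first_threshold):
    """Form a second cluster from members close to the cluster's average vector."""
    mean = mean_vector(original_cluster)
    return [v for v in original_cluster if cosine(v, mean) > first_threshold]
```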
14. The computer device according to claim 13, wherein said performing inter-class clustering on all said second clusters to obtain connected clusters comprises:

Calculating a second feature average vector of each of the second clusters, and determining a second similarity corresponding to any two second clusters based on their second feature average vectors;

If the second similarity is greater than a second connected clustering threshold, merging the two second clusters into a connected cluster.
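
The pairwise merge above might look like the following Python sketch. It is illustrative only. Cosine similarity is assumed, and a union-find structure is an assumption introduced here so that merges chain transitively (A with B, B with C); the claim itself only recites merging two second clusters whose second similarity exceeds the threshold. The name `merge_clusters` is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def merge_clusters(second_clusters, second_threshold):
    """Merge second clusters whose average vectors are similar enough."""
    means = [mean_vector(c) for c in second_clusters]
    parent = list(range(len(second_clusters)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(second_clusters)):
        for j in range(i + 1, len(second_clusters)):
            if cosine(means[i], means[j]) > second_threshold:
                parent[find(j)] = find(i)

    merged = {}
    for i, cluster in enumerate(second_clusters):
        merged.setdefault(find(i), []).extend(cluster)
    return list(merged.values())
```

The double loop compares every pair of clusters, which is quadratic in the number of second clusters. That is acceptable for a sketch but would likely need batching or indexing at the scale the description targets.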
15. One or more readable storage media storing computer readable instructions, wherein, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Acquiring a clustered face image, and using a feature extraction model to perform feature extraction on the clustered face image to acquire a face feature vector;

Performing center split clustering on the face feature vector to obtain original clustering clusters;

Performing inter-class connected clustering on the original clusters to obtain connected clusters;

Performing intra-class pruning on the connected clusters to obtain target clusters.
16. The readable storage medium according to claim 15, wherein, after the central split clustering is performed on the face feature vector to obtain the original clusters, when the computer readable instructions are executed by the one or more processors, the one or more processors further execute the following step:

If the number of the face feature vectors in the original cluster is greater than the first number threshold, using a program interface to allocate the face feature vectors in the original cluster to at least two GPUs for processing.
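
The threshold check and allocation above can be illustrated by a small partitioning sketch. It covers only how the vectors of a large original cluster might be split into per-device chunks. The actual dispatch to GPUs through a program interface is not shown, and the contiguous chunking scheme and the name `partition_for_devices` are assumptions.

```python
def partition_for_devices(vectors, num_devices, first_number_threshold):
    """Return {device_id: chunk}; small clusters stay on a single device."""
    if num_devices < 2:
        raise ValueError("at least two devices are expected")
    # Only clusters larger than the first number threshold are split.
    if len(vectors) <= first_number_threshold:
        return {0: list(vectors)}
    chunk = -(-len(vectors) // num_devices)  # ceiling division
    return {d: vectors[d * chunk:(d + 1) * chunk] for d in range(num_devices)}
```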
17. The readable storage medium according to claim 15, wherein said performing central split clustering on said face feature vector to obtain original clusters comprises:

Generating an initial face matrix based on the face feature vector, and calculating an initial feature average vector corresponding to the initial face matrix;

Clustering the initial face matrix based on the initial feature average vector to obtain a first cluster;

If the number of face feature vectors in the first cluster is less than the second number threshold, the first cluster is determined as the original cluster.
18. The readable storage medium according to claim 17, wherein the clustering of the initial face matrix based on the initial feature average vector to obtain the first cluster comprises:

Calculating the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and sorting the initial similarities to obtain a sorting result;

Extracting z initial similarities from the sorting result at intervals, and extracting, from the initial face matrix, the z cluster centers corresponding to the z initial similarities;

Clustering all face feature vectors in the initial face matrix according to the z cluster centers to generate the first cluster.
19. The readable storage medium according to claim 15, wherein said performing inter-class connected clustering on the original clusters to obtain connected clusters comprises:

Calculating a first feature average vector corresponding to each of the original clusters, and applying a similarity algorithm to the first feature average vector and any one of the face feature vectors in that original cluster to determine a first similarity between the first feature average vector and that face feature vector;

Connecting the face feature vectors in the original cluster whose first similarity is greater than a first connected clustering threshold into a second cluster;

Performing inter-class clustering on all the second clusters to obtain connected clusters.
20. The readable storage medium according to claim 19, wherein said performing inter-class clustering on all said second clusters to obtain connected clusters comprises:

Calculating a second feature average vector of each of the second clusters, and determining a second similarity corresponding to any two second clusters based on their second feature average vectors;

If the second similarity is greater than a second connected clustering threshold, merging the two second clusters into a connected cluster.