WO2021082426A1 - Human face clustering method and apparatus, computer device, and storage medium - Google Patents

Human face clustering method and apparatus, computer device, and storage medium

Info

Publication number
WO2021082426A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
clustering
face
clusters
initial
Prior art date
Application number
PCT/CN2020/093348
Other languages
French (fr)
Chinese (zh)
Inventor
蔡中印
陆进
陈斌
宋晨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201911037526.6A external-priority patent/CN110889433B/en
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021082426A1 publication Critical patent/WO2021082426A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Definitions

  • This application relates to the field of artificial intelligence clustering, in particular to a face clustering method, device, computer equipment and storage medium.
  • Traditional face clustering algorithms include, but are not limited to, the K-means clustering algorithm, the rank-order clustering method based on shared neighbors, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise, a density-based clustering algorithm). The inventors realized that when these traditional algorithms are used to cluster large-scale face images, the clustering results are inaccurate.
  • The rank-order clustering method based on shared neighbors tends to split the face feature vectors of the same person into multiple clusters, so its clustering results are not accurate;
  • the time complexity of the DBSCAN clustering algorithm is too high to support clustering of face feature vectors at the scale of tens of millions;
  • the K-means clustering algorithm has difficulty clustering face feature vectors accurately and dividing large-scale face feature vectors evenly, which affects the accuracy of the clustering results.
  • the embodiments of the present application provide a face clustering method, device, computer equipment, and storage medium to solve the problem of inaccurate face clustering.
  • a face clustering method including:
  • a face clustering device includes:
  • the face feature vector acquiring module is used to acquire clustered face images, and use a feature extraction model to perform feature extraction on the clustered face images to acquire face feature vectors;
  • the original clustering cluster module is used to perform center split clustering on the face feature vector to obtain the original clustering cluster
  • the connected cluster cluster module is used to perform inter-category connected clustering on the original cluster cluster to obtain connected cluster clusters;
  • the target cluster cluster module is used to perform intra-class pruning on the connected cluster cluster to obtain the target cluster cluster.
  • a computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:
  • One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:
  • The aforementioned face clustering method, device, computer equipment, and storage medium acquire clustered face images, use a feature extraction model to perform feature extraction on the clustered face images, and perform center split clustering on the acquired face feature vectors. Center split clustering divides the face feature vectors uniformly and quickly yields the original clusters, which improves the clustering speed.
  • Inter-class connected clustering is performed on the original cluster clusters, and connected cluster clusters are obtained, so as to divide the facial feature vectors belonging to the same user into the same clusters to ensure the accuracy of clustering.
  • Perform intra-class pruning on the connected clusters to obtain target clusters to ensure that there are no interfering face feature vectors in the obtained target clusters, so as to ensure the accuracy of face clustering.
  • FIG. 1 is a schematic diagram of an application environment of a face clustering method in an embodiment of the present application
  • Fig. 2 is a flowchart of a face clustering method in an embodiment of the present application
  • Fig. 3 is a flowchart of a face clustering method in an embodiment of the present application.
  • Fig. 4 is a flowchart of a face clustering method in an embodiment of the present application.
  • Fig. 5 is a flowchart of a face clustering method in an embodiment of the present application.
  • Fig. 6 is a flowchart of a face clustering method in an embodiment of the present application.
  • FIG. 7 is a flowchart of a face clustering method in an embodiment of the present application.
  • Fig. 8 is a functional block diagram of a face clustering device in an embodiment of the present application.
  • Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • The face clustering method provided by the embodiments of the present application can be used to divide large-scale face images so that face images belonging to the same user fall into one target cluster. It can also be applied to face recognition: for example, a face image to be recognized can be captured from a camera, and the face clustering method of this embodiment can be used to cluster it together with the known face images in an image database to determine whether the face image to be recognized exists in the database. This realizes recognition of the face image, for example in a missing-person tracking scenario.
  • the face clustering method can be applied to the application environment shown in Figure 1. Specifically, the face clustering method is applied in a face clustering system.
  • The face clustering system includes a client and a server, as shown in FIG. 1. Large-scale face images are first clustered by center splitting to divide the face feature vectors evenly, and connected clustering is then performed on the evenly divided face feature vectors to obtain the target clusters, which ensures the accuracy of face clustering.
  • The client, also called the user terminal, is the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but not limited to, various personal computers, laptops, smart phones, tablets, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a face clustering method is provided.
  • the method is applied to the server in FIG. 1 as an example for description, and includes the following steps:
  • S201 Obtain a clustered face image, and use a feature extraction model to perform feature extraction on the clustered face image to obtain a face feature vector.
  • clustering face images refers to face images that need to be clustered.
  • the clustered face images in this embodiment include clustered face images of at least two users, and each user includes at least two clustered face images.
  • multiple clustered face images of the same user may carry the same user identification, so as to subsequently verify whether the clustering effect is accurate.
  • The user identification uniquely identifies a user.
  • the user ID may be the user's name or ID card.
  • clustered face images of unknown identities that do not carry user identifications can be clustered with clustered face images of known identities that carry user identifications. According to the clustering result, the identity of the clustered face image of the unknown identity that does not carry the user identity is determined.
  • the feature extraction model is a pre-trained model for feature extraction of clustered face images, and the feature extraction model may be a face feature extraction model based on convolutional neural network training.
  • the face feature vector refers to the vector obtained after feature extraction of the clustered face images using the feature extraction model.
  • the feature extraction model can be used to extract the 512-dimensional feature vector of each clustered face image, or the feature extraction model can be used to extract the 128-dimensional feature vector of each clustered face image.
  • The face feature vector refers to the vector corresponding to the facial features in the face image.
  • S202 Perform central split clustering on the face feature vector to obtain original clusters.
  • the original cluster cluster refers to the cluster obtained after the first clustering of the face feature vector, and the first cluster here is the center split clustering.
  • Central split clustering is a process of clustering face feature vectors based on the similarity of face feature vectors.
  • the central split clustering can uniformly divide the face feature vectors corresponding to multiple clustered face images, so as to quickly obtain the original clusters.
  • the server extracts face feature vectors extracted from all clustered face images, generates an initial face matrix based on all face feature vectors, and calculates an initial feature average vector of the initial face matrix.
  • The mean function of Python's NumPy library can be used to compute the mean of the initial face matrix and quickly obtain its initial feature average vector. The cluster centers are then obtained based on the initial feature average vector, and the face feature vectors are split into clusters according to the cluster centers to obtain the original clusters.
  • the initial face matrix refers to a matrix formed by gathering all face feature vectors.
  • the initial feature average vector refers to the average vector corresponding to the initial face matrix.
  • The computer uses the mean function of Python's NumPy library to average the initial face matrix: the rows of the matrix are collapsed and each column is averaged, giving a row vector, which is the initial feature average vector. For example, if the initial face matrix is $X = (x_{ji})$ with $m$ rows (one per face feature vector) and five columns, then the initial feature average vector is $\bar{x} = (\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5)$, where $\bar{x}_i = \frac{1}{m}\sum_{j=1}^{m} x_{ji}$ and $i$ takes the values 1, 2, 3, 4, and 5.
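  • As a minimal sketch of the column-wise averaging described above, the following uses a made-up 4 x 5 matrix; the values and the variable names are illustrative assumptions, not data from the application.

```python
import numpy as np

# Hypothetical toy data: 4 face feature vectors with 5 dimensions each.
initial_face_matrix = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
    [4.0, 5.0, 6.0, 7.0, 8.0],
])

# axis=0 collapses the rows and averages each column,
# producing one row vector with as many entries as there are columns.
initial_feature_average = initial_face_matrix.mean(axis=0)
print(initial_feature_average)  # [2.5 3.5 4.5 5.5 6.5]
```

  • In practice each row would be a 128- or 512-dimensional face feature vector; the call is the same.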
  • the clustering center refers to the face feature vector selected from the initial face matrix based on the initial feature average vector for clustering.
  • Randomly selecting K cluster centers for clustering, as the K-means clustering algorithm does, causes the face feature vectors to be divided randomly.
  • The time complexity of the K-means clustering algorithm is $n^2$, where $n$ is the number of face feature vectors. For large-scale clustering of face feature vectors this cost is very high, and the algorithm suffers from low clustering accuracy and low efficiency.
  • This embodiment obtains the cluster centers from the initial feature average vector so as to divide the face feature vectors uniformly. Its time complexity is $l \cdot \log(n)$, where $n$ is the number of face feature vectors and $l$ is the number of clusters produced by center splitting.
  • center split clustering has lower time complexity than traditional clustering algorithms such as K-means clustering algorithm, and can effectively improve the speed of face clustering.
  • time complexity refers to the running time for face clustering.
  • S203 Perform inter-class connected clustering on the original clusters to obtain connected clusters.
  • inter-class connected clustering refers to a method of gathering any two original clusters with high similarity into one cluster, so as to divide the facial feature vectors belonging to the same user into the same cluster.
  • Center split clustering divides the face feature vectors uniformly based on the initial feature average vector in order to improve the clustering speed, but an original cluster formed this way may contain the face feature vectors of different users. It is therefore necessary to perform inter-class connected clustering on the original clusters so that face feature vectors belonging to the same user are placed in the same cluster. This helps improve the accuracy of the clustering results and addresses the problem in traditional clustering algorithms that the face feature vectors of the same user are easily split into different clusters.
  • S204 Perform intra-class pruning on the connected clusters to obtain target clusters.
  • Intra-class pruning is the process of pruning each connected cluster to eliminate the erroneous face feature vectors in it.
  • An erroneous face feature vector is a face feature vector in a connected cluster that does not belong to the same user as the other face feature vectors in that cluster.
  • The server may calculate the intra-class similarities in each connected cluster, sort them, determine the face feature vectors with low intra-class similarity to be erroneous face feature vectors, and eliminate them. This achieves intra-class pruning of the connected clusters and ensures the accuracy of face clustering.
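  • The pruning step can be sketched as follows. The text does not define how intra-class similarity is computed or where the cut-off lies, so this sketch assumes cosine similarity to the cluster's mean direction and a hypothetical `keep_threshold`; the function name `prune_cluster` is likewise illustrative.

```python
import numpy as np

def prune_cluster(vectors, keep_threshold=0.6):
    """Split one connected cluster into kept and pruned (erroneous) vectors.

    Assumption: intra-class similarity is the cosine similarity between each
    vector and the mean direction of the cluster. keep_threshold is a
    hypothetical cut-off, not a value taken from the source.
    """
    vectors = np.asarray(vectors, dtype=float)
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    center = unit.mean(axis=0)
    center /= np.linalg.norm(center)
    intra_sim = unit @ center        # one similarity per vector
    order = np.argsort(intra_sim)    # ascending: least similar first
    keep = intra_sim >= keep_threshold
    return vectors[keep], vectors[~keep], order

# Three vectors pointing roughly the same way and one outlier.
cluster = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
kept, pruned, order = prune_cluster(cluster)
print(len(kept), pruned.tolist())
```

  • Sorting is kept explicit so that the least similar vectors can be inspected first, mirroring the sort-then-eliminate order in the description.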
  • The clustered face images are acquired, the feature extraction model is used to perform feature extraction on them, and center split clustering is performed on the acquired face feature vectors, which divides the face feature vectors uniformly and quickly yields the original clusters, improving the clustering speed.
  • Inter-class connected clustering is performed on the original clusters, and connected clusters are obtained to divide the facial feature vectors belonging to the same user into the same clusters to ensure accurate clustering.
  • Perform intra-class pruning on connected clusters to obtain target clusters to ensure that there are no interfering face feature vectors in the obtained target clusters to ensure the accuracy of face clustering.
  • The face clustering method further includes: if the number of face feature vectors in an original cluster is greater than a first number threshold, using a program interface to allocate the face feature vectors in the original cluster to at least two GPUs for processing.
  • the first number threshold is a preset number threshold for judging whether it is necessary to divide the face feature vectors in the original cluster cluster.
  • the program interface is an interface for distributing the face feature vectors of the original cluster clusters to different GPUs for processing.
  • the program interface includes but is not limited to the MPI interface.
  • The GPU (Graphics Processing Unit) is a graphics processor.
  • The GPU is one of the most basic components of a computer. It converts the display information required by the computer system to drive the display and supplies progressive or interlaced scanning signals to the monitor so that the monitor displays correctly; it is an important component connecting the monitor and the main board of a personal computer.
  • the MPI interface is an application program interface for information transfer, including protocol and semantic description.
  • The server allocates the face feature vectors of an original cluster to at least two GPUs for computation, specifically for similarity calculations, which speeds up the similarity computation and therefore the face clustering. Each GPU can send its calculated similarity results back to the server, so that the server can divide the face feature vectors according to the calculated similarities and realize inter-class connected clustering.
  • the GPU can be used to calculate the subsequent initial similarity and the first similarity to speed up the clustering speed.
  • The number of face feature vectors in each original cluster is divided by the maximum number of vectors that one GPU can handle, and the result is rounded up to obtain the number of GPUs required for that original cluster. This is the smallest number of GPUs that can still process all of the face feature vectors in the subsequent steps, so that at least two GPUs are used for the follow-up processing, which improves processing efficiency while saving GPU resources.
  • The program interface is used to distribute all face feature vectors in each original cluster to at least two GPUs for computation, so that the similarity between face feature vectors can be calculated quickly.
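  • The rounding-up rule can be illustrated with a short sketch. The function names and the capacity figures are assumptions for illustration; the actual hand-off to the GPUs through a program interface such as MPI is not shown.

```python
def gpus_needed(num_vectors, max_per_gpu):
    """Vector count divided by per-GPU capacity, rounded up (integer ceiling)."""
    if max_per_gpu <= 0:
        raise ValueError("max_per_gpu must be positive")
    return (num_vectors + max_per_gpu - 1) // max_per_gpu

def split_for_gpus(vectors, max_per_gpu):
    """Cut a cluster into consecutive chunks, one chunk per required GPU."""
    count = gpus_needed(len(vectors), max_per_gpu)
    return [vectors[i * max_per_gpu:(i + 1) * max_per_gpu] for i in range(count)]

# Hypothetical numbers: 250,000 vectors, 100,000 vectors per GPU.
print(gpus_needed(250_000, 100_000))        # 3
print(split_for_gpus(list(range(7)), 3))    # [[0, 1, 2], [3, 4, 5], [6]]
```

  • Integer arithmetic is used for the ceiling so that very large counts are not affected by floating-point rounding.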
  • step S202 that is, performing central split clustering on the face feature vector to obtain the original cluster cluster includes:
  • S301 Generate an initial face matrix based on the face feature vector, and calculate an initial feature average vector corresponding to the initial face matrix.
  • the initial face matrix refers to a matrix loaded with all face feature vectors.
  • the initial feature average vector refers to the average vector of the initial face matrix.
  • load all face feature vectors to generate the initial face matrix and calculate the initial feature average vector corresponding to the initial face matrix.
  • S302 Cluster the initial face matrix based on the initial feature average vector to obtain the first cluster.
  • The initial face matrix is clustered according to the initial feature average vector; that is, the cluster centers are obtained from the initial feature average vector and clustering is performed around them to obtain the first clusters, so that the face feature vectors are quickly and evenly divided.
  • the cluster center refers to the face feature vector selected from the initial face matrix for clustering.
  • The second number threshold is a preset threshold used to determine whether the face feature vectors in a first cluster need to be divided further. It is obtained through testing, to ensure that the number of face feature vectors in each resulting original cluster is appropriate and that the face feature vectors of one user are not divided into too many original clusters.
  • The cluster centers are obtained based on the initial feature average vector, and the initial face matrix is clustered around them to obtain the first clusters. If the number of face feature vectors in a first cluster is less than the second number threshold, that first cluster is determined to be an original cluster. Understandably, if the number of face feature vectors in a first cluster is not less than the second number threshold, the cluster must be divided further according to S301-S303, so that only clusters whose number of face feature vectors is less than the second number threshold are determined to be original clusters. This ensures that the number of face feature vectors in the resulting original clusters is appropriate and avoids dividing the face feature vectors of one user into too many original clusters.
  • After the cluster centers are obtained based on the initial feature average vector and the initial face matrix is clustered around them, the degree of dispersion of the face feature vectors in each resulting first cluster can also be evaluated. If the degree of dispersion of the face feature vectors in a first cluster is less than the preset dispersion threshold, the first cluster is determined to be an original cluster.
  • If the degree of dispersion of the face feature vectors in a first cluster is not less than the preset dispersion threshold, the cluster must be divided further according to S301-S303, so that only clusters whose degree of dispersion is less than the preset dispersion threshold are determined to be original clusters. This ensures that the number of face feature vectors in the original clusters is appropriate and avoids dividing the face feature vectors of one user into too many original clusters.
  • the degree of dispersion refers to the degree of difference between the face feature vectors in the same cluster, and the degree of dispersion is used to measure the level of risk.
  • The degree of dispersion of the face feature vectors in the first cluster can be expressed by measures such as their range, mean deviation, and standard deviation, and is then compared with the preset dispersion threshold. If the degree of dispersion of the face feature vectors in the first cluster is less than the preset dispersion threshold, the first cluster is determined to be an original cluster.
  • The preset dispersion threshold is a preset value used to determine whether the face feature vectors in the first cluster need to be divided further.
  • An initial face matrix is generated based on the face feature vectors, and the initial feature average vector corresponding to the initial face matrix is calculated, so that center split clustering can subsequently be performed based on the initial feature average vector.
  • The cluster centers are obtained and the initial face matrix is clustered, which quickly and uniformly divides the face feature vectors and yields the first clusters. If the number of face feature vectors in a first cluster is less than the second number threshold, the first cluster is determined to be an original cluster. This ensures that the number of face feature vectors in the resulting original clusters is appropriate and avoids dividing the face feature vectors of one user into too many original clusters.
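  • The repeat-until-small-enough control flow of S301-S303 can be sketched independently of how a single split is performed. In the sketch below, `split_fn` stands in for one round of center split clustering, and the halving function used in the demonstration is only a placeholder, not the method of the application; `max_size` plays the role of the second number threshold.

```python
def split_until_small(vectors, split_fn, max_size):
    """Keep splitting clusters until each has fewer than max_size members."""
    pending = [vectors]
    original_clusters = []
    while pending:
        cluster = pending.pop()
        if len(cluster) < max_size:
            original_clusters.append(cluster)   # small enough: accept as original cluster
            continue
        parts = [p for p in split_fn(cluster) if len(p) > 0]
        if len(parts) < 2:
            # The split made no progress; accept the cluster to avoid looping forever.
            original_clusters.append(cluster)
            continue
        pending.extend(parts)
    return original_clusters

def halve(cluster):
    """Placeholder split used only for this demonstration."""
    middle = len(cluster) // 2
    return [cluster[:middle], cluster[middle:]]

result = split_until_small(list(range(10)), halve, max_size=4)
print(sorted(len(c) for c in result))  # [2, 2, 3, 3]
```

  • An explicit work list is used instead of recursion so that very large inputs do not hit Python's recursion limit.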
  • step S302 clustering the initial face matrix based on the initial feature average vector to obtain the first cluster cluster includes:
  • S401 Calculate the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, sort the initial similarities, and obtain the sorting result.
  • the initial similarity is a value indicating the degree of similarity between each face feature vector in the initial face matrix and the initial feature average vector. Understandably, one face feature vector in the initial face matrix corresponds to an initial similarity.
  • The sorting result refers to the result of sorting the initial similarities in ascending or descending order.
  • the server calculates the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector.
  • The dot function of the NumPy library or the cosine similarity formula can be used to calculate the initial similarity between each face feature vector and the initial feature average vector. The initial similarities are then sorted in ascending or descending order to obtain the sorting result, so that the cluster centers for clustering can be obtained quickly from it.
  • S402 Extract z initial similarities at intervals from the sorting result, and extract the z cluster centers corresponding to the z initial similarities from the initial face matrix.
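  • A minimal sketch of S401 using `numpy.dot` and cosine similarity follows. The function name `initial_similarities` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def initial_similarities(face_matrix, average_vector):
    """Cosine similarity between every row of face_matrix and average_vector."""
    face_matrix = np.asarray(face_matrix, dtype=float)
    average_vector = np.asarray(average_vector, dtype=float)
    dots = np.dot(face_matrix, average_vector)
    norms = np.linalg.norm(face_matrix, axis=1) * np.linalg.norm(average_vector)
    return dots / norms

# Hypothetical toy data: three 2-dimensional feature vectors.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = initial_similarities(X, X.mean(axis=0))
ranking = np.argsort(-sims)   # row indices from most to least similar
print(sims, ranking[0])
```

  • If the feature vectors are already normalized to unit length, the plain dot product alone gives the same ranking.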
  • z (z is a positive integer) initial similarities are quickly obtained from the sorting result at a preset interval.
  • the preset interval can be 5.
  • Extracting z initial similarities at intervals from the sorting result means, for example, taking the 5th initial similarity of the sorting result, the 10th, the 15th, and so on up to the 5z-th. The face feature vector corresponding to each extracted initial similarity is used as a cluster center for clustering, which speeds up the clustering.
  • Traditional clustering algorithms select cluster centers at random, whereas determining the z cluster centers according to similarity, as in this embodiment, ensures that the face feature vectors are evenly divided. This speeds up clustering, improves clustering accuracy, and reduces the probability that the same user is assigned to different clusters.
  • S403 Cluster all face feature vectors in the initial face matrix according to z cluster centers to generate a first cluster cluster.
  • All face feature vectors are clustered according to the z cluster centers to generate z first clusters. Understandably, because the cluster centers are extracted based on the similarity between each face feature vector in the initial face matrix and the initial feature average vector, the face feature vectors are evenly divided when the first clusters are generated.
  • The initial similarity between each face feature vector in the initial face matrix and the initial feature average vector is calculated, and the initial similarities are sorted to obtain the sorting result, so that the cluster centers can subsequently be obtained according to the initial similarities. Extracting z initial similarities at intervals from the sorting result, and extracting the z corresponding cluster centers from the initial face matrix, ensures that the face feature vectors are evenly divided and reduces the probability that the same user is assigned to different clusters. All face feature vectors in the initial face matrix are then clustered according to the z cluster centers, so that the face feature vectors are evenly divided when the first clusters are generated.
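  • Steps S401-S403 can be put together in a short sketch. The text does not say how a vector is assigned to one of the z centers, so assigning each vector to the center with the highest cosine similarity is an assumption here, as are the function name `center_split` and the random test data.

```python
import numpy as np

def center_split(face_matrix, interval=5):
    """One round of center split clustering (illustrative sketch).

    Returns the row indices chosen as cluster centers and a label per row.
    """
    X = np.asarray(face_matrix, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    mean = X.mean(axis=0)                               # initial feature average vector
    sims = unit @ (mean / np.linalg.norm(mean))         # S401: initial similarities
    order = np.argsort(-sims)                           # sorting result, descending
    center_idx = order[interval - 1::interval]          # S402: 5th, 10th, 15th, ...
    if center_idx.size == 0:                            # fewer rows than the interval
        center_idx = order[:1]
    # S403 (assumed rule): assign each vector to its most similar center.
    labels = np.argmax(unit @ unit[center_idx].T, axis=1)
    return center_idx, labels

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 8))        # hypothetical 20 vectors of dimension 8
center_idx, labels = center_split(faces, interval=5)
print(center_idx, np.bincount(labels, minlength=len(center_idx)))
```

  • With 20 vectors and an interval of 5, four centers are selected, and each center is labeled with its own cluster.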
  • step S203 that is, performing inter-class connected clustering on the original cluster clusters to obtain connected clusters includes:
  • S501 Calculate the first feature average vector corresponding to each original cluster, and apply a similarity algorithm to the first feature average vector and each face feature vector in the original cluster to determine the first similarity between the first feature average vector and that face feature vector.
  • the first feature average vector is the average vector of the face feature vectors in the original cluster cluster, and the calculation process is the same as the calculation process of the initial feature average vector in step S202. To avoid repetition, it will not be repeated here.
  • the similarity algorithm is an algorithm for calculating the similarity of variables or points in space.
  • the similarity algorithm in this embodiment includes but is not limited to the cosine similarity algorithm and the Euclidean distance algorithm.
  • The first similarity is a value indicating the degree of similarity between the first feature average vector and a face feature vector in the original cluster.
  • The original clusters are produced by center split clustering. If the face feature vectors are not divided accurately, an original cluster may contain face feature vectors that do not belong to the same user. Therefore, the first feature average vector corresponding to each original cluster is calculated, and the similarity algorithm is used to calculate the similarity between each face feature vector in the original cluster and the first feature average vector, so that the first feature average vector can be used to determine whether the original cluster contains face feature vectors that do not belong to the same user.
  • S502 Connect the face feature vectors with the first similarity greater than the first connected cluster threshold in the original cluster clusters into a second cluster cluster.
  • the first connected cluster threshold is a preset value used to determine whether the face feature vectors in any original cluster cluster belong to the same user.
  • When the first similarity is greater than the first connected clustering threshold, the similarity is high and the face feature vectors involved most likely belong to the same user. Therefore, the face feature vectors whose first similarity is greater than the first connected clustering threshold are connected into a second cluster, which excludes the face feature vectors that do not belong to the same user and provides technical support for subsequently clustering the face feature vectors of the same user together.
  • the original cluster cluster includes face feature vectors a, b, c, d, e, and f, but there may be face feature vectors that do not belong to the same user in the original cluster cluster.
  • The first similarity between face feature vector a and the first feature average vector is 0.89; for b it is 0.88, for c 0.95, for d 0.75, for e 0.53, and for f 0.85. With a first connected clustering threshold of 0.7, e is deleted and a, b, c, d, and f are connected into a cluster, which gives the second cluster.
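  • The worked example above can be reproduced directly. The similarity values and the threshold come from the text; the variable names are illustrative.

```python
# First similarity of each face feature vector to the first feature average vector.
first_similarity = {"a": 0.89, "b": 0.88, "c": 0.95, "d": 0.75, "e": 0.53, "f": 0.85}
first_connected_threshold = 0.7

# S502: keep only vectors whose first similarity exceeds the threshold.
second_cluster = [name for name, sim in first_similarity.items()
                  if sim > first_connected_threshold]
removed = [name for name in first_similarity if name not in second_cluster]

print(second_cluster)  # ['a', 'b', 'c', 'd', 'f']
print(removed)         # ['e']
```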
  • S503 Perform inter-class clustering on all second clusters to obtain connected clusters.
  • Inter-class clustering is performed on all the second clusters to gather the face feature vectors belonging to one user together, which effectively prevents the face feature vectors of one user from being divided into different clusters and improves the accuracy of face clustering.
  • The first feature average vector corresponding to each original cluster is calculated, and the similarity algorithm is applied to the first feature average vector and each face feature vector in the corresponding original cluster to determine the first similarity between them. The face feature vectors in the original cluster whose first similarity is greater than the first connected clustering threshold are then connected into a second cluster, so that the second cluster excludes the face feature vectors that do not belong to the same user, which provides technical support for clustering the face feature vectors of the same user together. Inter-class clustering is performed on all the second clusters to obtain the connected clusters, so that the face feature vectors belonging to the same user are clustered together and the accuracy of face clustering is improved.
  • Step S503, performing inter-class clustering on all second clusters to obtain connected clusters, includes:
  • S601: Calculate the second feature average vector of each second cluster, and determine the second similarity of any two second clusters based on their second feature average vectors.
  • The second feature average vector is the average vector of the face feature vectors in a second cluster. It is calculated in the same way as the initial feature average vector in step S202, so the details are not repeated here.
  • The second similarity is a value that measures how similar any two second clusters are.
  • The mean function of MATLAB, or the mean function of Python's numpy library, can be used to average all face feature vectors of a second cluster and obtain its second feature average vector. A similarity algorithm is then applied to the second feature average vectors of any two second clusters to quickly obtain their second similarity, so that the face feature vectors of the same user can be gathered into the same cluster according to the second similarity.
  • The second connected clustering threshold is a value used to determine whether any two second clusters belong to the same user.
  • When the second similarity is greater than the second connected clustering threshold, the two second clusters are highly similar and belong to the same user. Therefore, second clusters whose second similarity is greater than the second connected clustering threshold are gathered into the same connected cluster, which improves clustering accuracy and prevents the face feature vectors of one user from being split across different clusters. Understandably, when the second similarity is less than the second connected clustering threshold, the two second clusters are not similar enough to belong to the same user and are not merged.
  • The second feature average vector of each second cluster is calculated, and the second similarity of any two second clusters is determined based on their second feature average vectors. Two second clusters whose second similarity is greater than the second connected clustering threshold are merged into a connected cluster, so that the face feature vectors belonging to the same user are clustered into one connected cluster, which improves the accuracy of face clustering.
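  • The inter-class merge described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: cosine similarity is assumed as the similarity algorithm, merging is assumed to be transitive (handled here with a small union-find), and the names `merge_second_clusters` and `cosine_similarity` are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def merge_second_clusters(second_clusters, threshold):
    """Merge second clusters whose mean vectors are more similar than the threshold.

    second_clusters: list of (n_i, d) arrays of face feature vectors.
    Returns a list of (m_j, d) arrays, one per connected cluster.
    """
    # Second feature average vector of each second cluster.
    means = [cluster.mean(axis=0) for cluster in second_clusters]
    parent = list(range(len(second_clusters)))

    def find(i):
        # Union-find root lookup with path halving (assumed merge strategy).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Second similarity for every pair of second clusters.
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            if cosine_similarity(means[i], means[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for idx, cluster in enumerate(second_clusters):
        groups.setdefault(find(idx), []).append(cluster)
    return [np.vstack(members) for members in groups.values()]


clusters = [
    np.array([[1.0, 0.0], [0.9, 0.1]]),
    np.array([[0.95, 0.05]]),
    np.array([[0.0, 1.0], [0.1, 0.9]]),
]
connected = merge_second_clusters(clusters, threshold=0.8)
print([c.shape[0] for c in connected])
```

  • In this toy input the first two clusters have nearly identical mean vectors and are merged, while the third stays separate. The pairwise loop is quadratic in the number of second clusters, which is acceptable for a sketch but would need care at large scale.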
  • Step S204, performing intra-class pruning on the connected clusters to obtain the target clusters, includes:
  • S701: Average each connected cluster to obtain the third feature average vector corresponding to each connected cluster.
  • The third feature average vector is the average vector of the face feature vectors in a connected cluster. It is calculated in the same way as the initial feature average vector in step S202, so the details are not repeated here.
  • The mean function of Python's numpy library is used to average the face feature vectors of each connected cluster and quickly obtain the third feature average vector of each connected cluster, so that face feature vectors whose similarity differs greatly can subsequently be excluded based on the third feature average vector.
  • S702: Calculate the third similarity between each face feature vector in the connected cluster and the third feature average vector.
  • The third similarity is a value that measures how similar a face feature vector in the connected cluster is to the third feature average vector.
  • The cosine similarity formula is used to calculate the third similarity between each face feature vector in the connected cluster and the third feature average vector, so that face feature vectors that do not belong to the same user can then be excluded based on the third feature average vector, which improves clustering accuracy.
  • S703: Cut from the connected clusters the face feature vectors whose third similarity is less than the third connected clustering threshold to obtain the target clusters.
  • The third connected clustering threshold is a preset value used to determine whether a connected cluster contains a face feature vector that does not belong to the user represented by that connected cluster.
  • When the third similarity is less than the third connected clustering threshold, the face feature vector is deleted, which excludes from the connected cluster the face feature vectors that do not belong to the user it represents and yields the target cluster. That is, the target cluster is the connected cluster after the face feature vectors that do not belong to the user it represents have been excluded, which keeps the cluster clean.
  • The connected clusters are averaged to obtain the third feature average vector of each connected cluster, and the third similarity between each face feature vector in the connected cluster and the third feature average vector is calculated, so that face feature vectors that do not belong to the connected cluster can subsequently be excluded based on the third feature average vector, which improves clustering accuracy. The face feature vectors whose third similarity is less than the third connected clustering threshold are cut from the connected clusters to obtain the target clusters, which excludes face feature vectors that do not belong to the user represented by each cluster and keeps the cluster clean.
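  • Steps S701 to S703 can be sketched with numpy as below. The function name `prune_cluster`, the toy data, and the threshold value are assumptions chosen for illustration, and the sketch assumes that no face feature vector is the zero vector.

```python
import numpy as np


def prune_cluster(connected_cluster, third_threshold):
    """Intra-class pruning of one connected cluster (hypothetical helper).

    connected_cluster: (n, d) array of non-zero face feature vectors.
    Returns the target cluster and the third similarity of every input vector.
    """
    # S701: third feature average vector (column-wise mean).
    mean = connected_cluster.mean(axis=0)
    # S702: cosine similarity between each vector and the mean.
    norms = np.linalg.norm(connected_cluster, axis=1) * np.linalg.norm(mean)
    third_similarity = connected_cluster @ mean / norms
    # S703: cut vectors whose third similarity is less than the threshold.
    keep = third_similarity >= third_threshold
    return connected_cluster[keep], third_similarity


cluster = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]])
target, sims = prune_cluster(cluster, third_threshold=0.7)
print(target.shape, np.round(sims, 3))
```

  • The last vector points in a very different direction from the other three, so its third similarity falls below the threshold and it is cut, leaving three vectors in the target cluster.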
  • A face clustering device is provided, and the face clustering device corresponds one-to-one to the face clustering method in the above embodiments.
  • The face clustering device includes a face feature vector acquisition module 801, an original cluster module 802, a connected cluster module 803, and a target cluster module 804.
  • The detailed description of each functional module is as follows:
  • The face feature vector acquisition module 801 is used to acquire clustered face images and to perform feature extraction on them with a feature extraction model to obtain face feature vectors.
  • The original cluster module 802 is used to perform center split clustering on the face feature vectors to obtain original clusters.
  • The connected cluster module 803 is used to perform inter-class connected clustering on the original clusters to obtain connected clusters.
  • The target cluster module 804 is used to perform intra-class pruning on the connected clusters to obtain target clusters.
  • The face clustering device further includes a GPU processing module.
  • The GPU processing module is used to, if the number of face feature vectors in an original cluster is greater than the first number threshold, use a program interface to allocate the face feature vectors in that original cluster to at least two GPUs for processing.
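  • The application does not name the program interface used for GPU allocation, so the following is only a device-agnostic sketch of the sharding decision. The function name `allocate_to_devices` and the parameter `num_devices` are assumptions, and the actual transfer of each shard to a GPU is deliberately not shown.

```python
import numpy as np


def allocate_to_devices(original_cluster, first_number_threshold, num_devices=2):
    """Split a large original cluster into one shard per device (hypothetical helper).

    Returns a list of arrays. Small clusters are returned as a single shard;
    dispatching each shard to a real GPU is outside this sketch.
    """
    if original_cluster.shape[0] <= first_number_threshold:
        return [original_cluster]
    # np.array_split tolerates row counts that are not divisible by num_devices.
    return np.array_split(original_cluster, num_devices, axis=0)


vectors = np.arange(20, dtype=float).reshape(10, 2)
shards = allocate_to_devices(vectors, first_number_threshold=4, num_devices=3)
print([s.shape[0] for s in shards])
```

  • Ten vectors exceed the assumed threshold of four, so they are divided into three shards of nearly equal size.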
  • The original cluster module 802 includes an initial feature average vector calculation unit, a first cluster acquisition unit, and a first judgment unit.
  • The initial feature average vector calculation unit is used to generate an initial face matrix based on the face feature vectors and to calculate the initial feature average vector corresponding to the initial face matrix.
  • The first cluster acquisition unit is used to cluster the initial face matrix based on the initial feature average vector to obtain first clusters.
  • The first judgment unit is used to determine a first cluster as an original cluster if the number of face feature vectors in the first cluster is less than the second number threshold.
  • The first cluster acquisition unit includes a sorting result acquisition subunit, a cluster center acquisition subunit, and a first cluster acquisition subunit.
  • The sorting result acquisition subunit is used to calculate the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and to sort the initial similarities to obtain a sorting result.
  • The cluster center acquisition subunit is used to extract z initial similarities at intervals from the sorting result, and to extract from the initial face matrix the z cluster centers corresponding to the z initial similarities.
  • The first cluster acquisition subunit is used to cluster all face feature vectors in the initial face matrix according to the z cluster centers to generate the first clusters.
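  • The three subunits above can be sketched together as one function. Several details are assumptions made for this illustration and are not stated in the application: cosine similarity is used as the initial similarity, the sorting result is in ascending order, the z similarities are taken at evenly spaced positions, and each vector is assigned to its most similar cluster center. The name `center_split` is hypothetical.

```python
import numpy as np


def center_split(initial_face_matrix, z):
    """One round of center split clustering (illustrative sketch).

    initial_face_matrix: (n, d) array of non-zero face feature vectors.
    Returns the row indices chosen as cluster centers and a label per vector.
    """
    # Initial feature average vector: column-wise mean of the matrix.
    mean = initial_face_matrix.mean(axis=0)
    unit = initial_face_matrix / np.linalg.norm(initial_face_matrix, axis=1, keepdims=True)
    # Initial similarity of every vector to the mean, then the sorting result.
    initial_similarity = unit @ (mean / np.linalg.norm(mean))
    order = np.argsort(initial_similarity)
    # Take z entries at evenly spaced positions of the sorting result (assumption).
    positions = np.linspace(0, len(order) - 1, z).round().astype(int)
    center_rows = order[positions]
    # Assign each vector to the most similar of the z cluster centers.
    labels = np.argmax(unit @ unit[center_rows].T, axis=1)
    return center_rows, labels


faces = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.0, 1.0]])
center_rows, labels = center_split(faces, z=2)
print(center_rows.tolist(), labels.tolist())
```

  • With z = 2 the least and most similar vectors to the mean become the two cluster centers, and the five vectors are divided into a group of three and a group of two.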
  • The connected cluster module 803 includes a first similarity calculation unit, a connection unit, and an inter-class clustering unit.
  • The first similarity calculation unit is used to calculate the first feature average vector corresponding to each original cluster, and to apply a similarity algorithm to the first feature average vector and each face feature vector in the corresponding original cluster to determine their first similarity.
  • The connection unit is used to connect the face feature vectors in the original cluster whose first similarity is greater than the first connected clustering threshold into the second cluster.
  • The inter-class clustering unit is used to perform inter-class clustering on all second clusters to obtain connected clusters.
  • The inter-class clustering unit includes a second feature average vector calculation subunit and a second judgment unit.
  • The second feature average vector calculation subunit is used to calculate the second feature average vector of each second cluster, and to determine the second similarity of any two second clusters based on their second feature average vectors.
  • The second judgment unit is used to merge two second clusters into a connected cluster if their second similarity is greater than the second connected clustering threshold.
  • The target cluster module 804 includes a third feature average vector calculation unit, a third similarity calculation unit, and a third judgment unit.
  • The third feature average vector calculation unit is used to average each connected cluster and obtain the third feature average vector corresponding to each connected cluster.
  • The third similarity calculation unit is used to calculate the third similarity between each face feature vector in the connected cluster and the third feature average vector.
  • The third judgment unit is used to cut from the connected clusters the face feature vectors whose third similarity is less than the third connected clustering threshold to obtain the target clusters.
  • Each module in the aforementioned face clustering device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • The above modules may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.
  • A computer device is provided.
  • The computer device may be a server, and its internal structure may be as shown in FIG. 9.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a readable storage medium and an internal memory.
  • The readable storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium.
  • The database of the computer device is used to store the generated target clusters.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection.
  • The computer-readable instructions, when executed by the processor, implement a face clustering method.
  • The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • A computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the steps of the face clustering method are implemented, such as steps S201 to S204 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7, which are not repeated here.
  • Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units in the embodiment of the face clustering device are implemented, such as the functions of the face feature vector acquisition module 801, the original cluster module 802, the connected cluster module 803, and the target cluster module 804 shown in the block diagram of the face clustering device, which are not repeated here.
  • One or more readable storage media storing computer-readable instructions are provided.
  • When the computer-readable instructions stored in the readable storage medium are executed by a processor, the steps of the face clustering method in the foregoing embodiments are implemented, such as steps S201 to S204 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7, which are not repeated here.
  • Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units in the embodiment of the face clustering device are implemented, such as the functions of the face feature vector acquisition module 801, the original cluster module 802, the connected cluster module 803, and the target cluster module 804 shown in the block diagram of the face clustering device, which are not repeated here.
  • The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A human face clustering method and apparatus, a computer device, and a storage medium, relating to the field of artificial intelligence clustering. The human face clustering method comprises: acquiring clustered facial images, and performing feature extraction on the clustered facial images by means of a feature extraction model to acquire facial feature vectors (S201); performing central divisive clustering on the facial feature vectors in order to uniformly divide the face feature vectors and acquire original clustering class clusters (S202); performing inter-class connection clustering on the original clustering class clusters to acquire a connected clustering class cluster (S203); and performing intra-class trimming on the connected clustering class cluster to acquire a target clustering class cluster (S204), so as to improve the accuracy of human face clustering.

Description

Face clustering method, device, computer device, and storage medium
This application claims priority to Chinese patent application No. 201911037526.6, filed with the Chinese Patent Office on October 29, 2019 and entitled "Face clustering method, device, computer device, and storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence clustering, and in particular to a face clustering method, device, computer device, and storage medium.
Background
Traditional face clustering algorithms include, but are not limited to, the K-means clustering algorithm, the rank-order clustering method based on shared neighbors, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise, a density-based clustering algorithm). The inventor realized that these traditional face clustering algorithms produce imprecise results when clustering large-scale face images. For example, the rank-order clustering method based on shared neighbors tends to split the face feature vectors of the same person into multiple clusters, so its clustering results are inaccurate; the time complexity of the DBSCAN clustering algorithm is too high to support clustering tens of millions of face feature vectors; and the K-means clustering algorithm has difficulty clustering face feature vectors precisely and dividing large-scale face feature vectors evenly, which affects the accuracy of the clustering results.
Summary of the invention
The embodiments of the present application provide a face clustering method, device, computer device, and storage medium to solve the problem of imprecise face clustering.
A face clustering method, including:
acquiring clustered face images, and performing feature extraction on the clustered face images with a feature extraction model to obtain face feature vectors;
performing center split clustering on the face feature vectors to obtain original clusters;
performing inter-class connected clustering on the original clusters to obtain connected clusters;
performing intra-class pruning on the connected clusters to obtain target clusters.
A face clustering device, including:
a face feature vector acquisition module, used to acquire clustered face images and to perform feature extraction on the clustered face images with a feature extraction model to obtain face feature vectors;
an original cluster module, used to perform center split clustering on the face feature vectors to obtain original clusters;
a connected cluster module, used to perform inter-class connected clustering on the original clusters to obtain connected clusters;
a target cluster module, used to perform intra-class pruning on the connected clusters to obtain target clusters.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
acquiring clustered face images, and performing feature extraction on the clustered face images with a feature extraction model to obtain face feature vectors;
performing center split clustering on the face feature vectors to obtain original clusters;
performing inter-class connected clustering on the original clusters to obtain connected clusters;
performing intra-class pruning on the connected clusters to obtain target clusters.
One or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring clustered face images, and performing feature extraction on the clustered face images with a feature extraction model to obtain face feature vectors;
performing center split clustering on the face feature vectors to obtain original clusters;
performing inter-class connected clustering on the original clusters to obtain connected clusters;
performing intra-class pruning on the connected clusters to obtain target clusters.
The details of one or more embodiments of the present application are set forth in the following drawings and description; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
With the above face clustering method, device, computer device, and storage medium, clustered face images are acquired, a feature extraction model performs feature extraction on them, and center split clustering is performed on the resulting face feature vectors to divide them evenly and quickly obtain original clusters, which speeds up clustering. Inter-class connected clustering is performed on the original clusters to obtain connected clusters, so that the face feature vectors belonging to the same user are placed in the same cluster, which ensures clustering accuracy. Intra-class pruning is performed on the connected clusters to obtain target clusters, which ensures that the target clusters contain no interfering face feature vectors and safeguards the accuracy of face clustering.
Description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of a face clustering method in an embodiment of the present application;
FIG. 2 is a flowchart of a face clustering method in an embodiment of the present application;
FIG. 3 is a flowchart of a face clustering method in an embodiment of the present application;
FIG. 4 is a flowchart of a face clustering method in an embodiment of the present application;
FIG. 5 is a flowchart of a face clustering method in an embodiment of the present application;
FIG. 6 is a flowchart of a face clustering method in an embodiment of the present application;
FIG. 7 is a flowchart of a face clustering method in an embodiment of the present application;
FIG. 8 is a functional block diagram of a face clustering device in an embodiment of the present application;
FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.
本申请实施例提供的人脸聚类方法,可以用于对大规模的人脸图像进行划分,以使属于同一个用户的人脸图像能够划分为一个目标聚类类簇;或者可以应用于人脸识别中,例如,可以截取摄像头中的待识别人脸图像,采用本实施例的人脸聚类方法对待识别人脸图像和图像数据库中已知人脸图像进行聚类,以判断待识别人脸图像是否存在于图像数据库中,实现对人脸图像进行识别,例如失踪人员追踪等场景下进行人脸图像识别。The face clustering method provided by the embodiments of the present application can be used to divide large-scale face images, so that face images belonging to the same user can be divided into a target cluster; or it can be applied to people In face recognition, for example, the face image to be recognized in the camera can be intercepted, and the face clustering method of this embodiment is used to cluster the face image to be recognized and the known face image in the image database to determine the face to be recognized Whether the image exists in the image database, to realize the recognition of the face image, such as the face image recognition in the scene of missing person tracking.
该人脸聚类方法可应用如图1所示的应用环境中。具体地,该人脸聚类方法应用在人脸聚类系统中,该人脸聚类系统包括如图1所示的客户端和服务器,客户端与服务器通过网络进行通信,用于通过对大规模的人脸图像进行中心分裂聚类,以均匀划分人脸特征向量,然后对均匀划分的人脸特征向量进行连通聚类,获得目标聚类类簇,保证人脸聚类的精确度。其中,客户端又称为用户端,是指与服务器相对应,为客户提供本地服务的程序。客户端可安装在但不限于各种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可 穿戴设备上。服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。The face clustering method can be applied to the application environment shown in Figure 1. Specifically, the face clustering method is applied in a face clustering system. The face clustering system includes a client and a server as shown in FIG. Large-scale face images are clustered by center splitting to evenly divide face feature vectors, and then connected clustering is performed on the evenly divided face feature vectors to obtain target clustering clusters to ensure the accuracy of face clustering. Among them, the client is also called the client, which refers to the program that corresponds to the server and provides local services to the client. The client can be installed on, but not limited to, various personal computers, laptops, smart phones, tablets, and portable wearable devices. The server can be implemented as an independent server or a server cluster composed of multiple servers.
在一实施例中,如图2所示,提供一种人脸聚类方法,以该方法应用在图1中的服务器为例进行说明,包括如下步骤:In an embodiment, as shown in FIG. 2, a face clustering method is provided. The method is applied to the server in FIG. 1 as an example for description, and includes the following steps:
S201:获取聚类人脸图像,采用特征提取模型对聚类人脸图像进行特征提取,获取人脸特征向量。S201: Obtain a clustered face image, and use a feature extraction model to perform feature extraction on the clustered face image to obtain a face feature vector.
其中,聚类人脸图像是指需要进行聚类的人脸图像。本实施例的聚类人脸图像包括至少2个用户的聚类人脸图像,每一个用户包括至少两张聚类人脸图像。Among them, clustering face images refers to face images that need to be clustered. The clustered face images in this embodiment include clustered face images of at least two users, and each user includes at least two clustered face images.
作为一示例,在人脸聚类方法进行模型训练过程中,可使同一个用户的多个聚类人脸图像携带有相同的用户标识,以便后续验证聚类效果是否准确。该用户标识用于识别唯一用户的标识,例如,用户标识可以是该用户的姓名或者身份证等。As an example, during the model training process of the face clustering method, multiple clustered face images of the same user may carry the same user identification, so as to subsequently verify whether the clustering effect is accurate. The user ID is used to identify a unique user ID. For example, the user ID may be the user's name or ID card.
作为一示例,在人脸聚类方法在进行人脸识别过程中,可将不携带用户标识的未知身份的聚类人脸图像与携带用户标识的已知身份的聚类人脸图像进行聚类,根据聚类结果确定不携带用户标识的未知身份的聚类人脸图像的身份。As an example, in the face recognition process of the face clustering method, clustered face images of unknown identities that do not carry user identifications can be clustered with clustered face images of known identities that carry user identifications. According to the clustering result, the identity of the clustered face image of the unknown identity that does not carry the user identity is determined.
特征提取模型是预先训练好的用于对聚类人脸图像进行特征提取的模型,该特征提取模型可以是基于卷积神经网络训练得到的人脸特征提取模型。The feature extraction model is a pre-trained model for feature extraction of clustered face images, and the feature extraction model may be a face feature extraction model based on convolutional neural network training.
人脸特征向量是指利用特征提取模型对聚类人脸图像进行特征提取后获得的向量。例如,可以利用特征提取模型提取每一张聚类人脸图像的的512维特征向量,或者利用特征提取模型提取每一张聚类人脸图像的的128维特征向量等。The face feature vector refers to the vector obtained after feature extraction of the clustered face images using the feature extraction model. For example, the feature extraction model can be used to extract the 512-dimensional feature vector of each clustered face image, or the feature extraction model can be used to extract the 128-dimensional feature vector of each clustered face image.
As an example, inputting the face images to be clustered into the pre-trained feature extraction model quickly yields a 512-dimensional feature vector for each face image. In this embodiment, a face feature vector is the vector corresponding to the facial features in a face image.
S202: Perform center-split clustering on the face feature vectors to obtain original clusters.
Here, the original clusters are the clusters obtained from the first clustering of the face feature vectors, and this first clustering is center-split clustering. Center-split clustering is a process that clusters face feature vectors according to their similarity. As an example, center-split clustering can divide the face feature vectors of multiple face images evenly, so that the original clusters are obtained quickly.
Specifically, the server obtains the face feature vectors extracted from all face images to be clustered, generates an initial face matrix from them, and calculates the initial feature average vector of that matrix. For example, the mean function of Python's numpy library can be applied to the initial face matrix to obtain the initial feature average vector quickly. Cluster centers are then obtained based on the initial feature average vector, and the face feature vectors are split into clusters according to those centers to obtain the original clusters. Here, the initial face matrix is the matrix formed by gathering all face feature vectors together, and the initial feature average vector is the average vector of the initial face matrix. As an example, the computer applies the mean function of Python's numpy library to the initial face matrix, collapsing the rows of the matrix and averaging each column to obtain a row vector; this row vector is the initial feature average vector. For example, if the initial face matrix is
Figure PCTCN2020093348-appb-000001
then the initial feature average vector is
Figure PCTCN2020093348-appb-000002
where
Figure PCTCN2020093348-appb-000003
and i takes the values 1, 2, 3, 4 and 5. A cluster center is a face feature vector selected from the initial face matrix, based on the initial feature average vector, for use in clustering.
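The column-wise averaging described above can be sketched with numpy as follows. The 3 x 5 matrix is made-up illustrative data, not the matrix shown in the figures, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical initial face matrix: one row per face feature vector,
# five columns to mirror i = 1..5 in the text (values are invented).
initial_face_matrix = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
])

# axis=0 collapses the rows and averages each column,
# giving one row vector: the initial feature average vector.
initial_feature_mean = np.mean(initial_face_matrix, axis=0)

print(initial_feature_mean)  # [2. 3. 4. 5. 6.]
```

In practice each row would be a 512- or 128-dimensional vector produced by the feature extraction model.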
In the K-means clustering algorithm and other traditional clustering algorithms, K cluster centers are selected at random, so the face feature vectors are divided randomly. The time complexity of the K-means clustering algorithm is n², where n is the number of face feature vectors; for large-scale clustering of face feature vectors this cost is very high, and both clustering accuracy and efficiency are low. This embodiment instead obtains cluster centers based on the initial feature average vector and divides the face feature vectors evenly, with a time complexity of l*log(n), where n is the number of face feature vectors and l is the number of center-split clustering passes. For large-scale face feature vectors, center-split clustering therefore has lower time complexity than traditional algorithms such as K-means and can effectively speed up face clustering. Here, time complexity refers to the running time required to perform face clustering.
S203: Perform inter-class connected clustering on the original clusters to obtain connected clusters.
Here, inter-class connected clustering is a method that gathers any two highly similar original clusters into one cluster, so that the face feature vectors belonging to the same user are placed in the same cluster.
Specifically, center-split clustering divides the face feature vectors evenly based on the initial feature average vector in order to speed up clustering, but an original cluster formed this way may contain face feature vectors of different users. Inter-class connected clustering is therefore performed on the original clusters so that the face feature vectors belonging to the same user are placed in the same cluster. This improves the accuracy of the clustering result and addresses the tendency of traditional clustering algorithms to split the face feature vectors of one user across different clusters.
S204: Perform intra-class pruning on the connected clusters to obtain target clusters.
Here, intra-class pruning is the process of pruning each connected cluster to remove the erroneous face feature vectors in it. An erroneous face feature vector is a face feature vector in a connected cluster that does not belong to the same user as the other face feature vectors in that cluster.
As an example, the server may calculate the intra-class similarities within a connected cluster, sort them, identify the face feature vectors with low intra-class similarity as erroneous face feature vectors, and remove them. This achieves intra-class pruning of the connected cluster and ensures the precision of face clustering.
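A minimal sketch of intra-class pruning follows. The text does not fix how intra-class similarity is measured or where the cut-off lies, so two assumptions are made here: similarity is the cosine similarity of each vector to the cluster's mean direction, and `min_similarity` is a hypothetical threshold. The helper name `prune_cluster` is also illustrative.

```python
import numpy as np

def prune_cluster(vectors, min_similarity):
    """Return indices to keep (most similar first) and all similarities.

    Assumption: intra-class similarity = cosine similarity between each
    vector and the mean of the unit-normalised vectors in the cluster.
    """
    vectors = np.asarray(vectors, dtype=float)
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    mean = unit.mean(axis=0)
    mean = mean / np.linalg.norm(mean)
    sims = unit @ mean
    order = np.argsort(sims)[::-1]  # sort similarities, highest first
    keep = [int(i) for i in order if sims[i] >= min_similarity]
    return keep, sims

# Two vectors point in nearly the same direction; the third is an outlier.
cluster = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
keep, sims = prune_cluster(cluster, min_similarity=0.8)
print(keep)  # [1, 0]
```

The vector at index 2 falls below the threshold and is treated as an erroneous face feature vector.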
In the face clustering method provided in this embodiment, the face images to be clustered are acquired and a feature extraction model extracts features from them. Center-split clustering is performed on the resulting face feature vectors, dividing them evenly and quickly yielding the original clusters, which speeds up clustering. Inter-class connected clustering is performed on the original clusters to obtain connected clusters, so that the face feature vectors belonging to the same user are placed in the same cluster and clustering is accurate. Intra-class pruning is performed on the connected clusters to obtain target clusters, so that the target clusters contain no interfering face feature vectors and the precision of face clustering is ensured.
In an embodiment, after step S202, that is, after center-split clustering is performed on the face feature vectors to obtain the original clusters, the face clustering method further includes: if the number of face feature vectors in an original cluster is greater than a first number threshold, using a program interface to distribute the face feature vectors in the original cluster to at least two GPUs for processing.
Here, the first number threshold is a preset threshold used to decide whether the face feature vectors in an original cluster need to be divided for processing. The program interface is an interface for distributing the face feature vectors of an original cluster to different GPUs for processing; it includes, but is not limited to, an MPI interface. The GPU (Graphics Processing Unit) is the core of a graphics card, which is one of the most basic components of a computer: the graphics card converts the display information required by the computer system to drive the display, supplies progressive or interlaced scan signals to the display, controls correct display output, and is an important component connecting the display to the motherboard of a personal computer. MPI (Message Passing Interface) is a message-passing application program interface that includes protocol and semantic specifications.
As an example, when the number of face feature vectors in any original cluster is greater than the first number threshold, the server distributes the face feature vectors of that original cluster to at least two GPUs for computation, specifically for similarity calculation. This speeds up similarity calculation and hence face clustering. Each GPU can send its calculated similarity results back to the server, which divides the face feature vectors according to those results to carry out inter-class connected clustering. It can be understood that the GPUs may also be used to calculate the initial similarity, the first similarity and the other similarities used in the method, to speed up clustering.
Specifically, if the number of face feature vectors in an original cluster is greater than the first number threshold, the number of face feature vectors in that original cluster is divided by the maximum number a GPU can process and the result is rounded up, giving the number of GPUs required for that original cluster. This uses the fewest GPUs that still allow all face feature vectors to be processed, which improves processing efficiency while saving GPU resources. The program interface then distributes all face feature vectors in the original cluster across the at least two GPUs for computation, so that the similarities between face feature vectors are calculated quickly.
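The GPU-count rule above is a ceiling division, which can be sketched as follows. The helper names and the sample numbers are hypothetical, returning 1 for small clusters is an assumption about the non-distributed case, and no actual MPI call is made; the sketch only shows how the vectors would be partitioned.

```python
import math

def gpus_needed(num_vectors, max_per_gpu, first_threshold):
    """Number of GPUs for one original cluster (hypothetical helper)."""
    if num_vectors <= first_threshold:
        return 1  # assumption: small clusters are not distributed
    # ceil(count / capacity), but never fewer than the two GPUs the text requires
    return max(2, math.ceil(num_vectors / max_per_gpu))

def split_evenly(items, parts):
    """Split items into at most `parts` contiguous chunks of near-equal size."""
    size = math.ceil(len(items) / parts)
    return [items[i:i + size] for i in range(0, len(items), size)]

# 2500 vectors, each GPU handles at most 1000, threshold 2000 (invented numbers).
n_gpus = gpus_needed(2500, max_per_gpu=1000, first_threshold=2000)
chunks = split_evenly(list(range(2500)), n_gpus)
print(n_gpus, [len(c) for c in chunks])  # 3 [834, 834, 832]
```

Each chunk would then be handed to one GPU through the program interface, for example one MPI rank per GPU.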
In an embodiment, as shown in FIG. 3, step S202, that is, performing center-split clustering on the face feature vectors to obtain original clusters, includes:
S301: Generate an initial face matrix based on the face feature vectors, and calculate the initial feature average vector corresponding to the initial face matrix.
Here, the initial face matrix is the matrix into which all face feature vectors are loaded. The initial feature average vector is the average vector of the initial face matrix.
Specifically, all face feature vectors are loaded to generate the initial face matrix, and the initial feature average vector corresponding to the initial face matrix is calculated. For example, the mean function of MATLAB or Python's numpy library may be used to calculate the initial feature average vector quickly, so that cluster centers can subsequently be obtained from it for center-split clustering.
S302: Cluster the initial face matrix based on the initial feature average vector to obtain first clusters.
Specifically, the initial face matrix is clustered according to the initial feature average vector: cluster centers are obtained from the initial feature average vector, and clustering is performed around those centers to obtain the first clusters, so that the face feature vectors are divided evenly and quickly. A cluster center is a face feature vector selected from the initial face matrix for use in clustering.
S303: If the number of face feature vectors in a first cluster is less than a second number threshold, determine the first cluster as an original cluster.
Here, the second number threshold is a preset threshold used to decide whether the face feature vectors in a first cluster need to be divided further. It is obtained through testing, so that each original cluster contains a suitable number of face feature vectors and the face feature vectors of one user are not divided into too many original clusters.
Specifically, cluster centers are obtained based on the initial feature average vector, and the initial face matrix is clustered around them to obtain the first clusters. If the number of face feature vectors in a first cluster is less than the second number threshold, that first cluster is determined as an original cluster. It can be understood that if the number of face feature vectors in a first cluster is not less than the second number threshold, the first cluster must be divided again according to S301-S303, so that only clusters with fewer face feature vectors than the second number threshold are determined as original clusters. This keeps the number of face feature vectors in each original cluster suitable and avoids dividing the face feature vectors of one user into too many original clusters.
Further, after cluster centers are obtained based on the initial feature average vector and the initial face matrix is clustered around them, the degree of dispersion of the face feature vectors in each resulting first cluster may also be evaluated. If the degree of dispersion of the face feature vectors in a first cluster is less than a preset dispersion threshold, the first cluster is determined as an original cluster. It can be understood that if the degree of dispersion is not less than the preset dispersion threshold, the first cluster must be divided again according to S301-S303, so that only clusters whose degree of dispersion is less than the preset dispersion threshold are determined as original clusters. This keeps the number of face feature vectors in each original cluster suitable and avoids dividing the face feature vectors of one user into too many original clusters. Here, the degree of dispersion is the degree of difference among the face feature vectors in the same cluster and indicates how widely they are spread. In this embodiment, the degree of dispersion of the face feature vectors in a first cluster may be expressed by their range, mean deviation or standard deviation, and is then compared with the preset dispersion threshold. The preset dispersion threshold is a preset value used to decide whether the face feature vectors in a first cluster need to be divided further.
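The dispersion check can be sketched as below. The text names range, mean deviation and standard deviation as possible measures but does not say how a per-vector measure is reduced to one number, so this sketch assumes the standard deviation of each feature dimension averaged over all dimensions. The function names and the threshold value are hypothetical.

```python
import numpy as np

def dispersion(vectors):
    """Assumed measure: per-dimension standard deviation, averaged over dimensions."""
    vectors = np.asarray(vectors, dtype=float)
    return float(np.std(vectors, axis=0).mean())

def needs_further_split(vectors, dispersion_threshold):
    # A first cluster is split again when its dispersion is not below the threshold.
    return dispersion(vectors) >= dispersion_threshold

tight = [[1.0, 1.0], [1.0, 1.0]]
spread = [[0.0, 0.0], [2.0, 2.0]]
print(dispersion(tight), dispersion(spread))  # 0.0 1.0
print(needs_further_split(tight, 0.5), needs_further_split(spread, 0.5))  # False True
```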
In the face clustering method provided in this embodiment, an initial face matrix is generated based on the face feature vectors and its initial feature average vector is calculated, so that center-split clustering can subsequently be performed based on that vector. Cluster centers are obtained based on the initial feature average vector and the initial face matrix is clustered around them, so that the face feature vectors are divided evenly and quickly to obtain the first clusters. If the number of face feature vectors in a first cluster is less than the second number threshold, the first cluster is determined as an original cluster, which keeps the number of face feature vectors in each original cluster suitable and avoids dividing the face feature vectors of one user into too many original clusters.
In an embodiment, as shown in FIG. 4, step S302, that is, clustering the initial face matrix based on the initial feature average vector to obtain first clusters, includes:
S401: Calculate the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and sort the initial similarities to obtain a sorting result.
Here, the initial similarity is a value indicating how similar a face feature vector in the initial face matrix is to the initial feature average vector; it can be understood that each face feature vector in the initial face matrix has one initial similarity. The sorting result is the result of sorting the initial similarities in descending or ascending order.
Specifically, after obtaining the initial feature average vector, the server calculates the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector. The dot function of the numpy library or the cosine similarity formula may be used for this calculation. The initial similarities are then sorted in ascending or descending order to obtain the sorting result, from which the cluster centers used for clustering can be obtained quickly.
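Step S401 can be sketched with numpy's `dot` and the cosine similarity formula. The function name, the tiny 2-dimensional sample matrix and the choice of ascending order are assumptions made for illustration.

```python
import numpy as np

def initial_similarities(face_matrix):
    """Cosine similarity of every row to the column-wise mean, plus an ascending sort order."""
    face_matrix = np.asarray(face_matrix, dtype=float)
    mean_vec = face_matrix.mean(axis=0)
    # cos(a, b) = a.b / (|a| * |b|), computed for all rows at once
    norms = np.linalg.norm(face_matrix, axis=1) * np.linalg.norm(mean_vec)
    sims = np.dot(face_matrix, mean_vec) / norms
    order = np.argsort(sims)  # row indices from least to most similar
    return sims, order

# Invented data: the middle row points along the mean direction.
sims, order = initial_similarities([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(np.round(sims, 4))
print(order[-1])  # 1
```

If the feature vectors are already L2-normalised, the dot product alone differs from the cosine similarity only by the constant norm of the mean vector, so it yields the same ordering.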
S402: Extract z initial similarities from the sorting result at intervals, and extract from the initial face matrix the z cluster centers corresponding to the z initial similarities.
Specifically, z initial similarities (z being a positive integer) are taken from the sorting result at a preset interval. As an example, for a sorting result formed from 100 initial similarities, the preset interval may be 5, in which case the 5th, 10th, 15th, ..., 5z-th initial similarities of the sorting result are extracted. The face feature vectors corresponding to the extracted initial similarities are used as cluster centers, which speeds up clustering. Traditional clustering algorithms select cluster centers at random, whereas this embodiment determines the z cluster centers according to similarity, which ensures that the face feature vectors are divided evenly, speeds up clustering, improves clustering precision, and reduces the probability that the same user is assigned to different clusters.
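The interval extraction in the example above maps directly onto list slicing. In this sketch `sorted_indices` stands in for the row indices of the initial face matrix arranged in sorted-similarity order; the function name is hypothetical.

```python
def pick_center_indices(sorted_indices, interval):
    """Take the interval-th, 2*interval-th, ... entries (1-based positions)."""
    return list(sorted_indices[interval - 1::interval])

# 100 sorted entries with a preset interval of 5, as in the example in the text.
sorted_indices = list(range(100))
centers = pick_center_indices(sorted_indices, 5)
print(len(centers), centers[:3], centers[-1])  # 20 [4, 9, 14] 99
```

With 100 initial similarities and an interval of 5, this slicing yields 20 cluster centers, at the 5th through 100th positions of the sorting result.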
S403: Cluster all face feature vectors in the initial face matrix according to the z cluster centers to generate the first clusters.
Specifically, all face feature vectors are clustered according to the z cluster centers to generate z first clusters. It can be understood that, because the cluster centers are extracted according to the similarity between each face feature vector in the initial face matrix and the initial feature average vector, the face feature vectors are divided evenly when the first clusters are generated.
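The text does not state the rule by which vectors are assigned to the z cluster centers. The sketch below assumes each face feature vector joins the center with which it has the highest cosine similarity; the function name and sample data are illustrative only.

```python
import numpy as np

def assign_to_centers(face_matrix, center_indices):
    """Group row indices by their most cosine-similar center (assumed rule)."""
    face_matrix = np.asarray(face_matrix, dtype=float)
    unit = face_matrix / np.linalg.norm(face_matrix, axis=1, keepdims=True)
    centers = unit[center_indices]        # shape (z, d)
    sims = unit @ centers.T               # shape (n, z)
    labels = np.argmax(sims, axis=1)      # best center for every row
    return [np.flatnonzero(labels == k).tolist()
            for k in range(len(center_indices))]

faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
first_clusters = assign_to_centers(faces, center_indices=[0, 2])
print(first_clusters)  # [[0, 1], [2, 3]]
```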
In the face clustering method provided in this embodiment, the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector is calculated and the initial similarities are sorted to obtain a sorting result, so that cluster centers can subsequently be obtained according to the initial similarities. Extracting z initial similarities from the sorting result at intervals, and extracting the corresponding z cluster centers from the initial face matrix, ensures that the face feature vectors are divided evenly and reduces the probability that the same user is assigned to different clusters. All face feature vectors in the initial face matrix are then clustered according to the z cluster centers, so that the face feature vectors are divided evenly to generate the first clusters.
In an embodiment, as shown in FIG. 5, step S203, that is, performing inter-class connected clustering on the original clusters to obtain connected clusters, includes:
S501: Calculate the first feature average vector corresponding to each original cluster, and use a similarity algorithm to determine the first similarity between the first feature average vector and any face feature vector in the original cluster.
Here, the first feature average vector is the average vector of the face feature vectors in an original cluster. It is calculated in the same way as the initial feature average vector in step S202, so the calculation is not repeated here. A similarity algorithm is an algorithm for calculating the similarity of variables or of points in space; in this embodiment, similarity algorithms include, but are not limited to, the cosine similarity algorithm and the Euclidean distance algorithm. The first similarity is a value indicating how similar a face feature vector in an original cluster is to the first feature average vector of that cluster.
Specifically, the original clusters are produced by center-split clustering, which does not divide the face feature vectors precisely, so an original cluster may contain face feature vectors that do not belong to the same user. The first feature average vector of each original cluster is therefore calculated, and a similarity algorithm is used to calculate the similarity between each face feature vector in the original cluster and the first feature average vector, so that the first feature average vector can be used to determine whether the original cluster contains face feature vectors that do not belong to the same user.
S502: Connect the face feature vectors in an original cluster whose first similarity is greater than a first connected clustering threshold into a second cluster.
Here, the first connected clustering threshold is a preset value used to decide whether the face feature vectors in any original cluster belong to the same user.
Specifically, face feature vectors in an original cluster whose first similarity is greater than the first connected clustering threshold are highly similar and very likely belong to the same user. The face feature vectors whose first similarity is greater than the first connected clustering threshold are therefore connected into a second cluster, which excludes face feature vectors that do not belong to the same user and provides technical support for clustering the face feature vectors of the same user together. For example, suppose an original cluster contains face feature vectors a, b, c, d, e and f, some of which may not belong to the same user, and their first similarities to the first feature average vector are 0.89 for a, 0.88 for b, 0.95 for c, 0.75 for d, 0.53 for e and 0.85 for f. With a first connected clustering threshold of 0.7, e is removed, and a, b, c, d and f are connected to obtain the second cluster.
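The worked example above reduces to a simple threshold filter, reproduced here with the same similarity values and threshold. The variable names are illustrative.

```python
# First similarities of a..f to the first feature average vector (values from the text).
first_similarity = {"a": 0.89, "b": 0.88, "c": 0.95, "d": 0.75, "e": 0.53, "f": 0.85}
first_connected_threshold = 0.7

# Keep vectors above the threshold; everything else is excluded from the second cluster.
second_cluster = [name for name, s in first_similarity.items()
                  if s > first_connected_threshold]
removed = [name for name, s in first_similarity.items()
           if s <= first_connected_threshold]

print(second_cluster)  # ['a', 'b', 'c', 'd', 'f']
print(removed)         # ['e']
```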
S503: Perform inter-class clustering on all second clusters to obtain connected clusters.
Specifically, after the face feature vectors belonging to the same user in an original cluster have been clustered into a second cluster, it is still necessary to determine whether any two second clusters contain face feature vectors of the same user. Inter-class clustering is therefore performed on all second clusters to gather the face feature vectors of one user together, which effectively prevents the face features of one user from being divided into different clusters and improves the accuracy of face clustering.
In the face clustering method provided in this embodiment, the first feature average vector corresponding to each original cluster is calculated, and a similarity algorithm is used to determine the first similarity between the first feature average vector and any face feature vector in the corresponding original cluster. The face feature vectors in the original cluster whose first similarity is greater than the first connected clustering threshold are connected into a second cluster, so that the second cluster excludes face feature vectors that do not belong to the same user, which provides technical support for clustering the face feature vectors of the same user together. Inter-class clustering is then performed on all second clusters to obtain connected clusters, so that the face feature vectors belonging to the same user are clustered together and the precision of face clustering is improved.
In an embodiment, as shown in FIG. 6, step S503, that is, performing inter-class clustering on all second clusters to obtain connected clusters, includes:
S601:计算每一第二聚类类簇的第二特征平均向量,基于第二特征平均向量确定任意两个第二聚类类簇对应的第二相似度。S601: Calculate the second feature average vector of each second cluster cluster, and determine the second similarity corresponding to any two second cluster clusters based on the second feature average vector.
其中,第二特征平均向量是第二聚类类簇中人脸特征向量的平均向量,其计算过程与步骤S202中的初始特征平均向量的计算过程相同,为避免重复,此处不一一赘述。第二相似度是指任意两个第二聚类类簇的相似程度的值。Among them, the second feature average vector is the average vector of the face feature vectors in the second cluster cluster. The calculation process is the same as the calculation process of the initial feature average vector in step S202. To avoid repetition, we will not repeat them here. . The second degree of similarity refers to the value of the degree of similarity between any two second cluster clusters.
具体地,可以采用matlab的mean函数对第二聚类类簇的所有人脸特征向量进行均值计算,或者可以采用python的numpy库的函数对第二聚类类簇的所有人脸特征向量进行均值计算,以获取第二聚类类簇的第二特征平均向量;再采用相似度算法对任意两个第二 聚类类簇的第二特征平均向量进行相似度计算,以快速获取任意两个第二聚类类簇的第二相似度,以便后续根据第二相似度将同一个用户的人脸特征向量聚拢在同一个聚类类簇中。Specifically, the mean function of matlab can be used to calculate the mean value of all face feature vectors of the second cluster cluster, or the function of the numpy library of python can be used to average all face feature vectors of the second cluster cluster. Calculate to obtain the second feature average vector of the second cluster; then use the similarity algorithm to calculate the similarity of the second feature average vector of any two second clusters to quickly obtain any two second feature average vectors. The second similarity of the two clusters, so that the facial feature vectors of the same user are gathered in the same cluster according to the second similarity.
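Step S601 can be illustrated with a minimal sketch. It is written in plain Python so that it runs without external libraries; `numpy.mean` over the vector axis would do the averaging equally well. The function names are hypothetical, and cosine similarity is an assumption, since this step only refers to "a similarity algorithm".

```python
import math


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_second_similarity(second_clusters):
    """Map each index pair (i, j), i < j, to the similarity of the two
    second feature average vectors (assumed here to be cosine similarity)."""
    centers = [mean_vector(cluster) for cluster in second_clusters]
    similarities = {}
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            similarities[(i, j)] = cosine_similarity(centers[i], centers[j])
    return similarities
```

The result is keyed by cluster index pairs, so a later merging step can look up the second similarity of any two second clusters without recomputing the averages.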
S602: If the second similarity is greater than the second connected clustering threshold, merge the two second clusters into a connected cluster.
Here, the second connected clustering threshold is a value used to determine whether any two second clusters belong to the same user.
Specifically, the second feature average vector of each second cluster is calculated, and the second similarity of any two second clusters is then calculated from these averages. When the second similarity is greater than the second connected clustering threshold, the two second clusters are highly similar and belong to the same user; clusters whose second similarity exceeds the threshold are therefore gathered into the same connected cluster, which improves clustering precision and prevents the face feature vectors of one user from being split into different clusters. Understandably, when the second similarity is less than the second connected clustering threshold, the two second clusters are not similar enough and do not belong to the same user, so they are not clustered together.
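The merging rule of step S602 can be sketched as follows. The function name and the input format (a dictionary mapping index pairs to second similarities) are hypothetical. The sketch also assumes that merging is transitive, that is, if clusters A and B exceed the threshold and B and C exceed it, all three end up in one connected cluster; the text only states the pairwise rule, so this is one possible reading.

```python
def merge_by_second_similarity(num_clusters, second_similarity, second_threshold):
    """Group second-cluster indices into connected clusters.

    second_similarity maps (i, j) index pairs to a similarity value.
    Pairs whose similarity is greater than second_threshold are merged;
    a small union-find structure makes the merging transitive (assumption).
    """
    parent = list(range(num_clusters))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), similarity in second_similarity.items():
        if similarity > second_threshold:
            parent[find(j)] = find(i)

    groups = {}
    for index in range(num_clusters):
        groups.setdefault(find(index), []).append(index)
    return list(groups.values())
```

Each returned group lists the indices of the second clusters that form one connected cluster; pairs at or below the threshold are left unmerged, matching the last sentence of the paragraph above.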
In the face clustering method provided by this embodiment, the second feature average vector of each second cluster is calculated, the second similarity corresponding to any two second clusters is determined based on the second feature average vectors, and two second clusters whose second similarity is greater than the second connected clustering threshold are merged into a connected cluster. The face feature vectors belonging to the same user are thus clustered into one connected cluster, which improves the precision of face clustering.
In one embodiment, as shown in FIG. 7, step S204, performing intra-class pruning on the connected clusters to obtain target clusters, includes:
S701: Perform an average calculation on the connected clusters to obtain the third feature average vector corresponding to each connected cluster.
Here, the third feature average vector is the average of the face feature vectors in a connected cluster. It is calculated in the same way as the initial feature average vector in step S202, so the calculation is not repeated here.
Specifically, connected clusters are generated by clustering on the second feature average vectors of any two second clusters, but the second feature average vectors of the two merged clusters are not equal, so a connected cluster generated in this way may contain face feature vectors whose similarity to one another differs considerably. To ensure clustering precision, the mean function of Python's NumPy library is therefore used to average the face feature vectors of each connected cluster and quickly obtain the corresponding third feature average vector, so that face feature vectors with a large difference in similarity can subsequently be excluded according to the third feature average vector.
S702: Calculate the third similarity between any face feature vector in the connected cluster and the third feature average vector.
Here, the third similarity is a value expressing how similar a face feature vector in the connected cluster is to the third feature average vector.
Specifically, the cosine similarity formula is used to calculate the third similarity between any face feature vector in the connected cluster and the third feature average vector, so that face feature vectors that do not belong to the same user can subsequently be excluded according to the third feature average vector, improving clustering precision.
S703: Prune the face feature vectors in the connected cluster whose third similarity is less than the third connected clustering threshold to obtain the target cluster.
Here, the third connected clustering threshold is a preset value used to determine whether a connected cluster contains face feature vectors that do not belong to the user represented by that connected cluster.
Specifically, when the third similarity between a face feature vector in the connected cluster and the third feature average vector is less than the third connected clustering threshold, that face feature vector is deleted, so as to exclude from the connected cluster the face feature vectors that do not belong to the user it represents. The result is the target cluster: the connected cluster with the face feature vectors of other users removed, which keeps each class clean.
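Steps S701 to S703 can be sketched in a few lines. This is an illustrative sketch rather than the implementation of the application: the function names are hypothetical, and plain Python is used in place of `numpy.mean` so that the example has no dependencies. Cosine similarity follows the description of step S702.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_connected_cluster(connected_cluster, third_threshold):
    """Return the target cluster for one connected cluster.

    S701: average the cluster to get the third feature average vector.
    S702: compute each vector's third similarity to that average.
    S703: drop vectors whose third similarity is below the threshold.
    """
    center = mean_vector(connected_cluster)
    return [
        vector
        for vector in connected_cluster
        if cosine_similarity(vector, center) >= third_threshold
    ]
```

Note that the average is computed once, before pruning, as the steps describe; recomputing it after each removal would be a different procedure.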
In the face clustering method provided by this embodiment, the average of each connected cluster is calculated to obtain the corresponding third feature average vector, and the third similarity between any face feature vector in the connected cluster and the third feature average vector is calculated, so that face feature vectors that do not belong to the connected cluster can subsequently be excluded according to the third feature average vector, improving clustering precision. The face feature vectors in the connected cluster whose third similarity is less than the third connected clustering threshold are pruned to obtain the target cluster, which excludes the face feature vectors that do not belong to the person represented by the cluster and keeps each class clean.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a face clustering apparatus is provided, which corresponds one-to-one to the face clustering method in the above embodiments. As shown in FIG. 8, the face clustering apparatus includes a face feature vector acquisition module 801, an original cluster module 802, a connected cluster module 803, and a target cluster module 804. The functional modules are described in detail as follows:
The face feature vector acquisition module 801 is configured to acquire face images to be clustered, and to perform feature extraction on them with a feature extraction model to obtain face feature vectors.
The original cluster module 802 is configured to perform center split clustering on the face feature vectors to obtain original clusters.
The connected cluster module 803 is configured to perform inter-class connected clustering on the original clusters to obtain connected clusters.
The target cluster module 804 is configured to perform intra-class pruning on the connected clusters to obtain target clusters.
Preferably, after the original cluster module 802, the face clustering apparatus further includes a GPU processing module.
The GPU processing module is configured to, if the number of face feature vectors in an original cluster is greater than a first quantity threshold, use a program interface to distribute the face feature vectors in the original cluster to at least two GPUs for processing.
Preferably, the original cluster module 802 includes an initial feature average vector calculation unit, a first cluster acquisition unit, and a first judgment unit.
The initial feature average vector calculation unit is configured to generate an initial face matrix based on the face feature vectors and to calculate the initial feature average vector corresponding to the initial face matrix.
The first cluster acquisition unit is configured to cluster the initial face matrix based on the initial feature average vector to obtain first clusters.
The first judgment unit is configured to determine a first cluster as an original cluster if the number of face feature vectors in the first cluster is less than a second quantity threshold.
Preferably, the first cluster acquisition unit includes a sorting result acquisition subunit, a cluster center acquisition subunit, and a first cluster acquisition subunit.
The sorting result acquisition subunit is configured to calculate the initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and to sort the initial similarities to obtain a sorting result.
The cluster center acquisition subunit is configured to extract z initial similarities at intervals from the sorting result, and to extract from the initial face matrix the z cluster centers corresponding to the z initial similarities.
The first cluster acquisition subunit is configured to cluster all face feature vectors in the initial face matrix according to the z cluster centers to generate the first clusters.
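The three subunits above can be illustrated with a minimal sketch under stated assumptions. The function names are hypothetical; cosine similarity is assumed as the initial similarity; "at intervals" is read as taking every (n // z)-th entry of the sorted result; and each face feature vector is assigned to its most similar center. None of these choices is fixed by the description of the subunits.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_cluster_centers(face_matrix, z):
    """Sort rows by initial similarity to the average vector (descending)
    and take z rows at a fixed interval from that ordering (assumption)."""
    center = mean_vector(face_matrix)
    order = sorted(
        range(len(face_matrix)),
        key=lambda i: cosine_similarity(face_matrix[i], center),
        reverse=True,
    )
    step = max(len(order) // z, 1)
    return [face_matrix[i] for i in order[::step][:z]]


def assign_to_centers(face_matrix, centers):
    """Put each row into the cluster of its most similar center (assumption)."""
    clusters = [[] for _ in centers]
    for vector in face_matrix:
        best = max(
            range(len(centers)),
            key=lambda k: cosine_similarity(vector, centers[k]),
        )
        clusters[best].append(vector)
    return clusters
```

For example, `assign_to_centers(m, pick_cluster_centers(m, 2))` splits a face matrix `m` into two first clusters. Sampling centers across the whole sorted range, rather than taking the top z entries, spreads the centers over vectors that are both close to and far from the overall average.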
Preferably, the connected cluster module 803 includes a first similarity calculation unit, a connecting unit, and an inter-class clustering unit.
The first similarity calculation unit is configured to calculate the first feature average vector corresponding to each original cluster, and to apply a similarity algorithm to the first feature average vector and any face feature vector in the corresponding original cluster to determine the first similarity between them.
The connecting unit is configured to connect the face feature vectors in the original cluster whose first similarity is greater than the first connected clustering threshold into a second cluster.
The inter-class clustering unit is configured to perform inter-class clustering on all second clusters to obtain connected clusters.
Preferably, the inter-class clustering unit includes a second feature average vector calculation subunit and a second judgment unit.
The second feature average vector calculation subunit is configured to calculate the second feature average vector of each second cluster, and to determine, based on the second feature average vectors, the second similarity corresponding to any two second clusters.
The second judgment unit is configured to merge two second clusters into a connected cluster if their second similarity is greater than the second connected clustering threshold.
Preferably, the target cluster module 804 includes a third feature average vector calculation unit, a third similarity calculation unit, and a third judgment unit.
The third feature average vector calculation unit is configured to perform an average calculation on the connected clusters to obtain the third feature average vector corresponding to each connected cluster.
The third similarity calculation unit is configured to calculate the third similarity between any face feature vector in the connected cluster and the third feature average vector.
The third judgment unit is configured to prune the face feature vectors in the connected cluster whose third similarity is less than the third connected clustering threshold to obtain the target cluster.
For the specific limitations of the face clustering apparatus, refer to the limitations of the face clustering method above, which are not repeated here. Each module of the face clustering apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device is used to store the generated target clusters. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement a face clustering method. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, it implements the steps of the face clustering method in the above embodiments, for example steps S201 to S204 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 7, which are not repeated here. Alternatively, when the processor executes the computer-readable instructions, it implements the functions of the modules and units in the face clustering apparatus embodiment, for example the functions of the face feature vector acquisition module 801, the original cluster module 802, the connected cluster module 803, and the target cluster module 804 shown in FIG. 8, which are not repeated here.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by a processor, they implement the steps of the face clustering method in the above embodiments, for example steps S201 to S204 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 7, which are not repeated here. Alternatively, when the processor executes the computer-readable instructions, it implements the functions of the modules and units in the face clustering apparatus embodiment, for example the functions of the face feature vector acquisition module 801, the original cluster module 802, the connected cluster module 803, and the target cluster module 804 shown in FIG. 8, which are not repeated here. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features can be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and they shall all fall within the scope of protection of this application.

Claims (20)

  1. A face clustering method, comprising:
    acquiring face images to be clustered, and performing feature extraction on the face images with a feature extraction model to obtain face feature vectors;
    performing center split clustering on the face feature vectors to obtain original clusters;
    performing inter-class connected clustering on the original clusters to obtain connected clusters; and
    performing intra-class pruning on the connected clusters to obtain target clusters.
  2. The face clustering method according to claim 1, wherein, after performing center split clustering on the face feature vectors to obtain original clusters, the face clustering method further comprises:
    if the number of the face feature vectors in an original cluster is greater than a first quantity threshold, using a program interface to distribute the face feature vectors in the original cluster to at least two GPUs for processing.
  3. The face clustering method according to claim 1, wherein performing center split clustering on the face feature vectors to obtain original clusters comprises:
    generating an initial face matrix based on the face feature vectors, and calculating an initial feature average vector corresponding to the initial face matrix;
    clustering the initial face matrix based on the initial feature average vector to obtain first clusters; and
    if the number of face feature vectors in a first cluster is less than a second quantity threshold, determining the first cluster as an original cluster.
  4. The face clustering method according to claim 3, wherein clustering the initial face matrix based on the initial feature average vector to obtain first clusters comprises:
    calculating an initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and sorting the initial similarities to obtain a sorting result;
    extracting z initial similarities at intervals from the sorting result, and extracting from the initial face matrix z cluster centers corresponding to the z initial similarities; and
    clustering all face feature vectors in the initial face matrix according to the z cluster centers to generate the first clusters.
  5. The face clustering method according to claim 1, wherein performing inter-class connected clustering on the original clusters to obtain connected clusters comprises:
    calculating a first feature average vector corresponding to each original cluster, and applying a similarity algorithm to the first feature average vector and any face feature vector in the original cluster to determine a first similarity between the first feature average vector and that face feature vector;
    connecting the face feature vectors in the original cluster whose first similarity is greater than a first connected clustering threshold into a second cluster; and
    performing inter-class clustering on all the second clusters to obtain connected clusters.
  6. The face clustering method according to claim 5, wherein performing inter-class clustering on all the second clusters to obtain connected clusters comprises:
    calculating a second feature average vector of each second cluster, and determining, based on the second feature average vectors, a second similarity corresponding to any two second clusters; and
    if the second similarity is greater than a second connected clustering threshold, merging the two second clusters into a connected cluster.
  7. The face clustering method according to claim 1, wherein performing intra-class pruning on the connected clusters to obtain target clusters comprises:
    performing an average calculation on the connected clusters to obtain a third feature average vector corresponding to each connected cluster;
    calculating a third similarity between any face feature vector in the connected cluster and the third feature average vector; and
    pruning the face feature vectors in the connected cluster whose third similarity is less than a third connected clustering threshold to obtain the target cluster.
  8. A face clustering apparatus, comprising:
    a face feature vector acquisition module, configured to acquire face images to be clustered, and to perform feature extraction on the face images with a feature extraction model to obtain face feature vectors;
    an original cluster module, configured to perform center split clustering on the face feature vectors to obtain original clusters;
    a connected cluster module, configured to perform inter-class connected clustering on the original clusters to obtain connected clusters; and
    a target cluster module, configured to perform intra-class pruning on the connected clusters to obtain target clusters.
  9. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when the processor executes the computer-readable instructions:
    获取聚类人脸图像,采用特征提取模型对所述聚类人脸图像进行特征提取,获取人脸特征向量;Acquiring a clustered face image, and using a feature extraction model to perform feature extraction on the clustered face image to acquire a face feature vector;
    对所述人脸特征向量进行中心分裂聚类,获取原始聚类类簇;Performing center split clustering on the face feature vector to obtain original clustering clusters;
    对所述原始聚类类簇进行类间连通聚类,获取连通聚类类簇;Perform inter-class connected clustering on the original clusters to obtain connected clusters;
    对所述连通聚类类簇进行类内修剪,获取目标聚类类簇。Perform intra-class pruning on the connected clusters to obtain target clusters.
  10. The computer device according to claim 9, wherein, after the performing center-split clustering on the face feature vectors to obtain original clusters, the processor, when executing the computer-readable instructions, further implements the following step:
    if the number of face feature vectors in an original cluster is greater than a first number threshold, using a program interface to distribute the face feature vectors in the original cluster to at least two GPUs for processing.
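The size check in this step can be sketched as follows. This is only an illustration of the partitioning decision: the claim does not name the program interface or say how vectors are divided among GPUs, so the function `shard_for_devices`, the even split, and the parameter names are assumptions, and no actual device placement is performed.

```python
import numpy as np

def shard_for_devices(vectors, first_count_threshold, num_devices=2):
    """Split a cluster's feature vectors into one shard per device.

    Hypothetical helper: a cluster at or below the threshold stays in one
    piece; a larger one is divided into `num_devices` near-equal shards.
    """
    if num_devices < 2:
        raise ValueError("the claim calls for at least two GPUs")
    if len(vectors) <= first_count_threshold:
        return [vectors]
    # np.array_split tolerates row counts not divisible by num_devices.
    return np.array_split(vectors, num_devices)

# Toy cluster: 10 feature vectors of dimension 4.
cluster = np.arange(40, dtype=np.float32).reshape(10, 4)
shards = shard_for_devices(cluster, first_count_threshold=6, num_devices=2)
unsplit = shard_for_devices(cluster, first_count_threshold=20, num_devices=2)
```

Each shard would then be handed to whatever GPU framework is in use; that hand-off is outside this sketch.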
  11. The computer device according to claim 9, wherein the performing center-split clustering on the face feature vectors to obtain original clusters comprises:
    generating an initial face matrix based on the face feature vectors, and calculating an initial feature average vector corresponding to the initial face matrix;
    clustering the initial face matrix based on the initial feature average vector to obtain first clusters;
    if the number of face feature vectors in a first cluster is less than a second number threshold, determining the first cluster to be an original cluster.
  12. The computer device according to claim 11, wherein the clustering the initial face matrix based on the initial feature average vector to obtain first clusters comprises:
    calculating an initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and sorting the initial similarities to obtain a sorting result;
    extracting z initial similarities at intervals from the sorting result, and extracting from the initial face matrix the z cluster centers corresponding to the z initial similarities;
    clustering all face feature vectors in the initial face matrix according to the z cluster centers to generate the first clusters.
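The center selection and assignment described in these steps can be sketched as below. The claims do not fix the similarity measure, the sampling interval, or the assignment rule, so cosine similarity, evenly spaced sampling, and nearest-center assignment are assumptions, as are the function names.

```python
import numpy as np

def unit_rows(matrix):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def pick_center_indices(face_matrix, z):
    """Return row indices of z centers sampled at intervals from the ranking."""
    mean_vec = face_matrix.mean(axis=0)            # initial feature average vector
    sims = unit_rows(face_matrix) @ (mean_vec / np.linalg.norm(mean_vec))
    order = np.argsort(sims, kind="stable")        # sorting result (ascending)
    # Assumed interval rule: z evenly spaced positions across the ranking.
    positions = np.linspace(0, len(order) - 1, num=z).astype(int)
    return order[positions]

def assign_to_centers(face_matrix, center_indices):
    """Label every vector with the index of its most similar center."""
    rows = unit_rows(face_matrix)
    return (rows @ rows[center_indices].T).argmax(axis=1)

# Toy data: five 2-D "feature vectors".
faces = np.array([[1.0, 0.0], [0.95, 0.05], [0.2, 0.8], [0.1, 0.9], [0.6, 0.4]])
center_idx = pick_center_indices(faces, z=2)
labels = assign_to_centers(faces, center_idx)
first_clusters = [faces[labels == k] for k in range(len(center_idx))]
```

Under claim 11, a first cluster with fewer vectors than the second number threshold would then be accepted as an original cluster; how larger clusters are handled is not stated in these claims.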
  13. The computer device according to claim 10, wherein the performing inter-class connected clustering on the original clusters to obtain connected clusters comprises:
    calculating a first feature average vector corresponding to each original cluster, and applying a similarity algorithm to the first feature average vector and any one of the face feature vectors in the original cluster to determine a first similarity between the first feature average vector and that face feature vector;
    connecting the face feature vectors in the original cluster whose first similarity is greater than a first connected-clustering threshold into a second cluster;
    performing inter-class clustering on all the second clusters to obtain connected clusters.
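The first two steps of this claim can be sketched as a filter against the cluster mean. Cosine similarity is assumed as the similarity algorithm, and the function name and threshold value are illustrative. The claim does not say what happens to vectors at or below the threshold, so the sketch only reports which rows were kept.

```python
import numpy as np

def connect_within_cluster(cluster, first_threshold):
    """Keep the vectors whose similarity to the cluster mean exceeds the threshold.

    Returns the kept rows (the "second cluster") and the boolean keep-mask.
    """
    mean_vec = cluster.mean(axis=0)                 # first feature average vector
    rows = cluster / np.linalg.norm(cluster, axis=1, keepdims=True)
    first_sims = rows @ (mean_vec / np.linalg.norm(mean_vec))
    keep = first_sims > first_threshold
    return cluster[keep], keep

# Toy original cluster with one vector pointing away from the rest.
original = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [-1.0, 0.1]])
second_cluster, keep_mask = connect_within_cluster(original, first_threshold=0.5)
```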
  14. The computer device according to claim 13, wherein the performing inter-class clustering on all the second clusters to obtain connected clusters comprises:
    calculating a second feature average vector of each second cluster, and determining, based on the second feature average vectors, a second similarity for any two second clusters;
    if the second similarity is greater than a second connected-clustering threshold, merging the two second clusters into a connected cluster.
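The pairwise merge can be sketched as below. The claim only states that two second clusters are merged when their second similarity exceeds the threshold. Treating merges as transitive (via a small union-find) and using cosine similarity between cluster means are assumptions of this sketch, as is the function name.

```python
import numpy as np

def merge_similar_clusters(clusters, second_threshold):
    """Merge second clusters whose mean vectors are similar enough."""
    means = np.stack([c.mean(axis=0) for c in clusters])   # second feature average vectors
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    sims = means @ means.T                                  # pairwise second similarities

    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if sims[i, j] > second_threshold:
                parent[find(j)] = find(i)

    groups = {}
    for idx, cluster in enumerate(clusters):
        groups.setdefault(find(idx), []).append(cluster)
    return [np.vstack(members) for members in groups.values()]

# Toy second clusters: a and b point the same way, c is nearly orthogonal.
a = np.array([[1.0, 0.0], [0.9, 0.1]])
b = np.array([[0.95, 0.05]])
c = np.array([[0.0, 1.0]])
merged = merge_similar_clusters([a, b, c], second_threshold=0.9)
```

Comparing all pairs costs time quadratic in the number of second clusters, which is acceptable for a sketch but may need an approximate neighbor search at scale.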
  15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring face images to be clustered, and performing feature extraction on the face images using a feature extraction model to obtain face feature vectors;
    performing center-split clustering on the face feature vectors to obtain original clusters;
    performing inter-class connected clustering on the original clusters to obtain connected clusters;
    performing intra-class pruning on the connected clusters to obtain target clusters.
  16. The readable storage medium according to claim 15, wherein, after the performing center-split clustering on the face feature vectors to obtain original clusters, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following step:
    if the number of face feature vectors in an original cluster is greater than a first number threshold, using a program interface to distribute the face feature vectors in the original cluster to at least two GPUs for processing.
  17. The readable storage medium according to claim 15, wherein the performing center-split clustering on the face feature vectors to obtain original clusters comprises:
    generating an initial face matrix based on the face feature vectors, and calculating an initial feature average vector corresponding to the initial face matrix;
    clustering the initial face matrix based on the initial feature average vector to obtain first clusters;
    if the number of face feature vectors in a first cluster is less than a second number threshold, determining the first cluster to be an original cluster.
  18. The readable storage medium according to claim 17, wherein the clustering the initial face matrix based on the initial feature average vector to obtain first clusters comprises:
    calculating an initial similarity between each face feature vector in the initial face matrix and the initial feature average vector, and sorting the initial similarities to obtain a sorting result;
    extracting z initial similarities at intervals from the sorting result, and extracting from the initial face matrix the z cluster centers corresponding to the z initial similarities;
    clustering all face feature vectors in the initial face matrix according to the z cluster centers to generate the first clusters.
  19. The readable storage medium according to claim 15, wherein the performing inter-class connected clustering on the original clusters to obtain connected clusters comprises:
    calculating a first feature average vector corresponding to each original cluster, and applying a similarity algorithm to the first feature average vector and any one of the face feature vectors in the original cluster to determine a first similarity between the first feature average vector and that face feature vector;
    connecting the face feature vectors in the original cluster whose first similarity is greater than a first connected-clustering threshold into a second cluster;
    performing inter-class clustering on all the second clusters to obtain connected clusters.
  20. The readable storage medium according to claim 19, wherein the performing inter-class clustering on all the second clusters to obtain connected clusters comprises:
    calculating a second feature average vector of each second cluster, and determining, based on the second feature average vectors, a second similarity for any two second clusters;
    if the second similarity is greater than a second connected-clustering threshold, merging the two second clusters into a connected cluster.
PCT/CN2020/093348 2019-10-29 2020-05-29 Human face clustering method and apparatus, computer device, and storage medium WO2021082426A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911037526.6 2019-10-29
CN201911037526.6A CN110889433B (en) 2019-10-29 Face clustering method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021082426A1 true WO2021082426A1 (en) 2021-05-06

Family

ID=69746536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093348 WO2021082426A1 (en) 2019-10-29 2020-05-29 Human face clustering method and apparatus, computer device, and storage medium

Country Status (1)

Country Link
WO (1) WO2021082426A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114373212A (en) * 2022-01-10 2022-04-19 中国民航信息网络股份有限公司 Face recognition model construction method, face recognition method and related equipment
CN114612967A (en) * 2022-03-03 2022-06-10 北京百度网讯科技有限公司 Face clustering method, device, equipment and storage medium
CN114707617A (en) * 2022-05-31 2022-07-05 每日互动股份有限公司 Data processing system for acquiring pkg cluster
CN114926879A (en) * 2022-05-19 2022-08-19 西北工业大学 Face image clustering method based on distance map optimization
CN115880745A (en) * 2022-09-07 2023-03-31 以萨技术股份有限公司 Data processing system for acquiring human face image characteristics
CN116403080A (en) * 2023-06-09 2023-07-07 江西云眼视界科技股份有限公司 Face clustering evaluation method, system, computer and readable storage medium
CN117576493A (en) * 2024-01-16 2024-02-20 武汉明炀大数据科技有限公司 Cloud storage compression method and system for large sample data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160307068A1 (en) * 2015-04-15 2016-10-20 Stmicroelectronics S.R.L. Method of clustering digital images, corresponding system, apparatus and computer program product
CN109766754A (en) * 2018-12-04 2019-05-17 平安科技(深圳)有限公司 Human face five-sense-organ clustering method, device, computer equipment and storage medium
CN109815788A (en) * 2018-12-11 2019-05-28 平安科技(深圳)有限公司 A kind of picture clustering method, device, storage medium and terminal device
CN110889433A (en) * 2019-10-29 2020-03-17 平安科技(深圳)有限公司 Face clustering method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI MINGDA: "Research on Cluster Algorithm of Similarity Segmentation Based on Point Sorting", CHINESE MASTER’S THESES FULL-TEXT DATABASE, no. 8, 1 April 2015 (2015-04-01), pages 1 - 54, XP055810076, ISSN: 1674-0246 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114373212A (en) * 2022-01-10 2022-04-19 中国民航信息网络股份有限公司 Face recognition model construction method, face recognition method and related equipment
CN114612967A (en) * 2022-03-03 2022-06-10 北京百度网讯科技有限公司 Face clustering method, device, equipment and storage medium
CN114926879A (en) * 2022-05-19 2022-08-19 西北工业大学 Face image clustering method based on distance map optimization
CN114926879B (en) * 2022-05-19 2024-03-05 西北工业大学 Face image clustering method based on distance map optimization
CN114707617A (en) * 2022-05-31 2022-07-05 每日互动股份有限公司 Data processing system for acquiring pkg cluster
CN114707617B (en) * 2022-05-31 2022-08-26 每日互动股份有限公司 Data processing system for acquiring pkg cluster
CN115880745A (en) * 2022-09-07 2023-03-31 以萨技术股份有限公司 Data processing system for acquiring human face image characteristics
CN116403080A (en) * 2023-06-09 2023-07-07 江西云眼视界科技股份有限公司 Face clustering evaluation method, system, computer and readable storage medium
CN116403080B (en) * 2023-06-09 2023-08-11 江西云眼视界科技股份有限公司 Face clustering evaluation method, system, computer and readable storage medium
CN117576493A (en) * 2024-01-16 2024-02-20 武汉明炀大数据科技有限公司 Cloud storage compression method and system for large sample data
CN117576493B (en) * 2024-01-16 2024-04-02 武汉明炀大数据科技有限公司 Cloud storage compression method and system for large sample data

Also Published As

Publication number Publication date
CN110889433A (en) 2020-03-17

Similar Documents

Publication Publication Date Title
WO2021082426A1 (en) Human face clustering method and apparatus, computer device, and storage medium
US11348249B2 (en) Training method for image semantic segmentation model and server
US11537884B2 (en) Machine learning model training method and device, and expression image classification method and device
Luo et al. Real-world image datasets for federated learning
CN109241903B (en) Sample data cleaning method, device, computer equipment and storage medium
WO2020015075A1 (en) Facial image comparison method and apparatus, computer device, and storage medium
WO2020015076A1 (en) Facial image comparison method and apparatus, computer device, and storage medium
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN107798354B (en) Image clustering method and device based on face image and storage equipment
CN111340237A (en) Data processing and model operation method, device and computer equipment
US11126827B2 (en) Method and system for image identification
CN109271917B (en) Face recognition method and device, computer equipment and readable storage medium
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN108509994B (en) Method and device for clustering character images
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN110378249B (en) Text image inclination angle recognition method, device and equipment
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
US20210089825A1 (en) Systems and methods for cleaning data
CN112001932A (en) Face recognition method and device, computer equipment and storage medium
CN112434556A (en) Pet nose print recognition method and device, computer equipment and storage medium
CN113505797B (en) Model training method and device, computer equipment and storage medium
WO2021068524A1 (en) Image matching method and apparatus, computer device, and storage medium
WO2023173646A1 (en) Expression recognition method and apparatus
CN111832581A (en) Lung feature recognition method and device, computer equipment and storage medium
CN112528022A (en) Method for extracting characteristic words corresponding to theme categories and identifying text theme categories

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 20883416; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 EP: PCT application non-entry in European phase Ref document number: 20883416; Country of ref document: EP; Kind code of ref document: A1