CN116704577A - Face recognition and clustering method and system - Google Patents

Face recognition and clustering method and system

Info

Publication number
CN116704577A
CN116704577A
Authority
CN
China
Prior art keywords
face
clustering
model
unknown
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310676420.0A
Other languages
Chinese (zh)
Inventor
吴叔義
朱超
许哲浩
刘孟寅
郭秀峰
罗威
侯丽
罗准辰
鲁珂琦
马雨琪
韩梓航
毛宇成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Military Science Information Research Center Of Military Academy Of Chinese Pla
University of Science and Technology Beijing USTB
Original Assignee
Military Science Information Research Center Of Military Academy Of Chinese Pla
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Military Science Information Research Center Of Military Academy Of Chinese Pla, University of Science and Technology Beijing USTB filed Critical Military Science Information Research Center Of Military Academy Of Chinese Pla
Priority to CN202310676420.0A priority Critical patent/CN116704577A/en
Publication of CN116704577A publication Critical patent/CN116704577A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/7635 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks based on graphs, e.g. graph cuts or spectral clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/94 - Hardware or software architectures specially adapted for image or video understanding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a face recognition and clustering method and system. The method comprises the following steps: using a face detection model, a picture to be detected is input and a rectangular detection frame marking each face position in the picture is output; the detection frame is cropped, enlarged and corrected and then fed into a face recognition model, which outputs the feature vector of the face image; the similarity between this feature vector and the feature vectors of all faces with known identities is calculated, and if every similarity is smaller than a set threshold, the face is treated as an unknown face; the unknown faces are then fed into a face clustering model, which aggregates the unknown faces that potentially share the same identity. The face clustering model is an incremental clustering model based on graph connection. The application has the following advantages: the incremental clustering method based on graph connection reduces the computational cost of updating the clustering result, and multi-threaded acceleration together with batch clustering strikes a balance between accuracy and speed.

Description

Face recognition and clustering method and system
Technical Field
The application belongs to the technical fields of artificial intelligence, computer vision, face detection, face recognition, face clustering and incremental clustering, and particularly relates to a face recognition and clustering method and system.
Background
The prior art closest to the present application covers face detection, face recognition, face clustering and related techniques. Face detection and recognition are mainly based on deep neural networks. A face detection model is similar to a general object detection model: it is trained on a large-scale face detection data set, receives a picture as input and outputs a rectangular detection frame for each face position in the picture. Representative models such as SCRFD rely on training techniques such as model architecture search and small-face data augmentation to improve model efficiency and robustness.
Taking the cropped, enlarged and corrected face image as input, the face recognition model outputs the feature vector of the face image. The similarity between this feature vector and the feature vectors of all faces with known identities is calculated; if the largest similarity exceeds a certain threshold, the face is considered to belong to the identity with the largest similarity, otherwise it is an unknown face. CNN models such as ResNet-100 are typically combined with training methods such as Partial FC, which use model-parallel training and partial-class sampling to accelerate training on large-scale face recognition data.
In open-source face analysis code libraries represented by InsightFace, face detection and recognition are built as a two-stage cascade: in the first stage, the face detection result is used to crop, enlarge and correct the face image; in the second stage, the face image is fed into the face recognition model to obtain its feature vector. However, such open-source libraries do not further integrate other modules or methods such as face clustering.
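As a minimal sketch of such a two-stage cascade, the snippet below assumes the open-source InsightFace Python package and its FaceAnalysis interface; the model pack name, detection size and input file name are assumptions made for the example, not prescribed by the patent.

```python
# Hedged sketch of a detection + recognition cascade, assuming the InsightFace
# Python package; "buffalo_l" and the detection size are illustrative choices.
import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")          # bundled detector + recognizer
app.prepare(ctx_id=0, det_size=(640, 640))    # ctx_id=0 selects the first GPU

img = cv2.imread("frame.jpg")                 # hypothetical input picture
faces = app.get(img)                          # detection, alignment and embedding in one call
for face in faces:
    box = face.bbox.astype(int)               # rectangular detection frame
    feat = face.normed_embedding              # L2-normalised feature vector
    print(box, feat.shape)
```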
Given the feature vectors that have already been determined to be unknown faces, face clustering methods group vectors that may belong to the same person into the same cluster, typically with rule-based clustering. Known-face recognition computes the similarity between the face to be recognized and all faces with known identities in order to determine the identity, so there is little room for optimization. Unknown-face clustering, by contrast, only needs to decide which cluster each face to be clustered belongs to; it does not have to compute the similarity with every face, which leaves some room for optimization.
Offline clustering methods compute pairwise similarities over all unknown face feature vectors; whenever a new face appears, the previous clustering result is discarded and clustering is performed from scratch. K-Means is the main representative, and once unknown faces accumulate to a certain scale the computational cost becomes excessive. Online clustering methods, in contrast, keep the previous clustering result and only compute the similarity between the new face and the members of each cluster to determine which cluster it belongs to, so existing cluster members need not be recomputed.
However, this approach fails in a practical situation: when an unknown person has only a few faces early on and these are absorbed into another cluster, the later growth of that person's face data cannot split them back out into an independent cluster. Moreover, the new face must still be compared with all members of each cluster (or with K randomly selected members), so the computational overhead remains large when a cluster grows big.
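A minimal sketch of such a conventional online assignment step is given below; the K=5 sample size, the cosine similarity and the 0.5 threshold are assumptions for the example, and the code reflects the baseline behaviour described above rather than the method of the application.

```python
import random
import numpy as np

def online_assign(feat, clusters, threshold=0.5, k=5):
    """Assign a new unknown face to an existing cluster or start a new one.

    clusters -- list of lists of member feature vectors (one list per cluster)
    """
    best_idx, best_sim = None, -1.0
    for i, members in enumerate(clusters):
        sample = members if len(members) <= k else random.sample(members, k)
        # compare with (up to) k members of every cluster; cost grows with cluster sizes
        sim = max(float(feat @ m) / (np.linalg.norm(feat) * np.linalg.norm(m) + 1e-12)
                  for m in sample)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_sim >= threshold:
        clusters[best_idx].append(feat)      # once assigned, the face never leaves this cluster
    else:
        clusters.append([feat])              # otherwise start a new cluster
    return clusters
```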
An online clustering method based on graph connection introduces the concepts of large clusters and small clusters: with small clusters as graph nodes, the large cluster a face belongs to is updated through the connections between small clusters, and only the feature vector of each small-cluster center is updated, which reduces comparisons with individual cluster members. However, this clustering method has not been applied directly to the face clustering task, has not been integrated with face detection and recognition models, and does not form a unified face recognition and clustering system capable of processing video data.
Open-source network videos from different scenes contain a large number of faces, both with known identities and with unknown identities, which raises the following problems:
(1) Face angles and sizes vary widely, so it is difficult to recognize all of them with a single uniform approach;
(2) As data accumulate, the number of unknown faces keeps growing. Existing open-source libraries do not incorporate a clustering method that groups unknown face feature vectors possibly belonging to the same identity, confirms the identity of each cluster and expands the known-face database; as new data keep arriving, the computational cost of updating the clustering result each time becomes very large;
(3) There is a lack of effective optimization for video processing: because of compression formats such as MP4, each frame is reconstructed by accumulating differential frames onto a key frame, and frames must be extracted at fixed time intervals in order to detect as many faces as possible; the prior art does not exploit hardware characteristics at the hardware level, so computational efficiency is low.
Disclosure of Invention
The application aims to overcome the defects of the prior art that the computational cost of face clustering is very high and that no optimization is performed for video processing.
In order to achieve the above object, the present application provides a face recognition and clustering method, which includes:
using a face detection model, inputting a picture to be detected, and outputting a rectangular detection frame of the face position in the picture to be detected;
cropping the rectangular detection frame, enlarging and correcting it, and feeding it into a face recognition model, which outputs the feature vector of the face image; calculating the similarity between this feature vector and the feature vectors of all faces with known identities, and, if every similarity is smaller than a set threshold, treating the face as an unknown face;
using a face clustering model, inputting the unknown faces and aggregating the unknown faces that potentially share the same identity;
the face detection model and the face recognition model are trained neural network models;
the face clustering model is an incremental clustering model based on graph connection.
As an improvement of the method, the incremental clustering model based on graph connection is realized by the following steps:
step 1: inputting an unknown face;
step 2: if the number of the current small clusters is less than 1, creating a new small cluster, and taking the feature vector of the unknown face as the center of the small cluster;
if the number of the current small clusters is greater than or equal to 1, calculating the similarity between the unknown face and the centers of all the small clusters, if the similarity is lower than a set threshold value, creating a new small cluster, and taking the feature vector of the unknown face as the center of the small cluster;
if the similarity between the unknown face and the centers of 1 small cluster is higher than a set threshold value and the similarity is greater than the similarity between the unknown face and the centers of other clusters, adding the unknown face into the small cluster;
step 3: the average value of the feature vectors of all faces of the updated small cluster is recalculated and used as the center of the updated cluster;
step 4: calculating the similarity between every two small clusters, increasing the connection between the small clusters when the similarity is larger than a set threshold value, and disconnecting the connection between the small clusters when the similarity is smaller than the set threshold value;
step 5: constructing a graph by taking small clusters as nodes and the connection among the small clusters as edges; calculating the connectivity between nodes, if any two nodes of the current graph can be reached through the connection between the nodes in the current graph, only one connected graph of the current graph is the current graph, otherwise, the graph is divided into a plurality of connected graphs, so that any two nodes between any two connected graphs can not be reached through the connection between any nodes; the small clusters in each connected graph form a large cluster, and belong to the same face identity.
As an improvement of the above method, the connected graphs are computed by depth-first search or breadth-first search.
As an improvement of the method, the picture to be detected is a picture extracted from video.
As an improvement of the above method, the specific implementation process of the face recognition and clustering method is as follows:
establishing a frame extraction thread, a target detection thread and a data queue;
extracting pictures from the video by using a CPU in the frame extraction thread and putting the pictures into the data queue;
and the face detection model, the face recognition model and the face clustering model acquire pictures from the data queue in the target detection thread by using a picture processing chip, and perform detection, recognition and clustering.
As an improvement of the above method, there may be one or more frame extraction threads and one or more object detection threads.
As an improvement of the method, the picture processing chip is a GPU, NPU or TPU chip.
As an improvement of the method, a plurality of unknown faces are grouped into a batch, the feature vectors of the unknown face images are computed in batches and cached, and when the cache area is full or a set time is reached, face clustering is performed on all the cached feature vectors.
The application also provides a face recognition and clustering system, which is realized based on the method, and comprises the following steps:
the face detection module is used for inputting a picture to be detected into a face detection model and outputting a rectangular detection frame of each face position in the picture to be detected; the face detection model is a trained neural network model;
the face recognition module is used for cropping the rectangular detection frame, enlarging and correcting it, and feeding it into a face recognition model, which outputs the feature vector of the face image; the similarity between this feature vector and the feature vectors of all faces with known identities is calculated, and if every similarity is smaller than a set threshold, the face is treated as an unknown face; the face recognition model is a trained neural network model; and
the face clustering module is used for feeding the unknown faces into a face clustering model and aggregating the unknown faces that potentially share the same identity; the face clustering model is an incremental clustering model based on graph connection.
As an improvement of the above system, the system further comprises:
and the picture acquisition module is used for extracting the picture to be detected from the video.
Compared with the prior art, the application has the advantages that:
1. An incremental clustering method based on graph connection is proposed: a small-cluster center update mechanism reduces the computational cost of updating the clustering result, and the graph connections between small clusters are added or removed as they are updated, so that the large cluster each small cluster belongs to is updated dynamically and the amount of clustering computation is reduced;
2. Based on the characteristics of video compression formats and of CPU and GPU hardware, multi-thread and batch optimizations for video are proposed, which accelerate prediction;
3. The face detection model, the face recognition model and the face clustering method are integrated into a unified multi-task model; the incremental clustering method based on graph connection reduces the computational cost of updating the clustering result, and, tailored to the characteristics of video data, multi-threaded acceleration and batch clustering strike a balance between accuracy and speed.
Drawings
FIG. 1 is a diagram of an overall system architecture of a face recognition and clustering method;
FIG. 2 is a schematic diagram of a multitasking model of a face recognition and clustering method;
FIG. 3 is a flow chart of incremental clustering based on graph connections;
FIG. 4 is a schematic diagram of multi-threaded optimization for video;
FIG. 5 is a schematic diagram of batch optimization for video.
Detailed Description
The technical scheme of the application is described in detail below with reference to the accompanying drawings.
The face recognition and clustering method and system provided by the application integrate a face detection model, a face recognition model and a face clustering method into a unified multi-task model, and use an incremental clustering method based on graph connection to reduce the computational cost of updating the clustering result; tailored to the characteristics of video data, multi-threaded acceleration and batch clustering strike a balance between accuracy and speed.
1. The general system architecture of the technical proposal of the application
As shown in fig. 1, the overall system architecture of the present application includes three modules: (1) the multi-task model integrates a face detection model, a face recognition model and a face clustering method and is used for two tasks of known face recognition and unknown face clustering; (2) the incremental clustering method based on graph connection is used for reducing the cost of updating the clustering result and ensuring that the clustering clusters are continuously updated; (3) optimization for video, including multithreading and batch optimization.
2. Multitasking model
In addition to the two main modules, a face detection model and a face recognition model, the multi-task model integrates a face clustering method, which aggregates those unknown faces determined by the face recognition model that potentially share the same identity.
The face detection model is similar to a general object detection model: it is trained on a large-scale face detection data set, receives a picture as input and outputs a rectangular detection frame for each face position in the picture.
Taking the cropped, enlarged and corrected face image as input, the face recognition model outputs the feature vector of the face image. The similarity between this feature vector and the feature vectors of all faces with known identities is calculated; if the largest similarity exceeds a certain threshold, the face is considered to belong to the identity with the largest similarity, otherwise it is an unknown face.
As shown in FIG. 2, the three tasks of detection, recognition and clustering are connected in sequence, with the split between known and unknown faces in between. An unknown face is a face whose feature vector has a similarity below the threshold with every face feature vector in the known-face database. Meanwhile, each of the three tasks can output its intermediate result independently for the user.
The face detection model and the face recognition model may be neural network models such as ViT or CNN.
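For illustration, the known/unknown decision described above can be sketched as follows; the cosine similarity, the 0.5 threshold and the in-memory gallery arrays are assumptions made for the example.

```python
import numpy as np

def identify(feat, gallery_feats, gallery_ids, threshold=0.5):
    """Return the known identity with the highest similarity, or None for an unknown face.

    feat           -- L2-normalised feature vector of the query face, shape (d,)
    gallery_feats  -- L2-normalised feature vectors of known faces, shape (n, d)
    gallery_ids    -- identity label for each gallery vector, length n
    """
    sims = gallery_feats @ feat                  # cosine similarity for normalised vectors
    best = int(np.argmax(sims))
    if sims[best] < threshold:                   # every similarity below the threshold
        return None                              # -> unknown face, handed over to clustering
    return gallery_ids[best]
```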
3. Incremental clustering based on graph connection
In a practical application scenario, new video data keep arriving, so the number of unknown faces obtained after face detection and recognition also keeps growing. As the set of unknown faces expands, the computational cost of updating the clustering result each time has to be weighed.
Directly using an offline clustering method requires recomputing the whole clustering result, and when a single cluster grows large the similarity has to be computed with all of its members, which is very expensive. Moreover, once a face has been assigned to a cluster it cannot change clusters: if an unknown person has only a few faces early on and these are absorbed into another cluster, the later growth of that person's face data cannot split them back out into an independent cluster.
Therefore, the application proposes an incremental clustering method based on graph connection, which extends the notion of a cluster into small clusters and large clusters. The cluster formed by all faces belonging to the same identity is called a large cluster; a cluster formed by only part of the faces of that identity, separated by differences in pose, appearance and so on, is called a small cluster. One large cluster may contain one or more small clusters.
As shown in fig. 3, the clustering method comprises the following steps:
step 1: a new unknown face is input;
step 2: if no small cluster currently exists, a new small cluster is created directly and the feature vector of the face is taken as its center;
if at least one small cluster exists, whether a new small cluster is needed is decided by computing the similarity between the feature vector of the current face and the center of every small cluster, so the number of similarity computations equals the number of small clusters rather than the number of all members of all clusters;
if the similarity between the unknown face and the center of every small cluster is below the threshold, a new small cluster is created directly and the feature vector of the face is taken as its center;
if the similarity between the unknown face and the center of one small cluster is above the threshold and is greater than its similarity to the centers of all other small clusters, the face is added to that small cluster;
step 3: the mean of the feature vectors of all faces in the updated small cluster is recalculated and taken as the new cluster center;
step 4: the similarity between every pair of small clusters is calculated; a connection between two small clusters is added when their similarity exceeds a set threshold and removed when it falls below that threshold;
step 5: a graph is constructed with small clusters as nodes and the connections between small clusters as edges, and the connectivity between nodes (whether they are connected) is calculated: if any two nodes of the current graph can reach each other through connections within the graph, the graph has only one connected graph, namely itself; otherwise the graph splits into several connected graphs such that no node of one connected graph can be reached from any node of another. The small clusters in each connected graph form one large cluster and belong to the same face identity.
Several classical algorithms can be used to find the connected graphs of the current graph, including depth-first search and breadth-first search.
Cosine similarity, Euclidean distance or Minkowski distance can be used both for the similarity between a feature vector and each small-cluster center and for the similarity between pairs of small-cluster centers.
It can be seen that, after the concepts of large and small clusters are introduced, incremental clustering based on graph connection allows large clusters to be updated in units of small clusters; the amount of computation is clearly smaller than in the common offline or online clustering methods above, achieving an efficient balance between accuracy and speed. A code sketch of the scheme follows below.
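A minimal sketch of this incremental clustering scheme is given below; the two thresholds, the cosine similarity and the brute-force breadth-first search over connections are illustrative choices under stated assumptions, not a definitive implementation.

```python
import numpy as np
from collections import deque

ASSIGN_THR = 0.5   # illustrative threshold for joining a small cluster
EDGE_THR = 0.6     # illustrative threshold for connecting two small clusters

centers, members = [], []          # per small cluster: center vector, list of face vectors

def cos(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def add_face(feat):
    """Steps 1-3: assign an unknown face to a small cluster or create a new one."""
    if centers:
        sims = [cos(feat, c) for c in centers]
        best = int(np.argmax(sims))
        if sims[best] >= ASSIGN_THR:                       # join the most similar small cluster
            members[best].append(feat)
            centers[best] = np.mean(members[best], axis=0) # recompute the small-cluster center
            return
    members.append([feat])                                 # otherwise create a new small cluster
    centers.append(feat.copy())

def large_clusters():
    """Steps 4-5: connect similar small clusters and return connected components (large clusters)."""
    n = len(centers)
    adj = [[j for j in range(n) if j != i and cos(centers[i], centers[j]) >= EDGE_THR]
           for i in range(n)]
    seen, groups = set(), []
    for s in range(n):                                     # breadth-first search per component
        if s in seen:
            continue
        comp, queue = [], deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        groups.append(comp)                                # small-cluster indices forming one large cluster
    return groups
```

Here add_face covers steps 1-3 and large_clusters covers steps 4-5; in practice the edges could be updated incrementally for the affected small cluster only, rather than rebuilt from scratch as in this sketch.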
4. Optimization for video
Unlike a single still image, video uses special compressed storage formats such as the common MP4. Such a format does not keep a complete picture for every frame: only a small number of key frames are stored in full, and the remaining frames are stored as differential frames relative to the most recent preceding key frame. To obtain the complete picture of a given frame, the differential frames must be accumulated onto that key frame up to the frame in question.
Meanwhile, for face detection and recognition in video, as many faces as possible should be detected, so it is not enough to extract only key frames; frames must be extracted densely, at fixed time intervals, as input images. The serial frame extraction described above is therefore unavoidable and takes a certain amount of time. At the hardware level, the CPU is mainly responsible for video frame extraction while the GPU is responsible for face detection and recognition, and the two kinds of hardware generally do not interfere with each other when processing data.
Accordingly, the present application proposes a multi-threaded optimization based on the characteristics of video data and of CPU and GPU hardware. As shown in fig. 4, a frame extraction thread and a detection and recognition thread (i.e. the main thread) are set up, and because the two threads produce and consume data at different speeds, a data queue is established between them as a buffer for input and output data.
As can be seen from the figure, object detection on the GPU runs in parallel during each interval in which frames are being extracted on the CPU. In addition, if the frame extraction thread has finished and pictures remain in the queue, the main thread continues object detection until the queue is empty. Compared with waiting for all frames to be extracted before starting detection, the total time needed to output the result is greatly reduced.
The GPU may also be replaced by other non-CPU hardware that accelerates neural network inference, such as a dedicated NPU chip on mobile devices or a TPU chip on servers.
For simplicity of illustration, only one main thread and one frame extraction thread are shown; in practical engineering, the number of threads of each kind may be greater than 1 (with a single queue shared by all threads), which further increases the overall prediction speed on video data.
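A minimal sketch of this producer-consumer arrangement using Python threading and OpenCV is shown below; the 3-second sampling interval, the queue size and the single worker of each kind are assumptions made for the example.

```python
import queue
import threading
import cv2

frame_queue = queue.Queue(maxsize=64)        # buffer between the two threads
STOP = object()                              # sentinel marking the end of the video

def extract_frames(path, interval_s=3.0):
    """Producer (CPU): decode the video and push one frame every interval_s seconds."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(fps * interval_s)))
    idx = 0
    while True:
        ok, frame = cap.read()               # sequential decoding: key frame plus differential frames
        if not ok:
            break
        if idx % step == 0:
            frame_queue.put(frame)
        idx += 1
    cap.release()
    frame_queue.put(STOP)

def detect_and_recognize(process_frame):
    """Consumer (GPU): pull frames and run detection / recognition / clustering on them."""
    while True:
        frame = frame_queue.get()
        if frame is STOP:
            break
        process_frame(frame)                 # e.g. the detection + recognition cascade sketched earlier

producer = threading.Thread(target=extract_frames, args=("input.mp4",))
consumer = threading.Thread(target=detect_and_recognize, args=(lambda f: None,))
producer.start(); consumer.start()
producer.join(); consumer.join()
```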
During face detection and recognition on each video, repeatedly launching clustering would cause frequent switching between thread tasks and hurt input/output communication between the different pieces of hardware. As shown in fig. 5, the application therefore proposes batch optimization for video: unknown faces are first collected and the computations are then parallelized over the batch, for example the similarities between all faces to be clustered and all small-cluster centers are computed in one batch. The unknown-face feature vectors of each batch are cached on disk; when the cache is full or a timer fires, a clustering task is launched on the cached data, and the cache is emptied once clustering has finished.
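The buffering policy can be sketched as follows; the capacity of 256 vectors, the 60-second timer and the in-memory buffer (instead of the hard-disk cache mentioned above) are assumptions made for the example.

```python
import time
import numpy as np

class FeatureBuffer:
    """Collect unknown-face feature vectors and flush them to clustering in batches."""

    def __init__(self, cluster_fn, capacity=256, max_age_s=60.0):
        self.cluster_fn = cluster_fn          # e.g. feeds each vector to add_face() above
        self.capacity = capacity
        self.max_age_s = max_age_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, feat):
        self.buffer.append(np.asarray(feat))
        full = len(self.buffer) >= self.capacity
        timed_out = time.monotonic() - self.last_flush >= self.max_age_s
        if full or timed_out:                 # flush when the cache is full or the timer fires
            self.flush()

    def flush(self):
        if self.buffer:
            self.cluster_fn(np.stack(self.buffer))   # one batched clustering call
        self.buffer = []
        self.last_flush = time.monotonic()
```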
The application aims to reduce the calculation cost of clustering result updating by using an incremental clustering method based on graph connection; aiming at the characteristics of video data, a multithreading acceleration and batch clustering mode is used to achieve the balance of precision and speed.
The face recognition and clustering method provided by the application was compared, in terms of accuracy and speed, with a common online clustering method (for each cluster, 5 members are selected at random for similarity computation; if a cluster has fewer than 5 members, the whole cluster participates). The test data are 4635 face pictures belonging to 932 identities; the comparison results are shown in the following table:
Method                       Clustering accuracy 1   Clustering accuracy 2   Number of clusters   Clustering time
Common online clustering     98.79 (4579/4635)       97.67 (4527/4635)       967                  3 min 27 s
Proposed (the application)   99.78 (4625/4635)       98.40 (4561/4635)       988                  2 min 45 s
Two different evaluation criteria were used: (1) clustering accuracy 1, under which several clusters may map to the same identity, giving a higher figure; (2) clustering accuracy 2, under which each identity is credited to at most one cluster, which penalizes fragmented clustering results and therefore gives a lower figure. The fractions in brackets show how the accuracy is computed (correctly clustered face pictures / total number of face pictures). It can be seen that the face recognition and clustering method of the application has a clear advantage in both accuracy and speed.
With and without the batch optimization for video, the test was repeated 5 times on a 4-minute 720P MP4 video (including frame extraction, detection and recognition; without batch optimization, clustering is launched again directly; frame extraction interval 3 seconds): the total time dropped from 40±1 seconds to 35±1 seconds, a clear reduction in the total prediction time.
With and without the multi-threaded optimization for video, the test was repeated 5 times on a video of the same format (including frame extraction, detection and recognition; frame extraction interval 3 seconds): the total time dropped from 35±1 seconds to 30±2 seconds, again a clear reduction in the total prediction time.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present application and are not limiting. Although the present application has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present application, which is intended to be covered by the appended claims.

Claims (10)

1. A face recognition and clustering method, the method comprising:
using a face detection model, inputting a picture to be detected, and outputting a rectangular detection frame of the face position in the picture to be detected;
cropping the rectangular detection frame, enlarging and correcting it, and feeding it into a face recognition model, which outputs the feature vector of the face image; calculating the similarity between this feature vector and the feature vectors of all faces with known identities, and, if every similarity is smaller than a set threshold, treating the face as an unknown face;
using a face clustering model, inputting the unknown faces and aggregating the unknown faces that potentially share the same identity;
the face detection model and the face recognition model are trained neural network models;
the face clustering model is an incremental clustering model based on graph connection.
2. The face recognition and clustering method according to claim 1, wherein the incremental clustering model based on graph connection comprises the following specific steps:
step 1: inputting an unknown face;
step 2: if no small cluster currently exists, creating a new small cluster and taking the feature vector of the unknown face as the center of the small cluster;
if at least one small cluster exists, calculating the similarity between the unknown face and the center of every small cluster, and, if all the similarities are lower than a set threshold, creating a new small cluster and taking the feature vector of the unknown face as the center of the small cluster;
if the similarity between the unknown face and the center of one small cluster is higher than the set threshold and is greater than its similarity to the centers of all other small clusters, adding the unknown face to that small cluster;
step 3: recalculating the mean of the feature vectors of all faces in the updated small cluster and taking it as the new cluster center;
step 4: calculating the similarity between every pair of small clusters, adding a connection between two small clusters when their similarity is greater than a set threshold and removing the connection when the similarity is smaller than the set threshold;
step 5: constructing a graph with small clusters as nodes and the connections between small clusters as edges; calculating the connectivity between nodes: if any two nodes of the current graph can reach each other through connections within the graph, the graph has only one connected graph, namely itself; otherwise the graph is divided into several connected graphs such that no node of one connected graph can be reached from any node of another; the small clusters in each connected graph form one large cluster and belong to the same face identity.
3. The face recognition and clustering method of claim 2, wherein the connected graphs are computed by depth-first search or breadth-first search.
4. The face recognition and clustering method according to claim 1, wherein the picture to be detected is a picture extracted from a video.
5. The face recognition and clustering method according to claim 4, wherein the face recognition and clustering method is specifically performed as follows:
establishing a frame extraction thread, a target detection thread and a data queue;
extracting pictures from the video by using a CPU in the frame extraction thread and putting the pictures into the data queue;
and the face detection model, the face recognition model and the face clustering model acquire pictures from the data queue in the target detection thread by using a picture processing chip, and perform detection, recognition and clustering.
6. The face recognition and clustering method of claim 5, wherein there are one or more frame extraction threads and one or more object detection threads.
7. The face recognition and clustering method of claim 5, wherein the picture processing chip is a GPU, NPU or TPU chip.
8. The face recognition and clustering method according to claim 1, wherein a plurality of unknown faces are grouped into a batch, the feature vectors of the unknown face images are computed in batches and cached, and when the cache area is full or a set time is reached, face clustering is performed on all the cached feature vectors.
9. A face recognition and clustering system implemented based on any one of the methods of claims 1-8, the system comprising:
the face detection module is used for inputting a picture to be detected into a face detection model and outputting a rectangular detection frame of each face position in the picture to be detected; the face detection model is a trained neural network model;
the face recognition module is used for cropping the rectangular detection frame, enlarging and correcting it, and feeding it into a face recognition model, which outputs the feature vector of the face image; the similarity between this feature vector and the feature vectors of all faces with known identities is calculated, and if every similarity is smaller than a set threshold, the face is treated as an unknown face; the face recognition model is a trained neural network model; and
the face clustering module is used for feeding the unknown faces into a face clustering model and aggregating the unknown faces that potentially share the same identity; the face clustering model is an incremental clustering model based on graph connection.
10. The face recognition and clustering system of claim 9, wherein the system further comprises:
and the picture acquisition module is used for extracting the picture to be detected from the video.
CN202310676420.0A 2023-06-08 2023-06-08 Face recognition and clustering method and system Pending CN116704577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310676420.0A CN116704577A (en) 2023-06-08 2023-06-08 Face recognition and clustering method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310676420.0A CN116704577A (en) 2023-06-08 2023-06-08 Face recognition and clustering method and system

Publications (1)

Publication Number Publication Date
CN116704577A true CN116704577A (en) 2023-09-05

Family

ID=87838742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310676420.0A Pending CN116704577A (en) 2023-06-08 2023-06-08 Face recognition and clustering method and system

Country Status (1)

Country Link
CN (1) CN116704577A (en)

Similar Documents

Publication Publication Date Title
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
CN104679818B (en) A kind of video key frame extracting method and system
CN111539480B (en) Multi-category medical image recognition method and equipment
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN112861695B (en) Pedestrian identity re-identification method and device, electronic equipment and storage medium
CN107609105B (en) Construction method of big data acceleration structure
CN112802054A (en) Mixed Gaussian model foreground detection method fusing image segmentation
CN111309718B (en) Distribution network voltage data missing filling method and device
CN110769259A (en) Image data compression method for tracking track content of video target
CN112948613B (en) Image incremental clustering method, system, medium and device
CN116704577A (en) Face recognition and clustering method and system
CN112434798A (en) Multi-scale image translation method based on semi-supervised learning
CN112560731A (en) Feature clustering method, database updating method, electronic device and storage medium
CN117036897A (en) Method for detecting few sample targets based on Meta RCNN
CN113743251B (en) Target searching method and device based on weak supervision scene
CN115578765A (en) Target identification method, device, system and computer readable storage medium
Cai et al. An online face clustering algorithm for face monitoring and retrieval in real-time videos
CN116342466A (en) Image matting method and related device
CN113706459A (en) Detection and simulation restoration device for abnormal brain area of autism patient
CN110310297B (en) Image segmentation method and system based on multi-resolution search particle swarm algorithm
CN113420608A (en) Human body abnormal behavior identification method based on dense space-time graph convolutional network
CN112070023B (en) Neighborhood prior embedded type collaborative representation mode identification method
CN117132777B (en) Image segmentation method, device, electronic equipment and storage medium
CN117235137B (en) Professional information query method and device based on vector database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination