CN111160077A - Large-scale dynamic face clustering method - Google Patents
Large-scale dynamic face clustering method
- Publication number
- CN111160077A
- Authority
- CN
- China
- Prior art keywords
- similarity
- nodes
- clustering
- node
- class
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The invention provides a large-scale dynamic face clustering method. Each node $a_i$ is assigned an initial class $\mathrm{class}(a_i) = i$; the similarity between every pair of nodes is computed, and when it exceeds a similarity threshold, an edge is formed whose weight is that similarity. A node $a_i$ is then selected at random: if several nodes in its neighborhood belong to the same class, their edge weights are summed, and the class $j$ with the largest total weight among all its neighbors becomes the class of the current node. When all nodes have been processed, one iteration is complete; iterations are repeated until the set number is reached, yielding k clusters. The clustering results enable intelligent classification of surveillance videos, and high-accuracy face clustering of surveillance video can effectively support subsequent tasks such as face recognition and identity determination.
Description
Technical Field
The invention belongs to the technical field of face clustering within image processing, and in particular relates to a large-scale dynamic face clustering method.
Background
In the era of data explosion, the volume of face data is growing rapidly, and clustering this large-scale face data to extract valuable information is an urgent problem. Current face clustering mainly combines deep-learning representations with a clustering algorithm, so the quality of the result depends chiefly on face feature extraction and on the choice of clustering algorithm.
Face feature extraction must produce highly separable features while keeping the feature dimensionality as low as possible, so that classification remains efficient. With the development of deep learning, convolutional neural networks have become the mainstream method for extracting face features thanks to their deep structure, strong learning ability, and layered nonlinear mappings.
The essence of clustering is to group data according to their characteristics so that similarity within a group is as large as possible and similarity between groups is as small as possible. Clustering algorithms are currently divided mainly into partitional, hierarchical, density-based, grid-based, and model-based clustering, among others. Hierarchical clustering defines distance and similarity simply and does not require the number of clusters to be preset, which avoids poor results caused by an unfounded choice of that number; it can also produce a multi-level clustering structure at different granularities. For these reasons it is the most widely applied clustering algorithm.
The hierarchical clustering algorithm is an efficient graph clustering algorithm, but it has the following disadvantages: first, the clustering result is affected by the similarity threshold; second, with a large number of categories the algorithm may perform poorly, because the more categories there are, the less distinctive the feature vectors are in the current feature space; third, the algorithm is more random on small graphs, while this randomness disappears on larger graphs; fourth, computing the similarity matrix has time complexity $O(n^2)$, so the algorithm is slow for large-scale face clustering.
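As a worked example of the fourth point (the collection size is illustrative, not taken from the source): a full similarity matrix requires one comparison per unordered pair of faces, so for $n = 10^6$ faces

$$\binom{n}{2} = \frac{n(n-1)}{2} = \frac{10^6\,(10^6 - 1)}{2} = 499\,999\,500\,000 \approx 5 \times 10^{11}$$

pairwise similarities must be computed, and doubling $n$ roughly quadruples this count.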
Disclosure of Invention
The invention aims to overcome the shortcomings of the hierarchical clustering algorithm by providing a large-scale dynamic face clustering algorithm. A relatively small number of data points is used to describe the characteristics of the data set: newly added data and existing data are each represented by representative points (class centers) of their clusters, these representatives are clustered, and clusters are merged according to the result to complete the cluster update.
The technical scheme of the invention is as follows:
A large-scale dynamic face clustering method, characterized by comprising the following steps:
(1) constructing an undirected graph: assigning each node $a_i$ an initial class $\mathrm{class}(a_i) = i$; computing the similarity between every pair of nodes and, if the similarity exceeds the similarity threshold, connecting them with an edge whose weight is that similarity;
(2) iterating: randomly selecting a node $a_i$; if several nodes in its neighborhood belong to the same class, summing their edge weights; assigning the current node the class $j$ with the largest total weight among all its neighbors;
(3) when all nodes have been processed, one iteration is complete; repeating step (2) until the set number of iterations is reached;
(4) ending, obtaining k clusters.
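Steps (1) to (4) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: similarity is taken to be cosine similarity (as in the embodiment), "randomly selecting a node" is interpreted as visiting every node once per iteration in shuffled order, and the names `build_graph` and `cluster` are hypothetical. The update rule, in which a node adopts the neighboring class with the largest summed edge weight, is the same idea as the Chinese Whispers graph-clustering algorithm.

```python
from __future__ import annotations

import random

import numpy as np


def build_graph(features: np.ndarray, threshold: float) -> list[dict[int, float]]:
    """Step (1): undirected graph whose edge weights are cosine similarities above `threshold`."""
    unit = features / np.clip(np.linalg.norm(features, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T  # dense n x n similarity matrix: the O(n^2) part
    n = len(features)
    adj: list[dict[int, float]] = [{} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                adj[i][j] = adj[j][i] = float(sim[i, j])
    return adj


def cluster(features: np.ndarray, threshold: float, iterations: int, seed: int = 0) -> list[int]:
    """Steps (1)-(4): weighted-neighbor class propagation; returns one class id per node."""
    rng = random.Random(seed)
    adj = build_graph(features, threshold)
    labels = list(range(len(features)))  # initial class(a_i) = i
    order = list(range(len(features)))
    for _ in range(iterations):  # step (3): repeat until the iteration count is reached
        rng.shuffle(order)  # step (2): visit nodes in random order
        for i in order:
            totals: dict[int, float] = {}
            for j, w in adj[i].items():
                # neighbors sharing a class contribute the sum of their edge weights
                totals[labels[j]] = totals.get(labels[j], 0.0) + w
            if totals:  # isolated nodes keep their current class
                labels[i] = max(totals, key=totals.get)
    return labels


feats = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
labels = cluster(feats, threshold=0.9, iterations=8)
k = len(set(labels))  # step (4): number of clusters
print(k)  # prints 2
```

The two nearly parallel pairs of vectors end up in two clusters regardless of visiting order. Building the dense similarity matrix is the $O(n^2)$ step criticized in the background section.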
The method achieves an F1-measure of 0.997 on the LFW data set, 0.901 on the VGGFace2 data set, and 0.929 on the CASIA-WebFace data set, a good clustering result, and the time complexity of clustering is $O(n \times p)$.
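The patent reports F1-measure values without saying how they are computed. One common choice for evaluating clusterings is the pairwise F-measure, sketched below as an assumption: a pair of faces is a true positive when it is both in the same predicted cluster and has the same identity label. The function name `pairwise_f1` is hypothetical.

```python
from itertools import combinations


def pairwise_f1(pred, truth):
    """Pairwise F-measure: precision and recall over all unordered pairs of samples."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1  # correctly grouped pair
        elif same_pred:
            fp += 1  # grouped, but different identities
        elif same_true:
            fn += 1  # same identity, but split apart
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(pairwise_f1([0, 0, 1, 1], [0, 0, 0, 1]))
```

For the example, the predicted labels give 1 true positive, 1 false positive and 2 false negatives, so precision is 0.5, recall is 1/3 and F1 is 0.4. Enumerating all pairs is itself quadratic, which is acceptable for evaluation but not for clustering at scale.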
The invention has a wide range of applications, including face retrieval and labeling, preprocessing for face recognition, and face database construction. For public security departments, face clustering combined with information such as time and place on top of retrieval can be used to track criminals and search for missing persons, replacing manual investigation and greatly improving efficiency. For the pharmaceutical industry, face clustering can help uncover suspected illicit drug vendors and regulate the pharmaceutical market. Surveillance videos are numerous and currently difficult to manage; applying face clustering to them enables intelligent classification of the videos based on the clustering results, and high-accuracy face clustering of surveillance video can effectively support subsequent tasks such as face recognition and identity determination.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
Detailed Description
As shown in fig. 1, the present invention comprises the steps of:
(1) constructing an undirected graph: assigning each node $a_i$ an initial class $\mathrm{class}(a_i) = i$; computing the similarity between every pair of nodes and, if the similarity exceeds the similarity threshold, connecting them with an edge whose weight is that similarity;
(2) iterating: randomly selecting a node $a_i$; if several nodes in its neighborhood belong to the same class, summing their edge weights; assigning the current node the class $j$ with the largest total weight among all its neighbors;
(3) when all nodes have been processed, one iteration is complete; repeating step (2) until the set number of iterations is reached;
(4) ending, obtaining k clusters.
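Step (2) can also be written as an update rule. Using $N(a_i)$ for the neighbors of $a_i$ in the graph and $w_{ik}$ for the weight of edge $(a_i, a_k)$ (notation introduced here for illustration), the class assigned to a visited node is

$$\mathrm{class}(a_i) \leftarrow \arg\max_{j} \sum_{a_k \in N(a_i),\; \mathrm{class}(a_k) = j} w_{ik}.$$

A node with no neighbors above the threshold has no candidate class and therefore keeps its own.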
In step (2), dynamic clustering of incremental data comprises the following steps:
(21) determining the similarity threshold, the number of iterations, and the value of P;
(22) selecting P data items from the data set and clustering them hierarchically;
(23) performing hierarchical clustering on the newly generated class centers together with the existing class centers;
(24) merging the class centers that are grouped into the same class;
(25) deleting the P clustered data items from the data set;
(26) if the data set is not empty, returning to step (22); otherwise, ending.
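The incremental procedure (21) to (26) can be sketched as below. Several details are not specified in the text and are assumptions here: the next P items are taken in arrival order, each class center is the mean of its members' feature vectors, the "hierarchical clustering" of steps (22) and (23) reuses the weighted-neighbor graph update of steps (1) to (4), and the loop in step (26) returns to step (22). The function names `graph_cluster` and `incremental_cluster` are hypothetical.

```python
import random

import numpy as np


def graph_cluster(vectors, threshold, iterations, rng):
    """Weighted-neighbor class propagation (steps (1)-(4)) on a small set of vectors."""
    unit = vectors / np.clip(np.linalg.norm(vectors, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T  # cosine similarities
    n = len(vectors)
    labels = list(range(n))  # initial class(a_i) = i
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)
        for i in order:
            totals = {}
            for j in range(n):
                if j != i and sim[i, j] > threshold:  # edge only above the threshold
                    totals[labels[j]] = totals.get(labels[j], 0.0) + sim[i, j]
            if totals:
                labels[i] = max(totals, key=totals.get)
    return labels


def incremental_cluster(data, p, threshold, iterations, seed=0):
    """Hypothetical sketch of steps (21)-(26); (21) p, threshold, iterations come from the caller."""
    rng = random.Random(seed)
    remaining = list(range(len(data)))  # indices still in the data set
    members = []  # sample indices belonging to each existing cluster
    centers = []  # one class center (mean vector) per existing cluster
    while remaining:  # (26) stop once the data set is empty
        # (22) take the next P items; (25) remove them from the data set
        batch, remaining = remaining[:p], remaining[p:]
        local = graph_cluster(data[batch], threshold, iterations, rng)
        groups = {}
        for idx, lab in zip(batch, local):
            groups.setdefault(lab, []).append(idx)
        new_members = list(groups.values())
        new_centers = [data[m].mean(axis=0) for m in new_members]
        # (23) cluster the new class centers together with the existing ones
        all_members = members + new_members
        center_labels = graph_cluster(np.array(centers + new_centers), threshold, iterations, rng)
        # (24) merge clusters whose centers fall into the same class
        merged = {}
        for lab, m in zip(center_labels, all_members):
            merged.setdefault(lab, []).extend(m)
        members = list(merged.values())
        centers = [data[m].mean(axis=0) for m in members]
    result = [0] * len(data)
    for cid, m in enumerate(members):
        for idx in m:
            result[idx] = cid
    return result


data = np.array([[1.0, 0.0], [0.0, 1.0], [0.99, 0.1], [0.1, 0.99], [0.98, 0.05], [0.05, 0.98]])
print(incremental_cluster(data, p=2, threshold=0.9, iterations=5))  # [0, 1, 0, 1, 0, 1]
```

Each round compares only P items plus the current class centers instead of building a full $n \times n$ matrix. The number of centers still grows with the number of identities, so this sketch does not by itself guarantee the reported $O(n \times p)$ bound.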
A specific embodiment of the invention proceeds as follows:
I. Feature extraction using the CNN + ArcFace Loss method:
1) MTCNN is used for face detection and face alignment;
2) a model with a ResNet network architecture is trained on the MS-Celeb-1M data set; the output feature vector has 512 dimensions.
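The text names the ArcFace loss but gives neither its formula nor its hyperparameters. The NumPy sketch below shows the widely used additive-angular-margin formulation as an assumption: features and class weights are normalized, and the true-class logit $s\cos\theta_y$ is replaced by $s\cos(\theta_y + m)$ before softmax cross-entropy. The values $s = 64$ and $m = 0.5$, and the function names, are assumptions; the sketch covers only the forward computation (no ResNet, no training loop) and omits the special handling that practical implementations add when $\theta_y + m$ exceeds $\pi$.

```python
import numpy as np


def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace-style logits: s * cos(theta_j), with the target class using cos(theta_y + m).

    embeddings: (N, d) feature vectors; weights: (d, C) class weight matrix;
    labels: (N,) integer identity labels. s and m are assumed hyperparameters.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)  # unit-length features
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)  # unit-length class weights
    cos = np.clip(x @ w, -1.0, 1.0)  # cos(theta_j) for every sample and class
    rows = np.arange(len(labels))
    logits = cos.copy()
    # additive angular margin on the true class only
    logits[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + m)
    return s * logits


def arcface_loss(embeddings, weights, labels, s=64.0, m=0.5):
    """Mean softmax cross-entropy over the ArcFace logits."""
    z = arcface_logits(embeddings, weights, labels, s, m)
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

The margin makes the true-class logit smaller than it would otherwise be, so the network must pull same-identity features closer together in angle; this is what makes the resulting 512-dimensional embeddings suitable for thresholded cosine similarity.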
II. Determination of related parameters:
The P value giving better algorithm performance was selected by experimenting with data of different scales (100, 200, 300, …, 2000).
The invention uses the cosine similarity between the feature vectors extracted by the CNN + ArcFace Loss method as the similarity measure, and determines the optimal similarity threshold and number of iterations by comparing, one by one, the clustering results obtained with different threshold and iteration settings.
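Written out, with $\mathbf{f}_i$ denoting the 512-dimensional feature vector of face $a_i$ (notation introduced here for illustration), the similarity and the edge rule of step (1) are

$$s_{ij} = \frac{\mathbf{f}_i \cdot \mathbf{f}_j}{\lVert \mathbf{f}_i \rVert \, \lVert \mathbf{f}_j \rVert}, \qquad \text{edge } (a_i, a_j) \text{ with weight } w_{ij} = s_{ij} \text{ exists iff } s_{ij} > \text{threshold}.$$

If the embeddings are L2-normalized, the denominator equals 1 and $s_{ij}$ reduces to a dot product.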
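Because the LFW data set is relatively small, P is set to 1000; because the VGGFace2 and CASIA-WebFace data sets are larger, P is set to 2000. The similarity thresholds for the three data sets are 0.49, 0.40 and 0.40 respectively, and the number of iterations is 8.

Data set | P | Similarity threshold | Iterations |
---|---|---|---|
LFW | 1000 | 0.49 | 8 |
VGGFace2 | 2000 | 0.40 | 8 |
CASIA-WebFace | 2000 | 0.40 | 8 |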
Claims (2)
1. A large-scale dynamic face clustering method, characterized by comprising the following steps:
(1) constructing an undirected graph: assigning each node $a_i$ an initial class $\mathrm{class}(a_i) = i$; computing the similarity between every pair of nodes and, if the similarity exceeds the similarity threshold, connecting them with an edge whose weight is that similarity;
(2) iterating: randomly selecting a node $a_i$; if several nodes in its neighborhood belong to the same class, summing their edge weights; assigning the current node the class $j$ with the largest total weight among all its neighbors;
(3) when all nodes have been processed, one iteration is complete; repeating step (2) until the set number of iterations is reached;
(4) ending, obtaining k clusters.
2. The large-scale dynamic face clustering method according to claim 1, wherein the dynamic clustering of incremental data in step (2) comprises the following steps:
(21) determining the similarity threshold, the number of iterations, and the value of P;
(22) selecting P data items from the data set and clustering them hierarchically;
(23) performing hierarchical clustering on the newly generated class centers together with the existing class centers;
(24) merging the class centers that are grouped into the same class;
(25) deleting the P clustered data items from the data set;
(26) if the data set is not empty, returning to step (22); otherwise, ending.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811327622.XA (CN111160077A) | 2018-11-08 | 2018-11-08 | Large-scale dynamic face clustering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111160077A | 2020-05-15 |
Family ID: 70555240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811327622.XA (CN111160077A), Pending | Large-scale dynamic face clustering method | 2018-11-08 | 2018-11-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160077A |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149525A * | 2020-09-07 | 2020-12-29 | 浙江工业大学 | Face recognition method based on Laplace peak clustering |
CN114612990A * | 2022-03-22 | 2022-06-10 | 河海大学 | Unmanned aerial vehicle face recognition method based on super-resolution |
2018
- 2018-11-08: CN application CN201811327622.XA filed (publication CN111160077A), status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dong et al. | Automatic age estimation based on deep learning algorithm | |
CN109344285B (en) | Monitoring-oriented video map construction and mining method and equipment | |
Omran et al. | Differential evolution methods for unsupervised image classification | |
Ibrahim et al. | Cluster representation of the structural description of images for effective classification | |
CN110880019B (en) | Method for adaptively training target domain classification model through unsupervised domain | |
CN109993100B (en) | Method for realizing facial expression recognition based on deep feature clustering | |
CN104765768A (en) | Mass face database rapid and accurate retrieval method | |
Badawi et al. | A hybrid memetic algorithm (genetic algorithm and great deluge local search) with back-propagation classifier for fish recognition | |
CN113177132B (en) | Image retrieval method based on depth cross-modal hash of joint semantic matrix | |
CN106780639B (en) | Hash coding method based on significance characteristic sparse embedding and extreme learning machine | |
CN111414461A (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN112906770A (en) | Cross-modal fusion-based deep clustering method and system | |
CN108595558B (en) | Image annotation method based on data equalization strategy and multi-feature fusion | |
WO2022062419A1 (en) | Target re-identification method and system based on non-supervised pyramid similarity learning | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN103065158A (en) | Action identification method of independent subspace analysis (ISA) model based on relative gradient | |
CN110852152B (en) | Deep hash pedestrian re-identification method based on data enhancement | |
CN112784921A (en) | Task attention guided small sample image complementary learning classification algorithm | |
Wan et al. | LFRNet: Localizing, focus, and refinement network for salient object detection of surface defects | |
CN111160077A (en) | Large-scale dynamic face clustering method | |
CN109241315B (en) | Rapid face retrieval method based on deep learning | |
Pandey et al. | A hierarchical clustering approach for image datasets | |
CN113010705A (en) | Label prediction method, device, equipment and storage medium | |
CN104200222B (en) | Object identifying method in a kind of picture based on factor graph model | |
CN109871469A (en) | Tuftlet crowd recognition method based on dynamic graphical component |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200515 |