CN111160077A - Large-scale dynamic face clustering method - Google Patents

Info

Publication number
CN111160077A
Authority
CN
China
Prior art keywords
similarity
nodes
clustering
node
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811327622.XA
Other languages
Chinese (zh)
Inventor
张陈欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Changfeng Science Technology Industry Group Corp
Original Assignee
China Changfeng Science Technology Industry Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

The invention provides a large-scale dynamic face clustering method. Each node a_i is assigned an initial class, class(a_i) = i. The similarity between each pair of distinct nodes is computed, and if it exceeds a similarity threshold, an edge is formed between the two nodes with the similarity as its weight. A node a_i is then selected at random; where several nodes in its neighborhood belong to the same class, their weights are summed, and the class j with the largest total weight among all neighbors of the node is taken as the class of the current node. One iteration is complete once all nodes have been processed, and iterations are repeated until the set number of iterations is reached, yielding k clusters. The clustering result allows surveillance videos to be classified automatically, and accurate face clustering on surveillance video can effectively support subsequent tasks such as face recognition and identity determination.

Description

Large-scale dynamic face clustering method
Technical Field
The invention belongs to the technical field of face clustering in the field of image processing, and particularly relates to a large-scale face dynamic clustering method.
Background
In the era of data explosion, the volume of face data is growing rapidly. How to cluster large-scale face data and extract valuable information from it is a problem that urgently needs to be solved. At present, face clustering mainly combines deep-learning representations with a clustering algorithm, so the quality of the result depends chiefly on how the face features are extracted and which clustering algorithm is chosen.
The extraction of the face features needs to ensure high-precision separability, reduce the dimensionality of feature values as much as possible and ensure the classification efficiency. With the development of deep learning, the convolutional neural network becomes a mainstream method for extracting human face features due to the deep structure, strong learning ability and layered nonlinear mapping.
The essence of clustering is to group data according to their characteristics so that the similarity of data within a group is as large as possible and the similarity of data between groups is as small as possible. At present, the clustering algorithm is mainly divided into: partitional clustering algorithms, hierarchical clustering algorithms, density-based clustering, grid-based clustering, model-based clustering, and the like. The hierarchical clustering algorithm is simple in definition of distance and similarity, the number of clustering clusters does not need to be preset before clustering, the problem of poor clustering effect caused by lack of basis in setting can be avoided, and a multi-level clustering structure with different granularities can be obtained, so that the hierarchical clustering algorithm is the most widely applied clustering algorithm.
The hierarchical clustering algorithm is an efficient graph clustering algorithm, but it has the following disadvantages. First, the clustering result is affected by the similarity threshold. Second, the algorithm may give poor results when there are many classes: the more classes there are, the less discriminative the feature vectors are in the current space. Third, the randomness of the algorithm is pronounced on small graphs and disappears on larger graphs. Fourth, computing the similarity matrix has time complexity O(n^2), so the algorithm clusters slowly on large-scale face data.
Disclosure of Invention
To address these shortcomings of the hierarchical clustering algorithm, the invention provides a large-scale dynamic face clustering algorithm. A relatively small number of data points is used to describe the characteristics of the data set; the newly added data and the original data are each clustered using representative points that stand for their clusters, and the clusters are merged according to the clustering result, which completes the cluster update.
The technical scheme of the invention is as follows:
A large-scale dynamic face clustering method, characterized by comprising the following steps:
(1) constructing an undirected graph: each node a_i is assigned an initial class, class(a_i) = i; the similarity between each pair of distinct nodes is computed, and if it exceeds the similarity threshold, an edge is formed between the two nodes with the similarity as its weight;
(2) iteration: a node a_i is selected at random; where several nodes in its neighborhood belong to the same class, their weights are summed; the class j with the largest total weight among all neighbors of the node is taken as the class of the current node;
(3) one iteration is complete once all nodes have been processed; step 2 is repeated until the set number of iterations is reached;
(4) end, yielding k clusters.
The method achieves an F1 measure of 0.997 on the LFW data set, 0.901 on the VGGFace2 data set, and 0.929 on the CASIA-Webface data set, which is a good clustering result, and the time complexity of clustering is O(n × p).
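The source does not state which variant of the F1 measure is reported. As an illustration only, the sketch below computes the pairwise F1 measure, one common choice for evaluating clusterings; the function name `pairwise_f1` and its inputs are hypothetical.

```python
from itertools import combinations


def pairwise_f1(predicted, truth):
    """Pairwise F1 of a clustering (assumed variant, not confirmed by the source).

    predicted[i] is the cluster label given to sample i,
    truth[i] is its ground-truth identity.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1  # pair correctly placed in one cluster
        elif same_pred:
            fp += 1  # pair wrongly merged
        elif same_true:
            fn += 1  # pair wrongly split
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The loop visits every pair of samples, so it is quadratic in the number of samples and suits small evaluation sets rather than the full clustering workload.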
The invention has a wide range of applications and can be used for face retrieval and labeling, face recognition preprocessing, face database construction, and the like. For public security departments, face clustering combined with information such as time and place can be used, on the basis of retrieval, to track criminals and search for missing persons, replacing manual investigation and greatly improving working efficiency. For the pharmaceutical industry, face clustering can be used to identify suspected drug vendors and to regulate the pharmaceutical market. Surveillance videos are currently numerous and difficult to manage; applying face clustering to them allows the videos to be classified automatically from the clustering results. Accurate face clustering on surveillance video can effectively support subsequent tasks such as face recognition and identity determination.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
Detailed Description
As shown in FIG. 1, the present invention comprises the following steps:
(1) constructing an undirected graph: each node a_i is assigned an initial class, class(a_i) = i; the similarity between each pair of distinct nodes is computed, and if it exceeds the similarity threshold, an edge is formed between the two nodes with the similarity as its weight;
(2) iteration: a node a_i is selected at random; where several nodes in its neighborhood belong to the same class, their weights are summed; the class j with the largest total weight among all neighbors of the node is taken as the class of the current node;
(3) one iteration is complete once all nodes have been processed; step 2 is repeated until the set number of iterations is reached;
(4) end, yielding k clusters.
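Steps (1) to (4) can be sketched as weighted label propagation on a similarity graph. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `propagate_labels`, the edge-dictionary input, the fixed random seed, and the choice to visit every node once per iteration in shuffled order are all assumptions.

```python
import random
from collections import defaultdict


def propagate_labels(num_nodes, edges, iterations=8, seed=0):
    """edges maps an undirected pair (i, j) to its similarity weight."""
    rng = random.Random(seed)
    neighbors = defaultdict(list)
    for (i, j), weight in edges.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))

    labels = list(range(num_nodes))  # step (1): class(a_i) = i
    for _ in range(iterations):      # step (3): fixed number of passes
        order = list(range(num_nodes))
        rng.shuffle(order)           # step (2): nodes visited in random order
        for node in order:
            weight_by_class = defaultdict(float)
            for other, weight in neighbors[node]:
                # weights of neighbors sharing a class are summed
                weight_by_class[labels[other]] += weight
            if weight_by_class:      # isolated nodes keep their own class
                labels[node] = max(weight_by_class, key=weight_by_class.get)
    return labels                    # step (4): distinct labels are the k clusters
```

The number of clusters k is not an input here; it falls out as the number of distinct labels left after the final iteration.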
In step (2), the dynamic clustering of incremental data comprises the following steps:
(21) determining the similarity threshold, the number of iterations, and the P value;
(22) selecting P data items from the data set and clustering them hierarchically;
(23) hierarchically clustering the newly generated class centers together with the existing class centers;
(24) merging the classes whose class centers are grouped into the same class;
(25) deleting the P data items that have been clustered from the data set;
(26) if the data set is not empty, returning to step 2; otherwise, ending.
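The batch loop of steps (21) to (26) might look like the sketch below. Several points are assumptions rather than statements from the source: the per-batch clustering routine is the threshold-graph label propagation of steps (1) to (4), a class center is taken to be the mean feature vector of its members, each batch is simply the next P items in order, and all function names are hypothetical. Zero-length vectors are not handled.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster(vectors, threshold, iterations, rng):
    """Threshold graph plus weighted label propagation; one label per vector."""
    n = len(vectors)
    nbrs = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vectors[i], vectors[j])
            if s > threshold:
                nbrs[i].append((j, s))
                nbrs[j].append((i, s))
    labels = list(range(n))
    for _ in range(iterations):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            votes = defaultdict(float)
            for j, s in nbrs[i]:
                votes[labels[j]] += s
            if votes:
                labels[i] = max(votes, key=votes.get)
    return labels


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def incremental_cluster(data, p, threshold, iterations=8, seed=0):
    rng = random.Random(seed)            # (21) parameters are fixed up front
    clusters = []                        # each cluster is a list of member vectors
    remaining = list(data)
    while remaining:                     # (26) loop until the data set is empty
        batch, remaining = remaining[:p], remaining[p:]   # (22) and (25)
        groups = defaultdict(list)
        for vec, lab in zip(batch, cluster(batch, threshold, iterations, rng)):
            groups[lab].append(vec)
        candidates = clusters + list(groups.values())
        centers = [mean(c) for c in candidates]           # assumed class centers
        merged = defaultdict(list)
        center_labels = cluster(centers, threshold, iterations, rng)  # (23)
        for members, lab in zip(candidates, center_labels):
            merged[lab].extend(members)                   # (24)
        clusters = list(merged.values())
    return clusters
```

Only the P items of the current batch and the class centers are compared pairwise in each round, which is what keeps the work per batch bounded.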
The following is a specific embodiment of the present invention. The process is as follows:
I. Feature extraction with the CNN + ArcFace Loss method:
1) performing face detection and face alignment with MTCNN;
2) training the model with a ResNet network architecture on the MS-celeb-1M data set; the output feature vector has 512 dimensions.
II. Determining the related parameters:
The P value giving better algorithm performance is selected by experimenting with data of different scales (100, 200, 300, …, 2000).
The invention uses the cosine distance between the feature vectors extracted by the CNN + ArcFace Loss method as the similarity measure of the data. The optimal similarity threshold and the optimal number of iterations are obtained by comparing, one by one, the clustering results for different similarity thresholds and iteration counts.
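As a small illustration of the similarity measure, the sketch below computes cosine similarity between feature vectors and keeps an edge only when the value exceeds the threshold. It assumes that the "cosine distance" in the text is used as a similarity (higher means more alike), consistent with the rule that an edge forms when the similarity is greater than the threshold; the function names are hypothetical, and short 2-D vectors stand in for the 512-dimensional features.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_similarity_graph(features, threshold):
    """Return {(i, j): similarity} for every pair whose similarity exceeds threshold."""
    edges = {}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            s = cosine_similarity(features[i], features[j])
            if s > threshold:
                edges[(i, j)] = s  # the edge weight is the similarity itself
    return edges


# Toy 2-D features; the embodiment uses 512-dimensional vectors.
edges = build_similarity_graph([(1, 0), (1, 0.1), (0, 1)], threshold=0.40)
```

With the toy input only the first two vectors are connected, since the third is orthogonal or nearly orthogonal to both.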
Because the LFW data set is relatively small, its P value is set to 1000; the VGGFace2 and CASIA-Webface data sets are larger, so their P value is set to 2000. The similarity thresholds are 0.49, 0.40 and 0.40, respectively, and the number of iterations is 8.
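To give a feel for why batching by P matters, the following rough count compares the number of pairwise similarities for a full similarity matrix with the number computed inside batches of size P. It is a back-of-the-envelope sketch: it ignores the class-center comparisons made when batches are merged, and the data set size n = 10000 is a hypothetical figure, not one taken from the source.

```python
def full_pairs(n):
    """Pairwise similarities for a full n-by-n similarity matrix."""
    return n * (n - 1) // 2


def batched_pairs(n, p):
    """Pairwise similarities when only items inside each batch of size p are compared."""
    full_batches, rest = divmod(n, p)
    return full_batches * (p * (p - 1) // 2) + rest * (rest - 1) // 2


n, p = 10_000, 1_000           # n is hypothetical; p matches the LFW setting
print(full_pairs(n))           # 49995000
print(batched_pairs(n, p))     # 4995000
```

The within-batch count grows roughly as n × p / 2, in line with the stated O(n × p) complexity, whereas the full matrix grows as n^2 / 2.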

Claims (2)

1. A large-scale dynamic face clustering method, characterized by comprising the following steps:
(1) constructing an undirected graph: each node a_i is assigned an initial class, class(a_i) = i; the similarity between each pair of distinct nodes is computed, and if it exceeds the similarity threshold, an edge is formed between the two nodes with the similarity as its weight;
(2) iteration: a node a_i is selected at random; where several nodes in its neighborhood belong to the same class, their weights are summed; the class j with the largest total weight among all neighbors of the node is taken as the class of the current node;
(3) one iteration is complete once all nodes have been processed; step 2 is repeated until the set number of iterations is reached;
(4) end, yielding k clusters.
2. The large-scale dynamic face clustering method according to claim 1, wherein the dynamic clustering of incremental data in step (2) comprises the following steps:
(21) determining the similarity threshold, the number of iterations, and the P value;
(22) selecting P data items from the data set and clustering them hierarchically;
(23) hierarchically clustering the newly generated class centers together with the existing class centers;
(24) merging the classes whose class centers are grouped into the same class;
(25) deleting the P data items that have been clustered from the data set;
(26) if the data set is not empty, returning to step 2; otherwise, ending.
CN201811327622.XA 2018-11-08 2018-11-08 Large-scale dynamic face clustering method Pending CN111160077A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811327622.XA CN111160077A (en) 2018-11-08 2018-11-08 Large-scale dynamic face clustering method

Publications (1)

Publication Number Publication Date
CN111160077A true CN111160077A (en) 2020-05-15

Family

ID=70555240

Country Status (1)

Country Link
CN (1) CN111160077A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149525A (en) * 2020-09-07 2020-12-29 浙江工业大学 Face recognition method based on Laplace peak clustering
CN114612990A (en) * 2022-03-22 2022-06-10 河海大学 Unmanned aerial vehicle face recognition method based on super-resolution

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200515