CN110991281B - Dynamic face recognition method - Google Patents


Info

Publication number
CN110991281B
CN110991281B
Authority
CN
China
Prior art keywords
face
dimensional
layers
model
convolutional
Prior art date
Legal status
Active
Application number
CN201911145377.5A
Other languages
Chinese (zh)
Other versions
CN110991281A (en)
Inventor
高建彬
蒋文韬
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201911145377.5A
Publication of CN110991281A
Application granted
Publication of CN110991281B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Abstract

The invention provides a dynamic face recognition method comprising the following steps: first, a video is captured by a camera to obtain marked face images; targeted illumination compensation is applied to the marked face images; a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to perform three-dimensional face reconstruction and obtain a three-dimensional face model; features are extracted from the three-dimensional face model to obtain a face feature vector; and face matching is carried out on the face feature vector to realize face recognition. The dynamic face recognition method provided by the invention handles problems such as pose and occlusion well through three-dimensional face reconstruction, and performing face recognition on the CNN-based three-dimensional reconstruction achieves a better dynamic face recognition effect than the prior art.

Description

Dynamic face recognition method
Technical Field
The invention relates to a face recognition technology, in particular to a dynamic face recognition method.
Background
Research on face recognition dates back to the 1960s and 1970s. Face recognition systems have the following advantages: 1. Non-mandatory acquisition: face images can be captured while the user is almost unaware, without requiring the user to cooperate with dedicated acquisition equipment, so sampling is not "compulsory". 2. Non-contact: face images can be obtained without the user directly touching the device. 3. Concurrency: multiple faces can be sorted, judged and recognized simultaneously in practical application scenarios. 4. Accordance with visual habit: recognition matches the human habit of "knowing people by their appearance", so operation is simple, results are intuitive, and concealment is good. Traditional face recognition technology is mainly based on visible-light images, a familiar recognition mode that has been developed for more than 30 years. However, this approach has a defect that is difficult to overcome: when ambient light changes, recognition performance degrades rapidly and cannot meet the needs of practical systems. Techniques proposed to solve the illumination problem include three-dimensional face recognition and thermal-imaging face recognition, but both are still far from mature and their recognition performance is unsatisfactory. In recent years, with the development of deep learning, the success rate of face recognition in controlled environments has become high, but a satisfactory effect is still difficult to achieve in uncontrolled environments, and several factors restrict large-scale commercialization of face recognition systems: illumination change, pose variation, partial occlusion of the face, complex expressions and age change greatly increase the difficulty of recognition. Moreover, deep learning is data-driven, and it is currently impractical to obtain a large number of images of every type for the same person, which also affects detection performance under uncontrolled conditions to some extent. Improving the accuracy and robustness of face recognition therefore has great research value, as well as great application value in fields such as security monitoring, identity authentication, criminal investigation and intelligent control. Early recognition methods were generally based on geometric features of the face, commonly the local shape features of facial organs such as the eyes, nose and mouth. Later, methods based on correlation matching, subspaces and statistics appeared, followed by neural-network approaches: Gutta et al. proposed a hybrid neural network; Lawrence et al. realized sample clustering through a multi-stage SOM and used a convolutional neural network (CNN) for face recognition; Lin et al. adopted a probabilistic-decision-based neural network; and Demers et al. proposed extracting face image features with a principal-component neural network, further compressing the features with an auto-associative neural network, and finally performing recognition with a multilayer perceptron (MLP).
Er et al. used principal component analysis (PCA) for dimensionality reduction, linear discriminant analysis (LDA) for feature extraction, and a radial basis function (RBF) network for recognition. The advantage of neural networks is that implicit expressions of rules and regularities are obtained through the learning process, giving them high adaptability. Finally, recognition methods based on three-dimensional models can overcome the shortcomings of traditional two-dimensional images: Atick et al. extended the eigenface idea, representing the frontal face mainly by a three-dimensional function; Nastar et al. proposed a gray-plane-based method that models the face image as a deformable 3D mesh surface and converts face matching into an elastic matching problem; Anh et al. proposed a 3D face reconstruction algorithm using a deep convolutional neural network, which accurately estimates the 3D face from several pictures of the same face and achieves good results. Generally, three-dimensional face reconstruction is based on multiple pictures. Inspired by the success of deep neural networks (DNNs), in 2017 Dou et al. proposed a DNN-based method for end-to-end 3D face reconstruction from a single 2D image; Jackson et al. realized large-pose three-dimensional face reconstruction from a single image by regressing the three-dimensional face directly from a CNN rather than estimating the parameters of a 3D morphable model (3DMM) through the CNN; and in 2018 Feng Y et al. proposed a reconstruction algorithm that accomplishes both three-dimensional reconstruction and alignment end to end. Lu et al. proposed a three-dimensional reconstruction method recovering geometric details from a single picture: a smooth, coarse 3D face is generated by an example-based bilinear face model, aligning the projections of 3D face landmarks with 2D landmarks detected in the input image; the 3D face is then refined with local corrections under a photometric-consistency constraint; finally, a shape-from-shading step recovers fine geometric details. The method retains high fidelity and can considerably improve recognition performance. Feng et al. evaluated dense three-dimensional reconstruction of in-the-wild 2D images, provided a dedicated data set, and compared the performance of three state-of-the-art 3D reconstruction systems.
However, existing dynamic face recognition technology faces a series of problems. (1) Illumination: under various environmental light sources, side light, backlight, top light, strong highlights and similar phenomena may occur; illumination may differ between time periods and even between positions within the monitored area. (2) Pose and accessories: because dynamic monitoring is non-cooperative, monitored people pass through the area in natural postures, so various non-frontal poses (side face, head down, head up) and accessories (hats, masks, glasses) occur. (3) Camera imaging: many technical parameters of the camera affect video quality, such as sensor size, DSP processing speed, the built-in image-processing chip and the lens, as do built-in settings such as exposure time, aperture and dynamic white balance. (4) Frame loss and missed face detection: network transmission and system computation can cause video frames to be dropped and faces to be missed; in areas with heavy foot traffic, bandwidth and computing-power limits often lead to frame loss and missed face detections.
Disclosure of Invention
In view of these problems, and to overcome the above defects, the invention provides an efficient, reasonable and effective dynamic face recognition method that addresses illumination, pose and related issues and is suitable for effective face recognition in uncontrolled environments. To achieve this object, the dynamic face recognition method of the present invention comprises the following steps:
(1) A video is captured by a high-specification camera (e.g., a Sony SSC-N21), a common face detection algorithm is invoked, and frames that may contain faces are extracted from the video and input to the detector.
(2) The extracted face frames are preprocessed: an illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is then applied to eliminate the effects of shadows, highlights and the like caused by non-uniform frontal illumination.
(3) To eliminate the influence of pose, a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to realize three-dimensional face reconstruction. Residual blocks are combined with ordinary convolution operations, and feature points are added as guidance during reconstruction; because the face feature points are aligned during reconstruction itself, no extra alignment step is needed afterwards.
(4) A residual network model is constructed for feature-vector extraction and trained with a large-margin cosine measure, fine-tuning its network parameters. A skip-level structure is added (drawing on the characteristics of residual networks, the output of each layer can propagate across multiple layers), and the feature-extraction result is optimized by adjusting the number of layers skipped in each training pass and the number of convolutional layers of the model.
(5) The extracted feature vectors are matched to realize face recognition.
The disclosed dynamic face recognition method improves the loss functions used in three-dimensional face reconstruction and in the subsequent recognition of the reconstructed three-dimensional face model, and achieves efficient dynamic face recognition through three-dimensional reconstruction. The invention performs three-dimensional face reconstruction by constructing a CNN-based reconstruction model, extracts features by constructing a residual network, and finally performs face matching. Illumination compensation is applied to the face before feature extraction to eliminate lighting effects; three-dimensional reconstruction handles pose and occlusion problems well; and performing face recognition on the CNN-based three-dimensional reconstruction yields better results than prior methods, so a good effect is obtained in dynamic face recognition.
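By way of orientation, the five steps above can be sketched as a single processing loop. The sketch below is illustrative only: the helper names (detect_faces, compensate_illumination, reconstruct_3d, extract_features, match_feature) are hypothetical placeholders for the components detailed in the embodiments, not functions defined by the patent; some of them are fleshed out in the sketches that follow.

```python
# Minimal outline of the five-step pipeline; all helpers are placeholders.
import cv2

def recognize_from_video(video_path, face_db):
    """Run the five-step dynamic recognition pipeline over a video file."""
    capture = cv2.VideoCapture(video_path)
    results = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        for face_img in detect_faces(frame):             # step (1)
            face_img = compensate_illumination(face_img)  # step (2)
            face_3d = reconstruct_3d(face_img)            # step (3)
            feature = extract_features(face_3d)           # step (4)
            results.append(match_feature(feature, face_db))  # step (5)
    capture.release()
    return results
```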
Drawings
FIG. 1 is a flow chart of the dynamic face recognition implementation provided by the present invention
FIG. 2 is a schematic diagram of the face detection algorithm of the present invention
FIG. 3 is a schematic diagram of a three-dimensional human face reconstruction model according to the present invention
FIG. 4 is a schematic diagram of the residual error network model of the present invention
FIG. 5 is a diagram illustrating a residual block structure
FIG. 6 shows the comparison of the three-dimensional face reconstruction results
FIG. 7 shows a comparison of human face recognition effects
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The invention provides a dynamic face recognition method, as shown in fig. 1, the method comprises the following steps:
Step 1: First, a video is captured by a camera. A face detector is employed (an existing face detector is used): a common face detection algorithm is invoked, image frames that may contain faces are extracted from the video, and each extracted frame is input into the face detector to obtain marked face images. The specific implementation process is shown in fig. 2. The camera is a high-specification model, such as a Sony SSC-N21.
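As a minimal sketch of step 1, assuming OpenCV's stock Haar-cascade detector as a stand-in for the unspecified "common face detection algorithm":

```python
# Step 1 sketch: the Haar cascade is an assumed stand-in detector; the
# patent does not name a specific face detection algorithm.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return cropped face regions (the 'marked face images') from one frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```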
Step 2: Then, to counter the influence of illumination, an illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is applied to the marked face image obtained in step 1, yielding an illumination-compensated marked face image free of the shadow, highlight and other effects caused by non-uniform frontal illumination.
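The illumination-mode parameter space itself is not specified here, so the following sketch substitutes a common compensation technique, CLAHE on the luminance channel, purely to illustrate where step 2 sits in the pipeline:

```python
# Assumed proxy for step 2: CLAHE on the L channel suppresses shadows and
# highlights from non-uniform frontal lighting; not the patented estimator.
import cv2

def compensate_illumination(face_img):
    """Equalize the luminance channel of a BGR face crop."""
    lab = cv2.cvtColor(face_img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
```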
Step 3: Three-dimensional face reconstruction is performed on the illumination-compensated marked face image obtained in step 2. Compared with a two-dimensional face, a three-dimensional face has great advantages: pose problems can be handled effectively, the influence of pose is eliminated, and the recognition effect is improved. The three-dimensional face reconstruction model combines residual blocks with ordinary convolution operations, and feature points are added as guidance during reconstruction. The specific implementation process is shown in fig. 3. Alignment in the three-dimensional face reconstruction model works as follows: feature points added during reconstruction guide the process, so the reconstructed three-dimensional face model directly determines the coordinate values of the feature points; the reconstructed three-dimensional face model is therefore already aligned.
The three-dimensional face reconstruction model adopts a combined encoder-decoder structure. The three-dimensional face is represented volumetrically: the face is regarded as 200 cross-sectional slices from the plane behind the ears to the plane of the nose tip, each slice being a contour of equal depth. Feature-point guidance is added during the work of the encoder and decoder so that the three-dimensional face model is obtained directly.
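A rough PyTorch skeleton of such an encoder-decoder is given below. The input, bottleneck and output dimensions follow the detailed embodiment later in this description (256 x 256 x 3 in, 8 x 8 x 512 bottleneck, 256 x 256 x 3 out), but the layer counts and strides are simplified assumptions chosen so that the spatial sizes close; the 10-residual-block encoder, 17-layer decoder and feature-point guidance of the actual model are not reproduced here.

```python
# Simplified encoder-decoder sketch: face image in, UV position map out.
import torch.nn as nn

class ReconstructionNet(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 512]   # five stride-2 convs: 256 -> 8
        enc = []
        for cin, cout in zip(chans, chans[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU(True)]
        self.encoder = nn.Sequential(*enc)   # (B, 3, 256, 256) -> (B, 512, 8, 8)
        dec = []
        for cin, cout in zip(chans[::-1], chans[::-1][1:]):
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                    nn.ReLU(True)]
        dec[-1] = nn.Sigmoid()               # final Sigmoid, per the embodiment
        self.decoder = nn.Sequential(*dec)   # (B, 512, 8, 8) -> (B, 3, 256, 256)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```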
Step 4: Features are extracted from the three-dimensional face model obtained in step 3. A residual network model is designed for feature extraction and trained and optimized with a large-margin cosine measure, fine-tuning its network parameters. A skip-level structure is added (drawing on the characteristics of residual networks, the output of each layer can propagate across multiple layers), and the recognition result is optimized by adjusting the number of layers skipped in each training pass (the number of layers per skip is the same after adjustment) and the number of convolutional layers of the model. The trained and optimized residual network then extracts the face feature vector of the three-dimensional face model generated in step 3; the specific implementation process is shown in fig. 4. In this embodiment, the residual network model comprises 18 sequentially connected convolutional layers, each using 3 x 3 convolution kernels; layers 1-6 have 64 kernels, layers 7-12 have 128 kernels, and layers 13-18 have 256 kernels. An additional connection is added after every two convolutional layers: the input of the first layer of the pair is fused with the output after the pair and used as the input of the next layer. After multiple skip-and-fuse stages, the result is average-pooled and fed into a fully connected layer, finally yielding the face feature vector of the three-dimensional face model. The purpose of this design is that an ordinary convolutional neural network easily suffers vanishing or exploding gradients as the number of convolutional layers grows; the skip-level structure avoids this problem even in deep network models.
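A sketch of this 18-layer residual feature extractor follows, assuming the input is the 3-channel UV position map of the reconstructed face, and that 1 x 1 projections and stride-2 downsampling are used where the channel count changes (neither detail is stated explicitly in the text):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 conv layers; the pair's input is fused (added) with its output.
    The 1x1 projection on channel/stride changes is an assumption."""
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(cout, cout, 3, padding=1)
        self.proj = (nn.Identity() if cin == cout and stride == 1
                     else nn.Conv2d(cin, cout, 1, stride=stride))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.proj(x))

class FeatureNet(nn.Module):
    """18 conv layers in 9 two-layer blocks (6 at 64, 6 at 128, 6 at 256
    channels), then average pooling and a fully connected layer."""
    def __init__(self, feat_dim=512):
        super().__init__()
        blocks, cin = [], 3
        for cout in [64, 64, 64, 128, 128, 128, 256, 256, 256]:
            blocks.append(BasicBlock(cin, cout, stride=2 if cin != cout else 1))
            cin = cout
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256, feat_dim)

    def forward(self, x):
        return self.fc(self.pool(self.blocks(x)).flatten(1))
```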
To address the shortcomings of the softmax loss, a normalized large-margin cosine loss function is adopted during training, together with the skip-level structure (that is, the number of layers per skip in the residual network model is adjusted), which alleviates problems such as vanishing gradients as depth increases during training.
Step 5: Finally, the face feature vectors extracted in step 4 are compared against a preset face library to realize face matching and thus face recognition. Recognition is implemented with a recognition network, which outputs the recognition result, i.e., which person has been recognized.
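A minimal sketch of the matching in step 5, using cosine similarity against a preset library; the threshold value is an assumed parameter, and the recognition network of the embodiment is reduced here to a nearest-neighbor comparison:

```python
# Step 5 sketch: cosine-similarity lookup in a preset face library.
import numpy as np

def match_feature(feature, face_db, threshold=0.5):
    """face_db: dict mapping person name -> unit-norm feature vector."""
    feature = feature / np.linalg.norm(feature)
    best_name, best_sim = None, -1.0
    for name, ref in face_db.items():
        sim = float(np.dot(feature, ref))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else "unknown"
```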
The core of the invention is end-to-end three-dimensional face reconstruction using the three-dimensional face reconstruction model, together with a degree of improvement to the loss function of the subsequent residual network model. Unlike previous face reconstruction models, the reconstruction model here regresses the corresponding three-dimensional face model directly from the CNN, with no separate alignment needed. Compared with traditional methods, the invention generates a three-dimensional face model in a one-to-one manner, and the reconstruction effect is superior (current traditional methods mainly reconstruct the three-dimensional face model by fitting 3DMM parameters, generating it in a many-to-one manner).
For each three-dimensional face reconstruction, a corresponding UV position map is output; a three-dimensional face model is then produced from each UV position map, and face recognition is performed on each resulting model, specifically outputting which person has been recognized.
The most important part of the method is the three-dimensional face reconstruction: the performance of the reconstruction model determines the face recognition efficiency of the whole method.
The three-dimensional face reconstruction model adopts an encoder-decoder structure. A 256 x 256 x 3 face image is input; encoding produces an 8 x 8 x 512 feature map, and decoding outputs a 256 x 256 x 3 UV position map. The UV position map is a two-dimensional image recording the three-dimensional coordinates of the facial point cloud of the face image; each UV position map preserves semantic information and contains the RGB information of each point of the three-dimensional coordinates of the facial point cloud. The encoding structure is a cascade of 10 identical residual blocks, whose structure is shown in fig. 5. Each residual block comprises 3 convolutional layers: the first uses 1 x 1 kernels, 64 in number; the second uses 3 x 3 kernels, 64 in number; the third uses 1 x 1 kernels, 256 in number. Each convolutional layer is activated by a ReLU function after convolution and fed into the next convolutional layer; finally, the output of the three layers is summed with the input to the block's first convolutional layer and passed through a ReLU activation to give each residual block's output. The decoding structure consists of 17 deconvolution layers: the first layer has 4 x 4 kernels with stride 2 and padding 1, the second has 2 x 2 kernels with stride 1 and padding 0; the odd-numbered layers share the first layer's parameters and the even-numbered layers share the second's, and the image finally obtained through the decoding structure has the same size as the input. The output layer uses a Sigmoid function. The loss function adopted by the three-dimensional face reconstruction model is shown in formula (1):
Loss_1 = Σ_(x,y) ||P(x, y) − P̃(x, y)|| · W(x, y)    (1)

where (x, y) are coordinate values in the UV position map, W(x, y) is the weight of each point in the UV position map, P(x, y) is the predicted UV position map, and P̃(x, y) is the true UV position map information of the current face.
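Formula (1) can be written out directly; in the sketch below the weight map W is taken as given, since the weighting scheme itself is not spelled out in this text:

```python
# Formula (1) in PyTorch: per-point L2 distance between predicted and true
# UV position maps, weighted by W (assumed precomputed, landmark-favoring).
import torch

def uv_position_loss(pred, target, weight_map):
    """pred, target: (B, 3, 256, 256) UV maps; weight_map: (1, 1, 256, 256)."""
    per_point = torch.norm(pred - target, dim=1, keepdim=True)  # ||P - P~|| at each (x, y)
    return (per_point * weight_map).mean()
```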
The three-dimensional face reconstruction model is trained on the 300W-based synthetic data set (300W-LP), which contains annotations of face images at different angles together with estimated 3DMM coefficients. Face images in the training samples are first scaled to 256 x 256; an adaptive moment estimation (Adam) optimizer is then used with an initial learning rate of 0.0001, decayed by half every 5 epochs, and the training batch size is set to 16.
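Under the stated settings, a schematic training loop might look as follows; `loader` stands for an assumed DataLoader over 300W-LP, and the epoch count and weight map are placeholders (ReconstructionNet and uv_position_loss are the sketches above):

```python
# Training setup per the text: Adam at 1e-4, lr halved every 5 epochs,
# batch size 16, 256x256 inputs. Data loading is schematic.
import torch
from torch.optim.lr_scheduler import StepLR

model = ReconstructionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)   # halve lr every 5 epochs
weight_map = torch.ones(1, 1, 256, 256)  # placeholder; real weights favor landmarks
num_epochs = 40                          # assumption; total epochs are not stated

for epoch in range(num_epochs):
    for images, target_uv in loader:     # assumed DataLoader, batch_size=16
        optimizer.zero_grad()
        loss = uv_position_loss(model(images), target_uv, weight_map)
        loss.backward()
        optimizer.step()
    scheduler.step()
```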
The test set for the three-dimensional face reconstruction model is the AFLW2000-3D data set, which contains 2000 unconstrained face images for evaluating three-dimensional face reconstruction and alignment. The experimental effect is shown in fig. 6 (in each row, the first image is the input face image, the second is the result reconstructed with the present three-dimensional face reconstruction model, and the third is the result of VRN-Guided, currently the algorithm with the best reconstruction effect). As fig. 6 shows, the reconstruction effect of the invention is better than that of VRN-Guided.
The reconstructed three-dimensional face is input into the residual network model for feature extraction; the specific structure is shown in fig. 4. Unlike previous feature extraction, the residual network model is trained with an improved large-margin cosine loss function (LMCL), shown in formula (2):
Loss_2 = (1/N) Σ_i −log( e^(s·(cos(θ_yi) − m)) / ( e^(s·(cos(θ_yi) − m)) + Σ_(j≠yi) e^(s·cos(θ_j)) ) )    (2)

Formula (2) states that when an input face corresponds to the current face label to be recognized, a margin m is added during loss computation; otherwise it is not. The margin m further separates the training samples in the training set. Let W_j be the weight of the fully connected layer for recognizing the current face to be recognized in the residual network model, where j is the current face label, and let x' be the input vector. Assuming the fully connected layer bias is 0, the output after the fully connected layer (for example, if the current face label to be recognized is 1, then W_j is the fully connected layer weight for face label 1, i.e., j = 1) is:

f_j = ||W_j|| · ||x'|| · cos(θ_j)    (3)

where θ_j is the angle between W_j and x'. Setting ||W_j|| = 1 and ||x'|| = s (s a constant), with N the total number of training samples in the training set, i the index of any input face in the training set, yi the label of the input face to be recognized, and θ_yi the angle between the input face and the fully connected layer weight W_yi of its class, adding the margin m yields the loss function used in formula (2).
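Formula (2) corresponds to the large-margin cosine loss of CosFace; a sketch in PyTorch follows, with s and m set to commonly used values since the text leaves them unspecified:

```python
# Formula (2) sketch: rows of W and the feature are L2-normalized so that
# ||W_j|| = 1, the feature norm is replaced by the constant scale s, and the
# margin m is subtracted from the target-class cosine only.
import torch.nn.functional as F

def lmcl_loss(features, weights, labels, s=30.0, m=0.35):
    """features: (N, d); weights: (C, d) fully connected weights, bias 0;
    labels: (N,) identity labels. s, m are assumed CosFace-style defaults."""
    cos = F.linear(F.normalize(features), F.normalize(weights))  # (N, C) cosines
    one_hot = F.one_hot(labels, num_classes=weights.size(0)).float()
    logits = s * (cos - m * one_hot)   # margin applied to the target class only
    return F.cross_entropy(logits, labels)
```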
The final effect of the loss function adopted by the residual network model, relative to other loss functions, is shown in fig. 7: the LMCL loss adopted by the invention generally outperforms the loss functions in common use (Softmax Loss, L-Softmax Loss, A-Softmax Loss, etc.) across various data sets (LFW, YTF, MF1 Rank-1, etc.).
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, it should be understood that the scope of the invention is not limited to these specific embodiments. Variations will be obvious to those skilled in the art, and all inventions utilizing the concepts of the present invention are intended to be protected.

Claims (1)

1. A dynamic face recognition method is characterized by comprising the following steps:
Step 1: first, a video is shot by a camera; using a face detector, a face detection algorithm in the detector is invoked, image frames containing faces are extracted from the video, and each extracted face image frame is input into the face detector to obtain a marked face image;
Step 2: then, against the influence of illumination, an illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is applied to the marked face image obtained in step 1 to obtain an illumination-compensated marked face image, eliminating the shadow and highlight effects caused by non-uniform frontal illumination;
Step 3: three-dimensional face reconstruction is performed on the illumination-compensated marked face image obtained in step 2 to eliminate the influence of pose and improve the recognition effect; a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to perform end-to-end three-dimensional face reconstruction and obtain the corresponding three-dimensional face model;
Step 4: features are extracted from the three-dimensional face model obtained in step 3; a residual network model is constructed for feature extraction, trained and optimized with a large-margin cosine measure, and its network parameters are fine-tuned; a skip-level structure is added, i.e., the output of each convolutional layer of the residual network model can propagate across multiple convolutional layers; the recognition result is optimized by adjusting the number of layers skipped in each training pass and the number of convolutional layers of the model, the number of layers per skip being the same after adjustment; the trained and optimized residual network model extracts the face feature vector of the three-dimensional face model generated in step 3;
Step 5: finally, the face feature vectors extracted in step 4 are compared in a preset face library to realize face matching and thus face recognition; recognition is implemented with a recognition network, which outputs the recognition result, i.e., which person has been recognized;
the three-dimensional face reconstruction model in step 3 combines residual blocks with ordinary convolution operations and adopts a combined encoder-decoder structure; the three-dimensional face is represented volumetrically, the face being regarded as 200 cross-sectional slices from the plane behind the ears to the plane of the nose tip, each slice being a contour of equal depth; feature-point guidance is added during the work of the encoder and decoder, and alignment of the three-dimensional face reconstruction model proceeds as follows: feature points added during reconstruction guide the process, so the reconstructed three-dimensional face model directly determines the coordinate values of the feature points and is therefore an aligned three-dimensional face model;
for each three-dimensional face reconstruction a corresponding UV position map is output; a three-dimensional face model is then output for each UV position map, and face recognition is performed on each resulting three-dimensional face model, specifically outputting which person is recognized;
specifically, the three-dimensional face reconstruction model adopts an encoder-decoder structure: a 256 × 256 × 3 face image is input, an 8 × 8 × 512 feature map is formed after encoding, and a 256 × 256 × 3 UV position map is output after decoding, the UV position map being a two-dimensional image recording the three-dimensional coordinates of the facial point cloud of the face image, each UV position map preserving semantic information and containing the RGB information of each point of the three-dimensional coordinates of the facial point cloud; the encoding structure is a cascade of 10 identical residual block structures, each comprising 3 convolutional layers: the first layer uses 1 × 1 kernels, 64 in number, the second uses 3 × 3 kernels, 64 in number, and the third uses 1 × 1 kernels, 256 in number; each convolutional layer is activated by a ReLU function after convolution and fed into the next convolutional layer, and finally the output of the three-layer residual block structure is summed with the input to the block's first convolutional layer and activated by a ReLU function to give the output of each residual block structure; the decoding structure consists of 17 deconvolution layers, the first with 4 × 4 kernels, stride 2 and padding 1, the second with 2 × 2 kernels, stride 1 and padding 0; the odd-numbered deconvolution layers share the parameters of the first layer and the even-numbered layers those of the second, the image finally obtained through the decoding structure has the same size as the input, and the final output layer uses a Sigmoid function; the loss function adopted by the three-dimensional face reconstruction model is:
Loss_1 = Σ_(x,y) ||P(x, y) − P̃(x, y)|| · W(x, y)

where (x, y) are coordinate values in the UV position map, W(x, y) is the weight of each point in the UV position map, P(x, y) is the predicted UV position map, and P̃(x, y) is the true UV position map information of the current face;
the three-dimensional face reconstruction model is trained using the 300W-based synthetic data set 300W-LP as the training set, which contains annotations of face images at different angles and estimated 3DMM coefficients; the face images in the training samples are first scaled to 256 × 256, an adaptive moment estimation (Adam) optimizer is then used with a learning rate starting at 0.0001 and decayed by half every 5 epochs, and the training batch size is set to 16;
the test set of the three-dimensional face reconstruction model is the data set AFLW2000-3D, which contains 2000 unconstrained face images for evaluating three-dimensional face reconstruction and alignment;
the residual network model adopted in step 4 comprises 18 sequentially connected convolutional layers, each using 3 × 3 convolution kernels; the number of kernels is 64 in convolutional layers 1-6, 128 in layers 7-12 and 256 in layers 13-18; an additional connection is added after every two convolutional layers, i.e., the input of the first layer of the pair is fused with the output after the pair and used as the input of the next convolutional layer; after multiple skip-and-fuse stages the result is average-pooled and input into the fully connected layer, finally yielding the face feature vector of the three-dimensional face model, and this skip structure avoids the vanishing- or exploding-gradient problem that arises as depth increases during training;
the residual network model is trained with an improved cosine-measure-based loss function (LMCL), the loss function of the residual network model being:
Loss_2 = (1/N) Σ_i −log( e^(s·(cos(θ_yi) − m)) / ( e^(s·(cos(θ_yi) − m)) + Σ_(j≠yi) e^(s·cos(θ_j)) ) )
where m is a margin that further separates the training samples in the training set of the residual network model; if the weight of the fully connected layer for recognizing the current face to be recognized is W_j, with j the current face label in the residual network model and x' the input vector, then assuming the fully connected layer bias is 0, the output after the fully connected layer is:

f_j = ||W_j|| · ||x'|| · cos(θ_j)

where θ_j is the angle between W_j and x'; setting ||W_j|| = 1 and ||x'|| = s with s a constant, N denoting the total number of training samples in the training set of the residual network model, i the index of any input face in the training set, yi the label of the input face to be recognized, and θ_yi the angle between the input face and the fully connected layer weight W_yi of its class, adding the margin m yields the loss function Loss_2.
CN201911145377.5A 2019-11-21 2019-11-21 Dynamic face recognition method Active CN110991281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911145377.5A CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911145377.5A CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Publications (2)

Publication Number Publication Date
CN110991281A CN110991281A (en) 2020-04-10
CN110991281B true CN110991281B (en) 2022-11-04

Family

ID=70085444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911145377.5A Active CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Country Status (1)

Country Link
CN (1) CN110991281B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488857A (en) * 2020-04-29 2020-08-04 北京华捷艾米科技有限公司 Three-dimensional face recognition model training method and device
CN112001268B (en) * 2020-07-31 2024-01-12 中科智云科技有限公司 Face calibration method and equipment
CN111950496B (en) * 2020-08-20 2023-09-15 广东工业大学 Mask person identity recognition method
CN112200006A (en) * 2020-09-15 2021-01-08 青岛邃智信息科技有限公司 Human body attribute detection and identification method under community monitoring scene
CN112489205A (en) * 2020-12-16 2021-03-12 北京航星机器制造有限公司 Method for manufacturing simulated human face
CN112507916B (en) * 2020-12-16 2021-07-27 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression
CN112686202B (en) * 2021-01-12 2023-04-25 武汉大学 Human head identification method and system based on 3D reconstruction
CN112819928B (en) * 2021-01-27 2022-10-28 成都数字天空科技有限公司 Model reconstruction method and device, electronic equipment and storage medium
CN112882666A (en) * 2021-03-15 2021-06-01 上海电力大学 Three-dimensional modeling and model filling-based 3D printing system and method
CN113255466A (en) * 2021-04-30 2021-08-13 广州有酱料网络科技有限公司 Sauce supply chain logistics monitoring system
CN113468984A (en) * 2021-06-16 2021-10-01 哈尔滨理工大学 Crop pest and disease leaf identification system, identification method and pest and disease prevention method
CN113469269A (en) * 2021-07-16 2021-10-01 上海电力大学 Residual convolution self-coding wind-solar-charged scene generation method based on multi-channel fusion
CN113705393A (en) * 2021-08-16 2021-11-26 武汉大学 3D face model-based depression angle face recognition method and system
CN116188612A (en) * 2023-02-20 2023-05-30 信扬科技(佛山)有限公司 Image reconstruction method, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090403A (en) * 2016-11-22 2018-05-29 上海银晨智能识别科技有限公司 A kind of face dynamic identifying method and system based on 3D convolutional neural networks
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN109299643A (en) * 2018-07-17 2019-02-01 深圳职业技术学院 A kind of face identification method and system based on big attitude tracking
CN110020620A (en) * 2019-03-29 2019-07-16 中国科学院深圳先进技术研究院 Face identification method, device and equipment under a kind of big posture
CN110046551A (en) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 A kind of generation method and equipment of human face recognition model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170000748A (en) * 2015-06-24 2017-01-03 삼성전자주식회사 Method and apparatus for face recognition
JP6754619B2 (en) * 2015-06-24 2020-09-16 三星電子株式会社Samsung Electronics Co.,Ltd. Face recognition method and device
CN108229322B (en) * 2017-11-30 2021-02-12 北京市商汤科技开发有限公司 Video-based face recognition method and device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090403A (en) * 2016-11-22 2018-05-29 上海银晨智能识别科技有限公司 A kind of face dynamic identifying method and system based on 3D convolutional neural networks
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN109299643A (en) * 2018-07-17 2019-02-01 深圳职业技术学院 A kind of face identification method and system based on big attitude tracking
CN110046551A (en) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 A kind of generation method and equipment of human face recognition model
CN110020620A (en) * 2019-03-29 2019-07-16 中国科学院深圳先进技术研究院 Face identification method, device and equipment under a kind of big posture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Josef Kittler et al. Conformal Mapping of a 3D Face Representation onto a 2D Image for CNN Based Face Recognition. 2018 International Conference on Biometrics (ICB). 2018, 124-131. *
Xia Yangyang. Research on Face Recognition under Unconstrained Conditions Based on Deep Learning. China Masters' Theses Full-text Database (Information Science and Technology), 2017. *

Also Published As

Publication number Publication date
CN110991281A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110991281B (en) Dynamic face recognition method
Lee et al. From big to small: Multi-scale local planar guidance for monocular depth estimation
Li et al. In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking
CN110119686B (en) Safety helmet real-time detection method based on convolutional neural network
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN112766158B (en) Multi-task cascading type face shielding expression recognition method
CN106650653B (en) Construction method of human face recognition and age synthesis combined model based on deep learning
CN112418095B (en) Facial expression recognition method and system combined with attention mechanism
CN107403142B (en) A kind of detection method of micro- expression
CA2934514A1 (en) System and method for identifying faces in unconstrained media
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
CN112418041B (en) Multi-pose face recognition method based on face orthogonalization
CN109960975B (en) Human face generation and human face recognition method based on human eyes
WO2023040679A1 (en) Fusion method and apparatus for facial images, and device and storage medium
Yu et al. Detecting deepfake-forged contents with separable convolutional neural network and image segmentation
CN116645917A (en) LED display screen brightness adjusting system and method thereof
CN115484410A (en) Event camera video reconstruction method based on deep learning
CN113850182A (en) Action identification method based on DAMR-3 DNet
Chen et al. Face recognition with masks based on spatial fine-grained frequency domain broadening
CN116091793A (en) Light field significance detection method based on optical flow fusion
CN115546828A (en) Method for recognizing cow faces in complex cattle farm environment
CN114898447A (en) Personalized fixation point detection method and device based on self-attention mechanism
CN114067187A (en) Infrared polarization visible light face translation method based on countermeasure generation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant