CN110991281A - Dynamic face recognition method - Google Patents

Dynamic face recognition method

Info

Publication number
CN110991281A
Authority
CN
China
Prior art keywords
face
dimensional
model
layers
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911145377.5A
Other languages
Chinese (zh)
Other versions
CN110991281B (en)
Inventor
高建彬
蒋文韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201911145377.5A priority Critical patent/CN110991281B/en
Publication of CN110991281A publication Critical patent/CN110991281A/en
Application granted granted Critical
Publication of CN110991281B publication Critical patent/CN110991281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a dynamic face recognition method, which comprises the following steps: firstly, a video is shot by a camera and marked face images are obtained from it; targeted illumination compensation is performed on each marked face image; a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to carry out three-dimensional face reconstruction and obtain a three-dimensional face model; features are extracted from the three-dimensional face model to obtain a face feature vector; and face matching is carried out on the face feature vector, thereby realizing face recognition. With the dynamic face recognition method provided by the invention, problems such as pose and occlusion are well resolved through three-dimensional face reconstruction, and performing face recognition on the CNN-based three-dimensional face reconstruction achieves a better dynamic face recognition effect than the prior art.

Description

Dynamic face recognition method
Technical Field
The invention relates to a face recognition technology, in particular to a dynamic face recognition method.
Background
The research of face recognition dates back to the 1960s and 1970s. A face recognition system has the following advantages:
1. Non-mandatory acquisition: face images can be captured while the user is almost unaware of it, without requiring the user to cooperate with dedicated acquisition equipment; the sampling is not compulsory.
2. Non-contact: the user's face image can be acquired without direct contact with the equipment.
3. Concurrency: multiple faces can be sorted, judged and recognized simultaneously in practical application scenarios.
4. Accordance with visual habit: "knowing people by their faces" makes the operation simple and the results intuitive, and the acquisition is unobtrusive.
Traditional face recognition technology is mainly based on visible-light images, the most familiar recognition mode, and has been developed for more than 30 years. However, it has a defect that is difficult to overcome: when the ambient lighting changes, the recognition performance drops rapidly, which cannot meet the needs of practical systems. Technologies proposed to solve the illumination problem include three-dimensional face recognition and thermal-imaging face recognition, but both are still far from mature and their recognition performance is not yet satisfactory. In recent years, with the development of deep learning, face recognition in controlled environments has achieved high success rates, but satisfactory results remain difficult in uncontrolled environments, and several factors restrict large-scale commercialization of face recognition systems: illumination change, pose variation, partial occlusion of the face, complex expressions and age change all greatly increase the difficulty of face recognition. Moreover, deep learning is data-driven, and it is currently impractical to obtain a large number of images of all kinds for the same person, which also affects detection performance under uncontrolled conditions to a certain extent. Improving the accuracy and robustness of face recognition therefore has great research value, with wide applications in security monitoring, identity authentication, criminal investigation, intelligent control and other fields.
Early recognition methods were generally based on geometric features of the face, commonly the local shape features of facial organs such as the eyes, nose and mouth. Methods based on correlation matching, subspaces and statistics appeared afterwards, followed by neural-network approaches: Gutta et al. proposed a hybrid neural network; Lawrence et al. realized sample clustering through a multi-stage SOM and used a convolutional neural network (CNN) for face recognition; Lin et al. adopted a probabilistic decision-based neural network; and Demers et al. proposed extracting face image features with a principal-component neural network, further compressing the features with an autocorrelation neural network, and finally performing face recognition with a multilayer perceptron (MLP).
Er et al. used principal component analysis (PCA) for dimensionality reduction, then extracted features with linear discriminant analysis (LDA), and performed face recognition with a radial basis function (RBF) network. The advantage of neural networks is that an implicit representation of the underlying rules is obtained through the learning process, and they are highly adaptive. Finally, recognition methods based on a three-dimensional model can overcome the shortcomings of traditional two-dimensional images. Atick et al. extended the eigenface idea, mainly representing the frontal face by a three-dimensional function; Nastar et al. proposed a method based on the gray-level surface, modeling the face image as a deformable 3D mesh surface and converting the face matching problem into an elastic matching problem; Anh et al. proposed a 3D face reconstruction algorithm that reconstructs the 3D face with a deep convolutional neural network and can accurately estimate the 3D face from several pictures of the same person, achieving good results. In general, three-dimensional face reconstruction had been based on multiple pictures. Since 2017, inspired by the success of deep neural networks (DNN), Dou et al. proposed a DNN-based method for end-to-end 3D face reconstruction from a single 2D image; Jackson et al. realized large-pose three-dimensional face reconstruction from a single image, regressing the 3D face directly with a CNN instead of estimating the parameters of a 3D morphable model (3DMM) through the CNN; and in 2018, Feng Y et al. proposed a reconstruction algorithm that accomplishes three-dimensional reconstruction and end-to-end alignment simultaneously. Lu et al. proposed a three-dimensional reconstruction method that recovers geometric details from a single picture: by aligning the projections of 3D face landmarks with the 2D landmarks detected in the input image, a smooth, coarse 3D face is generated with an example-based bilinear face model; the 3D face is then refined with local corrections under a photometric-consistency constraint; finally, a shape-from-shading method is applied to recover fine geometric details. The method maintains high fidelity and can also considerably improve the recognition result. Feng et al. evaluated dense three-dimensional reconstruction from 2D images in the wild, provided a dedicated data set, and compared the performance of three of the more advanced 3D reconstruction systems.
However, existing dynamic face recognition technology faces a series of problems. (1) Illumination: under various environmental light sources, side light, backlight, top light and highlights may all occur; the illumination may differ between time periods, and even between positions within the monitored area. (2) Pose and accessories: because dynamic monitoring is non-cooperative, the monitored people pass through the monitored area in natural postures, so various non-frontal poses such as profile, head-down and head-up views occur, along with accessories such as hats, masks and glasses. (3) Camera imaging: many technical parameters of the camera affect video image quality, such as the sensor size, the DSP processing speed, the built-in image processing chip and the lens; some settings inside the camera, such as exposure time, aperture and dynamic white balance, also affect video quality. (4) Frame loss and missed face detections: the network transmission and system computation required for recognition can cause video frames to be dropped and faces to be missed; in areas with heavy pedestrian traffic, bandwidth and computing-power limitations often lead to frame loss and missed face detections.
Disclosure of Invention
Aiming at the above problems, and in order to overcome the above defects, the invention provides an efficient, reasonable and effective dynamic face recognition method that solves a series of problems such as illumination and pose, and is suitable for effective face recognition in uncontrolled environments. To achieve the above object, the dynamic face recognition method of the present invention comprises the following steps:
(1) A video is captured by a high-definition camera (e.g., a Sony SSC-N21); a common face detection algorithm is invoked, and frames that may contain faces are extracted from the video and input to the detector.
(2) The extracted face frames are preprocessed: the illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is then performed to eliminate the effects of shadows, highlights and the like caused by non-uniform frontal illumination.
(3) To eliminate the influence of pose, a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to realize three-dimensional face reconstruction; residual blocks are combined with ordinary convolution operations, and feature points are added as guidance during reconstruction. Because face feature-point alignment is carried out according to these feature points during reconstruction, no extra alignment step is needed afterwards.
(4) A residual network model is constructed for extracting feature vectors; it is trained with a large margin cosine measure, its network parameters are fine-tuned, and a skip-level structure is added (drawing on the characteristics of residual networks, the output of each layer can propagate across multiple layers). The feature-vector extraction result is optimized by adjusting the number of layers skipped in each pass and the number of convolutional layers in the residual network model.
(5) The extracted feature vectors are matched to realize face recognition.
The invention discloses a dynamic face recognition method that improves the loss functions used in three-dimensional face reconstruction and in the subsequent recognition of the three-dimensional face model, and achieves efficient dynamic face recognition through three-dimensional face reconstruction. The invention performs three-dimensional face reconstruction by constructing a CNN-based three-dimensional face reconstruction model, then extracts features by constructing a residual network, and finally performs face matching. Illumination compensation is applied to the face before feature extraction to eliminate the influence of illumination; at the same time, problems such as pose and occlusion are well resolved through three-dimensional face reconstruction, and performing face recognition on the CNN-based three-dimensional face reconstruction also achieves a better effect than the prior art, so a good result is obtained in dynamic face recognition.
Drawings
FIG. 1 is a flow chart of the dynamic face recognition implementation provided by the present invention
FIG. 2 is a schematic diagram of the face detection model of the present invention
FIG. 3 is a schematic diagram of a three-dimensional human face reconstruction model according to the present invention
FIG. 4 is a schematic diagram of the residual error network model of the present invention
FIG. 5 is a diagram illustrating a residual block structure
FIG. 6 shows the comparison of three-dimensional face reconstruction results
FIG. 7 shows a comparison of human face recognition effects
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The invention provides a dynamic face recognition method, as shown in fig. 1, the method comprises the following steps:
Step 1: First, a video is shot by a camera. A face detector (an existing face detector is used) is employed for this purpose: a common face detection algorithm is invoked to extract image frames that may contain faces from the video, and each extracted frame is input into the face detector to obtain a marked face image. The specific implementation process is shown in fig. 2. The camera is a high-definition camera, such as a Sony SSC-N21.
Step 2: Then, to address the influence of illumination, the illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is performed on the marked face image obtained in step 1 to obtain an illumination-compensated marked face image, eliminating the effects of shadows, highlights and the like caused by non-uniform frontal illumination.
Step 3: Three-dimensional face reconstruction is performed on the illumination-compensated marked face image obtained in step 2. Compared with a two-dimensional face, a three-dimensional face has great advantages: problems such as pose can be effectively resolved, the influence of pose is eliminated, and the recognition effect is improved. The three-dimensional face reconstruction model combines residual blocks with ordinary convolution operations, and feature points are added as guidance during reconstruction. The specific implementation process is shown in fig. 3. The alignment of the three-dimensional face reconstruction model is performed as follows: the feature points added as guidance during three-dimensional face reconstruction allow the reconstructed three-dimensional face model to determine the coordinate values of the feature points directly, so the reconstructed three-dimensional face model is already aligned.
The three-dimensional face reconstruction model is realized with a structure combining an encoder and a decoder, and the three-dimensional face is represented volumetrically: the face is regarded as 200 cross slices from the behind-the-ear plane to the nose-tip plane, each cross slice being an equal-height contour, and feature-point guidance is added while the encoder and decoder work, so that the three-dimensional face model is obtained directly.
Step 4: Feature extraction is performed on the three-dimensional face model obtained in step 3. A residual network model is designed for feature extraction; it is trained and optimized with a large margin cosine measure, its network parameters are fine-tuned, and a skip-level structure is added (drawing on the characteristics of residual networks, the output of each layer can propagate across multiple layers). The recognition result is optimized by adjusting the number of layers skipped in each training pass (the number of layers per jump is the same after adjustment) and the number of convolutional layers in the residual network model, and the face feature vector of the three-dimensional face model generated in step 3 is extracted with the trained, optimized residual network model. The specific implementation process is shown in fig. 4. In this embodiment, the residual network model comprises 18 convolutional layers connected in sequence, each convolved with kernels of the same 3 × 3 size; the number of kernels is 64 in convolutional layers 1-6, 128 in layers 7-12 and 256 in layers 13-18. An additional connection is added after every two convolutional layers: the input of the first layer of the pair is fused with the output of the pair and used as the input of the next layer. After multiple such jumps and fusions, the result is average-pooled and fed into a fully connected layer, finally yielding the face feature vector of the three-dimensional face model. The purpose is that, for an ordinary convolutional neural network, the problem of vanishing or exploding gradients arises easily as the number of convolutional layers increases; the skip-level structure avoids this problem even in a deep network model. A sketch of such a network is given below.
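For illustration only, the following is a minimal PyTorch sketch of the feature-extraction network described above. The 1 × 1 projection used where the channel count changes across a skip connection, the input channel count and the output feature dimension (512) are assumptions; the patent specifies only the 18 layers, their kernel counts and the pairwise skip connections.

```python
import torch
import torch.nn as nn

class SkipPair(nn.Module):
    """Two 3x3 convolutional layers whose pair input is fused (added) with the
    pair output, as described in step 4. The 1x1 projection used when the
    channel count changes is an assumption."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.proj = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.proj(x))  # fuse pair input with pair output

class FeatureNet(nn.Module):
    """18 convolutional layers as 9 skip pairs: layers 1-6 with 64 kernels,
    7-12 with 128, 13-18 with 256, then average pooling and a fully connected
    layer producing the face feature vector."""
    def __init__(self, in_ch=3, feat_dim=512):
        super().__init__()
        widths = [64, 64, 64, 128, 128, 128, 256, 256, 256]
        pairs, prev = [], in_ch
        for w in widths:
            pairs.append(SkipPair(prev, w))
            prev = w
        self.body = nn.Sequential(*pairs)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256, feat_dim)

    def forward(self, x):
        return self.fc(self.pool(self.body(x)).flatten(1))
```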
To address the shortcomings of the softmax loss, a normalized large margin cosine loss is adopted during training, and the skip-level structure (i.e., adjusting the number of layers per jump in the residual network model) is used to overcome problems such as vanishing gradients as depth increases during training.
Step 5: Finally, the face feature vectors extracted in step 4 are compared against a preset face library to realize face matching and hence face recognition. The recognition is implemented with a recognition network, which outputs the recognition result, i.e., which person has been recognized. A sketch of the matching step follows.
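For illustration, a minimal sketch of feature-vector matching against a preset face library, assuming cosine similarity as the comparison measure (consistent with the cosine-based training) and a hypothetical acceptance threshold; the patent itself does not fix these details.

```python
import numpy as np

def match_face(feature, face_db, threshold=0.5):
    """Return the library identity whose enrolled vector is most similar to
    `feature`, or None if no similarity reaches the (assumed) threshold.
    face_db maps person id -> enrolled feature vector."""
    feature = feature / np.linalg.norm(feature)
    best_id, best_sim = None, -1.0
    for person_id, enrolled in face_db.items():
        sim = float(feature @ (enrolled / np.linalg.norm(enrolled)))
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id if best_sim >= threshold else None
```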
The core of the invention is to carry out end-to-end three-dimensional face reconstruction with a three-dimensional face reconstruction model, and to make a certain improvement to the loss function of the subsequent residual network model. The three-dimensional face reconstruction model in the invention differs from previous face reconstruction models in that the corresponding three-dimensional face model is obtained directly by CNN regression, without a separate alignment step. Compared with the traditional approach, the invention can generate a three-dimensional face model in a one-to-one manner, with a reconstruction effect superior to the traditional approach (the current mainstream traditional approach reconstructs the three-dimensional face model by fitting 3DMM parameters and generates it in a many-to-one manner).
For each three-dimensional face reconstruction, a corresponding UV position map is output; a three-dimensional face model is then output from each UV position map, and face recognition is performed on each obtained three-dimensional face model, specifically outputting which person has been recognized.
The most important part of the method is the three-dimensional face reconstruction part, and the performance of the three-dimensional face reconstruction model influences the face recognition efficiency of the whole method.
The three-dimensional face reconstruction model adopts an encoder-decoder structure. A 256 × 256 × 3 face image is input; after encoding, an 8 × 8 × 512 feature map is formed; after decoding, a 256 × 256 × 3 UV position map is output. The UV position map is a two-dimensional image that records the three-dimensional coordinates of the facial point cloud of the whole face image; each UV position map preserves semantic information and contains the RGB information of every point in the three-dimensional coordinates of the facial point cloud. The encoding structure is a cascade of 10 identical residual blocks, whose structure is shown in fig. 5. Each residual block comprises 3 convolutional layers: the first layer uses identical 1 × 1 kernels, 64 of them; the second layer uses identical 3 × 3 kernels, 64 of them; the third layer uses identical 1 × 1 kernels, 256 of them. Each convolutional layer is activated by a ReLU function after convolution and fed into the next convolutional layer; finally, the output of the three-layer residual block and the input to the block's first convolutional layer are summed and passed through a ReLU activation to give the block's output. The decoding structure consists of 17 deconvolution (transposed convolution) layers: the first layer uses a deconvolution kernel of size 4 × 4 with stride 2 and padding 1, and the second layer a kernel of size 2 × 2 with stride 1 and padding 0; the odd-numbered layers among the 17 share the parameters of the first layer and the even-numbered layers those of the second, so that the final image has the same size as the input. The output layer uses a Sigmoid function. A sketch of the encoder under stated assumptions is given below.
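The following is a minimal PyTorch sketch of the residual block of fig. 5 and the 10-block encoder, for illustration only. The patent does not state how the encoder reduces 256 × 256 to 8 × 8 or reaches 512 channels, so the stride-2 placement, the shortcut projections and the final 1 × 1 channel expansion are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block of fig. 5: 1x1 (64) -> 3x3 (64) -> 1x1 (256), each layer
    ReLU-activated, with the block input summed onto the output before a final
    ReLU. The shortcut projection and the stride are assumptions."""
    def __init__(self, in_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, 1), nn.ReLU(inplace=True))
        self.shortcut = (nn.Identity() if in_ch == 256 and stride == 1
                         else nn.Conv2d(in_ch, 256, 1, stride=stride))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

def make_encoder():
    """10 cascaded residual blocks; stride 2 on every other block halves the
    256x256 input five times down to 8x8 (an assumption), and a trailing 1x1
    convolution expands 256 channels to the stated 512 (also an assumption)."""
    blocks, in_ch = [], 3
    for i in range(10):
        blocks.append(ResBlock(in_ch, stride=2 if i % 2 == 0 else 1))
        in_ch = 256
    blocks.append(nn.Conv2d(256, 512, 1))
    return nn.Sequential(*blocks)

feat = make_encoder()(torch.randn(1, 3, 256, 256))  # -> (1, 512, 8, 8)
```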
The loss function adopted by the three-dimensional face reconstruction model is shown in formula (1):

$$\mathrm{Loss}_1=\sum_{(x,y)}\left\|P(x,y)-\tilde{P}(x,y)\right\|\cdot W(x,y)\qquad(1)$$

wherein (x, y) denotes coordinate values in the UV position map, W(x, y) denotes the weight of each point in the UV position map, P(x, y) denotes the predicted UV position map, and $\tilde{P}(x,y)$ denotes the real UV position map information of the current face.
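Purely as an illustration, formula (1) can be computed as below; treating the per-point norm as a Euclidean distance over the three coordinate channels is an assumption, since the patent writes a generic norm.

```python
import torch

def uv_position_loss(pred, target, weight):
    """Formula (1): pred and target are (B, 3, 256, 256) UV position maps and
    weight is a (256, 256) mask giving each point's importance."""
    per_point = (pred - target).norm(dim=1)   # ||P(x,y) - P~(x,y)|| at each point
    return (per_point * weight).sum(dim=(1, 2)).mean()
```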
The three-dimensional face reconstruction model is trained with a synthetic data set based on 300W (300W-LP) as the training set, which contains annotations of face images at different angles and estimated 3DMM coefficients. The face images in the training samples are first scaled to 256 × 256; an adaptive moment estimation (Adam) optimizer is then used, with a learning rate starting at 0.0001 and halved every 5 epochs, and the training batch size is set to 16. A sketch of this training configuration follows.
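A minimal sketch of the stated training configuration, reusing make_encoder and uv_position_loss from the sketches above. The upsampling head standing in for the 17-layer decoder, the epoch count and train_loader are hypothetical placeholders, not details given in the patent.

```python
import torch
import torch.nn as nn

# Stand-in model: the encoder sketch above plus a crude upsampling head in
# place of the 17-layer deconvolution decoder, so shapes match formula (1).
model = nn.Sequential(make_encoder(),
                      nn.Upsample(scale_factor=32),
                      nn.Conv2d(512, 3, 1),
                      nn.Sigmoid())

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr 0.0001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)  # halve every 5 epochs

num_epochs = 30  # assumption; the patent does not state the epoch count
for epoch in range(num_epochs):
    for images, target_uv, weight in train_loader:  # 300W-LP batches of 16 (placeholder loader)
        optimizer.zero_grad()
        loss = uv_position_loss(model(images), target_uv, weight)
        loss.backward()
        optimizer.step()
    scheduler.step()
```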
The test set for the three-dimensional face reconstruction model is the data set AFLW2000-3D, which contains 2000 unconstrained face images for evaluating three-dimensional face reconstruction and alignment. The experimental results are shown in fig. 6 (where the first image in each row is the input face image, the second is the result reconstructed with the three-dimensional face reconstruction model, and the third is the result obtained with VRN-Guide, currently the best-performing reconstruction algorithm); as can be seen from fig. 6, the reconstruction effect of the present invention is superior to that of VRN-Guide.
The three-dimensional face reconstruction result is input into the above residual network model for feature extraction; the specific structure is shown in fig. 4. Unlike previous feature extraction, the residual network model is trained with an improved large margin cosine loss (LMCL); the loss function of the residual network model is shown in formula (2):
$$\mathrm{Loss}_2=\frac{1}{N}\sum_{i=1}^{N}-\log\frac{e^{s\left(\cos\theta_{y_i}-m\right)}}{e^{s\left(\cos\theta_{y_i}-m\right)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}\qquad(2)$$

Formula (2) expresses that the margin m is added when computing the loss only when the input face number to be recognized equals the current face label to be recognized; otherwise it is not added. Here m is a margin that further separates the training samples in the training set. If the weight of the fully connected layer for identifying the current face to be recognized in the residual network model is $W_j$, where j denotes the current face label to be recognized (for example, if the current face label to be recognized is 1, then $W_j$ is the fully connected layer weight for face label 1 and j = 1), and the input vector is x′, then, assuming the bias of the fully connected layer is 0, the output after the fully connected layer is

$$f_j=\|W_j\|\cdot\|x'\|\cdot\cos(\theta_j)\qquad(3)$$

wherein $\theta_j$ is the angle between $W_j$ and x′. Setting $\|W_j\|=1$ and $\|x'\|=s$ (s a constant), with N the total number of training samples in the training set, i the label of any input face in the training set, $y_i$ the number of the input face to be recognized, and $\theta_{y_i}$ the angle between the input face to be recognized and the fully connected layer weight $W_{y_i}$ of the current face to be recognized, adding the margin m yields the loss function used in formula (2).
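For illustration, a minimal sketch of formula (2). Normalizing the rows of W so that ‖W_j‖ = 1 and replacing ‖x′‖ by the constant scale s follows the text above, while the concrete values s = 30 and m = 0.35 are assumptions, not values given in the patent.

```python
import torch
import torch.nn.functional as F

def lmcl_loss(features, weights, labels, s=30.0, m=0.35):
    """Large margin cosine loss of formula (2). features: (B, D) input vectors
    x'; weights: (C, D) fully connected weights, one row W_j per face label;
    labels: (B,) face numbers y_i. The margin m is subtracted from
    cos(theta_{y_i}) for the true label only."""
    cos = F.linear(F.normalize(features), F.normalize(weights))  # cos(theta_j), shape (B, C)
    margin = torch.zeros_like(cos)
    margin.scatter_(1, labels.unsqueeze(1), m)   # apply m only at the true class
    return F.cross_entropy(s * (cos - margin), labels)
```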
The final effect of the loss function adopted by the residual network model relative to other loss functions is shown in fig. 7: the effect of the loss function adopted by the invention (LMCL) is generally better than that of the loss functions commonly used at present (Softmax Loss, L-Softmax Loss, A-Softmax Loss, etc.) on various data sets (LFW, YTF, MF1 RANK1, etc.).
Although illustrative embodiments of the present invention have been described above to facilitate understanding of the present invention by those skilled in the art, it should be understood that the scope of the present invention is not limited to these specific embodiments. Variations will be apparent to those skilled in the art, and all inventions that make use of the concepts of the present invention are intended to be protected.

Claims (3)

1. A dynamic face recognition method is characterized by comprising the following steps:
step 1: firstly, shooting a video through a camera, using a face detector, invoking a face detection algorithm in the face detector, extracting image frames containing faces from the video, and inputting each extracted face image frame into the face detector to obtain a marked face image;
step 2: then, aiming at the influence of illumination, estimating an illumination mode using an illumination-mode parameter space, and carrying out targeted illumination compensation on the marked face image obtained in step 1 to obtain an illumination-compensated marked face image, eliminating the influence of shadows and highlights caused by non-uniform frontal illumination;
step 3: performing three-dimensional face reconstruction on the illumination-compensated marked face image obtained in step 2, to eliminate the influence of pose and improve the recognition effect, wherein a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to perform end-to-end three-dimensional face reconstruction and obtain a corresponding three-dimensional face model;
step 4: performing feature extraction on the three-dimensional face model obtained in step 3, wherein a residual network model is constructed for the feature extraction, the residual network model is trained and optimized with a large margin cosine measure, its network parameters are fine-tuned, and a skip-level structure is added, i.e., the output of each convolutional layer of the residual network model can propagate across multiple convolutional layers; the recognition result is optimized by adjusting the number of layers skipped in each training pass and the number of convolutional layers of the residual network model, the number of layers skipped per jump being the same during training after the adjustment; and the face feature vector of the three-dimensional face model generated in step 3 is extracted with the trained, optimized residual network model;
step 5: finally, comparing the face feature vectors extracted in step 4 in a preset face library to realize face matching and hence face recognition, wherein the face recognition is implemented with a recognition network, and the recognition result, i.e., which person is recognized, is output through the recognition network.
2. The dynamic face recognition method according to claim 1, wherein the three-dimensional face reconstruction model in step 3 is implemented with a structure combining an encoder and a decoder, in a manner that combines residual blocks with ordinary convolution operations; the three-dimensional face is represented volumetrically, the face being regarded as 200 cross slices from the behind-the-ear plane to the nose-tip plane, each cross slice being an equal-height contour; feature-point guidance is added while the encoder and decoder work, and the alignment of the three-dimensional face reconstruction model is specifically performed as follows: the feature points are added as guidance during three-dimensional face reconstruction, which allows the reconstructed three-dimensional face model to determine the coordinate values of the feature points directly, so that the reconstructed three-dimensional face model is an aligned three-dimensional face model;
outputting a corresponding UV position mapping chart for each three-dimensional face reconstruction, then outputting a face three-dimensional model for each UV position mapping chart, and performing face recognition on the obtained face three-dimensional model each time to specifically output which person is recognized;
specifically, the three-dimensional face reconstruction model adopts an encoder-decoder structure: a 256 × 256 × 3 face image is input, an 8 × 8 × 512 feature map is formed after encoding, and a 256 × 256 × 3 UV position map is output after decoding, wherein the UV position map is a two-dimensional image recording the three-dimensional coordinates of the facial point cloud of the whole face image, each UV position map preserves semantic information, and the RGB information of each point in the three-dimensional coordinates of the facial point cloud is contained; the encoding structure is formed by cascading 10 identical residual block structures, each residual block structure comprising 3 convolutional layers: the first layer adopts identical kernels of size 1 × 1, 64 of them, the second layer adopts identical kernels of size 3 × 3, 64 of them, and the third layer adopts identical kernels of size 1 × 1, 256 of them; each convolutional layer is activated by a ReLU function after convolution and input into the next convolutional layer, and finally the output after the three-layer residual block structure and the input before the first convolutional layer of the residual block structure are summed and, after ReLU activation, give the output of each residual block structure; the decoding structure is composed of 17 deconvolution layers: the first layer is provided with a deconvolution kernel of size 4 × 4, stride 2 and padding 1, and the second layer with a deconvolution kernel of size 2 × 2, stride 1 and padding 0; the parameters of the odd-numbered deconvolution layers among the 17 are the same as those of the first layer and those of the even-numbered layers the same as those of the second, so that through the decoding structure the finally obtained image has the same size as the input; and the final output layer adopts a Sigmoid function; the loss function adopted by the three-dimensional face reconstruction model is as follows:
$$\mathrm{Loss}_1=\sum_{(x,y)}\left\|P(x,y)-\tilde{P}(x,y)\right\|\cdot W(x,y)$$

wherein (x, y) denotes coordinate values in the UV position map, W(x, y) denotes the weight of each point in the UV position map, P(x, y) denotes the predicted UV position map, and $\tilde{P}(x,y)$ denotes the real UV position map information of the current face;
training the three-dimensional face reconstruction model, wherein a synthetic data set based on 300W (300W-LP) is used as the training set, which contains annotations of face images at different angles and estimated 3DMM coefficients; the face images in the training samples of the training set are first scaled to 256 × 256, an adaptive moment estimation (Adam) optimizer is then used with a learning rate starting at 0.0001 and halved every 5 epochs, and the training batch size is set to 16;
the test set of the three-dimensional face reconstruction model selects a data set AFLW2000-3D, wherein the data set AFLW2000-3D comprises 2000 unconstrained face images for evaluating three-dimensional face reconstruction and alignment.
3. The dynamic face recognition method according to claim 2, wherein the residual network model used in step 4 comprises 18 convolutional layers connected in sequence, each convolved with kernels of the same 3 × 3 size; the number of kernels in convolutional layers 1 to 6 is 64, in layers 7 to 12 is 128, and in layers 13 to 18 is 256; an additional connection is added after every two convolutional layers, that is, the input of the first layer of the pair is fused with the output of the pair and used as the input of the next convolutional layer; after multiple jump fusions, the result is input, after average pooling, into a fully connected layer, finally yielding the face feature vector of the three-dimensional face model; the skip structure is adopted to avoid the problem of vanishing or exploding gradients as depth increases during training;
the residual network model is trained by adopting an improved cosine measure-based loss function (LMCL), and the loss function of the residual network model is as follows:
$$\mathrm{Loss}_2=\frac{1}{N}\sum_{i=1}^{N}-\log\frac{e^{s\left(\cos\theta_{y_i}-m\right)}}{e^{s\left(\cos\theta_{y_i}-m\right)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

wherein m is a margin that further separates the training samples in the training set of the residual network model; if the weight of the fully connected layer for identifying the current face to be recognized in the residual network model is $W_j$, j denotes the current face label to be recognized in the residual network model, the input vector is x′, and the bias of the fully connected layer is 0, then the output after the fully connected layer is:

$$f_j=\|W_j\|\cdot\|x'\|\cdot\cos(\theta_j)$$

wherein $\theta_j$ is the angle between $W_j$ and x′; $\|W_j\|$ is set to 1 and $\|x'\|$ to s, s being a constant; N denotes the total number of training samples in the training set of the residual network model, i denotes the label of any input face in the training set of the residual network model, $y_i$ denotes the number of the input face to be recognized, and $\theta_{y_i}$ denotes the angle between the input face to be recognized and the fully connected layer weight $W_{y_i}$ of the current face to be recognized; the margin m is added to obtain the loss function $\mathrm{Loss}_2$.
CN201911145377.5A 2019-11-21 2019-11-21 Dynamic face recognition method Active CN110991281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911145377.5A CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911145377.5A CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Publications (2)

Publication Number Publication Date
CN110991281A true CN110991281A (en) 2020-04-10
CN110991281B CN110991281B (en) 2022-11-04

Family

ID=70085444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911145377.5A Active CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Country Status (1)

Country Link
CN (1) CN110991281B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160379041A1 (en) * 2015-06-24 2016-12-29 Samsung Electronics Co., Ltd. Face recognition method and apparatus
US20190286884A1 (en) * 2015-06-24 2019-09-19 Samsung Electronics Co., Ltd. Face recognition method and apparatus
CN108090403A (en) * 2016-11-22 2018-05-29 上海银晨智能识别科技有限公司 A kind of face dynamic identifying method and system based on 3D convolutional neural networks
US20190318153A1 (en) * 2017-11-30 2019-10-17 Beijing Sensetime Technology Development Co., Ltd Methods and apparatus for video-based facial recognition, electronic devices, and storage media
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN109299643A (en) * 2018-07-17 2019-02-01 深圳职业技术学院 A kind of face identification method and system based on big attitude tracking
CN110046551A (en) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 A kind of generation method and equipment of human face recognition model
CN110020620A (en) * 2019-03-29 2019-07-16 中国科学院深圳先进技术研究院 Face identification method, device and equipment under a kind of big posture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOSEF KITTLER et al.: "Conformal Mapping of a 3D Face Representation onto a 2D Image for CNN Based Face Recognition", 2018 International Conference on Biometrics (ICB), 16 July 2018 (2018-07-16), pages 124-131 *
夏洋洋 (XIA Yangyang): "Research on Face Recognition under Unconstrained Conditions Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), 15 July 2017 (2017-07-15) *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488857A (en) * 2020-04-29 2020-08-04 北京华捷艾米科技有限公司 Three-dimensional face recognition model training method and device
CN112001268A (en) * 2020-07-31 2020-11-27 中科智云科技有限公司 Face calibration method and device
CN112001268B (en) * 2020-07-31 2024-01-12 中科智云科技有限公司 Face calibration method and equipment
CN111950496A (en) * 2020-08-20 2020-11-17 广东工业大学 Identity recognition method for masked person
CN111950496B (en) * 2020-08-20 2023-09-15 广东工业大学 Mask person identity recognition method
CN112200006A (en) * 2020-09-15 2021-01-08 青岛邃智信息科技有限公司 Human body attribute detection and identification method under community monitoring scene
CN112507916B (en) * 2020-12-16 2021-07-27 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression
CN112507916A (en) * 2020-12-16 2021-03-16 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression
CN112489205A (en) * 2020-12-16 2021-03-12 北京航星机器制造有限公司 Method for manufacturing simulated human face
CN112597867A (en) * 2020-12-17 2021-04-02 佛山科学技术学院 Face recognition method and system for mask, computer equipment and storage medium
CN112597867B (en) * 2020-12-17 2024-04-26 佛山科学技术学院 Face recognition method and system for wearing mask, computer equipment and storage medium
CN112686202A (en) * 2021-01-12 2021-04-20 武汉大学 Human head identification method and system based on 3D reconstruction
CN112819928A (en) * 2021-01-27 2021-05-18 成都数字天空科技有限公司 Model reconstruction method and device, electronic equipment and storage medium
CN112819928B (en) * 2021-01-27 2022-10-28 成都数字天空科技有限公司 Model reconstruction method and device, electronic equipment and storage medium
CN112882666A (en) * 2021-03-15 2021-06-01 上海电力大学 Three-dimensional modeling and model filling-based 3D printing system and method
CN113255466A (en) * 2021-04-30 2021-08-13 广州有酱料网络科技有限公司 Sauce supply chain logistics monitoring system
CN113468984A (en) * 2021-06-16 2021-10-01 哈尔滨理工大学 Crop pest and disease leaf identification system, identification method and pest and disease prevention method
CN113469269A (en) * 2021-07-16 2021-10-01 上海电力大学 Residual convolution self-coding wind-solar-charged scene generation method based on multi-channel fusion
CN113705393A (en) * 2021-08-16 2021-11-26 武汉大学 3D face model-based depression angle face recognition method and system
CN116188612A (en) * 2023-02-20 2023-05-30 信扬科技(佛山)有限公司 Image reconstruction method, electronic device and storage medium

Also Published As

Publication number Publication date
CN110991281B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN110991281B (en) Dynamic face recognition method
Lee et al. From big to small: Multi-scale local planar guidance for monocular depth estimation
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN107545302B (en) Eye direction calculation method for combination of left eye image and right eye image of human eye
CN112418095B (en) Facial expression recognition method and system combined with attention mechanism
CN106650653B (en) Construction method of human face recognition and age synthesis combined model based on deep learning
Xie et al. Normalization of face illumination based on large-and small-scale features
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN112766158A (en) Multi-task cascading type face shielding expression recognition method
EP3084682A1 (en) System and method for identifying faces in unconstrained media
CN109960975B (en) Human face generation and human face recognition method based on human eyes
CN110765839B (en) Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image
CN109448035A (en) Infrared image and visible light image registration method based on deep learning
CN112418041A (en) Multi-pose face recognition method based on face orthogonalization
CN116645917A (en) LED display screen brightness adjusting system and method thereof
CN111666845A (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN115497139A (en) Method for detecting and identifying face covered by mask and integrating attention mechanism
Peng et al. Presentation attack detection based on two-stream vision transformers with self-attention fusion
CN110569760A (en) Living body detection method based on near-infrared and remote photoplethysmography
CN109064444B (en) Track slab disease detection method based on significance analysis
CN117095471A (en) Face counterfeiting tracing method based on multi-scale characteristics
Chen et al. Face recognition with masks based on spatial fine-grained frequency domain broadening
CN114898447B (en) Personalized fixation point detection method and device based on self-attention mechanism
Dastbaravardeh et al. Channel Attention-Based Approach with Autoencoder Network for Human Action Recognition in Low-Resolution Frames
CN116091793A (en) Light field significance detection method based on optical flow fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant