CN110991281A - Dynamic face recognition method - Google Patents

Dynamic face recognition method

Info

Publication number
CN110991281A
Authority
CN
China
Prior art keywords
face
dimensional
model
layers
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911145377.5A
Other languages
Chinese (zh)
Other versions
CN110991281B (en)
Inventor
高建彬
蒋文韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201911145377.5A priority Critical patent/CN110991281B/en
Publication of CN110991281A publication Critical patent/CN110991281A/en
Application granted granted Critical
Publication of CN110991281B publication Critical patent/CN110991281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a dynamic face recognition method, which comprises the following steps: firstly, a video is shot by a camera and marked face images are obtained from it; targeted illumination compensation is performed on each marked face image; a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to carry out three-dimensional face reconstruction and obtain a three-dimensional face model; features are extracted from the three-dimensional face model to obtain a face feature vector; and face matching is carried out on the face feature vector, thereby realizing face recognition. With the dynamic face recognition method provided by the invention, problems such as pose and occlusion are well resolved through three-dimensional face reconstruction, and performing face recognition on the CNN-based three-dimensional face reconstruction achieves a better dynamic face recognition effect than the prior art.

Description

Dynamic face recognition method
Technical Field
The invention relates to a face recognition technology, in particular to a dynamic face recognition method.
Background
The research of face recognition dates back to the 1960s and 1970s. A face recognition system has the following advantages:
1. Non-mandatory acquisition: face images can be captured while the user is almost unaware of it, without requiring the user to cooperate with dedicated acquisition equipment; the sampling is not compulsory.
2. Non-contact: the user's face image can be acquired without direct contact with the equipment.
3. Concurrency: multiple faces can be sorted, judged and recognized simultaneously in practical application scenarios.
4. Accordance with visual habit: "knowing people by their faces" makes the operation simple and the results intuitive, and the acquisition is unobtrusive.
Traditional face recognition technology is mainly based on visible-light images, the most familiar recognition mode, and has been developed for more than 30 years. However, it has a defect that is difficult to overcome: when the ambient lighting changes, the recognition performance drops rapidly, which cannot meet the needs of practical systems. Technologies proposed to solve the illumination problem include three-dimensional face recognition and thermal-imaging face recognition, but both are still far from mature and their recognition performance is not yet satisfactory. In recent years, with the development of deep learning, face recognition in controlled environments has achieved high success rates, but satisfactory results remain difficult in uncontrolled environments, and several factors restrict large-scale commercialization of face recognition systems: illumination change, pose variation, partial occlusion of the face, complex expressions and age change all greatly increase the difficulty of face recognition. Moreover, deep learning is data-driven, and it is currently impractical to obtain a large number of images of all kinds for the same person, which also affects detection performance under uncontrolled conditions to a certain extent. Improving the accuracy and robustness of face recognition therefore has great research value, with wide applications in security monitoring, identity authentication, criminal investigation, intelligent control and other fields.
Early recognition methods were generally based on geometric features of the face, commonly the local shape features of facial organs such as the eyes, nose and mouth. Methods based on correlation matching, subspaces and statistics appeared afterwards, followed by neural-network approaches: Gutta et al. proposed a hybrid neural network; Lawrence et al. realized sample clustering through a multi-stage SOM and used a convolutional neural network (CNN) for face recognition; Lin et al. adopted a probabilistic decision-based neural network; and Demers et al. proposed extracting face image features with a principal-component neural network, further compressing the features with an autocorrelation neural network, and finally performing face recognition with a multilayer perceptron (MLP).
Er et al. used principal component analysis (PCA) for dimensionality reduction, then extracted features with linear discriminant analysis (LDA), and performed face recognition with a radial basis function (RBF) network. The advantage of neural networks is that an implicit representation of the underlying rules is obtained through the learning process, and they are highly adaptive. Finally, recognition methods based on a three-dimensional model can overcome the shortcomings of traditional two-dimensional images. Atick et al. extended the eigenface idea, mainly representing the frontal face by a three-dimensional function; Nastar et al. proposed a method based on the gray-level surface, modeling the face image as a deformable 3D mesh surface and converting the face matching problem into an elastic matching problem; Anh et al. proposed a 3D face reconstruction algorithm that reconstructs the 3D face with a deep convolutional neural network and can accurately estimate the 3D face from several pictures of the same person, achieving good results. In general, three-dimensional face reconstruction had been based on multiple pictures. Since 2017, inspired by the success of deep neural networks (DNN), Dou et al. proposed a DNN-based method for end-to-end 3D face reconstruction from a single 2D image; Jackson et al. realized large-pose three-dimensional face reconstruction from a single image, regressing the 3D face directly with a CNN instead of estimating the parameters of a 3D morphable model (3DMM) through the CNN; and in 2018, Feng Y et al. proposed a reconstruction algorithm that accomplishes three-dimensional reconstruction and end-to-end alignment simultaneously. Lu et al. proposed a three-dimensional reconstruction method that recovers geometric details from a single picture: by aligning the projections of 3D face landmarks with the 2D landmarks detected in the input image, a smooth, coarse 3D face is generated with an example-based bilinear face model; the 3D face is then refined with local corrections under a photometric-consistency constraint; finally, a shape-from-shading method is applied to recover fine geometric details. The method maintains high fidelity and can also considerably improve the recognition result. Feng et al. evaluated dense three-dimensional reconstruction from 2D images in the wild, provided a dedicated data set, and compared the performance of three of the more advanced 3D reconstruction systems.
However, existing dynamic face recognition technology faces a series of problems. (1) Illumination: under various environmental light sources, side light, backlight, top light and highlights may all occur; the illumination may differ between time periods, and even between positions within the monitored area. (2) Pose and accessories: because dynamic monitoring is non-cooperative, the monitored people pass through the monitored area in natural postures, so various non-frontal poses such as profile, head-down and head-up views occur, along with accessories such as hats, masks and glasses. (3) Camera imaging: many technical parameters of the camera affect video image quality, such as the sensor size, the DSP processing speed, the built-in image processing chip and the lens; some settings inside the camera, such as exposure time, aperture and dynamic white balance, also affect video quality. (4) Frame loss and missed face detections: the network transmission and system computation required for recognition can cause video frames to be dropped and faces to be missed; in areas with heavy pedestrian traffic, bandwidth and computing-power limitations often lead to frame loss and missed face detections.
Disclosure of Invention
Aiming at the above problems, and in order to overcome the above defects, the invention provides an efficient, reasonable and effective dynamic face recognition method that solves a series of problems such as illumination and pose, and is suitable for effective face recognition in uncontrolled environments. To achieve the above object, the dynamic face recognition method of the present invention comprises the following steps:
(1) A video is captured by a high-definition camera (e.g., a Sony SSC-N21); a common face detection algorithm is invoked, and frames that may contain faces are extracted from the video and input to the detector.
(2) The extracted face frames are preprocessed: the illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is then performed to eliminate the effects of shadows, highlights and the like caused by non-uniform frontal illumination.
(3) To eliminate the influence of pose, a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to realize three-dimensional face reconstruction; residual blocks are combined with ordinary convolution operations, and feature points are added as guidance during reconstruction. Because face feature-point alignment is carried out according to these feature points during reconstruction, no extra alignment step is needed afterwards.
(4) A residual network model is constructed for extracting feature vectors; it is trained with a large margin cosine measure, its network parameters are fine-tuned, and a skip-level structure is added (drawing on the characteristics of residual networks, the output of each layer can propagate across multiple layers). The feature-vector extraction result is optimized by adjusting the number of layers skipped in each pass and the number of convolutional layers in the residual network model.
(5) The extracted feature vectors are matched to realize face recognition.
The invention discloses a dynamic face recognition method that improves the loss functions used in three-dimensional face reconstruction and in the subsequent recognition of the three-dimensional face model, and achieves efficient dynamic face recognition through three-dimensional face reconstruction. The invention performs three-dimensional face reconstruction by constructing a CNN-based three-dimensional face reconstruction model, then extracts features by constructing a residual network, and finally performs face matching. Illumination compensation is applied to the face before feature extraction to eliminate the influence of illumination; at the same time, problems such as pose and occlusion are well resolved through three-dimensional face reconstruction, and performing face recognition on the CNN-based three-dimensional face reconstruction also achieves a better effect than the prior art, so a good result is obtained in dynamic face recognition.
Drawings
FIG. 1 is a flow chart of the dynamic face recognition implementation provided by the present invention
FIG. 2 is a schematic diagram of the face detection model of the present invention
FIG. 3 is a schematic diagram of a three-dimensional human face reconstruction model according to the present invention
FIG. 4 is a schematic diagram of the residual error network model of the present invention
FIG. 5 is a diagram illustrating a residual block structure
FIG. 6 shows the comparison of three-dimensional face reconstruction results
FIG. 7 shows a comparison of human face recognition effects
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The invention provides a dynamic face recognition method, as shown in fig. 1, the method comprises the following steps:
Step 1: First, a video is shot by a camera. A face detector (an existing face detector is used) is employed for this purpose: a common face detection algorithm is invoked to extract image frames that may contain faces from the video, and each extracted frame is input into the face detector to obtain a marked face image. The specific implementation process is shown in fig. 2. The camera is a high-definition camera, such as a Sony SSC-N21.
Step 2: Then, to address the influence of illumination, the illumination mode is estimated using an illumination-mode parameter space, and targeted illumination compensation is performed on the marked face image obtained in step 1 to obtain an illumination-compensated marked face image, eliminating the effects of shadows, highlights and the like caused by non-uniform frontal illumination.
Step 3: Three-dimensional face reconstruction is performed on the illumination-compensated marked face image obtained in step 2. Compared with a two-dimensional face, a three-dimensional face has great advantages: problems such as pose can be effectively resolved, the influence of pose is eliminated, and the recognition effect is improved. The three-dimensional face reconstruction model combines residual blocks with ordinary convolution operations, and feature points are added as guidance during reconstruction. The specific implementation process is shown in fig. 3. The alignment of the three-dimensional face reconstruction model is performed as follows: the feature points added as guidance during three-dimensional face reconstruction allow the reconstructed three-dimensional face model to determine the coordinate values of the feature points directly, so the reconstructed three-dimensional face model is already aligned.
The three-dimensional face reconstruction model is realized with a structure combining an encoder and a decoder, and the three-dimensional face is represented volumetrically: the face is regarded as 200 cross slices from the behind-the-ear plane to the nose-tip plane, each cross slice being an equal-height contour, and feature-point guidance is added while the encoder and decoder work, so that the three-dimensional face model is obtained directly.
Step 4: Feature extraction is performed on the three-dimensional face model obtained in step 3. A residual network model is designed for feature extraction; it is trained and optimized with a large margin cosine measure, its network parameters are fine-tuned, and a skip-level structure is added (drawing on the characteristics of residual networks, the output of each layer can propagate across multiple layers). The recognition result is optimized by adjusting the number of layers skipped in each training pass (the number of layers per jump is the same after adjustment) and the number of convolutional layers in the residual network model, and the face feature vector of the three-dimensional face model generated in step 3 is extracted with the trained, optimized residual network model. The specific implementation process is shown in fig. 4. In this embodiment, the residual network model comprises 18 convolutional layers connected in sequence, each convolved with kernels of the same 3 × 3 size; the number of kernels is 64 in convolutional layers 1-6, 128 in layers 7-12 and 256 in layers 13-18. An additional connection is added after every two convolutional layers: the input of the first layer of the pair is fused with the output of the pair and used as the input of the next layer. After multiple such jumps and fusions, the result is average-pooled and fed into a fully connected layer, finally yielding the face feature vector of the three-dimensional face model. The purpose is that, for an ordinary convolutional neural network, the problem of vanishing or exploding gradients arises easily as the number of convolutional layers increases; the skip-level structure avoids this problem even in a deep network model. A sketch of such a network is given below.
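For illustration only, the following is a minimal PyTorch sketch of the feature-extraction network described above. The 1 × 1 projection used where the channel count changes across a skip connection, the input channel count and the output feature dimension (512) are assumptions; the patent specifies only the 18 layers, their kernel counts and the pairwise skip connections.

```python
import torch
import torch.nn as nn

class SkipPair(nn.Module):
    """Two 3x3 convolutional layers whose pair input is fused (added) with the
    pair output, as described in step 4. The 1x1 projection used when the
    channel count changes is an assumption."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.proj = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.proj(x))  # fuse pair input with pair output

class FeatureNet(nn.Module):
    """18 convolutional layers as 9 skip pairs: layers 1-6 with 64 kernels,
    7-12 with 128, 13-18 with 256, then average pooling and a fully connected
    layer producing the face feature vector."""
    def __init__(self, in_ch=3, feat_dim=512):
        super().__init__()
        widths = [64, 64, 64, 128, 128, 128, 256, 256, 256]
        pairs, prev = [], in_ch
        for w in widths:
            pairs.append(SkipPair(prev, w))
            prev = w
        self.body = nn.Sequential(*pairs)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256, feat_dim)

    def forward(self, x):
        return self.fc(self.pool(self.body(x)).flatten(1))
```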
To address the shortcomings of the softmax loss, a normalized large margin cosine loss is adopted during training, and the skip-level structure (i.e., adjusting the number of layers per jump in the residual network model) is used to overcome problems such as vanishing gradients as depth increases during training.
Step 5: Finally, the face feature vectors extracted in step 4 are compared against a preset face library to realize face matching and hence face recognition. The recognition is implemented with a recognition network, which outputs the recognition result, i.e., which person has been recognized. A sketch of the matching step follows.
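For illustration, a minimal sketch of feature-vector matching against a preset face library, assuming cosine similarity as the comparison measure (consistent with the cosine-based training) and a hypothetical acceptance threshold; the patent itself does not fix these details.

```python
import numpy as np

def match_face(feature, face_db, threshold=0.5):
    """Return the library identity whose enrolled vector is most similar to
    `feature`, or None if no similarity reaches the (assumed) threshold.
    face_db maps person id -> enrolled feature vector."""
    feature = feature / np.linalg.norm(feature)
    best_id, best_sim = None, -1.0
    for person_id, enrolled in face_db.items():
        sim = float(feature @ (enrolled / np.linalg.norm(enrolled)))
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id if best_sim >= threshold else None
```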
The core of the invention is to carry out end-to-end three-dimensional face reconstruction with a three-dimensional face reconstruction model, and to make a certain improvement to the loss function of the subsequent residual network model. The three-dimensional face reconstruction model in the invention differs from previous face reconstruction models in that the corresponding three-dimensional face model is obtained directly by CNN regression, without a separate alignment step. Compared with the traditional approach, the invention can generate a three-dimensional face model in a one-to-one manner, with a reconstruction effect superior to the traditional approach (the current mainstream traditional approach reconstructs the three-dimensional face model by fitting 3DMM parameters and generates it in a many-to-one manner).
For each three-dimensional face reconstruction, a corresponding UV position map is output; a three-dimensional face model is then output from each UV position map, and face recognition is performed on each obtained three-dimensional face model, specifically outputting which person has been recognized.
The most important part of the method is the three-dimensional face reconstruction part, and the performance of the three-dimensional face reconstruction model influences the face recognition efficiency of the whole method.
The three-dimensional face reconstruction model adopts an encoder-decoder structure. A 256 × 256 × 3 face image is input; after encoding, an 8 × 8 × 512 feature map is formed; after decoding, a 256 × 256 × 3 UV position map is output. The UV position map is a two-dimensional image that records the three-dimensional coordinates of the facial point cloud of the whole face image; each UV position map preserves semantic information and contains the RGB information of every point in the three-dimensional coordinates of the facial point cloud. The encoding structure is a cascade of 10 identical residual blocks, whose structure is shown in fig. 5. Each residual block comprises 3 convolutional layers: the first layer uses identical 1 × 1 kernels, 64 of them; the second layer uses identical 3 × 3 kernels, 64 of them; the third layer uses identical 1 × 1 kernels, 256 of them. Each convolutional layer is activated by a ReLU function after convolution and fed into the next convolutional layer; finally, the output of the three-layer residual block and the input to the block's first convolutional layer are summed and passed through a ReLU activation to give the block's output. The decoding structure consists of 17 deconvolution (transposed convolution) layers: the first layer uses a deconvolution kernel of size 4 × 4 with stride 2 and padding 1, and the second layer a kernel of size 2 × 2 with stride 1 and padding 0; the odd-numbered layers among the 17 share the parameters of the first layer and the even-numbered layers those of the second, so that the final image has the same size as the input. The output layer uses a Sigmoid function. A sketch of the encoder under stated assumptions is given below.
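The following is a minimal PyTorch sketch of the residual block of fig. 5 and the 10-block encoder, for illustration only. The patent does not state how the encoder reduces 256 × 256 to 8 × 8 or reaches 512 channels, so the stride-2 placement, the shortcut projections and the final 1 × 1 channel expansion are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block of fig. 5: 1x1 (64) -> 3x3 (64) -> 1x1 (256), each layer
    ReLU-activated, with the block input summed onto the output before a final
    ReLU. The shortcut projection and the stride are assumptions."""
    def __init__(self, in_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, 1), nn.ReLU(inplace=True))
        self.shortcut = (nn.Identity() if in_ch == 256 and stride == 1
                         else nn.Conv2d(in_ch, 256, 1, stride=stride))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

def make_encoder():
    """10 cascaded residual blocks; stride 2 on every other block halves the
    256x256 input five times down to 8x8 (an assumption), and a trailing 1x1
    convolution expands 256 channels to the stated 512 (also an assumption)."""
    blocks, in_ch = [], 3
    for i in range(10):
        blocks.append(ResBlock(in_ch, stride=2 if i % 2 == 0 else 1))
        in_ch = 256
    blocks.append(nn.Conv2d(256, 512, 1))
    return nn.Sequential(*blocks)

feat = make_encoder()(torch.randn(1, 3, 256, 256))  # -> (1, 512, 8, 8)
```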
The loss function adopted by the three-dimensional face reconstruction model is shown in formula (1):

$$\mathrm{Loss}_1=\sum_{(x,y)}\left\|P(x,y)-\tilde{P}(x,y)\right\|\cdot W(x,y)\qquad(1)$$

wherein (x, y) denotes coordinate values in the UV position map, W(x, y) denotes the weight of each point in the UV position map, P(x, y) denotes the predicted UV position map, and $\tilde{P}(x,y)$ denotes the real UV position map information of the current face.
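Purely as an illustration, formula (1) can be computed as below; treating the per-point norm as a Euclidean distance over the three coordinate channels is an assumption, since the patent writes a generic norm.

```python
import torch

def uv_position_loss(pred, target, weight):
    """Formula (1): pred and target are (B, 3, 256, 256) UV position maps and
    weight is a (256, 256) mask giving each point's importance."""
    per_point = (pred - target).norm(dim=1)   # ||P(x,y) - P~(x,y)|| at each point
    return (per_point * weight).sum(dim=(1, 2)).mean()
```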
The three-dimensional face reconstruction model is trained with a synthetic data set based on 300W (300W-LP) as the training set, which contains annotations of face images at different angles and estimated 3DMM coefficients. The face images in the training samples are first scaled to 256 × 256; an adaptive moment estimation (Adam) optimizer is then used, with a learning rate starting at 0.0001 and halved every 5 epochs, and the training batch size is set to 16. A sketch of this training configuration follows.
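A minimal sketch of the stated training configuration, reusing make_encoder and uv_position_loss from the sketches above. The upsampling head standing in for the 17-layer decoder, the epoch count and train_loader are hypothetical placeholders, not details given in the patent.

```python
import torch
import torch.nn as nn

# Stand-in model: the encoder sketch above plus a crude upsampling head in
# place of the 17-layer deconvolution decoder, so shapes match formula (1).
model = nn.Sequential(make_encoder(),
                      nn.Upsample(scale_factor=32),
                      nn.Conv2d(512, 3, 1),
                      nn.Sigmoid())

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr 0.0001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)  # halve every 5 epochs

num_epochs = 30  # assumption; the patent does not state the epoch count
for epoch in range(num_epochs):
    for images, target_uv, weight in train_loader:  # 300W-LP batches of 16 (placeholder loader)
        optimizer.zero_grad()
        loss = uv_position_loss(model(images), target_uv, weight)
        loss.backward()
        optimizer.step()
    scheduler.step()
```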
The test set for the three-dimensional face reconstruction model is the data set AFLW2000-3D, which contains 2000 unconstrained face images for evaluating three-dimensional face reconstruction and alignment. The experimental results are shown in fig. 6 (where the first image in each row is the input face image, the second is the result reconstructed with the three-dimensional face reconstruction model, and the third is the result obtained with VRN-Guide, currently the best-performing reconstruction algorithm); as can be seen from fig. 6, the reconstruction effect of the present invention is superior to that of VRN-Guide.
The three-dimensional face reconstruction result is input into the above residual network model for feature extraction; the specific structure is shown in fig. 4. Unlike previous feature extraction, the residual network model is trained with an improved large margin cosine loss (LMCL); the loss function of the residual network model is shown in formula (2):
$$\mathrm{Loss}_2=\frac{1}{N}\sum_{i=1}^{N}-\log\frac{e^{s\left(\cos\theta_{y_i}-m\right)}}{e^{s\left(\cos\theta_{y_i}-m\right)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}\qquad(2)$$

Formula (2) expresses that the margin m is added when computing the loss only when the input face number to be recognized equals the current face label to be recognized; otherwise it is not added. Here m is a margin that further separates the training samples in the training set. If the weight of the fully connected layer for identifying the current face to be recognized in the residual network model is $W_j$, where j denotes the current face label to be recognized (for example, if the current face label to be recognized is 1, then $W_j$ is the fully connected layer weight for face label 1 and j = 1), and the input vector is x′, then, assuming the bias of the fully connected layer is 0, the output after the fully connected layer is

$$f_j=\|W_j\|\cdot\|x'\|\cdot\cos(\theta_j)\qquad(3)$$

wherein $\theta_j$ is the angle between $W_j$ and x′. Setting $\|W_j\|=1$ and $\|x'\|=s$ (s a constant), with N the total number of training samples in the training set, i the label of any input face in the training set, $y_i$ the number of the input face to be recognized, and $\theta_{y_i}$ the angle between the input face to be recognized and the fully connected layer weight $W_{y_i}$ of the current face to be recognized, adding the margin m yields the loss function used in formula (2).
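For illustration, a minimal sketch of formula (2). Normalizing the rows of W so that ‖W_j‖ = 1 and replacing ‖x′‖ by the constant scale s follows the text above, while the concrete values s = 30 and m = 0.35 are assumptions, not values given in the patent.

```python
import torch
import torch.nn.functional as F

def lmcl_loss(features, weights, labels, s=30.0, m=0.35):
    """Large margin cosine loss of formula (2). features: (B, D) input vectors
    x'; weights: (C, D) fully connected weights, one row W_j per face label;
    labels: (B,) face numbers y_i. The margin m is subtracted from
    cos(theta_{y_i}) for the true label only."""
    cos = F.linear(F.normalize(features), F.normalize(weights))  # cos(theta_j), shape (B, C)
    margin = torch.zeros_like(cos)
    margin.scatter_(1, labels.unsqueeze(1), m)   # apply m only at the true class
    return F.cross_entropy(s * (cos - margin), labels)
```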
The final effect of the loss function adopted by the residual network model relative to other loss functions is shown in fig. 7: the effect of the loss function adopted by the invention (LMCL) is generally better than that of the loss functions commonly used at present (Softmax Loss, L-Softmax Loss, A-Softmax Loss, etc.) on various data sets (LFW, YTF, MF1 RANK1, etc.).
Although illustrative embodiments of the present invention have been described above to facilitate understanding of the present invention by those skilled in the art, it should be understood that the scope of the present invention is not limited to these specific embodiments. Variations will be apparent to those skilled in the art, and all inventions that make use of the concepts of the present invention are intended to be protected.

Claims (3)

1. A dynamic face recognition method is characterized by comprising the following steps:
step 1: firstly, shooting a video through a camera, using a face detector, invoking a face detection algorithm in the face detector, extracting image frames containing faces from the video, and inputting each extracted face image frame into the face detector to obtain a marked face image;
step 2: then, aiming at the influence of illumination, estimating an illumination mode using an illumination-mode parameter space, and carrying out targeted illumination compensation on the marked face image obtained in step 1 to obtain an illumination-compensated marked face image, eliminating the influence of shadows and highlights caused by non-uniform frontal illumination;
step 3: performing three-dimensional face reconstruction on the illumination-compensated marked face image obtained in step 2, to eliminate the influence of pose and improve the recognition effect, wherein a three-dimensional face reconstruction model based on a convolutional neural network (CNN) is constructed to perform end-to-end three-dimensional face reconstruction and obtain a corresponding three-dimensional face model;
step 4: performing feature extraction on the three-dimensional face model obtained in step 3, wherein a residual network model is constructed for the feature extraction, the residual network model is trained and optimized with a large margin cosine measure, its network parameters are fine-tuned, and a skip-level structure is added, i.e., the output of each convolutional layer of the residual network model can propagate across multiple convolutional layers; the recognition result is optimized by adjusting the number of layers skipped in each training pass and the number of convolutional layers of the residual network model, the number of layers skipped per jump being the same during training after the adjustment; and the face feature vector of the three-dimensional face model generated in step 3 is extracted with the trained, optimized residual network model;
step 5: finally, comparing the face feature vectors extracted in step 4 in a preset face library to realize face matching and hence face recognition, wherein the face recognition is implemented with a recognition network, and the recognition result, i.e., which person is recognized, is output through the recognition network.
2. The dynamic face recognition method according to claim 1, wherein the three-dimensional face reconstruction model in step 3 is implemented with a structure combining an encoder and a decoder, in a manner that combines residual blocks with ordinary convolution operations; the three-dimensional face is represented volumetrically, the face being regarded as 200 cross slices from the behind-the-ear plane to the nose-tip plane, each cross slice being an equal-height contour; feature-point guidance is added while the encoder and decoder work, and the alignment of the three-dimensional face reconstruction model is specifically performed as follows: the feature points are added as guidance during three-dimensional face reconstruction, which allows the reconstructed three-dimensional face model to determine the coordinate values of the feature points directly, so that the reconstructed three-dimensional face model is an aligned three-dimensional face model;
outputting a corresponding UV position mapping chart for each three-dimensional face reconstruction, then outputting a face three-dimensional model for each UV position mapping chart, and performing face recognition on the obtained face three-dimensional model each time to specifically output which person is recognized;
specifically, the three-dimensional face reconstruction model adopts an encoder-decoder structure: a 256 × 256 × 3 face image is input, an 8 × 8 × 512 feature map is formed after encoding, and a 256 × 256 × 3 UV position map is output after decoding, wherein the UV position map is a two-dimensional image recording the three-dimensional coordinates of the facial point cloud of the whole face image, each UV position map preserves semantic information, and the RGB information of each point in the three-dimensional coordinates of the facial point cloud is contained; the encoding structure is formed by cascading 10 identical residual block structures, each residual block structure comprising 3 convolutional layers: the first layer adopts identical kernels of size 1 × 1, 64 of them, the second layer adopts identical kernels of size 3 × 3, 64 of them, and the third layer adopts identical kernels of size 1 × 1, 256 of them; each convolutional layer is activated by a ReLU function after convolution and input into the next convolutional layer, and finally the output after the three-layer residual block structure and the input before the first convolutional layer of the residual block structure are summed and, after ReLU activation, give the output of each residual block structure; the decoding structure is composed of 17 deconvolution layers: the first layer is provided with a deconvolution kernel of size 4 × 4, stride 2 and padding 1, and the second layer with a deconvolution kernel of size 2 × 2, stride 1 and padding 0; the parameters of the odd-numbered deconvolution layers among the 17 are the same as those of the first layer and those of the even-numbered layers the same as those of the second, so that through the decoding structure the finally obtained image has the same size as the input; and the final output layer adopts a Sigmoid function; the loss function adopted by the three-dimensional face reconstruction model is as follows:
$$\mathrm{Loss}_1=\sum_{(x,y)}\left\|P(x,y)-\tilde{P}(x,y)\right\|\cdot W(x,y)$$

wherein (x, y) denotes coordinate values in the UV position map, W(x, y) denotes the weight of each point in the UV position map, P(x, y) denotes the predicted UV position map, and $\tilde{P}(x,y)$ denotes the real UV position map information of the current face;
training the three-dimensional face reconstruction model, wherein a synthetic data set based on 300W (300W-LP) is used as the training set, which contains annotations of face images at different angles and estimated 3DMM coefficients; the face images in the training samples of the training set are first scaled to 256 × 256, an adaptive moment estimation (Adam) optimizer is then used with a learning rate starting at 0.0001 and halved every 5 epochs, and the training batch size is set to 16;
the test set of the three-dimensional face reconstruction model selects a data set AFLW2000-3D, wherein the data set AFLW2000-3D comprises 2000 unconstrained face images for evaluating three-dimensional face reconstruction and alignment.
3. The dynamic face recognition method according to claim 2, wherein the residual network model used in step 4 comprises 18 convolutional layers connected in sequence, each convolved with kernels of the same 3 × 3 size; the number of kernels in convolutional layers 1 to 6 is 64, in layers 7 to 12 is 128, and in layers 13 to 18 is 256; an additional connection is added after every two convolutional layers, that is, the input of the first layer of the pair is fused with the output of the pair and used as the input of the next convolutional layer; after multiple jump fusions, the result is input, after average pooling, into a fully connected layer, finally yielding the face feature vector of the three-dimensional face model; the skip structure is adopted to avoid the problem of vanishing or exploding gradients as depth increases during training;
the residual network model is trained by adopting an improved cosine measure-based loss function (LMCL), and the loss function of the residual network model is as follows:
$$\mathrm{Loss}_2=\frac{1}{N}\sum_{i=1}^{N}-\log\frac{e^{s\left(\cos\theta_{y_i}-m\right)}}{e^{s\left(\cos\theta_{y_i}-m\right)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

wherein m is a margin that further separates the training samples in the training set of the residual network model; if the weight of the fully connected layer for identifying the current face to be recognized in the residual network model is $W_j$, j denotes the current face label to be recognized in the residual network model, the input vector is x′, and the bias of the fully connected layer is 0, then the output after the fully connected layer is:

$$f_j=\|W_j\|\cdot\|x'\|\cdot\cos(\theta_j)$$

wherein $\theta_j$ is the angle between $W_j$ and x′; $\|W_j\|$ is set to 1 and $\|x'\|$ to s, s being a constant; N denotes the total number of training samples in the training set of the residual network model, i denotes the label of any input face in the training set of the residual network model, $y_i$ denotes the number of the input face to be recognized, and $\theta_{y_i}$ denotes the angle between the input face to be recognized and the fully connected layer weight $W_{y_i}$ of the current face to be recognized; the margin m is added to obtain the loss function $\mathrm{Loss}_2$.
CN201911145377.5A 2019-11-21 2019-11-21 Dynamic face recognition method Active CN110991281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911145377.5A CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911145377.5A CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Publications (2)

Publication Number Publication Date
CN110991281A true CN110991281A (en) 2020-04-10
CN110991281B CN110991281B (en) 2022-11-04

Family

ID=70085444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911145377.5A Active CN110991281B (en) 2019-11-21 2019-11-21 Dynamic face recognition method

Country Status (1)

Country Link
CN (1) CN110991281B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160379041A1 (en) * 2015-06-24 2016-12-29 Samsung Electronics Co., Ltd. Face recognition method and apparatus
US20190286884A1 (en) * 2015-06-24 2019-09-19 Samsung Electronics Co., Ltd. Face recognition method and apparatus
CN108090403A (en) * 2016-11-22 2018-05-29 上海银晨智能识别科技有限公司 A kind of face dynamic identifying method and system based on 3D convolutional neural networks
US20190318153A1 (en) * 2017-11-30 2019-10-17 Beijing Sensetime Technology Development Co., Ltd Methods and apparatus for video-based facial recognition, electronic devices, and storage media
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN109299643A (en) * 2018-07-17 2019-02-01 深圳职业技术学院 A kind of face identification method and system based on big attitude tracking
CN110046551A (en) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 A kind of generation method and equipment of human face recognition model
CN110020620A (en) * 2019-03-29 2019-07-16 中国科学院深圳先进技术研究院 Face identification method, device and equipment under a kind of big posture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOSEF KITTLER et al.: "Conformal Mapping of a 3D Face Representation onto a 2D Image for CNN Based Face Recognition", 2018 International Conference on Biometrics (ICB), 16 July 2018 (2018-07-16), pages 124-131 *
夏洋洋 (XIA Yangyang): "Research on Face Recognition under Unconstrained Conditions Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), 15 July 2017 (2017-07-15) *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488857A (en) * 2020-04-29 2020-08-04 北京华捷艾米科技有限公司 Three-dimensional face recognition model training method and device
CN112001268A (en) * 2020-07-31 2020-11-27 中科智云科技有限公司 Face calibration method and device
CN112001268B (en) * 2020-07-31 2024-01-12 中科智云科技有限公司 Face calibration method and equipment
CN111950496A (en) * 2020-08-20 2020-11-17 广东工业大学 Identity recognition method for masked person
CN111950496B (en) * 2020-08-20 2023-09-15 广东工业大学 Mask person identity recognition method
CN112200006A (en) * 2020-09-15 2021-01-08 青岛邃智信息科技有限公司 Human body attribute detection and identification method under community monitoring scene
CN112507916B (en) * 2020-12-16 2021-07-27 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression
CN112507916A (en) * 2020-12-16 2021-03-16 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression
CN112489205A (en) * 2020-12-16 2021-03-12 北京航星机器制造有限公司 Method for manufacturing simulated human face
CN112597867A (en) * 2020-12-17 2021-04-02 佛山科学技术学院 Face recognition method and system for mask, computer equipment and storage medium
CN112597867B (en) * 2020-12-17 2024-04-26 佛山科学技术学院 Face recognition method and system for wearing mask, computer equipment and storage medium
CN112686202A (en) * 2021-01-12 2021-04-20 武汉大学 Human head identification method and system based on 3D reconstruction
CN112819928A (en) * 2021-01-27 2021-05-18 成都数字天空科技有限公司 Model reconstruction method and device, electronic equipment and storage medium
CN112819928B (en) * 2021-01-27 2022-10-28 成都数字天空科技有限公司 Model reconstruction method and device, electronic equipment and storage medium
CN112882666A (en) * 2021-03-15 2021-06-01 上海电力大学 Three-dimensional modeling and model filling-based 3D printing system and method
CN113255466A (en) * 2021-04-30 2021-08-13 广州有酱料网络科技有限公司 Sauce supply chain logistics monitoring system
CN113468984A (en) * 2021-06-16 2021-10-01 哈尔滨理工大学 Crop pest and disease leaf identification system, identification method and pest and disease prevention method
CN113469269A (en) * 2021-07-16 2021-10-01 上海电力大学 Residual convolution self-coding wind-solar-charged scene generation method based on multi-channel fusion
CN113705393A (en) * 2021-08-16 2021-11-26 武汉大学 3D face model-based depression angle face recognition method and system
CN116188612A (en) * 2023-02-20 2023-05-30 信扬科技(佛山)有限公司 Image reconstruction method, electronic device and storage medium

Also Published As

Publication number Publication date
CN110991281B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN110991281B (en) Dynamic face recognition method
Lee et al. From big to small: Multi-scale local planar guidance for monocular depth estimation
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN107545302B (en) Eye direction calculation method for combination of left eye image and right eye image of human eye
CN112418095B (en) Facial expression recognition method and system combined with attention mechanism
CN106650653B (en) Construction method of human face recognition and age synthesis combined model based on deep learning
Xie et al. Normalization of face illumination based on large-and small-scale features
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN112766158A (en) Multi-task cascading type face shielding expression recognition method
EP3084682A1 (en) System and method for identifying faces in unconstrained media
CN109960975B (en) Human face generation and human face recognition method based on human eyes
CN110765839B (en) Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image
CN109448035A (en) Infrared image and visible light image registration method based on deep learning
CN112418041A (en) Multi-pose face recognition method based on face orthogonalization
CN116645917A (en) LED display screen brightness adjusting system and method thereof
CN111666845A (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN115497139A (en) Method for detecting and identifying face covered by mask and integrating attention mechanism
Peng et al. Presentation attack detection based on two-stream vision transformers with self-attention fusion
CN110569760A (en) Living body detection method based on near-infrared and remote photoplethysmography
CN109064444B (en) Track slab disease detection method based on significance analysis
CN117095471A (en) Face counterfeiting tracing method based on multi-scale characteristics
Chen et al. Face recognition with masks based on spatial fine-grained frequency domain broadening
CN114898447B (en) Personalized fixation point detection method and device based on self-attention mechanism
Dastbaravardeh et al. Channel Attention-Based Approach with Autoencoder Network for Human Action Recognition in Low-Resolution Frames
CN116091793A (en) Light field significance detection method based on optical flow fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant