CN111832525A - Living body detection method for face alignment - Google Patents

Living body detection method for face alignment

Info

Publication number
CN111832525A
Authority
CN
China
Prior art keywords
face
face image
network
living body
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010716812.1A
Other languages
Chinese (zh)
Inventor
徐秋林 (Xu Qiulin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010716812.1A priority Critical patent/CN111832525A/en
Publication of CN111832525A publication Critical patent/CN111832525A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/40 Spoof detection, e.g. liveness detection
    • G06V 40/45 Detection of the body part being alive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Abstract

The invention discloses a living body (liveness) detection method based on face alignment. In use, a face image is input into a purpose-built LiveNet convolutional neural network to extract optimal liveness features. The invention introduces a CNN-6 network structure to align face images, rotating tilted faces to a frontal pose; this corrects the face image and solves the problem of a low liveness recognition rate caused by an excessive tilt angle. In the constructed LiveNet convolutional neural network, redundant central difference convolution (CDC) structures are removed, yielding a convolutional neural network with stronger and more robust modeling capability. In the image preprocessing step, non-face and blurred images are removed from the face image data set, which improves the accuracy of living body detection and the computation speed of the network model, and thus the performance of the whole algorithm. The LiveNet convolutional neural network solves the problem that a CDC network consumes a large amount of time, saves substantial development time, and improves the accuracy of the living body detection algorithm.

Description

Living body detection method for face alignment
Technical Field
The invention relates to the technical field of face recognition, in particular to a living body detection method for face alignment.
Background
With the development of science and technology and the improvement of living standards, face recognition technology plays a crucial role because it is more reliable and safer than other identity authentication methods (traditional methods, iris and fingerprint). However, existing face recognition systems are very vulnerable: in a spoofing (liveness) attack, an illegitimate user forges the information of a legitimate person (printed photos, replays on a mobile phone screen, and other abnormal means) to deceive the face recognition system into verification, login, payment and other functions. Once a face recognition system cannot resist such attack samples, people's property and personal safety are seriously threatened. Therefore, living body detection technology is of great significance.
Existing living body detection methods combine human-computer interaction to detect liveness: the user is required to complete specified blinking or voice operations. The process is complex, the acquisition time is too long, and the user experience is poor.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a living body detection method based on face alignment, which improves the success rate of living body detection purely algorithmically.
A living body detection method for face alignment comprises the following steps:
an image preprocessing step: shooting face images, removing non-face images and blurred face images, aligning the face images with a CNN-6 network, labeling real-person face images as 1 and attack face images as 0, assembling the labeled face image data into a data set, converting the data set into corresponding TFRecord format files, and reading the face image data in parallel with multiple threads;
a feature extraction step: converting the data set into corresponding TFRecord format files, inputting them into the constructed LiveNet convolutional neural network, and extracting high-dimensional face features through the convolutional layers of the LiveNet convolutional neural network;
a face living body training step: inputting the labeled face images into the constructed LiveNet convolutional neural network for training, and extracting the optimal liveness features to obtain the optimal living body detection model;
a face living body detection step: performing living body detection with the trained optimal living body detection model and comparing the detection output with a threshold; if the output is greater than the threshold, the face is judged to be a living body face, and if the output is less than or equal to the threshold, it is judged to be a non-living body face.
Further, the architecture flow of the LiveNet convolutional neural network includes:
the NAS-searched backbone network DepthNet and an MAFM module, with two convolutional layers added after Head1 and two deconvolution layers added at the last layer to recover spatial information of the image.
Further, after the non-face images and blurred face images are eliminated, a loss function is used to compute the loss over the captured face images, the coordinates of key points in each face image are located, the face is aligned using the key-point coordinates, and the face image is cropped to a preset size.
Further, a nonlinear mapping is obtained to align the face: $\Phi: I \to s$, where $I$ is the captured face image, $I \in \mathbb{R}^{H \times W \times 3}$, and the output shape vector $s \in \mathbb{R}^{2L}$ is used to crop the face image $I$; $H$ is the height of the face image and $W$ is its width;
the vector $s$ has the form $s = [x_1, \ldots, x_L, y_1, \ldots, y_L]^T$, where $L$ is the number of face key points ($L = 5$), and $x_l$ and $y_l$ are the coordinates of the $l$-th key point.
Further, when the nonlinear mapping is obtained, $\Phi = (\phi_1 \circ \cdots \circ \phi_M)(I)$ is a composition of $M$ functions, where each function represents a particular layer in a multi-layer neural network structure;
given a preset set of labeled training samples $\{(I_i, s_i)\}_{i=1}^{N}$,
a CNN-6 network and a loss function are used to find the minimizer of the nonlinear mapping $\Phi$:
$$\Phi^* = \arg\min_{\Phi} \sum_{i=1}^{N} \mathrm{loss}(\Phi(I_i), s_i)$$
where loss is a predefined loss function measuring the difference between a predicted value and its ground truth; the CNN-6 network is trained with this loss function, and an optimization algorithm is used to minimize the loss over the network.
Further, the CNN-6 network includes five 3×3 convolutional layers, one fully-connected layer and one output layer; after each convolutional layer and the fully-connected layer, feature maps are passed through a ReLU nonlinear activation function.
Furthermore, when locating the coordinates of the key points in the face image, the wing(x) loss function is used to locate the face key points accurately. Its formula is:
$$\mathrm{wing}(x) = \begin{cases} w \ln\left(1 + |x|/\epsilon\right), & |x| < w \\ |x| - C, & \text{otherwise} \end{cases}$$
where $x$ denotes the error on a key-point coordinate of the face image; the nonlinear part covers the range $[-w, w]$; $\epsilon$ limits the curvature of the nonlinear region; and $C = w - w\ln(1 + w/\epsilon)$ is a fixed constant that smoothly links the linear and nonlinear parts;
the nonlinear part of the loss function behaves as $\ln(x)$ on $[\epsilon/w,\ 1 + \epsilon/w]$.
Further, the LiveNet convolutional neural network structure comprises a NAS-searched backbone network and an MAFM module; two convolutional layers are added after Head1, and two deconvolution layers are added at the last layer to recover the spatial information of the image.
Further, the face living body training step further comprises the following steps:
Step 1: labeling the aligned face data set with the corresponding labels;
Step 2: in the training and testing stages, initializing the weights of the network model;
Step 3: extracting different high-dimensional features from the face image through convolution operations, and enlarging the face image through deconvolution operations;
Step 4: computing the output of the forward propagation pass from the weights and bias values of the LiveNet convolutional neural network structure, and computing the loss value of the network;
Step 5: computing the gradients for back propagation through the network model from the loss value;
Step 6: updating the parameters of the network nodes with a gradient descent algorithm to obtain the optimal living body detection model;
Step 7: repeating steps 3-6 until the network converges.
The invention has the following beneficial effects: in use, the face image is input into the purpose-built LiveNet convolutional neural network to extract optimal liveness features; the invention introduces a CNN-6 network structure to align face images, rotating tilted faces to a frontal pose, thereby correcting the face image and solving the problem of a low liveness recognition rate caused by an excessive tilt angle; on the basis of the constructed LiveNet convolutional neural network, the invention deletes the redundant CDC network structure, solving the problems that the CDC structure consumes a large amount of time during training and testing and that the resulting living body detection model has low precision and is ill-suited to face detection tasks, so that the new LiveNet convolutional neural network has stronger and more robust modeling capability; in the image preprocessing step, non-face and blurred images are removed from the face image data set, which improves the accuracy of living body detection as well as the computation speed of the network model, and thus the performance of the whole algorithm; the LiveNet convolutional neural network solves the problem that a CDC network consumes a large amount of time, saves substantial development time, and improves the accuracy of the living body detection algorithm.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flow chart of the overall method of the present invention;
FIG. 2 is a diagram of the CNN-6 face alignment network architecture of the present invention;
FIG. 3 is a diagram of the LiveNet living body detection network of the present invention;
FIG. 4 is a diagram of the face alignment effect of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
The invention provides a living body detection method for face alignment, which comprises the following steps of:
an image preprocessing step: shooting face images, including real face images and non-living face images (printed photos, mobile-phone screen replays, high-definition picture replays, and the like), removing non-face images and blurred face images, aligning the face images with a CNN-6 network, labeling real face images as 1 and attack face images as 0, assembling the labeled face image data into a data set, converting the data set (comprising a training set and a test set) into corresponding TFRecord format files, and reading the face image data in parallel with multiple threads;
a feature extraction step: converting the data set into corresponding TFRecord format files, inputting them into the constructed LiveNet convolutional neural network, and extracting high-dimensional face features through the convolutional layers of the LiveNet convolutional neural network;
a face living body training step: inputting the labeled face images into the constructed LiveNet convolutional neural network for training, and extracting the optimal liveness features to obtain the optimal living body detection model;
a face living body detection step: performing living body detection with the trained optimal living body detection model and comparing the detection output with a threshold; if the output is greater than the threshold, the face is judged to be a living body face, and if the output is less than or equal to the threshold, it is judged to be a non-living body face. The threshold can be preset according to the detection requirements.
Preferably, the face images are not merely divided into a training set and a test set by a fixed proportion: the training set is used in the face living body training step and the test set in the face living body detection step, and the face images contained in the training set are different from those contained in the test set, which on the one hand allows the accuracy of the living body model to be measured and on the other hand improves the accuracy of the model.
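For concreteness, the following is a minimal, non-authoritative sketch (TensorFlow is assumed, since the method stores data as TFRecord files) of packing labeled face crops into a TFRecord file and reading them back in parallel with multiple threads. The 256×256 crop size and all function names are illustrative assumptions, not taken from the patent.

```python
import tensorflow as tf

IMG_SIZE = 256  # assumed crop size; the patent only says the crop is "a preset value"

def write_tfrecord(examples, path):
    """examples: iterable of (jpeg_bytes, label), label 1 = real face, 0 = attack."""
    with tf.io.TFRecordWriter(path) as writer:
        for jpeg_bytes, label in examples:
            ex = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(ex.SerializeToString())

def _parse(record):
    spec = {"image": tf.io.FixedLenFeature([], tf.string),
            "label": tf.io.FixedLenFeature([], tf.int64)}
    parsed = tf.io.parse_single_example(record, spec)
    img = tf.io.decode_jpeg(parsed["image"], channels=3)
    img = tf.image.resize(img, [IMG_SIZE, IMG_SIZE]) / 255.0
    return img, parsed["label"]

def make_dataset(path, batch_size=32):
    # num_parallel_calls provides the multi-threaded parallel reading described above
    return (tf.data.TFRecordDataset(path)
            .map(_parse, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(1024)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```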
In use, the invention has the following advantages:
1. Because the frame and lighting information of different backgrounds help improve the accuracy of face living body detection, the face image with its background is input into the designed LiveNet convolutional neural network to extract optimal liveness features.
2. The invention introduces a CNN-6 network structure to align face images, rotating tilted faces to a frontal pose, thereby correcting the face image and solving the problem of a low liveness recognition rate caused by an excessive tilt angle.
3. In the image preprocessing process, the face images are not merely divided into a training set and a test set by a fixed proportion; the face images contained in the training set are different from those contained in the test set, which on the one hand allows the accuracy of the living body model to be measured and on the other hand improves the accuracy of the model.
4. On the basis of the constructed LiveNet convolutional neural network, the redundant CDC (central difference convolution) network structure is deleted, solving the problems that the CDC structure consumes a large amount of time in training and testing and that the living body detection model has low precision and is ill-suited to face detection tasks; the new LiveNet convolutional neural network therefore has stronger and more robust modeling capability.
5. The data sets are cleaned of non-face and blurred images in the image preprocessing step, which improves the accuracy of living body detection and the computation speed of the network model, thereby improving the performance of the whole algorithm.
6. The LiveNet convolutional neural network solves the problem that a CDC network consumes a large amount of time, saves substantial development time, and improves the accuracy of the living body detection algorithm.
7. Before the training step, the face images are aligned with the CNN-6 network, which improves the accuracy of the living body model.
Preferably, the architecture flow of the LiveNet convolutional neural network includes:
the NAS-searched backbone network (i.e. DepthNet) and an MAFM module, with two convolutional layers added after Head1 and two deconvolution layers added at the last layer to recover spatial information of the image. Specifically, the architecture flow may be as shown in fig. 3: image input → stem0 → stem1 → Low-level cell → Mid-level cell → High-level cell → MAFM → Head0 → Head1 → Conv 32 → Conv 16 → img 8×8 → Up-Conv → Up-Conv → image output.
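As a rough illustration only (the NAS-searched backbone, cells and MAFM module are not specified in enough detail here to reproduce), the sketch below shows the part the patent adds after Head1: two convolutional layers followed by two deconvolution layers. All channel counts and strides are assumptions.

```python
from tensorflow.keras import layers

def livenet_tail(head1_output):
    """Sketch of the stages after Head1: two convolutions that refine
    semantic information, then two deconvolutions that recover the
    spatial information lost by downsampling."""
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(head1_output)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    return layers.Conv2DTranspose(1, 3, strides=2, padding="same")(x)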
Preferably, after the non-face images and blurred face images are eliminated, a loss function is used to compute the loss over the captured face images, the coordinates of key points in each face image are located, the face is aligned using the key-point coordinates, and the face image is cropped to a preset size.
Specifically, a nonlinear mapping is obtained to align the face: $\Phi: I \to s$, where $I$ is the captured face image,
$I \in \mathbb{R}^{H \times W \times 3}$, and the output shape vector $s \in \mathbb{R}^{2L}$ is used to crop the face image $I$; $H$ is the height of the face image and $W$ is its width;
the vector $s$ has the form $s = [x_1, \ldots, x_L, y_1, \ldots, y_L]^T$, where $L$ is the number of face key points ($L = 5$), and $x_l$ and $y_l$ are the coordinates of the $l$-th key point.
When the nonlinear mapping is obtained, the living body detection model needs to define a multi-layer neural network structure with randomly initialized parameters; the nonlinear mapping $\Phi = (\phi_1 \circ \cdots \circ \phi_M)(I)$ is a composition of $M$ functions, where each function represents a particular layer in the multi-layer neural network structure;
given a preset set of labeled training samples $\{(I_i, s_i)\}_{i=1}^{N}$,
a CNN-6 network and a loss function are used to find the minimizer of the nonlinear mapping $\Phi$:
$$\Phi^* = \arg\min_{\Phi} \sum_{i=1}^{N} \mathrm{loss}(\Phi(I_i), s_i)$$
where loss is a predefined loss function measuring the difference between a predicted value and its ground truth; the CNN-6 network is trained with the loss function in a supervised mode, and an optimization algorithm such as stochastic gradient descent (SGD) is used to minimize the loss over the network.
In order to effectively analyze the influence of different loss functions, the invention introduces the CNN-6 network architecture to align faces and to test the accuracy and performance of the model. The input of the network is a 256×256×3 color image, and the output is the coordinates of the key points of the face image.
The alignment effect of the face image may be as shown in fig. 4.
As shown in fig. 2, the CNN-6 network includes five 3×3 convolutional layers, one fully-connected layer and one output layer; after each convolutional layer and the fully-connected layer, feature maps are passed through a ReLU nonlinear activation function. Preferably, after each convolutional layer a max-pooling layer scales the feature map down to half its original size.
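A minimal Keras sketch of the CNN-6 alignment network as just described: five 3×3 convolutional layers with ReLU, each followed by a max-pooling layer that halves the feature map, one fully-connected layer, and an output layer regressing the 2L = 10 key-point coordinates. The channel and unit counts are illustrative assumptions.

```python
from tensorflow.keras import layers, models

def build_cnn6(num_keypoints=5):
    """Sketch of CNN-6: five 3x3 conv layers (each with ReLU and 2x2 max pooling
    that halves the feature map), one fully-connected layer, one output layer."""
    model = models.Sequential(name="cnn6_alignment")
    model.add(layers.Input(shape=(256, 256, 3)))           # 256x256x3 color input
    for filters in (32, 64, 128, 128, 256):                # channel counts are assumptions
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))                  # 256 -> 128 -> ... -> 8
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))        # fully-connected layer
    model.add(layers.Dense(2 * num_keypoints))             # outputs x1..xL, y1..yL
    return model
```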
Preferably, for face key-point localization based on a convolutional neural network, using the correct loss function is crucial. However, existing deep-neural-network-based face localization systems mainly use the $L_2$ loss function to compute the loss in the network. $L_2$ handles small errors well, but with large initial localization errors the $L_2$ loss cannot recover quickly. Therefore, when locating the coordinates of the key points in the face image, the wing(x) loss function is used to locate the face key points accurately. Its formula is:
$$\mathrm{wing}(x) = \begin{cases} w \ln\left(1 + |x|/\epsilon\right), & |x| < w \\ |x| - C, & \text{otherwise} \end{cases}$$
where $x$ denotes the error on a key-point coordinate of the face image; the nonlinear part covers the range $[-w, w]$; $\epsilon$ limits the curvature of the nonlinear region; and $C = w - w\ln(1 + w/\epsilon)$ is a fixed constant that smoothly links the linear and nonlinear parts.
The nonlinear part of the loss function behaves as $\ln(x)$ on $[\epsilon/w,\ 1 + \epsilon/w]$, scaled by $w$ along both the X-axis and the Y-axis.
In a conventional convolutional neural network, part of the spatial information of the image is lost as the network deepens. The LiveNet convolutional neural network designed by the invention makes better use of spatial features. As shown in fig. 3, the LiveNet convolutional neural network structure comprises a NAS-searched backbone network and an MAFM module; two convolutional layers are added after Head1, and this structure accurately extracts the semantic information of the face image; two deconvolution layers are added at the last layer to recover the spatial information of the image. The LiveNet convolutional neural network structure proposed by the invention, with its added convolutional and deconvolution layers, solves the technical problem that part of the spatial information of the image is lost as the network keeps deepening, while still accurately extracting the semantic information of the face image.
The invention inputs the aligned face image into the designed LiveNet network; the added convolutional and deconvolution layers combine semantic and spatial information well. More preferably, the face living body training step comprises the following steps:
Step 1: labeling the aligned face data sets (training set and test set) with the corresponding labels;
Step 2: in the training and testing stages, initializing the weights of the network model;
Step 3: extracting different high-dimensional features from the face image through convolution operations, and enlarging the face image through deconvolution operations;
Step 4: computing the output of the forward propagation pass from the weights and bias values of the LiveNet convolutional neural network structure, and computing the loss value of the network;
Step 5: computing the gradients for back propagation through the network model from the loss value;
Step 6: updating the parameters of the network nodes (including the weights w, bias terms b, and the like) with a gradient descent algorithm to obtain the optimal living body detection model;
Step 7: repeating steps 3-6 until the network converges.
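Steps 1-7 condense to a standard supervised loop. Below is a non-authoritative sketch in which `build_livenet()` is a hypothetical constructor for the network of fig. 3, SGD stands in for the gradient descent algorithm of step 6, and binary cross-entropy is an assumed choice of loss.

```python
import tensorflow as tf

def train(model, dataset, epochs=10):
    optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)          # gradient descent (step 6)
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    for _ in range(epochs):                                          # repeat until convergence (step 7)
        for images, labels in dataset:                               # labeled, aligned faces (step 1)
            with tf.GradientTape() as tape:
                logits = tf.squeeze(model(images, training=True), -1)   # forward pass (steps 3-4)
                loss = bce(tf.cast(labels, tf.float32), logits)         # loss value (step 4)
            grads = tape.gradient(loss, model.trainable_variables)      # back-propagated gradients (step 5)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))  # update w, b (step 6)

# model = build_livenet()  # hypothetical; weights are randomly initialized at construction (step 2)
# train(model, make_dataset("train.tfrecord"))
```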
Experiments were carried out on the technical scheme of the invention. Three public face data sets, CASIA-MFSD, REPLAY-ATTACK and MSU-MFSD, are used to verify the validity and real-time performance of the liveness network, and the accuracy rate is used to quantitatively and qualitatively measure the effectiveness and accuracy of the living body detection algorithm. The accuracy is defined as:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of correctly detected living bodies; TN is the number of correctly detected attack samples; FP is the number of attack samples falsely accepted as living bodies; and FN is the number of living bodies falsely rejected as attack samples. With unbalanced positive and negative samples, using accuracy as the index for evaluating a living body detection model may give inaccurate results, so to better analyze the accuracy of the model, the half total error rate (HTER) is used as the evaluation index. The formulas are as follows:
$$\mathrm{FAR}(\tau, D) = \frac{FP}{FP + TN}$$
$$\mathrm{FRR}(\tau, D) = \frac{FN}{FN + TP}$$
$$\mathrm{HTER}(\tau, D) = \frac{\mathrm{FAR}(\tau, D) + \mathrm{FRR}(\tau, D)}{2}$$
where FAR is the false acceptance rate; FRR is the false rejection rate; D is the test set; τ is the threshold; and HTER is the half total error rate, with smaller HTER values indicating higher accuracy.
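A short NumPy sketch of the decision rule and the metrics above; the score array, the label convention (1 = live, 0 = attack) and the threshold τ are placeholders.

```python
import numpy as np

def evaluate(scores, labels, tau=0.5):
    """scores: model outputs on test set D; labels: 1 = live, 0 = attack."""
    pred_live = scores > tau                       # liveness decision against threshold tau
    tp = np.sum(pred_live & (labels == 1))         # live faces correctly accepted
    tn = np.sum(~pred_live & (labels == 0))        # attack samples correctly rejected
    fp = np.sum(pred_live & (labels == 0))         # attacks falsely accepted as live
    fn = np.sum(~pred_live & (labels == 1))        # live faces falsely rejected
    acc = (tp + tn) / (tp + tn + fp + fn)
    far = fp / max(fp + tn, 1)                     # false acceptance rate
    frr = fn / max(fn + tp, 1)                     # false rejection rate
    hter = (far + frr) / 2                         # half total error rate
    return acc, far, frr, hter
```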
The following table compares the performance of the networks:
Backbone network | Training time (min) | Testing time (min)
DepthNet+CDC     | 140                 | 49
DepthNet         | 105                 | 20
Experiment 1: the invention tests the influence of adding the CDC structure to the DepthNet network, measuring the time consumed by the two algorithms during training and testing on the CASIA-MFSD videos. As shown in the table above, DepthNet+CDC takes 35 minutes more to train than the DepthNet algorithm and 29 minutes more to test. The plain DepthNet network therefore greatly reduces the training and testing time of the living body model.
Experiment 2: the result of aligning face images with the CNN-6 network is shown in fig. 4, where fig. 4a shows the original image and fig. 4b shows the aligned face image.
The following table compares the half total error rates on the different data sets:
[Table: HTER comparison of the models on the CASIA-MFSD, REPLAY-ATTACK and MSU-MFSD data sets; reproduced as an image in the original document.]
Experiment 3: the CASIA-MFSD, REPLAY-ATTACK and MSU-MFSD data sets are used to test the generalization ability of the model. As the table above shows, the LiveNet proposed by the invention achieves a lower half total error rate on all three data sets, indicating that the method generalizes well to video attacks, printed-photo attacks and high-definition photo attacks. In addition, the CNN-6 combined with LiveNet network model achieves the lowest half total error rate in testing, showing that aligned face images can greatly improve the accuracy of the living body model.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; the modifications and the substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and the corresponding technical solutions are all covered in the claims and the specification of the present invention.

Claims (9)

1. A living body detection method for face alignment, characterized by comprising the following steps:
an image preprocessing step: shooting face images, removing non-face images and blurred face images, aligning the face images with a CNN-6 network, labeling real-person face images as 1 and attack face images as 0, assembling the labeled face image data into a data set, converting the data set into corresponding TFRecord format files, and reading the face image data in parallel with multiple threads;
a feature extraction step: converting the data set into corresponding TFRecord format files, inputting them into the constructed LiveNet convolutional neural network, and extracting high-dimensional face features through the convolutional layers of the LiveNet convolutional neural network;
a face living body training step: inputting the labeled face images into the constructed LiveNet convolutional neural network for training, and extracting the optimal liveness features to obtain the optimal living body detection model;
a face living body detection step: performing living body detection with the trained optimal living body detection model and comparing the detection output with a threshold; if the output is greater than the threshold, the face is judged to be a living body face, and if the output is less than or equal to the threshold, it is judged to be a non-living body face.
2. The in-vivo detection method for face alignment according to claim 1, characterized in that:
the architecture flow of the LiveNet convolutional neural network comprises:
the NAS-searched backbone network DepthNet and an MAFM module, with two convolutional layers added after Head1 and two deconvolution layers added at the last layer to recover spatial information of the image.
3. The in-vivo detection method for face alignment according to claim 2, characterized in that:
after the non-face images and blurred face images are removed, a loss function is used to compute the loss over the captured face images, the coordinates of key points in the face image are located, the face is aligned using the face key-point coordinates, and the face image is cropped to a preset size.
4. The in-vivo detection method for face alignment according to claim 3, wherein:
a nonlinear mapping is obtained to align the face: $\Phi: I \to s$, where $I$ is the captured face image,
$I \in \mathbb{R}^{H \times W \times 3}$, and the output shape vector $s \in \mathbb{R}^{2L}$ is used to crop the face image $I$; $H$ is the height of the face image and $W$ is its width;
the vector $s$ has the form $s = [x_1, \ldots, x_L, y_1, \ldots, y_L]^T$, where $L$ is the number of face key points ($L = 5$), and $x_l$ and $y_l$ are the coordinates of the $l$-th key point.
5. The in-vivo detection method for face alignment according to claim 4, wherein:
when the nonlinear mapping is obtained, the nonlinear mapping $\Phi = (\phi_1 \circ \cdots \circ \phi_M)(I)$ is a composition of $M$ functions, where each function represents a particular layer in a multi-layer neural network structure;
given a preset set of labeled training samples $\{(I_i, s_i)\}_{i=1}^{N}$,
a CNN-6 network and a loss function are used to find the minimizer of the nonlinear mapping $\Phi$:
$$\Phi^* = \arg\min_{\Phi} \sum_{i=1}^{N} \mathrm{loss}(\Phi(I_i), s_i)$$
where loss is a predefined loss function measuring the difference between a predicted value and its ground truth; the CNN-6 network is trained with the loss function, and an optimization algorithm is used to minimize the loss over the network.
6. The in-vivo detection method for face alignment according to claim 5, wherein:
the CNN-6 network includes five 3×3 convolutional layers, one fully-connected layer and one output layer; after each convolutional layer and the fully-connected layer, feature maps are passed through a ReLU nonlinear activation function.
7. The in-vivo detection method for face alignment according to claim 6, wherein:
when the coordinates of key points in the face image are located, the wing(x) loss function is used to locate the face key points accurately, with the formula:
$$\mathrm{wing}(x) = \begin{cases} w \ln\left(1 + |x|/\epsilon\right), & |x| < w \\ |x| - C, & \text{otherwise} \end{cases}$$
where $x$ denotes the error on a key-point coordinate of the face image; the nonlinear part covers the range $[-w, w]$; $\epsilon$ limits the curvature of the nonlinear region; and $C = w - w\ln(1 + w/\epsilon)$ is a fixed constant;
the nonlinear part of the loss function behaves as $\ln(x)$ on $[\epsilon/w,\ 1 + \epsilon/w]$.
8. The in-vivo detection method for face alignment according to claim 7, wherein:
the LiveNet convolutional neural network structure comprises a NAS-searched backbone network (DepthNet) and an MAFM module; two convolutional layers are added after Head1, and two deconvolution layers are added at the last layer to recover the spatial information of the image.
9. The in-vivo detection method for face alignment according to claim 8, wherein:
the face living body training step further comprises the following steps:
Step 1: labeling the aligned face data set with the corresponding labels;
Step 2: in the training and testing stages, initializing the weights of the network model;
Step 3: extracting different high-dimensional features from the face image through convolution operations, and enlarging the face image through deconvolution operations;
Step 4: computing the output of the forward propagation pass from the weights and bias values of the LiveNet convolutional neural network structure, and computing the loss value of the network;
Step 5: computing the gradients for back propagation through the network model from the loss value;
Step 6: updating the parameters of the network nodes with a gradient descent algorithm to obtain the optimal living body detection model;
Step 7: repeating steps 3-6 until the network converges.
CN202010716812.1A 2020-07-23 2020-07-23 Living body detection method for face alignment Pending CN111832525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010716812.1A CN111832525A (en) 2020-07-23 2020-07-23 Living body detection method for face alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010716812.1A CN111832525A (en) 2020-07-23 2020-07-23 Living body detection method for face alignment

Publications (1)

Publication Number Publication Date
CN111832525A (en) 2020-10-27

Family

ID=72925145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010716812.1A Pending CN111832525A (en) 2020-07-23 2020-07-23 Living body detection method for face alignment

Country Status (1)

Country Link
CN (1) CN111832525A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678248A (en) * 2015-12-31 2016-06-15 上海科技大学 Face key point alignment algorithm based on deep learning
CN106599830A (en) * 2016-12-09 2017-04-26 中国科学院自动化研究所 Method and apparatus for positioning face key points
CN109886244A (en) * 2019-03-01 2019-06-14 北京视甄智能科技有限公司 A kind of recognition of face biopsy method and device
CN111275005A (en) * 2020-02-21 2020-06-12 腾讯科技(深圳)有限公司 Drawn face image recognition method, computer-readable storage medium and related device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
YI SUN et al.: "Deep Convolutional Network Cascade for Facial Point Detection", 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 3476-3483 *
ZITONG YU et al.: "Searching Central Difference Convolutional Networks for Face Anti-Spoofing", arXiv.org, pages 1-13 *
史忠植 et al. (Shi Zhongzhi et al.): 《人工智能导论》 (Introduction to Artificial Intelligence), 31 January 2020, pages 87-88 *
崔馨方 (Cui Xinfang): "关于人脸关键点检测的若干问题研究" (Research on several problems of face key-point detection), China Masters' Theses Full-text Database, Information Science and Technology, pages 138-123 *
杨晓青 et al. (Yang Xiaoqing et al.): "基于深度卷积神经网络的端对端人脸对齐算法" (End-to-end face alignment algorithm based on deep convolutional neural networks), Computer Engineering and Design, vol. 40, no. 9, pages 2666-2671 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580472A (en) * 2020-12-11 2021-03-30 云从科技集团股份有限公司 Rapid and lightweight face recognition method and device, machine readable medium and equipment
CN112990090A (en) * 2021-04-09 2021-06-18 北京华捷艾米科技有限公司 Face living body detection method and device

Similar Documents

Publication Publication Date Title
CN108009528B (en) Triple Loss-based face authentication method and device, computer equipment and storage medium
CN108090830B (en) Credit risk rating method and device based on facial portrait
WO2021036436A1 (en) Facial recognition method and apparatus
CN109190446A (en) Pedestrian's recognition methods again based on triple focused lost function
US20080013803A1 (en) Method and apparatus for determining print image quality
US11194997B1 (en) Method and system for thermal infrared facial recognition
CN106156702A (en) Identity identifying method and equipment
CN111861731A (en) Post-credit check system and method based on OCR
CN106203284B (en) Method for detecting human face based on convolutional neural networks and condition random field
CN105335719A (en) Living body detection method and device
US10423817B2 (en) Latent fingerprint ridge flow map improvement
CN111832525A (en) Living body detection method for face alignment
CN112215043A (en) Human face living body detection method
CN115937873A (en) Online handwriting verification system and method based on recognizable single character
US20150178544A1 (en) System for estimating gender from fingerprints
CN113158777A (en) Quality scoring method, quality scoring model training method and related device
CN115620312A (en) Cross-modal character handwriting verification method, system, equipment and storage medium
CN109145704A (en) A kind of human face portrait recognition methods based on face character
US10755074B2 (en) Latent fingerprint pattern estimation
CN109740569B (en) Incomplete information restoration method during finger vein imaging based on encoder-decoder
CN112115834A (en) Standard certificate photo detection method based on small sample matching network
CN107066943B (en) A kind of method for detecting human face and device
CN116229511A (en) Identification re-recognition method based on golden monkey trunk feature extraction
CN113807237B (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
CN111507941B (en) Composition characterization learning method for aesthetic quality evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination