CN114519897A - Human face in-vivo detection method based on color space fusion and recurrent neural network - Google Patents

Human face in-vivo detection method based on color space fusion and recurrent neural network

Info

Publication number
CN114519897A
CN114519897A (application CN202111663546.1A)
Authority
CN
China
Prior art keywords
face
color
human face
color space
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111663546.1A
Other languages
Chinese (zh)
Inventor
钱鹰
张蓝
刘歆
陈仕杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111663546.1A priority Critical patent/CN114519897A/en
Publication of CN114519897A publication Critical patent/CN114519897A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention claims a face liveness detection method based on color space fusion and a recurrent neural network, relating to the technical field of biometric liveness detection. The invention comprises: fusing a new color space; constructing an LSTM network for face liveness detection; inputting the color features of spoof face attack videos from public data sets into the constructed LSTM for training; and using the newly fused color space together with the trained network model for face liveness detection. The proposed face liveness detection algorithm can run directly on the content captured by a camera, achieves accurate detection under both two-dimensional spoof face attacks and finely crafted three-dimensional spoof face attacks, and addresses the low stability of face liveness detection under multi-dimensional, cross-dataset spoof face attacks.

Description

Face liveness detection method based on color space fusion and a recurrent neural network
Technical Field
The invention belongs to the field of biometric authentication and anti-spoofing, and in particular relates to a face liveness detection method based on color space fusion and a recurrent neural network.
Background
With the advent of the artificial intelligence era, people have begun to use their own biometric characteristics as identity labels, making identity recognition more convenient and more secure. Among biometric methods, face recognition accounts for a large share owing to its low cost and the absence of dedicated equipment to manage, and with the development of face detection and recognition technology it has been widely applied in everyday scenarios such as face-based payment, access control systems, and attendance systems.
A spoof face attack targets a face recognition system: by presenting a forged version of a legitimate user's face to the camera, the attacker attempts to make the system authenticate an illegitimate user as legitimate and thereby gain the system's trust.
In general, spoof face attacks fall into 2 categories: two-dimensional and three-dimensional. Two-dimensional attacks comprise photo attacks and video attacks: in a photo attack, the attacker gains the trust of the face recognition system by printing out a photo or picture of a legitimate user or displaying it on an electronic device; in a video attack, the attacker replays a video containing the legitimate user's face. A three-dimensional attack mainly refers to the attacker fabricating a 3D mask of the legitimate user's face from various materials (such as silicone or latex) and wearing it to gain the system's trust.
In recent years, although face liveness detection has been studied extensively, existing algorithms still leave room for improvement under cross-dataset and multi-dimensional spoof face attacks. Existing approaches include handcrafted-feature methods such as LBP, LBP-TOP, and Markov-based methods, which mainly rely on designed texture features to distinguish real faces from spoofs; deep-learning detection methods such as CNN, CNN + LSTM, and Deep Tree Networks, which let a neural network learn the differences between real and spoofed faces in order to judge spoof attacks; and liveness-signal methods (e.g., the rPPG method), which exploit physiological signals unique to real faces to distinguish them from spoofs. These methods achieve high accuracy when detecting two-dimensional or three-dimensional spoof attacks separately, but their accuracy drops in multi-dimensional and cross-dataset tests.
Face liveness detection must cover both two-dimensional and three-dimensional spoof face attacks. Hybrid-feature pipelines (first detecting two-dimensional attacks, then three-dimensional ones) can reach high accuracy against multi-dimensional attacks, but their long detection time hinders practical use. Deep-learning methods require diverse training data, and model generalization tends to degrade during learning. Cross-dataset, multi-dimensional spoof face attacks therefore remain a hot topic in the field of face liveness detection.
Application publication CN105354554A discloses a face liveness detection method based on color and singular-value features, mainly addressing the computational complexity and low recognition rate of existing face authenticity identification techniques. Its implementation steps are: 1) label positive and negative samples of a face database and divide them into a training set and a test set; 2) partition the training-set face images into blocks and extract the color features and singular-value features of the small blocks in batches; 3) normalize the extracted feature vectors and feed them to a support vector machine classifier for training to obtain a trained model; 4) extract features from the test-set data and predict on them with the trained model to obtain the classification result. That method improves classification efficiency, achieves a good classification effect, and can be used for face authenticity detection in social networks or in real life. However, because it trains and predicts on extracted color features with a support vector machine, it has difficulty detecting finely crafted mask-type spoof faces, whose feature information is extremely similar to that of real faces. The present invention instead performs face liveness detection using the pulse characteristic specific to real faces (the rPPG signal) and achieves higher detection accuracy against three-dimensional spoof face attacks.
CN111881815A discloses a face liveness detection method with multi-model feature migration: heterogeneous data sets are constructed and fused, and multi-model feature migration across several color spaces is used to train a liveness model, improving its precision and generalization. In the training stage, after face detection, alignment, and cropping, visible-light images from open-source or private data sets are fused, and an RGB model and a YUV model are trained simultaneously until both converge; in the prediction stage, the captured visible-light images are fed into the trained RGB and YUV models respectively, the two models' outputs are combined into a final score through a score-fusion strategy, and the liveness result is judged from that score. The method generalizes well, is accurate, and is suitable for industrial deployment. By migrating features across several color spaces it largely avoids the influence of illumination on the detection result, but it still struggles with finely crafted mask-type spoof faces and with cross-dataset spoof attacks. The present invention likewise counteracts external illumination through color space fusion, and by additionally exploiting the pulse characteristic specific to real faces (the rPPG signal) it attains higher detection accuracy against multi-dimensional and cross-dataset spoof face attacks.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art: face liveness detection is performed directly on the original video, without preprocessing its frames, which matches practical application scenarios, and the proposed method based on color space fusion and a recurrent neural network maintains stable, high detection accuracy under cross-dataset and multi-dimensional spoof face attacks. The technical scheme of the invention is as follows:
A face liveness detection method based on color space fusion and a recurrent neural network comprises the following steps:
performing face detection on each frame of the original video and segmenting the face region and the background region in the image;
constructing a new color space using the correlation between the rPPG signals of the face region and the background region;
performing face detection on each input frame and segmenting the face region and the background region in the image; converting the segmented image from the RGB color space into the HSV and YCbCr color spaces, splitting out 9 color channels, and Fourier-transforming each color channel to obtain the rPPG signals of the face region and the background region for each channel;
selecting three color channels to construct the new color space, based on the principle that the correlation between the rPPG signals of a real face region and its background region is small while that of a spoofed face region is large;
constructing an LSTM network for face liveness detection;
inputting the color features of videos from the public data set into the constructed LSTM network for training;
performing face liveness detection using the new color space and the liveness detection model trained with the LSTM network.
Further, segmenting the face region image and the background region image comprises the following steps:
detecting each frame of the original video using the frontal_face_detector from the dlib library as the face detector;
locating and labeling the face in the image using the shape_predictor_68_face_landmarks.dat facial landmark extractor;
dividing the face region and the background region according to the labeled position information and segmenting them.
Further, constructing the new color space comprises the following steps:
capturing each frame of the original video, performing face detection on it, and segmenting it into a face part and a background part;
converting the face-region and background-region images into the HSV and YCbCr color spaces;
splitting the three color spaces RGB, HSV, and YCbCr to obtain 9 color channels;
collecting the color features of each channel of the segmented regions in every frame of the current video, forming 9 face-region color feature lists and 9 background-region color feature lists;
Fourier-transforming the face-region and background-region color feature lists to obtain rPPG signals;
computing and recording the correlation coefficient between the rPPG signals of the face region and the background region;
from the average correlation coefficients of all channels over the videos, sorting the color channels in ascending order of correlation coefficient for real-face videos and in descending order for spoof face attack videos, and selecting the three channels with the highest overlap between the fronts of the two orderings as the new color space.
Further, computing the correlation coefficient between the rPPG signals of the face region and the background region specifically comprises:
C_i = (1/n) Σ_{j=1}^{n} m_j
{R1, R2, R3} = f_min({m1, m2, …, m9})
{F1, F2, F3} = f_max({m1, m2, …, m9})
where m_j is the correlation coefficient between the rPPG signals generated by the same channel for the face and the background of the same video, C_i is the average correlation coefficient generated over the n videos, f_max takes the 3 largest m values among the 9 color channels, and f_min takes the 3 smallest m values.
Further, constructing the LSTM network for face liveness detection comprises the following steps: extracting the color features of the face region and the background region of each frame of the original video using the new color space to form a Feature Map; feeding the Feature Map into the LSTM network, which uses an LSTM layer with 100 hidden neurons, a fully connected layer, and an FFT layer; the LSTM is used to estimate the rPPG signal f from an input sequence of N_f frames {I_j}, j = 1, …, N_f, where I_j represents the color features of each frame, and the FFT layer converts the response of the fully connected layer into the Fourier domain to obtain the rPPG signal;
a Fourier transform layer is attached after the fully connected layer of the LSTM network and used to Fourier-transform the difference sequence output by the network, yielding frequency-domain information;
the prediction of the LSTM is combined with the correlation between the frequency information of the face regions and that of the background region, and the result is output.
Further, inputting the color features of videos from the public data set into the constructed LSTM network for training comprises the following steps:
for each real-face video in the public data set, obtaining its rPPG signal with a conventional rPPG method and using it as the ground truth during network training, and setting the rPPG signal of every spoof face attack video to 0;
extracting the color features of the videos in the public data set with the constructed new color space, and feeding the color feature sequences together with the ground truth into the LSTM network for training;
inserting a Fourier transform layer after the fully connected layer of the LSTM network to convert the color feature change sequence into an rPPG signal, on which classification and prediction are performed.
The invention has the following advantages and beneficial effects:
The method based on color space fusion and a recurrent neural network maintains stable, high detection accuracy on the face liveness detection task under multi-dimensional spoof face attacks. The color space fusion method rests on the principle that the rPPG signals extracted from the regions of a real face correlate weakly with the pulse signal extracted from the background region, whereas those of a spoofed face correlate strongly with it: each frame is divided into a face region and a background region, the correlations between the color channels of the two regions are computed, the three color channels that best reflect the change of facial color features are identified statistically, and these three channels are combined into a new color space, which reduces the influence of external environmental noise on the liveness detection process and improves detection accuracy. Compared with existing methods, the approach combines the rPPG method with a recurrent neural network and, by constructing a new color space, captures the frame-by-frame color feature changes of the video to the greatest extent, strengthening robustness to illumination and offering better accuracy and practicality.
Drawings
FIG. 1 is a flow chart of the face liveness detection method based on color space fusion and a recurrent neural network according to a preferred embodiment of the present invention.
Fig. 2 is a flowchart illustrating segmentation of a face region and a background region according to an embodiment of the present invention.
FIG. 3 is a flowchart of color space fusion according to an embodiment of the present invention.
FIG. 4 is a diagram of a portion of a recurrent neural network in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
In order to realize face liveness detection, the face liveness detection method based on color space fusion and a recurrent neural network provided by the embodiment of the invention comprises three stages: segmentation of the face region and the background region, color space fusion, and introduction of an LSTM network to predict the rPPG signal. The steps are as follows:
Face region and background region segmentation stage
S1: perform face detection on each frame of the original video and segment the face region and the background region in the image;
Color space fusion stage
S2: construct a new color space using the correlation between the rPPG signals of the face region and the background region;
Recurrent-network rPPG prediction stage
S3: introduce an LSTM network for face liveness detection;
S4: input the color features of videos from the public data set into the constructed recurrent neural network for training;
S5: perform face liveness detection using the new color space and the liveness detection model trained by the LSTM network.
In S1, segmenting the face region image and the background region image comprises the following steps (as shown in FIG. 2):
S11: detect each frame of the original video using the frontal_face_detector from the dlib library as the face detector;
S12: locate and label the face in the image using the shape_predictor_68_face_landmarks.dat facial landmark extractor;
S13: divide the face region and the background region according to the labeled position information and segment them, mainly through the following steps:
A1: find the vertical midpoints of the face position;
further, among the 68 labeled feature points, take point 28, which marks the midpoint between the eyes, represented by the variable x1; point 34, which marks the midpoint of the nose, represented by x2; and point 9, which marks the midpoint of the chin, represented by x3;
further, from the positions x1, x2, and x3, store the value (x1 + x3)/2 in the variable x_mid_up, representing the upper vertical position within the face, and (x2 + x3)/2 in x_mid_down, representing the lower vertical position within the face.
A2: find the horizontal midpoints of the face position;
further, among the 68 labeled feature points, take point 3, the middle contour point on the left side of the face, represented by y1; point 30, the upper point on the vertical axis of the face, represented by y2; and point 13, the contour point at the middle-lower position on the right side of the face, represented by y3;
further, from the positions y1, y2, and y3, store the value (y1 + y2)/2 in the variable y_mid_left, representing the position of the left side of the face beside the nose, and (y2 + y3)/2 in y_mid_right, representing the position of the right side of the face beside the nose.
A3: divide the face region according to the values of x1, x2, x3, y1, y2, and y3;
A4: find the positions of the background regions beside the face;
further, from y_mid_left and y1, create a variable distance holding the value y_mid_left - y1; to keep the face out of the background crops, set y0 = y1 - distance × 3 and y4 = y3 + distance × 3, and reset y1 and y3 to widen the margin: y1 = y1 - distance × 2, y3 = y3 + distance × 2;
A5: segment the background regions, as illustrated in the sketch after this list;
further, check y0: when y0 < 0, crop the background region on the left of the face as img[x1:x3, 0:y1]; when y0 > 0, crop it as img[x1:x3, y0:y1];
further, check y4: when y4 > 640, crop the background region on the right of the face as img[x1:x3, y3:640]; when y4 < 640, crop it as img[x1:x3, y3:y4];
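The A1-A5 arithmetic maps directly onto dlib's detector and 68-point landmark predictor. The Python sketch below is a minimal illustration under stated assumptions (640-pixel-wide frames as in step A5, a face crop spanning eye midpoint to chin and left to right contour, and 1-based landmark numbers converted to dlib's 0-based indexing), not the patented implementation itself:

```python
import cv2
import dlib

# Face detector and 68-point landmark predictor (S11, S12).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def segment_face_and_background(img):
    """Split one frame into a face crop and two background crops (A1-A5).

    Landmark numbers in the description are 1-based; dlib's parts() list
    is 0-based, hence the -1 offsets below.
    """
    faces = detector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    if not faces:
        return None
    pts = predictor(img, faces[0]).parts()

    # A1: vertical positions -- eye midpoint (28), nose (34), chin (9).
    x1, x2, x3 = pts[27].y, pts[33].y, pts[8].y  # x2 feeds x_mid_down; unused here
    # A2: horizontal positions -- left contour (3), nose line (30), right contour (13).
    y1, y2, y3 = pts[2].x, pts[29].x, pts[12].x

    face = img[x1:x3, y1:y3]  # A3 (assumed crop: eyes-to-chin, contour-to-contour)

    # A4: margins that keep the face out of the background crops.
    distance = (y1 + y2) // 2 - y1        # y_mid_left - y1
    y0, y4 = y1 - distance * 3, y3 + distance * 3
    y1m, y3m = y1 - distance * 2, y3 + distance * 2

    # A5: background crops, clipped to the frame; 640 is the assumed width.
    bg_left = img[x1:x3, max(y0, 0):y1m]
    bg_right = img[x1:x3, y3m:min(y4, 640)]
    return face, bg_left, bg_right
```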
in S2, the color space fusion includes the following steps (as shown in fig. 3):
s21: according to the principle that the relevance of rPPG signals extracted from each region of the human face and rPPG signals extracted from a background region is low, and the relevance of rPPG signals extracted from each region of the forged human face and rPPG signals extracted from the background region is high, color space conversion is carried out on the divided human face region and the background region, and the RGB color space is converted into HSV (hue saturation value) and YCbCr (hue saturation value) color spaces;
s22: segmenting RGB, HSV and YCbCr color spaces of the images of the face region and the background region to obtain 9 color channels of R, G, B, H, S, V, Y, Cb and Cr of the face region and the background region;
s23: acquiring color features of color channels of a segmentation area of each frame of a current video to form a 9-person face color feature list and 9 background color feature lists;
s24: carrying out Fourier transformation on the human face color feature list and the background area color feature list to obtain an rPPG signal;
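A minimal sketch of S21-S24, assuming the per-frame "color feature" of a channel is its spatial mean over the region (the description does not fix the statistic) and taking the rPPG signal as the magnitude spectrum of the de-meaned feature series:

```python
import cv2
import numpy as np

CHANNELS = ["R", "G", "B", "H", "S", "V", "Y", "Cb", "Cr"]

def nine_channel_features(region_bgr):
    """Per-channel color features of one region (S21-S23): the spatial
    mean of each of the 9 channels, returned as a length-9 vector."""
    rgb = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2RGB)
    hsv = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV)
    ycc = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2YCrCb)[:, :, [0, 2, 1]]  # reorder to Y, Cb, Cr
    stacked = np.concatenate([rgb, hsv, ycc], axis=2).astype(np.float64)
    return stacked.reshape(-1, 9).mean(axis=0)

def rppg_signal(feature_series):
    """Fourier-transform a per-frame feature list into an rPPG spectrum (S24)."""
    x = np.asarray(feature_series, dtype=np.float64)
    return np.abs(np.fft.rfft(x - x.mean()))  # drop the DC component first
```

Calling nine_channel_features on the face crop and on each background crop of every frame yields the 9 + 9 feature lists of S23.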
S25: compute and record the correlation coefficient between the rPPG signals of the face region and the background region, and use the formula
C_i = (1/n) Σ_{j=1}^{n} m_j
to obtain the average correlation coefficient of each channel over all videos, where m_j is the correlation coefficient between the rPPG signals generated by the same channel for the face and the background of the same video, and C_i is the average correlation coefficient generated over the n videos;
S26: with the formulas {R1, R2, R3} = f_min({m1, m2, …, m9}) and {F1, F2, F3} = f_max({m1, m2, …, m9}), record 3 color channels in each case: f_max, applied when the original video is a spoof face attack video, sorts the color channels in descending order of correlation coefficient and records the 3 channels whose face-background correlation is largest, while f_min, applied when the original video is a real-face video, sorts the channels in ascending order and records the 3 channels whose face-background correlation is smallest; the three channels with the highest overlap between the fronts of the two orderings are selected as the new color space, as sketched below;
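S25-S26 can then be sketched as follows, assuming Pearson's r (numpy.corrcoef) as the correlation coefficient m and approximating the "highest overlap from the front" rule by intersecting the 3 lowest-correlation channels of the real-face ordering with the 3 highest of the spoof ordering; nine_channel_features and rppg_signal are the helpers sketched above:

```python
import numpy as np

def select_color_space(real_videos, spoof_videos):
    """Channel selection (S25-S26). Each video is a list of per-frame
    (face_features, bg_features) pairs from nine_channel_features()."""
    def avg_corr(videos):
        per_video = []
        for frames in videos:
            face = np.array([f for f, _ in frames])  # (n_frames, 9)
            bg = np.array([b for _, b in frames])
            m = [np.corrcoef(rppg_signal(face[:, c]),
                             rppg_signal(bg[:, c]))[0, 1]
                 for c in range(9)]                   # m_j per channel
            per_video.append(m)
        return np.mean(per_video, axis=0)             # C_i per channel

    f_min = np.argsort(avg_corr(real_videos))[:3]     # real faces: smallest correlation
    f_max = np.argsort(avg_corr(spoof_videos))[-3:]   # spoofs: largest correlation
    chosen = set(f_min) & set(f_max)  # may hold fewer than 3 channels if the orderings disagree
    return [CHANNELS[c] for c in sorted(chosen)]
```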
in S3, constructing a recurrent neural network for human face live detection includes the following steps (as shown in fig. 3):
s31: extracting the color characteristics of the face region and the original region in each frame image of the original video by using the new color space combined in the S2 to form a Feature Map;
s32: feature maps were imported into LSTM networks using LSTM layers with 100 hidden neurons. The purpose of the LSTM is to estimate the number of bits with NfFrame input sequence
Figure RE-GDA0003578414920000101
rPPG signal f;
s33: a full connection layer is accessed behind the LSTM layer, a Fourier transformation layer is accessed behind the full connection layer, and then the difference sequence output by the network is subjected to Fourier transformation by utilizing the layer, so that frequency domain information is obtained;
s34: and (4) combining the prediction result of the LSTM and the correlation of the frequency information of each region of the human face and the background region, and outputting the result.
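A minimal PyTorch sketch of the S31-S34 network: an LSTM layer with 100 hidden neurons, a fully connected layer producing one response value per frame, and an FFT layer mapping that response into the Fourier domain. Everything other than the 100 hidden units (input width, FC output size) is an assumption, since the description leaves those sizes open:

```python
import torch
import torch.nn as nn

class RPPGLSTM(nn.Module):
    """LSTM (100 hidden units) -> fully connected -> FFT, per S31-S34."""

    def __init__(self, n_channels=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=100,
                            batch_first=True)
        self.fc = nn.Linear(100, 1)  # one response per frame (assumed size)

    def forward(self, x):
        # x: (batch, N_f, n_channels) -- per-frame color features I_j
        h, _ = self.lstm(x)
        resp = self.fc(h).squeeze(-1)            # (batch, N_f) time-domain response
        spectrum = torch.fft.rfft(resp, dim=-1)  # FFT layer: into the Fourier domain
        return resp, spectrum.abs()

# Usage sketch: a batch of 8 clips, 64 frames each, in the 3-channel new color space.
model = RPPGLSTM(n_channels=3)
signal, spectrum = model(torch.randn(8, 64, 3))
```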
In S4, inputting the color features of videos from the public data set into the constructed recurrent neural network for training comprises the following steps:
S41: for each real-face video in the public data set, obtain its rPPG signal with a conventional rPPG method and use it as the ground truth during network training; for each spoof face attack video, set its rPPG signal to 0;
S42: extract the color features of the videos in the public data set with the color space constructed in S2, feed the color feature sequences together with the ground truth into the recurrent neural network for training, and set the objective function to
J(θ_R) = (1/N_s) Σ_{i=1}^{N_s} ‖RNN({F_j}; θ_R) − f_i‖²
where θ_R is the RNN parameter, F_j is the face-region feature map, N_s is the number of frame sequences, and f_i represents the ground truth of the i-th frame sequence;
S43: insert a Fourier transform layer after the fully connected layer of the LSTM network to convert the color feature change sequence into an rPPG signal, on which classification and prediction are performed. A training sketch follows.
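A training-step sketch for S41-S43 against the objective above, reusing the RPPGLSTM sketch defined earlier; the optimizer, learning rate, and the plain MSE loss on the time-domain signal are assumptions:

```python
import torch
import torch.nn as nn

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
mse = nn.MSELoss()

def train_step(feats, gt_rppg):
    """feats: (batch, N_f, 3) color feature sequences; gt_rppg: (batch, N_f)
    ground-truth rPPG, all-zero rows for spoof-attack videos (S41)."""
    signal, _ = model(feats)
    loss = mse(signal, gt_rppg)  # (1/N_s) sum ||RNN(F; theta_R) - f_i||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```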
Compared with existing methods, the detection results obtained by this detection method based on color space fusion and a recurrent neural network are more stable and more accurate.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (6)

1. A face liveness detection method based on color space fusion and a recurrent neural network, characterized by comprising the following steps:
performing face detection on each frame of the original video and segmenting the face region and the background region in the image;
constructing a new color space using the correlation between the rPPG signals of the face region and the background region; performing face detection on each input frame and segmenting the face region and the background region in the image;
converting the segmented image from the RGB color space into the HSV and YCbCr color spaces, splitting out 9 color channels, and Fourier-transforming each color channel to obtain the rPPG signals of the face region and the background region for each channel;
selecting three color channels to construct the new color space, based on the principle that the correlation between the rPPG signals of a real face region and its background region is small while that of a spoofed face region is large;
constructing an LSTM network for face liveness detection;
inputting the color features of videos from the public data set into the constructed LSTM network for training;
performing face liveness detection using the new color space and the liveness detection model trained with the LSTM network.
2. The face liveness detection method based on color space fusion and a recurrent neural network according to claim 1, characterized in that segmenting the face region image and the background region image comprises the following steps:
detecting each frame of the original video using the frontal_face_detector from the dlib library as the face detector;
locating and labeling the face in the image using the shape_predictor_68_face_landmarks.dat facial landmark extractor;
dividing the face region and the background region according to the labeled position information and segmenting them.
3. The face liveness detection method based on color space fusion and a recurrent neural network according to claim 1, characterized in that constructing the new color space comprises the following steps:
capturing each frame of the original video, performing face detection on it, and segmenting it into a face part and a background part;
converting the face-region and background-region images into the HSV and YCbCr color spaces;
splitting the three color spaces RGB, HSV, and YCbCr to obtain 9 color channels;
collecting the color features of each channel of the segmented regions in every frame of the current video, forming 9 face-region color feature lists and 9 background-region color feature lists;
Fourier-transforming the face-region and background-region color feature lists to obtain rPPG signals;
computing and recording the correlation coefficient between the rPPG signals of the face region and the background region;
from the average correlation coefficients of all channels over the videos, sorting the color channels in ascending order of correlation coefficient for real-face videos and in descending order for spoof face attack videos, and selecting the three channels with the highest overlap between the fronts of the two orderings as the new color space.
4. The method according to claim 3, characterized in that computing the correlation coefficient between the rPPG signals of the face region and the background region specifically comprises:
C_i = (1/n) Σ_{j=1}^{n} m_j
{R1, R2, R3} = f_min({m1, m2, …, m9})
{F1, F2, F3} = f_max({m1, m2, …, m9})
where m_j is the correlation coefficient between the rPPG signals generated by the same channel for the face and the background of the same video, C_i is the average correlation coefficient generated over the n videos, f_max takes the 3 largest m values among the 9 color channels, and f_min takes the 3 smallest m values.
5. The face liveness detection method based on color space fusion and a recurrent neural network according to claim 3 or 4, characterized in that constructing the LSTM network for face liveness detection comprises the following steps: extracting the color features of the face region and the background region of each frame of the original video using the new color space to form a Feature Map; feeding the Feature Map into the LSTM network, which uses an LSTM layer with 100 hidden neurons, a fully connected layer, and an FFT layer; the LSTM is used to estimate the rPPG signal f from an input sequence of N_f frames {I_j}, j = 1, …, N_f;
the FFT layer converts the response of the fully connected layer into the Fourier domain to obtain the rPPG signal f;
a Fourier transform layer is attached after the fully connected layer of the LSTM network and used to Fourier-transform the difference sequence output by the network, yielding frequency-domain information;
the prediction of the LSTM is combined with the correlation between the frequency information of the face regions and that of the background region, and the result is output.
6. The face liveness detection method based on color space fusion and a recurrent neural network according to claim 5, characterized in that inputting the color features of videos from the public data set into the constructed LSTM network for training comprises the following steps:
for each real-face video in the public data set, obtaining its rPPG signal with a conventional rPPG method and using it as the ground truth during network training, and setting the rPPG signal of every spoof face attack video to 0;
extracting the color features of the videos in the public data set with the constructed new color space, and feeding the color feature sequences together with the ground truth into the LSTM network for training;
inserting a Fourier transform layer after the fully connected layer of the LSTM network to convert the color feature change sequence into an rPPG signal, on which classification and prediction are performed.
CN202111663546.1A 2021-12-31 2021-12-31 Human face in-vivo detection method based on color space fusion and recurrent neural network Pending CN114519897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111663546.1A CN114519897A (en) 2021-12-31 2021-12-31 Human face in-vivo detection method based on color space fusion and recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111663546.1A CN114519897A (en) 2021-12-31 2021-12-31 Human face in-vivo detection method based on color space fusion and recurrent neural network

Publications (1)

Publication Number Publication Date
CN114519897A (en) 2022-05-20

Family

ID=81597109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111663546.1A Pending CN114519897A (en) 2021-12-31 2021-12-31 Human face in-vivo detection method based on color space fusion and recurrent neural network

Country Status (1)

Country Link
CN (1) CN114519897A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563957A (en) * 2023-07-10 2023-08-08 齐鲁工业大学(山东省科学院) Face fake video detection method based on Fourier domain adaptation
CN116563957B (en) * 2023-07-10 2023-09-29 齐鲁工业大学(山东省科学院) Face fake video detection method based on Fourier domain adaptation

Similar Documents

Publication Publication Date Title
CN109583342B (en) Human face living body detection method based on transfer learning
CN103116763B (en) A kind of living body faces detection method based on hsv color Spatial Statistical Character
CN104933414B (en) A kind of living body faces detection method based on WLD-TOP
CN107230267B (en) Intelligence In Baogang Kindergarten based on face recognition algorithms is registered method
Kashem et al. Face recognition system based on principal component analysis (PCA) with back propagation neural networks (BPNN)
CN106803301A (en) A kind of recognition of face guard method and system based on deep learning
CN106778496A (en) Biopsy method and device
CN108182409A (en) Biopsy method, device, equipment and storage medium
US20180349716A1 (en) Apparatus and method for recognizing traffic signs
JP2000003452A (en) Method for detecting face surface in digital picture, its detecting device, picture judging method, picture judging device and computer readable record medium
CN107944416A A kind of method that true man's verification is carried out by video
CN108446690B (en) Human face in-vivo detection method based on multi-view dynamic features
CN113312965B (en) Face unknown spoofing attack living body detection method and system
CN109740572A (en) A kind of human face in-vivo detection method based on partial color textural characteristics
Yuan et al. MFFFLD: A multimodal-feature-fusion-based fingerprint liveness detection
CN113128481A (en) Face living body detection method, device, equipment and storage medium
CN112668557A (en) Method for defending image noise attack in pedestrian re-identification system
Hebbale et al. Real time COVID-19 facemask detection using deep learning
CN113221655A (en) Face spoofing detection method based on feature space constraint
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
CN115862055A (en) Pedestrian re-identification method and device based on comparison learning and confrontation training
CN113537173B (en) Face image authenticity identification method based on face patch mapping
CN113468954B (en) Face counterfeiting detection method based on local area features under multiple channels
CN114519897A (en) Human face in-vivo detection method based on color space fusion and recurrent neural network
CN114550270A (en) Micro-expression identification method based on double-attention machine system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination