CN114067444A - Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Info

Publication number: CN114067444A
Application number: CN202111185654.2A (priority application)
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: loss, teacher, label, PLGF, classification
Inventors: 冯浩宇, 王宇飞, 胡永健, 蔡楚鑫, 葛治中
Assignees: South China University of Technology (SCUT); Sino-Singapore International Joint Research Institute
Application filed by South China University of Technology (SCUT) and Sino-Singapore International Joint Research Institute

Classifications

    • G06F18/2411: Pattern recognition; classification techniques relating to the classification model (parametric or non-parametric approaches), based on the proximity to a decision surface, e.g. support vector machines (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing)
    • G06F18/2415: Pattern recognition; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate


Abstract

The invention discloses a face spoofing detection method and system based on meta-pseudo labels and illumination invariant features. The method comprises the following steps: preprocess the data to obtain RGB color channel images and PLGF images; divide the RGB color channel images into labeled samples, unlabeled samples, and augmented unlabeled samples, send them to a teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled samples, and the augmented unlabeled loss, and update the parameters of the student and teacher models; send the PLGF images into an illumination invariant feature extraction network to obtain feature vectors and classification vectors, train the network with supervision from a triplet loss and a cross-entropy loss, and save the network model and parameters; determine a decision threshold on the validation set; load the test data into the student model and the illumination invariant feature extraction network to obtain the corresponding RGB and PLGF classification scores, weight and sum them to obtain the total classification scores, and decide the classification results against the threshold. The method improves the robustness of face spoofing detection models when training samples are insufficient.

Description

Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
Technical Field
The invention relates to the technical field of face recognition anti-spoofing, and in particular to a face spoofing detection method and system based on meta-pseudo labels and illumination invariant features.
Background
Business and industrial use of facial biometric technology has grown dramatically: face unlocking protects personal privacy on electronic devices, and facial biometrics are used to authenticate payments. However, using the face as a biometric feature for authentication is not inherently secure, because facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks generally fall into four categories: 1) photo attacks, where an attacker deceives the authentication system with a printed photo or one displayed on a screen; 2) video replay attacks, where an attacker deceives the authentication system with a pre-recorded video of the victim; 3) face mask attacks, where an attacker wears a mask carefully crafted from the victim's face, and with 3D printing now mature the manufactured 3D masks are increasingly realistic; 4) adversarial sample attacks, where an attacker uses a GAN to generate specific adversarial noise that interferes with the face authentication system and produces targeted, incorrect identity verification. These attacks are not only low-cost but also effective at fooling the system, severely impacting and threatening the deployment of face recognition systems.
In related research, texture features extracted manually by traditional methods, such as Local Binary Patterns (LBP), Histograms of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded-Up Robust Features (SURF), are not only relatively coarse in texture detail but also easily affected by illumination and scene. Deep neural network methods have clear advantages over traditional hand-crafted features in learning texture features, but purely data-driven methods have significant limitations in generalization performance and computational complexity. Supervised learning is further limited by the labeled training samples: when a neural network model is optimized by gradient descent, the objective is to reduce the loss on the training set, so insufficient quantity and diversity of training data easily cause overfitting and limit generalization capability.
Existing face anti-spoofing detection algorithms achieve good intra-database detection performance, but their accuracy drops sharply in cross-database detection. On one hand, purely data-driven methods depend too heavily on training data; insufficient quantity and diversity cause overfitting, and generalization degrades severely when the model encounters devices with different shooting specifications, or even different genders, ages, and skin colors. On the other hand, illumination changes interfere with the classification model's extraction of the intrinsic spoofing texture. Moreover, face recognition models typically require a large amount of labeled training data, yet it is difficult to collect samples covering all acquisition conditions, such as different illumination and capture devices; this places high demands on data diversity and restricts model generalization.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a face spoofing detection method and system based on meta-pseudo labels and illumination invariant features.
To achieve this purpose, the invention adopts the following technical scheme:
The invention provides a face spoofing detection method based on meta-pseudo labels and illumination invariant features, comprising the following steps:
extracting the face region from the input image to obtain an RGB color channel image;
randomly cropping image blocks from the RGB color channel images to be trained and dividing them into labeled samples, unlabeled samples, and augmented unlabeled samples, used as the RGB branch training samples;
performing illumination separation preprocessing on the RGB color channel images to be trained to obtain PLGF images, then applying data augmentation, used as the samples for PLGF branch training;
building a student model and a teacher model;
constructing a teacher learning module and sending the labeled, unlabeled, and augmented unlabeled samples into it to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss;
constructing a student meta-learning module and sending the labeled samples, the augmented unlabeled samples, and the pseudo labels of the unlabeled samples into it to update the student model parameters and obtain the student meta-learning loss;
constructing a teacher update module and updating the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss;
iteratively updating the network parameters of the student and teacher models with an optimizer according to the loss functions, and saving the parameters of both models after training;
constructing a NeXtVLAD-based attention module;
constructing a feature extraction backbone network, embedding the attention module, and constructing the illumination invariant feature extraction network;
inputting the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification vectors; sending the feature vectors and real labels into a triplet loss function to obtain the triplet loss; obtaining the cross-entropy loss from the classification vectors and real labels through a cross-entropy function; updating the illumination invariant feature extraction network parameters with an optimizer according to the triplet and cross-entropy losses; and saving the network parameters after training;
sending the face RGB color channel images of the validation set into the student model to obtain RGB classification scores, meanwhile obtaining PLGF images through illumination separation preprocessing and sending them into the illumination invariant feature extraction network to obtain PLGF classification scores, weighting and summing the RGB and PLGF classification scores to obtain the total classification scores, obtaining predicted label values under different decision thresholds, comparing them with the real labels, computing the false acceptance rate and false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold T;
sending the face RGB color channel images of the test set into the trained student model to obtain RGB classification scores, meanwhile obtaining PLGF images through illumination separation preprocessing and sending them into the trained illumination invariant feature extraction network to obtain PLGF classification scores, weighting and summing the RGB and PLGF classification scores to obtain the total classification scores, obtaining the final predicted label values according to the test decision threshold T, and computing the benchmark indexes.
As a preferred technical scheme, performing illumination separation preprocessing on the RGB color channel images to be trained to obtain PLGF images and applying data augmentation to form the PLGF branch training samples specifically comprises:
convolving the face features of the three color channels with the PLGF operators in the horizontal and vertical directions to obtain the horizontal gradient G_hor and the vertical gradient G_ver;
according to the Lambert model, performing illumination separation on the horizontal and vertical gradients to obtain the horizontal illumination-separated gradient ISG_hor and the vertical illumination-separated gradient ISG_ver;
carrying out a linear activation operation on the horizontal and vertical illumination-separated gradients to obtain the composite gradient ISG, which forms the PLGF image.
As a preferred technical scheme, building the student model and the teacher model specifically comprises: building a teacher model and a student model with the same network structure from ResNet-based residual blocks (ResBlock), with 3 ResBlock convolution stages, a batch normalization layer, a global average pooling layer, and a fully connected layer that outputs the classification vector.
As a preferred technical scheme, obtaining the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss specifically comprises the following steps:
sending the labeled, unlabeled, and augmented unlabeled samples into the teacher model to obtain the labeled classification vector P_T,l, the unlabeled classification vector P_T,u, and the augmented unlabeled classification vector P_T,ua;
the labeled classification vector P_T,l first passes through a softmax function and then through a smooth cross-entropy function with the real label y to obtain the labeled loss L_T,l; the smooth cross-entropy function is expressed as:

S = Y(1-α) + α/2
L_T,l = -Σ S · log(softmax(P_T,l))

where α is the smoothing coefficient, Y is the One-Hot encoded label vector of the real label y, and S is the smoothed label vector;
the unlabeled classification vector P_T,u is divided by the control (temperature) coefficient T and passed through a softmax function to obtain the soft label vector Z; the maximum value z_max of Z is taken, and its class constructs the pseudo label y_p; judging z_max against the threshold t yields the confidence score M; the augmented unlabeled classification vector P_T,ua and the soft label vector Z, multiplied by the confidence score M, yield the unlabeled loss L_T,u through a cross-entropy function:

z_max = max(Z)
y_p = argmax(Z)
M = 1 if z_max ≥ t, else 0
L_T,u = -M · Σ Z · log(softmax(P_T,ua))

the labeled loss L_T,l and the unlabeled loss L_T,u are weighted and summed into the teacher semi-supervised loss L_T,semi:

L_T,semi = L_T,l + λ · (s / s_tl) · L_T,u

where s is the current step number, s_tl is the total number of steps, and λ is the weight of the unlabeled loss;
the augmented unlabeled classification vector P_T,ua passes through a softmax function, the maximum value p_max is taken, and its class serves as the augmented hard label h; the teacher's augmented unlabeled classification loss L_T,ua is obtained through a cross-entropy function:

p_max = max(softmax(P_T,ua))
h = argmax(P_T,ua)
L_T,ua = -Σ H · log(softmax(P_T,ua))

where H is the One-Hot encoded hard label vector of the augmented hard label h.
As a preferred technical scheme, obtaining the student meta-learning loss specifically comprises the following steps:
sending the labeled samples and the augmented unlabeled samples into the student model to obtain the labeled classification score P_S,l and the augmented unlabeled classification score P_S,ua;
the labeled classification score P_S,l first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the old labeled loss L_S,l^old:

L_S,l^old = -Σ Y · log(softmax(P_S,l))

where Y is the One-Hot encoded label vector of the real label y;
the augmented unlabeled classification score P_S,ua passes through a softmax function and then through a smooth cross-entropy function with the pseudo label y_p to obtain the augmented unlabeled loss L_S,ua:

S_p = Y_p(1-α) + α/2
L_S,ua = -Σ S_p · log(softmax(P_S,ua))

where α is the smoothing coefficient, Y_p is the One-Hot encoded label vector of the pseudo label y_p, and S_p is the smoothed label vector;
the student optimizer then updates the student model network parameters using the augmented unlabeled loss L_S,ua;
the labeled samples are sent into the updated student model to obtain the new labeled classification score P_S,l', which first passes through a softmax function and then through a cross-entropy function with the real label to obtain the new labeled loss L_S,l^new; the difference between the new and old labeled losses gives the student meta-learning loss L_meta = L_S,l^new - L_S,l^old.
As a preferred technical scheme, updating the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss specifically comprises the following steps:
the teacher's augmented unlabeled classification loss L_T,ua obtained by the teacher learning module is multiplied by the student meta-learning loss L_meta obtained by the student meta-learning module, and the product is added to the teacher semi-supervised loss L_T,semi obtained by the teacher learning module to give the teacher loss L_T; the optimizer updates the teacher model network parameters according to L_T, which is expressed as:

L_T = L_T,ua · L_meta + L_T,semi
As a preferred technical scheme, constructing the NeXtVLAD-based attention module specifically comprises the following steps: taking a feature matrix I with C channels and feature dimension N as input, and expanding it through a fully connected layer by the factor λ to obtain the expanded feature X_o of dimension C × λN;
the expanded feature X_o passes through fully connected layers and a dimension transformation to obtain the grouping matrix G of dimension CG × 1; the expanded feature X_o is also multiplied by a group weight matrix W of dimension λN × GK and passed through a batch normalization layer, a softmax function, and a dimension transformation to obtain the group attention matrix B of dimension CG × K; B is then point-multiplied with the grouping matrix G to obtain the group coefficient matrix A of dimension CG × K;
the expanded feature X_o is reshaped into the grouped expanded feature X_e of dimension λN/G × CG and multiplied by the group coefficient matrix A to obtain the grouped feature matrix X; the column vectors of the group coefficient matrix A are summed to obtain A', which is multiplied by a cluster matrix C of dimension λN/G × K to obtain C'; subtracting C' from the grouped feature matrix X gives the local clustering matrix V; finally, V passes through a batch normalization layer and is flattened to output the local clustering vector.
As a preferred technical scheme, constructing the feature extraction backbone network, embedding the attention module, and constructing the illumination invariant feature extraction network specifically comprises the following steps:
constructing the feature extraction backbone network from 4 convolutional layers with skip connections, dropout layers, and pooling layers; the input size is H × W × 3, and the output feature map is flattened into the feature X and reshaped into a feature matrix of dimension C × N, which is sent into the attention module with expansion coefficient λ, group number G, and cluster number K; the attention module outputs the local clustering vector of size K·λN/G, and finally the classification vector is output through a fully connected layer.
As a preferred technical scheme, inputting the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification scores, sending the feature vectors and real labels into a triplet loss function to obtain the triplet loss, obtaining the cross-entropy loss from the classification scores and real labels through a cross-entropy function, supervising with the triplet and cross-entropy losses, updating the network parameters with an optimizer, and saving the PLGF branch network model and parameters after training, specifically comprises the following steps:
letting the input batch of augmented PLGF images x have size n, sending it into the illumination invariant feature extraction network f, which outputs a group f(x) of n feature vectors of dimension d, and obtaining the triplet loss L_tri through the triplet loss function, expressed as:

L_tri = Σ_i max( ||f(x_i^a) - f(x_i^p)||_2^2 - ||f(x_i^a) - f(x_i^n)||_2^2 + γ, 0 )

where x_i^a is the anchor, x_i^p is a sample of the same class as x_i^a, x_i^n is a sample of a different class from x_i^a, ||·||_2 denotes the L2 distance, f(·) is the feature vector output by the network f, and γ is the margin coefficient;
inputting a PLGF image into the illumination invariant feature extraction network to obtain the classification vector p, which first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the cross-entropy loss L_ce:

L_ce = -Σ Y · log(softmax(p))

where Y is the One-Hot encoded label vector of the real label y;
weighting and summing the cross-entropy loss and the triplet loss into the PLGF loss L_PLGF, used as the loss function for training the illumination invariant feature extraction network:

L_PLGF = α·L_ce + β·L_tri

where α and β are the weights;
using an SGD optimizer with momentum μ to update the illumination invariant feature extraction network parameters with the objective of minimizing the PLGF loss L_PLGF, with the parameter update formula:

v_{t+1} = μ·v_t - lr(t)·∇θ_t(L_PLGF)
θ_{t+1} = θ_t + v_{t+1}

where v is the momentum velocity, t is the iteration number of the current training, ∇θ_t(L_PLGF) is the gradient with respect to the model parameters θ_t, and lr(t) is the learning rate of the current iteration.
Meanwhile, the learning rate decays with the number of training iterations: starting from the initial learning rate ε_0, lr(t) steps down after the first threshold step number s_1 and again after the second threshold step number s_2. Preferred values of s_1, s_2, ε_0 are 200, 2000, and 0.01, respectively.
The invention further provides a face spoofing detection system based on meta-pseudo labels and illumination invariant features, comprising: a data preprocessing module, a student and teacher model building module, a teacher learning module, a student meta-learning module, a teacher update module, an attention module, an illumination invariant feature extraction network construction module, an illumination invariant feature learning module, a validation module, and a testing module;
the data preprocessing module is used to extract the face region image to obtain RGB color channel images, randomly crop image blocks from the RGB color channel images to be trained and divide them into labeled samples, unlabeled samples, and augmented unlabeled samples as the RGB branch training samples; and to perform illumination separation preprocessing on the RGB color channel images to be trained to obtain PLGF images and apply data augmentation, forming the PLGF branch training samples;
the student and teacher model building module is used to build the student model and the teacher model, which take RGB images as input and output classification scores;
the teacher learning module is used to receive the labeled, unlabeled, and augmented unlabeled samples and produce the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss;
the student meta-learning module is used to receive the labeled samples, the augmented unlabeled samples, and the pseudo labels of the unlabeled samples, update the student model parameters, and produce the student meta-learning loss;
the teacher update module is used to update the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss; the network parameters of the student and teacher models are iteratively updated with an optimizer according to the loss functions, and the parameters of both models are saved after training;
the attention module is a NeXtVLAD-based attention module;
the illumination invariant feature extraction network construction module is used to construct the feature extraction backbone network, embed the attention module, and construct the illumination invariant feature extraction network;
the illumination invariant feature learning module is used to input the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification vectors, send the feature vectors and real labels into a triplet loss function to obtain the triplet loss, obtain the cross-entropy loss from the classification vectors and real labels through a cross-entropy function, update the illumination invariant feature extraction network parameters with an optimizer according to the triplet and cross-entropy losses, and save the network parameters after training;
the validation module is used to send the face RGB color channel images of the validation set and the PLGF images obtained through illumination separation preprocessing into the trained student model and the illumination invariant feature extraction network respectively, obtain the RGB and PLGF classification scores, weight and sum them to obtain the total classification scores, obtain predicted label values under different decision thresholds, compare them with the labels, compute the false acceptance rate and false rejection rate, and take the threshold at which the two rates are equal as the test decision threshold T;
the testing module is used to send the face RGB color channel images of the test set and the PLGF images obtained through illumination separation preprocessing into the trained student model and the illumination invariant feature extraction network respectively, obtain the RGB and PLGF classification scores, weight and sum them to obtain the total classification scores, obtain the final predicted label values according to the test decision threshold T, and compute the benchmark indexes.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the training stage, for the RGB color channel data, a semi-supervised learning framework in which the teacher generates pseudo labels and the student provides feedback is combined with meta-learning. On one hand, unlabeled data carrying pseudo labels enriches the diversity of the training data and alleviates model training under limited labeled training data; on the other hand, meta-learning trains the model progressively, mines the essential relations among image block features, and improves the network's learning ability, so the model generalizes better to unseen acquisition environments.
(2) In the training stage, illumination separation processing derives PLGF images from the RGB color channel data. The PLGF images contain material features related only to the reflection coefficient, as well as distinguishing cues such as the spoofing noise introduced by secondary imaging and the color loss in the illumination component, reducing the influence of different illumination environments on model performance; the illumination invariant feature extraction network with the attention module extracts these essential features and improves the robustness of the face spoofing detection model.
(3) In the detection stage, the test data are loaded into the student model and the illumination invariant feature extraction network to obtain the corresponding RGB and PLGF classification scores, which are weighted and summed into the total classification score, and the classification result is decided against the threshold.
Drawings
FIG. 1 is a schematic flow chart of a face spoofing detection method based on meta-pseudo labels and illumination invariant features according to the present invention;
FIG. 2 is a schematic drawing of the PLGF of the present invention;
FIG. 3 is a schematic diagram of a network architecture of a student model and a teacher model according to the present invention;
FIG. 4 is a schematic diagram of an illumination invariant feature extraction network architecture according to the present invention;
FIG. 5 is a schematic diagram of a test flow according to the present invention;
FIG. 6 is an overall block diagram of the face spoofing detection system based on meta-pseudo labels and illumination invariant features of the present invention;
FIG. 7 is a schematic diagram of a teacher learning module of the present invention;
FIG. 8 is a diagram of a student meta-learning module of the present invention;
fig. 9 is a schematic diagram of an illumination invariant feature extraction network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
This embodiment is described in detail using the Replay-Attack, CASIA-MFSD, and MSU-MFSD liveness detection datasets for training and testing. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects and the spoofed faces generated from them were captured with a MacBook camera at a resolution of 320 × 240 pixels, and the videos are split into training, validation, and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects and the spoofed faces generated from them were captured with three cameras at resolutions of 640 × 480, 480 × 640, and 1920 × 1080 pixels, and the videos are split into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, with real faces from 35 subjects and the spoofed faces generated from them; 15 subjects are used for the training set and 20 for the test set. Since CASIA-MFSD and MSU-MFSD do not contain validation sets, this embodiment uses their test sets as validation sets for threshold determination. The dataset videos are then split into frames, and 4000 samples are randomly drawn from the training set frames as labeled samples and 16000 as unlabeled samples to form the training set. The embodiment runs on a Linux system and is implemented mainly on the deep learning framework PyTorch 1.6.1, with GTX 1080Ti GPUs, CUDA 10.1.105, and cuDNN 7.6.4.
As shown in fig. 1, the present embodiment provides a face spoofing detection method based on meta-pseudo labels and illumination invariant features, including the following steps:
S1: extracting the face region from the input image to obtain an RGB color channel image;
In this embodiment, the specific steps include: detect the face region of the input image with the MTCNN face detection algorithm, crop it, and unify its size to obtain the face image, which is in RGB format with red, green, and blue color channels.
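The face extraction step can be sketched as follows in PyTorch-ecosystem Python; the use of the facenet-pytorch MTCNN implementation and the 256-pixel unified crop size are assumptions, since the patent names only the MTCNN algorithm:

```python
# A minimal sketch of step S1, assuming the facenet-pytorch MTCNN implementation;
# the patent only requires "detect, crop, resize to a unified size".
from PIL import Image
from facenet_pytorch import MTCNN

detector = MTCNN(image_size=256, margin=0, post_process=False)  # crop size assumed 256

def extract_face(path: str):
    img = Image.open(path).convert("RGB")   # three color channels: R, G, B
    face = detector(img)                    # (3, 256, 256) tensor, or None if no face found
    return face
```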
S2: randomly cropping image blocks from the RGB color channel images to be trained and dividing them into labeled samples, unlabeled samples, and augmented unlabeled samples, used as the RGB branch training samples;
In this embodiment, the RGB color channel map input size is H × W, and image blocks of a smaller fixed size are randomly cropped from it. N labeled samples are randomly selected from all training set samples, with the ratio of unlabeled to labeled samples being μ. Random data augmentation is then applied to the unlabeled samples to obtain the augmented unlabeled samples. The augmentation methods comprise maximizing contrast, adjusting brightness, adjusting color balance, adjusting contrast, adjusting sharpness, cropping, histogram equalization, inversion, posterization (zeroing the lowest 0-4 bits of the pixel values), horizontal and vertical shearing, horizontal translation, vertical translation, random rotation, and solarization (inverting all pixel values above a threshold); two augmentation methods are randomly selected for each sample, as sketched below. Preferred values of H, W, N, μ in this embodiment are 256, 256, 4000, and 4, respectively.
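A minimal sketch of the two-ops-per-sample augmentation policy using Pillow; the operation magnitudes and the subset of operations shown (shears and translations omitted for brevity) are assumptions, not values from the patent:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Candidate operations mirroring the list above (RandAugment-style pool);
# the magnitude ranges are assumed.
OPS = [
    lambda im: ImageOps.autocontrast(im),                           # maximize contrast
    lambda im: ImageEnhance.Brightness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Color(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Contrast(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Sharpness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageOps.equalize(im),                               # histogram equalization
    lambda im: ImageOps.invert(im),                                 # inversion
    lambda im: ImageOps.posterize(im, 8 - random.randint(0, 4)),    # zero the lowest 0-4 bits
    lambda im: im.rotate(random.uniform(-30, 30)),                  # random rotation
    lambda im: ImageOps.solarize(im, random.randint(0, 255)),       # invert pixels above threshold
]

def augment_unlabeled(im: Image.Image) -> Image.Image:
    # randomly select two augmentation methods per sample
    for op in random.sample(OPS, 2):
        im = op(im)
    return im
```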
S3: as shown in FIG. 2, the RGB color channel images to be trained undergo illumination separation preprocessing to obtain PLGF images, which are then augmented and used as the samples for PLGF branch training. In the figure, the first row shows the augmented original images and the second row the images after illumination separation convolution; the first and second columns are genuine samples and the third and fourth columns are spoofed samples.
In this embodiment, the specific steps include: first, convolve the face features of the three color channels with the PLGF operators in the horizontal and vertical directions to obtain the horizontal gradient G_hor and the vertical gradient G_ver. The PLGF convolution is expressed as:

G_d[x,y] = (I * f_d)[x,y], d ∈ {hor, ver}

where f_hor and f_ver are the 3 × 3 horizontal and vertical convolution kernels of the Pattern of Local Gravitational Force (PLGF) from "A Novel Local Image Descriptor", I[x,y] is the pixel value at coordinate (x,y), and G_d[x,y] is the directional gradient at coordinate (x,y).
Then, according to the Lambert model, illumination-separate the horizontal and vertical gradients to obtain the horizontal illumination-separated gradient ISG_hor and the vertical illumination-separated gradient ISG_ver. The illumination separation divides the gradient by the pixel value itself plus a small minimum that prevents division by zero. Because the illumination intensity changes slowly and is approximately a constant L within a small region, the illumination component L can be eliminated, leaving face texture features that depend only on the reflection coefficient; these carry rich texture information and serve as effective features for spoofing detection. The illumination separation is expressed as:

ISG_d[x,y] = G_d[x,y] / (I[x,y] + ε), d ∈ {hor, ver}

where I[x,y] = R[x,y]·L[x,y] is the pixel value at coordinate (x,y), R[x,y] is the reflection coefficient of that pixel, L[x,y] is the illumination intensity imaged at that pixel, and ε is a small constant preventing division by zero.
A linear activation operation then combines the illumination-separated gradients in the horizontal and vertical directions into the composite gradient ISG, which forms the PLGF image:

ISG[x,y] = sqrt( ISG_hor[x,y]^2 + ISG_ver[x,y]^2 )

Finally, the PLGF images are augmented: with probability 0.5 they are randomly flipped horizontally, and with probability 0.5 one or two rectangular regions covering 0.005-0.1 of the image area are randomly selected and their three channel pixel values set to 0.4914, 0.4822, and 0.4465, while a randomly selected region of 0-28 pixels is set to 0.
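A sketch of the illumination separation preprocessing under stated assumptions: the 3 × 3 kernel values follow the local-gravitational-force pattern of the cited PLGF descriptor rather than the patent itself, and the composite-gradient rule is reconstructed as the gradient magnitude because the original formula appears only as an image:

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 PLGF kernels (component of 1/r^2 along each axis; corner cells
# scaled by 1/(2*sqrt(2)) ~= 0.3536). The patent cites these from the PLGF paper.
F_HOR = torch.tensor([[-0.3536, 0.0, 0.3536],
                      [-1.0000, 0.0, 1.0000],
                      [-0.3536, 0.0, 0.3536]])
F_VER = F_HOR.t().contiguous()

def plgf_image(rgb: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [0, 1] -> composite illumination-separated gradient ISG."""
    k = torch.stack([F_HOR, F_VER]).unsqueeze(1)            # (2, 1, 3, 3)
    b, c, h, w = rgb.shape
    x = rgb.reshape(b * c, 1, h, w)
    g = F.conv2d(x, k, padding=1)                           # G_hor, G_ver per channel
    isg = g / (x + eps)                                     # divide by pixel value: removes L
    out = torch.sqrt(isg[:, 0] ** 2 + isg[:, 1] ** 2)       # composite gradient (assumed rule)
    return out.reshape(b, c, h, w)
```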
S4: building the student model and the teacher model;
As shown in FIG. 3, a teacher model and a student model with the same network structure are built from ResNet-based residual blocks (ResBlock). The input resolution is set to H × W × 3, and 3 ResBlock convolution stages produce the initial feature map, which passes through a batch normalization layer and a global average pooling layer, is processed by dropout, and is flattened into a feature vector of size 128; finally the feature vector is sent to a fully connected layer with 2 output neurons to obtain the classification vector. Preferred values of H, W in this embodiment are 256 and 256.
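A sketch of the shared teacher/student architecture; the channel widths, strides, and dropout rate are assumptions, while the 3 residual blocks, batch normalization, global average pooling, 128-dim flattened feature, and 2-way classifier follow the text:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # ResNet-style basic residual block with downsampling (assumed stride 2).
    def __init__(self, cin, cout, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1, bias=False), nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, 1, 1, bias=False), nn.BatchNorm2d(cout))
        self.skip = nn.Conv2d(cin, cout, 1, stride, bias=False)

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class SpoofNet(nn.Module):
    # Shared architecture; the teacher and the student are two instances.
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(ResBlock(3, 32), ResBlock(32, 64), ResBlock(64, 128))
        self.bn = nn.BatchNorm2d(128)
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.drop = nn.Dropout(0.3)           # dropout rate assumed
        self.fc = nn.Linear(128, 2)           # classification vector

    def forward(self, x):                     # x: (B, 3, 256, 256)
        f = self.pool(self.bn(self.blocks(x))).flatten(1)  # 128-dim feature vector
        return self.fc(self.drop(f))

teacher, student = SpoofNet(), SpoofNet()
```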
S5: constructing the teacher learning module and sending the labeled, unlabeled, and augmented unlabeled samples into it to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss;
In this embodiment, the specific steps include:
Send the labeled, unlabeled, and augmented unlabeled samples into the teacher model to obtain the labeled classification vector P_T,l, the unlabeled classification vector P_T,u, and the augmented unlabeled classification vector P_T,ua.
The labeled classification vector P_T,l first passes through a softmax function and then through a smooth cross-entropy function with the real label y to obtain the labeled loss L_T,l. The smooth cross-entropy function is expressed as:

S = Y(1-α) + α/2
L_T,l = -Σ S · log(softmax(P_T,l))

where α is the smoothing coefficient (preferred value 0.8), Y is the One-Hot encoded label vector of the real label y, and S is the smoothed label vector.
The unlabeled classification vector P_T,u is divided by the control (temperature) coefficient T (preferred value 0.7) and passed through a softmax function to obtain the soft label vector Z. The maximum value z_max of Z is taken, and its class constructs the pseudo label y_p; judging z_max against the threshold t (preferred value 0.6) yields the confidence score M. The augmented unlabeled classification vector P_T,ua and the soft label vector Z, multiplied by the confidence score M, yield the unlabeled loss L_T,u through a cross-entropy function:

z_max = max(Z)
y_p = argmax(Z)
M = 1 if z_max ≥ t, else 0
L_T,u = -M · Σ Z · log(softmax(P_T,ua))

The labeled loss L_T,l and the unlabeled loss L_T,u are weighted and summed into the teacher semi-supervised loss L_T,semi:

L_T,semi = L_T,l + λ · (s / s_tl) · L_T,u

where s is the current step number, s_tl is the total number of steps, and λ is the weight of the unlabeled loss.
The augmented unlabeled classification vector P_T,ua passes through a softmax function; the maximum value p_max is taken, and its class serves as the augmented hard label h. The teacher's augmented unlabeled classification loss L_T,ua is obtained through a cross-entropy function:

p_max = max(softmax(P_T,ua))
h = argmax(P_T,ua)
L_T,ua = -Σ H · log(softmax(P_T,ua))

where H is the One-Hot encoded hard label vector of the augmented hard label h. A sketch of these computations follows.
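The teacher-side losses can be sketched as follows; the linear ramp s/s_tl on the unlabeled-loss weight is an assumption consistent with the weighted-sum description:

```python
import torch
import torch.nn.functional as F

ALPHA, TEMP, THRESH = 0.8, 0.7, 0.6   # smoothing α, temperature T, confidence threshold t

def smooth_ce(logits, y, alpha=ALPHA):
    # smooth cross entropy with S = Y(1-α) + α/2 for the 2-class case
    S = F.one_hot(y, 2).float() * (1 - alpha) + alpha / 2
    return -(S * F.log_softmax(logits, dim=1)).sum(1).mean()

def teacher_losses(p_l, p_u, p_ua, y, step, total_steps, lam=1.0):
    L_label = smooth_ce(p_l, y)                               # labeled loss L_T,l
    Z = F.softmax(p_u.detach() / TEMP, dim=1)                 # soft label vector
    z_max, y_p = Z.max(dim=1)                                 # pseudo label y_p
    M = (z_max >= THRESH).float()                             # confidence score
    L_unlabel = -(M * (Z * F.log_softmax(p_ua, dim=1)).sum(1)).mean()
    L_semi = L_label + lam * (step / total_steps) * L_unlabel # teacher semi-supervised loss
    h = p_ua.argmax(dim=1)                                    # augmented hard label
    L_ua = F.cross_entropy(p_ua, h.detach())                  # augmented unlabeled cls. loss
    return L_semi, y_p, L_ua
```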
S6: constructing the student meta-learning module and sending the labeled samples, the augmented unlabeled samples, and the pseudo labels of the unlabeled samples into it to update the student model parameters and obtain the student meta-learning loss;
In this embodiment, the specific steps include:
Send the labeled samples and the augmented unlabeled samples into the student model to obtain the labeled classification score P_S,l and the augmented unlabeled classification score P_S,ua.
The labeled classification score P_S,l first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the old labeled loss L_S,l^old:

L_S,l^old = -Σ Y · log(softmax(P_S,l))

where Y is the One-Hot encoded label vector of the real label y.
The augmented unlabeled classification score P_S,ua passes through a softmax function and then through a smooth cross-entropy function with the pseudo label y_p to obtain the augmented unlabeled loss L_S,ua:

S_p = Y_p(1-α) + α/2
L_S,ua = -Σ S_p · log(softmax(P_S,ua))

where α is the smoothing coefficient, Y_p is the One-Hot encoded label vector of the pseudo label y_p, and S_p is the smoothed label vector.
The student optimizer then updates the student model network parameters using the augmented unlabeled loss L_S,ua.
The labeled samples are sent into the updated student model to obtain the new labeled classification score P_S,l'; it first passes through a softmax function and then through a cross-entropy function with the real label to obtain the new labeled loss L_S,l^new. The difference between the new and old labeled losses gives the student meta-learning loss L_meta = L_S,l^new - L_S,l^old, as sketched below.
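A sketch of the student meta-learning step; treating the meta-loss as a scalar feedback signal computed without gradients is a simplifying assumption:

```python
import torch
import torch.nn.functional as F

def student_meta_step(student, opt_s, x_l, y, x_ua, y_p, alpha=0.8):
    # old labeled loss, measured before the pseudo-label update
    with torch.no_grad():
        L_old = F.cross_entropy(student(x_l), y)
    # train the student on augmented unlabeled samples with pseudo labels,
    # using the smooth cross entropy S_p = Y_p(1-α) + α/2
    S_p = F.one_hot(y_p, 2).float() * (1 - alpha) + alpha / 2
    L_ua = -(S_p * F.log_softmax(student(x_ua), dim=1)).sum(1).mean()
    opt_s.zero_grad(); L_ua.backward(); opt_s.step()
    # new labeled loss, measured after the update; the difference is L_meta
    with torch.no_grad():
        L_new = F.cross_entropy(student(x_l), y)
    return L_new - L_old                      # student meta-learning loss
```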
S7: constructing the teacher update module and updating the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss;
In this embodiment, the specific steps include: the teacher's augmented unlabeled classification loss L_T,ua obtained by the teacher learning module is multiplied by the student meta-learning loss L_meta obtained by the student meta-learning module, and the product is added to the teacher semi-supervised loss L_T,semi obtained by the teacher learning module to give the teacher loss L_T; the optimizer then updates the teacher model network parameters according to L_T, which is expressed as:

L_T = L_T,ua · L_meta + L_T,semi
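A sketch of the teacher update; multiplying the augmented unlabeled classification loss by the detached scalar meta-loss is the practical first-order treatment of the formula above, an assumption rather than the patent's exact computation graph:

```python
def teacher_update(opt_t, L_semi, L_ua, L_meta):
    # teacher loss L_T = L_T,ua * L_meta + L_T,semi; L_meta is a scalar
    # feedback signal (no gradient of its own), so it scales L_T,ua
    L_T = L_ua * L_meta.detach() + L_semi
    opt_t.zero_grad(); L_T.backward(); opt_t.step()
    return L_T
```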
S8: iteratively updating the network parameters of the student and teacher models with an optimizer according to the loss functions, and saving the parameters of both models after training;
In this embodiment, the specific steps include: both the teacher model and the student model use SGD optimizers with Nesterov momentum, with momentum μ (preferred value 0.9) and initial learning rate ε_0 (preferred value 0.05). The parameter update formula is:

v_{t+1} = μ·v_t - lr(t)·∇θ_t(L)
θ_{t+1} = θ_t + v_{t+1}

where v is the momentum velocity, t is the iteration number of the current training, ∇θ_t(L) is the gradient of the loss with respect to the model parameters θ_t, and lr(t) is the current learning rate.
Meanwhile, the learning rate decays with the number of training iterations: lr(t) warms up over the warmup steps, is held during the waiting steps, and then decays until the total step count. The warmup, waiting, and total step numbers of the teacher model optimizer have preferred values of 1000, 0, and 64000, respectively; those of the student model optimizer have preferred values of 1000, 3000, and 64000, respectively.
The student model optimizer optimizes with the goal of minimizing the augmented unlabeled loss L_S,ua from the student meta-learning module; the teacher model optimizer optimizes with the goal of minimizing the teacher loss L_T from the teacher update module. A configuration sketch follows.
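A configuration sketch; the warmup-hold-decay shape is recoverable from the step-number names, but the cosine decay form is an assumption, since the schedule formula appears only as an image in the original:

```python
import math
import torch

def make_optimizer(model, lr0=0.05, momentum=0.9):
    # SGD with Nesterov momentum, as specified for both models
    return torch.optim.SGD(model.parameters(), lr=lr0, momentum=momentum, nesterov=True)

def lr_at(t, warmup, wait, total, lr0=0.05):
    # linear warmup, hold during the waiting steps, then decay to zero
    # at the total step count (cosine decay is assumed)
    if t < warmup:
        return lr0 * t / warmup
    if t < warmup + wait:
        return lr0
    progress = (t - warmup - wait) / max(1, total - warmup - wait)
    return lr0 * 0.5 * (1 + math.cos(math.pi * progress))

teacher_steps = dict(warmup=1000, wait=0, total=64000)     # preferred values
student_steps = dict(warmup=1000, wait=3000, total=64000)
```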
S9: constructing the NeXtVLAD-based attention module;
In this embodiment, the specific steps include:
Set the expansion coefficient λ, the group number G, and the cluster number K, with preferred values of 2, 7, and 7, respectively.
Take a feature matrix I with C channels and feature dimension N as input (preferred values of C and N are 32 and 196), and expand it through a fully connected layer by the factor λ to obtain the expanded feature X_o of dimension C × λN.
The expanded feature X_o passes through fully connected layers and a dimension transformation to obtain the grouping matrix G of dimension CG × 1. The expanded feature X_o is also multiplied by a group weight matrix W of dimension λN × GK and passed through a batch normalization layer, a softmax function, and a dimension transformation to obtain the group attention matrix B of dimension CG × K; B is then point-multiplied with the grouping matrix G to obtain the group coefficient matrix A of dimension CG × K.
The expanded feature X_o is reshaped into the grouped expanded feature X_e of dimension λN/G × CG and multiplied by the group coefficient matrix A to obtain the grouped feature matrix X. The column vectors of the group coefficient matrix A are summed to obtain A', which is multiplied by a cluster matrix C of dimension λN/G × K to obtain C'; subtracting C' from the grouped feature matrix X gives the local clustering matrix V. Finally, V passes through a batch normalization layer and is flattened to output the local clustering vector.
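A sketch of the NeXtVLAD-based attention module following the matrix shapes above; the initialization scales and the sigmoid gate for the grouping matrix are assumptions based on the NeXtVLAD paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeXtVLADAttention(nn.Module):
    # Input I: (batch, C, N); output: local clustering vector of size K * lamb*N / G.
    def __init__(self, C=32, N=196, lamb=2, G=7, K=7):
        super().__init__()
        self.lamb, self.G, self.K = lamb, G, K
        self.expand = nn.Linear(N, lamb * N)                 # C x N -> C x lambda*N
        self.group_gate = nn.Linear(lamb * N, G)             # grouping matrix (CG x 1)
        self.cluster_w = nn.Parameter(torch.randn(lamb * N, G * K) * 0.01)
        self.cluster_c = nn.Parameter(torch.randn(lamb * N // G, K) * 0.01)
        self.bn_att = nn.BatchNorm1d(G * K)
        self.bn_out = nn.BatchNorm1d(K * lamb * N // G)

    def forward(self, I):                                    # I: (B, C, N)
        B_, C, N = I.shape
        lam, G, K = self.lamb, self.G, self.K
        Xo = self.expand(I)                                  # (B, C, lam*N)
        gate = torch.sigmoid(self.group_gate(Xo))            # (B, C, G)
        gate = gate.reshape(B_, C * G, 1)                    # grouping matrix: (B, CG, 1)
        att = Xo @ self.cluster_w                            # (B, C, G*K)
        att = self.bn_att(att.transpose(1, 2)).transpose(1, 2)
        att = F.softmax(att.reshape(B_, C, G, K), dim=-1)    # group attention matrix B
        A = att.reshape(B_, C * G, K) * gate                 # group coefficient matrix A
        Xe = Xo.reshape(B_, C * G, lam * N // G).transpose(1, 2)  # (B, lam*N/G, CG)
        X = Xe @ A                                           # grouped feature matrix
        A_sum = A.sum(dim=1, keepdim=True)                   # A': (B, 1, K)
        V = X - self.cluster_c.unsqueeze(0) * A_sum          # local clustering matrix V
        return self.bn_out(V.flatten(1))                     # flatten + batch norm
```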
S10: constructing the feature extraction backbone network, embedding the attention module, and constructing the illumination invariant feature extraction network;
As shown in FIG. 4, the specific steps include: construct the feature extraction backbone network from 4 convolutional layers with skip connections, dropout layers, and pooling layers. The input size is H × W × 3 and the output is a feature map of size H/16 × W/16 × C, which is flattened into the feature X and reshaped into the feature matrix X' of dimension C × N, with N = (H/16)·(W/16). X' is sent into the attention module, with expansion coefficient λ, group number G, and cluster number K, whose output is the local clustering vector of size K·λN/G. Finally, the local clustering vector is sent to a fully connected layer with 2 output neurons to output the classification vector p. Preferred values of H, W, λ, G, K in this embodiment are 224, 224, 2, 7, and 7, respectively.
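A sketch of the illumination invariant feature extraction network, reusing the NeXtVLADAttention sketch above; the per-block channel widths and dropout rate are assumptions, while the 4 pooled conv blocks, the 224 to 14 spatial reduction, and the 392-dim local clustering vector follow the text:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # conv + BN + ReLU with a 1x1 skip projection, then pooling and dropout;
    # the exact composition is an assumption consistent with the description
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
        self.skip = nn.Conv2d(cin, cout, 1)
        self.bn = nn.BatchNorm2d(cout)
        self.pool = nn.MaxPool2d(2)
        self.drop = nn.Dropout2d(0.1)

    def forward(self, x):
        y = torch.relu(self.bn(self.conv(x)) + self.skip(x))
        return self.drop(self.pool(y))

class IlluminationInvariantNet(nn.Module):
    # Backbone (4 conv blocks, 224 -> 14) + NeXtVLAD attention + classifier.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(ConvBlock(3, 16), ConvBlock(16, 24),
                                      ConvBlock(24, 32), ConvBlock(32, 32))
        self.att = NeXtVLADAttention(C=32, N=196, lamb=2, G=7, K=7)
        self.fc = nn.Linear(7 * 2 * 196 // 7, 2)   # K * lambda*N / G = 392

    def forward(self, x):                    # x: (B, 3, 224, 224)
        f = self.backbone(x)                 # (B, 32, 14, 14)
        X = f.flatten(2)                     # feature matrix X': (B, C=32, N=196)
        v = self.att(X)                      # (B, 392) feature vector
        return v, self.fc(v)                 # feature vector and classification vector
```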
S11: inputting the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification vectors; sending the feature vectors and real labels into a triplet loss function to obtain the triplet loss L_tri; obtaining the cross-entropy loss L_ce from the classification vectors and real labels through a cross-entropy function; weighting and summing the cross-entropy and triplet losses into the total loss; updating the network parameters with an optimizer under this supervision; and saving the illumination invariant feature extraction network model and parameters after training;
In this embodiment, the specific steps include:
Let the input batch of augmented PLGF images x have size n. Send it into the illumination invariant feature extraction network f, which outputs a group f(x) of n feature vectors of dimension d; the triplet loss L_tri is obtained through the triplet loss function, expressed as:

L_tri = Σ_i max( ||f(x_i^a) - f(x_i^p)||_2^2 - ||f(x_i^a) - f(x_i^n)||_2^2 + γ, 0 )

where x_i^a is the anchor, x_i^p is a sample of the same class as x_i^a, x_i^n is a sample of a different class from x_i^a, ||·||_2 denotes the L2 distance, f(·) is the feature vector output by the network f, and γ is the margin coefficient, set to 0.2.
Input a PLGF image into the illumination invariant feature extraction network to obtain the classification vector p; it first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the cross-entropy loss L_ce:

L_ce = -Σ Y · log(softmax(p))

where Y is the One-Hot encoded label vector of the real label y.
The cross-entropy loss and the triplet loss are weighted and summed into the PLGF loss L_PLGF, used as the loss function for training the illumination invariant feature extraction network:

L_PLGF = α·L_ce + β·L_tri

where the weights α and β have preferred values of 1 and 1, respectively.
An SGD optimizer with momentum is used, with momentum μ (preferred value 0.9); the parameter update formula is:

v_{t+1} = μ·v_t - lr(t)·∇θ_t(L_PLGF)
θ_{t+1} = θ_t + v_{t+1}

where v is the momentum velocity, t is the iteration number of the current training, ∇θ_t(L_PLGF) is the gradient with respect to the model parameters θ_t, and lr(t) is the learning rate of the current iteration.
Meanwhile, the learning rate decays with the number of training iterations: starting from the initial learning rate ε_0, lr(t) steps down after the first threshold step number s_1 and again after the second threshold step number s_2. The values of s_1, s_2, ε_0 are set to 200, 2000, and 0.01, respectively.
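A sketch of the PLGF branch loss; the batch-all mining strategy over all valid (anchor, positive, negative) triples is an assumption, since the patent does not state how triples are formed:

```python
import torch
import torch.nn.functional as F

def triplet_loss(feats, labels, gamma=0.2):
    # batch-all triplet loss over the feature vector group f(x)
    d = torch.cdist(feats, feats)                      # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = same & ~torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    margin = d.unsqueeze(2) - d.unsqueeze(1) + gamma   # d(a,p) - d(a,n) + γ
    mask = pos.unsqueeze(2) & (~same).unsqueeze(1)     # valid (a, p, n) triples
    valid = F.relu(margin[mask])
    return valid.mean() if valid.numel() else feats.sum() * 0.0

def plgf_loss(feats, logits, labels, alpha=1.0, beta=1.0):
    # L_PLGF = α * cross-entropy loss + β * triplet loss
    return alpha * F.cross_entropy(logits, labels) + beta * triplet_loss(feats, labels)
```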
S12: determining the threshold with the validation set;
In this embodiment, the specific steps include: send the face RGB color channel images of the validation set into the student model to obtain the RGB classification vector p_RGB, pass it through a softmax function, and take the genuine-class probability as the RGB classification score score_RGB. Meanwhile, the RGB color channel images undergo illumination separation preprocessing to obtain PLGF images, which are sent into the illumination invariant feature extraction network to obtain the PLGF classification vector p_PLGF; it passes through a softmax function, and the genuine-class probability is taken as the PLGF classification score score_PLGF. The RGB and PLGF classification scores are then weighted and summed into the total classification score:

score = α·score_RGB + β·score_PLGF

where α is 0.8 and β is 0.2. Thresholds are then sampled at equal intervals in the range (0, 1); for each threshold, the predicted label values are obtained and compared with the real labels, and the false acceptance rate and false rejection rate are computed. The threshold at which the two rates are equal is taken as the test decision threshold T, as sketched below.
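A sketch of the score fusion and threshold search; taking index 1 of the softmax output as the genuine-class probability is an assumption:

```python
import numpy as np

def fuse_scores(p_rgb, p_plgf, alpha=0.8, beta=0.2):
    # genuine-class softmax probability from each branch, weighted and summed
    s_rgb = np.exp(p_rgb) / np.exp(p_rgb).sum(-1, keepdims=True)
    s_plgf = np.exp(p_plgf) / np.exp(p_plgf).sum(-1, keepdims=True)
    return alpha * s_rgb[:, 1] + beta * s_plgf[:, 1]

def eer_threshold(scores, labels, steps=1000):
    # sample thresholds at equal intervals in (0, 1); return the threshold
    # where the false acceptance and false rejection rates are closest
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0, 1, steps + 1)[1:-1]:
        pred = scores >= t
        far = ((pred == 1) & (labels == 0)).sum() / max(1, (labels == 0).sum())
        frr = ((pred == 0) & (labels == 1)).sum() / max(1, (labels == 1).sum())
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t
```

The threshold T found here on the validation set is then reused unchanged at test time in step S13.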
S13: testing the model;
As shown in FIG. 5, the specific steps include: send the face RGB color channel images of the test set into the student model to obtain the RGB classification vector p_RGB, pass it through a softmax function, and take the genuine-class probability as the RGB classification score score_RGB. Meanwhile, the RGB color channel images undergo illumination separation preprocessing to obtain PLGF images, which are sent into the illumination invariant feature extraction network to obtain the PLGF classification vector p_PLGF; it passes through a softmax function, and the genuine-class probability is taken as the PLGF classification score score_PLGF. The RGB and PLGF classification scores are then weighted and summed into the total classification score:

score = α·score_RGB + β·score_PLGF

where α is 0.8 and β is 0.2. The final predicted label values are then obtained according to the test decision threshold T, and the benchmark indexes are computed.
The performance evaluation indexes of the face spoofing detection algorithm in this embodiment adopt a False Acceptance Rate (FAR), a False Rejection Rate (FRR), a True Acceptance Rate (TAR), an Equal Error Rate (EER), a Half Error Rate (Half Total Error Rate, hter, which are described in detail in the confusion matrix of table 1:
table 1 confusion matrix table
Tagging/prediction The prediction is true Prediction of false
The label is true TA FR
The label is false FA TR
The False Acceptance Rate (FAR) is the ratio of the number of non-live faces judged to be live to the number of faces labeled non-live:

FAR = FA / (FA + TR)

The False Rejection Rate (FRR) is the ratio of the number of live faces judged to be non-live to the number of faces labeled live:

FRR = FR / (TA + FR)

The True Acceptance Rate (TAR) is the ratio of the number of live faces judged to be live to the number of faces labeled live:

TAR = TA / (TA + FR)

The Equal Error Rate (EER) is the error rate at the operating point where FRR and FAR are equal;

The Half Total Error Rate (HTER) is the mean of FRR and FAR:

HTER = (FRR + FAR) / 2
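These definitions reduce to a few lines of code; a minimal sketch over the confusion counts of Table 1:

```python
def spoof_metrics(ta, fr, fa, tr):
    """FAR, FRR, TAR and HTER from the Table 1 confusion counts
    (TA/FR: faces labeled live; FA/TR: faces labeled non-live)."""
    far = fa / (fa + tr)      # non-live faces accepted as live
    frr = fr / (ta + fr)      # live faces rejected as non-live
    tar = ta / (ta + fr)      # live faces correctly accepted
    hter = (far + frr) / 2    # half total error rate
    return far, frr, tar, hter
```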
To demonstrate the effectiveness of the invention and to test the generalization performance of the method, intra-database and cross-database experiments were carried out on the CASIA-MFSD, Replay-Attack and MSU-MFSD databases. The intra-database and cross-database experimental results are shown in Tables 2 and 3, respectively:
Table 2 Intra-database experimental results
Figure BDA0003299096500000211
Table 3 Cross-database experimental results
Figure BDA0003299096500000212
As can be seen from Table 2, both the half total error rate and the equal error rate of the method are low within each database, showing excellent intra-database spoofing detection performance. As can be seen from Table 3, the half total error rate of cross-database detection is also lower than that of current methods. Compared with texture analysis methods such as LBP, HoG, SIFT and SURF, the extracted illumination invariant features reduce the influence of scene and illumination changes while retaining rich texture information that reflects the reflection coefficients, so spoofing traces can be detected effectively. The training set is composed of labeled and unlabeled samples extracted from a small number of frames of each training video; data augmentation and pseudo-labeled unlabeled data enrich the diversity of the training data, and the progressive meta-learning training of the model improves its ability to learn from limited labeled samples and to generalize. The experimental results demonstrate that, even when labeled training samples are insufficient, the method maintains high intra-database accuracy, greatly reduces the cross-database error rate, and significantly improves generalization performance.
Example 2
As shown in fig. 6, this embodiment provides a face spoofing detection system based on meta-pseudo labels and illumination invariant features, including: the system comprises a data preprocessing module, a student model and teacher model building module, a teacher learning module, a student meta-learning module, a teacher updating module, an attention module, an illumination invariant feature extraction network building module, an illumination invariant feature learning module, a verification module and a testing module;
in this embodiment, the data preprocessing module is configured to extract a face region image to obtain an RGB color channel map, randomly cut out image blocks from the RGB color channel map to be trained, divide the image blocks into labeled samples and unlabeled samples, and perform random data enhancement on the unlabeled samples to obtain enhanced unlabeled samples, which are used as RGB branch training samples; performing illumination separation pretreatment on an RGB color channel image to be trained to obtain a PLGF image, and performing data enhancement to be used as a sample for PLGF branch training;
in this embodiment, the student model and teacher model building module is configured to build a student model and a teacher model, input an RGB image, and output a classification score;
as shown in fig. 7, the teacher learning module is configured to receive the labeled sample, the unlabeled sample and the enhanced unlabeled sample, and to produce the semi-supervised loss of the teacher, the pseudo labels of the unlabeled data, and the enhanced unlabeled loss;
as shown in fig. 8, the student meta-learning module is configured to receive the labeled sample, the enhanced unlabeled sample and the pseudo labels of the unlabeled samples, update the student model parameters, and produce the meta-learning loss of the student;
in this embodiment, the teacher updating module is configured to update parameters of the teacher model by using the teacher semi-supervised loss, the enhanced unlabeled loss, and the student meta-learning loss, iteratively update network parameters of the student model and the teacher model by using an optimizer according to a loss function, and store parameters of the teacher model and the student model after training is completed;
in this embodiment, the attention module is used to construct a NeXtVLAD-based attention module;
as shown in fig. 9, the illumination invariant feature extraction network construction module is used to build a feature extraction backbone network, embed an attention module, and build an illumination invariant feature extraction network;
in this embodiment, the illumination invariant feature learning module is configured to input the data-enhanced PLGF map to an illumination invariant feature extraction network to obtain a feature vector and a classification vector, send the feature vector and a real label to a triplet loss function to obtain a triplet loss, obtain a cross entropy loss by the classification vector and the real label through a cross entropy function, update an illumination invariant feature extraction network parameter with an optimizer according to the triplet loss and the cross entropy loss, and store the parameter of the illumination invariant feature extraction network after training;
in this embodiment, the verification module is configured to send a verification set face RGB color channel map and a PLGF map obtained through illumination separation preprocessing to a trained student model and an illumination invariant feature extraction network, respectively obtain RGB classification scores and PLGF classification scores, then perform weighted summation on the RGB classification scores and the PLGF classification scores to obtain a total classification score, obtain predicted label values according to different decision thresholds, compare the predicted label values with the real labels, calculate the false acceptance rate and false rejection rate, and take the threshold value at which the two are equal as the test decision threshold T;
in this embodiment, the test module is configured to send the RGB color channel maps of the test set face and the PLGF maps obtained through illumination separation preprocessing to the trained student model and the illumination invariant feature extraction network, respectively obtain RGB classification scores and PLGF classification scores, then perform weighted summation on the RGB classification scores and the PLGF classification scores to obtain a total classification score, obtain a final predicted label value according to the test decision threshold T, and calculate a benchmark index.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A face spoofing detection method based on a meta-pseudo label and an illumination invariant feature is characterized by comprising the following steps:
picking up a face region image from an input image to obtain an RGB color channel image;
randomly cutting image blocks of an RGB color channel diagram to be trained, dividing the image blocks into labeled samples, unlabeled samples and enhanced unlabeled samples, and taking the labeled samples, the unlabeled samples and the enhanced unlabeled samples as RGB branch training samples;
performing illumination separation pretreatment on an RGB color channel image to be trained to obtain a PLGF image, and performing data enhancement to be used as a sample for PLGF branch training;
building a student model and a teacher model;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss;
constructing a student meta-learning module, and sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples into the student meta-learning module to update student model parameters to obtain the meta-learning loss of students;
establishing a teacher updating module, and updating teacher model parameters by utilizing the semi-supervised loss of the teacher, the enhanced non-label loss and the student meta-learning loss;
iteratively updating network parameters of the student model and the teacher model by using an optimizer according to the loss function, and storing parameters of the teacher model and the student model after training is completed;
constructing an attention module based on the NeXtVLAD;
constructing a feature extraction backbone network, embedding an attention module, and constructing an illumination invariant feature extraction network;
inputting the data-enhanced PLGF image into an illumination invariant feature extraction network to obtain a feature vector and a classification vector, sending the feature vector and a real label into a triple loss function to obtain triple loss, obtaining cross entropy loss by the classification vector and the real label through a cross entropy function, updating an illumination invariant feature extraction network parameter by an optimizer according to the triple loss and the cross entropy loss, and storing the parameter of the illumination invariant feature extraction network after training;
sending the RGB color channel map of the face of the verification set into a student model to obtain RGB classification scores, meanwhile obtaining a PLGF map through illumination separation preprocessing, sending the PLGF map into an illumination invariant feature extraction network to obtain PLGF classification scores, weighting and summing the RGB classification scores and the PLGF classification scores to obtain total classification scores, obtaining predicted label values according to different decision thresholds, comparing the predicted label values with the real labels, calculating the false acceptance rate and false rejection rate, and taking the threshold at which the two are equal as the test decision threshold T;
sending the RGB color channel map of the face of the test set into a trained student model to obtain RGB classification scores, meanwhile obtaining a PLGF map through illumination separation preprocessing and sending the PLGF map into a trained illumination invariant feature extraction network to obtain PLGF classification scores, carrying out weighted summation on the RGB classification scores and the PLGF classification scores to obtain a total classification score, obtaining a final predicted label value according to the test decision threshold T, and calculating the benchmark indexes.
2. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features according to claim 1, wherein the method comprises the following specific steps of performing illumination separation preprocessing on an RGB color channel image to be trained to obtain a PLGF image, performing data enhancement to serve as a sample for PLGF branch training:
performing PLGF convolution on the face maps of the three color channels with the PLGF operator in the horizontal direction and the vertical direction respectively, to obtain a horizontal gradient G_hor and a vertical gradient G_ver;

performing illumination separation on the horizontal and vertical gradients according to the Lambert model, to obtain a horizontal illumination-separated gradient ISG_hor and a vertical illumination-separated gradient ISG_ver;

and carrying out an activation operation on the illumination-separated gradients in the horizontal and vertical directions to obtain the composite gradient ISG, forming the PLGF image.
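A minimal sketch of this preprocessing for one float-valued color channel follows. The 3×3 gravitational-force-style masks and the arctan ratio activation used here are assumptions chosen to illustrate the Lambert-model separation idea; the patented operator may differ.

```python
import numpy as np
from scipy.signal import convolve2d

def plgf_channel(channel, eps=1e-6):
    """Sketch of PLGF illumination separation for one color channel.
    The kernels and the arctan activation are illustrative assumptions."""
    k_hor = np.array([[-0.25, 0.0, 0.25],
                      [-1.0,  0.0, 1.0 ],
                      [-0.25, 0.0, 0.25]])
    k_ver = k_hor.T
    g_hor = convolve2d(channel, k_hor, mode="same", boundary="symm")
    g_ver = convolve2d(channel, k_ver, mode="same", boundary="symm")
    # Lambert model I = R*L: dividing the gradient by the intensity
    # cancels the slowly varying illumination L, keeping reflectance detail.
    isg_hor = np.arctan(g_hor / (channel + eps))
    isg_ver = np.arctan(g_ver / (channel + eps))
    # Combine the two directions into the final PLGF response.
    return np.sqrt(isg_hor ** 2 + isg_ver ** 2)
```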
3. The method for detecting face spoofing based on the meta-pseudo label and the illumination invariant feature of claim 1, wherein the specific steps of constructing the student model and the teacher model comprise: building a teacher model and a student model with the same network structure from ResNet-based residual blocks (ResBlock), comprising a convolution layer, 3 ResBlocks, a batch normalization layer, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
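A minimal PyTorch sketch of such a shared student/teacher structure follows; the channel widths and strides are assumptions, since the claim fixes only the layer types.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic ResNet-style residual block; widths and strides are assumptions."""
    def __init__(self, cin, cout, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1, bias=False),
            nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, 1, 1, bias=False),
            nn.BatchNorm2d(cout))
        self.skip = nn.Sequential(
            nn.Conv2d(cin, cout, 1, stride, bias=False),
            nn.BatchNorm2d(cout))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class StudentTeacherNet(nn.Module):
    """Sketch of the shared student/teacher structure of claim 3: a stem
    convolution, 3 ResBlocks, batch normalization, global average pooling
    and a 2-way fully connected classifier."""
    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv2d(3, widths[0], 3, 1, 1, bias=False)
        self.blocks = nn.Sequential(
            ResBlock(widths[0], widths[1]),
            ResBlock(widths[1], widths[2]),
            ResBlock(widths[2], widths[3]))
        self.bn = nn.BatchNorm2d(widths[3])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(widths[3], 2)     # outputs the classification vector

    def forward(self, x):
        h = self.bn(self.blocks(self.stem(x)))
        return self.fc(self.pool(h).flatten(1))
```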
4. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the steps of obtaining teacher semi-supervised loss, pseudo labels of unlabeled data and enhanced unlabeled loss comprise:
feeding the labeled samples, the unlabeled samples and the enhanced unlabeled samples to the teacher model to obtain the labeled classification vector P_T,l, the unlabeled classification vector P_T,u and the enhanced unlabeled classification vector P_T,ua;

the labeled classification vector P_T,l first passes through a softmax function and then through a smooth cross entropy function with the real label y to obtain the labeled loss L_label; the smooth cross entropy function is expressed as:

S = Y(1-α) + α/2

L_label = -Σ S·log(softmax(P_T,l))

wherein α represents the smoothing coefficient, Y represents the One-Hot encoded real label vector of the real label y, and S represents the smoothed label vector;

the unlabeled classification vector P_T,u is divided by a control coefficient T and passed through the softmax function to obtain the soft label vector Z; the maximum value z_max is taken from the soft label vector Z, the category at which it occurs constructs the pseudo label y_p, and the confidence score M is obtained by comparing z_max with the threshold t; the enhanced unlabeled classification vector P_T,ua and the soft label vector Z, multiplied by the confidence score M, yield the unlabeled loss L_unlabel through a cross entropy function, represented by the following formulas:

z_max = max(Z)

y_p = argmax(Z)

M = 1 if z_max > t, otherwise M = 0

L_unlabel = -M·Σ Z·log(softmax(P_T,ua))

the labeled loss L_label and the unlabeled loss L_unlabel are weighted and summed to obtain the teacher semi-supervised loss L_semi, represented by the following formula:

L_semi = L_label + λ·(s/s_tl)·L_unlabel

where s is the current step number, s_tl is the total number of steps, and λ is the weight of the unlabeled loss;

the enhanced unlabeled classification vector P_T,ua passes through the softmax function, the maximum value p_max is taken, and the category at which it occurs serves as the enhanced hard label h; the enhanced unlabeled classification loss of the teacher L_T,ua is obtained through a cross entropy function, represented by the following formulas:

p_max = max(P_T,ua)

h = argmax(P_T,ua)

L_T,ua = -Σ H·log(softmax(P_T,ua))

wherein H is the One-Hot encoded enhanced hard label vector of the enhanced hard label h.
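A minimal PyTorch sketch of these teacher losses follows. The values of α, T and the confidence threshold are illustrative, and lam_w stands in for the already-ramped weight λ·(s/s_tl) of the unlabeled loss; none of these values are specified by the claim.

```python
import torch
import torch.nn.functional as F

def smooth_ce(logits, y, alpha=0.1):
    """Smoothed cross entropy with S = Y*(1 - alpha) + alpha/2 (2 classes)."""
    s = F.one_hot(y, 2).float() * (1 - alpha) + alpha / 2
    return -(s * F.log_softmax(logits, dim=1)).sum(1).mean()

def teacher_losses(p_l, y, p_u, p_ua, T=0.7, thr=0.95, lam_w=1.0):
    """Sketch of the claim 4 teacher losses; T, thr, lam_w are illustrative."""
    loss_label = smooth_ce(p_l, y)                    # labeled loss L_label
    z = F.softmax(p_u / T, dim=1)                     # soft label vector Z
    z_max, y_p = z.max(dim=1)                         # confidence and pseudo label
    m = (z_max > thr).float()                         # confidence score M
    loss_unlabel = (m * -(z * F.log_softmax(p_ua, dim=1)).sum(1)).mean()
    loss_semi = loss_label + lam_w * loss_unlabel     # semi-supervised loss L_semi
    h = p_ua.argmax(dim=1)                            # enhanced hard label h
    loss_t_ua = F.cross_entropy(p_ua, h)              # enhanced unlabeled loss L_T,ua
    return loss_semi, loss_t_ua, y_p, m
```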
5. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the obtaining of meta-learning loss of students comprises the following specific steps:
sending the labeled samples and the enhanced unlabeled samples into the student model to obtain the labeled classification score P_S,l and the enhanced unlabeled classification score P_S,ua;

the labeled classification score P_S,l first passes through a softmax function and then through a cross entropy function with the real label y to obtain the old labeled loss L_old, represented by the following formula:

L_old = -Σ Y·log(softmax(P_S,l))

wherein Y is the One-Hot encoded real label vector of the real label y;

the enhanced unlabeled classification score P_S,ua passes through the softmax function and then through a smooth cross entropy function with the pseudo label y_p to obtain the enhanced unlabeled loss L_S,ua, represented by the following formulas:

S_p = Y_p(1-α) + α/2

L_S,ua = -Σ S_p·log(softmax(P_S,ua))

wherein α represents the smoothing coefficient, Y_p is the One-Hot encoded pseudo label vector of the pseudo label y_p, and S_p represents the smoothed label vector;

the student optimizer then updates the student model network parameters according to the enhanced unlabeled loss L_S,ua;

the labeled samples are sent into the student model with updated parameters to obtain the new labeled classification score P_S,l', which first passes through the softmax function and then through the cross entropy function with the real label to obtain the new labeled loss L_new; the difference between the new labeled loss L_new and the old labeled loss L_old gives the meta-learning loss of the student, L_meta = L_new − L_old.
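A minimal sketch of this student meta-learning step, assuming a PyTorch student model and optimizer; α is an illustrative smoothing value:

```python
import torch
import torch.nn.functional as F

def student_meta_step(student, opt, x_l, y, x_ua, y_p, alpha=0.1):
    """Sketch of the claim 5 student step: update the student on
    pseudo-labeled augmented data, then measure how the labeled loss
    changed; the change is the meta-learning loss L_meta."""
    with torch.no_grad():
        loss_old = F.cross_entropy(student(x_l), y)   # old labeled loss L_old
    s_p = F.one_hot(y_p, 2).float() * (1 - alpha) + alpha / 2
    loss_s_ua = -(s_p * F.log_softmax(student(x_ua), dim=1)).sum(1).mean()
    opt.zero_grad()
    loss_s_ua.backward()                              # update on pseudo labels
    opt.step()
    with torch.no_grad():
        loss_new = F.cross_entropy(student(x_l), y)   # new labeled loss L_new
    return loss_new - loss_old                        # meta-learning loss L_meta
```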
6. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the teacher model parameters are updated by teacher semi-supervised loss, enhanced unlabeled loss and student meta-learning loss, and the specific steps comprise:
the enhanced unlabeled classification loss of the teacher L_T,ua obtained by the teacher learning module is multiplied by the meta-learning loss of the student L_meta obtained by the student meta-learning module, and the teacher semi-supervised loss L_semi obtained by the teacher learning module is added, to obtain the teacher loss L_teacher; the optimizer then updates the teacher model network parameters according to the teacher loss L_teacher, which is expressed as:

L_teacher = L_meta·L_T,ua + L_semi
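A minimal sketch of this teacher update, assuming the loss tensors come from the teacher_losses and student_meta_step sketches above:

```python
def teacher_update(teacher_opt, loss_semi, loss_t_ua, meta_loss):
    """Sketch of the claim 6 teacher update: the student's meta-learning
    loss (detached, acting as a scalar reward) scales the teacher's
    enhanced unlabeled loss, and the semi-supervised loss is added
    before backpropagating into the teacher."""
    teacher_loss = meta_loss.detach() * loss_t_ua + loss_semi
    teacher_opt.zero_grad()
    teacher_loss.backward()
    teacher_opt.step()
    return teacher_loss
```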
7. the method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the step of constructing the attention module based on the NeXtVLAD comprises the following specific steps:
taking a feature matrix I with C channels and feature dimension N as input, and expanding the dimension by a factor of λ through a fully connected layer to obtain the extended feature X_o of dimension C × λN;

the extended feature X_o passes through a fully connected layer and dimension transformation to obtain a grouping matrix G of dimension CG × 1; the extended feature X_o is multiplied by a group weight matrix W of dimension λN × GK and passed through a batch normalization layer, a softmax function and dimension transformation to obtain a group attention matrix B of dimension CG × K, and the group attention matrix B is then point-multiplied with the grouping matrix G to obtain a group coefficient matrix A of dimension CG × K;

the extended feature X_o is reshaped by dimension transformation into the grouped extended feature X_e of dimension λN/G × CG, which is multiplied by the group coefficient matrix A to obtain the grouped feature matrix X; the column vectors of the group coefficient matrix A are summed to obtain A', which is multiplied by a cluster matrix C of dimension λN/G × K to obtain C'; subtracting C' from the grouped feature matrix X yields the local clustering matrix V; finally, the local clustering matrix V passes through a batch normalization layer, is flattened, and the local clustering vector is output.
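A PyTorch sketch of this module follows, tracing the claimed tensor shapes; the gating nonlinearity, initialization and default λ/G/K values are assumptions:

```python
import torch
import torch.nn as nn

class NeXtVLADAttention(nn.Module):
    """Sketch of the NeXtVLAD-based attention module of claim 7.
    Input: a feature matrix I of shape (batch, C, N); output: the
    flattened local clustering vector of dimension (lam*N/G)*K."""
    def __init__(self, N, lam=2, G=8, K=32):
        super().__init__()
        assert (lam * N) % G == 0, "lam*N must be divisible by G"
        self.lam, self.G, self.K = lam, G, K
        self.D = lam * N // G                       # per-group feature dim
        self.expand = nn.Linear(N, lam * N)         # lam-fold dimension expansion
        self.group_gate = nn.Linear(lam * N, G)     # produces grouping matrix G
        self.attn = nn.Linear(lam * N, G * K, bias=False)  # group weight matrix W
        self.bn_attn = nn.BatchNorm1d(G * K)
        self.clusters = nn.Parameter(torch.randn(self.D, K) * 0.01)  # cluster matrix C
        self.bn_out = nn.BatchNorm1d(self.D * K)

    def forward(self, I):                           # I: (B, C, N)
        B, C, _ = I.shape
        x = self.expand(I)                          # extended feature X_o: (B, C, lam*N)
        g = torch.sigmoid(self.group_gate(x))       # per-group gates
        g = g.reshape(B, C * self.G, 1)             # grouping matrix: (B, CG, 1)
        a = self.attn(x).reshape(-1, self.G * self.K)
        a = self.bn_attn(a).reshape(B, C * self.G, self.K)
        a = torch.softmax(a, dim=-1)                # group attention matrix B
        A = a * g                                   # group coefficient matrix A: (B, CG, K)
        xe = x.reshape(B, C * self.G, self.D)       # grouped extended feature X_e
        X = torch.einsum('bnd,bnk->bdk', xe, A)     # grouped feature matrix X: (B, D, K)
        Cp = self.clusters.unsqueeze(0) * A.sum(1, keepdim=True)  # C' = C * A'
        V = X - Cp                                  # local clustering matrix V
        return self.bn_out(V.flatten(1))            # flattened local clustering vector
```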
8. The method for detecting face spoofing based on the meta-pseudo label and the illumination invariant feature according to claim 1, wherein the method comprises the following specific steps of constructing a feature extraction backbone network, embedding an attention module, and constructing an illumination invariant feature extraction network:
constructing the feature extraction backbone network from 4 convolutional layers containing skip connection layers, dropout layers and pooling layers, with the input size set to H × W × 3; the backbone output is flattened into a feature vector; the feature X is then reshaped into a feature matrix X' and sent to the attention module, where the expansion coefficient is λ, the number of groups is G and the number of clusters is K; finally, the local clustering vector output by the attention module is sent to a fully connected layer with 2 output neurons to output the classification vector.
9. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features according to claim 1, wherein the PLGF map after data enhancement is input to an illumination invariant feature extraction network to obtain feature vectors and classification vectors, the feature vectors and the real labels are sent to a triplet loss function to obtain triplet losses, the classification vectors and the real labels obtain cross entropy losses through a cross entropy function, the illumination invariant feature extraction network parameters are updated by an optimizer according to the triplet losses and the cross entropy losses, and the parameters of the illumination invariant feature extraction network are saved after training is completed, the method comprising the following specific steps:
setting the batch size of the input data-enhanced PLGF images x to n, sending them into the illumination invariant feature extraction network f, and outputting a feature vector group f(x) consisting of n feature vectors of dimension d; the triplet loss L_tri is obtained through a triplet loss function, which is expressed as:

L_tri = Σ_i max( ||f(x_a^i) − f(x_p^i)||_2 − ||f(x_a^i) − f(x_n^i)||_2 + γ, 0 )

wherein x_a^i is the anchor, x_p^i is a sample of the same class as x_a^i, x_n^i is a sample of a different class from x_a^i, ||·||_2 represents the L2 distance, f(·) represents the feature vector output by the network f, and γ is the margin coefficient;

inputting the PLGF image to the illumination invariant feature extraction network to obtain the classification vector P, which passes through a softmax function and then a cross entropy function with the real label y to obtain the cross entropy loss L_ce; the cross entropy function is expressed as:

L_ce = -Σ Y·log(softmax(P))

wherein Y is the One-Hot encoded real label vector of the real label y;

the cross entropy loss and the triplet loss are weighted and summed to obtain the PLGF loss L_PLGF, which serves as the loss function for training the illumination invariant feature extraction network; the specific calculation formula is:

L_PLGF = α·L_ce + β·L_tri

wherein α and β represent the weights;

an SGD optimizer with momentum μ is used to update the illumination invariant feature extraction network parameters with the objective of minimizing the PLGF loss L_PLGF.
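A minimal PyTorch sketch of this PLGF loss follows. The claim does not specify how triplets are mined, so the batch-hard mining used here is an assumption, as are the values of the margin γ and the weights a and b:

```python
import torch
import torch.nn.functional as F

def plgf_losses(feats, logits, y, gamma=0.3, a=1.0, b=1.0):
    """Sketch of the claim 9 PLGF loss: batch-hard triplet loss on the
    feature vectors plus cross entropy on the classification vectors."""
    d = torch.cdist(feats, feats)                 # pairwise L2 distances
    same = y.unsqueeze(0) == y.unsqueeze(1)       # same-class mask
    eye = torch.eye(len(y), dtype=torch.bool, device=y.device)
    pos = d.masked_fill(~same | eye, float('-inf')).max(1).values  # hardest positive
    neg = d.masked_fill(same, float('inf')).min(1).values          # hardest negative
    l_tri = F.relu(pos - neg + gamma).mean()      # triplet loss with margin gamma
    l_ce = F.cross_entropy(logits, y)             # cross entropy loss
    return a * l_ce + b * l_tri                   # weighted PLGF loss
```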
10. A face spoofing detection system based on meta-pseudo labels and illumination invariant features, comprising: the system comprises a data preprocessing module, a student model and teacher model building module, a teacher learning module, a student meta-learning module, a teacher updating module, an attention module, an illumination invariant feature extraction network building module, an illumination invariant feature learning module, a verification module and a testing module;
the data preprocessing module is used for picking up a face region image to obtain an RGB color channel map, randomly cutting an image block of the RGB color channel map to be trained into a labeled sample, an unlabeled sample and an enhanced unlabeled sample which are used as RGB branch training samples; carrying out illumination separation pretreatment on an RGB color channel image to be trained to obtain a PLGF image, and carrying out data enhancement to be used as a sample for PLGF branch training;
the student model and teacher model building module is used for building a student model and a teacher model, inputting RGB images and outputting classification scores;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss;
the student meta-learning module is used for sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples into the student meta-learning module to update student model parameters so as to obtain the meta-learning loss of students;
the teacher updating module is used for updating parameters of a teacher model by utilizing teacher semi-supervised loss, enhanced label-free loss and student meta-learning loss, iteratively updating network parameters of the student model and the teacher model by using an optimizer according to a loss function, and storing the parameters of the teacher model and the student model after training is finished;
the attention module is used for constructing an attention module based on the NeXtVLAD;
the illumination invariant feature extraction network construction module is used for constructing a feature extraction backbone network, embedding an attention module and constructing an illumination invariant feature extraction network;
the illumination invariant feature learning module is used for inputting the data-enhanced PLGF image into an illumination invariant feature extraction network to obtain a feature vector and a classification vector, the feature vector and a real label are sent into a triple loss function to obtain triple loss, the classification vector and the real label obtain cross entropy loss through the cross entropy function, the triple loss and the cross entropy loss are used for supervision, an optimizer is used for updating network parameters, and the parameters of the illumination invariant feature extraction network are stored after training is completed;
the verification module is used for respectively sending the RGB color channel map of the face of the verification set and the PLGF map obtained by illumination separation pretreatment into a trained student model and an illumination invariant feature extraction network, respectively obtaining RGB classification scores and PLGF classification scores, weighting and summing the RGB classification scores and the PLGF classification scores to obtain total classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and false omission rate, and taking the threshold value when the two are equal as a test judgment threshold value T;
the testing module is used for respectively sending the RGB color channel map of the face of the testing set and the PLGF map obtained through illumination separation pretreatment into a trained student model and an illumination invariant feature extraction network to respectively obtain RGB classification scores and PLGF classification scores, then weighting and summing the RGB classification scores and the PLGF classification scores to obtain total classification scores, obtaining a final predicted label value according to a testing judgment threshold T, and calculating a benchmark index.
CN202111185654.2A 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature Pending CN114067444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111185654.2A CN114067444A (en) 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111185654.2A CN114067444A (en) 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Publications (1)

Publication Number Publication Date
CN114067444A true CN114067444A (en) 2022-02-18

Family

ID=80234484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111185654.2A Pending CN114067444A (en) 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Country Status (1)

Country Link
CN (1) CN114067444A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663986A (en) * 2022-03-31 2022-06-24 华南理工大学 In-vivo detection method and system based on double-decoupling generation and semi-supervised learning
CN114676931A (en) * 2022-04-12 2022-06-28 国网江苏省电力有限公司泰州供电分公司 Electric quantity prediction system based on data relay technology
CN114724220A (en) * 2022-04-12 2022-07-08 广州广电卓识智能科技有限公司 Living body detection method, living body detection device, and readable medium
CN114676931B (en) * 2022-04-12 2024-02-20 国网江苏省电力有限公司泰州供电分公司 Electric quantity prediction system based on data center technology
WO2023202596A1 (en) * 2022-04-19 2023-10-26 华为技术有限公司 Semi-supervised model training method and system, and related device
CN115273186A (en) * 2022-07-18 2022-11-01 中国人民警察大学 Depth-forged face video detection method and system based on image feature fusion
WO2024016949A1 (en) * 2022-07-20 2024-01-25 马上消费金融股份有限公司 Label generation method and apparatus, image classification model method and apparatus, and image classification method and apparatus
CN116563642A (en) * 2023-05-30 2023-08-08 智慧眼科技股份有限公司 Image classification model credible training and image classification method, device and equipment
CN116563642B (en) * 2023-05-30 2024-02-27 智慧眼科技股份有限公司 Image classification model credible training and image classification method, device and equipment

Similar Documents

Publication Publication Date Title
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN111259850B (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN107153817B (en) Pedestrian re-identification data labeling method and device
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN111931758B (en) Face recognition method and device combining facial veins
CN114783003B (en) Pedestrian re-identification method and device based on local feature attention
CN112580576B (en) Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN109800629A (en) A kind of Remote Sensing Target detection method based on convolutional neural networks
CN110853074B (en) Video target detection network system for enhancing targets by utilizing optical flow
CN113221655B (en) Face spoofing detection method based on feature space constraint
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN114663986B (en) Living body detection method and system based on double decoupling generation and semi-supervised learning
CN112308883A (en) Multi-ship fusion tracking method based on visible light and infrared images
CN112668557A (en) Method for defending image noise attack in pedestrian re-identification system
CN112364791A (en) Pedestrian re-identification method and system based on generation of confrontation network
CN111507416B (en) Smoking behavior real-time detection method based on deep learning
CN117152625A (en) Remote sensing small target identification method, system, equipment and medium based on CoordConv and Yolov5
CN114550268A (en) Depth-forged video detection method utilizing space-time characteristics
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN111553337A (en) Hyperspectral multi-target detection method based on improved anchor frame
CN113468954B (en) Face counterfeiting detection method based on local area features under multiple channels
CN107679467B (en) Pedestrian re-identification algorithm implementation method based on HSV and SDALF

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination