CN114067444A - Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Info

Publication number: CN114067444A
Application number: CN202111185654.2A (priority application)
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: loss, teacher, label, PLGF, classification
Inventors: 冯浩宇, 王宇飞, 胡永健, 蔡楚鑫, 葛治中
Assignees: South China University of Technology (SCUT); Sino-Singapore International Joint Research Institute
Application filed by South China University of Technology (SCUT) and Sino-Singapore International Joint Research Institute

Classifications

    • G06F18/2411: Pattern recognition; classification techniques relating to the classification model (parametric or non-parametric approaches), based on the proximity to a decision surface, e.g. support vector machines (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing)
    • G06F18/2415: Pattern recognition; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate


Abstract

The invention discloses a face spoofing detection method and system based on meta-pseudo labels and illumination invariant features. The method comprises the following steps: preprocess the data to obtain RGB color channel images and PLGF images; divide the RGB color channel images into labeled samples, unlabeled samples, and augmented unlabeled samples, send them to a teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled samples, and the augmented unlabeled loss, and update the parameters of the student and teacher models; send the PLGF images into an illumination invariant feature extraction network to obtain feature vectors and classification vectors, train the network with supervision from a triplet loss and a cross-entropy loss, and save the network model and parameters; determine a decision threshold on the validation set; load the test data into the student model and the illumination invariant feature extraction network to obtain the corresponding RGB and PLGF classification scores, weight and sum them to obtain the total classification scores, and decide the classification results against the threshold. The method improves the robustness of face spoofing detection models when training samples are insufficient.

Description

Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
Technical Field
The invention relates to the technical field of face recognition anti-spoofing, and in particular to a face spoofing detection method and system based on meta-pseudo labels and illumination invariant features.
Background
Business and industrial use of facial biometric technology has grown dramatically: face unlocking protects personal privacy on electronic devices, and facial biometrics are used to authenticate payments. However, using the face as a biometric feature for authentication is not inherently secure, because facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks generally fall into four categories: 1) photo attacks, where an attacker deceives the authentication system with a printed photo or one displayed on a screen; 2) video replay attacks, where an attacker deceives the authentication system with a pre-recorded video of the victim; 3) face mask attacks, where an attacker wears a mask carefully crafted from the victim's face, and with 3D printing now mature the manufactured 3D masks are increasingly realistic; 4) adversarial sample attacks, where an attacker uses a GAN to generate specific adversarial noise that interferes with the face authentication system and produces targeted, incorrect identity verification. These attacks are not only low-cost but also effective at fooling the system, severely impacting and threatening the deployment of face recognition systems.
In related research, texture features extracted manually by traditional methods, such as Local Binary Patterns (LBP), Histograms of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded-Up Robust Features (SURF), are not only relatively coarse in texture detail but also easily affected by illumination and scene. Deep neural network methods have clear advantages over traditional hand-crafted features in learning texture features, but purely data-driven methods have significant limitations in generalization performance and computational complexity. Supervised learning is further limited by the labeled training samples: when a neural network model is optimized by gradient descent, the objective is to reduce the loss on the training set, so insufficient quantity and diversity of training data easily cause overfitting and limit generalization capability.
Existing face anti-spoofing detection algorithms achieve good intra-database detection performance, but their accuracy drops sharply in cross-database detection. On one hand, purely data-driven methods depend too heavily on training data; insufficient quantity and diversity cause overfitting, and generalization degrades severely when the model encounters devices with different shooting specifications, or even different genders, ages, and skin colors. On the other hand, illumination changes interfere with the classification model's extraction of the intrinsic spoofing texture. Moreover, face recognition models typically require a large amount of labeled training data, yet it is difficult to collect samples covering all acquisition conditions, such as different illumination and capture devices; this places high demands on data diversity and restricts model generalization.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a face spoofing detection method and system based on meta-pseudo labels and illumination invariant features.
To achieve this purpose, the invention adopts the following technical scheme:
The invention provides a face spoofing detection method based on meta-pseudo labels and illumination invariant features, comprising the following steps:
extracting the face region from the input image to obtain an RGB color channel image;
randomly cropping image blocks from the RGB color channel images to be trained and dividing them into labeled samples, unlabeled samples, and augmented unlabeled samples, used as the RGB branch training samples;
performing illumination separation preprocessing on the RGB color channel images to be trained to obtain PLGF images, then applying data augmentation, used as the samples for PLGF branch training;
building a student model and a teacher model;
constructing a teacher learning module and sending the labeled, unlabeled, and augmented unlabeled samples into it to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss;
constructing a student meta-learning module and sending the labeled samples, the augmented unlabeled samples, and the pseudo labels of the unlabeled samples into it to update the student model parameters and obtain the student meta-learning loss;
constructing a teacher update module and updating the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss;
iteratively updating the network parameters of the student and teacher models with an optimizer according to the loss functions, and saving the parameters of both models after training;
constructing a NeXtVLAD-based attention module;
constructing a feature extraction backbone network, embedding the attention module, and constructing the illumination invariant feature extraction network;
inputting the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification vectors; sending the feature vectors and real labels into a triplet loss function to obtain the triplet loss; obtaining the cross-entropy loss from the classification vectors and real labels through a cross-entropy function; updating the illumination invariant feature extraction network parameters with an optimizer according to the triplet and cross-entropy losses; and saving the network parameters after training;
sending the face RGB color channel images of the validation set into the student model to obtain RGB classification scores, meanwhile obtaining PLGF images through illumination separation preprocessing and sending them into the illumination invariant feature extraction network to obtain PLGF classification scores, weighting and summing the RGB and PLGF classification scores to obtain the total classification scores, obtaining predicted label values under different decision thresholds, comparing them with the real labels, computing the false acceptance rate and false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold T;
sending the face RGB color channel images of the test set into the trained student model to obtain RGB classification scores, meanwhile obtaining PLGF images through illumination separation preprocessing and sending them into the trained illumination invariant feature extraction network to obtain PLGF classification scores, weighting and summing the RGB and PLGF classification scores to obtain the total classification scores, obtaining the final predicted label values according to the test decision threshold T, and computing the benchmark indexes.
As a preferred technical scheme, performing illumination separation preprocessing on the RGB color channel images to be trained to obtain PLGF images and applying data augmentation to form the PLGF branch training samples specifically comprises:
convolving the face features of the three color channels with the PLGF operators in the horizontal and vertical directions to obtain the horizontal gradient G_hor and the vertical gradient G_ver;
according to the Lambert model, performing illumination separation on the horizontal and vertical gradients to obtain the horizontal illumination-separated gradient ISG_hor and the vertical illumination-separated gradient ISG_ver;
carrying out a linear activation operation on the horizontal and vertical illumination-separated gradients to obtain the composite gradient ISG, which forms the PLGF image.
As a preferred technical scheme, building the student model and the teacher model specifically comprises: building a teacher model and a student model with the same network structure from ResNet-based residual blocks (ResBlock), with 3 ResBlock convolution stages, a batch normalization layer, a global average pooling layer, and a fully connected layer that outputs the classification vector.
As a preferred technical scheme, obtaining the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss specifically comprises the following steps:
sending the labeled, unlabeled, and augmented unlabeled samples into the teacher model to obtain the labeled classification vector P_T,l, the unlabeled classification vector P_T,u, and the augmented unlabeled classification vector P_T,ua;
the labeled classification vector P_T,l first passes through a softmax function and then through a smooth cross-entropy function with the real label y to obtain the labeled loss L_T,l; the smooth cross-entropy function is expressed as:

S = Y(1-α) + α/2
L_T,l = -Σ S · log(softmax(P_T,l))

where α is the smoothing coefficient, Y is the One-Hot encoded label vector of the real label y, and S is the smoothed label vector;
the unlabeled classification vector P_T,u is divided by the control (temperature) coefficient T and passed through a softmax function to obtain the soft label vector Z; the maximum value z_max of Z is taken, and its class constructs the pseudo label y_p; judging z_max against the threshold t yields the confidence score M; the augmented unlabeled classification vector P_T,ua and the soft label vector Z, multiplied by the confidence score M, yield the unlabeled loss L_T,u through a cross-entropy function:

z_max = max(Z)
y_p = argmax(Z)
M = 1 if z_max ≥ t, else 0
L_T,u = -M · Σ Z · log(softmax(P_T,ua))

the labeled loss L_T,l and the unlabeled loss L_T,u are weighted and summed into the teacher semi-supervised loss L_T,semi:

L_T,semi = L_T,l + λ · (s / s_tl) · L_T,u

where s is the current step number, s_tl is the total number of steps, and λ is the weight of the unlabeled loss;
the augmented unlabeled classification vector P_T,ua passes through a softmax function, the maximum value p_max is taken, and its class serves as the augmented hard label h; the teacher's augmented unlabeled classification loss L_T,ua is obtained through a cross-entropy function:

p_max = max(softmax(P_T,ua))
h = argmax(P_T,ua)
L_T,ua = -Σ H · log(softmax(P_T,ua))

where H is the One-Hot encoded hard label vector of the augmented hard label h.
As a preferred technical scheme, obtaining the student meta-learning loss specifically comprises the following steps:
sending the labeled samples and the augmented unlabeled samples into the student model to obtain the labeled classification score P_S,l and the augmented unlabeled classification score P_S,ua;
the labeled classification score P_S,l first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the old labeled loss L_S,l^old:

L_S,l^old = -Σ Y · log(softmax(P_S,l))

where Y is the One-Hot encoded label vector of the real label y;
the augmented unlabeled classification score P_S,ua passes through a softmax function and then through a smooth cross-entropy function with the pseudo label y_p to obtain the augmented unlabeled loss L_S,ua:

S_p = Y_p(1-α) + α/2
L_S,ua = -Σ S_p · log(softmax(P_S,ua))

where α is the smoothing coefficient, Y_p is the One-Hot encoded label vector of the pseudo label y_p, and S_p is the smoothed label vector;
the student optimizer then updates the student model network parameters using the augmented unlabeled loss L_S,ua;
the labeled samples are sent into the updated student model to obtain the new labeled classification score P_S,l', which first passes through a softmax function and then through a cross-entropy function with the real label to obtain the new labeled loss L_S,l^new; the difference between the new and old labeled losses gives the student meta-learning loss L_meta = L_S,l^new - L_S,l^old.
As a preferred technical scheme, updating the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss specifically comprises the following steps:
the teacher's augmented unlabeled classification loss L_T,ua obtained by the teacher learning module is multiplied by the student meta-learning loss L_meta obtained by the student meta-learning module, and the product is added to the teacher semi-supervised loss L_T,semi obtained by the teacher learning module to give the teacher loss L_T; the optimizer updates the teacher model network parameters according to L_T, which is expressed as:

L_T = L_T,ua · L_meta + L_T,semi
As a preferred technical scheme, constructing the NeXtVLAD-based attention module specifically comprises the following steps: taking a feature matrix I with C channels and feature dimension N as input, and expanding it through a fully connected layer by the factor λ to obtain the expanded feature X_o of dimension C × λN;
the expanded feature X_o passes through fully connected layers and a dimension transformation to obtain the grouping matrix G of dimension CG × 1; the expanded feature X_o is also multiplied by a group weight matrix W of dimension λN × GK and passed through a batch normalization layer, a softmax function, and a dimension transformation to obtain the group attention matrix B of dimension CG × K; B is then point-multiplied with the grouping matrix G to obtain the group coefficient matrix A of dimension CG × K;
the expanded feature X_o is reshaped into the grouped expanded feature X_e of dimension λN/G × CG and multiplied by the group coefficient matrix A to obtain the grouped feature matrix X; the column vectors of the group coefficient matrix A are summed to obtain A', which is multiplied by a cluster matrix C of dimension λN/G × K to obtain C'; subtracting C' from the grouped feature matrix X gives the local clustering matrix V; finally, V passes through a batch normalization layer and is flattened to output the local clustering vector.
As a preferred technical scheme, constructing the feature extraction backbone network, embedding the attention module, and constructing the illumination invariant feature extraction network specifically comprises the following steps:
constructing the feature extraction backbone network from 4 convolutional layers with skip connections, dropout layers, and pooling layers; the input size is H × W × 3, and the output feature map is flattened into the feature X and reshaped into a feature matrix of dimension C × N, which is sent into the attention module with expansion coefficient λ, group number G, and cluster number K; the attention module outputs the local clustering vector of size K·λN/G, and finally the classification vector is output through a fully connected layer.
As a preferred technical scheme, inputting the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification scores, sending the feature vectors and real labels into a triplet loss function to obtain the triplet loss, obtaining the cross-entropy loss from the classification scores and real labels through a cross-entropy function, supervising with the triplet and cross-entropy losses, updating the network parameters with an optimizer, and saving the PLGF branch network model and parameters after training, specifically comprises the following steps:
letting the input batch of augmented PLGF images x have size n, sending it into the illumination invariant feature extraction network f, which outputs a group f(x) of n feature vectors of dimension d, and obtaining the triplet loss L_tri through the triplet loss function, expressed as:

L_tri = Σ_i max( ||f(x_i^a) - f(x_i^p)||_2^2 - ||f(x_i^a) - f(x_i^n)||_2^2 + γ, 0 )

where x_i^a is the anchor, x_i^p is a sample of the same class as x_i^a, x_i^n is a sample of a different class from x_i^a, ||·||_2 denotes the L2 distance, f(·) is the feature vector output by the network f, and γ is the margin coefficient;
inputting a PLGF image into the illumination invariant feature extraction network to obtain the classification vector p, which first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the cross-entropy loss L_ce:

L_ce = -Σ Y · log(softmax(p))

where Y is the One-Hot encoded label vector of the real label y;
weighting and summing the cross-entropy loss and the triplet loss into the PLGF loss L_PLGF, used as the loss function for training the illumination invariant feature extraction network:

L_PLGF = α·L_ce + β·L_tri

where α and β are the weights;
using an SGD optimizer with momentum μ to update the illumination invariant feature extraction network parameters with the objective of minimizing the PLGF loss L_PLGF, with the parameter update formula:

v_{t+1} = μ·v_t - lr(t)·∇θ_t(L_PLGF)
θ_{t+1} = θ_t + v_{t+1}

where v is the momentum velocity, t is the iteration number of the current training, ∇θ_t(L_PLGF) is the gradient with respect to the model parameters θ_t, and lr(t) is the learning rate of the current iteration.
Meanwhile, the learning rate decays with the number of training iterations: starting from the initial learning rate ε_0, lr(t) steps down after the first threshold step number s_1 and again after the second threshold step number s_2. Preferred values of s_1, s_2, ε_0 are 200, 2000, and 0.01, respectively.
The invention further provides a face spoofing detection system based on meta-pseudo labels and illumination invariant features, comprising: a data preprocessing module, a student and teacher model building module, a teacher learning module, a student meta-learning module, a teacher update module, an attention module, an illumination invariant feature extraction network construction module, an illumination invariant feature learning module, a validation module, and a testing module;
the data preprocessing module is used to extract the face region image to obtain RGB color channel images, randomly crop image blocks from the RGB color channel images to be trained and divide them into labeled samples, unlabeled samples, and augmented unlabeled samples as the RGB branch training samples; and to perform illumination separation preprocessing on the RGB color channel images to be trained to obtain PLGF images and apply data augmentation, forming the PLGF branch training samples;
the student and teacher model building module is used to build the student model and the teacher model, which take RGB images as input and output classification scores;
the teacher learning module is used to receive the labeled, unlabeled, and augmented unlabeled samples and produce the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss;
the student meta-learning module is used to receive the labeled samples, the augmented unlabeled samples, and the pseudo labels of the unlabeled samples, update the student model parameters, and produce the student meta-learning loss;
the teacher update module is used to update the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss; the network parameters of the student and teacher models are iteratively updated with an optimizer according to the loss functions, and the parameters of both models are saved after training;
the attention module is a NeXtVLAD-based attention module;
the illumination invariant feature extraction network construction module is used to construct the feature extraction backbone network, embed the attention module, and construct the illumination invariant feature extraction network;
the illumination invariant feature learning module is used to input the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification vectors, send the feature vectors and real labels into a triplet loss function to obtain the triplet loss, obtain the cross-entropy loss from the classification vectors and real labels through a cross-entropy function, update the illumination invariant feature extraction network parameters with an optimizer according to the triplet and cross-entropy losses, and save the network parameters after training;
the validation module is used to send the face RGB color channel images of the validation set and the PLGF images obtained through illumination separation preprocessing into the trained student model and the illumination invariant feature extraction network respectively, obtain the RGB and PLGF classification scores, weight and sum them to obtain the total classification scores, obtain predicted label values under different decision thresholds, compare them with the labels, compute the false acceptance rate and false rejection rate, and take the threshold at which the two rates are equal as the test decision threshold T;
the testing module is used to send the face RGB color channel images of the test set and the PLGF images obtained through illumination separation preprocessing into the trained student model and the illumination invariant feature extraction network respectively, obtain the RGB and PLGF classification scores, weight and sum them to obtain the total classification scores, obtain the final predicted label values according to the test decision threshold T, and compute the benchmark indexes.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the training stage, for the RGB color channel data, a semi-supervised learning framework in which the teacher generates pseudo labels and the student provides feedback is combined with meta-learning. On one hand, unlabeled data carrying pseudo labels enriches the diversity of the training data and alleviates model training under limited labeled training data; on the other hand, meta-learning trains the model progressively, mines the essential relations among image block features, and improves the network's learning ability, so the model generalizes better to unseen acquisition environments.
(2) In the training stage, illumination separation processing derives PLGF images from the RGB color channel data. The PLGF images contain material features related only to the reflection coefficient, as well as distinguishing cues such as the spoofing noise introduced by secondary imaging and the color loss in the illumination component, reducing the influence of different illumination environments on model performance; the illumination invariant feature extraction network with the attention module extracts these essential features and improves the robustness of the face spoofing detection model.
(3) In the detection stage, the test data are loaded into the student model and the illumination invariant feature extraction network to obtain the corresponding RGB and PLGF classification scores, which are weighted and summed into the total classification score, and the classification result is decided against the threshold.
Drawings
FIG. 1 is a schematic flow chart of a face spoofing detection method based on meta-pseudo labels and illumination invariant features according to the present invention;
FIG. 2 is a schematic drawing of the PLGF of the present invention;
FIG. 3 is a schematic diagram of a network architecture of a student model and a teacher model according to the present invention;
FIG. 4 is a schematic diagram of an illumination invariant feature extraction network architecture according to the present invention;
FIG. 5 is a schematic diagram of a test flow according to the present invention;
FIG. 6 is an overall block diagram of the face spoofing detection system based on meta-pseudo labels and illumination invariant features of the present invention;
FIG. 7 is a schematic diagram of a teacher learning module of the present invention;
FIG. 8 is a diagram of a student meta-learning module of the present invention;
fig. 9 is a schematic diagram of an illumination invariant feature extraction network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
This embodiment is described in detail using the Replay-Attack, CASIA-MFSD, and MSU-MFSD liveness detection datasets for training and testing. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects and the spoofed faces generated from them were captured with a MacBook camera at a resolution of 320 × 240 pixels, and the videos are split into training, validation, and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects and the spoofed faces generated from them were captured with three cameras at resolutions of 640 × 480, 480 × 640, and 1920 × 1080 pixels, and the videos are split into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, with real faces from 35 subjects and the spoofed faces generated from them; 15 subjects are used for the training set and 20 for the test set. Since CASIA-MFSD and MSU-MFSD do not contain validation sets, this embodiment uses their test sets as validation sets for threshold determination. The dataset videos are then split into frames, and 4000 samples are randomly drawn from the training set frames as labeled samples and 16000 as unlabeled samples to form the training set. The embodiment runs on a Linux system and is implemented mainly on the deep learning framework PyTorch 1.6.1, with GTX 1080Ti GPUs, CUDA 10.1.105, and cuDNN 7.6.4.
As shown in fig. 1, the present embodiment provides a face spoofing detection method based on meta-pseudo labels and illumination invariant features, including the following steps:
S1: extracting the face region from the input image to obtain an RGB color channel image;
In this embodiment, the specific steps include: detect the face region of the input image with the MTCNN face detection algorithm, crop it, and unify its size to obtain the face image, which is in RGB format with red, green, and blue color channels.
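The face extraction step can be sketched as follows in PyTorch-ecosystem Python; the use of the facenet-pytorch MTCNN implementation and the 256-pixel unified crop size are assumptions, since the patent names only the MTCNN algorithm:

```python
# A minimal sketch of step S1, assuming the facenet-pytorch MTCNN implementation;
# the patent only requires "detect, crop, resize to a unified size".
from PIL import Image
from facenet_pytorch import MTCNN

detector = MTCNN(image_size=256, margin=0, post_process=False)  # crop size assumed 256

def extract_face(path: str):
    img = Image.open(path).convert("RGB")   # three color channels: R, G, B
    face = detector(img)                    # (3, 256, 256) tensor, or None if no face found
    return face
```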
S2: randomly cropping image blocks from the RGB color channel images to be trained and dividing them into labeled samples, unlabeled samples, and augmented unlabeled samples, used as the RGB branch training samples;
In this embodiment, the RGB color channel map input size is H × W, and image blocks of a smaller fixed size are randomly cropped from it. N labeled samples are randomly selected from all training set samples, with the ratio of unlabeled to labeled samples being μ. Random data augmentation is then applied to the unlabeled samples to obtain the augmented unlabeled samples. The augmentation methods comprise maximizing contrast, adjusting brightness, adjusting color balance, adjusting contrast, adjusting sharpness, cropping, histogram equalization, inversion, posterization (zeroing the lowest 0-4 bits of the pixel values), horizontal and vertical shearing, horizontal translation, vertical translation, random rotation, and solarization (inverting all pixel values above a threshold); two augmentation methods are randomly selected for each sample, as sketched below. Preferred values of H, W, N, μ in this embodiment are 256, 256, 4000, and 4, respectively.
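A minimal sketch of the two-ops-per-sample augmentation policy using Pillow; the operation magnitudes and the subset of operations shown (shears and translations omitted for brevity) are assumptions, not values from the patent:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Candidate operations mirroring the list above (RandAugment-style pool);
# the magnitude ranges are assumed.
OPS = [
    lambda im: ImageOps.autocontrast(im),                           # maximize contrast
    lambda im: ImageEnhance.Brightness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Color(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Contrast(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Sharpness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageOps.equalize(im),                               # histogram equalization
    lambda im: ImageOps.invert(im),                                 # inversion
    lambda im: ImageOps.posterize(im, 8 - random.randint(0, 4)),    # zero the lowest 0-4 bits
    lambda im: im.rotate(random.uniform(-30, 30)),                  # random rotation
    lambda im: ImageOps.solarize(im, random.randint(0, 255)),       # invert pixels above threshold
]

def augment_unlabeled(im: Image.Image) -> Image.Image:
    # randomly select two augmentation methods per sample
    for op in random.sample(OPS, 2):
        im = op(im)
    return im
```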
S3: as shown in FIG. 2, the RGB color channel images to be trained undergo illumination separation preprocessing to obtain PLGF images, which are then augmented and used as the samples for PLGF branch training. In the figure, the first row shows the augmented original images and the second row the images after illumination separation convolution; the first and second columns are genuine samples and the third and fourth columns are spoofed samples.
In this embodiment, the specific steps include: first, convolve the face features of the three color channels with the PLGF operators in the horizontal and vertical directions to obtain the horizontal gradient G_hor and the vertical gradient G_ver. The PLGF convolution is expressed as:

G_d[x,y] = (I * f_d)[x,y], d ∈ {hor, ver}

where f_hor and f_ver are the 3 × 3 horizontal and vertical convolution kernels of the Pattern of Local Gravitational Force (PLGF) from "A Novel Local Image Descriptor", I[x,y] is the pixel value at coordinate (x,y), and G_d[x,y] is the directional gradient at coordinate (x,y).
Then, according to the Lambert model, illumination-separate the horizontal and vertical gradients to obtain the horizontal illumination-separated gradient ISG_hor and the vertical illumination-separated gradient ISG_ver. The illumination separation divides the gradient by the pixel value itself plus a small minimum that prevents division by zero. Because the illumination intensity changes slowly and is approximately a constant L within a small region, the illumination component L can be eliminated, leaving face texture features that depend only on the reflection coefficient; these carry rich texture information and serve as effective features for spoofing detection. The illumination separation is expressed as:

ISG_d[x,y] = G_d[x,y] / (I[x,y] + ε), d ∈ {hor, ver}

where I[x,y] = R[x,y]·L[x,y] is the pixel value at coordinate (x,y), R[x,y] is the reflection coefficient of that pixel, L[x,y] is the illumination intensity imaged at that pixel, and ε is a small constant preventing division by zero.
A linear activation operation then combines the illumination-separated gradients in the horizontal and vertical directions into the composite gradient ISG, which forms the PLGF image:

ISG[x,y] = sqrt( ISG_hor[x,y]^2 + ISG_ver[x,y]^2 )

Finally, the PLGF images are augmented: with probability 0.5 they are randomly flipped horizontally, and with probability 0.5 one or two rectangular regions covering 0.005-0.1 of the image area are randomly selected and their three channel pixel values set to 0.4914, 0.4822, and 0.4465, while a randomly selected region of 0-28 pixels is set to 0.
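A sketch of the illumination separation preprocessing under stated assumptions: the 3 × 3 kernel values follow the local-gravitational-force pattern of the cited PLGF descriptor rather than the patent itself, and the composite-gradient rule is reconstructed as the gradient magnitude because the original formula appears only as an image:

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 PLGF kernels (component of 1/r^2 along each axis; corner cells
# scaled by 1/(2*sqrt(2)) ~= 0.3536). The patent cites these from the PLGF paper.
F_HOR = torch.tensor([[-0.3536, 0.0, 0.3536],
                      [-1.0000, 0.0, 1.0000],
                      [-0.3536, 0.0, 0.3536]])
F_VER = F_HOR.t().contiguous()

def plgf_image(rgb: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [0, 1] -> composite illumination-separated gradient ISG."""
    k = torch.stack([F_HOR, F_VER]).unsqueeze(1)            # (2, 1, 3, 3)
    b, c, h, w = rgb.shape
    x = rgb.reshape(b * c, 1, h, w)
    g = F.conv2d(x, k, padding=1)                           # G_hor, G_ver per channel
    isg = g / (x + eps)                                     # divide by pixel value: removes L
    out = torch.sqrt(isg[:, 0] ** 2 + isg[:, 1] ** 2)       # composite gradient (assumed rule)
    return out.reshape(b, c, h, w)
```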
S4: building the student model and the teacher model;
As shown in FIG. 3, a teacher model and a student model with the same network structure are built from ResNet-based residual blocks (ResBlock). The input resolution is set to H × W × 3, and 3 ResBlock convolution stages produce the initial feature map, which passes through a batch normalization layer and a global average pooling layer, is processed by dropout, and is flattened into a feature vector of size 128; finally the feature vector is sent to a fully connected layer with 2 output neurons to obtain the classification vector. Preferred values of H, W in this embodiment are 256 and 256.
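A sketch of the shared teacher/student architecture; the channel widths, strides, and dropout rate are assumptions, while the 3 residual blocks, batch normalization, global average pooling, 128-dim flattened feature, and 2-way classifier follow the text:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # ResNet-style basic residual block with downsampling (assumed stride 2).
    def __init__(self, cin, cout, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1, bias=False), nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, 1, 1, bias=False), nn.BatchNorm2d(cout))
        self.skip = nn.Conv2d(cin, cout, 1, stride, bias=False)

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class SpoofNet(nn.Module):
    # Shared architecture; the teacher and the student are two instances.
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(ResBlock(3, 32), ResBlock(32, 64), ResBlock(64, 128))
        self.bn = nn.BatchNorm2d(128)
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.drop = nn.Dropout(0.3)           # dropout rate assumed
        self.fc = nn.Linear(128, 2)           # classification vector

    def forward(self, x):                     # x: (B, 3, 256, 256)
        f = self.pool(self.bn(self.blocks(x))).flatten(1)  # 128-dim feature vector
        return self.fc(self.drop(f))

teacher, student = SpoofNet(), SpoofNet()
```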
S5: constructing the teacher learning module and sending the labeled, unlabeled, and augmented unlabeled samples into it to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the augmented unlabeled loss;
In this embodiment, the specific steps include:
Send the labeled, unlabeled, and augmented unlabeled samples into the teacher model to obtain the labeled classification vector P_T,l, the unlabeled classification vector P_T,u, and the augmented unlabeled classification vector P_T,ua.
The labeled classification vector P_T,l first passes through a softmax function and then through a smooth cross-entropy function with the real label y to obtain the labeled loss L_T,l. The smooth cross-entropy function is expressed as:

S = Y(1-α) + α/2
L_T,l = -Σ S · log(softmax(P_T,l))

where α is the smoothing coefficient (preferred value 0.8), Y is the One-Hot encoded label vector of the real label y, and S is the smoothed label vector.
The unlabeled classification vector P_T,u is divided by the control (temperature) coefficient T (preferred value 0.7) and passed through a softmax function to obtain the soft label vector Z. The maximum value z_max of Z is taken, and its class constructs the pseudo label y_p; judging z_max against the threshold t (preferred value 0.6) yields the confidence score M. The augmented unlabeled classification vector P_T,ua and the soft label vector Z, multiplied by the confidence score M, yield the unlabeled loss L_T,u through a cross-entropy function:

z_max = max(Z)
y_p = argmax(Z)
M = 1 if z_max ≥ t, else 0
L_T,u = -M · Σ Z · log(softmax(P_T,ua))

The labeled loss L_T,l and the unlabeled loss L_T,u are weighted and summed into the teacher semi-supervised loss L_T,semi:

L_T,semi = L_T,l + λ · (s / s_tl) · L_T,u

where s is the current step number, s_tl is the total number of steps, and λ is the weight of the unlabeled loss.
The augmented unlabeled classification vector P_T,ua passes through a softmax function; the maximum value p_max is taken, and its class serves as the augmented hard label h. The teacher's augmented unlabeled classification loss L_T,ua is obtained through a cross-entropy function:

p_max = max(softmax(P_T,ua))
h = argmax(P_T,ua)
L_T,ua = -Σ H · log(softmax(P_T,ua))

where H is the One-Hot encoded hard label vector of the augmented hard label h. A sketch of these computations follows.
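The teacher-side losses can be sketched as follows; the linear ramp s/s_tl on the unlabeled-loss weight is an assumption consistent with the weighted-sum description:

```python
import torch
import torch.nn.functional as F

ALPHA, TEMP, THRESH = 0.8, 0.7, 0.6   # smoothing α, temperature T, confidence threshold t

def smooth_ce(logits, y, alpha=ALPHA):
    # smooth cross entropy with S = Y(1-α) + α/2 for the 2-class case
    S = F.one_hot(y, 2).float() * (1 - alpha) + alpha / 2
    return -(S * F.log_softmax(logits, dim=1)).sum(1).mean()

def teacher_losses(p_l, p_u, p_ua, y, step, total_steps, lam=1.0):
    L_label = smooth_ce(p_l, y)                               # labeled loss L_T,l
    Z = F.softmax(p_u.detach() / TEMP, dim=1)                 # soft label vector
    z_max, y_p = Z.max(dim=1)                                 # pseudo label y_p
    M = (z_max >= THRESH).float()                             # confidence score
    L_unlabel = -(M * (Z * F.log_softmax(p_ua, dim=1)).sum(1)).mean()
    L_semi = L_label + lam * (step / total_steps) * L_unlabel # teacher semi-supervised loss
    h = p_ua.argmax(dim=1)                                    # augmented hard label
    L_ua = F.cross_entropy(p_ua, h.detach())                  # augmented unlabeled cls. loss
    return L_semi, y_p, L_ua
```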
S6: constructing the student meta-learning module and sending the labeled samples, the augmented unlabeled samples, and the pseudo labels of the unlabeled samples into it to update the student model parameters and obtain the student meta-learning loss;
In this embodiment, the specific steps include:
Send the labeled samples and the augmented unlabeled samples into the student model to obtain the labeled classification score P_S,l and the augmented unlabeled classification score P_S,ua.
The labeled classification score P_S,l first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the old labeled loss L_S,l^old:

L_S,l^old = -Σ Y · log(softmax(P_S,l))

where Y is the One-Hot encoded label vector of the real label y.
The augmented unlabeled classification score P_S,ua passes through a softmax function and then through a smooth cross-entropy function with the pseudo label y_p to obtain the augmented unlabeled loss L_S,ua:

S_p = Y_p(1-α) + α/2
L_S,ua = -Σ S_p · log(softmax(P_S,ua))

where α is the smoothing coefficient, Y_p is the One-Hot encoded label vector of the pseudo label y_p, and S_p is the smoothed label vector.
The student optimizer then updates the student model network parameters using the augmented unlabeled loss L_S,ua.
The labeled samples are sent into the updated student model to obtain the new labeled classification score P_S,l'; it first passes through a softmax function and then through a cross-entropy function with the real label to obtain the new labeled loss L_S,l^new. The difference between the new and old labeled losses gives the student meta-learning loss L_meta = L_S,l^new - L_S,l^old, as sketched below.
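A sketch of the student meta-learning step; treating the meta-loss as a scalar feedback signal computed without gradients is a simplifying assumption:

```python
import torch
import torch.nn.functional as F

def student_meta_step(student, opt_s, x_l, y, x_ua, y_p, alpha=0.8):
    # old labeled loss, measured before the pseudo-label update
    with torch.no_grad():
        L_old = F.cross_entropy(student(x_l), y)
    # train the student on augmented unlabeled samples with pseudo labels,
    # using the smooth cross entropy S_p = Y_p(1-α) + α/2
    S_p = F.one_hot(y_p, 2).float() * (1 - alpha) + alpha / 2
    L_ua = -(S_p * F.log_softmax(student(x_ua), dim=1)).sum(1).mean()
    opt_s.zero_grad(); L_ua.backward(); opt_s.step()
    # new labeled loss, measured after the update; the difference is L_meta
    with torch.no_grad():
        L_new = F.cross_entropy(student(x_l), y)
    return L_new - L_old                      # student meta-learning loss
```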
S7: constructing the teacher update module and updating the teacher model parameters using the teacher semi-supervised loss, the augmented unlabeled loss, and the student meta-learning loss;
In this embodiment, the specific steps include: the teacher's augmented unlabeled classification loss L_T,ua obtained by the teacher learning module is multiplied by the student meta-learning loss L_meta obtained by the student meta-learning module, and the product is added to the teacher semi-supervised loss L_T,semi obtained by the teacher learning module to give the teacher loss L_T; the optimizer then updates the teacher model network parameters according to L_T, which is expressed as:

L_T = L_T,ua · L_meta + L_T,semi
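A sketch of the teacher update; multiplying the augmented unlabeled classification loss by the detached scalar meta-loss is the practical first-order treatment of the formula above, an assumption rather than the patent's exact computation graph:

```python
def teacher_update(opt_t, L_semi, L_ua, L_meta):
    # teacher loss L_T = L_T,ua * L_meta + L_T,semi; L_meta is a scalar
    # feedback signal (no gradient of its own), so it scales L_T,ua
    L_T = L_ua * L_meta.detach() + L_semi
    opt_t.zero_grad(); L_T.backward(); opt_t.step()
    return L_T
```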
S8: iteratively updating the network parameters of the student and teacher models with an optimizer according to the loss functions, and saving the parameters of both models after training;
In this embodiment, the specific steps include: both the teacher model and the student model use SGD optimizers with Nesterov momentum, with momentum μ (preferred value 0.9) and initial learning rate ε_0 (preferred value 0.05). The parameter update formula is:

v_{t+1} = μ·v_t - lr(t)·∇θ_t(L)
θ_{t+1} = θ_t + v_{t+1}

where v is the momentum velocity, t is the iteration number of the current training, ∇θ_t(L) is the gradient of the loss with respect to the model parameters θ_t, and lr(t) is the current learning rate.
Meanwhile, the learning rate decays with the number of training iterations: lr(t) warms up over the warmup steps, is held during the waiting steps, and then decays until the total step count. The warmup, waiting, and total step numbers of the teacher model optimizer have preferred values of 1000, 0, and 64000, respectively; those of the student model optimizer have preferred values of 1000, 3000, and 64000, respectively.
The student model optimizer optimizes with the goal of minimizing the augmented unlabeled loss L_S,ua from the student meta-learning module; the teacher model optimizer optimizes with the goal of minimizing the teacher loss L_T from the teacher update module. A configuration sketch follows.
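A configuration sketch; the warmup-hold-decay shape is recoverable from the step-number names, but the cosine decay form is an assumption, since the schedule formula appears only as an image in the original:

```python
import math
import torch

def make_optimizer(model, lr0=0.05, momentum=0.9):
    # SGD with Nesterov momentum, as specified for both models
    return torch.optim.SGD(model.parameters(), lr=lr0, momentum=momentum, nesterov=True)

def lr_at(t, warmup, wait, total, lr0=0.05):
    # linear warmup, hold during the waiting steps, then decay to zero
    # at the total step count (cosine decay is assumed)
    if t < warmup:
        return lr0 * t / warmup
    if t < warmup + wait:
        return lr0
    progress = (t - warmup - wait) / max(1, total - warmup - wait)
    return lr0 * 0.5 * (1 + math.cos(math.pi * progress))

teacher_steps = dict(warmup=1000, wait=0, total=64000)     # preferred values
student_steps = dict(warmup=1000, wait=3000, total=64000)
```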
S9: constructing the NeXtVLAD-based attention module;
In this embodiment, the specific steps include:
Set the expansion coefficient λ, the group number G, and the cluster number K, with preferred values of 2, 7, and 7, respectively.
Take a feature matrix I with C channels and feature dimension N as input (preferred values of C and N are 32 and 196), and expand it through a fully connected layer by the factor λ to obtain the expanded feature X_o of dimension C × λN.
The expanded feature X_o passes through fully connected layers and a dimension transformation to obtain the grouping matrix G of dimension CG × 1. The expanded feature X_o is also multiplied by a group weight matrix W of dimension λN × GK and passed through a batch normalization layer, a softmax function, and a dimension transformation to obtain the group attention matrix B of dimension CG × K; B is then point-multiplied with the grouping matrix G to obtain the group coefficient matrix A of dimension CG × K.
The expanded feature X_o is reshaped into the grouped expanded feature X_e of dimension λN/G × CG and multiplied by the group coefficient matrix A to obtain the grouped feature matrix X. The column vectors of the group coefficient matrix A are summed to obtain A', which is multiplied by a cluster matrix C of dimension λN/G × K to obtain C'; subtracting C' from the grouped feature matrix X gives the local clustering matrix V. Finally, V passes through a batch normalization layer and is flattened to output the local clustering vector.
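A sketch of the NeXtVLAD-based attention module following the matrix shapes above; the initialization scales and the sigmoid gate for the grouping matrix are assumptions based on the NeXtVLAD paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeXtVLADAttention(nn.Module):
    # Input I: (batch, C, N); output: local clustering vector of size K * lamb*N / G.
    def __init__(self, C=32, N=196, lamb=2, G=7, K=7):
        super().__init__()
        self.lamb, self.G, self.K = lamb, G, K
        self.expand = nn.Linear(N, lamb * N)                 # C x N -> C x lambda*N
        self.group_gate = nn.Linear(lamb * N, G)             # grouping matrix (CG x 1)
        self.cluster_w = nn.Parameter(torch.randn(lamb * N, G * K) * 0.01)
        self.cluster_c = nn.Parameter(torch.randn(lamb * N // G, K) * 0.01)
        self.bn_att = nn.BatchNorm1d(G * K)
        self.bn_out = nn.BatchNorm1d(K * lamb * N // G)

    def forward(self, I):                                    # I: (B, C, N)
        B_, C, N = I.shape
        lam, G, K = self.lamb, self.G, self.K
        Xo = self.expand(I)                                  # (B, C, lam*N)
        gate = torch.sigmoid(self.group_gate(Xo))            # (B, C, G)
        gate = gate.reshape(B_, C * G, 1)                    # grouping matrix: (B, CG, 1)
        att = Xo @ self.cluster_w                            # (B, C, G*K)
        att = self.bn_att(att.transpose(1, 2)).transpose(1, 2)
        att = F.softmax(att.reshape(B_, C, G, K), dim=-1)    # group attention matrix B
        A = att.reshape(B_, C * G, K) * gate                 # group coefficient matrix A
        Xe = Xo.reshape(B_, C * G, lam * N // G).transpose(1, 2)  # (B, lam*N/G, CG)
        X = Xe @ A                                           # grouped feature matrix
        A_sum = A.sum(dim=1, keepdim=True)                   # A': (B, 1, K)
        V = X - self.cluster_c.unsqueeze(0) * A_sum          # local clustering matrix V
        return self.bn_out(V.flatten(1))                     # flatten + batch norm
```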
S10: constructing the feature extraction backbone network, embedding the attention module, and constructing the illumination invariant feature extraction network;
As shown in FIG. 4, the specific steps include: construct the feature extraction backbone network from 4 convolutional layers with skip connections, dropout layers, and pooling layers. The input size is H × W × 3 and the output is a feature map of size H/16 × W/16 × C, which is flattened into the feature X and reshaped into the feature matrix X' of dimension C × N, with N = (H/16)·(W/16). X' is sent into the attention module, with expansion coefficient λ, group number G, and cluster number K, whose output is the local clustering vector of size K·λN/G. Finally, the local clustering vector is sent to a fully connected layer with 2 output neurons to output the classification vector p. Preferred values of H, W, λ, G, K in this embodiment are 224, 224, 2, 7, and 7, respectively.
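A sketch of the illumination invariant feature extraction network, reusing the NeXtVLADAttention sketch above; the per-block channel widths and dropout rate are assumptions, while the 4 pooled conv blocks, the 224 to 14 spatial reduction, and the 392-dim local clustering vector follow the text:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # conv + BN + ReLU with a 1x1 skip projection, then pooling and dropout;
    # the exact composition is an assumption consistent with the description
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
        self.skip = nn.Conv2d(cin, cout, 1)
        self.bn = nn.BatchNorm2d(cout)
        self.pool = nn.MaxPool2d(2)
        self.drop = nn.Dropout2d(0.1)

    def forward(self, x):
        y = torch.relu(self.bn(self.conv(x)) + self.skip(x))
        return self.drop(self.pool(y))

class IlluminationInvariantNet(nn.Module):
    # Backbone (4 conv blocks, 224 -> 14) + NeXtVLAD attention + classifier.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(ConvBlock(3, 16), ConvBlock(16, 24),
                                      ConvBlock(24, 32), ConvBlock(32, 32))
        self.att = NeXtVLADAttention(C=32, N=196, lamb=2, G=7, K=7)
        self.fc = nn.Linear(7 * 2 * 196 // 7, 2)   # K * lambda*N / G = 392

    def forward(self, x):                    # x: (B, 3, 224, 224)
        f = self.backbone(x)                 # (B, 32, 14, 14)
        X = f.flatten(2)                     # feature matrix X': (B, C=32, N=196)
        v = self.att(X)                      # (B, 392) feature vector
        return v, self.fc(v)                 # feature vector and classification vector
```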
S11: inputting the augmented PLGF images into the illumination invariant feature extraction network to obtain feature vectors and classification vectors; sending the feature vectors and real labels into a triplet loss function to obtain the triplet loss L_tri; obtaining the cross-entropy loss L_ce from the classification vectors and real labels through a cross-entropy function; weighting and summing the cross-entropy and triplet losses into the total loss; updating the network parameters with an optimizer under this supervision; and saving the illumination invariant feature extraction network model and parameters after training;
In this embodiment, the specific steps include:
Let the input batch of augmented PLGF images x have size n. Send it into the illumination invariant feature extraction network f, which outputs a group f(x) of n feature vectors of dimension d; the triplet loss L_tri is obtained through the triplet loss function, expressed as:

L_tri = Σ_i max( ||f(x_i^a) - f(x_i^p)||_2^2 - ||f(x_i^a) - f(x_i^n)||_2^2 + γ, 0 )

where x_i^a is the anchor, x_i^p is a sample of the same class as x_i^a, x_i^n is a sample of a different class from x_i^a, ||·||_2 denotes the L2 distance, f(·) is the feature vector output by the network f, and γ is the margin coefficient, set to 0.2.
Input a PLGF image into the illumination invariant feature extraction network to obtain the classification vector p; it first passes through a softmax function and then through a cross-entropy function with the real label y to obtain the cross-entropy loss L_ce:

L_ce = -Σ Y · log(softmax(p))

where Y is the One-Hot encoded label vector of the real label y.
The cross-entropy loss and the triplet loss are weighted and summed into the PLGF loss L_PLGF, used as the loss function for training the illumination invariant feature extraction network:

L_PLGF = α·L_ce + β·L_tri

where the weights α and β have preferred values of 1 and 1, respectively.
An SGD optimizer with momentum is used, with momentum μ (preferred value 0.9); the parameter update formula is:

v_{t+1} = μ·v_t - lr(t)·∇θ_t(L_PLGF)
θ_{t+1} = θ_t + v_{t+1}

where v is the momentum velocity, t is the iteration number of the current training, ∇θ_t(L_PLGF) is the gradient with respect to the model parameters θ_t, and lr(t) is the learning rate of the current iteration.
Meanwhile, the learning rate decays with the number of training iterations: starting from the initial learning rate ε_0, lr(t) steps down after the first threshold step number s_1 and again after the second threshold step number s_2. The values of s_1, s_2, ε_0 are set to 200, 2000, and 0.01, respectively.
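A sketch of the PLGF branch loss; the batch-all mining strategy over all valid (anchor, positive, negative) triples is an assumption, since the patent does not state how triples are formed:

```python
import torch
import torch.nn.functional as F

def triplet_loss(feats, labels, gamma=0.2):
    # batch-all triplet loss over the feature vector group f(x)
    d = torch.cdist(feats, feats)                      # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = same & ~torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    margin = d.unsqueeze(2) - d.unsqueeze(1) + gamma   # d(a,p) - d(a,n) + γ
    mask = pos.unsqueeze(2) & (~same).unsqueeze(1)     # valid (a, p, n) triples
    valid = F.relu(margin[mask])
    return valid.mean() if valid.numel() else feats.sum() * 0.0

def plgf_loss(feats, logits, labels, alpha=1.0, beta=1.0):
    # L_PLGF = α * cross-entropy loss + β * triplet loss
    return alpha * F.cross_entropy(logits, labels) + beta * triplet_loss(feats, labels)
```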
S12: determining the threshold with the validation set;
In this embodiment, the specific steps include: send the face RGB color channel images of the validation set into the student model to obtain the RGB classification vector p_RGB, pass it through a softmax function, and take the genuine-class probability as the RGB classification score score_RGB. Meanwhile, the RGB color channel images undergo illumination separation preprocessing to obtain PLGF images, which are sent into the illumination invariant feature extraction network to obtain the PLGF classification vector p_PLGF; it passes through a softmax function, and the genuine-class probability is taken as the PLGF classification score score_PLGF. The RGB and PLGF classification scores are then weighted and summed into the total classification score:

score = α·score_RGB + β·score_PLGF

where α is 0.8 and β is 0.2. Thresholds are then sampled at equal intervals in the range (0, 1); for each threshold, the predicted label values are obtained and compared with the real labels, and the false acceptance rate and false rejection rate are computed. The threshold at which the two rates are equal is taken as the test decision threshold T, as sketched below.
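A sketch of the score fusion and threshold search; taking index 1 of the softmax output as the genuine-class probability is an assumption:

```python
import numpy as np

def fuse_scores(p_rgb, p_plgf, alpha=0.8, beta=0.2):
    # genuine-class softmax probability from each branch, weighted and summed
    s_rgb = np.exp(p_rgb) / np.exp(p_rgb).sum(-1, keepdims=True)
    s_plgf = np.exp(p_plgf) / np.exp(p_plgf).sum(-1, keepdims=True)
    return alpha * s_rgb[:, 1] + beta * s_plgf[:, 1]

def eer_threshold(scores, labels, steps=1000):
    # sample thresholds at equal intervals in (0, 1); return the threshold
    # where the false acceptance and false rejection rates are closest
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0, 1, steps + 1)[1:-1]:
        pred = scores >= t
        far = ((pred == 1) & (labels == 0)).sum() / max(1, (labels == 0).sum())
        frr = ((pred == 0) & (labels == 1)).sum() / max(1, (labels == 1).sum())
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t
```

The threshold T found here on the validation set is then reused unchanged at test time in step S13.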
S13: testing the model;
As shown in FIG. 5, the specific steps include: send the face RGB color channel images of the test set into the student model to obtain the RGB classification vector p_RGB, pass it through a softmax function, and take the genuine-class probability as the RGB classification score score_RGB. Meanwhile, the RGB color channel images undergo illumination separation preprocessing to obtain PLGF images, which are sent into the illumination invariant feature extraction network to obtain the PLGF classification vector p_PLGF; it passes through a softmax function, and the genuine-class probability is taken as the PLGF classification score score_PLGF. The RGB and PLGF classification scores are then weighted and summed into the total classification score:

score = α·score_RGB + β·score_PLGF

where α is 0.8 and β is 0.2. The final predicted label values are then obtained according to the test decision threshold T, and the benchmark indexes are computed.
The performance evaluation indexes of the face spoofing detection algorithm in this embodiment adopt a False Acceptance Rate (FAR), a False Rejection Rate (FRR), a True Acceptance Rate (TAR), an Equal Error Rate (EER), a Half Error Rate (Half Total Error Rate, hter, which are described in detail in the confusion matrix of table 1:
table 1 confusion matrix table
Tagging/prediction The prediction is true Prediction of false
The label is true TA FR
The label is false FA TR
The False Acceptance Rate (FAR) is the ratio of the number of non-live faces judged to be live to the number of faces labeled non-live:

FAR = FA / (FA + TR)

The False Rejection Rate (FRR) is the ratio of the number of live faces judged to be non-live to the number of faces labeled live:

FRR = FR / (TA + FR)

The True Acceptance Rate (TAR) is the ratio of the number of live faces judged to be live to the number of faces labeled live:

TAR = TA / (TA + FR)

The Equal Error Rate (EER) is the error rate at the operating point where FRR and FAR are equal;

The Half Total Error Rate (HTER) is the mean of FRR and FAR:

HTER = (FRR + FAR) / 2
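These definitions reduce to a few lines of code; a minimal sketch over the confusion counts of Table 1:

```python
def spoof_metrics(ta, fr, fa, tr):
    """FAR, FRR, TAR and HTER from the Table 1 confusion counts
    (TA/FR: faces labeled live; FA/TR: faces labeled non-live)."""
    far = fa / (fa + tr)      # non-live faces accepted as live
    frr = fr / (ta + fr)      # live faces rejected as non-live
    tar = ta / (ta + fr)      # live faces correctly accepted
    hter = (far + frr) / 2    # half total error rate
    return far, frr, tar, hter
```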
To demonstrate the effectiveness of the invention and to test the generalization performance of the method, intra-database and cross-database experiments were carried out on the CASIA-MFSD, Replay-Attack and MSU-MFSD databases. The intra-database and cross-database experimental results are shown in Tables 2 and 3, respectively:
Table 2 Intra-database experimental results
Figure BDA0003299096500000211
Table 3 Cross-database experimental results
Figure BDA0003299096500000212
As can be seen from Table 2, both the half total error rate and the equal error rate of the method are low within each database, showing excellent intra-database spoofing detection performance. As can be seen from Table 3, the half total error rate of cross-database detection is also lower than that of current methods. Compared with texture analysis methods such as LBP, HoG, SIFT and SURF, the extracted illumination invariant features reduce the influence of scene and illumination changes while retaining rich texture information that reflects the reflection coefficients, so spoofing traces can be detected effectively. The training set is composed of labeled and unlabeled samples extracted from a small number of frames of each training video; data augmentation and pseudo-labeled unlabeled data enrich the diversity of the training data, and the progressive meta-learning training of the model improves its ability to learn from limited labeled samples and to generalize. The experimental results demonstrate that, even when labeled training samples are insufficient, the method maintains high intra-database accuracy, greatly reduces the cross-database error rate, and significantly improves generalization performance.
Example 2
As shown in fig. 6, this embodiment provides a face spoofing detection system based on meta-pseudo labels and illumination invariant features, including: the system comprises a data preprocessing module, a student model and teacher model building module, a teacher learning module, a student meta-learning module, a teacher updating module, an attention module, an illumination invariant feature extraction network building module, an illumination invariant feature learning module, a verification module and a testing module;
in this embodiment, the data preprocessing module is configured to extract a face region image to obtain an RGB color channel map, randomly cut out image blocks from the RGB color channel map to be trained, divide the image blocks into labeled samples and unlabeled samples, and perform random data enhancement on the unlabeled samples to obtain enhanced unlabeled samples, which are used as RGB branch training samples; performing illumination separation pretreatment on an RGB color channel image to be trained to obtain a PLGF image, and performing data enhancement to be used as a sample for PLGF branch training;
in this embodiment, the student model and teacher model building module is configured to build a student model and a teacher model, input an RGB image, and output a classification score;
as shown in fig. 7, the teacher learning module is configured to receive the labeled sample, the unlabeled sample and the enhanced unlabeled sample, and to produce the semi-supervised loss of the teacher, the pseudo labels of the unlabeled data, and the enhanced unlabeled loss;
as shown in fig. 8, the student meta-learning module is configured to receive the labeled sample, the enhanced unlabeled sample and the pseudo labels of the unlabeled samples, update the student model parameters, and produce the meta-learning loss of the student;
in this embodiment, the teacher updating module is configured to update parameters of the teacher model by using the teacher semi-supervised loss, the enhanced unlabeled loss, and the student meta-learning loss, iteratively update network parameters of the student model and the teacher model by using an optimizer according to a loss function, and store parameters of the teacher model and the student model after training is completed;
in this embodiment, the attention module is used to construct a NeXtVLAD-based attention module;
as shown in fig. 9, the illumination invariant feature extraction network construction module is used to build a feature extraction backbone network, embed an attention module, and build an illumination invariant feature extraction network;
in this embodiment, the illumination invariant feature learning module is configured to input the data-enhanced PLGF map to an illumination invariant feature extraction network to obtain a feature vector and a classification vector, send the feature vector and a real label to a triplet loss function to obtain a triplet loss, obtain a cross entropy loss by the classification vector and the real label through a cross entropy function, update an illumination invariant feature extraction network parameter with an optimizer according to the triplet loss and the cross entropy loss, and store the parameter of the illumination invariant feature extraction network after training;
in this embodiment, the verification module is configured to send a verification set face RGB color channel map and a PLGF map obtained through illumination separation preprocessing to a trained student model and an illumination invariant feature extraction network, respectively obtain RGB classification scores and PLGF classification scores, then perform weighted summation on the RGB classification scores and the PLGF classification scores to obtain a total classification score, obtain predicted label values according to different decision thresholds, compare the predicted label values with the real labels, calculate the false acceptance rate and false rejection rate, and take the threshold value at which the two are equal as the test decision threshold T;
in this embodiment, the test module is configured to send the RGB color channel maps of the test set face and the PLGF maps obtained through illumination separation preprocessing to the trained student model and the illumination invariant feature extraction network, respectively obtain RGB classification scores and PLGF classification scores, then perform weighted summation on the RGB classification scores and the PLGF classification scores to obtain a total classification score, obtain a final predicted label value according to the test decision threshold T, and calculate a benchmark index.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A face spoofing detection method based on a meta-pseudo label and an illumination invariant feature is characterized by comprising the following steps:
picking up a face region image from an input image to obtain an RGB color channel image;
randomly cutting image blocks of an RGB color channel diagram to be trained, dividing the image blocks into labeled samples, unlabeled samples and enhanced unlabeled samples, and taking the labeled samples, the unlabeled samples and the enhanced unlabeled samples as RGB branch training samples;
performing illumination separation pretreatment on an RGB color channel image to be trained to obtain a PLGF image, and performing data enhancement to be used as a sample for PLGF branch training;
building a student model and a teacher model;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss;
constructing a student meta-learning module, and sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples into the student meta-learning module to update student model parameters to obtain the meta-learning loss of students;
establishing a teacher updating module, and updating teacher model parameters by utilizing the semi-supervised loss of the teacher, the enhanced non-label loss and the student meta-learning loss;
iteratively updating network parameters of the student model and the teacher model by using an optimizer according to the loss function, and storing parameters of the teacher model and the student model after training is completed;
constructing an attention module based on the NeXtVLAD;
constructing a feature extraction backbone network, embedding an attention module, and constructing an illumination invariant feature extraction network;
inputting the data-enhanced PLGF image into an illumination invariant feature extraction network to obtain a feature vector and a classification vector, sending the feature vector and a real label into a triple loss function to obtain triple loss, obtaining cross entropy loss by the classification vector and the real label through a cross entropy function, updating an illumination invariant feature extraction network parameter by an optimizer according to the triple loss and the cross entropy loss, and storing the parameter of the illumination invariant feature extraction network after training;
sending the RGB color channel map of the face of the verification set into a student model to obtain RGB classification scores, meanwhile obtaining a PLGF map through illumination separation preprocessing, sending the PLGF map into an illumination invariant feature extraction network to obtain PLGF classification scores, weighting and summing the RGB classification scores and the PLGF classification scores to obtain total classification scores, obtaining predicted label values according to different decision thresholds, comparing the predicted label values with the real labels, calculating the false acceptance rate and false rejection rate, and taking the threshold at which the two are equal as the test decision threshold T;
sending the RGB color channel map of the face of the test set into a trained student model to obtain RGB classification scores, meanwhile obtaining a PLGF map through illumination separation preprocessing and sending the PLGF map into a trained illumination invariant feature extraction network to obtain PLGF classification scores, carrying out weighted summation on the RGB classification scores and the PLGF classification scores to obtain a total classification score, obtaining a final predicted label value according to the test decision threshold T, and calculating the benchmark indexes.
2. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features according to claim 1, wherein the method comprises the following specific steps of performing illumination separation preprocessing on an RGB color channel image to be trained to obtain a PLGF image, performing data enhancement to serve as a sample for PLGF branch training:
performing PLGF convolution on the face maps of the three color channels with the PLGF operator in the horizontal direction and the vertical direction respectively, to obtain a horizontal gradient G_hor and a vertical gradient G_ver;

performing illumination separation on the horizontal and vertical gradients according to the Lambert model, to obtain a horizontal illumination-separated gradient ISG_hor and a vertical illumination-separated gradient ISG_ver;

and carrying out an activation operation on the illumination-separated gradients in the horizontal and vertical directions to obtain the composite gradient ISG, forming the PLGF image.
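A minimal sketch of this preprocessing for one float-valued color channel follows. The 3×3 gravitational-force-style masks and the arctan ratio activation used here are assumptions chosen to illustrate the Lambert-model separation idea; the patented operator may differ.

```python
import numpy as np
from scipy.signal import convolve2d

def plgf_channel(channel, eps=1e-6):
    """Sketch of PLGF illumination separation for one color channel.
    The kernels and the arctan activation are illustrative assumptions."""
    k_hor = np.array([[-0.25, 0.0, 0.25],
                      [-1.0,  0.0, 1.0 ],
                      [-0.25, 0.0, 0.25]])
    k_ver = k_hor.T
    g_hor = convolve2d(channel, k_hor, mode="same", boundary="symm")
    g_ver = convolve2d(channel, k_ver, mode="same", boundary="symm")
    # Lambert model I = R*L: dividing the gradient by the intensity
    # cancels the slowly varying illumination L, keeping reflectance detail.
    isg_hor = np.arctan(g_hor / (channel + eps))
    isg_ver = np.arctan(g_ver / (channel + eps))
    # Combine the two directions into the final PLGF response.
    return np.sqrt(isg_hor ** 2 + isg_ver ** 2)
```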
3. The method for detecting face spoofing based on the meta-pseudo label and the illumination invariant feature of claim 1, wherein the specific steps of constructing the student model and the teacher model comprise: building a teacher model and a student model with the same network structure from ResNet-based residual blocks (ResBlock), comprising a convolution layer, 3 ResBlocks, a batch normalization layer, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
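A minimal PyTorch sketch of such a shared student/teacher structure follows; the channel widths and strides are assumptions, since the claim fixes only the layer types.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic ResNet-style residual block; widths and strides are assumptions."""
    def __init__(self, cin, cout, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1, bias=False),
            nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, 1, 1, bias=False),
            nn.BatchNorm2d(cout))
        self.skip = nn.Sequential(
            nn.Conv2d(cin, cout, 1, stride, bias=False),
            nn.BatchNorm2d(cout))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class StudentTeacherNet(nn.Module):
    """Sketch of the shared student/teacher structure of claim 3: a stem
    convolution, 3 ResBlocks, batch normalization, global average pooling
    and a 2-way fully connected classifier."""
    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv2d(3, widths[0], 3, 1, 1, bias=False)
        self.blocks = nn.Sequential(
            ResBlock(widths[0], widths[1]),
            ResBlock(widths[1], widths[2]),
            ResBlock(widths[2], widths[3]))
        self.bn = nn.BatchNorm2d(widths[3])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(widths[3], 2)     # outputs the classification vector

    def forward(self, x):
        h = self.bn(self.blocks(self.stem(x)))
        return self.fc(self.pool(h).flatten(1))
```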
4. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the steps of obtaining teacher semi-supervised loss, pseudo labels of unlabeled data and enhanced unlabeled loss comprise:
feeding the labeled samples, the unlabeled samples and the enhanced unlabeled samples to the teacher model to obtain the labeled classification vector P_T,l, the unlabeled classification vector P_T,u and the enhanced unlabeled classification vector P_T,ua;

the labeled classification vector P_T,l first passes through a softmax function and then through a smooth cross entropy function with the real label y to obtain the labeled loss L_label; the smooth cross entropy function is expressed as:

S = Y(1-α) + α/2

L_label = -Σ S·log(softmax(P_T,l))

wherein α represents the smoothing coefficient, Y represents the One-Hot encoded real label vector of the real label y, and S represents the smoothed label vector;

the unlabeled classification vector P_T,u is divided by a control coefficient T and passed through the softmax function to obtain the soft label vector Z; the maximum value z_max is taken from the soft label vector Z, the category at which it occurs constructs the pseudo label y_p, and the confidence score M is obtained by comparing z_max with the threshold t; the enhanced unlabeled classification vector P_T,ua and the soft label vector Z, multiplied by the confidence score M, yield the unlabeled loss L_unlabel through a cross entropy function, represented by the following formulas:

z_max = max(Z)

y_p = argmax(Z)

M = 1 if z_max > t, otherwise M = 0

L_unlabel = -M·Σ Z·log(softmax(P_T,ua))

the labeled loss L_label and the unlabeled loss L_unlabel are weighted and summed to obtain the teacher semi-supervised loss L_semi, represented by the following formula:

L_semi = L_label + λ·(s/s_tl)·L_unlabel

where s is the current step number, s_tl is the total number of steps, and λ is the weight of the unlabeled loss;

the enhanced unlabeled classification vector P_T,ua passes through the softmax function, the maximum value p_max is taken, and the category at which it occurs serves as the enhanced hard label h; the enhanced unlabeled classification loss of the teacher L_T,ua is obtained through a cross entropy function, represented by the following formulas:

p_max = max(P_T,ua)

h = argmax(P_T,ua)

L_T,ua = -Σ H·log(softmax(P_T,ua))

wherein H is the One-Hot encoded enhanced hard label vector of the enhanced hard label h.
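A minimal PyTorch sketch of these teacher losses follows. The values of α, T and the confidence threshold are illustrative, and lam_w stands in for the already-ramped weight λ·(s/s_tl) of the unlabeled loss; none of these values are specified by the claim.

```python
import torch
import torch.nn.functional as F

def smooth_ce(logits, y, alpha=0.1):
    """Smoothed cross entropy with S = Y*(1 - alpha) + alpha/2 (2 classes)."""
    s = F.one_hot(y, 2).float() * (1 - alpha) + alpha / 2
    return -(s * F.log_softmax(logits, dim=1)).sum(1).mean()

def teacher_losses(p_l, y, p_u, p_ua, T=0.7, thr=0.95, lam_w=1.0):
    """Sketch of the claim 4 teacher losses; T, thr, lam_w are illustrative."""
    loss_label = smooth_ce(p_l, y)                    # labeled loss L_label
    z = F.softmax(p_u / T, dim=1)                     # soft label vector Z
    z_max, y_p = z.max(dim=1)                         # confidence and pseudo label
    m = (z_max > thr).float()                         # confidence score M
    loss_unlabel = (m * -(z * F.log_softmax(p_ua, dim=1)).sum(1)).mean()
    loss_semi = loss_label + lam_w * loss_unlabel     # semi-supervised loss L_semi
    h = p_ua.argmax(dim=1)                            # enhanced hard label h
    loss_t_ua = F.cross_entropy(p_ua, h)              # enhanced unlabeled loss L_T,ua
    return loss_semi, loss_t_ua, y_p, m
```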
5. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the obtaining of meta-learning loss of students comprises the following specific steps:
sending the labeled samples and the enhanced unlabeled samples into the student model to obtain the labeled classification score P_S,l and the enhanced unlabeled classification score P_S,ua;

the labeled classification score P_S,l first passes through a softmax function and then through a cross entropy function with the real label y to obtain the old labeled loss L_old, represented by the following formula:

L_old = -Σ Y·log(softmax(P_S,l))

wherein Y is the One-Hot encoded real label vector of the real label y;

the enhanced unlabeled classification score P_S,ua passes through the softmax function and then through a smooth cross entropy function with the pseudo label y_p to obtain the enhanced unlabeled loss L_S,ua, represented by the following formulas:

S_p = Y_p(1-α) + α/2

L_S,ua = -Σ S_p·log(softmax(P_S,ua))

wherein α represents the smoothing coefficient, Y_p is the One-Hot encoded pseudo label vector of the pseudo label y_p, and S_p represents the smoothed label vector;

the student optimizer then updates the student model network parameters according to the enhanced unlabeled loss L_S,ua;

the labeled samples are sent into the student model with updated parameters to obtain the new labeled classification score P_S,l', which first passes through the softmax function and then through the cross entropy function with the real label to obtain the new labeled loss L_new; the difference between the new labeled loss L_new and the old labeled loss L_old gives the meta-learning loss of the student, L_meta = L_new − L_old.
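A minimal sketch of this student meta-learning step, assuming a PyTorch student model and optimizer; α is an illustrative smoothing value:

```python
import torch
import torch.nn.functional as F

def student_meta_step(student, opt, x_l, y, x_ua, y_p, alpha=0.1):
    """Sketch of the claim 5 student step: update the student on
    pseudo-labeled augmented data, then measure how the labeled loss
    changed; the change is the meta-learning loss L_meta."""
    with torch.no_grad():
        loss_old = F.cross_entropy(student(x_l), y)   # old labeled loss L_old
    s_p = F.one_hot(y_p, 2).float() * (1 - alpha) + alpha / 2
    loss_s_ua = -(s_p * F.log_softmax(student(x_ua), dim=1)).sum(1).mean()
    opt.zero_grad()
    loss_s_ua.backward()                              # update on pseudo labels
    opt.step()
    with torch.no_grad():
        loss_new = F.cross_entropy(student(x_l), y)   # new labeled loss L_new
    return loss_new - loss_old                        # meta-learning loss L_meta
```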
6. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the teacher model parameters are updated by teacher semi-supervised loss, enhanced unlabeled loss and student meta-learning loss, and the specific steps comprise:
the enhanced unlabeled classification loss of the teacher L_T,ua obtained by the teacher learning module is multiplied by the meta-learning loss of the student L_meta obtained by the student meta-learning module, and the teacher semi-supervised loss L_semi obtained by the teacher learning module is added, to obtain the teacher loss L_teacher; the optimizer then updates the teacher model network parameters according to the teacher loss L_teacher, which is expressed as:

L_teacher = L_meta·L_T,ua + L_semi
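A minimal sketch of this teacher update, assuming the loss tensors come from the teacher_losses and student_meta_step sketches above:

```python
def teacher_update(teacher_opt, loss_semi, loss_t_ua, meta_loss):
    """Sketch of the claim 6 teacher update: the student's meta-learning
    loss (detached, acting as a scalar reward) scales the teacher's
    enhanced unlabeled loss, and the semi-supervised loss is added
    before backpropagating into the teacher."""
    teacher_loss = meta_loss.detach() * loss_t_ua + loss_semi
    teacher_opt.zero_grad()
    teacher_loss.backward()
    teacher_opt.step()
    return teacher_loss
```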
7. the method for detecting face spoofing based on meta-pseudo labels and illumination invariant features as claimed in claim 1, wherein the step of constructing the attention module based on the NeXtVLAD comprises the following specific steps:
taking a feature matrix I with C channels and feature dimension N as input, and expanding the dimension by a factor of λ through a fully connected layer to obtain the extended feature X_o of dimension C × λN;

the extended feature X_o passes through a fully connected layer and dimension transformation to obtain a grouping matrix G of dimension CG × 1; the extended feature X_o is multiplied by a group weight matrix W of dimension λN × GK and passed through a batch normalization layer, a softmax function and dimension transformation to obtain a group attention matrix B of dimension CG × K, and the group attention matrix B is then point-multiplied with the grouping matrix G to obtain a group coefficient matrix A of dimension CG × K;

the extended feature X_o is reshaped by dimension transformation into the grouped extended feature X_e of dimension λN/G × CG, which is multiplied by the group coefficient matrix A to obtain the grouped feature matrix X; the column vectors of the group coefficient matrix A are summed to obtain A', which is multiplied by a cluster matrix C of dimension λN/G × K to obtain C'; subtracting C' from the grouped feature matrix X yields the local clustering matrix V; finally, the local clustering matrix V passes through a batch normalization layer, is flattened, and the local clustering vector is output.
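A PyTorch sketch of this module follows, tracing the claimed tensor shapes; the gating nonlinearity, initialization and default λ/G/K values are assumptions:

```python
import torch
import torch.nn as nn

class NeXtVLADAttention(nn.Module):
    """Sketch of the NeXtVLAD-based attention module of claim 7.
    Input: a feature matrix I of shape (batch, C, N); output: the
    flattened local clustering vector of dimension (lam*N/G)*K."""
    def __init__(self, N, lam=2, G=8, K=32):
        super().__init__()
        assert (lam * N) % G == 0, "lam*N must be divisible by G"
        self.lam, self.G, self.K = lam, G, K
        self.D = lam * N // G                       # per-group feature dim
        self.expand = nn.Linear(N, lam * N)         # lam-fold dimension expansion
        self.group_gate = nn.Linear(lam * N, G)     # produces grouping matrix G
        self.attn = nn.Linear(lam * N, G * K, bias=False)  # group weight matrix W
        self.bn_attn = nn.BatchNorm1d(G * K)
        self.clusters = nn.Parameter(torch.randn(self.D, K) * 0.01)  # cluster matrix C
        self.bn_out = nn.BatchNorm1d(self.D * K)

    def forward(self, I):                           # I: (B, C, N)
        B, C, _ = I.shape
        x = self.expand(I)                          # extended feature X_o: (B, C, lam*N)
        g = torch.sigmoid(self.group_gate(x))       # per-group gates
        g = g.reshape(B, C * self.G, 1)             # grouping matrix: (B, CG, 1)
        a = self.attn(x).reshape(-1, self.G * self.K)
        a = self.bn_attn(a).reshape(B, C * self.G, self.K)
        a = torch.softmax(a, dim=-1)                # group attention matrix B
        A = a * g                                   # group coefficient matrix A: (B, CG, K)
        xe = x.reshape(B, C * self.G, self.D)       # grouped extended feature X_e
        X = torch.einsum('bnd,bnk->bdk', xe, A)     # grouped feature matrix X: (B, D, K)
        Cp = self.clusters.unsqueeze(0) * A.sum(1, keepdim=True)  # C' = C * A'
        V = X - Cp                                  # local clustering matrix V
        return self.bn_out(V.flatten(1))            # flattened local clustering vector
```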
8. The method for detecting face spoofing based on the meta-pseudo label and the illumination invariant feature according to claim 1, wherein the method comprises the following specific steps of constructing a feature extraction backbone network, embedding an attention module, and constructing an illumination invariant feature extraction network:
constructing the feature extraction backbone network from 4 convolutional layers containing skip connection layers, dropout layers and pooling layers, with the input size set to H × W × 3; the backbone output is flattened into a feature vector; the feature X is then reshaped into a feature matrix X' and sent to the attention module, where the expansion coefficient is λ, the number of groups is G and the number of clusters is K; finally, the local clustering vector output by the attention module is sent to a fully connected layer with 2 output neurons to output the classification vector.
9. The method for detecting face spoofing based on meta-pseudo labels and illumination invariant features according to claim 1, wherein the PLGF map after data enhancement is input to an illumination invariant feature extraction network to obtain feature vectors and classification vectors, the feature vectors and the real labels are sent to a triplet loss function to obtain triplet losses, the classification vectors and the real labels obtain cross entropy losses through a cross entropy function, the illumination invariant feature extraction network parameters are updated by an optimizer according to the triplet losses and the cross entropy losses, and the parameters of the illumination invariant feature extraction network are saved after training is completed, the method comprising the following specific steps:
setting the batch size of the input data-enhanced PLGF images x to n, sending them into the illumination invariant feature extraction network f, and outputting a feature vector group f(x) consisting of n feature vectors of dimension d; the triplet loss L_tri is obtained through a triplet loss function, which is expressed as:

L_tri = Σ_i max( ||f(x_a^i) − f(x_p^i)||_2 − ||f(x_a^i) − f(x_n^i)||_2 + γ, 0 )

wherein x_a^i is the anchor, x_p^i is a sample of the same class as x_a^i, x_n^i is a sample of a different class from x_a^i, ||·||_2 represents the L2 distance, f(·) represents the feature vector output by the network f, and γ is the margin coefficient;

inputting the PLGF image to the illumination invariant feature extraction network to obtain the classification vector P, which passes through a softmax function and then a cross entropy function with the real label y to obtain the cross entropy loss L_ce; the cross entropy function is expressed as:

L_ce = -Σ Y·log(softmax(P))

wherein Y is the One-Hot encoded real label vector of the real label y;

the cross entropy loss and the triplet loss are weighted and summed to obtain the PLGF loss L_PLGF, which serves as the loss function for training the illumination invariant feature extraction network; the specific calculation formula is:

L_PLGF = α·L_ce + β·L_tri

wherein α and β represent the weights;

an SGD optimizer with momentum μ is used to update the illumination invariant feature extraction network parameters with the objective of minimizing the PLGF loss L_PLGF.
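A minimal PyTorch sketch of this PLGF loss follows. The claim does not specify how triplets are mined, so the batch-hard mining used here is an assumption, as are the values of the margin γ and the weights a and b:

```python
import torch
import torch.nn.functional as F

def plgf_losses(feats, logits, y, gamma=0.3, a=1.0, b=1.0):
    """Sketch of the claim 9 PLGF loss: batch-hard triplet loss on the
    feature vectors plus cross entropy on the classification vectors."""
    d = torch.cdist(feats, feats)                 # pairwise L2 distances
    same = y.unsqueeze(0) == y.unsqueeze(1)       # same-class mask
    eye = torch.eye(len(y), dtype=torch.bool, device=y.device)
    pos = d.masked_fill(~same | eye, float('-inf')).max(1).values  # hardest positive
    neg = d.masked_fill(same, float('inf')).min(1).values          # hardest negative
    l_tri = F.relu(pos - neg + gamma).mean()      # triplet loss with margin gamma
    l_ce = F.cross_entropy(logits, y)             # cross entropy loss
    return a * l_ce + b * l_tri                   # weighted PLGF loss
```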
10. A face spoofing detection system based on meta-pseudo labels and illumination invariant features, comprising: the system comprises a data preprocessing module, a student model and teacher model building module, a teacher learning module, a student meta-learning module, a teacher updating module, an attention module, an illumination invariant feature extraction network building module, an illumination invariant feature learning module, a verification module and a testing module;
the data preprocessing module is used for picking up a face region image to obtain an RGB color channel map, randomly cutting an image block of the RGB color channel map to be trained into a labeled sample, an unlabeled sample and an enhanced unlabeled sample which are used as RGB branch training samples; carrying out illumination separation pretreatment on an RGB color channel image to be trained to obtain a PLGF image, and carrying out data enhancement to be used as a sample for PLGF branch training;
the student model and teacher model building module is used for building a student model and a teacher model, inputting RGB images and outputting classification scores;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss;
the student meta-learning module is used for sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples into the student meta-learning module to update student model parameters so as to obtain the meta-learning loss of students;
the teacher updating module is used for updating parameters of a teacher model by utilizing teacher semi-supervised loss, enhanced label-free loss and student meta-learning loss, iteratively updating network parameters of the student model and the teacher model by using an optimizer according to a loss function, and storing the parameters of the teacher model and the student model after training is finished;
the attention module is used for constructing an attention module based on the NeXtVLAD;
the illumination invariant feature extraction network construction module is used for constructing a feature extraction backbone network, embedding an attention module and constructing an illumination invariant feature extraction network;
the illumination invariant feature learning module is used for inputting the data-enhanced PLGF image into an illumination invariant feature extraction network to obtain a feature vector and a classification vector, the feature vector and a real label are sent into a triple loss function to obtain triple loss, the classification vector and the real label obtain cross entropy loss through the cross entropy function, the triple loss and the cross entropy loss are used for supervision, an optimizer is used for updating network parameters, and the parameters of the illumination invariant feature extraction network are stored after training is completed;
the verification module is used for respectively sending the RGB color channel map of the face of the verification set and the PLGF map obtained by illumination separation pretreatment into a trained student model and an illumination invariant feature extraction network, respectively obtaining RGB classification scores and PLGF classification scores, weighting and summing the RGB classification scores and the PLGF classification scores to obtain total classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and false omission rate, and taking the threshold value when the two are equal as a test judgment threshold value T;
the testing module is used for respectively sending the RGB color channel map of the face of the testing set and the PLGF map obtained through illumination separation pretreatment into a trained student model and an illumination invariant feature extraction network to respectively obtain RGB classification scores and PLGF classification scores, then weighting and summing the RGB classification scores and the PLGF classification scores to obtain total classification scores, obtaining a final predicted label value according to a testing judgment threshold T, and calculating a benchmark index.
CN202111185654.2A 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature Pending CN114067444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111185654.2A CN114067444A (en) 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111185654.2A CN114067444A (en) 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Publications (1)

Publication Number Publication Date
CN114067444A true CN114067444A (en) 2022-02-18

Family

ID=80234484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111185654.2A Pending CN114067444A (en) 2021-10-12 2021-10-12 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Country Status (1)

Country Link
CN (1) CN114067444A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663986A (en) * 2022-03-31 2022-06-24 华南理工大学 In-vivo detection method and system based on double-decoupling generation and semi-supervised learning
CN114676931A (en) * 2022-04-12 2022-06-28 国网江苏省电力有限公司泰州供电分公司 Electric quantity prediction system based on data relay technology
CN114724220A (en) * 2022-04-12 2022-07-08 广州广电卓识智能科技有限公司 Living body detection method, living body detection device, and readable medium
CN114676931B (en) * 2022-04-12 2024-02-20 国网江苏省电力有限公司泰州供电分公司 Electric quantity prediction system based on data center technology
WO2023202596A1 (en) * 2022-04-19 2023-10-26 华为技术有限公司 Semi-supervised model training method and system, and related device
CN115273186A (en) * 2022-07-18 2022-11-01 中国人民警察大学 Depth-forged face video detection method and system based on image feature fusion
WO2024016949A1 (en) * 2022-07-20 2024-01-25 马上消费金融股份有限公司 Label generation method and apparatus, image classification model method and apparatus, and image classification method and apparatus
CN116563642A (en) * 2023-05-30 2023-08-08 智慧眼科技股份有限公司 Image classification model credible training and image classification method, device and equipment
CN116563642B (en) * 2023-05-30 2024-02-27 智慧眼科技股份有限公司 Image classification model credible training and image classification method, device and equipment

Similar Documents

Publication Publication Date Title
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN111259850B (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN107153817B (en) Pedestrian re-identification data labeling method and device
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN111931758B (en) Face recognition method and device combining facial veins
CN114783003B (en) Pedestrian re-identification method and device based on local feature attention
CN112580576B (en) Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN109800629A (en) A kind of Remote Sensing Target detection method based on convolutional neural networks
CN110853074B (en) Video target detection network system for enhancing targets by utilizing optical flow
CN113221655B (en) Face spoofing detection method based on feature space constraint
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN114663986B (en) Living body detection method and system based on double decoupling generation and semi-supervised learning
CN112308883A (en) Multi-ship fusion tracking method based on visible light and infrared images
CN112668557A (en) Method for defending image noise attack in pedestrian re-identification system
CN112364791A (en) Pedestrian re-identification method and system based on generation of confrontation network
CN111507416B (en) Smoking behavior real-time detection method based on deep learning
CN117152625A (en) Remote sensing small target identification method, system, equipment and medium based on CoordConv and Yolov5
CN114550268A (en) Depth-forged video detection method utilizing space-time characteristics
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN111553337A (en) Hyperspectral multi-target detection method based on improved anchor frame
CN113468954B (en) Face counterfeiting detection method based on local area features under multiple channels
CN107679467B (en) Pedestrian re-identification algorithm implementation method based on HSV and SDALF

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination