CN114663986A - In-vivo detection method and system based on double-decoupling generation and semi-supervised learning - Google Patents
- Publication number
- CN114663986A (application number CN202210329816.3A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
Abstract
The invention discloses a liveness detection method and system based on double-decoupling generation and semi-supervised learning. The method comprises the following steps: preprocess the data to obtain original RGB color channel image samples, and pair images of the same identity to form true-false image pairs; the real-person encoder outputs a real-person identity vector, and the prosthesis encoder outputs a prosthesis identity vector and a prosthesis mode vector; the three vectors are merged and sent to the decoder to obtain a reconstructed true-false image pair, a double-decoupling generation loss function is constructed, and sampled noise is fed into the trained decoder to obtain generated samples. Labeled samples, unlabeled samples, and enhanced unlabeled samples are constructed from the original and generated samples and sent to the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels for the unlabeled samples, and the enhanced unlabeled loss, updating the network parameters of the detector and the teacher. The decision threshold is determined on the verification set; test data are then loaded into the detector to obtain classification scores, and the classification result is decided against the threshold. The invention improves the robustness of liveness detection models.
Description
Technical Field
The invention relates to the technical field of face recognition anti-spoofing detection, and in particular to a liveness detection method and system based on double-decoupling generation and semi-supervised learning.
Background
Today, business and industrial use of facial biometric technology is growing rapidly: face unlocking protects personal privacy on electronic devices, and facial recognition authenticates payments. However, using the face as a biometric for authentication is not inherently secure, because facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks generally fall into four categories: 1) photo attacks, where an attacker deceives the authentication system with a printed photo or one displayed on a screen; 2) video replay attacks, where an attacker uses a pre-recorded video of the victim; 3) face mask attacks, where an attacker wears a mask carefully crafted to resemble the victim; 4) adversarial sample attacks, where an attacker generates specific adversarial noise, for example with a GAN, to steer the face authentication system toward a wrong identity decision. These attacks are not only low-cost but also effective at fooling the system, severely impacting and threatening the deployment of face recognition systems.
Liveness detection plays a crucial role in protecting face recognition systems from prosthesis (presentation) attacks. Benefiting from the strong feature extraction capability of deep networks, deep-learning-based liveness detection outperforms algorithms based on traditional hand-crafted features. However, although most deep-learning-based methods achieve good detection within a dataset, their cross-dataset performance is poor. The main reason is that data inside and outside a dataset are collected under different conditions (different capture devices, ambient lighting, and attack presentation devices), so the two follow different distributions and a domain shift exists between them. When the training data lack diversity, intra-dataset learning easily overfits and cross-dataset generalization suffers. Even though the cause can be identified, the problem is hard to solve in real-world deployment: a liveness detection model can hardly collect labeled training samples for every scenario, so most existing anti-spoofing datasets lack diversity. For example, the commonly used CASIA, Replay-Attack, and MSU datasets employ only 3, 2, and 2 capture devices and 3, 2, and 1 shooting backgrounds, respectively.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a liveness detection method based on double-decoupling generation and semi-supervised learning: live/prosthesis characteristics are modeled through decoupled learning, generated samples are synthesized in the latent space to expand the dataset and improve the diversity of the training data, while the generated samples remain highly discriminative and the influence of generation noise on model learning is reduced.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a living body detection method based on double decoupling generation and semi-supervised learning, which comprises the following steps:
cropping a face region image from the input image to obtain an RGB color channel image;
pairing each real image of the original RGB color channel samples to be trained with a fake image of the same identity to form a true-false image pair, and labeling the fake image with an attack category label;
constructing a real person encoder, a prosthesis encoder and a decoder;
sending the real image into the real-person encoder to obtain a real-person identity vector, and the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector; merging the three vectors and sending them into the decoder to output a reconstructed true-false image pair; and constructing a double-decoupling generation loss function for optimization;
inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
cropping the training set composed of the original samples and the generated samples, and constructing labeled samples, unlabeled samples, and enhanced unlabeled samples;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
updating teacher network parameters by utilizing teacher semi-supervision loss, teacher enhanced label-free loss and detector updating loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the verification-set face RGB color channel images into the trained detector to obtain classification scores; obtaining predicted label values under different decision thresholds, comparing them with the ground-truth labels, and calculating the false alarm rate and the miss rate; taking the threshold at which the two rates are equal as the test decision threshold;
and sending the test-set face RGB color channel images into the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the benchmark metrics.
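The threshold selection in the verification step (take the threshold where the false alarm rate equals the miss rate, i.e. the equal-error-rate point) can be sketched as a scan over candidate thresholds. The following is a minimal illustration, not the patent's implementation, assuming scores in [0, 1] where higher means "live" (function name hypothetical):

```python
def eer_threshold(scores, labels, num_steps=1000):
    """Return the threshold where false-alarm and miss rates are closest.

    scores: detector classification scores, higher = more likely live.
    labels: 1 for live samples, 0 for prosthesis (spoof) samples.
    """
    n_live = sum(1 for y in labels if y == 1)
    n_spoof = len(labels) - n_live
    best_t, best_gap = 0.0, float("inf")
    for i in range(num_steps + 1):
        t = i / num_steps
        # false alarm: spoof accepted as live; miss: live rejected as spoof
        fa_rate = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_spoof
        miss_rate = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_live
        gap = abs(fa_rate - miss_rate)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```

On well-separated scores the scan returns any threshold between the two classes, since both error rates are zero there.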
As a preferred technical solution, constructing the real-person encoder, the prosthesis encoder, and the decoder specifically includes the steps of:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure: a convolutional layer, an instance normalization layer, a LeakyReLU, and five units, each consisting of a pooling layer followed by a residual block built from stacked, skip-connected convolutional layers;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer, and residual blocks: the hidden-layer real-person and prosthesis vectors output by the two encoders are first input to a fully connected layer, then pass through six units, each consisting of a residual block of stacked, skip-connected convolutional layers followed by an upsampling layer, and finally pass through a convolutional layer that outputs the image pair.
As a preferred technical scheme, sending the real image into the real-person encoder to obtain the real-person identity vector, and the paired fake image into the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis mode vector, specifically includes the steps of:
the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_r$ and the real-person identity vector variance $\sigma_r^2$;
the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s$, the prosthesis identity vector variance $\sigma_s^2$, the prosthesis mode vector mean $\mu_m$, and the prosthesis mode vector variance $\sigma_m^2$;
a real-person identity variation component $\epsilon_r$, a prosthesis identity variation component $\epsilon_s$, and a prosthesis mode variation component $\epsilon_m$ are sampled from the standard normal distribution $\mathcal{N}(0, I)$;
the reparameterization operation then yields the real-person identity vector $z_r$, the prosthesis identity vector $z_s$, and the prosthesis mode vector $z_m$:
$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \quad z_s = \mu_s + \sigma_s \odot \epsilon_s, \quad z_m = \mu_m + \sigma_m \odot \epsilon_m;$$
$z_r$, $z_s$, and $z_m$ are merged into one hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.
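The reparameterization step above can be written in a few lines of plain Python (a minimal per-vector sketch; in the patent, the mean and standard deviation are predicted by the encoders and the resulting vectors feed the decoder):

```python
import random

def reparameterize(mu, sigma, eps=None):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    keeping the sampling step differentiable w.r.t. mu and sigma.

    mu, sigma: per-dimension mean and standard deviation.
    eps: optional pre-drawn noise (for reproducibility); drawn if None.
    """
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]
```

In a real model the same operation is applied to each of the three latent vectors before they are concatenated for the decoder.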
As a preferred technical scheme, merging the real-person identity vector, the prosthesis identity vector, and the prosthesis mode vector, sending them to the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization specifically includes the steps of:
the prosthesis mode vector $z_m$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis category vector $\hat{y}_s$, and its cross entropy with the actual prosthesis category label $y_s$ gives the classification loss
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)},$$
where $k$ is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis category label vector;
an angular orthogonality constraint is added between the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$ to obtain the orthogonality loss, e.g.
$$\mathcal{L}_{ort} = \left\langle \frac{z_s}{\|z_s\|}, \frac{z_m}{\|z_m\|} \right\rangle^{2},$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product;
reconstruction losses $\mathcal{L}_{rec}$ are calculated between the reconstructed real-person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$, and the corresponding original images, e.g. $\mathcal{L}_{rec} = \|\hat{x}_r - x_r\|_1 + \|\hat{x}_s - x_s\|_1$;
the maximum mean discrepancy (MMD) loss $\mathcal{L}_{mmd}$ is calculated between the real-person identity vector $z_r$ and the prosthesis identity vector $z_s$ to align their distributions;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$;
the latent distributions are further constrained by the KL divergence $\mathcal{L}_{kl}$ to the standard normal prior, and the double-decoupling generation loss combines the terms in a weighted sum, e.g.
$$\mathcal{L}_{gen} = \mathcal{L}_{rec} + \mathcal{L}_{kl} + \lambda_1 \mathcal{L}_{cls} + \lambda_2 \mathcal{L}_{ort} + \lambda_3 \mathcal{L}_{mmd} + \lambda_4 \mathcal{L}_{pair},$$
where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ represent the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder, and the decoder with minimizing the double-decoupling generation loss as the objective.
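One concrete reading of the angular orthogonality constraint (the patent states only that the inner product is used) is the squared cosine similarity, which vanishes exactly when the prosthesis identity and mode vectors are perpendicular. A small sketch under that assumption:

```python
import math

def orthogonality_loss(z_id, z_mode):
    """Squared cosine similarity between the prosthesis identity vector and
    the prosthesis mode vector; minimizing it drives the two latent vectors
    toward orthogonality, decoupling identity from attack mode."""
    dot = sum(a * b for a, b in zip(z_id, z_mode))
    norm_id = math.sqrt(sum(a * a for a in z_id))
    norm_mode = math.sqrt(sum(b * b for b in z_mode))
    cos = dot / (norm_id * norm_mode)
    return cos * cos
```

Squaring makes the penalty symmetric in sign, so anti-parallel vectors are penalized as strongly as parallel ones.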
As a preferred technical solution, inputting standard normal distribution sampling noise into the trained decoder to obtain generated samples specifically includes the steps of:
let the hidden vector dimension be $n$; $n$-dimensional random noise is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the generated prosthesis identity hidden vector $\tilde{z}_s$ and the generated prosthesis mode hidden vector $\tilde{z}_m$;
the generated real-person identity hidden vector $\tilde{z}_r$ is taken directly from the generated prosthesis identity hidden vector $\tilde{z}_s$, so the pair shares one identity, and the merged vector is passed through the decoder to generate a true-false image pair as the generated sample.
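The generation step above (two noise draws, with the prosthesis identity latent reused as the real-person identity latent so the generated pair shares one identity) can be sketched as follows; the function name and layout are illustrative:

```python
import random

def sample_generation_latent(n, rng=random):
    """Draw two n-dimensional standard-normal vectors: one serves as both
    the real-person and the prosthesis identity latent, the other as the
    prosthesis mode latent; their concatenation is the decoder input."""
    z_id = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_mode = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # [real-person identity | prosthesis identity | prosthesis mode]
    return z_id + z_id + z_mode
```

Reusing the identity draw is what keeps each generated true-false pair identity-consistent, mirroring the paired real data.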
The detector and the teacher network have the same network structure: a convolutional layer, a batch normalization layer, a ReLU, three units consisting of residual blocks built from stacked, skip-connected convolutional layers, a global average pooling layer, and a fully connected layer that outputs the classification vector.
As a preferred technical scheme, sending the labeled samples, unlabeled samples, and enhanced unlabeled samples to the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data, and the teacher enhanced unlabeled loss specifically includes the steps of:
the labeled sample $\{x_l, y_l\}$ is input to the teacher network $T$ with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l; \theta_T)$; its cross entropy with the ground-truth label $y_l$ gives the teacher labeled loss
$$\mathcal{L}^{T}_{l} = CE\big(y_l, T(x_l; \theta_T)\big),$$
where $CE$ denotes the cross-entropy loss;
the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are each input to the teacher network $T$ to obtain the teacher unlabeled prediction $T(x_u; \theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$; the cross entropy between the two gives the teacher unlabeled loss
$$\mathcal{L}^{T}_{u} = CE\big(T(x_u; \theta_T), T(\hat{x}_u; \theta_T)\big);$$
the class of the maximum value in the teacher unlabeled prediction $T(x_u; \theta_T)$ is taken as the pseudo label
$$\hat{y}_u = \arg\max T(x_u; \theta_T);$$
the teacher labeled loss and the teacher unlabeled loss are combined in a weighted sum to give the teacher semi-supervised loss
$$\mathcal{L}^{T}_{semi} = \mathcal{L}^{T}_{l} + \lambda \frac{s}{s_{tl}} \mathcal{L}^{T}_{u},$$
where $s$ is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the unlabeled-loss weight;
the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$ and the class $h$ of its own maximum value are passed through the cross-entropy function to give the enhanced unlabeled loss
$$\mathcal{L}^{T}_{enh} = CE\big(h, T(\hat{x}_u; \theta_T)\big).$$
as a preferred technical solution, the method for updating the detector parameter by sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample to the detector learning module to obtain the detector update loss includes the following specific steps:
the reinforced label-free sampleThe feeding parameter is thetaDDetector D to obtain detector enhanced label-free prediction resultsAnd a pseudo tagCalculating cross entropy to obtain detector enhanced label-free lossThe concrete expression is as follows:
loss of label for detectorOptimizing the detector by using a gradient descent method to obtain an optimized parameter theta'DIs specifically represented as:
labeled sample (x)l,yl) Respectively fed into the optimization parameters of thetaDIs theta 'and the optimized parameter is'DObtaining an old detector with a loss of labelAnd loss of label to the new detectorThen the two are differenced to obtain the update loss of the detectorThe concrete expression is as follows:
as a preferred technical scheme, the method for updating teacher network parameters by using teacher semi-supervised loss, teacher enhanced label-free loss and detector update loss comprises the following specific steps:
loss of detector updatesEnhancing label-free loss with teachersMultiply and then partially monitor the loss with the teacherAdd up to form teacher's lossThe teacher network is optimized by a gradient descent method, and the method is represented as follows:
wherein ,θTRepresenting a parameter, theta ', before optimization of the teacher network'TRepresenting the teacher's network optimized parameter, ηTThe network learning rate of the teacher is shown,representing the gradient calculation.
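The scalar structure of the teacher-detector feedback loop above can be sketched in plain Python; the function names and the toy one-parameter detector are illustrative, not the patent's implementation:

```python
def teacher_semi_loss(l_labeled, l_unlabeled, step, total_steps, lam):
    """Teacher semi-supervised loss: L_l + lam * (s / s_tl) * L_u,
    so the unlabeled term ramps up linearly over training."""
    return l_labeled + lam * (step / total_steps) * l_unlabeled

def pseudo_label(prediction):
    """Pseudo label = class index of the teacher's maximum unlabeled prediction."""
    return max(range(len(prediction)), key=lambda i: prediction[i])

def detector_update_loss(theta, grad_enh, labeled_loss_fn, lr=0.1):
    """L_upd = L_l(theta) - L_l(theta'): the labeled-loss improvement after
    one gradient step on the pseudo-label loss. A positive value means the
    teacher's pseudo labels helped; it is fed back to the teacher update."""
    theta_new = theta - lr * grad_enh  # theta'_D = theta_D - eta_D * grad
    return labeled_loss_fn(theta) - labeled_loss_fn(theta_new)
```

For example, with a quadratic labeled loss (theta - 1)^2, a pseudo-label gradient of -2 moves a scalar detector from 0 to 0.2, and the update loss is a positive 0.36, signalling useful pseudo labels.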
The invention also provides a living body detection system based on double decoupling generation and semi-supervised learning, which comprises: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
the data preprocessing module is used for cropping face region images from input images to obtain RGB color channel images, pairing each real image of the original RGB samples to be trained with a fake image of the same identity to form a true-false image pair, and labeling the fake image with an attack category label;
the double decoupling generator building module is used for building a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double decoupling generator training module is used for sending the real image into the real encoder to obtain a real person identity vector, sending the matched false image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, combining the real person identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the combined real person identity vector, the prosthesis identity vector and the prosthesis mode vector into the decoder to output a reconstructed real and false image pair, and constructing a double decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the unsupervised data enhancement module is used for cutting a training set formed by an original sample and a generated sample, and constructing a labeled sample, a non-labeled sample and an enhanced non-labeled sample;
the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
the detector learning module is used for sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to the detector learning module to update the detector parameters to obtain the update loss of the detector;
the network parameter updating module is used for updating teacher network parameters by utilizing teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss, iteratively updating parameters of the detector and the teacher network by using an optimizer according to a loss function, and storing the parameters of the teacher network and the detector after training is finished;
the verification module is used for sending the verification-set face RGB color channel images into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the ground-truth labels, calculating the false alarm rate and the miss rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the test module is used for sending the RGB color channel map of the face of the test set to a trained detector to obtain a classification score, obtaining a final predicted label value according to a test judgment threshold value, and calculating a reference index.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the data generation stage, the invention builds a double-decoupling generator from a real-person encoder, a prosthesis encoder, and a decoder, trains it on true-false image pairs of original samples, and then inputs standard normal distribution sampling noise into the trained decoder to obtain generated samples. The generated samples serve as part of the unlabeled data, enriching the diversity of the training data and alleviating the problem of insufficient training-data diversity.
(2) In the training stage, a semi-supervised learning framework is adopted in which the teacher generates pseudo labels and the detector provides feedback. Specifically, the teacher network supplies the detector with pseudo labels for the unlabeled data to supervise its learning; after the detector parameters are updated, performance is evaluated on the labeled data, and the resulting loss is fed back to the teacher network to optimize the generated pseudo labels. This addresses both model training under limited labeled data and the label uncertainty caused by blurry generated samples, a known shortcoming of variational autoencoders. Highly discriminative features of image patches are also mined, improving the learning capability of the network and thus its generalization to unseen sample acquisition environments.
(3) In the detection stage, the model loads the test data into the detector to obtain the corresponding classification scores, and the classification result is decided against the threshold.
Drawings
FIG. 1 is a schematic flow chart of the in-vivo detection method based on double-decoupling generation and semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a network structure of a real human encoder and a prosthesis encoder according to the present invention;
FIG. 3 is a schematic diagram of a network structure of a decoder according to the present invention;
FIG. 4 is a schematic diagram of the training phase of the dual decoupling generator of the present invention;
FIG. 5 is a schematic flow chart of the generation phase of the dual decoupling generator of the present invention;
FIG. 6(a) is a schematic diagram of a live image generated by a dual decoupling generator of the present invention;
FIG. 6(b) is a schematic diagram of an image of a prosthesis generated by a dual decoupling generator of the present invention;
FIG. 7 is a schematic diagram of a network architecture of a detector and teacher network according to the present invention;
FIG. 8 is an overall framework diagram of semi-supervised learning of the present invention;
FIG. 9 is a block diagram of the living body detection system based on the dual decoupling generation and the semi-supervised learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
This embodiment is described in detail using the Replay-Attack, CASIA-MFSD, and MSU-MFSD liveness detection datasets for training and testing. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects and the spoof faces generated from them were captured with a MacBook camera at a resolution of 320 × 240 pixels, and the videos are divided into training, verification, and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects and the spoof faces generated from them were captured by three cameras with resolutions of 640 × 480, 480 × 640, and 1920 × 1080 pixels, and the videos are divided into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, with real faces from 35 subjects and the spoof faces generated from them; 15 subjects are used for the training set and 20 for the test set. Since the CASIA-MFSD and MSU-MFSD datasets contain no verification set, this embodiment performs threshold determination using the corresponding test set as the verification set for these two datasets. The dataset videos are then split into frames to obtain images. The embodiment runs on a Linux system and is implemented mainly on the deep learning framework PyTorch 1.6.1, with GTX 1080 Ti graphics cards, CUDA version 10.1.105, and cuDNN version 7.6.4.
As shown in fig. 1, the present embodiment provides a living body detection method based on double-decoupling generation and semi-supervised learning, including the following steps:
s1: picking up a face region image from an input image to obtain an RGB color channel image;
in this embodiment, the specific steps include: and detecting a face area of the input image by using an MTCNN face recognition algorithm, cutting and unifying the size to obtain a face image, wherein the face image is in an RGB format and has three color channels of red, green and blue. Then, matching each true image of the original sample of the RGB color channel image to be trained with a false image with the same identity to form a true-false image pair, and labeling an attack category label for the false image;
s2: constructing a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
in this embodiment, as shown in fig. 2, the respective backbone networks of the real-person encoder and the prosthesis encoder have the same structure. The input size is H×W×3; after a convolutional layer, an instance normalization layer and a LeakyReLU, the feature map becomes H×W×32. Five subsequent units, each composed of residual blocks (stacked convolutional layers with skip connections) followed by a pooling layer, output feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 of the original size, with 64, 128, 256, 512 and 512 channels respectively, yielding a feature map of size (H/32)×(W/32)×512. After the backbone output is obtained, the real-person encoder outputs a hidden-layer real-person vector of size 2×hdim through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector of size 4×hdim through a fully connected layer.
As shown in fig. 3, the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks. The input size is 3×hdim; the input is first expanded by a fully connected layer into a low-resolution feature map, then passes through six units composed of residual blocks (stacked convolutional layers with skip connections) and upsampling layers, which output feature maps at 1/32, 1/16, 1/8, 1/4, 1/2 and 1 of the original size with channel numbers of 512, 256, 128 and 64, and finally a convolutional layer outputs an image pair of size H×W×6.
S3: constructing a double decoupling generation module, which consists of the double decoupling generator and a double decoupling generator loss function. The double decoupling generator consists of the real-person encoder, the prosthesis encoder and the decoder: the true image is fed into the real-person encoder to obtain a real-person identity vector, the matched false image is fed into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, the three are combined and fed into the decoder to output a reconstructed true-false image pair, and a double decoupling generation loss function is constructed for optimization;
as shown in fig. 4, a true image and a false image, each of size H×W×3, are input to the real-person encoder and the prosthesis encoder respectively. Let the hidden-vector dimension be n. The real-person encoder produces a real-person hidden vector of dimension 2×n, comprising the real-person identity vector mean μ_l and variance σ_l², each of dimension n. The prosthesis encoder produces a prosthesis hidden vector of dimension 4×n, comprising the prosthesis identity vector mean μ_{s,id}, the prosthesis identity vector variance σ_{s,id}², the prosthesis mode vector mean μ_{s,pat} and the prosthesis mode vector variance σ_{s,pat}², each of dimension n. The real-person identity variation component ε_l, the prosthesis identity variation component ε_{s,id} and the prosthesis mode variation component ε_{s,pat} are then sampled from the standard normal distribution, and the reparameterization operation shown below yields the real-person identity vector f_l, the prosthesis identity vector f_{s,id} and the prosthesis mode vector f_{s,pat}:

f_l = μ_l + σ_l ⊙ ε_l
f_{s,id} = μ_{s,id} + σ_{s,id} ⊙ ε_{s,id}
f_{s,pat} = μ_{s,pat} + σ_{s,pat} ⊙ ε_{s,pat}
The real-person identity vector f_l, the prosthesis identity vector f_{s,id} and the prosthesis mode vector f_{s,pat} are combined into a hidden vector of dimension 3×n and input to the decoder, which outputs a reconstructed true-false image pair of dimension H×W×6, comprising a reconstructed real-person image x̂_l and a reconstructed prosthesis image x̂_s, each of dimension H×W×3.
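The reparameterization step above can be sketched as follows. This is a minimal NumPy illustration (not the patent's PyTorch implementation); the function and variable names are chosen for this sketch and the encoder outputs are random placeholders:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
n = 128  # hidden-vector dimension used in this embodiment

# Stand-ins for the real-person encoder output (2*n values) and the
# prosthesis encoder output (4*n values); random here for illustration only.
mu_real, logvar_real = rng.standard_normal(n), rng.standard_normal(n)
mu_fid, logvar_fid = rng.standard_normal(n), rng.standard_normal(n)
mu_fpat, logvar_fpat = rng.standard_normal(n), rng.standard_normal(n)

f_real = reparameterize(mu_real, logvar_real, rng)  # real-person identity vector
f_fid = reparameterize(mu_fid, logvar_fid, rng)     # prosthesis identity vector
f_fpat = reparameterize(mu_fpat, logvar_fpat, rng)  # prosthesis mode vector

# Concatenate into the 3*n hidden vector fed to the decoder.
z = np.concatenate([f_real, f_fid, f_fpat])
```

Sampling ε externally and applying the affine transform keeps the draw differentiable with respect to μ and σ, which is why the encoders can be trained by gradient descent through this step.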
To better learn the prosthesis patterns of different attack modes, the prosthesis mode vector f_{s,pat} is fed into a fully connected layer fc(·), which outputs the prosthesis category vector ŷ_s = fc(f_{s,pat}); the cross entropy with the actual prosthesis category label y_s gives the classification loss L_cls, as shown in the following formula:

L_cls = −Σ_{i=1}^{k} Y_{s,i} · log softmax(ŷ_s)_i

where k is the number of prosthesis classes and Y_s is the one-hot encoded prosthesis class label vector.
To effectively separate the prosthesis identity vector f_{s,id} and the prosthesis mode vector f_{s,pat}, an angular orthogonality constraint is added between the two to obtain the orthogonal loss L_ort, as shown in the following formula:

L_ort = ( ⟨f_{s,id}, f_{s,pat}⟩ / (‖f_{s,id}‖ · ‖f_{s,pat}‖) )²

where ⟨·,·⟩ denotes the inner product.
For the reconstructed real-person image x̂_l and the reconstructed prosthesis image x̂_s, the reconstruction loss L_rec against the corresponding original images x_l and x_s is calculated as shown in the following formula:

L_rec = ‖x_l − x̂_l‖₁ + ‖x_s − x̂_s‖₁
For keeping the identity consistency of the hidden vectors, the maximum mean discrepancy loss L_mmd between the real-person identity vector f_l and the prosthesis identity vector f_{s,id}, each of dimension n, is calculated as shown in the following formula:

L_mmd = ‖ E[φ(f_l)] − E[φ(f_{s,id})] ‖²_H

where φ(·) is the kernel feature map of the reproducing kernel Hilbert space H.
In order to maintain the identity consistency of the reconstructed images, the pairing loss L_pair between the reconstructed real-person image x̂_l and the reconstructed prosthesis image x̂_s is calculated as shown in the following formula:

L_pair = ‖x̂_l − x̂_s‖₁
To make the distribution of the hidden vectors fit a standard normal distribution, a constraint is imposed by calculating the Kullback-Leibler divergence (KL divergence), as shown in the following formula:

L_KL = −(1/2) Σ_{i=1}^{n} ( 1 + log σ_i² − μ_i² − σ_i² )

summed over the real-person identity, prosthesis identity and prosthesis mode distributions.
Finally, the above losses are weighted and summed to obtain the loss function L_gen of the double decoupling generator, as shown in the following formula:

L_gen = L_rec + L_pair + λ₁·L_cls + λ₂·L_ort + λ₃·L_mmd + λ₄·L_KL

where λ₁, λ₂, λ₃ and λ₄ are the weights, with optimal values of 10, 0.5, 0.1 and 1 respectively.
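Two of the loss terms above — the KL constraint toward a standard normal and the orthogonality constraint between identity and mode vectors — can be sketched in NumPy as follows (an illustrative sketch; the squared-cosine form of the angular constraint is an assumption of this sketch):

```python
import numpy as np

def kl_divergence(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in the standard VAE closed form."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def orthogonal_loss(a, b, eps=1e-8):
    """Squared cosine similarity: zero exactly when the two vectors are orthogonal."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return cos**2

# The KL term vanishes when the hidden distribution is already standard normal.
kl_at_prior = kl_divergence(np.zeros(8), np.zeros(8))

# The orthogonal term vanishes for perpendicular identity/mode vectors.
f_id = np.array([1.0, 0.0])
f_pat = np.array([0.0, 1.0])
ort = orthogonal_loss(f_id, f_pat)
```

Minimizing the orthogonal term pushes the prosthesis identity and mode vectors apart in angle, which is what lets the two factors decouple.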
An Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the goal of minimizing the double decoupling generator loss function L_gen, iteratively training T generations, where the optimal value of T is 200. The parameter update formulas are as follows:

m_{t+1} = β₁·m_t + (1 − β₁)·g
v_{t+1} = β₂·v_t + (1 − β₂)·g²
θ_{t+1} = θ_t − η·m̂_{t+1} / (√v̂_{t+1} + ε)

where m̂ and v̂ are the bias-corrected moment estimates, β₁ and β₂ are optimally set to 0.9 and 0.999, the parameter ε preventing division by zero is optimally set to 1e-8, the learning rate η is optimally set to 0.0002, and θ_t denotes the parameters at the t-th iteration.
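The Adam update above can be written out as a small worked example (a sketch with the embodiment's hyperparameters; minimizing a toy quadratic rather than the generator loss):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; t is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2, whose gradient is 2*theta.
theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    g = 2 * theta
    theta, m, v = adam_step(theta, g, m, v, t)
```

Because Adam normalizes by the second-moment estimate, each step has magnitude close to the learning rate, so θ drifts steadily toward the minimum at 0.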
S4: inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
as shown in fig. 5, fig. 6(a) and fig. 6(b), let the hidden-vector dimension be n. Random noise of dimension n is sampled twice from the standard normal distribution N(0, I) to obtain a reconstructed prosthesis identity hidden vector and a reconstructed prosthesis mode hidden vector. To ensure the identity consistency of the true and false images, the reconstructed real-person identity hidden vector is obtained by directly copying the reconstructed prosthesis identity hidden vector; the decoder then generates a true-false image pair as a generated sample. Fig. 6(a) and fig. 6(b) show true-false image pairs generated by the double decoupling generator. In this embodiment the optimal dimension n is 128 and the optimal number of generated samples is 6400.
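The sampling scheme above — two independent noise draws, with the real-person identity copied from the prosthesis identity — can be sketched as follows (function names are illustrative; the decoder itself is not modeled here):

```python
import numpy as np

def sample_generation_latent(n, rng):
    """Build one 3*n decoder input for generating a true/false image pair."""
    z_fid = rng.standard_normal(n)   # reconstructed prosthesis identity hidden vector
    z_fpat = rng.standard_normal(n)  # reconstructed prosthesis mode hidden vector
    z_real = z_fid.copy()            # real-person identity copied for identity consistency
    return np.concatenate([z_real, z_fid, z_fpat])

rng = np.random.default_rng(0)
z = sample_generation_latent(128, rng)
```

Copying the identity component guarantees that both halves of the generated pair depict the same identity, so the pair differs only in the spoof pattern.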
S5: cutting a training set consisting of an original sample and a generated sample, dividing the training set into a labeled sample and a non-labeled sample, and performing random data enhancement on the non-labeled sample to obtain an enhanced non-labeled sample;
in this embodiment, an image block is first randomly cropped from each RGB color channel map of size H×W to be trained. N samples are randomly selected from the original samples to be trained as labeled data; the remaining samples together with the generated samples form the unlabeled data, with the ratio of labeled to unlabeled samples being μ. Random data enhancement is then applied to the unlabeled samples to obtain the enhanced unlabeled samples. The data enhancement operations include: maximizing contrast, adjusting brightness, adjusting color balance, adjusting contrast, adjusting sharpness, cutout, histogram equalization, inversion, posterization (zeroing the lowest 0-4 bits of pixel values), random rotation, horizontal and vertical shearing, horizontal translation, vertical translation, and solarization (inverting all pixel values above a threshold); for each sample, two of these operations are randomly selected for enhancement. The preferred values of H and W are 256, of N is 6000 and of μ is 4;
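The "pick two random operations per sample" scheme can be sketched with a small subset of the listed pixel-level operations (an illustrative subset on uint8 arrays, not the embodiment's full augmentation pool):

```python
import numpy as np

def invert(img):
    # invert every pixel value
    return 255 - img

def solarize(img, threshold=128):
    # invert only the pixels at or above the threshold
    return np.where(img >= threshold, 255 - img, img)

def posterize(img, bits=4):
    # zero out the lowest (8 - bits) bits of each pixel value
    return img & (256 - (1 << (8 - bits)))

OPS = [invert, solarize, posterize]

def random_augment(img, rng, n_ops=2):
    """Apply n_ops distinct operations chosen at random, as in the embodiment."""
    for op in rng.choice(OPS, size=n_ops, replace=False):
        img = op(img)
    return img

img = np.array([[0, 100, 200]], dtype=np.uint8)
aug = random_augment(img, np.random.default_rng(0))
```

Drawing the operations without replacement mirrors RandAugment-style policies: each unlabeled sample sees a different, mild perturbation, which is what the consistency loss between the raw and enhanced predictions relies on.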
s6: constructing a detector and teacher network;
as shown in fig. 7, the teacher network and the detector have the same network structure. The input size is H×W×3; after a convolutional layer, a batch normalization layer and a ReLU the feature map becomes H×W×16, then three units composed of residual blocks (stacked convolutional layers with skip connections) output feature maps at 1, 1/2 and 1/4 of the original size with 32, 64 and 128 channels respectively. The resulting feature map is reduced to size 1×1×128 by a global average pooling layer, and finally a fully connected layer outputs a classification vector of dimension 2.
S7: a teacher learning module is constructed, the labeled sample, the unlabeled sample and the enhanced unlabeled sample are sent to the teacher learning module, and the semi-supervised loss of the teacher, the pseudo label of the unlabeled data and the enhanced unlabeled loss are obtained:
as shown in fig. 8, first agree that CE(q, p) denotes the cross entropy loss of two distributions q and p; if q is a label value, it is first one-hot encoded into a label vector. Let k be the number of label classes; the cross entropy loss is expressed as:

CE(q, p) = −Σ_{i=1}^{k} q_i · log p_i
secondly, appointing argmax (v) to represent the index where the maximum value in the vector v is taken out;
for labeled samples xl,ylSending a parameter thetaTThe teacher network T outputs a teacher labeled prediction result T (x)l;θT) And with the genuine label ylCalculating cross entropy to obtain teacher labeled lossRepresented by the following formula.
An unlabeled sample x_u and the enhanced unlabeled sample x̂_u obtained from it by one round of data enhancement are each fed into the teacher network T to obtain the teacher unlabeled prediction result T(x_u; θ_T) and the teacher enhanced unlabeled prediction result T(x̂_u; θ_T). The cross entropy between the two results gives the teacher unlabeled loss, and the class to which the maximum value of the teacher unlabeled prediction result T(x_u; θ_T) belongs is taken as the pseudo label ŷ_u, as shown in the following formulas:

L_T^u = CE(T(x_u; θ_T), T(x̂_u; θ_T))
ŷ_u = argmax(T(x_u; θ_T))
The teacher labeled loss L_T^l and the teacher unlabeled loss L_T^u are then weighted and summed to obtain the teacher semi-supervised loss, as shown in the following formula:

L_T^semi = L_T^l + λ · min(s / s_tl, 1) · L_T^u

where s is the current step number, s_tl is the total number of steps and λ is the weight of the unlabeled loss.
For the teacher enhanced unlabeled prediction result T(x̂_u; θ_T), the enhanced unlabeled loss is obtained through the cross entropy function with the class h to which its own maximum value belongs, as shown in the following formula:

L_T^e = CE(h, T(x̂_u; θ_T)), h = argmax(T(x̂_u; θ_T))
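The pseudo-labeling and ramped loss weighting above can be sketched as follows. This is an illustrative NumPy sketch; the linear ramp min(s/s_tl, 1) is an assumption consistent with the weight depending on the current and total step counts:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(q, p, eps=1e-12):
    """CE(q, p) = -sum_i q_i * log(p_i); q may be a one-hot label vector."""
    return float(-np.sum(q * np.log(p + eps)))

def teacher_semi_loss(loss_labeled, loss_unlabeled, step, total_steps, lam):
    """Labeled loss plus the unlabeled loss under a linearly ramped weight."""
    return loss_labeled + lam * min(step / total_steps, 1.0) * loss_unlabeled

# Pseudo label: the class holding the maximum teacher score on the raw sample.
teacher_pred = softmax(np.array([1.2, -0.3]))  # hypothetical teacher output on x_u
pseudo_label = int(np.argmax(teacher_pred))
```

Ramping the unlabeled weight keeps early training dominated by the labeled loss, when the teacher's pseudo labels are still unreliable.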
S8: constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
as shown in fig. 8, the enhanced unlabeled sample x̂_u is fed into the detector D with parameters θ_D to obtain the detector enhanced unlabeled prediction result D(x̂_u; θ_D); the cross entropy with the pseudo label ŷ_u gives the detector enhanced unlabeled loss, as shown in the following formula:

L_D = CE(ŷ_u, D(x̂_u; θ_D))
Let the learning rate of the detector be η_D. The detector is optimized by one step of gradient descent on the detector enhanced unlabeled loss L_D, giving a detector with optimized parameters θ′_D, as shown in the following formula:

θ′_D = θ_D − η_D · ∇_{θ_D} L_D
The labeled sample (x_l, y_l) is fed into both the old detector with parameters θ_D and the new detector with optimized parameters θ′_D to obtain the old detector labeled loss and the new detector labeled loss; their difference gives the detector update loss, as shown in the following formulas:

L_old = CE(y_l, D(x_l; θ_D)), L_new = CE(y_l, D(x_l; θ′_D))
L_upd = L_old − L_new
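The inner gradient step and the before/after loss difference can be sketched on a toy linear detector (a deliberately simplified stand-in: a single sigmoid unit instead of the residual network, with illustrative feature vectors):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-12):
    # binary cross entropy for a scalar prediction p against label y
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

w = np.array([0.1, -0.2])      # toy detector parameters theta_D
x_aug = np.array([1.0, 0.5])   # enhanced unlabeled sample (features, illustrative)
pseudo = 1.0                   # pseudo label supplied by the teacher

# One inner gradient-descent step on the detector's enhanced unlabeled loss.
eta_d = 0.5
p = sigmoid(w @ x_aug)
grad = (p - pseudo) * x_aug    # gradient of BCE w.r.t. w for a sigmoid output
w_new = w - eta_d * grad       # theta'_D

# Update loss: labeled loss of the old detector minus that of the new one.
x_l, y_l = np.array([1.0, 0.5]), 1.0
loss_old = bce(y_l, sigmoid(w @ x_l))
loss_new = bce(y_l, sigmoid(w_new @ x_l))
update_loss = loss_old - loss_new
```

A positive update loss means the teacher's pseudo label produced an inner step that helped on real labeled data; feeding this signal back to the teacher is the meta-learning element of the scheme.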
S9: updating teacher network parameters by utilizing teacher semi-supervision loss, enhanced label-free loss and detector updating loss:
as shown in fig. 8, the detector update loss L_upd is multiplied by the teacher enhanced unlabeled loss L_T^e and then added to the teacher semi-supervised loss L_T^semi to form the teacher loss L_T. Let the teacher network learning rate be η_T; the teacher network is optimized by gradient descent, as shown in the following formulas:

L_T = L_upd · L_T^e + L_T^semi
θ_T ← θ_T − η_T · ∇_{θ_T} L_T
S10: iteratively updating the parameters of the detector and the teacher network with the optimizers according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed, where ∇ denotes the gradient operator:
in this embodiment, both the teacher network and the detector use SGD optimizers with Nesterov momentum, where the momentum μ is preferably 0.9 and the initial learning rate ε₀ is preferably 0.05, with the learning rate decaying over the training iterations;
enhancing label-free loss from detectors in a detector learning moduleOptimizing with a detector optimizer with the goal of minimizing losses; based on teacher loss in teacher update moduleOptimizer using teacher networkOptimization is performed with the goal of minimizing losses.
S11: determining a threshold value by using the verification set;
in this embodiment, the specific steps include: the face RGB color channel maps of the validation set are fed into the detector to obtain classification scores p; decision thresholds are then sampled at equal intervals over the range (0, 1), predicted label values are obtained for each threshold and compared with the real labels to calculate the false acceptance rate and the false rejection rate, and the threshold at which the two are equal is taken as the test decision threshold T;
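The threshold sweep above can be sketched as follows (an illustrative NumPy sketch on synthetic, well-separated scores; function names are this sketch's own):

```python
import numpy as np

def find_eer_threshold(scores, labels, n_thresholds=1000):
    """Scan thresholds over (0, 1); return the one where FAR and FRR are closest."""
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(1e-3, 1 - 1e-3, n_thresholds):
        pred = scores >= t                    # predicted genuine (live)
        fa = np.sum(pred & (labels == 0))     # spoof accepted as live
        fr = np.sum(~pred & (labels == 1))    # live rejected as spoof
        far = fa / max(np.sum(labels == 0), 1)
        frr = fr / max(np.sum(labels == 1), 1)
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t

# Toy validation scores: live in [0.6, 1.0], spoof in [0.0, 0.4].
rng = np.random.default_rng(0)
scores = np.concatenate([rng.uniform(0.6, 1.0, 100), rng.uniform(0.0, 0.4, 100)])
labels = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])
t = find_eer_threshold(scores, labels)
```

On perfectly separable data the sweep lands in the gap between the two score clusters, where FAR = FRR = 0.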
s12: testing the model;
in this embodiment, the specific steps include: and sending the RGB color channel map of the face in the test set into a detector to obtain a classification score p, then obtaining a final predicted label value according to a test judgment threshold T, and calculating a reference index.
The performance of the living body detection algorithm in this embodiment is evaluated with the False Acceptance Rate (FAR), False Rejection Rate (FRR), True Acceptance Rate (TAR), Equal Error Rate (EER) and Half Total Error Rate (HTER); the confusion matrix of table 1 describes the underlying quantities in detail:
table 1 confusion matrix table
Label \ Prediction | Predicted genuine | Predicted spoof |
Genuine | TA | FR |
Spoof | FA | TR |
The False Acceptance Rate (FAR) is the ratio of the number of non-live faces determined to be live faces to the number of faces labeled non-live:

FAR = FA / (FA + TR)
The False Rejection Rate (FRR) is the ratio of the number of live faces determined to be non-live faces to the number of faces labeled live:

FRR = FR / (TA + FR)
The True Acceptance Rate (TAR) is the ratio of the number of live faces determined to be live faces to the number of faces labeled live:

TAR = TA / (TA + FR)
equal Error Rate (EER) is the error rate when FRR and FAR are equal;
The Half Total Error Rate (HTER) is the mean of FRR and FAR:

HTER = (FAR + FRR) / 2
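The metric definitions above can be computed directly from the confusion counts of table 1 (a minimal sketch with hypothetical counts):

```python
def far(fa, tr):
    # False Acceptance Rate: spoof accepted / all spoof
    return fa / (fa + tr)

def frr(fr, ta):
    # False Rejection Rate: live rejected / all live
    return fr / (ta + fr)

def tar(ta, fr):
    # True Acceptance Rate: live accepted / all live
    return ta / (ta + fr)

def hter(far_value, frr_value):
    # Half Total Error Rate: mean of FAR and FRR
    return (far_value + frr_value) / 2.0

# Hypothetical confusion counts (TA, FR, FA, TR) for one test run.
ta, fr, fa, tr = 90, 10, 5, 95
metrics = (far(fa, tr), frr(fr, ta), tar(ta, fr), hter(far(fa, tr), frr(fr, ta)))
```

Note TAR = 1 − FRR by construction, so only two of the three acceptance-side rates are independent.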
in order to prove the effectiveness of the invention and test the generalization performance of the method, in-library experiments and cross-library experiments are respectively carried out on CASIA-MFSD, Replay-Attack and MSU-MFSD databases. The in-library and cross-library experimental results are shown in tables 2 and 3, respectively:
table 2 library of experimental results
TABLE 3 Cross-Bank Experimental results
As can be seen from Table 2, both the half total error rate and the equal error rate of the method within each database are low, showing excellent spoof detection performance; as can be seen from Table 3, the half total error rate of cross-database detection is also low. Although the training set consists of labeled and unlabeled samples from only a small number of frames extracted from each training video, the data produced by the double decoupling generator enriches the diversity of the training data, and the progressive meta-learning training improves the model's ability to learn from limited sample characteristics and to generalize. The experimental results demonstrate that, even with insufficient labeled training samples, high intra-database accuracy is maintained, the cross-database error rate is greatly reduced and the generalization performance is significantly improved.
As shown in fig. 9, the present embodiment further provides a living body detection system based on double-decoupling generation and semi-supervised learning, including: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
in this embodiment, the data preprocessing module is configured to extract a face region image from an input image to obtain an RGB color channel image, pair each true image of an original sample of the RGB color channel image to be trained with a false image of the same identity to form a true-false image pair, and label an attack category label for the false image;
in this embodiment, the dual decoupling generator building module is used for building a real encoder, a prosthetic encoder and a decoder to form a dual decoupling generator;
in this embodiment, the dual decoupling generator training module is configured to send a true image to a true encoder to obtain a true identity vector, send a matched false image to a false encoder to obtain a false identity vector and a false pattern vector, combine the true identity vector, the false identity vector and the false pattern vector, send the combined true identity vector, false identity vector and false pattern vector to a decoder to output a reconstructed true and false image pair, and construct a dual decoupling generation loss function for optimization;
in this embodiment, the generated sample construction module is configured to input standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
in this embodiment, the unsupervised data enhancement module is configured to crop a training set formed by an original sample and a generated sample, and construct a labeled sample, a unlabeled sample, and an enhanced unlabeled sample;
in this embodiment, the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
in this embodiment, the teacher learning module is configured to send the labeled sample, the unlabeled sample, and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of the unlabeled data, and enhanced unlabeled loss of the teacher;
in this embodiment, the detector learning module is configured to send the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters, so as to obtain an update loss of the detector;
in this embodiment, the network parameter updating module is configured to update the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced label-free loss, and the detector updating loss, iteratively update the parameters of the detector and the teacher network by using the optimizer according to the loss function, and store the parameters of the teacher network and the detector after training is completed;
in this embodiment, the verification module is configured to send a verification set face RGB color channel map to a trained detector to obtain a classification score, obtain a predicted label value according to different decision thresholds, compare the predicted label value with a real label, calculate a false alarm rate and a false omission factor, and take a threshold value when the predicted label value and the real label value are equal to each other as a test decision threshold;
in this embodiment, the test module is configured to send the test set face RGB color channel map to a trained detector, obtain a classification score, obtain a final predicted label value according to a test decision threshold, and calculate a reference index.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (10)
1. A living body detection method based on double decoupling generation and semi-supervised learning is characterized by comprising the following steps:
picking up a face region image from an input image to obtain an RGB color channel image;
matching each true image of an original sample of the RGB color channel image to be trained with a false image with the same identity to form a true-false image pair, and labeling an attack category label for the false image;
constructing a real person encoder, a prosthesis encoder and a decoder;
sending the true image into a true encoder to obtain a true identity vector, sending the matched false image into a prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, combining the true identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the combined true identity vector, the prosthesis identity vector and the prosthesis mode vector into a decoder to output a reconstructed true and false image pair, and constructing a double decoupling generation loss function to optimize;
inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
cutting a training set formed by an original sample and a generated sample, and constructing a labeled sample, a non-labeled sample and an enhanced non-labeled sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
updating teacher network parameters by utilizing teacher semi-supervision loss, teacher enhanced label-free loss and detector updating loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the RGB color channel map of the face of the verification set into a trained detector to obtain classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating a false alarm rate and a missing detection rate, and taking the thresholds with the same value as a test judgment threshold;
and sending the RGB color channel map of the face in the test set to a trained detector to obtain a classification score, obtaining a final predicted label value according to a test judgment threshold value, and calculating a reference index.
2. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the construction of the real-person encoder, the prosthesis encoder and the decoder comprises the following specific steps:
the respective backbone networks of the real-person encoder and the prosthesis encoder adopt the same network structure, provided with a convolutional layer, an instance normalization layer, a LeakyReLU, and five units each consisting of residual blocks formed by stacked convolutional layers with skip-level connections followed by a pooling layer;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks; the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the real-person encoder and the prosthesis encoder are first input to a fully connected layer, pass through six units consisting of residual blocks formed by stacked convolutional layers with skip-level connections and upsampling layers, and finally pass through a convolutional layer to output an image pair.
3. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the true image is sent to a true human encoder to obtain a true human identity vector, and the matched false image is sent to a prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, and the specific steps include:
the human decoder outputs human hidden vectors including human identity vector meanAnd true person identity vector variance
The prosthesis encoder outputs a prosthesis hidden vector comprising a prosthesis identity vector meanVariance of identity vector of prosthesisMean of prosthesis mode vectorVariance of the prosthesis mode vector
Sampling real person identity variation component from standard normal distributionIdentity variation of prosthesisAnd a prosthetic mode variation component
Performing repeated parameter operation to respectively obtain real person identity vectorsProsthesis identity vectorAnd a prosthesis mode vectorThe concrete expression is as follows:
4. The in-vivo detection method based on double-decoupling generation and semi-supervised learning of claim 3, wherein the true human identity vector, the prosthesis identity vector and the prosthesis mode vector are merged and sent to a decoder to output a reconstructed true and false image pair, a double-decoupling generation loss function is constructed for optimization, and the specific steps include:
the prosthesis mode vector f_{s,pat} is fed into the fully connected layer fc(·), which outputs the prosthesis category vector ŷ_s; the cross entropy with the actual prosthesis category label y_s gives the classification loss, expressed as:

L_cls = −Σ_{i=1}^{k} Y_{s,i} · log softmax(ŷ_s)_i

where k is the number of prosthesis classes and Y_s is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector f_{s,id} and the prosthesis mode vector f_{s,pat} to obtain the orthogonal loss, expressed as:

L_ort = ( ⟨f_{s,id}, f_{s,pat}⟩ / (‖f_{s,id}‖ · ‖f_{s,pat}‖) )²

where ⟨·,·⟩ denotes the inner product;
the reconstruction loss between the reconstructed real-person image x̂_l, the reconstructed prosthesis image x̂_s and the corresponding original images x_l and x_s is calculated, expressed as: L_rec = ‖x_l − x̂_l‖₁ + ‖x_s − x̂_s‖₁;
the maximum mean discrepancy loss between the real-person identity vector f_l and the prosthesis identity vector f_{s,id} is calculated, expressed as: L_mmd = ‖E[φ(f_l)] − E[φ(f_{s,id})]‖²_H, where φ(·) is the kernel feature map;
the pairing loss between the reconstructed real-person image x̂_l and the reconstructed prosthesis image x̂_s is calculated, expressed as: L_pair = ‖x̂_l − x̂_s‖₁;
a constraint is imposed by calculating the KL divergence, as shown in the following formula: L_KL = −(1/2) Σ_{i=1}^{n} (1 + log σ_i² − μ_i² − σ_i²); the losses are weighted and summed into the double decoupling generator loss L_gen = L_rec + L_pair + λ₁·L_cls + λ₂·L_ort + λ₃·L_mmd + λ₄·L_KL, wherein λ₁, λ₂, λ₃ and λ₄ represent the corresponding weight values;
5. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the generating samples are obtained from a standard normal distribution sampling noise input to a trained decoder, and the specific steps comprise:
let the hidden-vector dimension be n; random noise of dimension n is sampled twice from the standard normal distribution N(0, I) to obtain the reconstructed prosthesis identity hidden vector and the reconstructed prosthesis mode hidden vector.
6. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the detector and teacher network have the same network structure, and are provided with a convolutional layer, a batch normalization layer, a ReLU, three units of residual blocks formed by convolutional layer stacking and skip level connection, a global average pooling layer and a full connection layer, and the full connection layer outputs classification vectors.
7. The in-vivo detection method based on double-decoupling generation and semi-supervised learning of claim 1, wherein the labeled samples, the unlabeled samples and the enhanced unlabeled samples are sent to a teacher learning module to obtain teacher semi-supervised loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss, and the specific steps include:
the labeled sample (x_l, y_l) is input to the teacher network T with parameters θ_T, which outputs the teacher labeled prediction result T(x_l; θ_T); the cross entropy with the genuine label y_l gives the teacher labeled loss, specifically expressed as:

L_T^l = CE(y_l, T(x_l; θ_T))

where CE denotes the cross entropy loss;
the unlabeled sample x_u and the enhanced unlabeled sample x̂_u are each input to the teacher network T to obtain the teacher unlabeled prediction result T(x_u; θ_T) and the teacher enhanced unlabeled prediction result T(x̂_u; θ_T); their cross entropy gives the teacher unlabeled loss, specifically expressed as: L_T^u = CE(T(x_u; θ_T), T(x̂_u; θ_T));
the class to which the maximum value of the teacher unlabeled prediction result T(x_u; θ_T) belongs is taken as the pseudo label, specifically expressed as: ŷ_u = argmax(T(x_u; θ_T));
the teacher labeled loss L_T^l and the teacher unlabeled loss L_T^u are weighted and summed to obtain the teacher semi-supervised loss, expressed as: L_T^semi = L_T^l + λ·min(s/s_tl, 1)·L_T^u, where s is the current step number, s_tl is the total number of steps and λ is the weight of the unlabeled loss;
for the teacher enhanced unlabeled prediction result T(x̂_u; θ_T), the enhanced unlabeled loss is obtained through the cross entropy function with the class h to which its own maximum value belongs, expressed as: L_T^e = CE(h, T(x̂_u; θ_T)), h = argmax(T(x̂_u; θ_T)).
8. the in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the pseudo labels of the labeled samples, the enhanced unlabeled samples and the unlabeled samples are sent to a detector learning module to update detector parameters, so as to obtain a detector update loss, and the specific steps include:
the enhanced unlabeled sample x̂_u is fed into the detector D with parameters θ_D to obtain the detector enhanced unlabeled prediction result D(x̂_u; θ_D); the cross entropy between this prediction and the pseudo label ŷ_u gives the detector enhanced unlabeled loss L_D^aug, concretely expressed as:

L_D^aug = CE(D(x̂_u; θ_D), ŷ_u);
the detector is optimized on the detector enhanced unlabeled loss L_D^aug by a gradient descent method to obtain the optimized parameters θ'_D, concretely expressed as:

θ'_D = θ_D − η_D · ∇_{θ_D} L_D^aug

where η_D is the detector learning rate;
the labeled sample (x_l, y_l) is respectively fed into the detector with the parameters to be optimized θ_D and with the optimized parameters θ'_D to obtain the old detector labeled loss CE(D(x_l; θ_D), y_l) and the new detector labeled loss CE(D(x_l; θ'_D), y_l); the difference of the two gives the detector update loss L_D^upd, concretely expressed as:

L_D^upd = CE(D(x_l; θ_D), y_l) − CE(D(x_l; θ'_D), y_l).
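The detector-update computation of claim 8 can be illustrated with a one-parameter logistic model. Everything here, the model, the data and the learning rate, is a made-up stand-in for the detector D, chosen so the arithmetic is easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for a single sample."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

theta_d = 0.5              # detector parameter before the update
x_hat_u, pseudo = 1.0, 1   # enhanced unlabeled sample and its pseudo label
x_l, y_l = 1.0, 1          # a labeled sample

# Detector enhanced unlabeled loss and one gradient-descent step on it.
p_u = sigmoid(theta_d * x_hat_u)
grad = (p_u - pseudo) * x_hat_u      # d BCE / d theta for the logistic model
eta = 0.5
theta_d_new = theta_d - eta * grad   # optimized parameters theta'_D

# Update loss: labeled loss before the step minus labeled loss after it.
loss_old = bce(sigmoid(theta_d * x_l), y_l)
loss_new = bce(sigmoid(theta_d_new * x_l), y_l)
update_loss = loss_old - loss_new
```

In this toy case the pseudo label agrees with the labeled data, so the step on the unlabeled loss also lowers the labeled loss and the update loss is positive; a misleading pseudo label would make it negative, which is the feedback signal sent back to the teacher.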
9. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the teacher network parameters are updated by the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, the specific steps comprising:
the detector update loss L_D^upd is multiplied by the teacher enhanced unlabeled loss L_T^aug and then added to the teacher semi-supervised loss L_T^semi to form the teacher loss L_T, and the teacher network is optimized by a gradient descent method, expressed as:

L_T = L_D^upd · L_T^aug + L_T^semi

θ_T ← θ_T − η_T · ∇_{θ_T} L_T

where η_T is the teacher learning rate.
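Claim 9's combination can be shown with toy numbers (none of these values come from the patent); note that the update loss acts as a multiplicative weight on the enhanced unlabeled term rather than as an additive loss.

```python
update_loss = 0.07       # detector update loss L_D^upd (toy value)
teacher_enhanced = 0.36  # teacher enhanced unlabeled loss L_T^aug (toy value)
teacher_semi = 0.52      # teacher semi-supervised loss L_T^semi (toy value)

# Teacher loss: the update loss scales the enhanced unlabeled term,
# then the semi-supervised term is added.
teacher_loss = update_loss * teacher_enhanced + teacher_semi
```

If the detector update hurt performance on labeled data, update_loss would be negative and the product term would push the teacher away from the pseudo-labeling behaviour that caused it.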
10. A living body detection system based on double decoupling generation and semi-supervised learning, characterized by comprising: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
the data preprocessing module is used for cropping the face region from each input image to obtain an RGB color-channel image, matching each real image of the original training samples with a prosthesis image of the same identity to form a real-prosthesis image pair, and labeling each prosthesis image with an attack category label;
the double decoupling generator building module is used for building a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double decoupling generator training module is used for sending the real image into the real-person encoder to obtain a real-person identity vector, sending the matched prosthesis image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, combining the real-person identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the combination into the decoder to output a reconstructed real-prosthesis image pair, and constructing a double decoupling generation loss function for optimization;
the generated sample construction module is used for inputting noise sampled from a standard normal distribution into the trained decoder to obtain generated samples;
the unsupervised data enhancement module is used for cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
the teacher learning module is used for processing the labeled samples, the unlabeled samples and the enhanced unlabeled samples to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;

the detector learning module is used for updating the detector parameters with the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples, so as to obtain the detector update loss;
the network parameter updating module is used for updating teacher network parameters by utilizing teacher semi-supervision loss, teacher enhanced label-free loss and detector updating loss, iteratively updating parameters of the detector and the teacher network by using an optimizer according to a loss function, and storing the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the face RGB color-channel maps of the verification set into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels to calculate the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
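The threshold search performed by the verification module can be sketched as follows. The scores, labels and threshold grid are invented for illustration; label 1 denotes a real person, and the false acceptance / false rejection definitions are the conventional ones, which the patent does not spell out.

```python
def far_frr(scores, labels, thr):
    """False acceptance rate (prosthesis scored as real) and
    false rejection rate (real person scored as prosthesis) at one threshold."""
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    reals   = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= thr for s in attacks) / len(attacks)
    frr = sum(s < thr for s in reals) / len(reals)
    return far, frr

# Toy verification-set scores: higher means "more likely a real person".
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,   1,   1,   1  ]

# Pick the threshold where FAR and FRR are closest -- the equal-error point.
grid = [i / 100 for i in range(1, 100)]
eer_thr = min(grid, key=lambda t: abs(far_frr(scores, labels, t)[0]
                                      - far_frr(scores, labels, t)[1]))
```

The selected threshold is then frozen and reused unchanged by the test module.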
the test module is used for sending the face RGB color-channel maps of the test set into the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329816.3A CN114663986B (en) | 2022-03-31 | 2022-03-31 | Living body detection method and system based on double decoupling generation and semi-supervised learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114663986A true CN114663986A (en) | 2022-06-24 |
CN114663986B CN114663986B (en) | 2023-06-20 |
Family
ID=82033819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210329816.3A Active CN114663986B (en) | 2022-03-31 | 2022-03-31 | Living body detection method and system based on double decoupling generation and semi-supervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114663986B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111753595A (en) * | 2019-03-29 | 2020-10-09 | 北京市商汤科技开发有限公司 | Living body detection method and apparatus, device, and storage medium |
US20200364478A1 (en) * | 2019-03-29 | 2020-11-19 | Beijing Sensetime Technology Development Co., Ltd. | Method and apparatus for liveness detection, device, and storage medium |
WO2021134871A1 (en) * | 2019-12-30 | 2021-07-08 | 深圳市爱协生科技有限公司 | Forensics method for synthesized face image based on local binary pattern and deep learning |
CN111460931A (en) * | 2020-03-17 | 2020-07-28 | 华南理工大学 | Face spoofing detection method and system based on color channel difference image characteristics |
CN114067444A (en) * | 2021-10-12 | 2022-02-18 | 中新国际联合研究院 | Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature |
Non-Patent Citations (2)
Title |
---|
AMIR MOHAMMADI et al.: "Improving cross-dataset performance of face presentation attack detection systems using face recognition datasets" |
TANG Yan: "Face liveness detection based on deep learning" (基于深度学习的人脸活体检测) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115311605A (en) * | 2022-09-29 | 2022-11-08 | 山东大学 | Semi-supervised video classification method and system based on neighbor consistency and contrast learning |
CN116152885A (en) * | 2022-12-02 | 2023-05-23 | 南昌大学 | Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling |
CN116152885B (en) * | 2022-12-02 | 2023-08-01 | 南昌大学 | Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling |
Also Published As
Publication number | Publication date |
---|---|
CN114663986B (en) | 2023-06-20 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |