CN114663986B - Living body detection method and system based on double decoupling generation and semi-supervised learning - Google Patents
- Publication number
- CN114663986B (application CN202210329816.3A)
- Authority
- CN
- China
- Prior art keywords
- vector
- false
- true
- loss
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a living body detection method and system based on double decoupling generation and semi-supervised learning. The method comprises the following steps: preprocessing data to obtain RGB color channel original samples, and pairing images of the same identity to obtain true/false image pairs; the true person encoder outputs a true person identity vector and the prosthesis encoder outputs a prosthesis identity vector and a prosthesis mode vector; the true person identity vector, the prosthesis identity vector and the prosthesis mode vector are merged and sent to the decoder to obtain a reconstructed true/false image pair, a double decoupling generation loss function is constructed, and noise is sent to the trained decoder to obtain generated samples; labeled samples, unlabeled samples and enhanced unlabeled samples are constructed from the original samples and the generated samples and sent to the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled samples and the teacher enhanced unlabeled loss, and the network parameters of the detector and the teacher are updated; a threshold is determined using the validation set; and the test data are loaded into the detector to obtain classification scores, with the classification result judged according to the threshold. The invention can improve the robustness of living body detection models.
Description
Technical Field
The invention relates to the technical field of face recognition anti-deception detection, in particular to a living body detection method and system based on double decoupling generation and semi-supervised learning.
Background
Today, the use of facial biometric technology in business and industry has increased dramatically: facial unlocking can protect personal privacy on electronic devices, and facial biometrics can authenticate payments. However, using the face as a biometric for authentication is not secure, because facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks can generally be divided into four categories: 1) photo attack: an attacker spoofs the authentication system with a printed photo or a photo shown on a display screen; 2) video replay attack: an attacker spoofs the authentication system with a pre-recorded video of the attacked person; 3) mask attack: an attacker wears a carefully manufactured mask of the attacked person to deceive the system; 4) adversarial sample attack: an attacker uses a GAN network to generate specific sample noise that interferes with the face authentication system and produces false identity verification. These face spoofing attacks are low cost yet can fool the system, severely affecting and threatening the application of face recognition systems.
Living body detection plays a vital role in preventing face recognition systems from being attacked by prostheses. Owing to the strong feature extraction capability of deep networks, living body detection algorithms based on deep learning outperform those based on traditional hand-crafted features. Although most deep-learning-based algorithms achieve good intra-database detection, their cross-database performance is poor, because in-database and out-of-database data are often collected under different conditions, such as different shooting equipment, environmental illumination, and attack presentation devices, so the two data distributions differ and a domain shift exists between them. When the diversity of the training data is insufficient, intra-database learning easily over-fits, and cross-database generalization is poor. Although the cause can be identified, the problem is not easy to solve in real-world applications: it is difficult for a living body detection model to collect labeled training samples in all scenes, so most existing anti-spoofing data sets lack diversity. For example, the commonly used CASIA, Replay-Attack, and MSU data sets contain only 3, 2, and 2 types of shooting devices and 3, 2, and 1 types of shooting backgrounds, respectively.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a living body detection method based on double decoupling generation and semi-supervised learning. Living/prosthesis characteristics are modeled through decoupled learning, and samples are synthesized in the latent space to expand the data set, which improves the diversity of the training data while keeping the generated samples highly discriminative and reducing the influence of generation noise on model learning. By specifically adopting the technical scheme based on double decoupling generation and semi-supervised learning, the invention solves the technical problems of insufficient data diversity and poor generalization of living body detection models, achieving the technical effects of preserving intra-database accuracy while effectively improving generalization performance.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a living body detection method based on double decoupling generation and semi-supervised learning, which comprises the following steps:
the face region image is extracted from the input image to obtain an RGB color channel image;
pairing each true image of an RGB color channel diagram original sample to be trained with a false image of the same identity to form a true-false image pair, and labeling the false image with an attack type label;
constructing a real encoder, a prosthesis encoder and a decoder;
the true image is sent to the true person encoder to obtain a true person identity vector, and the paired false image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector; the true person identity vector, the prosthesis identity vector and the prosthesis mode vector are merged and sent to the decoder, which outputs a reconstructed true/false image pair, and a double decoupling generation loss function is constructed for optimization;
inputting the standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
cutting a training set formed by an original sample and a generated sample, and constructing a label sample, a label-free sample and an enhanced label-free sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss;
a detector learning module is constructed, and a label sample, an enhanced label-free sample and a pseudo label of the label-free sample are sent to the detector learning module to update detector parameters, so that detector updating loss is obtained;
updating teacher network parameters by using teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss;
Iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the RGB color channel diagram of the face of the verification set to a trained detector to obtain classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and omission rate, and taking the thresholds when the false alarm rate and the omission rate are equal as test judgment thresholds;
sending the RGB color channel diagram of the face of the test set to a trained detector to obtain classification scores, obtaining a final predicted label value according to a test decision threshold value, and calculating a reference index.
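The threshold selection above picks the operating point where the false alarm rate equals the miss rate (the equal-error-rate point). A minimal sketch of that sweep, assuming classification scores where higher means "more likely live" and labels with 1 = live, 0 = spoof (the score convention is an assumption, not fixed by the text):

```python
import numpy as np

def eer_threshold(scores, labels):
    """Sweep candidate thresholds and return the one where the false alarm
    rate (spoofs accepted as live) and the miss rate (lives rejected) are
    closest to equal."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    best_thr, best_gap = 0.0, np.inf
    for thr in np.unique(scores):
        pred = (scores >= thr).astype(int)
        far = np.mean(pred[labels == 0] == 1)  # spoofs wrongly accepted
        frr = np.mean(pred[labels == 1] == 0)  # lives wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_thr = gap, thr
    return best_thr
```

In practice the verification-set scores from the trained detector would be fed in, and the returned threshold reused on the test set.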
As a preferable technical scheme, the method for constructing the real encoder, the prosthesis encoder and the decoder comprises the following specific steps:
the backbone networks of the true person encoder and the prosthesis encoder adopt the same structure, comprising a convolution layer, an instance normalization layer and a LeakyReLU, followed by five groups of units each consisting of a pooling layer and a skip-connected residual block built from stacked convolution layers;
the true man encoder outputs a hidden layer true man vector through the full connection layer, and the false body encoder outputs a hidden layer false body vector through the full connection layer;
the backbone network of the decoder is built from convolution layers, up-sampling layers, a Sigmoid activation layer and residual blocks: the hidden layer true person vector and the hidden layer prosthesis vector output by the two encoders are first input to a fully connected layer, then pass through six groups of units each consisting of a skip-connected residual block built from stacked convolution layers and an up-sampling layer, and finally an image pair is output through a convolution layer.
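As a rough illustration of the encoder geometry described above (a stem mapping H x W x 3 to H x W x 32, then five halving units), the feature-map sizes can be walked through as follows; the 256 x 256 input resolution is only an example, not specified by the text:

```python
def encoder_shapes(h, w):
    """Feature-map sizes through the stem conv (H x W x 32) and the five
    pool + conv-stack residual units of the encoder backbone: spatial size
    halves per unit, channels go 64, 128, 256, 512, 512."""
    shapes = [(h, w, 32)]
    for ch in (64, 128, 256, 512, 512):
        h, w = h // 2, w // 2
        shapes.append((h, w, ch))
    return shapes

stages = encoder_shapes(256, 256)  # final stage: (8, 8, 512), i.e. 1/32 scale
```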
As a preferable technical scheme, the steps of sending the true image to the true person encoder to obtain a true person identity vector and sending the paired false image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector include:
the true person decoder outputs a true person hidden vector comprising a true person identity vector average valueAnd true person identity vector variance
The prosthetic encoder outputs a prosthetic hidden vector comprising a prosthetic identity vector meanProsthetic identity vector varianceProsthesis mode vector mean->Prosthetic mode vector variance->
Sampling real person identity variation components from standard normal distributionProsthesis identity variable component->And a prosthetic mode variable component
Performing heavy parameter operation to obtain true identity vectorsProsthesis identity vector->And a prosthetic mode vectorThe concrete steps are as follows:
to the true person identity vectorProsthesis identity vector->And a prosthesis mode vector->Combining into hidden vector, inputting to decoder, outputting reconstructed true and false image pair including reconstructed true image +.>And reconstructing a prosthetic image->
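The reparameterization step above can be sketched as follows. Treating the encoder's variance output as a log-variance (a common numerical-stability choice) is an implementation assumption; the patent text only names mean and variance vectors, and the latent dimension 8 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I), sigma = exp(log_var / 2)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_var / 2) * eps

# one latent per decoupled factor, as in the text (true-person identity shown)
mu_r, lv_r = np.zeros(8), np.zeros(8)
z_r = reparameterize(mu_r, lv_r)
```

Sampling through the mean/variance rather than taking the mean directly keeps the operation differentiable with respect to the encoder outputs, which is what lets the KL-constrained encoders train end to end.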
As a preferred technical solution, the true person identity vector, prosthesis identity vector and prosthesis mode vector are merged and fed to the decoder to output a reconstructed true/false image pair, and a double decoupling generation loss function is constructed for optimization; the specific steps include:
the prosthesis mode vector $z_m$ is fed into a fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s=fc(z_m)$, and cross entropy is calculated with the true prosthesis class label $y_s$ to obtain the classification loss $\mathcal{L}_{cls}$:

$$\mathcal{L}_{cls}=-\sum_{i=1}^{k}Y_s^{(i)}\log\,\mathrm{softmax}(\hat{y}_s)^{(i)}$$

where $k$ is the number of prosthesis categories and $Y_s$ is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector $z_f$ and the prosthesis mode vector $z_m$ to obtain the orthogonality loss $\mathcal{L}_{ort}$:

$$\mathcal{L}_{ort}=\left\langle\frac{z_f}{\|z_f\|},\,\frac{z_m}{\|z_m\|}\right\rangle^{2}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed true person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_f$ and the corresponding original images;

the maximum mean discrepancy loss $\mathcal{L}_{mmd}=\mathrm{MMD}(z_r,z_f)$ is calculated between the true person identity vector $z_r$ and the prosthesis identity vector $z_f$ to align their distributions;

the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed true person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_f$;
the hidden vector distributions are constrained toward the standard normal prior by calculating the KL divergence:

$$\mathcal{L}_{KL}=\mathrm{KL}\big(\mathcal{N}(\mu,\sigma^{2})\,\big\|\,\mathcal{N}(0,I)\big)$$

computed for each of the three hidden vectors; the total double decoupling generation loss is the weighted combination of the above terms, where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ denote the corresponding weight values;
an Adam optimizer is employed, with minimization of the double decoupling generation loss function as the target, to optimize the true person encoder, the prosthesis encoder and the decoder.
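A minimal numeric sketch of assembling the double decoupling generation loss. The closed form of the diagonal-Gaussian KL term is standard; which components carry the weights λ1..λ4 is an assumption here, since the patent's formula images are not reproduced in the text:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian,
    with sigma^2 = exp(log_var)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def total_generation_loss(l_cls, l_ort, l_rec, l_mmd, l_pair, l_kl,
                          lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted combination of the component losses; the assignment of
    lambda_1..lambda_4 to terms is an illustrative assumption."""
    l1, l2, l3, l4 = lambdas
    return l_cls + l1 * l_ort + l2 * l_rec + l3 * l_mmd + l4 * l_pair + l_kl
```

This scalar would be the quantity handed to the Adam optimizer each step.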
As a preferred technical solution, the step of inputting the standard normal distributed sampling noise to the trained decoder to obtain the generated samples includes the following specific steps:
let the dimension of the hidden vector be $n$; $n$-dimensional random noise is sampled twice from the standard normal distribution $\mathcal{N}(0,I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_f$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_m$;

the reconstructed true person identity hidden vector $\tilde{z}_r$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_f$, and the three vectors are passed through the decoder to generate a true/false image pair as a generated sample.
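The generation step can be sketched as follows: two independent noise vectors play the roles of the prosthesis identity and mode codes, and the true person identity code is a copy of the prosthesis identity code so that the generated pair shares one identity. The hidden dimension 128 is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_latents(n):
    """Draw two n-dimensional standard-normal noise vectors, copy the
    prosthesis identity code as the true-person identity code, and
    concatenate [z_r, z_f, z_m] as the decoder input of size 3n."""
    z_f = rng.standard_normal(n)  # prosthesis identity
    z_m = rng.standard_normal(n)  # prosthesis mode
    z_r = z_f.copy()              # shared identity for the true image
    return np.concatenate([z_r, z_f, z_m])

latent = sample_latents(128)      # would be fed to the trained decoder
```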
As a preferable technical scheme, the detector and the teacher network have the same network structure, comprising a convolution layer, a batch normalization layer and a ReLU, followed by three groups of units formed by skip-connected residual blocks built from stacked convolution layers, a global average pooling layer, and a fully connected layer which outputs the classification vector.
As a preferred technical solution, the labeled sample, the unlabeled sample and the enhanced unlabeled sample are sent to a teacher learning module to obtain a teacher semi-supervision loss, a pseudo label of unlabeled data and a teacher enhanced unlabeled loss, and the specific steps include:
the labeled sample $\{x_l,y_l\}$ is input to the teacher network $T$ with parameters $\theta_T$, which outputs the teacher labeled prediction result $T(x_l;\theta_T)$; cross entropy is calculated with the real label $y_l$ to obtain the teacher labeled loss:

$$\mathcal{L}_{T}^{sup}=\mathrm{CE}\big(T(x_l;\theta_T),\,y_l\big)$$

where CE denotes the cross entropy loss;
the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are input to the teacher network $T$ respectively to obtain the teacher unlabeled prediction result $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$; the cross entropy between the two is calculated to obtain the teacher unlabeled loss:

$$\mathcal{L}_{T}^{uns}=\mathrm{CE}\big(T(x_u;\theta_T),\,T(\hat{x}_u;\theta_T)\big)$$
the category with the maximum value is extracted from the teacher unlabeled prediction result $T(x_u;\theta_T)$ as the pseudo label:

$$\hat{y}_u=\arg\max_{c}\,T(x_u;\theta_T)_c$$
the teacher labeled loss $\mathcal{L}_T^{sup}$ and the teacher unlabeled loss $\mathcal{L}_T^{uns}$ are summed with weights to obtain the teacher semi-supervised loss:

$$\mathcal{L}_{T}^{semi}=\mathcal{L}_{T}^{sup}+\lambda\,\frac{s}{s_{tl}}\,\mathcal{L}_{T}^{uns}$$

where $s$ is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss;
for the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$, the class $h=\arg\max_c T(\hat{x}_u;\theta_T)_c$ to which its own maximum value belongs is taken as the target of a cross entropy function to obtain the teacher enhanced unlabeled loss:

$$\mathcal{L}_{T}^{enh}=\mathrm{CE}\big(T(\hat{x}_u;\theta_T),\,h\big)$$
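A compact sketch of the three teacher-side quantities (semi-supervised loss, pseudo label, enhanced unlabeled loss) on a single example, working directly on logits. The linear ramp-up of the unlabeled-loss weight is an assumption suggested by the s/s_tl variables in the text:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, target_idx):
    """Cross entropy against a hard label index."""
    return -np.log(softmax(logits)[target_idx])

def teacher_losses(logits_l, y_l, logits_u, logits_u_aug, step, total_steps, lam):
    """Labeled loss on (x_l, y_l); consistency loss between the clean and
    enhanced unlabeled predictions; pseudo label from the clean view; and
    the enhanced unlabeled loss on the augmented view."""
    l_sup = cross_entropy(logits_l, y_l)
    p_clean = softmax(logits_u)
    l_unsup = -np.sum(p_clean * np.log(softmax(logits_u_aug)))  # soft CE
    l_semi = l_sup + lam * (step / total_steps) * l_unsup
    pseudo = int(np.argmax(logits_u))             # pseudo label for x_u
    h = int(np.argmax(logits_u_aug))
    l_enh = cross_entropy(logits_u_aug, h)        # enhanced unlabeled loss
    return l_semi, pseudo, l_enh
```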
as a preferred technical solution, the step of sending the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to a detector learning module to update the detector parameters to obtain a detector update loss includes:
the enhanced unlabeled sample $\hat{x}_u$ is fed to the detector $D$ with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction result $D(\hat{x}_u;\theta_D)$; cross entropy with the pseudo label $\hat{y}_u$ gives the detector enhanced unlabeled loss:

$$\mathcal{L}_{D}=\mathrm{CE}\big(D(\hat{x}_u;\theta_D),\,\hat{y}_u\big)$$
the detector is optimized on the detector enhanced unlabeled loss by the gradient descent method to obtain the optimized parameters $\theta'_D$:

$$\theta'_D=\theta_D-\eta_D\nabla_{\theta_D}\mathcal{L}_{D}$$

where $\eta_D$ is the detector learning rate;
the labeled sample $(x_l,y_l)$ is fed to the detector with the pre-optimization parameters $\theta_D$ and with the optimized parameters $\theta'_D$ respectively, giving the old detector labeled loss $\mathrm{CE}(D(x_l;\theta_D),y_l)$ and the new detector labeled loss $\mathrm{CE}(D(x_l;\theta'_D),y_l)$; their difference is the detector update loss:

$$\mathcal{L}_{fb}=\mathrm{CE}\big(D(x_l;\theta_D),y_l\big)-\mathrm{CE}\big(D(x_l;\theta'_D),y_l\big)$$
as an optimized technical scheme, the method for updating the teacher network parameters by utilizing the teacher semi-supervised loss, the teacher enhanced label-free loss and the detector updating loss comprises the following specific steps:
the detector update loss $\mathcal{L}_{fb}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{enh}$ and added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss, and the teacher network is optimized by the gradient descent method:

$$\theta'_T=\theta_T-\eta_T\nabla_{\theta_T}\big(\mathcal{L}_{fb}\cdot\mathcal{L}_{T}^{enh}+\mathcal{L}_{T}^{semi}\big)$$

where $\theta_T$ denotes the teacher network parameters before optimization, $\theta'_T$ the parameters after optimization, $\eta_T$ the teacher network learning rate, and $\nabla$ the gradient calculation.
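The detector-update-and-feedback loop described above follows a Meta-Pseudo-Labels-style scheme: the teacher is rewarded when its pseudo labels make the detector improve on labeled data. A one-dimensional toy sketch of the inner detector step and the feedback signal (taken here as old labeled loss minus new labeled loss, an assumed sign convention):

```python
def detector_feedback_step(theta_d, grad_d, eta_d, labeled_loss_fn):
    """One inner gradient step of the detector on the pseudo-labeled batch,
    then the feedback signal: improvement of the labeled loss (old - new)."""
    theta_d_new = theta_d - eta_d * grad_d
    l_old = labeled_loss_fn(theta_d)
    l_new = labeled_loss_fn(theta_d_new)
    return theta_d_new, l_old - l_new

# toy example: labeled loss (theta - 2)^2, pseudo-label gradient pushing theta toward 2
quad = lambda t: (t - 2.0) ** 2
theta_new, feedback = detector_feedback_step(0.0, grad_d=-4.0, eta_d=0.5,
                                             labeled_loss_fn=quad)
# the step moves theta from 0.0 to 2.0; the labeled loss improves, so feedback > 0
```

A positive feedback scales up the teacher's enhanced unlabeled loss in the teacher update, reinforcing pseudo-label behavior that actually helped the detector.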
The invention also provides a living body detection system based on double decoupling generation and semi-supervised learning, which comprises: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
The data preprocessing module is used for extracting face area images from the input images to obtain RGB color channel images, matching each true image of an original sample of the RGB color channel images to be trained with a false image with the same identity to form a true-false image pair, and labeling the false images with attack type labels;
the double decoupling generator construction module is used for constructing a real encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double-decoupling generator training module is used for sending the true image to the true man encoder to obtain a true man identity vector, sending the paired false image to the false man encoder to obtain a false man identity vector and a false man mode vector, merging the true man identity vector, the false man identity vector and the false man mode vector, sending the merged true man identity vector, the false man identity vector and the false man mode vector to the decoder to output a reconstructed true and false image pair, and constructing a double-decoupling generation loss function to optimize;
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the non-supervision data enhancement module is used for cutting a training set formed by an original sample and a generated sample, and constructing a label sample, a non-label sample and an enhanced non-label sample;
The detector construction module and the teacher network construction module are respectively used for constructing a detector and a teacher network;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss;
the detector learning module is used for sending the label sample, the enhanced label-free sample and the pseudo label of the label-free sample to the detector learning module to update the detector parameters so as to obtain the detector updating loss;
the network parameter updating module is used for updating the parameters of the teacher network by utilizing the semi-supervised loss of the teacher, the enhanced label-free loss of the teacher and the updating loss of the detector, iteratively updating the parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the RGB color channel diagram of the verification set face to the trained detector to obtain classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and omission rate, and taking the thresholds when the false alarm rate and the omission rate are equal as test judgment thresholds;
The testing module is used for sending the RGB color channel diagram of the face of the testing set to the trained detector to obtain classification scores, obtaining a final predicted label value according to the testing judgment threshold value, and calculating a reference index.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the data generation stage, the invention constructs a double decoupling generator from a true person encoder, a prosthesis encoder and a decoder, trains it on true/false image pairs of the original samples, and then inputs standard normally distributed sampling noise into the trained decoder of the double decoupling generator to obtain generated samples, which serve as part of the unlabeled data to enrich the diversity of the training data and solve the problem of insufficient training data diversity.
(2) In the training stage, a semi-supervised learning framework with teacher pseudo-label generation and detector feedback is adopted. Specifically, the teacher network provides pseudo labels for the unlabeled data to supervise detector learning; after the detector parameters are updated, the detector's performance is evaluated on the labeled data, and the loss is fed back to the teacher network to optimize the generated pseudo labels. This solves the problem of model training when labeled training data are limited, as well as the label uncertainty caused by blurry generated samples arising from the defects of the variational autoencoder. Highly discriminative characteristics of image blocks are mined and the learning capacity of the network is improved, so the model generalizes better to unseen sample collection environments.
(3) In the detection stage, the model tests loading data to the detector to obtain corresponding classification scores, and the classification results are judged according to the threshold value.
Drawings
FIG. 1 is a flow diagram of a living body detection method based on double decoupling generation and semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a network architecture of a real encoder and a prosthetic encoder according to the present invention;
FIG. 3 is a diagram illustrating a network architecture of a decoder according to the present invention;
FIG. 4 is a schematic diagram of a training phase flow of the dual decoupling generator of the present invention;
FIG. 5 is a schematic diagram of a generating phase flow of the dual decoupling generator of the present invention;
fig. 6 (a) is a schematic diagram of a real person image generated by the dual decoupling generator of the present invention;
FIG. 6 (b) is a schematic diagram of a prosthesis image generated by the dual decoupling generator of the present invention;
FIG. 7 is a schematic diagram of the network architecture of the detector and teacher network of the present invention;
FIG. 8 is an overall framework diagram of semi-supervised learning of the present invention;
fig. 9 is an overall block diagram of a living body detection system based on double decoupling generation and semi-supervised learning of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment uses the Replay-Attack, CASIA-MFSD, and MSU-MFSD liveness detection data sets for training and testing, and details the implementation accordingly. The Replay-Attack data set comprises 1200 videos; real faces from 50 testers, and the spoofed faces generated from them, were collected with a MacBook camera at a resolution of 320 x 240 pixels, divided into a training set, a verification set and a test set in a ratio of 3:3:4. The CASIA-MFSD data set comprises 600 videos; three cameras with resolutions of 640 x 480, 480 x 640 and 1920 x 1080 pixels were used to collect real faces from 50 testers and the spoofed faces generated from them, divided into a training set and a test set in a ratio of 2:3. The MSU-MFSD data set comprises 280 videos, collecting real faces from 35 testers (15 for the training set and 20 for the test set) and the spoofed faces generated from them. Since the CASIA-MFSD and MSU-MFSD data sets contain no verification set, this embodiment uses the corresponding test set as the verification set for threshold determination on these two data sets. The videos of the data sets are then framed to obtain pictures. The embodiment runs on a Linux system and is mainly implemented on the deep learning framework PyTorch 1.6.1, with a GTX1080Ti graphics card, CUDA version 10.1.105, and cudnn version 7.6.4.
As shown in fig. 1, the present embodiment provides a living body detection method based on double decoupling generation and semi-supervised learning, which includes the following steps:
s1: the face region image is extracted from the input image to obtain an RGB color channel image;
in this embodiment, the specific steps include: a face region is detected in the input image with the MTCNN face detection algorithm, then cropped and resized to a uniform size to obtain a face image; the face image is in RGB format with red, green, and blue color channels. Each true image of the RGB color channel diagram original samples to be trained is then paired with a false image of the same identity to form a true-false image pair, and the false image is labeled with an attack type label;
s2: constructing a real encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
in this embodiment, as shown in fig. 2, the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure. The input size is set to H×W×3 and is first changed to H×W×32 by a convolution layer, an instance normalization layer, and a LeakyReLU; the features then pass through five units, each composed of a pooling layer and a residual block (stacked convolution layers with a skip connection), whose output sizes are 1/2, 1/4, 1/8, 1/16, and 1/32 of the original size, with 64, 128, 256, 512, and 512 channels respectively. After the backbone output features are obtained, the real-person encoder outputs a hidden-layer real-person vector of size 2×hdim through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector of size 4×hdim through a fully connected layer.
As shown in fig. 3, the decoder backbone is built with convolution layers, upsampling layers, a Sigmoid activation layer, and residual blocks. The input size is set to 3×hdim and is first reshaped by a fully connected layer; the features then pass through six units, each composed of a residual block (stacked convolution layers with a skip connection) and an upsampling layer, whose output sizes are 1/32, 1/16, 1/8, 1/4, 1/2, and 1 of the original size, with channel numbers decreasing through 512, 256, 128, and 64; finally, an image pair of size H×W×6 is output through a convolution layer.
S3: the method comprises the steps of constructing a double decoupling generation module, wherein the double decoupling generation module consists of a double decoupling generator and a double decoupling generator loss function, the double decoupling generator consists of a real person encoder, a false body encoder and a decoder, transmitting a real image to the real person encoder to obtain a real person identity vector, transmitting a matched false image to the false body encoder to obtain a false body identity vector and a false body mode vector, merging the three to transmit the false image pair reconstructed by the decoder output, and constructing a double decoupling generation loss function to optimize;
As shown in fig. 4, the true image and the false image of the input pair, each of size H×W×3, are fed to the real-person encoder and the prosthesis encoder respectively. Let the hidden vector dimension be n. The real-person encoder produces a real-person hidden vector of dimension 2×n, containing the n-dimensional real-person identity vector mean μ_t and variance σ_t. The prosthesis encoder produces a prosthesis hidden vector of dimension 4×n, containing the n-dimensional prosthesis identity vector mean μ_f and variance σ_f, and the prosthesis mode vector mean μ_m and variance σ_m. The real-person identity variable component ε_t, the prosthesis identity variable component ε_f, and the prosthesis mode variable component ε_m are then sampled from a standard normal distribution, and the re-parameterization operation shown below yields the real-person identity vector z_t, the prosthesis identity vector z_f, and the prosthesis mode vector z_m:

z_t = μ_t + σ_t ⊙ ε_t,  z_f = μ_f + σ_f ⊙ ε_f,  z_m = μ_m + σ_m ⊙ ε_m
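Under stated assumptions (a log-variance parameterisation and hidden dimension n = 128, both illustrative rather than fixed by this embodiment), the re-parameterization step can be sketched in NumPy:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I): the re-parameterization
    trick, which keeps the sampling step differentiable w.r.t. mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
n = 128                    # hypothetical hidden dimension
mu = np.zeros(n)           # encoder-predicted mean
log_var = np.zeros(n)      # encoder-predicted log-variance (sigma = 1 here)
z = reparameterize(mu, log_var, rng)
```

The same routine would be applied three times per image pair: once for the real-person identity vector and once each for the prosthesis identity and mode vectors.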
The real-person identity vector z_t, the prosthesis identity vector z_f, and the prosthesis mode vector z_m are combined into a hidden vector of dimension 3×n and input to the decoder, which outputs a reconstructed true-false image pair of size H×W×6, comprising the reconstructed true image x̂_t and the reconstructed false image x̂_f, each of size H×W×3.
To better learn the prosthesis modes of different attack types, the prosthesis mode vector z_m is fed into a fully connected layer fc(·) to output the prosthesis class vector ŷ_s = fc(z_m), and the cross entropy with the true prosthesis class label y_s gives the classification loss L_cls, as shown below:

L_cls = −Σ_{i=1}^{k} Y_{s,i} log( softmax(ŷ_s)_i )

where k is the number of prosthesis classes and Y_s is the one-hot encoded prosthesis class label vector.
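A sketch of how this classification loss could be computed (softmax over the fc output, then cross entropy against a one-hot label; averaging over the batch is an assumption):

```python
import numpy as np

def one_hot(y, k):
    """Convert integer class labels to one-hot vectors Y_s."""
    out = np.zeros((len(y), k))
    out[np.arange(len(y)), y] = 1.0
    return out

def classification_loss(logits, y, k):
    """Mean cross entropy between one-hot labels and softmax(logits)."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    p = e / e.sum(axis=1, keepdims=True)
    Y = one_hot(y, k)
    return float(-(Y * np.log(p + 1e-12)).sum(axis=1).mean())

logits = np.array([[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]])  # fc(z_m) for 2 samples
loss = classification_loss(logits, np.array([0, 1]), k=3)
```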
To effectively separate the prosthesis identity vector z_f from the prosthesis mode vector z_m, an angular orthogonality constraint is added between the two, giving the orthogonality loss L_ort, as shown below:

L_ort = ( ⟨z_f, z_m⟩ / ( ‖z_f‖ · ‖z_m‖ ) )²

where ⟨·,·⟩ denotes the inner product.
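One plausible reading of the angular orthogonality constraint is the squared cosine similarity, which vanishes when the two vectors are orthogonal; this concrete form is an assumption, not the patent's stated formula:

```python
import numpy as np

def orthogonality_loss(z_id, z_mode, eps=1e-12):
    """Squared cosine of the angle between the prosthesis identity and mode
    vectors: 0 when orthogonal, 1 when parallel."""
    cos = np.dot(z_id, z_mode) / (np.linalg.norm(z_id) * np.linalg.norm(z_mode) + eps)
    return float(cos ** 2)

orthogonal = orthogonality_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
parallel = orthogonality_loss(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
```

Minimizing this term pushes the identity and mode subspaces apart without constraining the magnitudes of the vectors.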
For the reconstructed true image x̂_t and the reconstructed false image x̂_f, the reconstruction loss L_rec is computed against the corresponding original images x_t and x_f, as shown below:

L_rec = ‖x_t − x̂_t‖ + ‖x_f − x̂_f‖
To maintain identity consistency of the hidden vectors, the maximum mean discrepancy loss L_mmd is computed between the n-dimensional real-person identity vector z_t and the prosthesis identity vector z_f, as shown below:

L_mmd = E[ k(z_t, z_t′) ] + E[ k(z_f, z_f′) ] − 2·E[ k(z_t, z_f) ]

where k(·,·) is a kernel function.
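A biased (V-statistic) estimate of the squared MMD with an RBF kernel illustrates the computation; the kernel choice and the bandwidth gamma are assumptions, since the embodiment does not specify them:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0 / 16.0):
    """Biased estimate of MMD^2 with kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def gram(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-gamma * d)
    return float(gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean())

rng = np.random.default_rng(1)
same = mmd_rbf(rng.normal(size=(64, 8)), rng.normal(size=(64, 8)))           # same distribution
far = mmd_rbf(rng.normal(size=(64, 8)), rng.normal(loc=3.0, size=(64, 8)))   # shifted distribution
```

Minimizing this quantity between the real-person and prosthesis identity vectors encourages the two identity distributions to coincide.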
To maintain identity consistency of the reconstructed images, the pairing loss L_pair is computed between the reconstructed true image x̂_t and the reconstructed false image x̂_f, as shown below:

L_pair = ‖x̂_t − x̂_f‖
To make the distribution of the hidden vectors fit a standard normal distribution, a constraint is imposed by computing the Kullback-Leibler divergence (KL divergence), as shown below:

L_kl = −(1/2) Σ ( 1 + log σ² − μ² − σ² )

summed over the means and variances output by the two encoders.
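For a diagonal Gaussian the KL divergence to the standard normal has a closed form, which can be checked directly:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """D_KL( N(mu, diag(sigma^2)) || N(0, I) )
    = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)."""
    return float(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var)))

zero = kl_to_standard_normal(np.zeros(4), np.zeros(4))    # N(0, I) vs N(0, I)
shifted = kl_to_standard_normal(np.ones(4), np.zeros(4))  # mean shifted by 1 per dim
```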
Finally, the above losses are weighted and summed to obtain the double decoupling generator loss function L_G, where λ_1, λ_2, λ_3, λ_4 are the weights, with optimal values of 10, 0.5, 0.1, and 1 respectively.
An Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder, and the decoder with the goal of minimizing the double decoupling generator loss function L_G, iteratively training for T generations; the optimal value of T is 200. The parameter update formulas are as follows:

m_{t+1} = β_1 m_t + (1 − β_1) g

v_{t+1} = β_2 v_t + (1 − β_2) g²

θ_{t+1} = θ_t − η · m̂_{t+1} / ( √(v̂_{t+1}) + ε ),  with m̂_{t+1} = m_{t+1} / (1 − β_1^{t+1}) and v̂_{t+1} = v_{t+1} / (1 − β_2^{t+1})

where g is the gradient, β_1 and β_2 are optimally set to 0.9 and 0.999, the parameter ε for preventing division by zero is optimally set to 1e-8, the learning rate η is optimally set to 0.0002, and θ_t denotes the parameters at the t-th iteration.
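A single bias-corrected Adam step with the hyperparameters stated above can be sketched as a NumPy stand-in for the PyTorch optimizer:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates, bias correction, then a
    scaled gradient step. t is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([0.5]), m, v, t=1)
```

On the very first step the bias-corrected update magnitude is approximately the learning rate, here 2e-4.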
S4: inputting the standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
as shown in fig. 5, 6(a) and 6(b), the hidden vector dimension is set to n. Random noise of dimension n is sampled twice from the standard normal distribution N(0, I) to obtain the reconstructed prosthesis identity hidden vector and the reconstructed prosthesis mode hidden vector. To ensure identity consistency of the true and false images, the reconstructed real-person identity hidden vector is directly copied from the reconstructed prosthesis identity hidden vector. The true-false image pairs generated by the decoder from these vectors serve as the generated samples; fig. 6(a) and 6(b) show true-false image pairs generated by the double decoupling generator. In this embodiment the optimal value of the dimension n is 128, and the optimal number of generated samples is 6400.
S5: cutting a training set formed by an original sample and a generated sample, dividing the training set into a label sample and a label-free sample, and carrying out random data enhancement on the label-free sample to obtain an enhanced label-free sample;
in this embodiment, firstly, an image block of size H×W is randomly cropped from each RGB color channel diagram to be trained. N samples are randomly selected from the original samples to be trained as labeled data; the remaining original samples together with the generated samples form the unlabeled data, with the number ratio of labeled to unlabeled samples being μ. Random data enhancement is then applied to the unlabeled samples to obtain the enhanced unlabeled samples. The data enhancement methods include: maximizing contrast, adjusting brightness, adjusting color balance, adjusting contrast, adjusting sharpness, cutout, histogram equalization, color inversion, posterization (zeroing the 0-4 lowest bits of the pixel values), random rotation, horizontal shear, vertical shear, horizontal translation, vertical translation, and solarization (inverting all pixel values above a threshold); two of these methods are randomly selected for each sample. The preferred values of H, W, N, and μ in this embodiment are 256, 256, 6000, and 4;
S6: constructing a detector and teacher network;
as shown in FIG. 7, the teacher network and the detector have the same network structure. The input size is H×W×3, first changed to H×W×16 by a convolution layer, a batch normalization layer, and a ReLU, and then passed through three units composed of residual blocks (stacked convolution layers with skip connections), whose output sizes are 1, 1/2, and 1/4 of the original size, with 32, 64, and 128 channels. The resulting feature map of size (H/4)×(W/4)×128 is reduced to 1×1×128 by a global average pooling layer, and finally a classification vector of dimension 2 is output through a fully connected layer.
S7: constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and enhanced unlabeled loss:
as shown in fig. 8, CE(q, p) is first agreed to denote the cross entropy loss of two distributions q and p; if q is a label value, it is first converted into a label vector by one-hot encoding. Let k be the number of label classes; the cross entropy loss is expressed as:

CE(q, p) = −Σ_{i=1}^{k} q_i log p_i

Secondly, argmax(v) is agreed to denote the index of the maximum value in the vector v;
for labeled samples { x l ,y l } with a feeding parameter of θ T Teacher network T outputs teacher tagged prediction result T (x l ;θ T ) With the real label y l Calculating cross entropy to obtain teacher label lossThe following formula.
The unlabeled sample x_u and the enhanced unlabeled sample Â(x_u) obtained by one data enhancement are fed into the teacher network T respectively, giving the teacher unlabeled prediction T(x_u; θ_T) and the teacher enhanced unlabeled prediction T(Â(x_u); θ_T). The cross entropy of the two results gives the teacher unlabeled loss, and the class of the maximum value of T(x_u; θ_T) is taken as the pseudo label ŷ_u, as shown below:

L_T^u = CE( T(x_u; θ_T), T(Â(x_u); θ_T) ),  ŷ_u = argmax( T(x_u; θ_T) )
The teacher labeled loss L_T^l and the teacher unlabeled loss L_T^u are then weighted and summed to obtain the teacher semi-supervised loss, as shown below:

L_T^semi = L_T^l + λ_s · L_T^u

where λ_s is the unlabeled-loss weight, ramped up from 0 toward λ according to the ratio of the current step number s to the total number of steps s_tl.
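A minimal sketch of the pseudo-labelling and the ramped weighting; the linear ramp min(1, s/s_tl) is an illustrative assumption, since the embodiment only states that the weight depends on s, s_tl, and λ:

```python
import numpy as np

def pseudo_label(probs):
    """Class index of the maximum teacher prediction, used as the pseudo label."""
    return int(np.argmax(probs))

def teacher_semi_supervised_loss(loss_labeled, loss_unlabeled, s, s_tl, lam=1.0):
    """Labeled loss plus the unlabeled loss, whose weight ramps up with the
    current step s relative to the total steps s_tl (assumed linear ramp)."""
    return loss_labeled + lam * min(1.0, s / s_tl) * loss_unlabeled

label = pseudo_label(np.array([0.1, 0.7, 0.2]))
loss = teacher_semi_supervised_loss(1.0, 2.0, s=50, s_tl=100)
```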
For the teacher enhanced unlabeled prediction T(Â(x_u); θ_T), the class h of its own maximum value is taken, and the cross entropy function gives the teacher enhanced unlabeled loss, as shown below:

L_T^en = CE( h, T(Â(x_u); θ_T) )
S8: a detector learning module is constructed, and a label sample, an enhanced label-free sample and a pseudo label of the label-free sample are sent to the detector learning module to update detector parameters, so that detector updating loss is obtained;
As shown in fig. 8, the enhanced unlabeled sample Â(x_u) is fed into the detector D with parameters θ_D to obtain the detector enhanced unlabeled prediction D(Â(x_u); θ_D), and the cross entropy with the pseudo label ŷ_u gives the detector enhanced unlabeled loss, as shown below:

L_D^u = CE( ŷ_u, D(Â(x_u); θ_D) )
Let the detector learning rate be η_D. The detector is optimized by gradient descent according to the detector enhanced unlabeled loss L_D^u, giving the optimized parameters θ′_D, expressed as:

θ′_D = θ_D − η_D · ∇_{θ_D} L_D^u
The labeled sample (x_l, y_l) is fed both into the detector with the pre-optimization parameters θ_D and into the detector with the optimized parameters θ′_D, giving the old-detector labeled loss CE(y_l, D(x_l; θ_D)) and the new-detector labeled loss CE(y_l, D(x_l; θ′_D)); their difference gives the detector update loss, as shown below:

L_D^upd = CE( y_l, D(x_l; θ_D) ) − CE( y_l, D(x_l; θ′_D) )
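The detector update loss acts as a one-step lookahead signal: labeled loss before minus after a gradient step on the unlabeled loss. A toy quadratic makes the mechanics concrete (the loss function and step size here are illustrative, not the patent's):

```python
import numpy as np

def detector_update_loss(theta, grad_unlabeled, labeled_loss_fn, lr=0.01):
    """Labeled loss at theta minus labeled loss after one gradient-descent step
    on the unlabeled loss; positive when the unlabeled step helped."""
    theta_new = theta - lr * grad_unlabeled
    return labeled_loss_fn(theta) - labeled_loss_fn(theta_new)

labeled_loss = lambda th: float(np.sum(th ** 2))  # toy labeled loss, minimum at 0
delta = detector_update_loss(np.array([1.0]), np.array([1.0]), labeled_loss)
```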
S9: updating teacher network parameters by using teacher semi-supervised loss, enhanced non-label loss and detector updating loss:
as shown in fig. 8, the detector update loss L_D^upd is multiplied by the teacher enhanced unlabeled loss L_T^en, and the teacher semi-supervised loss L_T^semi is then added to form the teacher loss:

L_T = L_D^upd · L_T^en + L_T^semi

Let the teacher network learning rate be η_T; the teacher network is optimized by gradient descent, expressed as:

θ_T ← θ_T − η_T · ∇_{θ_T} L_T
S10: iteratively updating the network parameters of the detector and the teacher network with the optimizers according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed:
in this embodiment, the teacher network and the detector both use SGD optimizers with Nesterov momentum, where the momentum μ has a preferred value of 0.9 and the initial learning rate ε_0 has a preferred value of 0.05; the learning rate decays with the number of training iterations;
according to the detector enhanced unlabeled loss in the detector learning module, the detector optimizer optimizes with the goal of minimizing the loss; according to the teacher loss in the teacher update module, the teacher network optimizer optimizes with the goal of minimizing the loss.
S11: determining a threshold using the validation set;
in this embodiment, the specific steps include: sending the RGB color channel diagram of the face of the verification set to a detector to obtain a classification fraction p, then carrying out equidistant sampling in a value range (0, 1) to obtain different judgment thresholds, obtaining a predicted label value according to the thresholds, comparing the predicted label value with a real label, calculating a false alarm rate and a false omission rate, and taking the thresholds when the false alarm rate and the false omission rate are equal as a test judgment threshold T;
S12: testing the model;
in this embodiment, the specific steps include: and sending the RGB color channel diagram of the face of the test set to a detector to obtain a classification score p, obtaining a final predicted label value according to a test decision threshold T, and calculating a reference index.
The performance evaluation indexes of the living body detection method of this embodiment adopt the False Acceptance Rate (FAR), False Rejection Rate (FRR), True Acceptance Rate (TAR), Equal Error Rate (EER), and Half Total Error Rate (HTER). These indexes are described in detail with the confusion matrix of Table 1:
TABLE 1 confusion matrix table
Label/predict | Predicted to be true | Prediction as false |
The tag is true | TA | FR |
The label being false | FA | TR |
The False Acceptance Rate (FAR) is the proportion of non-living faces that are predicted to be living:

FAR = FA / (FA + TR)

The False Rejection Rate (FRR) is the proportion of living faces that are predicted to be non-living:

FRR = FR / (TA + FR)

The True Acceptance Rate (TAR) is the proportion of living faces that are correctly predicted to be living:

TAR = TA / (TA + FR)
the Equal Error Rate (EER) is the error rate when the FRR is equal to the FAR;
The Half Total Error Rate (HTER) is the average of FRR and FAR:

HTER = (FAR + FRR) / 2
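The four rates follow directly from the confusion-matrix counts of Table 1; a sketch with hypothetical counts:

```python
def rates(ta, fr, fa, tr):
    """FAR, FRR, TAR and HTER from the confusion-matrix counts TA, FR, FA, TR."""
    far = fa / (fa + tr)       # spoof faces accepted as live
    frr = fr / (ta + fr)       # live faces rejected as spoof
    tar = ta / (ta + fr)       # live faces correctly accepted
    hter = (far + frr) / 2.0   # half total error rate
    return far, frr, tar, hter

far, frr, tar, hter = rates(ta=90, fr=10, fa=5, tr=95)  # hypothetical counts
```

Note that TAR = 1 − FRR, so only two of the three per-class rates are independent.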
In order to prove the effectiveness of the invention and to test the generalization performance of the method, in-library experiments and cross-library experiments are respectively carried out on the CASIA-MFSD, Replay-Attack, and MSU-MFSD databases. The in-library experimental results and the cross-library experimental results are shown in Tables 2 and 3, respectively:
TABLE 2 results of in-library experiments
TABLE 3 Cross-library experimental results
As can be seen from Table 2, the half total error rate and equal error rate of the method within each library are low, showing excellent in-library spoof detection performance; as can be seen from Table 3, the half total error rate of cross-library detection is also low. Although the training set consists of labeled samples and unlabeled samples extracted from only a few frames of each training video, the diversity of the training data is enriched by the samples generated by the double decoupling generator, and the model is progressively trained through meta-learning, improving its ability to learn from limited samples and to generalize. The experimental results prove that, under the condition of insufficient labeled training samples, the method maintains high in-library accuracy while greatly reducing the cross-library error rate and significantly improving generalization performance.
As shown in fig. 9, the present embodiment further provides a living body detection system based on double decoupling generation and semi-supervised learning, including: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
In this embodiment, the data preprocessing module is configured to extract a face area image from an input image to obtain an RGB color channel image, pair each true image of an original sample of the RGB color channel image to be trained with a false image of the same identity to form a true-false image pair, and label the false image with an attack type label;
in this embodiment, the double-decoupling generator building module is configured to build a real encoder, a prosthetic encoder, and a decoder to form a double-decoupling generator;
in this embodiment, the double-decoupling generator training module is configured to send the true image to the real-person encoder to obtain the real-person identity vector, send the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis mode vector, and combine the three and send them to the decoder to output the reconstructed true-false image pair, constructing the double decoupling generation loss function for optimization;
in this embodiment, the generated sample construction module is configured to obtain a generated sample from the standard normal distributed sampling noise input to the trained decoder;
in this embodiment, the unsupervised data enhancing module is configured to cut a training set formed by an original sample and a generated sample, and construct a labeled sample, an unlabeled sample, and an enhanced unlabeled sample;
In this embodiment, the detector building module and the teacher network building module are respectively configured to build a detector and a teacher network;
in this embodiment, the teacher learning module is configured to send the labeled sample, the unlabeled sample, and the enhanced unlabeled sample to the teacher learning module, so as to obtain a teacher semi-supervised loss, a pseudo-label of unlabeled data, and a teacher enhanced unlabeled loss;
in this embodiment, the detector learning module is configured to send the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters, so as to obtain a detector update loss;
in this embodiment, the network parameter updating module is configured to update the parameters of the teacher network by using the semi-supervised loss of the teacher, the enhanced label-free loss of the teacher, and the updating loss of the detector, iteratively update the parameters of the detector and the teacher network by using the optimizer according to the loss function, and save the parameters of the teacher network and the detector after training is completed;
in this embodiment, the verification module is configured to send the verification set face RGB color channel diagram to the trained detector to obtain a classification score, obtain a predicted tag value according to different decision thresholds, compare the predicted tag value with a real tag, calculate a false alarm rate and a false omission rate, and take a threshold value when the two are equal as a test decision threshold value;
In this embodiment, the test module is configured to send the RGB color channel chart of the face to the trained detector to obtain the classification score, obtain the final predicted label value according to the test decision threshold, and calculate the reference index.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.
Claims (6)
1. The living body detection method based on double decoupling generation and semi-supervised learning is characterized by comprising the following steps of:
the face region image is extracted from the input image to obtain an RGB color channel image;
pairing each true image of an RGB color channel diagram original sample to be trained with a false image of the same identity to form a true-false image pair, and labeling the false image with an attack type label;
constructing a real encoder, a prosthesis encoder and a decoder;
the method for constructing the real encoder, the prosthesis encoder and the decoder comprises the following specific steps:
the main network of each of the real encoder and the prosthetic encoder adopts the same network structure, and is provided with a convolution layer, an example normalization layer and a LeakyReLU, and five groups of units consisting of a pooling layer, a convolution layer stack and residual blocks connected in a skip level;
The true man encoder outputs a hidden layer true man vector through the full connection layer, and the false body encoder outputs a hidden layer false body vector through the full connection layer;
the main network of the decoder is built by adopting a convolution layer, an up-sampling layer, a Sigmoid activation layer and residual blocks, a hidden layer true human vector and a hidden layer false vector which are output by a true human encoder and a false body encoder are firstly input into a full connection layer, pass through six groups of units consisting of the residual blocks and the up-sampling layer which are stacked and connected by a convolution layer, and finally output an image pair through the convolution layer;
the true image is sent to a true person encoder to obtain a true person identity vector, the paired false image is sent to a false body encoder to obtain a false body identity vector and a false body mode vector, the true person identity vector, the false body identity vector and the false body mode vector are combined and sent to a decoder to output a reconstructed true and false image pair, and a double decoupling generation loss function is constructed to optimize;
the specific steps of sending the true image to the real-person encoder to obtain the real-person identity vector, and sending the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis mode vector, comprise:
the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean μ_t and variance σ_t;
the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean μ_f and variance σ_f, and the prosthesis mode vector mean μ_m and variance σ_m;
the real-person identity variable component ε_t, the prosthesis identity variable component ε_f, and the prosthesis mode variable component ε_m are sampled from a standard normal distribution;
the re-parameterization operation gives the real-person identity vector z_t, the prosthesis identity vector z_f, and the prosthesis mode vector z_m, specifically:
z_t = μ_t + σ_t ⊙ ε_t,  z_f = μ_f + σ_f ⊙ ε_f,  z_m = μ_m + σ_m ⊙ ε_m
the real-person identity vector z_t, the prosthesis identity vector z_f, and the prosthesis mode vector z_m are combined into a hidden vector and input to the decoder, which outputs the reconstructed true-false image pair comprising the reconstructed true image x̂_t and the reconstructed false image x̂_f;
the specific steps of combining the real-person identity vector, the prosthesis identity vector, and the prosthesis mode vector, sending them to the decoder to output the reconstructed true-false image pair, and constructing the double decoupling generation loss function for optimization, comprise:
the prosthesis mode vector z_m is fed into the fully connected layer fc(·) to output the prosthesis class vector ŷ_s, and the cross entropy with the true prosthesis class label y_s gives the classification loss L_cls, expressed as:
L_cls = −Σ_{i=1}^{k} Y_{s,i} log( softmax(ŷ_s)_i )
where k is the number of prosthesis classes and Y_s is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector z_f and the prosthesis mode vector z_m to obtain the orthogonality loss L_ort, expressed as:
L_ort = ( ⟨z_f, z_m⟩ / ( ‖z_f‖ · ‖z_m‖ ) )²
the reconstruction loss L_rec is computed between the reconstructed true image x̂_t, the reconstructed false image x̂_f, and the corresponding original images, expressed as:
L_rec = ‖x_t − x̂_t‖ + ‖x_f − x̂_f‖
the maximum mean discrepancy loss L_mmd is computed between the real-person identity vector z_t and the prosthesis identity vector z_f, expressed as:
L_mmd = E[ k(z_t, z_t′) ] + E[ k(z_f, z_f′) ] − 2·E[ k(z_t, z_f) ]
the pairing loss L_pair is computed between the reconstructed true image x̂_t and the reconstructed false image x̂_f, expressed as:
L_pair = ‖x̂_t − x̂_f‖
a constraint is imposed by computing the KL divergence, as shown below:
L_kl = −(1/2) Σ ( 1 + log σ² − μ² − σ² )
the losses are weighted and summed to obtain the double decoupling generator loss function L_G, where λ_1, λ_2, λ_3, λ_4 denote the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder, and the decoder with the goal of minimizing the double decoupling generator loss function L_G;
inputting the standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the specific steps of inputting standard normal distribution sampling noise to the trained decoder to obtain generated samples comprise:
let the hidden vector dimension be n; random noise of dimension n is sampled twice from the standard normal distribution N(0, I) to obtain the reconstructed prosthesis identity hidden vector and the reconstructed prosthesis mode hidden vector;
the reconstructed real-person identity hidden vector is directly copied from the reconstructed prosthesis identity hidden vector, and the true-false image pairs then generated by the decoder serve as the generated samples;
cutting a training set formed by an original sample and a generated sample, and constructing a label sample, a label-free sample and an enhanced label-free sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss;
a detector learning module is constructed, and a label sample, an enhanced label-free sample and a pseudo label of the label-free sample are sent to the detector learning module to update detector parameters, so that detector updating loss is obtained;
updating teacher network parameters by using teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the RGB color channel diagram of the face of the verification set to a trained detector to obtain classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and omission rate, and taking the thresholds when the false alarm rate and the omission rate are equal as test judgment thresholds;
Sending the RGB color channel diagram of the face of the test set to a trained detector to obtain classification scores, obtaining a final predicted label value according to a test decision threshold value, and calculating a reference index.
2. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the detector and the teacher network have the same network structure, and are provided with a convolution layer, a batch normalization layer, a ReLU, three groups of units formed by residual blocks of convolution layer stack and skip level connection, a global averaging pooling layer and a full connection layer, and the full connection layer outputs classification vectors.
3. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the steps of sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to a teacher learning module to obtain a teacher semi-supervised loss, a pseudo label of unlabeled data and a teacher enhanced unlabeled loss comprise:
the labeled sample {x_l, y_l} is fed into the teacher network T with parameters θ_T, which outputs the teacher labeled prediction T(x_l; θ_T); the cross entropy with the real label y_l gives the teacher labeled loss, specifically:
L_T^l = CE( y_l, T(x_l; θ_T) )
where CE denotes the cross entropy loss;
the unlabeled sample x_u and the enhanced unlabeled sample Â(x_u) are fed into the teacher network T respectively to obtain the teacher unlabeled prediction T(x_u; θ_T) and the teacher enhanced unlabeled prediction T(Â(x_u); θ_T); the cross entropy of the two gives the teacher unlabeled loss, specifically:
L_T^u = CE( T(x_u; θ_T), T(Â(x_u); θ_T) )
the class of the maximum value of the teacher unlabeled prediction T(x_u; θ_T) is extracted as the pseudo label, specifically:
ŷ_u = argmax( T(x_u; θ_T) )
the teacher labeled loss L_T^l and the teacher unlabeled loss L_T^u are weighted and summed to obtain the teacher semi-supervised loss, expressed as:
L_T^semi = L_T^l + λ_s · L_T^u
where λ_s is the unlabeled-loss weight determined by the current step number s, the total number of steps s_tl, and the weight λ of the unlabeled loss;
for the teacher enhanced unlabeled prediction T(Â(x_u); θ_T), the class h of its own maximum value is extracted, and the cross entropy function gives the enhanced unlabeled loss, expressed as:
L_T^en = CE( h, T(Â(x_u); θ_T) )
4. the living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the steps of sending the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to a detector learning module to update the detector parameters and obtain the detector update loss include:
the enhanced unlabeled sample x̂_u is fed to the detector D with parameters θ_D to obtain the detector enhanced unlabeled prediction D(x̂_u; θ_D); the cross entropy between this prediction and the pseudo label ŷ_u gives the detector enhanced unlabeled loss L_u^D, specifically:

L_u^D = CE(D(x̂_u; θ_D), ŷ_u);
the detector is optimized on the detector enhanced unlabeled loss L_u^D by gradient descent to obtain the optimized parameters θ'_D, specifically:

θ'_D = θ_D − η∇_{θ_D} L_u^D

wherein η denotes the learning rate;
the labeled sample (x_l, y_l) is respectively fed to the detector with the pre-optimization parameters θ_D and the detector with the optimized parameters θ'_D, obtaining the old detector labeled loss L_old^D = CE(D(x_l; θ_D), y_l) and the new detector labeled loss L_new^D = CE(D(x_l; θ'_D), y_l); their difference gives the detector update loss, specifically:

L_upd^D = L_old^D − L_new^D;
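A minimal sketch of the detector step and update loss, with the detector network replaced by assumed `loss_fn`/`grad_fn` callables (the actual network and optimizer are not specified at this level of the claim):

```python
def detector_update(theta, lr, loss_fn, grad_fn, batch_u, pseudo, batch_l, y_l):
    """loss_fn(theta, x, y) -> scalar loss; grad_fn(theta, x, y) -> gradient.
    Both are assumed helpers standing in for the detector network."""
    # one gradient-descent step on the pseudo-labelled enhanced unlabeled batch
    theta_new = theta - lr * grad_fn(theta, batch_u, pseudo)
    # labeled loss before and after the step; a positive difference means
    # the pseudo labels improved the detector on real labels
    old_loss = loss_fn(theta, batch_l, y_l)
    new_loss = loss_fn(theta_new, batch_l, y_l)
    return theta_new, old_loss - new_loss
```

The returned difference is the detector update loss that is later fed back to the teacher.
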
5. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss comprises the following specific steps:
the detector update loss L_upd^D is multiplied by the teacher enhanced unlabeled loss L_enh^T, the product is weighted, and the result is added to the teacher semi-supervised loss L_semi^T to form the teacher loss L^T; the teacher network is then optimized by gradient descent.
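The teacher loss combination can be sketched as below; the single scalar `weight` is an assumption standing in for the "total weight" mentioned in the claim.

```python
def teacher_total_loss(update_loss, loss_enh, loss_semi, weight=1.0):
    # detector feedback scales the teacher's enhanced unlabeled loss,
    # then the semi-supervised loss is added (meta-pseudo-label style)
    return weight * update_loss * loss_enh + loss_semi
```

The teacher parameters would then be updated by gradient descent on this scalar.
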
6. A living body detection system based on double decoupling generation and semi-supervised learning, comprising: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
the data preprocessing module is used for extracting the face region from the input images to obtain RGB color channel images, matching each true image in the original training samples with a false image of the same identity to form a true-false image pair, and labeling each false image with an attack type label;
the double decoupling generator construction module is used for constructing a real-person encoder, a prosthesis encoder and a decoder to form the double decoupling generator;
the specific steps of constructing the real-person encoder, the prosthesis encoder and the decoder include:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, comprising a convolution layer, an instance normalization layer, a LeakyReLU, and five groups of units each consisting of a pooling layer and a residual block formed by stacked convolution layers with a skip connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone of the decoder is built from convolution layers, up-sampling layers, a Sigmoid activation layer and residual blocks; the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, then pass through six groups of units each consisting of a residual block of stacked convolution layers and an up-sampling layer, and finally an image pair is output through a convolution layer;
the double decoupling generator training module is used for sending the true image to the real-person encoder to obtain a real-person identity vector, sending the paired false image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and sending them to the decoder to output a reconstructed true-false image pair, and constructing a double decoupling generation loss function for optimization;
the specific steps of sending the true image to the real-person encoder to obtain the real-person identity vector, and sending the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis mode vector, include:
the real-person encoder outputs a real-person hidden vector comprising a real-person identity vector mean and a real-person identity vector variance;

the prosthesis encoder outputs a prosthesis hidden vector comprising a prosthesis identity vector mean, a prosthesis identity vector variance, a prosthesis mode vector mean and a prosthesis mode vector variance;
a real-person identity variation component, a prosthesis identity variation component and a prosthesis mode variation component are sampled from the standard normal distribution;
a reparameterization operation is performed to obtain the real-person identity vector, the prosthesis identity vector and the prosthesis mode vector, each computed as z = μ + σ · ε, wherein μ and σ are the corresponding mean and standard deviation and ε is the sampled variation component;
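The reparameterization operation (rendered as "heavy parameter operation" in the translation) is the standard variational-autoencoder trick; a minimal sketch, assuming the encoder outputs the log-variance:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, I): moves the random sampling outside
    # the network so gradients can flow through mu and log_var during training
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * np.asarray(log_var)) * eps
```
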
the real-person identity vector, the prosthesis identity vector and the prosthesis mode vector are combined into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising a reconstructed real-person image and a reconstructed prosthesis image; the specific steps of constructing and optimizing the double decoupling generation loss function include:
the prosthesis mode vector is sent into a fully connected layer fc(·) to output the prosthesis class vector, and the cross entropy with the true prosthesis class label y_s gives the classification loss L_cls, expressed as:

L_cls = −Σ_{i=1..k} Y_s,i · log(softmax(fc(z))_i)

wherein k is the number of prosthesis classes, z denotes the prosthesis mode vector, and Y_s is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector and the prosthesis mode vector to obtain the orthogonality loss L_orth;
the reconstruction loss L_rec is calculated between the reconstructed real-person image, the reconstructed prosthesis image and their corresponding original images;
the maximum mean discrepancy (MMD) loss L_mmd is calculated between the real-person identity vector and the prosthesis identity vector;
the pairing loss L_pair is calculated between the reconstructed real-person image and the reconstructed prosthesis image;
the hidden vectors are constrained by calculating the KL divergence between each encoded distribution N(μ, σ²) and the standard normal distribution, as shown in the following formula:

KL(N(μ, σ²) ‖ N(0, I)) = ½ Σ (σ² + μ² − 1 − log σ²);
wherein λ_1, λ_2, λ_3, λ_4 represent the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the goal of minimizing the double decoupling generator loss function;
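The KL-divergence constraint on the hidden vectors has the standard closed form for a diagonal Gaussian posterior against N(0, I); a sketch, assuming the encoder outputs the log-variance:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent
    # dimensions and averaged over the batch
    kl = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return float(kl.sum(axis=1).mean())
```
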
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the specific steps of inputting standard normal distribution sampling noise to the trained decoder to obtain a generated sample include:
let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution to obtain a reconstructed prosthesis identity hidden vector and a reconstructed prosthesis mode hidden vector;

the reconstructed real-person identity hidden vector is copied directly from the reconstructed prosthesis identity hidden vector, and a true-false image pair is then generated through the decoder to serve as the generated sample;
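The generated-sample construction above can be sketched with an assumed `decoder` callable (the real decoder maps the concatenated hidden vector to an image pair; here it is a stand-in):

```python
import numpy as np

def build_generated_sample(decoder, n, rng):
    """decoder: assumed callable mapping the concatenated hidden vector to a
    true/false image pair. n: hidden-vector dimension."""
    z_fake_id = rng.standard_normal(n)    # reconstructed prosthesis identity hidden vector
    z_fake_mode = rng.standard_normal(n)  # reconstructed prosthesis mode hidden vector
    z_real_id = z_fake_id.copy()          # real-person identity copied from prosthesis identity
    return decoder(np.concatenate([z_real_id, z_fake_id, z_fake_mode]))
```

Copying the identity component means the generated true and false images share one identity, mirroring the paired real training data.
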
the unsupervised data enhancement module is used for cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector construction module and the teacher network construction module are respectively used for constructing a detector and a teacher network;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss;
the detector learning module is used for sending the labeled sample, the enhanced unlabeled sample and the pseudo labels of the unlabeled samples to update the detector parameters and obtain the detector update loss;
the network parameter updating module is used for updating the parameters of the teacher network by utilizing the semi-supervised loss of the teacher, the enhanced label-free loss of the teacher and the updating loss of the detector, iteratively updating the parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the RGB color channel images of the verification-set faces to the trained detector to obtain classification scores, deriving predicted label values under different decision thresholds, comparing them with the real labels, calculating the false alarm rate and the miss rate, and taking the threshold at which the two rates are equal as the test decision threshold;
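The threshold selection above is an equal-error-rate search; a sketch, assuming a score at or above the threshold means "live" (label 1) and an attack is label 0:

```python
import numpy as np

def eer_threshold(scores, labels):
    """Pick the threshold where false-alarm rate (attack accepted as live)
    and miss rate (live rejected) are closest."""
    best_t, best_gap = None, float("inf")
    for t in np.unique(scores):
        pred = (scores >= t).astype(int)
        far = ((pred == 1) & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        frr = ((pred == 0) & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t
```

With continuous scores a practical system would interpolate between candidate thresholds, but the sweep above captures the decision rule in the claim.
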
the test module is used for sending the RGB color channel images of the test-set faces to the trained detector to obtain classification scores, deriving the final predicted label values according to the test decision threshold, and calculating the benchmark indices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329816.3A CN114663986B (en) | 2022-03-31 | 2022-03-31 | Living body detection method and system based on double decoupling generation and semi-supervised learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114663986A CN114663986A (en) | 2022-06-24 |
CN114663986B true CN114663986B (en) | 2023-06-20 |
Family
ID=82033819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210329816.3A Active CN114663986B (en) | 2022-03-31 | 2022-03-31 | Living body detection method and system based on double decoupling generation and semi-supervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114663986B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115311605B (en) * | 2022-09-29 | 2023-01-03 | 山东大学 | Semi-supervised video classification method and system based on neighbor consistency and contrast learning |
CN116152885B (en) * | 2022-12-02 | 2023-08-01 | 南昌大学 | Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460931A (en) * | 2020-03-17 | 2020-07-28 | 华南理工大学 | Face spoofing detection method and system based on color channel difference image characteristics |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111753595A (en) * | 2019-03-29 | 2020-10-09 | 北京市商汤科技开发有限公司 | Living body detection method and apparatus, device, and storage medium |
CN111222434A (en) * | 2019-12-30 | 2020-06-02 | 深圳市爱协生科技有限公司 | Method for obtaining evidence of synthesized face image based on local binary pattern and deep learning |
CN114067444A (en) * | 2021-10-12 | 2022-02-18 | 中新国际联合研究院 | Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature |
- 2022-03-31: CN202210329816.3A granted as patent CN114663986B, status Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460931A (en) * | 2020-03-17 | 2020-07-28 | 华南理工大学 | Face spoofing detection method and system based on color channel difference image characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN114663986A (en) | 2022-06-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||