CN114663986A - Liveness detection method and system based on double-decoupling generation and semi-supervised learning


Publication number
CN114663986A
Authority
CN
China
Prior art keywords
prosthesis
teacher
loss
detector
vector
Prior art date
Legal status
Granted
Application number
CN202210329816.3A
Other languages
Chinese (zh)
Other versions
CN114663986B (en)
Inventor
冯浩宇
胡永健
刘琲贝
余翔宇
葛治中
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202210329816.3A
Publication of CN114663986A
Application granted
Publication of CN114663986B
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The invention discloses a liveness detection method and system based on double-decoupling generation and semi-supervised learning. The method comprises the following steps: preprocessing the data to obtain original samples as RGB color channel images, and pairing images of the same identity to obtain real/fake image pairs; the real-person encoder outputs a real-person identity vector, and the prosthesis encoder outputs a prosthesis identity vector and a prosthesis mode vector; the three vectors are merged and fed into the decoder to obtain a reconstructed real/fake image pair, a double-decoupling generation loss function is constructed, and noise is fed into the trained decoder to obtain generated samples; labeled samples, unlabeled samples and enhanced unlabeled samples are constructed from the original and generated samples and fed into a teacher learning module to obtain the teacher semi-supervised loss, pseudo labels for the unlabeled samples and the enhanced unlabeled loss, and the network parameters of the detector and the teacher are updated; a decision threshold is determined using the validation set; test data are fed into the detector to obtain classification scores, and the classification result is decided according to the threshold. The invention improves the robustness of liveness detection models.

Description

Liveness detection method and system based on double-decoupling generation and semi-supervised learning
Technical Field
The invention relates to the technical field of anti-spoofing detection for face recognition, and in particular to a liveness detection method and system based on double-decoupling generation and semi-supervised learning.
Background
Today, business and industry applications of facial biometric technology are increasing dramatically; for example, face unlocking can protect personal privacy on electronic devices, and facial biometrics can be used to authenticate payments. However, using the face as a biometric feature for authentication is not inherently secure: facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks can generally be classified into four categories: 1) photo attacks, in which an attacker deceives the authentication system with a printed photo or a photo displayed on a screen; 2) video replay attacks, in which an attacker uses a pre-recorded video of the victim to deceive the authentication system; 3) face mask attacks, in which an attacker wears a mask carefully crafted to resemble the victim; 4) adversarial sample attacks, in which an attacker generates specific sample noise through a GAN network to interfere with the face authentication system and induce incorrect identity verification. These face spoofing attacks are not only low-cost but can also deceive the system, severely impacting and threatening the deployment of face recognition systems.
Liveness detection plays a crucial role in preventing face recognition systems from being attacked with prostheses. Benefiting from the strong feature-extraction capability of deep networks, deep-learning-based liveness detection algorithms outperform algorithms based on traditional handcrafted features. However, although most deep-learning-based liveness detection algorithms achieve good intra-database detection performance, their cross-database performance is poor. The main reason is that data inside and outside a database are collected under different conditions, for example with different capture devices, ambient lighting and attack presentation devices, so the data follow different distributions and a domain shift exists between them. When the diversity of the training data is insufficient, intra-database learning easily overfits and cross-database generalization suffers. Although the cause can be identified, the problem is not easy to solve in real-world applications: a liveness detection model can hardly collect labeled training samples in all scenarios, so most existing anti-spoofing datasets lack diversity. For example, the commonly used CASIA, Replay-Attack and MSU datasets were captured with 3, 2 and 2 devices, against 3, 2 and 1 backgrounds, respectively.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a liveness detection method based on double-decoupling generation and semi-supervised learning. Live/prosthesis features are modeled through decoupled learning, and generated samples are synthesized in the latent space to expand the dataset and improve the diversity of the training data, while the high discriminability of the generated samples is ensured and the influence of generation noise on model learning is reduced.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a living body detection method based on double decoupling generation and semi-supervised learning, which comprises the following steps:
cropping the face region image from the input image to obtain an RGB color channel image;
pairing each real image in the original samples of the RGB color channel images to be trained with a fake image of the same identity to form a real/fake image pair, and labeling the fake image with an attack category label;
constructing a real person encoder, a prosthesis encoder and a decoder;
feeding the real image into the real-person encoder to obtain a real-person identity vector, feeding the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and feeding them into the decoder to output a reconstructed real/fake image pair, and constructing a double-decoupling generation loss function for optimization;
inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
cutting a training set consisting of an original sample and a generated sample, and constructing a labeled sample, a non-labeled sample and an enhanced non-labeled sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
updating teacher network parameters by utilizing teacher semi-supervision loss, teacher enhanced label-free loss and detector updating loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
feeding the validation-set face RGB color channel images into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels to compute the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
and feeding the test-set face RGB color channel images into the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and computing the evaluation metrics.
As a preferred technical solution, constructing the real-person encoder, the prosthesis encoder and the decoder specifically comprises the following steps:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure: a convolutional layer, an instance normalization layer, a LeakyReLU, and five groups of units each consisting of a pooling layer and a residual block built from stacked convolutional layers with skip connections;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks; the hidden-layer real-person vector and hidden-layer prosthesis vector output by the two encoders are first fed into a fully connected layer, then pass through six groups of units each consisting of a residual block built from stacked convolutional layers with skip connections and an upsampling layer, and finally pass through a convolutional layer that outputs an image pair.
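For illustration, the encoder description above maps onto a compact PyTorch module. The sketch below is a hypothetical reading, not the patent's code: kernel sizes, the average-pooling choice and the 1x1 skip projection are assumptions, since the text specifies only the layer types, group counts and channel widths.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: stacked convolutions with a skip connection."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.InstanceNorm2d(c_out),
            nn.LeakyReLU(0.2),
            nn.Conv2d(c_out, c_out, 3, padding=1),
        )
        self.skip = nn.Conv2d(c_in, c_out, 1)   # 1x1 conv to match channel counts

    def forward(self, x):
        return self.body(x) + self.skip(x)

class Encoder(nn.Module):
    """Shared backbone of the two encoders. out_mult is 2 for the real-person
    encoder (mean, variance) and 4 for the prosthesis encoder (identity and
    mode means/variances)."""
    def __init__(self, hdim=128, out_mult=2, img=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.InstanceNorm2d(32),
            nn.LeakyReLU(0.2),
        )
        layers, c_in = [], 32
        for c in (64, 128, 256, 512, 512):       # five pool + residual groups
            layers += [nn.AvgPool2d(2), ResBlock(c_in, c)]
            c_in = c
        self.groups = nn.Sequential(*layers)
        self.fc = nn.Linear(512 * (img // 32) ** 2, out_mult * hdim)

    def forward(self, x):
        return self.fc(self.groups(self.stem(x)).flatten(1))

real_enc = Encoder(out_mult=2)   # hidden-layer real-person vector, size 2*hdim
fake_enc = Encoder(out_mult=4)   # hidden-layer prosthesis vector, size 4*hdim
```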
As a preferred technical solution, feeding the real image into the real-person encoder to obtain the real-person identity vector and feeding the paired fake image into the prosthesis encoder to obtain the prosthesis identity vector and prosthesis mode vector specifically comprises the following steps:

the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_r$ and the real-person identity vector variance $\sigma_r$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s$, the prosthesis identity vector variance $\sigma_s$, the prosthesis mode vector mean $\mu_m$ and the prosthesis mode vector variance $\sigma_m$;

sampling the real-person identity variation component $\epsilon_r$, the prosthesis identity variation component $\epsilon_s$ and the prosthesis mode variation component $\epsilon_m$ from the standard normal distribution, and performing the reparameterization operation to obtain the real-person identity vector $z_r$, the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$, specifically expressed as:

$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \qquad z_s = \mu_s + \sigma_s \odot \epsilon_s, \qquad z_m = \mu_m + \sigma_m \odot \epsilon_m;$$

merging the real-person identity vector $z_r$, the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$ into one hidden vector and inputting it to the decoder, which outputs the reconstructed real/fake image pair comprising the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.
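The reparameterization step can be written in a few lines of PyTorch. This is a minimal sketch with hypothetical shapes, assuming each encoder output concatenates mean and log-variance halves (using log-variance rather than raw variance is a common numerical convenience, not mandated by the text):

```python
import torch

def reparameterize(stats: torch.Tensor) -> torch.Tensor:
    """stats: (B, 2n) tensor holding mean and log-variance halves.
    Returns z = mu + sigma * eps with eps sampled from N(0, I)."""
    mu, logvar = stats.chunk(2, dim=1)
    eps = torch.randn_like(mu)                 # the variation component
    return mu + torch.exp(0.5 * logvar) * eps

# hypothetical hidden vectors for one batch (n = 128)
real_stats = torch.randn(8, 2 * 128)           # real-person encoder output
fake_stats = torch.randn(8, 4 * 128)           # prosthesis encoder output
z_r = reparameterize(real_stats)               # real-person identity
z_s = reparameterize(fake_stats[:, :2 * 128])  # prosthesis identity
z_m = reparameterize(fake_stats[:, 2 * 128:])  # prosthesis mode
hidden = torch.cat([z_r, z_s, z_m], dim=1)     # (B, 3n), fed to the decoder
```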
As a preferred technical solution, merging the real-person identity vector, the prosthesis identity vector and the prosthesis mode vector, feeding them into the decoder to output a reconstructed real/fake image pair, and constructing a double-decoupling generation loss function for optimization specifically comprises the following steps:

feeding the prosthesis mode vector $z_m$ into the fully connected layer $fc(\cdot)$ to output the prosthesis category vector $\hat{y}_s$, and computing the cross entropy with the actual prosthesis class label $y_s$ to obtain the classification loss $\mathcal{L}_{cls}$, expressed as:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)},$$

where $k$ is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector;

adding an angular orthogonality constraint between the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s}{\lVert z_s \rVert}, \frac{z_m}{\lVert z_m \rVert} \right\rangle^2,$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product;

computing the reconstruction loss $\mathcal{L}_{rec}$ between the reconstructed real-person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images:

$$\mathcal{L}_{rec} = \lVert \hat{x}_r - x_r \rVert + \lVert \hat{x}_s - x_s \rVert;$$

computing the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ between the real-person identity vector $z_r$ and the prosthesis identity vector $z_s$:

$$\mathcal{L}_{mmd} = \mathrm{MMD}^2(z_r, z_s);$$

computing the pairing loss $\mathcal{L}_{pair}$ between the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$;

constraining the hidden vectors by computing the KL divergence, as shown in the following equation:

$$\mathcal{L}_{kl} = \frac{1}{2} \sum_{i=1}^{n} \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right),$$

where $n$ is the dimension of the hidden vector;

weighting and summing the losses to obtain the double-decoupling generator loss function $\mathcal{L}_{gen}$, a weighted sum of the classification, orthogonality, reconstruction, maximum mean discrepancy, pairing and KL losses, where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ denote the corresponding weights;

employing an Adam optimizer to optimize the real-person encoder, the prosthesis encoder and the decoder with the objective of minimizing the double-decoupling generator loss function $\mathcal{L}_{gen}$.
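The sketch below assembles these losses in PyTorch. It is an illustrative reading of this section rather than the patent's exact formulas: the L1 norms, the single RBF kernel in the MMD term, and the assignment of the lambda weights to individual terms are all assumptions.

```python
import torch
import torch.nn.functional as F

def kl_loss(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=1).mean()

def orthogonal_loss(z_s, z_m):
    # squared cosine between prosthesis identity and mode vectors
    cos = F.cosine_similarity(z_s, z_m, dim=1)
    return (cos ** 2).mean()

def mmd_loss(z_r, z_s, bandwidth=1.0):
    # squared maximum mean discrepancy with one RBF kernel (an assumption)
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
    return k(z_r, z_r).mean() + k(z_s, z_s).mean() - 2 * k(z_r, z_s).mean()

def generator_loss(out, batch, lambdas=(10.0, 0.5, 0.1, 1.0)):
    """out: dict with reconstructions, latent vectors, stats and class logits;
    batch: dict with original images and prosthesis class labels.
    The mapping of the lambdas onto terms is a guess consistent with the text."""
    l_rec = F.l1_loss(out["x_r_hat"], batch["x_r"]) + F.l1_loss(out["x_s_hat"], batch["x_s"])
    l_cls = F.cross_entropy(out["logits_s"], batch["y_s"])
    l_ort = orthogonal_loss(out["z_s"], out["z_m"])
    l_mmd = mmd_loss(out["z_r"], out["z_s"])
    l_pair = F.l1_loss(out["x_r_hat"], out["x_s_hat"])
    l_kl = sum(kl_loss(mu, lv) for mu, lv in out["stats"])
    l1, l2, l3, l4 = lambdas
    return l_rec + l_pair + l1 * l_kl + l2 * l_cls + l3 * l_ort + l4 * l_mmd
```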
As a preferred technical solution, inputting standard normal distribution sampling noise into the trained decoder to obtain generated samples specifically comprises the following steps:

letting the hidden vector dimension be $n$, sampling $n$-dimensional random noise twice from the standard normal distribution $\mathcal{N}(0, I_n)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_m$; the reconstructed real-person identity hidden vector $\tilde{z}_r$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s$; the real/fake image pairs generated from these vectors by the decoder are taken as the generated samples.
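A minimal sketch of this sampling step; the decoder module is passed in as an argument and the sample counts follow the preferred values given later in the embodiment:

```python
import torch

@torch.no_grad()
def generate_pairs(decoder, num_samples=6400, n=128, device="cpu"):
    """Draw latent noise and decode generated real/fake image pairs.
    Copying z_s into z_r keeps the identity of each pair consistent."""
    z_s = torch.randn(num_samples, n, device=device)   # prosthesis identity noise
    z_m = torch.randn(num_samples, n, device=device)   # prosthesis mode noise
    z_r = z_s.clone()                                  # real identity copied from z_s
    pair = decoder(torch.cat([z_r, z_s, z_m], dim=1))  # (N, 6, H, W)
    real_imgs, fake_imgs = pair[:, :3], pair[:, 3:]
    return real_imgs, fake_imgs
```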
As a preferred technical solution, the detector and the teacher network have the same structure: a convolutional layer, a batch normalization layer, a ReLU, three units each consisting of a residual block built from stacked convolutional layers with skip connections, a global average pooling layer, and a fully connected layer that outputs the classification vector.
As a preferred technical solution, feeding the labeled samples, unlabeled samples and enhanced unlabeled samples into the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss specifically comprises the following steps:

feeding the labeled sample $\{x_l, y_l\}$ into the teacher network $T$ with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l; \theta_T)$; computing the cross entropy with the true label $y_l$ gives the teacher labeled loss $\mathcal{L}_T^{l}$, specifically expressed as:

$$\mathcal{L}_T^{l} = \mathrm{CE}\big(y_l, T(x_l; \theta_T)\big),$$

where $\mathrm{CE}$ denotes the cross entropy loss;

feeding the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ into the teacher network $T$ respectively to obtain the teacher unlabeled prediction $T(x_u; \theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$; computing the cross entropy between the two gives the teacher unlabeled loss $\mathcal{L}_T^{u}$, specifically expressed as:

$$\mathcal{L}_T^{u} = \mathrm{CE}\big(T(x_u; \theta_T), T(\hat{x}_u; \theta_T)\big);$$

taking the category of the maximum value of the teacher unlabeled prediction $T(x_u; \theta_T)$ as the pseudo label $\hat{y}_u$, specifically expressed as:

$$\hat{y}_u = \arg\max T(x_u; \theta_T);$$

weighting and summing the teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, expressed as:

$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \lambda \, \frac{s}{s_{tl}} \, \mathcal{L}_T^{u},$$

where $s$ is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss;

for the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$, taking the category $h$ of its own maximum value and passing both through the cross entropy function gives the enhanced unlabeled loss $\mathcal{L}_T^{aug}$, expressed as:

$$h = \arg\max T(\hat{x}_u; \theta_T), \qquad \mathcal{L}_T^{aug} = \mathrm{CE}\big(h, T(\hat{x}_u; \theta_T)\big).$$
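A sketch of the teacher-side computation in PyTorch; the detached soft target in the consistency term and the linear ramp on the unlabeled weight are conventional choices assumed here, not spelled out in the text:

```python
import torch
import torch.nn.functional as F

def teacher_losses(teacher, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    """Returns the teacher semi-supervised loss, the pseudo labels for the
    unlabeled batch, and the teacher enhanced unlabeled loss."""
    loss_labeled = F.cross_entropy(teacher(x_l), y_l)

    logits_u = teacher(x_u)
    logits_u_aug = teacher(x_u_aug)
    # consistency between the clean and enhanced views (soft cross entropy)
    soft_u = F.softmax(logits_u, dim=1).detach()
    loss_unlabeled = torch.sum(-soft_u * F.log_softmax(logits_u_aug, dim=1), dim=1).mean()

    pseudo = logits_u.argmax(dim=1).detach()        # pseudo labels for the detector
    loss_semi = loss_labeled + lam * (step / total_steps) * loss_unlabeled

    h = logits_u_aug.argmax(dim=1)                  # self-predicted class
    loss_aug = F.cross_entropy(logits_u_aug, h)     # teacher enhanced unlabeled loss
    return loss_semi, pseudo, loss_aug
```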
As a preferred technical solution, feeding the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples into the detector learning module to update the detector parameters and obtain the detector update loss specifically comprises the following steps:

feeding the enhanced unlabeled sample $\hat{x}_u$ into the detector $D$ with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction $D(\hat{x}_u; \theta_D)$, and computing the cross entropy with the pseudo label $\hat{y}_u$ to obtain the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$, specifically expressed as:

$$\mathcal{L}_D^{aug} = \mathrm{CE}\big(\hat{y}_u, D(\hat{x}_u; \theta_D)\big);$$

optimizing the detector with gradient descent according to the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$ to obtain the optimized parameters $\theta_D'$, specifically expressed as:

$$\theta_D' = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{aug},$$

where $\eta_D$ denotes the learning rate of the detector and $\nabla$ denotes the gradient computation;

feeding the labeled sample $(x_l, y_l)$ into the old detector with parameters $\theta_D$ and the new detector with optimized parameters $\theta_D'$ respectively to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$, and taking their difference to obtain the detector update loss $\mathcal{L}_D^{upd}$, specifically expressed as:

$$\mathcal{L}_D^{old} = \mathrm{CE}\big(y_l, D(x_l; \theta_D)\big), \qquad \mathcal{L}_D^{new} = \mathrm{CE}\big(y_l, D(x_l; \theta_D')\big), \qquad \mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}.$$
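A sketch of the detector step; the labeled loss is evaluated before and after one optimizer step on the pseudo-labeled batch, and the difference is returned as a detached scalar feedback signal (hypothetical names throughout):

```python
import torch
import torch.nn.functional as F

def detector_step(detector, opt_d, x_u_aug, pseudo, x_l, y_l):
    """One detector update on pseudo-labeled data, plus the scalar
    detector update loss (old labeled loss minus new labeled loss)."""
    with torch.no_grad():
        loss_old = F.cross_entropy(detector(x_l), y_l)   # before the update

    loss_aug = F.cross_entropy(detector(x_u_aug), pseudo)
    opt_d.zero_grad()
    loss_aug.backward()
    opt_d.step()                                         # theta_D -> theta_D'

    with torch.no_grad():
        loss_new = F.cross_entropy(detector(x_l), y_l)   # after the update
    return (loss_old - loss_new).detach()                # detector update loss
```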
As a preferred technical solution, updating the teacher network parameters using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss specifically comprises the following steps:

multiplying the detector update loss $\mathcal{L}_D^{upd}$ by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$, adding the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$, and optimizing the teacher network with gradient descent, expressed as:

$$\mathcal{L}_T = \mathcal{L}_T^{semi} + \mathcal{L}_D^{upd} \cdot \mathcal{L}_T^{aug},$$

$$\theta_T' = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T,$$

where $\theta_T$ denotes the teacher network parameters before optimization, $\theta_T'$ the optimized parameters, $\eta_T$ the teacher network learning rate, and $\nabla$ the gradient computation.
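The teacher step ties the two previous sketches together: the detached scalar returned by detector_step scales the teacher's enhanced unlabeled loss, in the style of Meta Pseudo Labels. One training iteration would chain teacher_losses, detector_step and teacher_step in that order.

```python
def teacher_step(opt_t, loss_semi, loss_aug, update_signal):
    """Teacher update: scalar detector feedback scales the teacher's
    enhanced unlabeled loss, then the combined loss is backpropagated."""
    loss_teacher = loss_semi + update_signal * loss_aug
    opt_t.zero_grad()
    loss_teacher.backward()
    opt_t.step()
```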
The invention also provides a liveness detection system based on double-decoupling generation and semi-supervised learning, comprising: a data preprocessing module, a double-decoupling generator construction module, a double-decoupling generator training module, a generated-sample construction module, an unsupervised data enhancement module, a detector construction module, a teacher network construction module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
the data preprocessing module is used for cropping face region images from input images to obtain RGB color channel images, pairing each real image in the original samples to be trained with a fake image of the same identity to form real/fake image pairs, and labeling the fake images with attack category labels;
the double decoupling generator building module is used for building a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double-decoupling generator training module is used for feeding the real image into the real-person encoder to obtain a real-person identity vector, feeding the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and feeding them into the decoder to output a reconstructed real/fake image pair, and constructing a double-decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the unsupervised data enhancement module is used for cutting a training set formed by an original sample and a generated sample, and constructing a labeled sample, a non-labeled sample and an enhanced non-labeled sample;
the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
the detector learning module is used for sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to the detector learning module to update the detector parameters to obtain the update loss of the detector;
the network parameter updating module is used for updating teacher network parameters by utilizing teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss, iteratively updating parameters of the detector and the teacher network by using an optimizer according to a loss function, and storing the parameters of the teacher network and the detector after training is finished;
the verification module is used for feeding the validation-set face RGB color channel images into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels to compute the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the testing module is used for feeding the test-set face RGB color channel images into the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and computing the evaluation metrics.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the data generation stage, the invention constructs a double-decoupling generator from a real-person encoder, a prosthesis encoder and a decoder, trains it on real/fake image pairs from the original samples, and then feeds standard normal distribution sampling noise into the trained decoder to obtain generated samples. The generated samples serve as part of the unlabeled data, enriching the diversity of the training data and alleviating its insufficiency.
(2) In the training stage, a semi-supervised learning framework is adopted in which the teacher generates pseudo labels and the detector provides feedback. Specifically, the teacher network supplies the detector with pseudo labels for the unlabeled data to supervise detector learning; after the detector parameters are updated, its performance is evaluated on the labeled data, and the resulting loss is fed back to the teacher network to optimize the generated pseudo labels. This addresses model training with limited labeled data and the label uncertainty caused by blurry generated samples, a known shortcoming of variational autoencoders. Highly discriminative features of the image blocks are mined and the learning capability of the network is improved, so the model generalizes better to unseen acquisition environments.
(3) In the detection stage, the model loads the test data into the detector to obtain the corresponding classification scores, and the classification result is decided according to the threshold.
Drawings
FIG. 1 is a schematic flow chart of the liveness detection method based on double-decoupling generation and semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a network structure of a real human encoder and a prosthesis encoder according to the present invention;
FIG. 3 is a schematic diagram of a network structure of a decoder according to the present invention;
FIG. 4 is a schematic diagram of the training phase of the double-decoupling generator of the present invention;
FIG. 5 is a schematic flow chart of the generation phase of the double-decoupling generator of the present invention;
FIG. 6(a) is a schematic diagram of a live image generated by the double-decoupling generator of the present invention;
FIG. 6(b) is a schematic diagram of a prosthesis image generated by the double-decoupling generator of the present invention;
FIG. 7 is a schematic diagram of the network architecture of the detector and teacher network according to the present invention;
FIG. 8 is an overall framework diagram of the semi-supervised learning of the present invention;
FIG. 9 is a block diagram of the liveness detection system based on double-decoupling generation and semi-supervised learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
This embodiment uses the Replay-Attack, CASIA-MFSD and MSU-MFSD liveness detection datasets for training and testing as examples, and the implementation process is described in detail below. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects and the spoof faces generated from them were captured with a MacBook camera at a resolution of 320 x 240 pixels, and the videos are divided into training, validation and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects and the spoof faces generated from them were captured with three cameras at resolutions of 640 x 480, 480 x 640 and 1920 x 1080 pixels, and the videos are divided into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, with real faces from 35 subjects and the spoof faces generated from them; 15 subjects are used for the training set and 20 for the test set. Since the CASIA-MFSD and MSU-MFSD datasets do not contain a validation set, this embodiment uses the corresponding test set as the validation set for threshold determination on these two datasets. The videos of the datasets are then split into frames to obtain images. The embodiment runs on a Linux system and is implemented mainly on the deep learning framework PyTorch 1.6.1; the GPU used is a GTX 1080Ti, with CUDA version 10.1.105 and cuDNN version 7.6.4.
As shown in fig. 1, this embodiment provides a liveness detection method based on double-decoupling generation and semi-supervised learning, comprising the following steps:
S1: cropping the face region image from the input image to obtain an RGB color channel image;
in this embodiment, the specific steps include: detecting the face region of the input image with the MTCNN face detection algorithm, cropping it and resizing it to a uniform size to obtain a face image in RGB format with red, green and blue color channels. Each real image in the original samples of the RGB color channel images to be trained is then paired with a fake image of the same identity to form a real/fake image pair, and the fake image is labeled with an attack category label;
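A minimal sketch of this preprocessing step. The facenet-pytorch package is one common MTCNN implementation (an assumption; the patent does not name a library), and the 256-pixel crop size follows the preferred values given later in this embodiment:

```python
from facenet_pytorch import MTCNN
from PIL import Image

# detect, crop and resize the face region to a uniform size
mtcnn = MTCNN(image_size=256, margin=0, post_process=False)

img = Image.open("frame_0001.png").convert("RGB")  # hypothetical frame path
face = mtcnn(img)   # (3, 256, 256) tensor in [0, 255], or None if no face found
if face is not None:
    face = face / 255.0   # scale to [0, 1] for the networks downstream
```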
s2: constructing a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
in this embodiment, as shown in fig. 2, the backbone networks of the real-person encoder and the prosthesis encoder share the same structure. The input size is H x W x 3; after the convolutional layer, the instance normalization layer and the LeakyReLU it becomes H x W x 32, and the five groups of units, each consisting of a pooling layer and a residual block built from stacked convolutional layers with skip connections, output feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 of the original size with 64, 128, 256, 512 and 512 channels respectively, giving a feature vector of size $\frac{H}{32} \times \frac{W}{32} \times 512$. After the backbone output is obtained, the real-person encoder outputs a hidden-layer real-person vector of size 2 x hdim through the fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector of size 4 x hdim through the fully connected layer.

As shown in fig. 3, the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks. The input size is 3 x hdim; the input fully connected layer first reshapes it to $\frac{H}{64} \times \frac{W}{64} \times 512$, then six groups of units, each consisting of a residual block built from stacked convolutional layers with skip connections and an upsampling layer, output feature maps at 1/32, 1/16, 1/8, 1/4, 1/2 and 1 of the original size with 512, 256, 128 and 64 channels respectively, and finally a convolutional layer outputs an image pair of size H x W x 6.
S3: constructing a double-decoupling generation module, which consists of the double-decoupling generator and the double-decoupling generator loss function, the generator itself consisting of the real-person encoder, the prosthesis encoder and the decoder; feeding the real image into the real-person encoder to obtain a real-person identity vector, feeding the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and feeding them into the decoder to output a reconstructed real/fake image pair, and constructing the double-decoupling generation loss function for optimization;
as shown in fig. 4, a real image and a fake image, each of size H x W x 3, are input to the real-person encoder and the prosthesis encoder respectively. Letting the hidden vector dimension be n, the real-person encoder obtains a real-person hidden vector of dimension 2n comprising the n-dimensional real-person identity vector mean $\mu_r$ and variance $\sigma_r$; the prosthesis encoder obtains a prosthesis hidden vector of dimension 4n comprising the n-dimensional prosthesis identity vector mean $\mu_s$, prosthesis identity vector variance $\sigma_s$, prosthesis mode vector mean $\mu_m$ and prosthesis mode vector variance $\sigma_m$. The real-person identity variation component $\epsilon_r$, prosthesis identity variation component $\epsilon_s$ and prosthesis mode variation component $\epsilon_m$ are then sampled from the standard normal distribution, and the reparameterization operation shown below gives the real-person identity vector $z_r$, the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$:

$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \qquad z_s = \mu_s + \sigma_s \odot \epsilon_s, \qquad z_m = \mu_m + \sigma_m \odot \epsilon_m.$$

The real-person identity vector $z_r$, prosthesis identity vector $z_s$ and prosthesis mode vector $z_m$ are merged into a hidden vector of dimension 3n and input to the decoder, which outputs a reconstructed real/fake image pair of size H x W x 6, comprising the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$, each of size H x W x 3.
To better learn the prosthesis patterns of different attack modes, the prosthesis mode vector $z_m$ is fed into the fully connected layer $fc(\cdot)$ to output the prosthesis category vector $\hat{y}_s$, and the cross entropy with the actual prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)},$$

where k is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector.

To effectively separate the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$, an angular orthogonality constraint between the two gives the orthogonality loss $\mathcal{L}_{ort}$, as shown in the following formula:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s}{\lVert z_s \rVert}, \frac{z_m}{\lVert z_m \rVert} \right\rangle^2,$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product.

The reconstruction loss $\mathcal{L}_{rec}$ is computed between the reconstructed real-person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images, as shown in the following formula:

$$\mathcal{L}_{rec} = \lVert \hat{x}_r - x_r \rVert + \lVert \hat{x}_s - x_s \rVert.$$

To keep the identity consistency of the hidden vectors, the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is computed between the n-dimensional real-person identity vector $z_r$ and prosthesis identity vector $z_s$:

$$\mathcal{L}_{mmd} = \mathrm{MMD}^2(z_r, z_s).$$

To keep the identity consistency of the reconstructed images, the pairing loss $\mathcal{L}_{pair}$ is computed between the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.

To fit the distribution of the hidden vectors to a standard normal distribution, a constraint is imposed by computing the Kullback-Leibler divergence (KL divergence), as shown in the following formula:

$$\mathcal{L}_{kl} = \frac{1}{2} \sum_{i=1}^{n} \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right),$$

where n is the dimension of the hidden vector.

Finally, the losses are weighted and summed to obtain the double-decoupling generator loss function $\mathcal{L}_{gen}$, a weighted combination of the classification, orthogonality, reconstruction, maximum mean discrepancy, pairing and KL losses, where the weights $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are optimally 10, 0.5, 0.1 and 1 respectively.
An Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the objective of minimizing the double-decoupling generator loss function $\mathcal{L}_{gen}$, iteratively training for T generations, where T is optimally 200. The parameter update formulas are as follows:

$$m_{t+1} = \beta_1 m_t + (1 - \beta_1) g,$$
$$v_{t+1} = \beta_2 v_t + (1 - \beta_2) g^2,$$
$$\theta_{t+1} = \theta_t - \eta \, \frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon},$$

where $\beta_1$ and $\beta_2$ are optimally set to 0.9 and 0.999, the divide-by-zero guard $\epsilon$ is optimally set to 1e-8, the learning rate $\eta$ is optimally set to 0.0002, and $\theta_t$ denotes the parameters at the t-th iteration.
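These settings map directly onto PyTorch's built-in Adam; a short sketch of the optimizer construction with the stated hyperparameters (the helper name is hypothetical):

```python
import torch
import torch.nn as nn

def make_generator_optimizer(*modules: nn.Module) -> torch.optim.Adam:
    """Adam with the hyperparameters stated above (lr 0.0002, betas 0.9/0.999,
    eps 1e-8); pass the real-person encoder, prosthesis encoder and decoder."""
    params = [p for m in modules for p in m.parameters()]
    return torch.optim.Adam(params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8)
```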
S4: inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
as shown in fig. 5, fig. 6(a) and fig. 6(b), the hidden vector dimension is n. Random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I_n)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_m$. To ensure the identity consistency of the real/fake images, the reconstructed real-person identity hidden vector $\tilde{z}_r$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s$; the real/fake image pairs generated by the decoder are then taken as the generated samples, as shown in fig. 6(a) and fig. 6(b). In this embodiment the dimension n is optimally 128 and the number of generated samples is optimally 6400.
S5: cutting a training set consisting of an original sample and a generated sample, dividing the training set into a labeled sample and a non-labeled sample, and performing random data enhancement on the non-labeled sample to obtain an enhanced non-labeled sample;
in this embodiment, image blocks are first randomly cropped from each RGB color channel image of size H x W to be trained. N original samples are randomly selected from the original samples to be trained as labeled data; the remaining samples together with the generated samples form the unlabeled data, and the ratio of labeled to unlabeled samples is mu. Random data enhancement is then applied to the unlabeled samples to obtain the enhanced unlabeled samples. The data enhancement methods include maximizing contrast, adjusting brightness, adjusting color balance, adjusting contrast, adjusting sharpness, cropping, histogram equalization, inversion, random rotation, zeroing the lowest 0-4 bits of the pixel values (posterization), horizontal and vertical shearing, horizontal translation, vertical translation, and inverting all pixel values above a certain threshold (solarization); for each sample, two of these methods are randomly selected for enhancement. The preferred values of H and W are 256, N is 6000, and mu is 4 in this embodiment;
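A sketch of this pick-two-random-operations policy using Pillow; the operation pool is abbreviated and the enhancement factor ranges are assumptions:

```python
import random
from PIL import Image, ImageOps, ImageEnhance

# an abbreviated pool of the enhancement operations listed above
AUG_OPS = [
    lambda im: ImageOps.autocontrast(im),                         # maximize contrast
    lambda im: ImageEnhance.Brightness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Color(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Contrast(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Sharpness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageOps.equalize(im),                             # histogram equalization
    lambda im: ImageOps.invert(im),                               # inversion
    lambda im: ImageOps.posterize(im, 8 - random.randint(0, 4)),  # drop 0-4 low bits
    lambda im: ImageOps.solarize(im, random.randint(128, 255)),   # invert above threshold
]

def enhance(im: Image.Image) -> Image.Image:
    """Apply two randomly selected enhancement operations, as described above."""
    for op in random.sample(AUG_OPS, 2):
        im = op(im)
    return im
```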
s6: constructing a detector and teacher network;
as shown in fig. 7, the teacher network and the detector share the same structure. The input size is H x W x 3; after the convolutional layer, the batch normalization layer and the ReLU it becomes H x W x 16, and the three groups of units, each consisting of a residual block built from stacked convolutional layers with skip connections, output feature maps at 1, 1/2 and 1/4 of the original size with 32, 64 and 128 channels respectively, giving a feature vector of size $\frac{H}{4} \times \frac{W}{4} \times 128$. The global average pooling layer changes its size to 1 x 1 x 128, and finally the fully connected layer outputs a classification vector of dimension 2.
S7: a teacher learning module is constructed, the labeled sample, the unlabeled sample and the enhanced unlabeled sample are sent to the teacher learning module, and the semi-supervised loss of the teacher, the pseudo label of the unlabeled data and the enhanced unlabeled loss are obtained:
as shown in fig. 8, first agree that CE(q, p) denotes the cross entropy loss of two distributions q and p; if one argument is a label value, it is first one-hot encoded into a label vector. Letting k be the number of label classes, the cross entropy loss is expressed as:

$$\mathrm{CE}(q, p) = -\sum_{i=1}^{k} q^{(i)} \log p^{(i)}.$$

Second, agree that argmax(v) denotes the index of the maximum value in the vector v.

The labeled sample $\{x_l, y_l\}$ is fed into the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l; \theta_T)$; the cross entropy with the true label $y_l$ gives the teacher labeled loss $\mathcal{L}_T^{l}$, as shown in the following formula:

$$\mathcal{L}_T^{l} = \mathrm{CE}\big(y_l, T(x_l; \theta_T)\big).$$

The unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ obtained by one round of data enhancement are fed into the teacher network T respectively to obtain the teacher unlabeled prediction $T(x_u; \theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$; the cross entropy of the two results gives the teacher unlabeled loss $\mathcal{L}_T^{u}$, and the category of the maximum value of the teacher unlabeled prediction $T(x_u; \theta_T)$ is taken as the pseudo label $\hat{y}_u$, as shown in the following formulas:

$$\mathcal{L}_T^{u} = \mathrm{CE}\big(T(x_u; \theta_T), T(\hat{x}_u; \theta_T)\big),$$
$$\hat{y}_u = \arg\max T(x_u; \theta_T).$$

The teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ are then weighted and summed to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, as shown in the following formula:

$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \lambda \, \frac{s}{s_{tl}} \, \mathcal{L}_T^{u},$$

where s is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss.

For the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$, the category h of its own maximum value is taken and passed through the cross entropy function to obtain the enhanced unlabeled loss $\mathcal{L}_T^{aug}$, as shown in the following formulas:

$$h = \arg\max T(\hat{x}_u; \theta_T),$$
$$\mathcal{L}_T^{aug} = \mathrm{CE}\big(h, T(\hat{x}_u; \theta_T)\big).$$
S8: constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
as shown in fig. 8, the enhanced unlabeled sample $\hat{x}_u$ is fed into the detector D with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction $D(\hat{x}_u; \theta_D)$, and the cross entropy with the pseudo label $\hat{y}_u$ gives the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$, as shown in the following formula:

$$\mathcal{L}_D^{aug} = \mathrm{CE}\big(\hat{y}_u, D(\hat{x}_u; \theta_D)\big).$$

Letting the learning rate of the detector be $\eta_D$, the detector is optimized with gradient descent according to the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$ to obtain the detector with optimized parameters $\theta_D'$, as shown in the following formula:

$$\theta_D' = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{aug}.$$

The labeled sample $(x_l, y_l)$ is fed into the old detector with parameters $\theta_D$ and the new detector with optimized parameters $\theta_D'$ respectively to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$, whose difference gives the detector update loss $\mathcal{L}_D^{upd}$, as shown in the following formulas:

$$\mathcal{L}_D^{old} = \mathrm{CE}\big(y_l, D(x_l; \theta_D)\big),$$
$$\mathcal{L}_D^{new} = \mathrm{CE}\big(y_l, D(x_l; \theta_D')\big),$$
$$\mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}.$$
S9: updating teacher network parameters by utilizing teacher semi-supervision loss, enhanced label-free loss and detector updating loss:
as shown in fig. 8, the detector update loss $\mathcal{L}_D^{upd}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$ and then added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$. Letting the teacher network learning rate be $\eta_T$, the teacher network is optimized with gradient descent, as shown in the following formulas:

$$\mathcal{L}_T = \mathcal{L}_T^{semi} + \mathcal{L}_D^{upd} \cdot \mathcal{L}_T^{aug},$$
$$\theta_T' = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T.$$
S10: iteratively updating the network parameters of the detector and the teacher network with an optimizer according to the loss functions, where $\nabla$ denotes the gradient computation, and saving the parameters of the teacher network and the detector after training is completed:

in this embodiment, both the teacher network and the detector use SGD optimizers with Nesterov momentum, where the momentum $\mu$ is preferably 0.9 and the initial learning rate $\epsilon_0$ is optimally 0.05, with the learning rate decaying over the training iterations;

in the detector learning module, the detector is optimized by the detector optimizer with the objective of minimizing the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$; in the teacher update module, the teacher network is optimized by its optimizer with the objective of minimizing the teacher loss $\mathcal{L}_T$.
S11: determining a threshold value by using the verification set;
in this embodiment, the specific steps include: feeding the validation-set face RGB color channel images into the detector to obtain classification scores p, sampling decision thresholds at equal intervals over the range (0, 1), obtaining predicted label values under each threshold, comparing them with the true labels to compute the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold T;
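A small sketch of this threshold sweep; scores and labels are assumed to be NumPy arrays with label 1 for live faces:

```python
import numpy as np

def eer_threshold(scores: np.ndarray, labels: np.ndarray, steps: int = 999) -> float:
    """Sweep thresholds over (0, 1) and return the one where FAR equals FRR."""
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0.001, 0.999, steps):
        pred = (scores >= t).astype(int)
        far = np.mean(pred[labels == 0] == 1)   # fake accepted as live
        frr = np.mean(pred[labels == 1] == 0)   # live rejected as fake
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t
```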
s12: testing the model;
in this embodiment, the specific steps include: feeding the test-set face RGB color channel images into the detector to obtain classification scores p, obtaining the final predicted label values according to the test decision threshold T, and computing the evaluation metrics.
The performance of the liveness detection algorithm in this embodiment is evaluated with the False Acceptance Rate (FAR), the False Rejection Rate (FRR), the True Acceptance Rate (TAR), the Equal Error Rate (EER) and the Half Total Error Rate (HTER); these metrics are described in detail using the confusion matrix of Table 1:

Table 1 Confusion matrix

Label \ Prediction | Predicted live | Predicted fake
Label live         | TA             | FR
Label fake         | FA             | TR
The False Acceptance Rate (FAR) is the ratio of the number of non-live faces judged to be live to the number of faces labeled non-live:

$$\mathrm{FAR} = \frac{FA}{FA + TR}.$$

The False Rejection Rate (FRR) is the ratio of the number of live faces judged to be non-live to the number of faces labeled live:

$$\mathrm{FRR} = \frac{FR}{TA + FR}.$$

The True Acceptance Rate (TAR) is the ratio of the number of live faces judged to be live to the number of faces labeled live:

$$\mathrm{TAR} = \frac{TA}{TA + FR}.$$

The Equal Error Rate (EER) is the error rate when FRR and FAR are equal.

The Half Total Error Rate (HTER) is the mean of FRR and FAR:

$$\mathrm{HTER} = \frac{\mathrm{FAR} + \mathrm{FRR}}{2}.$$
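These definitions translate directly into code; a sketch following the confusion matrix above:

```python
import numpy as np

def metrics(pred: np.ndarray, labels: np.ndarray) -> dict:
    """Compute FAR, FRR, TAR and HTER from binary predictions (1 = live)."""
    ta = np.sum((labels == 1) & (pred == 1))
    fr = np.sum((labels == 1) & (pred == 0))
    fa = np.sum((labels == 0) & (pred == 1))
    tr = np.sum((labels == 0) & (pred == 0))
    far = fa / (fa + tr)
    frr = fr / (ta + fr)
    return {"FAR": far, "FRR": frr, "TAR": ta / (ta + fr), "HTER": (far + frr) / 2}
```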
in order to prove the effectiveness of the invention and test the generalization performance of the method, in-library experiments and cross-library experiments are respectively carried out on CASIA-MFSD, Replay-Attack and MSU-MFSD databases. The in-library and cross-library experimental results are shown in tables 2 and 3, respectively:
table 2 library of experimental results
Figure BDA0003574807620000193
TABLE 3 Cross-Bank Experimental results
Figure BDA0003574807620000194
As can be seen from Table 2, both the half total error rate and the equal error rate of the method inside the database are low, showing excellent in-library spoofing detection performance; as can be seen from Table 3, the half total error rate of cross-database detection is also low. Although the training set consists of labeled and unlabeled samples drawn from only a small number of frames per training video, the data obtained through the double-decoupling generator enriches the diversity of the training data, and progressively training the model through meta-learning improves its ability to learn from limited sample features and its heuristic capability. The experimental results prove that, even with insufficient labeled training samples, high in-library accuracy is maintained, the cross-library error rate is greatly reduced, and the generalization performance is significantly improved.
As shown in fig. 9, this embodiment further provides a liveness detection system based on double-decoupling generation and semi-supervised learning, comprising: a data preprocessing module, a double-decoupling generator construction module, a double-decoupling generator training module, a generated-sample construction module, an unsupervised data enhancement module, a detector construction module, a teacher network construction module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
in this embodiment, the data preprocessing module is configured to extract a face region image from an input image to obtain an RGB color channel image, pair each true image of an original sample of the RGB color channel image to be trained with a false image of the same identity to form a true-false image pair, and label an attack category label for the false image;
in this embodiment, the dual decoupling generator building module is used for building a real encoder, a prosthetic encoder and a decoder to form a dual decoupling generator;
in this embodiment, the double-decoupling generator training module is configured to feed the real image into the real-person encoder to obtain a real-person identity vector, feed the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merge the three vectors and feed them into the decoder to output a reconstructed real/fake image pair, and construct a double-decoupling generation loss function for optimization;
in this embodiment, the generated sample construction module is configured to input standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
in this embodiment, the unsupervised data enhancement module is configured to crop a training set formed by an original sample and a generated sample, and construct a labeled sample, a unlabeled sample, and an enhanced unlabeled sample;
in this embodiment, the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
in this embodiment, the teacher learning module is configured to send the labeled sample, the unlabeled sample, and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of the unlabeled data, and enhanced unlabeled loss of the teacher;
in this embodiment, the detector learning module is configured to send the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters, so as to obtain an update loss of the detector;
in this embodiment, the network parameter updating module is configured to update the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced label-free loss, and the detector updating loss, iteratively update the parameters of the detector and the teacher network by using the optimizer according to the loss function, and store the parameters of the teacher network and the detector after training is completed;
in this embodiment, the verification module is configured to send the verification-set face RGB color channel maps to the trained detector to obtain classification scores, obtain predicted label values under different decision thresholds, compare the predicted labels with the real labels to calculate the false alarm rate and the missed detection rate, and take the threshold at which the two rates are equal as the test decision threshold;
in this embodiment, the test module is configured to send the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtain the final predicted label values according to the test decision threshold, and calculate the evaluation metrics.
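For illustration, the equal-error-rate threshold selection performed by the verification module can be sketched as follows. This is a minimal NumPy sketch, assuming both classes are present in the validation set; the function name, score convention and grid search are illustrative assumptions, not part of the patent.

```python
import numpy as np

def select_eer_threshold(scores, labels, num_thresholds=1000):
    """Scan candidate thresholds and return the one where the false alarm
    rate and the missed detection rate are closest (the EER point).

    scores: detector classification scores (higher = more likely real).
    labels: ground truth (1 = real face, 0 = prosthesis/attack).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.linspace(scores.min(), scores.max(), num_thresholds)
    best_t, best_gap = thresholds[0], np.inf
    for t in thresholds:
        pred = (scores >= t).astype(int)
        far = np.mean(pred[labels == 0] == 1)   # attack accepted as real
        frr = np.mean(pred[labels == 1] == 0)   # real rejected as attack
        gap = abs(far - frr)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```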
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the scope of protection of the present invention.

Claims (10)

1. A living body detection method based on double decoupling generation and semi-supervised learning is characterized by comprising the following steps:
extracting a face region image from an input image to obtain an RGB color channel image;
matching each true image of an original sample of the RGB color channel image to be trained with a false image with the same identity to form a true-false image pair, and labeling an attack category label for the false image;
constructing a real person encoder, a prosthesis encoder and a decoder;
sending the true image into the real person encoder to obtain a real person identity vector, sending the matched false image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the real person identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the merged vector into the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization;
inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
constructing a detector and a teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the verification-set face RGB color channel maps into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing the predicted labels with the real labels, calculating the false alarm rate and the missed detection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
and sending the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
2. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the construction of the real-person encoder, the prosthesis encoder and the decoder comprises the following specific steps:
the backbone networks of the real person encoder and the prosthesis encoder adopt the same structure, comprising a convolutional layer, an instance normalization layer, a LeakyReLU activation, and five units each consisting of a residual block, built from stacked convolutional layers with skip connections, followed by a pooling layer;
the real person encoder outputs a hidden-layer real person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks: the hidden-layer real person vector and the hidden-layer prosthesis vector output by the two encoders are first passed through a fully connected layer, then through six units each consisting of a residual block, built from stacked convolutional layers with skip connections, followed by an upsampling layer, and finally through a convolutional layer that outputs the image pair.
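A minimal PyTorch-style sketch of the encoder backbone recited in claim 2; the channel width, pooling type and LeakyReLU slope are illustrative assumptions, and all class names are hypothetical.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block of stacked convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.body(x) + x)  # skip connection

class EncoderBackbone(nn.Module):
    """Stem conv + instance norm + LeakyReLU, then five (residual block + pool) units."""
    def __init__(self, in_ch=3, width=64):
        super().__init__()
        layers = [nn.Conv2d(in_ch, width, 3, padding=1),
                  nn.InstanceNorm2d(width), nn.LeakyReLU(0.2)]
        for _ in range(5):
            layers += [ResidualBlock(width), nn.AvgPool2d(2)]  # pooling type assumed
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```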
3. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the true image is sent to the real person encoder to obtain a real person identity vector, and the matched false image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, the specific steps including:
the real person encoder outputs a real person hidden vector comprising the real person identity vector mean $\mu_r$ and the real person identity vector variance $\sigma_r^2$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $(\sigma_s^{id})^2$, the prosthesis mode vector mean $\mu_s^{m}$ and the prosthesis mode vector variance $(\sigma_s^{m})^2$;
a real person identity variation component $\epsilon_r$, a prosthesis identity variation component $\epsilon_s^{id}$ and a prosthesis mode variation component $\epsilon_s^{m}$ are sampled from the standard normal distribution, and the reparameterization operation is performed to respectively obtain the real person identity vector $z_r$, the prosthesis identity vector $z_s^{id}$ and the prosthesis mode vector $z_s^{m}$, specifically expressed as:
$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{m} = \mu_s^{m} + \sigma_s^{m} \odot \epsilon_s^{m};$$
the real person identity vector $z_r$, the prosthesis identity vector $z_s^{id}$ and the prosthesis mode vector $z_s^{m}$ are merged into a hidden vector and input to the decoder, which outputs the reconstructed true-false image pair comprising the reconstructed real person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.
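A minimal PyTorch-style sketch of the reparameterization step above; parameterizing the variance as a log-variance is a common convention and an assumption here, and all variable names are hypothetical.

```python
import torch

def reparameterize(mu, log_var):
    """VAE-style reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)   # the sampled variation component
    return mu + std * eps

# Usage sketch: the three latent factors of claim 3.
# z_real_id   = reparameterize(mu_r, log_var_r)
# z_spoof_id  = reparameterize(mu_s_id, log_var_s_id)
# z_spoof_mod = reparameterize(mu_s_mod, log_var_s_mod)
# decoder_input = torch.cat([z_real_id, z_spoof_id, z_spoof_mod], dim=1)
```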
4. The in-vivo detection method based on double-decoupling generation and semi-supervised learning of claim 3, wherein the real person identity vector, the prosthesis identity vector and the prosthesis mode vector are merged and sent to the decoder to output the reconstructed true-false image pair, and a double-decoupling generation loss function is constructed for optimization, the specific steps including:
the prosthesis mode vector $z_s^{m}$ is fed into a fully connected layer $fc(\cdot)$, which outputs the prosthesis class vector $\hat{Y}_s$; cross entropy is calculated with the actual prosthesis class label $y_s$ to obtain the classification loss $\mathcal{L}_{cls}$, expressed as:
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{Y}_s^{(i)},$$
where $k$ is the number of prosthesis classes and $Y_s$ is the one-hot-encoded prosthesis class label vector;
an angular orthogonality constraint is imposed on the prosthesis identity vector $z_s^{id}$ and the prosthesis mode vector $z_s^{m}$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:
$$\mathcal{L}_{ort} = \left\langle \frac{z_s^{id}}{\lVert z_s^{id} \rVert_2}, \frac{z_s^{m}}{\lVert z_s^{m} \rVert_2} \right\rangle^2,$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed real person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images, expressed as:
$$\mathcal{L}_{rec} = \lVert x_r - \hat{x}_r \rVert_1 + \lVert x_s - \hat{x}_s \rVert_1;$$
the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is calculated between the real person identity vector $z_r$ and the prosthesis identity vector $z_s^{id}$, expressed as:
$$\mathcal{L}_{mmd} = \bigl\lVert \mathbb{E}[\phi(z_r)] - \mathbb{E}[\phi(z_s^{id})] \bigr\rVert_{\mathcal{H}}^2,$$
where $\phi(\cdot)$ is the kernel feature mapping;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$, expressed as:
$$\mathcal{L}_{pair} = \lVert \hat{x}_r - \hat{x}_s \rVert_1;$$
the hidden vectors are constrained by calculating the KL divergence against the standard normal prior, as shown in the following equation:
$$\mathcal{L}_{kl} = -\frac{1}{2}\sum_{i=1}^{n}\left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right),$$
where $(\mu_i, \sigma_i^2)$ are the mean and variance of the $i$-th hidden-vector dimension and $n$ is the dimension of the hidden vector;
the losses are weighted and summed to obtain the loss function $\mathcal{L}_{G}$ of the double-decoupling generator, expressed as:
$$\mathcal{L}_{G} = \mathcal{L}_{rec} + \mathcal{L}_{pair} + \lambda_1 \mathcal{L}_{kl} + \lambda_2 \mathcal{L}_{cls} + \lambda_3 \mathcal{L}_{ort} + \lambda_4 \mathcal{L}_{mmd},$$
where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ represent the corresponding weights;
an Adam optimizer is employed to optimize the real person encoder, the prosthesis encoder and the decoder with minimization of the double-decoupling generator loss function $\mathcal{L}_{G}$ as the target.
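A minimal PyTorch-style sketch of the double-decoupling generator loss above; the $\ell_1$ reconstruction, the simplified linear-kernel MMD, the form of the pairing loss and the grouping of the weights are assumptions, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def generator_loss(x_r, x_s, x_r_hat, x_s_hat,
                   z_r, z_s_id, z_s_mod, mode_logits, y_s,
                   mu, log_var, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the six generator losses sketched from claim 4.

    x_r / x_s:            original real / prosthesis images
    x_r_hat / x_s_hat:    reconstructed real / prosthesis images
    z_r, z_s_id, z_s_mod: real-identity, prosthesis-identity, prosthesis-mode vectors
    mode_logits:          fc(z_s_mod), the prosthesis class vector
    y_s:                  prosthesis class labels; mu / log_var: concatenated posteriors
    """
    l_cls = F.cross_entropy(mode_logits, y_s)                          # classification loss
    cos = F.cosine_similarity(z_s_id, z_s_mod, dim=1)
    l_ort = (cos ** 2).mean()                                          # angular orthogonality loss
    l_rec = F.l1_loss(x_r_hat, x_r) + F.l1_loss(x_s_hat, x_s)          # reconstruction loss
    l_mmd = ((z_r.mean(0) - z_s_id.mean(0)) ** 2).sum()                # linear-kernel MMD (simplified)
    l_pair = F.l1_loss(x_r_hat, x_s_hat)                               # pairing loss (form assumed)
    l_kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())  # KL to N(0, I)
    l1, l2, l3, l4 = lambdas
    return l_rec + l_pair + l1 * l_kl + l2 * l_cls + l3 * l_ort + l4 * l_mmd
```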
5. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the generated samples are obtained by inputting standard-normal-distribution sampling noise to the trained decoder, the specific steps including:
let the hidden-vector dimension be $n$; random noise of dimension $n$ is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to respectively obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_s^{m}$; the reconstructed real person identity hidden vector $\tilde{z}_r$ is taken directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; the true-false image pairs generated through the decoder are taken as the generated samples.
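A minimal PyTorch-style sketch of the sample generation step above; the decoder interface and all names are assumptions.

```python
import torch

@torch.no_grad()
def generate_pairs(decoder, batch_size, latent_dim, device="cpu"):
    """Sample noise twice (prosthesis identity and prosthesis mode); the
    real-identity latent reuses the prosthesis-identity latent so that the
    generated pair shares one identity, as described in claim 5."""
    z_spoof_id = torch.randn(batch_size, latent_dim, device=device)
    z_spoof_mod = torch.randn(batch_size, latent_dim, device=device)
    z_real_id = z_spoof_id.clone()           # identity shared across the pair
    z = torch.cat([z_real_id, z_spoof_id, z_spoof_mod], dim=1)
    return decoder(z)                        # generated (real, prosthesis) image pair
```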
6. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the detector and the teacher network have the same network structure, comprising a convolutional layer, a batch normalization layer, a ReLU activation, three units of residual blocks built from stacked convolutional layers with skip connections, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
7. The in-vivo detection method based on double-decoupling generation and semi-supervised learning of claim 1, wherein the labeled samples, the unlabeled samples and the enhanced unlabeled samples are sent to the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss, the specific steps including:
the labeled samples $\{x_l, y_l\}$ are input to the teacher network $T$ with parameters $\theta_T$, which outputs the teacher labeled prediction result $T(x_l;\theta_T)$; cross entropy is calculated with the genuine label $y_l$ to obtain the teacher labeled loss $\mathcal{L}_T^{l}$, specifically expressed as:
$$\mathcal{L}_T^{l} = \mathrm{CE}\bigl(y_l,\, T(x_l;\theta_T)\bigr),$$
where $\mathrm{CE}$ denotes the cross-entropy loss;
the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are respectively input to the teacher network $T$ to obtain the teacher unlabeled prediction result $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$; the cross-entropy loss between the two is calculated to obtain the teacher unlabeled loss $\mathcal{L}_T^{u}$, specifically expressed as:
$$\mathcal{L}_T^{u} = \mathrm{CE}\bigl(T(x_u;\theta_T),\, T(\hat{x}_u;\theta_T)\bigr);$$
from the teacher unlabeled prediction result $T(x_u;\theta_T)$, the class to which the maximum value belongs is taken as the pseudo label $\hat{y}_u$, specifically expressed as:
$$\hat{y}_u = \arg\max T(x_u;\theta_T);$$
the teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ are weighted and summed to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, expressed as:
$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \frac{s}{s_{tl}}\,\lambda\,\mathcal{L}_T^{u},$$
where $s$ is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss;
the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$ and the class $h$ to which its own maximum value belongs are passed through the cross-entropy function to obtain the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$, expressed as:
$$h = \arg\max T(\hat{x}_u;\theta_T), \qquad \mathcal{L}_T^{aug} = \mathrm{CE}\bigl(h,\, T(\hat{x}_u;\theta_T)\bigr).$$
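A minimal PyTorch-style sketch of the teacher losses above; the soft-target consistency form and the stop-gradients are assumptions, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def teacher_losses(teacher, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    """Teacher labeled loss, unlabeled consistency loss, pseudo labels and
    enhanced unlabeled loss, sketched from claim 7."""
    loss_labeled = F.cross_entropy(teacher(x_l), y_l)

    logits_u = teacher(x_u)
    logits_u_aug = teacher(x_u_aug)
    # Consistency between the plain and the enhanced unlabeled views.
    loss_unlabeled = F.cross_entropy(logits_u_aug, logits_u.softmax(dim=1).detach())

    pseudo_labels = logits_u.argmax(dim=1).detach()   # pseudo labels for the detector
    loss_semi = loss_labeled + (step / total_steps) * lam * loss_unlabeled

    # Enhanced unlabeled loss: prediction vs the class of its own maximum.
    h = logits_u_aug.argmax(dim=1).detach()
    loss_aug = F.cross_entropy(logits_u_aug, h)
    return loss_semi, pseudo_labels, loss_aug
```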
8. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples are sent to the detector learning module to update the detector parameters and obtain the detector update loss, the specific steps including:
the enhanced unlabeled sample $\hat{x}_u$ is fed into the detector $D$ with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction result $D(\hat{x}_u;\theta_D)$; cross entropy is calculated with the pseudo label $\hat{y}_u$ to obtain the detector enhanced unlabeled loss $\mathcal{L}_D^{u}$, specifically expressed as:
$$\mathcal{L}_D^{u} = \mathrm{CE}\bigl(\hat{y}_u,\, D(\hat{x}_u;\theta_D)\bigr);$$
the detector is optimized on the detector enhanced unlabeled loss $\mathcal{L}_D^{u}$ by the gradient descent method to obtain the optimized parameters $\theta'_D$, specifically expressed as:
$$\theta'_D = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{u},$$
where $\eta_D$ denotes the detector learning rate and $\nabla$ denotes the gradient calculation;
the labeled sample $(x_l, y_l)$ is respectively fed into the detector with the pre-optimization parameters $\theta_D$ and with the optimized parameters $\theta'_D$ to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$; the difference between the two is then taken as the detector update loss $\mathcal{L}_D^{upd}$, specifically expressed as:
$$\mathcal{L}_D^{old} = \mathrm{CE}\bigl(y_l,\, D(x_l;\theta_D)\bigr), \qquad \mathcal{L}_D^{new} = \mathrm{CE}\bigl(y_l,\, D(x_l;\theta'_D)\bigr), \qquad \mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}.$$
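A minimal PyTorch-style sketch of the detector update above; an SGD-style optimizer stands in for the plain gradient-descent step of the claim, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def detector_step(detector, optimizer, x_u_aug, pseudo_labels, x_l, y_l):
    """One detector update: train on pseudo-labeled enhanced data, then
    measure the labeled-loss improvement as the detector update loss."""
    with torch.no_grad():
        loss_old = F.cross_entropy(detector(x_l), y_l)   # before the update

    loss_u = F.cross_entropy(detector(x_u_aug), pseudo_labels)
    optimizer.zero_grad()
    loss_u.backward()
    optimizer.step()                                     # theta_D -> theta_D'

    with torch.no_grad():
        loss_new = F.cross_entropy(detector(x_l), y_l)   # after the update
    return loss_old - loss_new                           # detector update loss
```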
9. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the teacher network parameters are updated by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, the specific steps including:
the detector update loss $\mathcal{L}_D^{upd}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$ and then added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$; the teacher network is optimized by the gradient descent method, expressed as:
$$\mathcal{L}_T = \mathcal{L}_D^{upd}\,\mathcal{L}_T^{aug} + \mathcal{L}_T^{semi}, \qquad \theta'_T = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T,$$
where $\theta_T$ denotes the teacher network parameters before optimization, $\theta'_T$ denotes the optimized teacher network parameters, $\eta_T$ denotes the teacher network learning rate, and $\nabla$ denotes the gradient calculation.
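A minimal sketch of the teacher update above, reusing the quantities from the two sketches before it; this is a first-order simplification, and the exact gradient pathway of the claim may differ.

```python
def teacher_step(teacher_optimizer, loss_semi, loss_aug, detector_update_loss):
    """Teacher loss sketched from claim 9: (detector update loss) x (enhanced
    unlabeled loss) + (semi-supervised loss), then one gradient step."""
    # The detector update loss enters as a scalar coefficient on the teacher loss.
    teacher_loss = float(detector_update_loss) * loss_aug + loss_semi
    teacher_optimizer.zero_grad()
    teacher_loss.backward()
    teacher_optimizer.step()
    return teacher_loss.detach()
```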
10. A living body detection system based on double decoupling generation and semi-supervised learning, characterized by comprising: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
the data preprocessing module is used for extracting face region images from input images to obtain RGB color channel images, pairing each true image of the original samples of the RGB color channel images to be trained with a false image of the same identity to form a true-false image pair, and labeling an attack category label for the false image;
the double decoupling generator building module is used for building a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double decoupling generator training module is used for sending the true image into the real person encoder to obtain a real person identity vector, sending the matched false image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the real person identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the merged vector into the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the unsupervised data enhancement module is used for cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
the teacher learning module is used for receiving the labeled samples, the unlabeled samples and the enhanced unlabeled samples and computing the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;
the detector learning module is used for receiving the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples and updating the detector parameters, so as to obtain the detector update loss;
the network parameter updating module is used for updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, iteratively updating the parameters of the detector and the teacher network with an optimizer according to the loss functions, and storing the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the verification-set face RGB color channel maps into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing the predicted labels with the real labels, calculating the false alarm rate and the missed detection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the test module is used for sending the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
CN202210329816.3A 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning Active CN114663986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329816.3A CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Publications (2)

Publication Number Publication Date
CN114663986A true CN114663986A (en) 2022-06-24
CN114663986B CN114663986B (en) 2023-06-20

Family

ID=82033819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210329816.3A Active CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Country Status (1)

Country Link
CN (1) CN114663986B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753595A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Living body detection method and apparatus, device, and storage medium
US20200364478A1 (en) * 2019-03-29 2020-11-19 Beijing Sensetime Technology Development Co., Ltd. Method and apparatus for liveness detection, device, and storage medium
WO2021134871A1 (en) * 2019-12-30 2021-07-08 深圳市爱协生科技有限公司 Forensics method for synthesized face image based on local binary pattern and deep learning
CN111460931A (en) * 2020-03-17 2020-07-28 华南理工大学 Face spoofing detection method and system based on color channel difference image characteristics
CN114067444A (en) * 2021-10-12 2022-02-18 中新国际联合研究院 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIR MOHAMMADI et al.: "Improving cross-dataset performance of face presentation attack detection systems using face recognition datasets" *
TANG Yan: "Face liveness detection based on deep learning" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311605A (en) * 2022-09-29 2022-11-08 山东大学 Semi-supervised video classification method and system based on neighbor consistency and contrast learning
CN116152885A (en) * 2022-12-02 2023-05-23 南昌大学 Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling
CN116152885B (en) * 2022-12-02 2023-08-01 南昌大学 Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling

Also Published As

Publication number Publication date
CN114663986B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN108537743B (en) Face image enhancement method based on generation countermeasure network
Cheng et al. Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification
CN109993072B (en) Low-resolution pedestrian re-identification system and method based on super-resolution image generation
Asnani et al. Reverse engineering of generative models: Inferring model hyperparameters from generated images
CN114663986A (en) In-vivo detection method and system based on double-decoupling generation and semi-supervised learning
CN110516616A (en) A kind of double authentication face method for anti-counterfeit based on extensive RGB and near-infrared data set
CN112734696B (en) Face changing video tampering detection method and system based on multi-domain feature fusion
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN112418041B (en) Multi-pose face recognition method based on face orthogonalization
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN112668519A (en) Abnormal face recognition living body detection method and system based on MCCAE network and Deep SVDD network
CN113537027B (en) Face depth counterfeiting detection method and system based on face division
CN114387641A (en) False video detection method and system based on multi-scale convolutional network and ViT
CN114677722A (en) Multi-supervision human face in-vivo detection method integrating multi-scale features
CN114693607A (en) Method and system for detecting tampered video based on multi-domain block feature marker point registration
Xie et al. Writer-independent online signature verification based on 2D representation of time series data using triplet supervised network
CN114241564A (en) Facial expression recognition method based on inter-class difference strengthening network
CN113887573A (en) Human face forgery detection method based on visual converter
CN116824695A (en) Pedestrian re-identification non-local defense method based on feature denoising
CN115331135A (en) Method for detecting Deepfake video based on multi-domain characteristic region standard score difference
CN114429646A (en) Gait recognition method based on deep self-attention transformation network
CN117496601B (en) Face living body detection system and method based on fine classification and antibody domain generalization
Gao et al. AONet: attentional occlusion-aware network for occluded person re-identification
Guo et al. Discriminative Prototype Learning for Few-Shot Object Detection in Remote Sensing Images
CN117612201B (en) Single-sample pedestrian re-identification method based on feature compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant