CN114663986B - Living body detection method and system based on double decoupling generation and semi-supervised learning - Google Patents

Info

Publication number
CN114663986B
CN114663986B (application CN202210329816.3A)
Authority
CN
China
Prior art keywords
vector
false
true
loss
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210329816.3A
Other languages
Chinese (zh)
Other versions
CN114663986A (en)
Inventor
冯浩宇
胡永健
刘琲贝
余翔宇
葛治中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210329816.3A priority Critical patent/CN114663986B/en
Publication of CN114663986A publication Critical patent/CN114663986A/en
Application granted granted Critical
Publication of CN114663986B publication Critical patent/CN114663986B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a living body detection method and system based on double decoupling generation and semi-supervised learning, wherein the method comprises the following steps: preprocessing the data to obtain RGB color channel map original samples, and pairing images of the same identity to obtain true-false image pairs; the real-person encoder outputs a real-person identity vector, the prosthesis encoder outputs a prosthesis identity vector and a prosthesis pattern vector, the three vectors are merged and sent to the decoder to obtain a reconstructed true-false image pair, a double decoupling generation loss function is constructed, and noise is sent to the trained decoder to obtain generated samples; labeled samples, unlabeled samples and enhanced unlabeled samples are constructed from the original samples and the generated samples and sent into a teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled samples and the teacher enhanced unlabeled loss, and the network parameters of the detector and the teacher are updated; a decision threshold is determined using the validation set; test data are loaded into the detector to obtain classification scores, and the classification result is judged according to the threshold. The invention can improve the robustness of a living body detection model.

Description

Living body detection method and system based on double decoupling generation and semi-supervised learning
Technical Field
The invention relates to the technical field of anti-spoofing detection for face recognition, in particular to a living body detection method and system based on double decoupling generation and semi-supervised learning.
Background
Today, the use of facial biometric technology in business and industry has increased dramatically; for example, face unlocking can protect personal privacy on electronic devices, and facial biometrics can be used to authenticate payments. However, using the face as a biometric for authentication is not inherently secure: facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks can generally be divided into four categories: 1) photo attacks, where an attacker spoofs the authentication system using a printed photo or a photo shown on a display screen; 2) video replay attacks, where an attacker spoofs the authentication system using a pre-recorded video of the victim; 3) face mask attacks, where an attacker wears a mask carefully manufactured to resemble the victim; 4) adversarial sample attacks, where an attacker generates specific adversarial noise, for example through a GAN, to perturb the face authentication system into a targeted false identity verification. These face spoofing attacks are not only low-cost but can fool the system, severely affecting and threatening the deployment of face recognition systems.
Living body detection plays a vital role in protecting face recognition systems from prosthesis (presentation) attacks. Thanks to the strong feature extraction capability of deep networks, living body detection algorithms based on deep learning outperform those based on traditional hand-crafted features. However, although most deep-learning-based algorithms achieve good detection results within a single database, their performance degrades across databases: intra-database and cross-database data are often collected under different conditions, such as different capture devices, ambient illumination and attack presentation devices, so the two follow different distributions and a domain shift exists between them. When the diversity of the training data is insufficient, the model easily overfits during intra-database learning, and cross-database generalization is poor. Even though the cause can be diagnosed, the problem is not easy to solve in real-world applications: a living body detection model can hardly collect labeled training samples in all scenarios, so most existing anti-spoofing datasets lack diversity. For example, the commonly used CASIA-MFSD, Replay-Attack and MSU-MFSD datasets contain only 3, 2 and 2 types of capture device, and 3, 2 and 1 types of capture background, respectively.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a living body detection method based on double decoupling generation and semi-supervised learning. Living/prosthesis features are modeled through decoupled learning, and samples are synthesized in the latent space to expand the dataset, which improves the diversity of the training data while keeping the generated samples highly discriminative and reducing the influence of generation noise on model learning. By specifically adopting the technical scheme of double decoupling generation and semi-supervised learning, the invention solves the technical problems of insufficient data diversity and poor generalization of living body detection models, and achieves the technical effects of maintaining intra-database accuracy while effectively improving generalization performance.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a living body detection method based on double decoupling generation and semi-supervised learning, which comprises the following steps:
the face region image is extracted from the input image to obtain an RGB color channel image;
pairing each true image of an RGB color channel diagram original sample to be trained with a false image of the same identity to form a true-false image pair, and labeling the false image with an attack type label;
constructing a real encoder, a prosthesis encoder and a decoder;
The true image is sent to a true person encoder to obtain a true person identity vector, the paired false image is sent to a false body encoder to obtain a false body identity vector and a false body mode vector, the true person identity vector, the false body identity vector and the false body mode vector are combined and sent to a decoder to output a reconstructed true and false image pair, and a double decoupling generation loss function is constructed to optimize;
inputting the standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
cutting a training set formed by an original sample and a generated sample, and constructing a label sample, a label-free sample and an enhanced label-free sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss;
a detector learning module is constructed, and a label sample, an enhanced label-free sample and a pseudo label of the label-free sample are sent to the detector learning module to update detector parameters, so that detector updating loss is obtained;
updating teacher network parameters by using teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss;
Iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the RGB color channel diagram of the face of the verification set to a trained detector to obtain classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and omission rate, and taking the thresholds when the false alarm rate and the omission rate are equal as test judgment thresholds;
sending the RGB color channel diagram of the face of the test set to a trained detector to obtain classification scores, obtaining a final predicted label value according to a test decision threshold value, and calculating a reference index.
As a preferred technical scheme, the construction of the real-person encoder, the prosthesis encoder and the decoder specifically includes:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, comprising a convolutional layer, an instance normalization layer and a LeakyReLU activation, followed by five units each consisting of a pooling layer and a residual block built from stacked convolutional layers with a skip connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone of the decoder is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks; the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, pass through six units each consisting of a residual block built from stacked convolutional layers with a skip connection and an upsampling layer, and finally an image pair is output through a convolutional layer.
As a preferred technical scheme, the steps of sending the real image to the real-person encoder to obtain a real-person identity vector, and sending the paired spoof image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, include:

the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_r^{id}$ and the real-person identity vector variance $\sigma_r^{id}$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $\sigma_s^{id}$, the prosthesis pattern vector mean $\mu_s^{pt}$ and the prosthesis pattern vector variance $\sigma_s^{pt}$;

the real-person identity variational component $\epsilon_r^{id}$, the prosthesis identity variational component $\epsilon_s^{id}$ and the prosthesis pattern variational component $\epsilon_s^{pt}$ are sampled from the standard normal distribution, and the reparameterization operation gives the real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$, specifically:

$$z_r^{id} = \mu_r^{id} + \sigma_r^{id} \odot \epsilon_r^{id}, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{pt} = \mu_s^{pt} + \sigma_s^{pt} \odot \epsilon_s^{pt}$$

the real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$ are merged into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$.
As a preferred technical scheme, the merging of the real-person identity vector, the prosthesis identity vector and the prosthesis pattern vector to be sent to the decoder, which outputs the reconstructed true-false image pair, and the construction of the double decoupling generation loss function for optimization, specifically include:

the prosthesis pattern vector $z_s^{pt}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{pt})$, and its cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, expressed as:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)}$$

where k is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector;

an angular orthogonality constraint is added between the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$, giving the orthogonality loss $\mathcal{L}_{ort}$, expressed as:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s^{id}}{\|z_s^{id}\|},\; \frac{z_s^{pt}}{\|z_s^{pt}\|} \right\rangle^{2}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product;

for the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, the reconstruction loss $\mathcal{L}_{rec}$ with the corresponding original images is computed, expressed as:

$$\mathcal{L}_{rec} = \|\hat{x}_r - x_r\|_1 + \|\hat{x}_s - x_s\|_1$$

for the real-person identity vector $z_r^{id}$ and the prosthesis identity vector $z_s^{id}$, the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is computed, expressed as:

$$\mathcal{L}_{mmd} = \mathrm{MMD}\big(z_r^{id},\; z_s^{id}\big)$$

for the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, the pairing loss $\mathcal{L}_{pair}$ is computed, expressed as:

$$\mathcal{L}_{pair} = \|\hat{x}_r - \hat{x}_s\|_1$$

a constraint is imposed by computing the KL divergence for each hidden vector with mean μ and variance σ, as shown in the following formula:

$$\mathcal{L}_{kl} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\big)$$

where n is the dimension of the hidden vector;

the losses are summed with weights to give the double decoupling generator loss function $\mathcal{L}_{G}$, expressed as:

$$\mathcal{L}_{G} = \mathcal{L}_{rec} + \mathcal{L}_{pair} + \lambda_1 \mathcal{L}_{kl} + \lambda_2 \mathcal{L}_{cls} + \lambda_3 \mathcal{L}_{ort} + \lambda_4 \mathcal{L}_{mmd}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ denote the corresponding weight values;

an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the goal of minimizing the double decoupling generator loss function $\mathcal{L}_{G}$.
As a preferred technical scheme, the step of inputting standard normally distributed sampling noise to the trained decoder to obtain generated samples specifically includes:

let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{pt}$; the reconstructed real-person identity hidden vector $\tilde{z}_r^{id}$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$, and the three are then passed through the decoder to generate a true-false image pair as a generated sample.
As a preferred technical scheme, the detector and the teacher network have the same network structure, comprising a convolutional layer, a batch normalization layer and a ReLU activation, followed by three units each consisting of a residual block built from stacked convolutional layers with a skip connection, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
As a preferred technical scheme, the labeled samples, unlabeled samples and enhanced unlabeled samples are sent into the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss; the specific steps include:

the labeled sample $\{x_l, y_l\}$ is input to the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l;\theta_T)$; its cross entropy with the true label $y_l$ gives the teacher labeled loss $\mathcal{L}^{l}_{T}$, specifically:

$$\mathcal{L}^{l}_{T} = CE\big(y_l,\; T(x_l;\theta_T)\big)$$

where CE denotes the cross entropy loss;

the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are respectively input to the teacher network T to obtain the teacher unlabeled prediction $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$; the cross entropy loss of the two gives the teacher unlabeled loss $\mathcal{L}^{u}_{T}$, specifically:

$$\mathcal{L}^{u}_{T} = CE\big(T(x_u;\theta_T),\; T(\hat{x}_u;\theta_T)\big)$$

the class of the maximum value is extracted from the teacher unlabeled prediction $T(x_u;\theta_T)$ as the pseudo label $y^{pl}_{u}$, specifically:

$$y^{pl}_{u} = \arg\max T(x_u;\theta_T)$$

the teacher labeled loss $\mathcal{L}^{l}_{T}$ and the teacher unlabeled loss $\mathcal{L}^{u}_{T}$ are summed with weights to give the teacher semi-supervised loss $\mathcal{L}^{semi}_{T}$, expressed as:

$$\mathcal{L}^{semi}_{T} = \mathcal{L}^{l}_{T} + \lambda\,\frac{s}{s_{tl}}\,\mathcal{L}^{u}_{T}$$

where s is the current step number, $s_{tl}$ is the total number of steps, and λ is the weight of the unlabeled loss;

for the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$, the class h of its own maximum value is extracted and passed through the cross entropy function to give the teacher enhanced unlabeled loss $\mathcal{L}^{aug}_{T}$, expressed as:

$$h = \arg\max T(\hat{x}_u;\theta_T)$$
$$\mathcal{L}^{aug}_{T} = CE\big(h,\; T(\hat{x}_u;\theta_T)\big)$$
as a preferred technical solution, the step of sending the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to a detector learning module to update the detector parameters to obtain a detector update loss includes:
the enhanced unlabeled exemplar is to be processed
Figure BDA0003574807620000079
The feeding parameter is theta D The detector D of (1) gets the detector enhanced label-free prediction result +. >
Figure BDA00035748076200000710
And pseudo tag->
Figure BDA00035748076200000711
Calculating cross entropy to obtain detector enhancement no tag loss +.>
Figure BDA00035748076200000712
The concrete steps are as follows:
Figure BDA00035748076200000713
no tag loss of detector
Figure BDA00035748076200000714
Optimizing the detector by gradient descent method to obtain optimized parameter of theta' D Specifically expressed as:
Figure BDA00035748076200000715
wherein ,ηD Indicating the rate of learning of the detector,
Figure BDA00035748076200000716
representing gradient calculations;
labeled sample (x l ,y l ) Respectively send the parameters to be theta before optimization D The detector and optimized parameters of (2) are theta' D Obtaining a label loss from the old detector
Figure BDA00035748076200000717
And a new detector with tag loss->
Figure BDA00035748076200000718
Then the two are differenced to obtain the detector update loss +.>
Figure BDA00035748076200000719
The concrete steps are as follows:
Figure BDA00035748076200000720
Figure BDA00035748076200000721
Figure BDA0003574807620000081
as an optimized technical scheme, the method for updating the teacher network parameters by utilizing the teacher semi-supervised loss, the teacher enhanced label-free loss and the detector updating loss comprises the following specific steps:
loss of detector update
Figure BDA0003574807620000082
No tag loss with teacher enhancement>
Figure BDA0003574807620000083
Multiplying and then semi-supervising the loss of teachers>
Figure BDA0003574807620000084
Adding to form teacher loss->
Figure BDA0003574807620000085
The teacher network is optimized by gradient descent method, expressed as:
Figure BDA0003574807620000086
Figure BDA0003574807620000087
wherein ,θT Representing parameters, theta ', before optimizing teacher network' T Representing parameters, eta after teacher network optimization T Represents the network learning rate of the teacher,
Figure BDA0003574807620000088
representing the gradient calculations.
The invention also provides a living body detection system based on double decoupling generation and semi-supervised learning, comprising: a data preprocessing module, a double decoupling generator construction module, a double decoupling generator training module, a generated sample construction module, an unsupervised data enhancement module, a detector construction module, a teacher network construction module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
the data preprocessing module is used for extracting face region images from input images to obtain RGB color channel maps, pairing each real image of the RGB color channel map original samples to be trained with a spoof image of the same identity to form a true-false image pair, and labeling the spoof image with an attack type label;
the double decoupling generator construction module is used for constructing a real-person encoder, a prosthesis encoder and a decoder to form the double decoupling generator;
the double decoupling generator training module is used for sending the real image to the real-person encoder to obtain a real-person identity vector, sending the paired spoof image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, merging the three vectors and sending them to the decoder to output a reconstructed true-false image pair, and constructing a double decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normally distributed sampling noise to the trained decoder to obtain generated samples;
the unsupervised data enhancement module is used for cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector construction module and the teacher network construction module are respectively used for constructing the detector and the teacher network;
the teacher learning module is used for receiving the labeled samples, unlabeled samples and enhanced unlabeled samples to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;
the detector learning module is used for receiving the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to update the detector parameters and obtain the detector update loss;
the network parameter updating module is used for updating the teacher network parameters using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, iteratively updating the parameters of the detector and the teacher network with optimizers according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the validation-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels, computing the false acceptance rate and false rejection rate, and taking the threshold at which the two are equal as the test decision threshold;
the test module is used for sending the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and computing the benchmark indices.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the data generation stage, the invention constructs a double decoupling generator from a real-person encoder, a prosthesis encoder and a decoder, trains it with true-false image pairs of original samples, and then inputs standard normally distributed sampling noise into the trained decoder of the double decoupling generator to obtain generated samples. The generated samples are used as part of the unlabeled data, enriching the diversity of the training data and alleviating the problem of insufficient training data diversity.
(2) In the training stage, a semi-supervised learning framework with teacher-generated pseudo labels and detector feedback is adopted. Specifically, the teacher network provides pseudo labels for the unlabeled data to supervise the detector's learning; after the detector parameters are updated, the detector's performance is evaluated on the labeled data, and the loss is fed back to the teacher network to optimize the generated pseudo labels. This addresses model training when labeled training data is limited, as well as the label uncertainty caused by blurry generated samples due to the shortcomings of the variational auto-encoder; highly discriminative features of image blocks are mined and the learning capacity of the network is improved, so the model generalizes better to unseen collection environments.
(3) In the detection stage, the model loads test data into the detector to obtain the corresponding classification scores, and the classification result is judged according to the threshold.
Drawings
FIG. 1 is a flow diagram of a living body detection method based on double decoupling generation and semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a network architecture of a real encoder and a prosthetic encoder according to the present invention;
FIG. 3 is a diagram illustrating a network architecture of a decoder according to the present invention;
FIG. 4 is a schematic diagram of a training phase flow of the dual decoupling generator of the present invention;
FIG. 5 is a schematic diagram of a generating phase flow of the dual decoupling generator of the present invention;
fig. 6 (a) is a schematic diagram of a real person image generated by the dual decoupling generator of the present invention;
FIG. 6 (b) is a schematic diagram of a prosthesis image generated by the dual decoupling generator of the present invention;
FIG. 7 is a schematic diagram of the network architecture of the detector and teacher network of the present invention;
FIG. 8 is an overall framework diagram of semi-supervised learning of the present invention;
fig. 9 is an overall block diagram of a living body detection system based on double decoupling generation and semi-supervised learning of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment uses the Replay-Attack, CASIA-MFSD and MSU-MFSD living body detection datasets for training and testing, and details the implementation on them. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects, and the spoof faces generated from them, were collected with a MacBook camera at a resolution of 320×240 pixels, and the data are divided into training, validation and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects, and the spoof faces generated from them, were collected with three cameras at resolutions of 640×480, 480×640 and 1920×1080 pixels, and the data are divided into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, collecting real faces from 35 subjects (15 for the training set and 20 for the test set) and the spoof faces generated from them. Since the CASIA-MFSD and MSU-MFSD datasets do not contain a validation set, this embodiment uses their test sets as validation sets for threshold determination. The videos of each dataset are then split into frames to obtain images. The embodiment is carried out on a Linux system and is mainly implemented with the deep learning framework PyTorch 1.6.1; the GPU is a GTX 1080 Ti, the CUDA version is 10.1.105, and the cuDNN version is 7.6.4.
As shown in fig. 1, the present embodiment provides a living body detection method based on double decoupling generation and semi-supervised learning, which includes the following steps:
S1: extracting the face region image from the input image to obtain an RGB color channel map;
In this embodiment, the specific steps include: a face region is detected in the input image with the MTCNN face detection algorithm, then cropped and resized to a unified size to obtain a face image in RGB format with red, green and blue color channels. Each real image of the RGB color channel map original samples to be trained is then paired with a spoof image of the same identity to form a true-false image pair, and the spoof image is labeled with an attack type label;
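For illustration, a minimal preprocessing sketch in Python is given below. It assumes the facenet-pytorch implementation of MTCNN (one of several available MTCNN implementations) and the 256×256 crop size used later in this embodiment; the path handling is hypothetical.

```python
from facenet_pytorch import MTCNN
from PIL import Image

# facenet-pytorch's MTCNN detects, crops and aligns the face in one call;
# image_size=256 matches the H = W = 256 used later in this embodiment.
detector = MTCNN(image_size=256, margin=0, post_process=False)

def crop_face(path):
    img = Image.open(path).convert("RGB")  # three color channels: red, green, blue
    face = detector(img)                   # tensor of shape (3, 256, 256), or None if no face found
    return face
```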
S2: constructing a real-person encoder, a prosthesis encoder and a decoder to form the double decoupling generator;
in this embodiment, as shown in fig. 2, the main networks of the real encoder and the prosthetic encoder respectively adopt the same network structure, the input size is set to be h×w×3, the input size is changed into h×w×32 through a convolution layer, an instance normalization layer and a LeakyReLU, and then the input size is respectively 1/2, 1/4, 1/8, 1/16, 1/32 of the original size, the channel number is respectively 64, 128, 256, 512 and 512, and the output size is obtained through five groups of units formed by residual blocks connected by a pooling layer, a convolution layer stack and a skip level
Figure BDA0003574807620000121
Is described. After the output characteristics of the backbone network are obtained, the real person encoder outputs hidden layer real person vectors with the size of 2 Xhdim through the full connection layer; the prosthetic encoder outputs hidden layer prosthetic vectors of size 4 xhdim through the full connection layer.
As shown in fig. 3, the decoder backbone is built with convolutional layers, upsampling layers, sigmoid activation layers, and residual blocks. The input size is set to 3 Xhdim, and the full link layer size is first input to become
Figure BDA0003574807620000122
Then through six groups of residual blocks and upsampling layer structures which are connected by convolution layer stack and skip levelThe resultant units have output sizes of 1/32, 1/16, 1/8, 1/4, 1/2, and 1 of original sizes, and the channel numbers are 512, 256, 128, and 64, respectively, and finally the image pairs with the sizes of H×W×6 are output through the convolution layers.
S3: constructing the double decoupling generation module, which consists of the double decoupling generator and the double decoupling generator loss function, the double decoupling generator consisting of the real-person encoder, the prosthesis encoder and the decoder; the real image is sent to the real-person encoder to obtain a real-person identity vector, the paired spoof image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, the three vectors are merged and sent to the decoder to output a reconstructed true-false image pair, and the double decoupling generation loss function is constructed for optimization;
As shown in fig. 4, the paired real image and spoof image, each of size H×W×3, are input to the real-person encoder and the prosthesis encoder respectively. Let the hidden vector dimension be n. The real-person encoder produces a real-person hidden vector of dimension 2×n, containing the n-dimensional real-person identity vector mean $\mu_r^{id}$ and real-person identity vector variance $\sigma_r^{id}$; the prosthesis encoder produces a prosthesis hidden vector of dimension 4×n, containing the n-dimensional prosthesis identity vector mean $\mu_s^{id}$, prosthesis identity vector variance $\sigma_s^{id}$, prosthesis pattern vector mean $\mu_s^{pt}$ and prosthesis pattern vector variance $\sigma_s^{pt}$. The real-person identity variational component $\epsilon_r^{id}$, the prosthesis identity variational component $\epsilon_s^{id}$ and the prosthesis pattern variational component $\epsilon_s^{pt}$ are then sampled from the standard normal distribution, and the reparameterization operation shown below gives the real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$:

$$z_r^{id} = \mu_r^{id} + \sigma_r^{id} \odot \epsilon_r^{id}, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{pt} = \mu_s^{pt} + \sigma_s^{pt} \odot \epsilon_s^{pt}$$

The real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$ are merged into a hidden vector of dimension 3×n and input to the decoder, which outputs a reconstructed true-false image pair of size H×W×6, containing the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, each of size H×W×3.
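A minimal sketch of this reparameterization step follows. It assumes the encoders output the log-variance (a common VAE convention) where the text speaks of the variance directly; the tensors here carry toy values.

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps sampled from the standard normal distribution
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

n = 128                                    # hidden vector dimension
h_real = torch.randn(4, 2 * n)             # real-person hidden vector (toy values)
h_spoof = torch.randn(4, 4 * n)            # prosthesis hidden vector (toy values)
z_r_id = reparameterize(h_real[:, :n], h_real[:, n:])
z_s_id = reparameterize(h_spoof[:, :n], h_spoof[:, n:2*n])
z_s_pt = reparameterize(h_spoof[:, 2*n:3*n], h_spoof[:, 3*n:])
z = torch.cat([z_r_id, z_s_id, z_s_pt], dim=1)  # 3 x n hidden vector for the decoder
```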
To better learn the prosthesis patterns of different attack types, the prosthesis pattern vector $z_s^{pt}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{pt})$, and its cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)}$$

where k is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector.

To effectively separate the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$, an angular orthogonality constraint is added between the two, giving the orthogonality loss $\mathcal{L}_{ort}$, as shown in the following formula:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s^{id}}{\|z_s^{id}\|},\; \frac{z_s^{pt}}{\|z_s^{pt}\|} \right\rangle^{2}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product.

For the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, the reconstruction loss $\mathcal{L}_{rec}$ with the corresponding original images is computed, as shown in the following formula:

$$\mathcal{L}_{rec} = \|\hat{x}_r - x_r\|_1 + \|\hat{x}_s - x_s\|_1$$

To maintain identity consistency of the hidden vectors, the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ between the n-dimensional real-person identity vector $z_r^{id}$ and the prosthesis identity vector $z_s^{id}$ is computed, as shown in the following formula:

$$\mathcal{L}_{mmd} = \mathrm{MMD}\big(z_r^{id},\; z_s^{id}\big)$$

To maintain identity consistency of the reconstructed images, the pairing loss $\mathcal{L}_{pair}$ between the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$ is computed, as shown in the following formula:

$$\mathcal{L}_{pair} = \|\hat{x}_r - \hat{x}_s\|_1$$

To make the distribution of the hidden vectors fit the standard normal distribution, a constraint is imposed by computing the Kullback-Leibler divergence (KL divergence) for each hidden vector with mean μ and variance σ, as shown in the following formula:

$$\mathcal{L}_{kl} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\big)$$

where n is the dimension of the hidden vector.

Finally, the losses are summed with weights to give the double decoupling generator loss function $\mathcal{L}_{G}$, as shown in the following formula:

$$\mathcal{L}_{G} = \mathcal{L}_{rec} + \mathcal{L}_{pair} + \lambda_1 \mathcal{L}_{kl} + \lambda_2 \mathcal{L}_{cls} + \lambda_3 \mathcal{L}_{ort} + \lambda_4 \mathcal{L}_{mmd}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are the weights, with optimal values of 10, 0.5, 0.1 and 1 respectively.
An Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the goal of minimizing the double decoupling generator loss function $\mathcal{L}_{G}$, training iteratively for T epochs; the optimal value of T is 200. The parameter update formulas are:

$$m_{t+1} = \beta_1 m_t + (1-\beta_1)g$$
$$v_{t+1} = \beta_2 v_t + (1-\beta_2)g^2$$
$$\theta_{t+1} = \theta_t - \eta\,\frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon}$$

where g is the gradient, $\beta_1$ and $\beta_2$ are optimally set to 0.9 and 0.999, the parameter ε for preventing division by zero is optimally set to 1e-8, the learning rate η is optimally set to 0.0002, and $\theta_t$ denotes the parameters at the t-th iteration.
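A sketch of the double decoupling generator loss in PyTorch is given below. Since the original formulas are reproduced as images, the exact norms, the MMD estimator and the grouping of terms under each λ are assumptions; a simple mean-matching term stands in for the full MMD loss.

```python
import torch
import torch.nn.functional as F

def generator_loss(x_r, x_s, rec_r, rec_s, z_r_id, z_s_id, z_s_pt,
                   logits, y_s, mu, logvar, lambdas=(10.0, 0.5, 0.1, 1.0)):
    # mu / logvar: concatenated means and log-variances of all three hidden vectors
    l_cls = F.cross_entropy(logits, y_s)                      # prosthesis-type classification
    cos = F.cosine_similarity(z_s_id, z_s_pt, dim=1)
    l_ort = (cos ** 2).mean()                                 # angular orthogonality constraint
    l_rec = F.l1_loss(rec_r, x_r) + F.l1_loss(rec_s, x_s)     # reconstruction (L1 assumed)
    l_mmd = ((z_r_id.mean(0) - z_s_id.mean(0)) ** 2).sum()    # mean-matching stand-in for MMD
    l_pair = F.l1_loss(rec_r, rec_s)                          # pairing of the reconstructed pair
    l_kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(1).mean()
    l1, l2, l3, l4 = lambdas
    return l_rec + l_pair + l1 * l_kl + l2 * l_cls + l3 * l_ort + l4 * l_mmd
```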
S4: inputting standard normally distributed sampling noise to the trained decoder to obtain generated samples;
As shown in fig. 5, fig. 6(a) and fig. 6(b), with the hidden vector dimension set to n, random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{pt}$. To guarantee identity consistency of the true-false images, the reconstructed real-person identity hidden vector $\tilde{z}_r^{id}$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; the three are then passed through the decoder to generate a true-false image pair as a generated sample. Fig. 6(a) and fig. 6(b) show true-false image pairs generated by the double decoupling generator. In this embodiment the optimal value of the dimension n is 128, and the optimal number of generated samples is 6400.
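A sketch of this generation stage follows, reusing the Decoder from the earlier sketch (here untrained); decoding in mini-batches is an implementation choice to bound memory.

```python
import torch

n, num_samples, batch = 128, 6400, 64
decoder = Decoder(hdim=128).eval()          # Decoder from the sketch above
outs = []
with torch.no_grad():
    for _ in range(0, num_samples, batch):
        z_s_id = torch.randn(batch, n)      # reconstructed prosthesis identity hidden vector
        z_s_pt = torch.randn(batch, n)      # reconstructed prosthesis pattern hidden vector
        z_r_id = z_s_id.clone()             # copied directly for identity consistency
        pairs = decoder(torch.cat([z_r_id, z_s_id, z_s_pt], dim=1))
        outs.append(pairs.cpu())
pairs = torch.cat(outs)                     # (6400, 6, 256, 256)
gen_real, gen_spoof = pairs[:, :3], pairs[:, 3:]  # generated true-false image pairs
```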
S5: cropping the training set formed by the original samples and the generated samples, dividing it into labeled samples and unlabeled samples, and applying random data enhancement to the unlabeled samples to obtain enhanced unlabeled samples;
In this embodiment, an image block of size h×w is first randomly cropped from each H×W RGB color channel map to be trained. N original samples to be trained are randomly selected as the labeled data, the remaining samples together with the generated samples form the unlabeled data, and the number ratio of the labeled samples to the unlabeled samples is μ. Random data enhancement is then applied to the unlabeled samples to obtain the enhanced unlabeled samples. The data enhancement methods include: maximizing contrast, brightness adjustment, color balance adjustment, contrast adjustment, sharpness adjustment, cropping, histogram equalization, color inversion, posterization (setting the lowest 0-4 bits of the pixel values to zero), random rotation, horizontal shear, vertical shear, horizontal translation, vertical translation, and solarization (inverting all pixel values above a threshold); two of these methods are randomly selected to enhance each sample. The preferred values of H and W, of N and of μ in this embodiment are 256, 6000 and 4 respectively;
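A sketch of this random enhancement policy using PIL is shown below; the magnitudes are illustrative, and the shear and translation operations are omitted for brevity.

```python
import random
from PIL import Image, ImageOps, ImageEnhance

# Two operations are drawn at random per unlabeled sample.
AUGMENTATIONS = [
    ImageOps.autocontrast,                                    # maximize contrast
    lambda im: ImageEnhance.Brightness(im).enhance(1.5),      # brightness adjustment
    lambda im: ImageEnhance.Color(im).enhance(1.5),           # color balance adjustment
    lambda im: ImageEnhance.Contrast(im).enhance(1.5),        # contrast adjustment
    lambda im: ImageEnhance.Sharpness(im).enhance(1.5),       # sharpness adjustment
    ImageOps.equalize,                                        # histogram equalization
    ImageOps.invert,                                          # color inversion
    lambda im: ImageOps.posterize(im, random.randint(4, 8)),  # zero out the low bits
    lambda im: im.rotate(random.uniform(-30, 30)),            # random rotation
    lambda im: ImageOps.solarize(im, 128),                    # invert pixels above a threshold
]

def augment(im: Image.Image) -> Image.Image:
    for op in random.sample(AUGMENTATIONS, 2):
        im = op(im)
    return im
```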
S6: constructing the detector and the teacher network;
As shown in fig. 7, the teacher network and the detector have the same network structure. With input size H×W×3, a convolutional layer, a batch normalization layer and a ReLU first map the input to H×W×16; three units, each consisting of a residual block built from stacked convolutional layers with a skip connection, then produce outputs of 1, 1/2 and 1/4 of the original size with 32, 64 and 128 channels respectively, giving a feature of size (H/4)×(W/4)×128. A global average pooling layer reduces this feature to size 1×1×128, and finally a fully connected layer outputs the classification vector of dimension 2.
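A condensed PyTorch sketch of this shared architecture follows; the kernel sizes and block internals are assumptions.

```python
import torch
import torch.nn as nn

class ResBlockBN(nn.Module):
    """Residual block with batch normalization; stride controls the downsampling."""
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1, stride=stride)
    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class Classifier(nn.Module):
    """Shared architecture of the detector and the teacher network."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                  nn.BatchNorm2d(16), nn.ReLU())
        # three residual units: 1, 1/2 and 1/4 of the input size; 32, 64, 128 channels
        self.blocks = nn.Sequential(ResBlockBN(16, 32, 1),
                                    ResBlockBN(32, 64, 2),
                                    ResBlockBN(64, 128, 2))
        self.pool = nn.AdaptiveAvgPool2d(1)    # global average pooling to 1 x 1 x 128
        self.fc = nn.Linear(128, 2)            # classification vector of dimension 2
    def forward(self, x):
        return self.fc(self.pool(self.blocks(self.stem(x))).flatten(1))
```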
S7: constructing the teacher learning module, and sending the labeled samples, unlabeled samples and enhanced unlabeled samples into the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss:
As shown in fig. 8, it is first agreed that CE(q, p) denotes the cross entropy loss of two distributions q and p; if q is a label value, it is first converted into a one-hot vector. Let k be the number of label classes; the cross entropy loss is expressed as:

$$CE(q, p) = -\sum_{i=1}^{k} q^{(i)} \log p^{(i)}$$

Secondly, it is agreed that argmax(v) denotes the index of the maximum value in the vector v.

The labeled sample $\{x_l, y_l\}$ is sent to the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l;\theta_T)$; its cross entropy with the true label $y_l$ gives the teacher labeled loss, as shown in the following formula:

$$\mathcal{L}^{l}_{T} = CE\big(y_l,\; T(x_l;\theta_T)\big)$$

The unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ obtained by one data enhancement as described above are respectively sent into the teacher network T, giving the teacher unlabeled prediction $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$; the cross entropy of the two results gives the teacher unlabeled loss $\mathcal{L}^{u}_{T}$, and the class of the maximum value of the teacher unlabeled prediction $T(x_u;\theta_T)$ is taken out as the pseudo label $y^{pl}_{u}$, as shown in the following formulas:

$$\mathcal{L}^{u}_{T} = CE\big(T(x_u;\theta_T),\; T(\hat{x}_u;\theta_T)\big)$$
$$y^{pl}_{u} = \arg\max T(x_u;\theta_T)$$

The teacher labeled loss $\mathcal{L}^{l}_{T}$ and the teacher unlabeled loss $\mathcal{L}^{u}_{T}$ are then summed with weights to give the teacher semi-supervised loss, as shown in the following formula:

$$\mathcal{L}^{semi}_{T} = \mathcal{L}^{l}_{T} + \lambda\,\frac{s}{s_{tl}}\,\mathcal{L}^{u}_{T}$$

where s is the current step number, $s_{tl}$ is the total number of steps, and λ is the weight of the unlabeled loss.

For the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$, the class h of its own maximum value is extracted and passed through the cross entropy function to give the teacher enhanced unlabeled loss, as shown in the following formulas:

$$h = \arg\max T(\hat{x}_u;\theta_T)$$
$$\mathcal{L}^{aug}_{T} = CE\big(h,\; T(\hat{x}_u;\theta_T)\big)$$
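A sketch of the teacher-side computation follows. The soft-target cross entropy is written out by hand, and the s/s_tl ramp form of the weighting follows the reconstruction above and is an assumption.

```python
import torch
import torch.nn.functional as F

def teacher_losses(teacher, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    """Teacher losses of the teacher learning module (a sketch)."""
    p_l = teacher(x_l)
    loss_labeled = F.cross_entropy(p_l, y_l)       # teacher labeled loss
    p_u = teacher(x_u)                             # teacher unlabeled prediction
    p_u_aug = teacher(x_u_aug)                     # teacher enhanced unlabeled prediction
    # cross entropy between the two predictions, with a soft (detached) target
    loss_unlabeled = -(p_u.softmax(1).detach() * F.log_softmax(p_u_aug, 1)).sum(1).mean()
    pseudo = p_u.argmax(1).detach()                # pseudo labels for the detector
    semi = loss_labeled + lam * (step / total_steps) * loss_unlabeled
    h = p_u_aug.argmax(1).detach()                 # class of the prediction's own maximum
    loss_aug = F.cross_entropy(p_u_aug, h)         # teacher enhanced unlabeled loss
    return semi, pseudo, loss_aug
```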
S8: constructing the detector learning module, and sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to the detector learning module to update the detector parameters and obtain the detector update loss;
As shown in fig. 8, the enhanced unlabeled sample $\hat{x}_u$ is sent into the detector D with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction $D(\hat{x}_u;\theta_D)$; its cross entropy with the pseudo label $y^{pl}_{u}$ gives the detector enhanced unlabeled loss, as shown in the following formula:

$$\mathcal{L}^{aug}_{D} = CE\big(y^{pl}_{u},\; D(\hat{x}_u;\theta_D)\big)$$

With the detector learning rate $\eta_D$, the detector is optimized by gradient descent on the detector enhanced unlabeled loss $\mathcal{L}^{aug}_{D}$, giving the optimized parameters $\theta'_D$, as shown in the following formula:

$$\theta'_D = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}^{aug}_{D}$$

The labeled sample $(x_l, y_l)$ is sent to the detector with the pre-optimization parameters $\theta_D$ and with the optimized parameters $\theta'_D$ respectively, giving the old detector labeled loss $\mathcal{L}^{old}_{D}$ and the new detector labeled loss $\mathcal{L}^{new}_{D}$; their difference gives the detector update loss $\mathcal{L}^{upd}_{D}$, as shown in the following formulas:

$$\mathcal{L}^{old}_{D} = CE\big(y_l,\; D(x_l;\theta_D)\big)$$
$$\mathcal{L}^{new}_{D} = CE\big(y_l,\; D(x_l;\theta'_D)\big)$$
$$\mathcal{L}^{upd}_{D} = \mathcal{L}^{old}_{D} - \mathcal{L}^{new}_{D}$$
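A sketch of one detector learning step in this meta-pseudo-label style follows; the detector update loss is returned as a scalar to be fed back to the teacher.

```python
import torch
import torch.nn.functional as F

def detector_step(detector, opt_d, x_l, y_l, x_u_aug, pseudo):
    """One detector update plus the feedback signal for the teacher (a sketch)."""
    with torch.no_grad():
        loss_old = F.cross_entropy(detector(x_l), y_l)     # labeled loss before the update
    loss_aug = F.cross_entropy(detector(x_u_aug), pseudo)  # detector enhanced unlabeled loss
    opt_d.zero_grad()
    loss_aug.backward()
    opt_d.step()                                           # theta_D -> theta_D'
    with torch.no_grad():
        loss_new = F.cross_entropy(detector(x_l), y_l)     # labeled loss after the update
    return loss_old - loss_new                             # detector update loss
```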
S9: updating the teacher network parameters using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss:
As shown in fig. 8, the detector update loss $\mathcal{L}^{upd}_{D}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}^{aug}_{T}$ and then added to the teacher semi-supervised loss $\mathcal{L}^{semi}_{T}$ to form the teacher loss $\mathcal{L}_{T}$. With the teacher network learning rate $\eta_T$, the teacher network is optimized by gradient descent, expressed as:

$$\mathcal{L}_{T} = \mathcal{L}^{upd}_{D}\,\mathcal{L}^{aug}_{T} + \mathcal{L}^{semi}_{T}$$
$$\theta'_T = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_{T}$$
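A sketch of the corresponding teacher update follows; the detector update loss enters as a detached scalar coefficient, matching the multiply-then-add form above.

```python
def teacher_step(opt_t, semi, loss_aug, update_loss):
    """Teacher loss = detector update loss x enhanced unlabeled loss + semi-supervised loss."""
    loss_t = update_loss * loss_aug + semi   # update_loss is a detached scalar here
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()
    return loss_t.item()
```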
S10: iteratively updating the network parameters of the detector and the teacher network with the optimizers according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed:
In this embodiment, the teacher network and the detector both use SGD optimizers with Nesterov momentum, where the momentum μ has a preferred value of 0.9, the initial learning rate $\epsilon_0$ has a preferred value of 0.05, and the learning rate decays with the number of training iterations;
the detector enhanced unlabeled loss $\mathcal{L}^{aug}_{D}$ in the detector learning module is optimized with the detector optimizer with the goal of minimizing the loss; the teacher loss $\mathcal{L}_{T}$ in the teacher update module is optimized with the teacher network optimizer with the goal of minimizing the loss.
S11: determining a threshold using the validation set;
in this embodiment, the specific steps include: sending the RGB color channel diagram of the face of the verification set to a detector to obtain a classification fraction p, then carrying out equidistant sampling in a value range (0, 1) to obtain different judgment thresholds, obtaining a predicted label value according to the thresholds, comparing the predicted label value with a real label, calculating a false alarm rate and a false omission rate, and taking the thresholds when the false alarm rate and the false omission rate are equal as a test judgment threshold T;
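A sketch of this threshold sweep is given below, with scores and labels as NumPy arrays; the convention that higher scores mean "live" is an assumption.

```python
import numpy as np

def eer_threshold(scores, labels, num=1000):
    """Sweep equally spaced thresholds in (0, 1); return the one where the
    false acceptance and false rejection rates are closest to equal."""
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0.0, 1.0, num + 2)[1:-1]:
        pred = (scores >= t).astype(int)   # 1 = live, 0 = spoof (assumed convention)
        far = ((pred == 1) & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        frr = ((pred == 0) & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t
```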
S12: testing the model;
In this embodiment, the specific steps include: the test-set face RGB color channel maps are sent to the detector to obtain classification scores p, the final predicted label values are obtained according to the test decision threshold T, and the benchmark indices are computed.
The performance of the living body detection method of this embodiment is evaluated with the False Acceptance Rate (FAR), the False Rejection Rate (FRR), the True Acceptance Rate (TAR), the Equal Error Rate (EER) and the Half Total Error Rate (HTER). These indices are described using the confusion matrix of Table 1:
TABLE 1 Confusion matrix

Label \ Prediction    Predicted live    Predicted spoof
Label live            TA                FR
Label spoof           FA                TR
The False Acceptance Rate (FAR) is the proportion of spoof (non-living) faces that are judged to be living faces:

$$FAR = \frac{FA}{FA + TR}$$

The False Rejection Rate (FRR) is the proportion of living faces that are judged to be spoof faces:

$$FRR = \frac{FR}{TA + FR}$$

The True Acceptance Rate (TAR) is the proportion of living faces that are correctly judged to be living faces:

$$TAR = \frac{TA}{TA + FR}$$

The Equal Error Rate (EER) is the error rate at which FRR equals FAR.

The Half Total Error Rate (HTER) is the average of FRR and FAR:

$$HTER = \frac{FAR + FRR}{2}$$
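For reference, a short sketch computing these rates directly from the confusion-matrix counts of Table 1:

```python
def rates(TA, FR, FA, TR):
    """FAR, FRR, TAR and HTER from the confusion-matrix counts of Table 1."""
    FAR = FA / (FA + TR)      # spoof faces accepted as live
    FRR = FR / (TA + FR)      # live faces rejected as spoof
    TAR = TA / (TA + FR)      # live faces correctly accepted
    HTER = (FAR + FRR) / 2    # half total error rate
    return FAR, FRR, TAR, HTER
```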
in order to prove the effectiveness of the invention and to test the generalization performance of the method, in-library experiments and cross-library experiments are respectively carried out on the CASIA-MFSD, replay-attach and MSU-MFSD databases. The in-library experimental results and the cross-library experimental results are shown in tables 2 and 3, respectively:
TABLE 2 Intra-database experimental results
[Table 2 is reproduced as an image in the original publication; it reports the intra-database half total error rates and equal error rates.]
TABLE 3 Cross-database experimental results
[Table 3 is reproduced as an image in the original publication; it reports the cross-database half total error rates.]
As can be seen from Table 2, the intra-database half total error rate and equal error rate of the method are low, showing excellent spoof detection performance within a database; as can be seen from Table 3, the half total error rate of cross-database detection is also low. Although the training set consists of labeled samples and unlabeled samples extracted from only a few frames of each training video, the diversity of the training data is enriched by the data generated with the double decoupling generator, and the model is progressively trained through meta learning, which improves the ability to learn from limited sample features and the model's generalization capacity. The experimental results prove that, even with insufficient labeled training samples, the method maintains high intra-database accuracy while greatly reducing the cross-database error rate and significantly improving generalization performance.
As shown in fig. 9, the present embodiment further provides a living body detection system based on double decoupling generation and semi-supervised learning, including: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
In this embodiment, the data preprocessing module is configured to extract a face area image from an input image to obtain an RGB color channel image, pair each true image of an original sample of the RGB color channel image to be trained with a false image of the same identity to form a true-false image pair, and label the false image with an attack type label;
in this embodiment, the double-decoupling generator building module is configured to build a real encoder, a prosthetic encoder, and a decoder to form a double-decoupling generator;
in this embodiment, the double-decoupling generator training module is configured to send a true image to a true person encoder to obtain a true person identity vector, send a paired false image to a false person encoder to obtain a false person identity vector and a false person mode vector, and combine the true person identity vector, the false person identity vector and the false person mode vector to send the true person identity vector, the false person identity vector and the false person mode vector to a decoder to output a reconstructed true and false image pair, so as to construct a double-decoupling generation loss function to optimize the true and false image pair;
in this embodiment, the generated sample construction module is configured to obtain a generated sample from the standard normal distributed sampling noise input to the trained decoder;
in this embodiment, the unsupervised data enhancing module is configured to cut a training set formed by an original sample and a generated sample, and construct a labeled sample, an unlabeled sample, and an enhanced unlabeled sample;
In this embodiment, the detector building module and the teacher network building module are respectively configured to build a detector and a teacher network;
in this embodiment, the teacher learning module is configured to send the labeled sample, the unlabeled sample, and the enhanced unlabeled sample to the teacher learning module, so as to obtain a teacher semi-supervised loss, a pseudo-label of unlabeled data, and a teacher enhanced unlabeled loss;
in this embodiment, the detector learning module is configured to send the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters, so as to obtain a detector update loss;
in this embodiment, the network parameter updating module is configured to update the parameters of the teacher network by using the semi-supervised loss of the teacher, the enhanced label-free loss of the teacher, and the updating loss of the detector, iteratively update the parameters of the detector and the teacher network by using the optimizer according to the loss function, and save the parameters of the teacher network and the detector after training is completed;
in this embodiment, the verification module is configured to send the verification set face RGB color channel diagram to the trained detector to obtain a classification score, obtain a predicted tag value according to different decision thresholds, compare the predicted tag value with a real tag, calculate a false alarm rate and a false omission rate, and take a threshold value when the two are equal as a test decision threshold value;
In this embodiment, the test module is configured to send the test-set face RGB color channel images to the trained detector to obtain classification scores, obtain the final predicted label values according to the test decision threshold, and calculate the evaluation metrics.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (6)

1. The living body detection method based on double decoupling generation and semi-supervised learning is characterized by comprising the following steps:
the face region image is extracted from the input image to obtain an RGB color channel image;
pairing each true image in the original RGB color channel image samples to be trained with a false image of the same identity to form a true-false image pair, and labeling the false image with an attack type label;
constructing a real encoder, a prosthesis encoder and a decoder;
the specific steps of constructing the real-person encoder, the prosthesis encoder and the decoder comprise:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, provided with a convolution layer, an instance normalization layer and a LeakyReLU activation, followed by five groups of units each consisting of a pooling layer and a convolution-layer stack with a skip-level residual connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone network of the decoder is built from convolution layers, upsampling layers, a Sigmoid activation layer and residual blocks: the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, then pass through six groups of units each consisting of a convolution-stacked residual block and an upsampling layer, and an image pair is finally output through a convolution layer;
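For illustration, the following is a minimal PyTorch sketch of the encoder and decoder backbones just described; the channel widths, 256×256 input resolution, latent dimension and per-group layer counts are our own assumptions rather than the patented configuration:

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Convolution stack with a skip-level (residual) connection.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
            nn.LeakyReLU(0.2), nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    # Conv + InstanceNorm + LeakyReLU stem, then five pooling/residual groups,
    # then a fully connected layer emitting the hidden-layer vector(s).
    def __init__(self, latent_dim=128, n_heads=2):  # n_heads=2: mean and variance
        super().__init__()
        layers = [nn.Conv2d(3, 32, 3, padding=1), nn.InstanceNorm2d(32), nn.LeakyReLU(0.2)]
        ch = 32
        for _ in range(5):
            layers += [nn.AvgPool2d(2), nn.Conv2d(ch, ch * 2, 1), ResBlock(ch * 2)]
            ch *= 2
        self.features = nn.Sequential(*layers)
        # the prosthesis encoder would use n_heads=4 (identity/pattern mean and variance)
        self.fc = nn.Linear(ch * 8 * 8, latent_dim * n_heads)  # assumes 256x256 input
    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class Decoder(nn.Module):
    # FC input, six groups of residual block + upsampling, Sigmoid image output.
    def __init__(self, latent_dim=3 * 128, ch=512):
        super().__init__()
        self.fc = nn.Linear(latent_dim, ch * 4 * 4)
        body = []
        for _ in range(6):
            nxt = max(ch // 2, 32)
            body += [ResBlock(ch), nn.Upsample(scale_factor=2), nn.Conv2d(ch, nxt, 3, padding=1)]
            ch = nxt
        self.body = nn.Sequential(*body)
        self.head = nn.Sequential(nn.Conv2d(ch, 6, 3, padding=1), nn.Sigmoid())  # 6 ch = RGB pair
    def forward(self, z):
        h = self.fc(z).view(z.size(0), -1, 4, 4)
        return self.head(self.body(h))  # (B, 6, 256, 256): reconstructed true-false pair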
the true image is sent to the real-person encoder to obtain a real-person identity vector, the paired false image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, the three vectors are merged and sent to the decoder to output a reconstructed true-false image pair, and a double-decoupling generation loss function is constructed for optimization;
the specific steps of sending the true image to the real-person encoder to obtain the real-person identity vector, and sending the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis pattern vector, comprise:
the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_t$ and the real-person identity vector variance $\sigma_t^2$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $(\sigma_s^{id})^2$, the prosthesis pattern vector mean $\mu_s^{p}$ and the prosthesis pattern vector variance $(\sigma_s^{p})^2$;
a real-person identity variation component $\epsilon_t$, a prosthesis identity variation component $\epsilon_s^{id}$ and a prosthesis pattern variation component $\epsilon_s^{p}$ are sampled from the standard normal distribution, and the reparameterization operation yields the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$, specifically:
$$z_t = \mu_t + \sigma_t \odot \epsilon_t, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{p} = \mu_s^{p} + \sigma_s^{p} \odot \epsilon_s^{p};$$
the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ are combined into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$;
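A minimal sketch of this reparameterization step, assuming each encoder emits mean and log-variance halves of its hidden vector (all tensor names are illustrative):

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, eps ~ N(0, I) (the sampled variation component)
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

batch, n = 4, 128
mu_t, logvar_t = torch.zeros(batch, n), torch.zeros(batch, n)        # real-person identity stats
mu_s_id, logvar_s_id = torch.zeros(batch, n), torch.zeros(batch, n)  # prosthesis identity stats
mu_s_p, logvar_s_p = torch.zeros(batch, n), torch.zeros(batch, n)    # prosthesis pattern stats

z_t = reparameterize(mu_t, logvar_t)           # real-person identity vector
z_s_id = reparameterize(mu_s_id, logvar_s_id)  # prosthesis identity vector
z_s_p = reparameterize(mu_s_p, logvar_s_p)     # prosthesis pattern vector
hidden = torch.cat([z_t, z_s_id, z_s_p], dim=1)  # merged hidden vector for the decoder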
the specific steps of merging the real-person identity vector, the prosthesis identity vector and the prosthesis pattern vector, sending them to the decoder to output the reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization, comprise:
the prosthesis pattern vector $z_s^{p}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{p})$, and the cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, expressed as:
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log\big(\mathrm{softmax}(\hat{y}_s)^{(i)}\big)$$
wherein k is the number of prosthesis categories and $Y_s$ is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:
$$\mathcal{L}_{ort} = \left( \frac{\langle z_s^{id},\, z_s^{p} \rangle}{\|z_s^{id}\|_2\,\|z_s^{p}\|_2} \right)^{2}$$
wherein $\langle z_s^{id}, z_s^{p}\rangle$ denotes the inner product of $z_s^{id}$ and $z_s^{p}$;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed real-person image $\hat{x}_t$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images $x_t$, $x_s$, expressed as:
$$\mathcal{L}_{rec} = \|\hat{x}_t - x_t\|_1 + \|\hat{x}_s - x_s\|_1$$
the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is calculated between the real-person identity vector $z_t$ and the prosthesis identity vector $z_s^{id}$, expressed as:
$$\mathcal{L}_{mmd} = \Big\| \frac{1}{m}\sum_{i=1}^{m} \phi(z_{t,i}) - \frac{1}{m}\sum_{j=1}^{m} \phi(z_{s,j}^{id}) \Big\|_{\mathcal{H}}^{2}$$
wherein $\phi(\cdot)$ is the kernel feature mapping and m is the number of samples in a batch;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$, expressed as:
$$\mathcal{L}_{pair} = \|\hat{x}_t - \hat{x}_s\|_1$$
the hidden vectors are constrained by calculating the KL divergence to the standard normal distribution, as shown in the following formula:
$$\mathcal{L}_{KL} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^{2} + \sigma_i^{2} - \log \sigma_i^{2} - 1\big)$$
applied to each of $(\mu_t, \sigma_t)$, $(\mu_s^{id}, \sigma_s^{id})$ and $(\mu_s^{p}, \sigma_s^{p})$ and summed, wherein n is the dimension of the hidden vector;
the above losses are weighted and summed to obtain the double-decoupling generator loss function $\mathcal{L}_G$, expressed as:
$$\mathcal{L}_G = \mathcal{L}_{cls} + \mathcal{L}_{KL} + \lambda_1 \mathcal{L}_{ort} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{mmd} + \lambda_4 \mathcal{L}_{pair}$$
wherein $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ represent the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with minimization of the double-decoupling generator loss function $\mathcal{L}_G$ as the target;
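As a non-authoritative sketch of the objective reconstructed above: the L1 reconstruction and pairing distances, the Gaussian-kernel MMD estimator and the placeholder weights are plausible instantiations we assume, not confirmed choices of the patent.

import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    # 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1) over the hidden dimension
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=1).mean()

def mmd(x, y, sigma=1.0):
    # Gaussian-kernel maximum mean discrepancy between two vector batches
    k = lambda a, b: torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def generator_loss(x_t, x_s, rec_t, rec_s, z_t, z_s_id, z_s_p,
                   logits_s, y_s, stats, lambdas=(1.0, 10.0, 1.0, 1.0)):
    l_cls = F.cross_entropy(logits_s, y_s)                            # attack-type classification
    l_ort = F.cosine_similarity(z_s_id, z_s_p, dim=1).pow(2).mean()   # angular orthogonality
    l_rec = F.l1_loss(rec_t, x_t) + F.l1_loss(rec_s, x_s)             # reconstruction
    l_mmd = mmd(z_t, z_s_id)                                          # align the two identity spaces
    l_pair = F.l1_loss(rec_t, rec_s)                                  # paired reconstructions stay close
    l_kl = sum(kl_to_standard_normal(mu, lv) for mu, lv in stats)     # stats: three (mu, logvar) pairs
    l1, l2, l3, l4 = lambdas
    return l_cls + l_kl + l1 * l_ort + l2 * l_rec + l3 * l_mmd + l4 * l_pair

An Adam optimizer over the parameters of both encoders and the decoder would then minimize this scalar.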
inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples;
the specific steps of inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples comprise:
let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{p}$; the reconstructed real-person identity hidden vector $\tilde{z}_t$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; true-false image pairs are then generated through the decoder and serve as the generated samples;
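A short sketch of this sampling step; decoder stands for the trained decoder and is left symbolic:

import torch

n, batch = 128, 8
z_s_id = torch.randn(batch, n)   # reconstructed prosthesis identity hidden vector
z_s_p = torch.randn(batch, n)    # reconstructed prosthesis pattern hidden vector
z_t = z_s_id.clone()             # real-person identity copied from the prosthesis identity
hidden = torch.cat([z_t, z_s_id, z_s_p], dim=1)
# generated_pair = decoder(hidden)  # the trained decoder yields a true-false image pair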
splitting the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of unlabeled data and the teacher enhanced unlabeled loss;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters and obtain the detector update loss;
updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss;
iteratively updating the parameters of the detector and the teacher network with the optimizer according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed;
sending the validation-set face RGB color channel images to the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the real labels, calculating the false alarm rate and the miss rate, and taking the threshold at which the two rates are equal as the test decision threshold;
sending the test-set face RGB color channel images to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
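The threshold search can be sketched as below, assuming higher scores mean "real" and labels use 1 for real faces and 0 for attacks (our convention):

import numpy as np

def eer_threshold(scores, labels):
    # Sweep candidate thresholds; keep the one where the false alarm rate
    # (attacks accepted) is closest to the miss rate (real faces rejected).
    best_t, best_gap = 0.5, float("inf")
    for t in np.unique(scores):
        pred = (scores >= t).astype(int)
        far = np.mean(pred[labels == 0] == 1)
        frr = np.mean(pred[labels == 1] == 0)
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), float(t)
    return best_t

scores = np.array([0.92, 0.81, 0.33, 0.12, 0.74, 0.25])
labels = np.array([1, 1, 0, 0, 1, 0])
print(eer_threshold(scores, labels))  # threshold used at test time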
2. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the detector and the teacher network have the same network structure, provided with a convolution layer, a batch normalization layer and a ReLU activation, followed by three groups of units each consisting of a convolution-layer stack with a skip-level residual connection, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
3. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the specific steps of sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of unlabeled data and the teacher enhanced unlabeled loss comprise:
the labeled sample $(x_l, y_l)$ is input to the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction result $T(x_l;\theta_T)$; the cross entropy with the real label $y_l$ gives the teacher labeled loss $\mathcal{L}_T^{l}$, specifically:
$$\mathcal{L}_T^{l} = CE\big(y_l,\, T(x_l;\theta_T)\big)$$
wherein CE represents the cross entropy loss;
the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are respectively input to the teacher network T to obtain the teacher unlabeled prediction result $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$; the cross entropy loss between the two gives the teacher unlabeled loss $\mathcal{L}_T^{u}$, specifically:
$$\mathcal{L}_T^{u} = CE\big(T(x_u;\theta_T),\, T(\hat{x}_u;\theta_T)\big)$$
the category to which the maximum value belongs is extracted from the teacher unlabeled prediction result $T(x_u;\theta_T)$ as the pseudo label $\hat{y}_u$, specifically:
$$\hat{y}_u = \arg\max T(x_u;\theta_T)$$
the teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ are weighted and summed to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, expressed as:
$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \lambda \min\!\Big(1,\, \frac{s}{s_{tl}}\Big)\, \mathcal{L}_T^{u}$$
wherein s is the current step number, $s_{tl}$ is the total number of steps, and λ is the weight of the unlabeled loss;
for the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$, the category h to which its own maximum value belongs is extracted and passed through the cross entropy function to obtain the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$, expressed as:
$$h = \arg\max T(\hat{x}_u;\theta_T), \qquad \mathcal{L}_T^{aug} = CE\big(h,\, T(\hat{x}_u;\theta_T)\big)$$
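The teacher-side computation can be sketched as follows; the linear ramp-up min(1, s/s_tl) on the unlabeled weight and the soft-target cross entropy are our reading of the formulas above, not a verbatim implementation:

import torch
import torch.nn.functional as F

def teacher_losses(T, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    l_label = F.cross_entropy(T(x_l), y_l)                 # teacher labeled loss
    logits_u, logits_u_aug = T(x_u), T(x_u_aug)
    # teacher unlabeled loss: clean prediction supervises the enhanced one
    # (soft-target cross entropy requires PyTorch >= 1.10)
    l_unlabel = F.cross_entropy(logits_u_aug, logits_u.softmax(dim=1))
    pseudo = logits_u.argmax(dim=1).detach()               # pseudo labels for the detector
    l_semi = l_label + lam * min(1.0, step / total_steps) * l_unlabel
    h = logits_u_aug.argmax(dim=1)                         # class of the maximum value
    l_aug = F.cross_entropy(logits_u_aug, h)               # teacher enhanced unlabeled loss
    return l_semi, pseudo, l_aug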
4. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the specific steps of sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters and obtain the detector update loss comprise:
the enhanced unlabeled sample $\hat{x}_u$ is fed into the detector D with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction result $D(\hat{x}_u;\theta_D)$, and the cross entropy with the pseudo label $\hat{y}_u$ gives the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$, specifically:
$$\mathcal{L}_D^{aug} = CE\big(\hat{y}_u,\, D(\hat{x}_u;\theta_D)\big)$$
the detector is optimized with the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$ by the gradient descent method, and the optimized parameters are $\theta'_D$, specifically expressed as:
$$\theta'_D = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{aug}$$
wherein $\eta_D$ denotes the learning rate of the detector and $\nabla$ denotes the gradient calculation;
the labeled sample $(x_l, y_l)$ is respectively sent to the detector with the pre-optimization parameters $\theta_D$ and the detector with the optimized parameters $\theta'_D$ to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$, and the difference between the two gives the detector update loss $\mathcal{L}_D^{upd}$, specifically:
$$\mathcal{L}_D^{old} = CE\big(y_l,\, D(x_l;\theta_D)\big), \quad \mathcal{L}_D^{new} = CE\big(y_l,\, D(x_l;\theta'_D)\big), \quad \mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}$$
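A compact sketch of this detector update in the meta-pseudo-label style the claim describes; the detector D and its optimizer are placeholders:

import torch
import torch.nn.functional as F

def detector_step(D, opt_D, x_u_aug, pseudo, x_l, y_l):
    with torch.no_grad():
        l_old = F.cross_entropy(D(x_l), y_l)     # labeled loss, old parameters
    l_aug = F.cross_entropy(D(x_u_aug), pseudo)  # detector enhanced unlabeled loss
    opt_D.zero_grad()
    l_aug.backward()
    opt_D.step()                                 # one gradient-descent step: theta_D -> theta'_D
    with torch.no_grad():
        l_new = F.cross_entropy(D(x_l), y_l)     # labeled loss, updated parameters
    return l_old - l_new                         # detector update loss, fed back to the teacher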
5. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the specific steps of updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss comprise:
the detector update loss $\mathcal{L}_D^{upd}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$ as an overall weight and then added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$, and the teacher network is optimized by the gradient descent method, expressed as:
$$\mathcal{L}_T = \mathcal{L}_T^{semi} + \mathcal{L}_D^{upd} \cdot \mathcal{L}_T^{aug}, \qquad \theta'_T = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T$$
wherein $\theta_T$ represents the parameters before the teacher network is optimized, $\theta'_T$ represents the parameters after the teacher network is optimized, $\eta_T$ represents the teacher network learning rate, and $\nabla$ represents the gradient calculation.
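Under the same assumptions, the corresponding teacher update can be sketched as:

def teacher_step(opt_T, l_semi, l_aug, l_update):
    # the detector update loss acts as a scalar weight on the enhanced unlabeled loss
    loss_T = l_semi + l_update.detach() * l_aug
    opt_T.zero_grad()
    loss_T.backward()
    opt_T.step()
    return loss_T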
6. A living body detection system based on double decoupling generation and semi-supervised learning, comprising: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
the data preprocessing module is used for extracting face area images from the input images to obtain RGB color channel images, matching each true image of an original sample of the RGB color channel images to be trained with a false image with the same identity to form a true-false image pair, and labeling the false images with attack type labels;
the double decoupling generator construction module is used for constructing a real encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the specific steps of constructing the real-person encoder, the prosthesis encoder and the decoder comprise:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, provided with a convolution layer, an instance normalization layer and a LeakyReLU activation, followed by five groups of units each consisting of a pooling layer and a convolution-layer stack with a skip-level residual connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone network of the decoder is built from convolution layers, upsampling layers, a Sigmoid activation layer and residual blocks: the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, then pass through six groups of units each consisting of a convolution-stacked residual block and an upsampling layer, and an image pair is finally output through a convolution layer;
the double-decoupling generator training module is used for sending the true image to the real-person encoder to obtain a real-person identity vector, sending the paired false image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, merging the three vectors and sending them to the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization;
the specific steps of sending the true image to the real-person encoder to obtain the real-person identity vector, and sending the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis pattern vector, comprise:
the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_t$ and the real-person identity vector variance $\sigma_t^2$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $(\sigma_s^{id})^2$, the prosthesis pattern vector mean $\mu_s^{p}$ and the prosthesis pattern vector variance $(\sigma_s^{p})^2$;
a real-person identity variation component $\epsilon_t$, a prosthesis identity variation component $\epsilon_s^{id}$ and a prosthesis pattern variation component $\epsilon_s^{p}$ are sampled from the standard normal distribution, and the reparameterization operation yields the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$, specifically:
$$z_t = \mu_t + \sigma_t \odot \epsilon_t, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{p} = \mu_s^{p} + \sigma_s^{p} \odot \epsilon_s^{p};$$
the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ are combined into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$;
the specific steps of merging the real-person identity vector, the prosthesis identity vector and the prosthesis pattern vector, sending them to the decoder to output the reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization, comprise:
the prosthesis pattern vector $z_s^{p}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{p})$, and the cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, expressed as:
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log\big(\mathrm{softmax}(\hat{y}_s)^{(i)}\big)$$
wherein k is the number of prosthesis categories and $Y_s$ is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:
$$\mathcal{L}_{ort} = \left( \frac{\langle z_s^{id},\, z_s^{p} \rangle}{\|z_s^{id}\|_2\,\|z_s^{p}\|_2} \right)^{2}$$
wherein $\langle z_s^{id}, z_s^{p}\rangle$ denotes the inner product of $z_s^{id}$ and $z_s^{p}$;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed real-person image $\hat{x}_t$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images $x_t$, $x_s$, expressed as:
$$\mathcal{L}_{rec} = \|\hat{x}_t - x_t\|_1 + \|\hat{x}_s - x_s\|_1$$
the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is calculated between the real-person identity vector $z_t$ and the prosthesis identity vector $z_s^{id}$, expressed as:
$$\mathcal{L}_{mmd} = \Big\| \frac{1}{m}\sum_{i=1}^{m} \phi(z_{t,i}) - \frac{1}{m}\sum_{j=1}^{m} \phi(z_{s,j}^{id}) \Big\|_{\mathcal{H}}^{2}$$
wherein $\phi(\cdot)$ is the kernel feature mapping and m is the number of samples in a batch;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$, expressed as:
$$\mathcal{L}_{pair} = \|\hat{x}_t - \hat{x}_s\|_1$$
the hidden vectors are constrained by calculating the KL divergence to the standard normal distribution, as shown in the following formula:
$$\mathcal{L}_{KL} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^{2} + \sigma_i^{2} - \log \sigma_i^{2} - 1\big)$$
applied to each of $(\mu_t, \sigma_t)$, $(\mu_s^{id}, \sigma_s^{id})$ and $(\mu_s^{p}, \sigma_s^{p})$ and summed, wherein n is the dimension of the hidden vector;
the above losses are weighted and summed to obtain the double-decoupling generator loss function $\mathcal{L}_G$, expressed as:
$$\mathcal{L}_G = \mathcal{L}_{cls} + \mathcal{L}_{KL} + \lambda_1 \mathcal{L}_{ort} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{mmd} + \lambda_4 \mathcal{L}_{pair}$$
wherein $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ represent the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with minimization of the double-decoupling generator loss function $\mathcal{L}_G$ as the target;
the generated sample construction module is used for inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples;
the specific steps of inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples comprise:
let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{p}$; the reconstructed real-person identity hidden vector $\tilde{z}_t$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; true-false image pairs are then generated through the decoder and serve as the generated samples;
the unsupervised data enhancement module is used for splitting the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector construction module and the teacher network construction module are respectively used for constructing a detector and a teacher network;
the teacher learning module is used for receiving the labeled samples, the unlabeled samples and the enhanced unlabeled samples and obtaining the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;
the detector learning module is used for receiving the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples, updating the detector parameters and obtaining the detector update loss;
the network parameter updating module is used for updating the parameters of the teacher network by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, iteratively updating the parameters of the detector and the teacher network with the optimizer according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the validation-set face RGB color channel images to the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the real labels, calculating the false alarm rate and the miss rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the test module is used for sending the test-set face RGB color channel images to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
CN202210329816.3A 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning Active CN114663986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329816.3A CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Publications (2)

Publication Number Publication Date
CN114663986A CN114663986A (en) 2022-06-24
CN114663986B true CN114663986B (en) 2023-06-20

Family

ID=82033819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210329816.3A Active CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Country Status (1)

Country Link
CN (1) CN114663986B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311605B (en) * 2022-09-29 2023-01-03 山东大学 Semi-supervised video classification method and system based on neighbor consistency and contrast learning
CN116152885B (en) * 2022-12-02 2023-08-01 南昌大学 Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111460931A (en) * 2020-03-17 2020-07-28 华南理工大学 Face spoofing detection method and system based on color channel difference image characteristics

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN111753595A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Living body detection method and apparatus, device, and storage medium
CN111222434A (en) * 2019-12-30 2020-06-02 深圳市爱协生科技有限公司 Method for obtaining evidence of synthesized face image based on local binary pattern and deep learning
CN114067444A (en) * 2021-10-12 2022-02-18 中新国际联合研究院 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Also Published As

Publication number Publication date
CN114663986A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN109583342B (en) Human face living body detection method based on transfer learning
Cheng et al. Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN111709408B (en) Image authenticity detection method and device
CN111460931B (en) Face spoofing detection method and system based on color channel difference image characteristics
CN114663986B (en) Living body detection method and system based on double decoupling generation and semi-supervised learning
US20230021661A1 (en) Forgery detection of face image
CN114783003B (en) Pedestrian re-identification method and device based on local feature attention
CN110516616A (en) A kind of double authentication face method for anti-counterfeit based on extensive RGB and near-infrared data set
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
Muhammad et al. Self-supervised 2d face presentation attack detection via temporal sequence sampling
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
Chen et al. SNIS: A signal noise separation-based network for post-processed image forgery detection
CN112418041A (en) Multi-pose face recognition method based on face orthogonalization
Trinh et al. An examination of fairness of ai models for deepfake detection
CN115171047A (en) Fire image detection method based on lightweight long-short distance attention transformer network
CN111476727B (en) Video motion enhancement method for face-changing video detection
CN114677722A (en) Multi-supervision human face in-vivo detection method integrating multi-scale features
Saealal et al. Three-Dimensional Convolutional Approaches for the Verification of Deepfake Videos: The Effect of Image Depth Size on Authentication Performance
CN113887573A (en) Human face forgery detection method based on visual converter
CN116188439A (en) False face-changing image detection method and device based on identity recognition probability distribution
Cai et al. Face anti-spoofing via conditional adversarial domain generalization
Patel et al. An optimized convolution neural network based inter-frame forgery detection model—a multi-feature extraction framework
CN114693607A (en) Method and system for detecting tampered video based on multi-domain block feature marker point registration
Zhang et al. A multi-scale noise-resistant feature adaptation approach for image tampering localization over Facebook

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant