CN114663986B - Living body detection method and system based on double decoupling generation and semi-supervised learning - Google Patents

Info

Publication number
CN114663986B
CN114663986B (application CN202210329816.3A)
Authority
CN
China
Prior art keywords
vector
false
true
loss
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210329816.3A
Other languages
Chinese (zh)
Other versions
CN114663986A (en)
Inventor
冯浩宇
胡永健
刘琲贝
余翔宇
葛治中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210329816.3A priority Critical patent/CN114663986B/en
Publication of CN114663986A publication Critical patent/CN114663986A/en
Application granted granted Critical
Publication of CN114663986B publication Critical patent/CN114663986B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a living body detection method and system based on double decoupling generation and semi-supervised learning, wherein the method comprises the following steps: preprocessing the data to obtain RGB color channel map original samples, and pairing images of the same identity to obtain true-false image pairs; the real-person encoder outputs a real-person identity vector, the prosthesis encoder outputs a prosthesis identity vector and a prosthesis pattern vector, the three vectors are merged and sent to the decoder to obtain a reconstructed true-false image pair, a double decoupling generation loss function is constructed, and noise is sent to the trained decoder to obtain generated samples; labeled samples, unlabeled samples and enhanced unlabeled samples are constructed from the original samples and the generated samples and sent into a teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled samples and the teacher enhanced unlabeled loss, and the network parameters of the detector and the teacher are updated; a decision threshold is determined using the validation set; test data are loaded into the detector to obtain classification scores, and the classification result is judged according to the threshold. The invention can improve the robustness of a living body detection model.

Description

Living body detection method and system based on double decoupling generation and semi-supervised learning
Technical Field
The invention relates to the technical field of anti-spoofing detection for face recognition, in particular to a living body detection method and system based on double decoupling generation and semi-supervised learning.
Background
Today, the use of facial biometric technology in business and industry has increased dramatically; for example, face unlocking can protect personal privacy on electronic devices, and facial biometrics can be used to authenticate payments. However, using the face as a biometric for authentication is not inherently secure: facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks can generally be divided into four categories: 1) photo attacks, where an attacker spoofs the authentication system using a printed photo or a photo shown on a display screen; 2) video replay attacks, where an attacker spoofs the authentication system using a pre-recorded video of the victim; 3) face mask attacks, where an attacker wears a mask carefully manufactured to resemble the victim; 4) adversarial sample attacks, where an attacker generates specific adversarial noise, for example through a GAN, to perturb the face authentication system into a targeted false identity verification. These face spoofing attacks are not only low-cost but can fool the system, severely affecting and threatening the deployment of face recognition systems.
Living body detection plays a vital role in protecting face recognition systems from prosthesis (presentation) attacks. Thanks to the strong feature extraction capability of deep networks, living body detection algorithms based on deep learning outperform those based on traditional hand-crafted features. However, although most deep-learning-based algorithms achieve good detection results within a single database, their performance degrades across databases: intra-database and cross-database data are often collected under different conditions, such as different capture devices, ambient illumination and attack presentation devices, so the two follow different distributions and a domain shift exists between them. When the diversity of the training data is insufficient, the model easily overfits during intra-database learning, and cross-database generalization is poor. Even though the cause can be diagnosed, the problem is not easy to solve in real-world applications: a living body detection model can hardly collect labeled training samples in all scenarios, so most existing anti-spoofing datasets lack diversity. For example, the commonly used CASIA-MFSD, Replay-Attack and MSU-MFSD datasets contain only 3, 2 and 2 types of capture device, and 3, 2 and 1 types of capture background, respectively.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a living body detection method based on double decoupling generation and semi-supervised learning. Living/prosthesis features are modeled through decoupled learning, and samples are synthesized in the latent space to expand the dataset, which improves the diversity of the training data while keeping the generated samples highly discriminative and reducing the influence of generation noise on model learning. By specifically adopting the technical scheme of double decoupling generation and semi-supervised learning, the invention solves the technical problems of insufficient data diversity and poor generalization of living body detection models, and achieves the technical effects of maintaining intra-database accuracy while effectively improving generalization performance.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a living body detection method based on double decoupling generation and semi-supervised learning, which comprises the following steps:
the face region image is extracted from the input image to obtain an RGB color channel image;
pairing each true image of an RGB color channel diagram original sample to be trained with a false image of the same identity to form a true-false image pair, and labeling the false image with an attack type label;
constructing a real encoder, a prosthesis encoder and a decoder;
The true image is sent to a true person encoder to obtain a true person identity vector, the paired false image is sent to a false body encoder to obtain a false body identity vector and a false body mode vector, the true person identity vector, the false body identity vector and the false body mode vector are combined and sent to a decoder to output a reconstructed true and false image pair, and a double decoupling generation loss function is constructed to optimize;
inputting the standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
cutting a training set formed by an original sample and a generated sample, and constructing a label sample, a label-free sample and an enhanced label-free sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain teacher semi-supervision loss, pseudo labels of unlabeled data and teacher enhanced unlabeled loss;
a detector learning module is constructed, and a label sample, an enhanced label-free sample and a pseudo label of the label-free sample are sent to the detector learning module to update detector parameters, so that detector updating loss is obtained;
updating teacher network parameters by using teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss;
Iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the RGB color channel diagram of the face of the verification set to a trained detector to obtain classification scores, obtaining predicted label values according to different judgment thresholds, comparing the predicted label values with real labels, calculating false alarm rate and omission rate, and taking the thresholds when the false alarm rate and the omission rate are equal as test judgment thresholds;
sending the RGB color channel diagram of the face of the test set to a trained detector to obtain classification scores, obtaining a final predicted label value according to a test decision threshold value, and calculating a reference index.
As a preferred technical scheme, the construction of the real-person encoder, the prosthesis encoder and the decoder specifically includes:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, comprising a convolutional layer, an instance normalization layer and a LeakyReLU activation, followed by five units each consisting of a pooling layer and a residual block built from stacked convolutional layers with a skip connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone of the decoder is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks; the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, pass through six units each consisting of a residual block built from stacked convolutional layers with a skip connection and an upsampling layer, and finally an image pair is output through a convolutional layer.
As a preferred technical scheme, the steps of sending the real image to the real-person encoder to obtain a real-person identity vector, and sending the paired spoof image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, include:

the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_r^{id}$ and the real-person identity vector variance $\sigma_r^{id}$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $\sigma_s^{id}$, the prosthesis pattern vector mean $\mu_s^{pt}$ and the prosthesis pattern vector variance $\sigma_s^{pt}$;

the real-person identity variational component $\epsilon_r^{id}$, the prosthesis identity variational component $\epsilon_s^{id}$ and the prosthesis pattern variational component $\epsilon_s^{pt}$ are sampled from the standard normal distribution, and the reparameterization operation gives the real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$, specifically:

$$z_r^{id} = \mu_r^{id} + \sigma_r^{id} \odot \epsilon_r^{id}, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{pt} = \mu_s^{pt} + \sigma_s^{pt} \odot \epsilon_s^{pt}$$

the real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$ are merged into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$.
As a preferred technical scheme, the merging of the real-person identity vector, the prosthesis identity vector and the prosthesis pattern vector to be sent to the decoder, which outputs the reconstructed true-false image pair, and the construction of the double decoupling generation loss function for optimization, specifically include:

the prosthesis pattern vector $z_s^{pt}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{pt})$, and its cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, expressed as:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)}$$

where k is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector;

an angular orthogonality constraint is added between the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$, giving the orthogonality loss $\mathcal{L}_{ort}$, expressed as:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s^{id}}{\|z_s^{id}\|},\; \frac{z_s^{pt}}{\|z_s^{pt}\|} \right\rangle^{2}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product;

for the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, the reconstruction loss $\mathcal{L}_{rec}$ with the corresponding original images is computed, expressed as:

$$\mathcal{L}_{rec} = \|\hat{x}_r - x_r\|_1 + \|\hat{x}_s - x_s\|_1$$

for the real-person identity vector $z_r^{id}$ and the prosthesis identity vector $z_s^{id}$, the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is computed, expressed as:

$$\mathcal{L}_{mmd} = \mathrm{MMD}\big(z_r^{id},\; z_s^{id}\big)$$

for the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, the pairing loss $\mathcal{L}_{pair}$ is computed, expressed as:

$$\mathcal{L}_{pair} = \|\hat{x}_r - \hat{x}_s\|_1$$

a constraint is imposed by computing the KL divergence for each hidden vector with mean μ and variance σ, as shown in the following formula:

$$\mathcal{L}_{kl} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\big)$$

where n is the dimension of the hidden vector;

the losses are summed with weights to give the double decoupling generator loss function $\mathcal{L}_{G}$, expressed as:

$$\mathcal{L}_{G} = \mathcal{L}_{rec} + \mathcal{L}_{pair} + \lambda_1 \mathcal{L}_{kl} + \lambda_2 \mathcal{L}_{cls} + \lambda_3 \mathcal{L}_{ort} + \lambda_4 \mathcal{L}_{mmd}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ denote the corresponding weight values;

an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the goal of minimizing the double decoupling generator loss function $\mathcal{L}_{G}$.
As a preferred technical scheme, the step of inputting standard normally distributed sampling noise to the trained decoder to obtain generated samples specifically includes:

let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{pt}$; the reconstructed real-person identity hidden vector $\tilde{z}_r^{id}$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$, and the three are then passed through the decoder to generate a true-false image pair as a generated sample.
As a preferred technical scheme, the detector and the teacher network have the same network structure, comprising a convolutional layer, a batch normalization layer and a ReLU activation, followed by three units each consisting of a residual block built from stacked convolutional layers with a skip connection, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
As a preferred technical scheme, the labeled samples, unlabeled samples and enhanced unlabeled samples are sent into the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss; the specific steps include:

the labeled sample $\{x_l, y_l\}$ is input to the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l;\theta_T)$; its cross entropy with the true label $y_l$ gives the teacher labeled loss $\mathcal{L}^{l}_{T}$, specifically:

$$\mathcal{L}^{l}_{T} = CE\big(y_l,\; T(x_l;\theta_T)\big)$$

where CE denotes the cross entropy loss;

the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are respectively input to the teacher network T to obtain the teacher unlabeled prediction $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$; the cross entropy loss of the two gives the teacher unlabeled loss $\mathcal{L}^{u}_{T}$, specifically:

$$\mathcal{L}^{u}_{T} = CE\big(T(x_u;\theta_T),\; T(\hat{x}_u;\theta_T)\big)$$

the class of the maximum value is extracted from the teacher unlabeled prediction $T(x_u;\theta_T)$ as the pseudo label $y^{pl}_{u}$, specifically:

$$y^{pl}_{u} = \arg\max T(x_u;\theta_T)$$

the teacher labeled loss $\mathcal{L}^{l}_{T}$ and the teacher unlabeled loss $\mathcal{L}^{u}_{T}$ are summed with weights to give the teacher semi-supervised loss $\mathcal{L}^{semi}_{T}$, expressed as:

$$\mathcal{L}^{semi}_{T} = \mathcal{L}^{l}_{T} + \lambda\,\frac{s}{s_{tl}}\,\mathcal{L}^{u}_{T}$$

where s is the current step number, $s_{tl}$ is the total number of steps, and λ is the weight of the unlabeled loss;

for the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$, the class h of its own maximum value is extracted and passed through the cross entropy function to give the teacher enhanced unlabeled loss $\mathcal{L}^{aug}_{T}$, expressed as:

$$h = \arg\max T(\hat{x}_u;\theta_T)$$
$$\mathcal{L}^{aug}_{T} = CE\big(h,\; T(\hat{x}_u;\theta_T)\big)$$
as a preferred technical solution, the step of sending the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to a detector learning module to update the detector parameters to obtain a detector update loss includes:
the enhanced unlabeled exemplar is to be processed
Figure BDA0003574807620000079
The feeding parameter is theta D The detector D of (1) gets the detector enhanced label-free prediction result +. >
Figure BDA00035748076200000710
And pseudo tag->
Figure BDA00035748076200000711
Calculating cross entropy to obtain detector enhancement no tag loss +.>
Figure BDA00035748076200000712
The concrete steps are as follows:
Figure BDA00035748076200000713
no tag loss of detector
Figure BDA00035748076200000714
Optimizing the detector by gradient descent method to obtain optimized parameter of theta' D Specifically expressed as:
Figure BDA00035748076200000715
wherein ,ηD Indicating the rate of learning of the detector,
Figure BDA00035748076200000716
representing gradient calculations;
labeled sample (x l ,y l ) Respectively send the parameters to be theta before optimization D The detector and optimized parameters of (2) are theta' D Obtaining a label loss from the old detector
Figure BDA00035748076200000717
And a new detector with tag loss->
Figure BDA00035748076200000718
Then the two are differenced to obtain the detector update loss +.>
Figure BDA00035748076200000719
The concrete steps are as follows:
Figure BDA00035748076200000720
Figure BDA00035748076200000721
Figure BDA0003574807620000081
as an optimized technical scheme, the method for updating the teacher network parameters by utilizing the teacher semi-supervised loss, the teacher enhanced label-free loss and the detector updating loss comprises the following specific steps:
loss of detector update
Figure BDA0003574807620000082
No tag loss with teacher enhancement>
Figure BDA0003574807620000083
Multiplying and then semi-supervising the loss of teachers>
Figure BDA0003574807620000084
Adding to form teacher loss->
Figure BDA0003574807620000085
The teacher network is optimized by gradient descent method, expressed as:
Figure BDA0003574807620000086
Figure BDA0003574807620000087
wherein ,θT Representing parameters, theta ', before optimizing teacher network' T Representing parameters, eta after teacher network optimization T Represents the network learning rate of the teacher,
Figure BDA0003574807620000088
representing the gradient calculations.
The invention also provides a living body detection system based on double decoupling generation and semi-supervised learning, comprising: a data preprocessing module, a double decoupling generator construction module, a double decoupling generator training module, a generated sample construction module, an unsupervised data enhancement module, a detector construction module, a teacher network construction module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
the data preprocessing module is used for extracting face region images from input images to obtain RGB color channel maps, pairing each real image of the RGB color channel map original samples to be trained with a spoof image of the same identity to form a true-false image pair, and labeling the spoof image with an attack type label;
the double decoupling generator construction module is used for constructing a real-person encoder, a prosthesis encoder and a decoder to form the double decoupling generator;
the double decoupling generator training module is used for sending the real image to the real-person encoder to obtain a real-person identity vector, sending the paired spoof image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, merging the three vectors and sending them to the decoder to output a reconstructed true-false image pair, and constructing a double decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normally distributed sampling noise to the trained decoder to obtain generated samples;
the unsupervised data enhancement module is used for cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector construction module and the teacher network construction module are respectively used for constructing the detector and the teacher network;
the teacher learning module is used for receiving the labeled samples, unlabeled samples and enhanced unlabeled samples to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;
the detector learning module is used for receiving the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to update the detector parameters and obtain the detector update loss;
the network parameter updating module is used for updating the teacher network parameters using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, iteratively updating the parameters of the detector and the teacher network with optimizers according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the validation-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels, computing the false acceptance rate and false rejection rate, and taking the threshold at which the two are equal as the test decision threshold;
the test module is used for sending the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and computing the benchmark indices.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the data generation stage, the invention constructs a double decoupling generator from a real-person encoder, a prosthesis encoder and a decoder, trains it with true-false image pairs of original samples, and then inputs standard normally distributed sampling noise into the trained decoder of the double decoupling generator to obtain generated samples. The generated samples are used as part of the unlabeled data, enriching the diversity of the training data and alleviating the problem of insufficient training data diversity.
(2) In the training stage, a semi-supervised learning framework with teacher-generated pseudo labels and detector feedback is adopted. Specifically, the teacher network provides pseudo labels for the unlabeled data to supervise the detector's learning; after the detector parameters are updated, the detector's performance is evaluated on the labeled data, and the loss is fed back to the teacher network to optimize the generated pseudo labels. This addresses model training when labeled training data is limited, as well as the label uncertainty caused by blurry generated samples due to the shortcomings of the variational auto-encoder; highly discriminative features of image blocks are mined and the learning capacity of the network is improved, so the model generalizes better to unseen collection environments.
(3) In the detection stage, the model loads test data into the detector to obtain the corresponding classification scores, and the classification result is judged according to the threshold.
Drawings
FIG. 1 is a flow diagram of a living body detection method based on double decoupling generation and semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a network architecture of a real encoder and a prosthetic encoder according to the present invention;
FIG. 3 is a diagram illustrating a network architecture of a decoder according to the present invention;
FIG. 4 is a schematic diagram of a training phase flow of the dual decoupling generator of the present invention;
FIG. 5 is a schematic diagram of a generating phase flow of the dual decoupling generator of the present invention;
fig. 6 (a) is a schematic diagram of a real person image generated by the dual decoupling generator of the present invention;
FIG. 6 (b) is a schematic diagram of a prosthesis image generated by the dual decoupling generator of the present invention;
FIG. 7 is a schematic diagram of the network architecture of the detector and teacher network of the present invention;
FIG. 8 is an overall framework diagram of semi-supervised learning of the present invention;
fig. 9 is an overall block diagram of a living body detection system based on double decoupling generation and semi-supervised learning of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment uses the Replay-Attack, CASIA-MFSD and MSU-MFSD living body detection datasets for training and testing, and details the implementation on them. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects, and the spoof faces generated from them, were collected with a MacBook camera at a resolution of 320×240 pixels, and the data are divided into training, validation and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects, and the spoof faces generated from them, were collected with three cameras at resolutions of 640×480, 480×640 and 1920×1080 pixels, and the data are divided into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, collecting real faces from 35 subjects (15 for the training set and 20 for the test set) and the spoof faces generated from them. Since the CASIA-MFSD and MSU-MFSD datasets do not contain a validation set, this embodiment uses their test sets as validation sets for threshold determination. The videos of each dataset are then split into frames to obtain images. The embodiment is carried out on a Linux system and is mainly implemented with the deep learning framework PyTorch 1.6.1; the GPU is a GTX 1080 Ti, the CUDA version is 10.1.105, and the cuDNN version is 7.6.4.
As shown in fig. 1, the present embodiment provides a living body detection method based on double decoupling generation and semi-supervised learning, which includes the following steps:
S1: extracting the face region image from the input image to obtain an RGB color channel map;
In this embodiment, the specific steps include: a face region is detected in the input image with the MTCNN face detection algorithm, then cropped and resized to a unified size to obtain a face image in RGB format with red, green and blue color channels. Each real image of the RGB color channel map original samples to be trained is then paired with a spoof image of the same identity to form a true-false image pair, and the spoof image is labeled with an attack type label;
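For illustration, a minimal preprocessing sketch in Python is given below. It assumes the facenet-pytorch implementation of MTCNN (one of several available MTCNN implementations) and the 256×256 crop size used later in this embodiment; the path handling is hypothetical.

```python
from facenet_pytorch import MTCNN
from PIL import Image

# facenet-pytorch's MTCNN detects, crops and aligns the face in one call;
# image_size=256 matches the H = W = 256 used later in this embodiment.
detector = MTCNN(image_size=256, margin=0, post_process=False)

def crop_face(path):
    img = Image.open(path).convert("RGB")  # three color channels: red, green, blue
    face = detector(img)                   # tensor of shape (3, 256, 256), or None if no face found
    return face
```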
S2: constructing a real-person encoder, a prosthesis encoder and a decoder to form the double decoupling generator;
in this embodiment, as shown in fig. 2, the main networks of the real encoder and the prosthetic encoder respectively adopt the same network structure, the input size is set to be h×w×3, the input size is changed into h×w×32 through a convolution layer, an instance normalization layer and a LeakyReLU, and then the input size is respectively 1/2, 1/4, 1/8, 1/16, 1/32 of the original size, the channel number is respectively 64, 128, 256, 512 and 512, and the output size is obtained through five groups of units formed by residual blocks connected by a pooling layer, a convolution layer stack and a skip level
Figure BDA0003574807620000121
Is described. After the output characteristics of the backbone network are obtained, the real person encoder outputs hidden layer real person vectors with the size of 2 Xhdim through the full connection layer; the prosthetic encoder outputs hidden layer prosthetic vectors of size 4 xhdim through the full connection layer.
As shown in fig. 3, the decoder backbone is built with convolutional layers, upsampling layers, sigmoid activation layers, and residual blocks. The input size is set to 3 Xhdim, and the full link layer size is first input to become
Figure BDA0003574807620000122
Then through six groups of residual blocks and upsampling layer structures which are connected by convolution layer stack and skip levelThe resultant units have output sizes of 1/32, 1/16, 1/8, 1/4, 1/2, and 1 of original sizes, and the channel numbers are 512, 256, 128, and 64, respectively, and finally the image pairs with the sizes of H×W×6 are output through the convolution layers.
S3: constructing the double decoupling generation module, which consists of the double decoupling generator and the double decoupling generator loss function, the double decoupling generator consisting of the real-person encoder, the prosthesis encoder and the decoder; the real image is sent to the real-person encoder to obtain a real-person identity vector, the paired spoof image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, the three vectors are merged and sent to the decoder to output a reconstructed true-false image pair, and the double decoupling generation loss function is constructed for optimization;
As shown in fig. 4, the paired real image and spoof image, each of size H×W×3, are input to the real-person encoder and the prosthesis encoder respectively. Let the hidden vector dimension be n. The real-person encoder produces a real-person hidden vector of dimension 2×n, containing the n-dimensional real-person identity vector mean $\mu_r^{id}$ and real-person identity vector variance $\sigma_r^{id}$; the prosthesis encoder produces a prosthesis hidden vector of dimension 4×n, containing the n-dimensional prosthesis identity vector mean $\mu_s^{id}$, prosthesis identity vector variance $\sigma_s^{id}$, prosthesis pattern vector mean $\mu_s^{pt}$ and prosthesis pattern vector variance $\sigma_s^{pt}$. The real-person identity variational component $\epsilon_r^{id}$, the prosthesis identity variational component $\epsilon_s^{id}$ and the prosthesis pattern variational component $\epsilon_s^{pt}$ are then sampled from the standard normal distribution, and the reparameterization operation shown below gives the real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$:

$$z_r^{id} = \mu_r^{id} + \sigma_r^{id} \odot \epsilon_r^{id}, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{pt} = \mu_s^{pt} + \sigma_s^{pt} \odot \epsilon_s^{pt}$$

The real-person identity vector $z_r^{id}$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$ are merged into a hidden vector of dimension 3×n and input to the decoder, which outputs a reconstructed true-false image pair of size H×W×6, containing the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, each of size H×W×3.
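A minimal sketch of this reparameterization step follows. It assumes the encoders output the log-variance (a common VAE convention) where the text speaks of the variance directly; the tensors here carry toy values.

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps sampled from the standard normal distribution
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

n = 128                                    # hidden vector dimension
h_real = torch.randn(4, 2 * n)             # real-person hidden vector (toy values)
h_spoof = torch.randn(4, 4 * n)            # prosthesis hidden vector (toy values)
z_r_id = reparameterize(h_real[:, :n], h_real[:, n:])
z_s_id = reparameterize(h_spoof[:, :n], h_spoof[:, n:2*n])
z_s_pt = reparameterize(h_spoof[:, 2*n:3*n], h_spoof[:, 3*n:])
z = torch.cat([z_r_id, z_s_id, z_s_pt], dim=1)  # 3 x n hidden vector for the decoder
```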
To better learn the prosthesis patterns of different attack types, the prosthesis pattern vector $z_s^{pt}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{pt})$, and its cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)}$$

where k is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector.

To effectively separate the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{pt}$, an angular orthogonality constraint is added between the two, giving the orthogonality loss $\mathcal{L}_{ort}$, as shown in the following formula:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s^{id}}{\|z_s^{id}\|},\; \frac{z_s^{pt}}{\|z_s^{pt}\|} \right\rangle^{2}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product.

For the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$, the reconstruction loss $\mathcal{L}_{rec}$ with the corresponding original images is computed, as shown in the following formula:

$$\mathcal{L}_{rec} = \|\hat{x}_r - x_r\|_1 + \|\hat{x}_s - x_s\|_1$$

To maintain identity consistency of the hidden vectors, the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ between the n-dimensional real-person identity vector $z_r^{id}$ and the prosthesis identity vector $z_s^{id}$ is computed, as shown in the following formula:

$$\mathcal{L}_{mmd} = \mathrm{MMD}\big(z_r^{id},\; z_s^{id}\big)$$

To maintain identity consistency of the reconstructed images, the pairing loss $\mathcal{L}_{pair}$ between the reconstructed real image $\hat{x}_r$ and the reconstructed spoof image $\hat{x}_s$ is computed, as shown in the following formula:

$$\mathcal{L}_{pair} = \|\hat{x}_r - \hat{x}_s\|_1$$

To make the distribution of the hidden vectors fit the standard normal distribution, a constraint is imposed by computing the Kullback-Leibler divergence (KL divergence) for each hidden vector with mean μ and variance σ, as shown in the following formula:

$$\mathcal{L}_{kl} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\big)$$

where n is the dimension of the hidden vector.

Finally, the losses are summed with weights to give the double decoupling generator loss function $\mathcal{L}_{G}$, as shown in the following formula:

$$\mathcal{L}_{G} = \mathcal{L}_{rec} + \mathcal{L}_{pair} + \lambda_1 \mathcal{L}_{kl} + \lambda_2 \mathcal{L}_{cls} + \lambda_3 \mathcal{L}_{ort} + \lambda_4 \mathcal{L}_{mmd}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are the weights, with optimal values of 10, 0.5, 0.1 and 1 respectively.
An Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the goal of minimizing the double decoupling generator loss function $\mathcal{L}_{G}$, training iteratively for T epochs; the optimal value of T is 200. The parameter update formulas are:

$$m_{t+1} = \beta_1 m_t + (1-\beta_1)g$$
$$v_{t+1} = \beta_2 v_t + (1-\beta_2)g^2$$
$$\theta_{t+1} = \theta_t - \eta\,\frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon}$$

where g is the gradient, $\beta_1$ and $\beta_2$ are optimally set to 0.9 and 0.999, the parameter ε for preventing division by zero is optimally set to 1e-8, the learning rate η is optimally set to 0.0002, and $\theta_t$ denotes the parameters at the t-th iteration.
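A sketch of the double decoupling generator loss in PyTorch is given below. Since the original formulas are reproduced as images, the exact norms, the MMD estimator and the grouping of terms under each λ are assumptions; a simple mean-matching term stands in for the full MMD loss.

```python
import torch
import torch.nn.functional as F

def generator_loss(x_r, x_s, rec_r, rec_s, z_r_id, z_s_id, z_s_pt,
                   logits, y_s, mu, logvar, lambdas=(10.0, 0.5, 0.1, 1.0)):
    # mu / logvar: concatenated means and log-variances of all three hidden vectors
    l_cls = F.cross_entropy(logits, y_s)                      # prosthesis-type classification
    cos = F.cosine_similarity(z_s_id, z_s_pt, dim=1)
    l_ort = (cos ** 2).mean()                                 # angular orthogonality constraint
    l_rec = F.l1_loss(rec_r, x_r) + F.l1_loss(rec_s, x_s)     # reconstruction (L1 assumed)
    l_mmd = ((z_r_id.mean(0) - z_s_id.mean(0)) ** 2).sum()    # mean-matching stand-in for MMD
    l_pair = F.l1_loss(rec_r, rec_s)                          # pairing of the reconstructed pair
    l_kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(1).mean()
    l1, l2, l3, l4 = lambdas
    return l_rec + l_pair + l1 * l_kl + l2 * l_cls + l3 * l_ort + l4 * l_mmd
```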
S4: inputting standard normally distributed sampling noise to the trained decoder to obtain generated samples;
As shown in fig. 5, fig. 6(a) and fig. 6(b), with the hidden vector dimension set to n, random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{pt}$. To guarantee identity consistency of the true-false images, the reconstructed real-person identity hidden vector $\tilde{z}_r^{id}$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; the three are then passed through the decoder to generate a true-false image pair as a generated sample. Fig. 6(a) and fig. 6(b) show true-false image pairs generated by the double decoupling generator. In this embodiment the optimal value of the dimension n is 128, and the optimal number of generated samples is 6400.
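A sketch of this generation stage follows, reusing the Decoder from the earlier sketch (here untrained); decoding in mini-batches is an implementation choice to bound memory.

```python
import torch

n, num_samples, batch = 128, 6400, 64
decoder = Decoder(hdim=128).eval()          # Decoder from the sketch above
outs = []
with torch.no_grad():
    for _ in range(0, num_samples, batch):
        z_s_id = torch.randn(batch, n)      # reconstructed prosthesis identity hidden vector
        z_s_pt = torch.randn(batch, n)      # reconstructed prosthesis pattern hidden vector
        z_r_id = z_s_id.clone()             # copied directly for identity consistency
        pairs = decoder(torch.cat([z_r_id, z_s_id, z_s_pt], dim=1))
        outs.append(pairs.cpu())
pairs = torch.cat(outs)                     # (6400, 6, 256, 256)
gen_real, gen_spoof = pairs[:, :3], pairs[:, 3:]  # generated true-false image pairs
```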
S5: cropping the training set formed by the original samples and the generated samples, dividing it into labeled samples and unlabeled samples, and applying random data enhancement to the unlabeled samples to obtain enhanced unlabeled samples;
In this embodiment, an image block of size h×w is first randomly cropped from each H×W RGB color channel map to be trained. N original samples to be trained are randomly selected as the labeled data, the remaining samples together with the generated samples form the unlabeled data, and the number ratio of the labeled samples to the unlabeled samples is μ. Random data enhancement is then applied to the unlabeled samples to obtain the enhanced unlabeled samples. The data enhancement methods include: maximizing contrast, brightness adjustment, color balance adjustment, contrast adjustment, sharpness adjustment, cropping, histogram equalization, color inversion, posterization (setting the lowest 0-4 bits of the pixel values to zero), random rotation, horizontal shear, vertical shear, horizontal translation, vertical translation, and solarization (inverting all pixel values above a threshold); two of these methods are randomly selected to enhance each sample. The preferred values of H and W, of N and of μ in this embodiment are 256, 6000 and 4 respectively;
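A sketch of this random enhancement policy using PIL is shown below; the magnitudes are illustrative, and the shear and translation operations are omitted for brevity.

```python
import random
from PIL import Image, ImageOps, ImageEnhance

# Two operations are drawn at random per unlabeled sample.
AUGMENTATIONS = [
    ImageOps.autocontrast,                                    # maximize contrast
    lambda im: ImageEnhance.Brightness(im).enhance(1.5),      # brightness adjustment
    lambda im: ImageEnhance.Color(im).enhance(1.5),           # color balance adjustment
    lambda im: ImageEnhance.Contrast(im).enhance(1.5),        # contrast adjustment
    lambda im: ImageEnhance.Sharpness(im).enhance(1.5),       # sharpness adjustment
    ImageOps.equalize,                                        # histogram equalization
    ImageOps.invert,                                          # color inversion
    lambda im: ImageOps.posterize(im, random.randint(4, 8)),  # zero out the low bits
    lambda im: im.rotate(random.uniform(-30, 30)),            # random rotation
    lambda im: ImageOps.solarize(im, 128),                    # invert pixels above a threshold
]

def augment(im: Image.Image) -> Image.Image:
    for op in random.sample(AUGMENTATIONS, 2):
        im = op(im)
    return im
```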
S6: constructing the detector and the teacher network;
As shown in fig. 7, the teacher network and the detector have the same network structure. With input size H×W×3, a convolutional layer, a batch normalization layer and a ReLU first map the input to H×W×16; three units, each consisting of a residual block built from stacked convolutional layers with a skip connection, then produce outputs of 1, 1/2 and 1/4 of the original size with 32, 64 and 128 channels respectively, giving a feature of size (H/4)×(W/4)×128. A global average pooling layer reduces this feature to size 1×1×128, and finally a fully connected layer outputs the classification vector of dimension 2.
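A condensed PyTorch sketch of this shared architecture follows; the kernel sizes and block internals are assumptions.

```python
import torch
import torch.nn as nn

class ResBlockBN(nn.Module):
    """Residual block with batch normalization; stride controls the downsampling."""
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1, stride=stride)
    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class Classifier(nn.Module):
    """Shared architecture of the detector and the teacher network."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                  nn.BatchNorm2d(16), nn.ReLU())
        # three residual units: 1, 1/2 and 1/4 of the input size; 32, 64, 128 channels
        self.blocks = nn.Sequential(ResBlockBN(16, 32, 1),
                                    ResBlockBN(32, 64, 2),
                                    ResBlockBN(64, 128, 2))
        self.pool = nn.AdaptiveAvgPool2d(1)    # global average pooling to 1 x 1 x 128
        self.fc = nn.Linear(128, 2)            # classification vector of dimension 2
    def forward(self, x):
        return self.fc(self.pool(self.blocks(self.stem(x))).flatten(1))
```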
S7: constructing the teacher learning module, and sending the labeled samples, unlabeled samples and enhanced unlabeled samples into the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss:
As shown in fig. 8, it is first agreed that CE(q, p) denotes the cross entropy loss of two distributions q and p; if q is a label value, it is first converted into a one-hot vector. Let k be the number of label classes; the cross entropy loss is expressed as:

$$CE(q, p) = -\sum_{i=1}^{k} q^{(i)} \log p^{(i)}$$

Secondly, it is agreed that argmax(v) denotes the index of the maximum value in the vector v.

The labeled sample $\{x_l, y_l\}$ is sent to the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l;\theta_T)$; its cross entropy with the true label $y_l$ gives the teacher labeled loss, as shown in the following formula:

$$\mathcal{L}^{l}_{T} = CE\big(y_l,\; T(x_l;\theta_T)\big)$$

The unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ obtained by one data enhancement as described above are respectively sent into the teacher network T, giving the teacher unlabeled prediction $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$; the cross entropy of the two results gives the teacher unlabeled loss $\mathcal{L}^{u}_{T}$, and the class of the maximum value of the teacher unlabeled prediction $T(x_u;\theta_T)$ is taken out as the pseudo label $y^{pl}_{u}$, as shown in the following formulas:

$$\mathcal{L}^{u}_{T} = CE\big(T(x_u;\theta_T),\; T(\hat{x}_u;\theta_T)\big)$$
$$y^{pl}_{u} = \arg\max T(x_u;\theta_T)$$

The teacher labeled loss $\mathcal{L}^{l}_{T}$ and the teacher unlabeled loss $\mathcal{L}^{u}_{T}$ are then summed with weights to give the teacher semi-supervised loss, as shown in the following formula:

$$\mathcal{L}^{semi}_{T} = \mathcal{L}^{l}_{T} + \lambda\,\frac{s}{s_{tl}}\,\mathcal{L}^{u}_{T}$$

where s is the current step number, $s_{tl}$ is the total number of steps, and λ is the weight of the unlabeled loss.

For the teacher enhanced unlabeled prediction $T(\hat{x}_u;\theta_T)$, the class h of its own maximum value is extracted and passed through the cross entropy function to give the teacher enhanced unlabeled loss, as shown in the following formulas:

$$h = \arg\max T(\hat{x}_u;\theta_T)$$
$$\mathcal{L}^{aug}_{T} = CE\big(h,\; T(\hat{x}_u;\theta_T)\big)$$
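A sketch of the teacher-side computation follows. The soft-target cross entropy is written out by hand, and the s/s_tl ramp form of the weighting follows the reconstruction above and is an assumption.

```python
import torch
import torch.nn.functional as F

def teacher_losses(teacher, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    """Teacher losses of the teacher learning module (a sketch)."""
    p_l = teacher(x_l)
    loss_labeled = F.cross_entropy(p_l, y_l)       # teacher labeled loss
    p_u = teacher(x_u)                             # teacher unlabeled prediction
    p_u_aug = teacher(x_u_aug)                     # teacher enhanced unlabeled prediction
    # cross entropy between the two predictions, with a soft (detached) target
    loss_unlabeled = -(p_u.softmax(1).detach() * F.log_softmax(p_u_aug, 1)).sum(1).mean()
    pseudo = p_u.argmax(1).detach()                # pseudo labels for the detector
    semi = loss_labeled + lam * (step / total_steps) * loss_unlabeled
    h = p_u_aug.argmax(1).detach()                 # class of the prediction's own maximum
    loss_aug = F.cross_entropy(p_u_aug, h)         # teacher enhanced unlabeled loss
    return semi, pseudo, loss_aug
```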
S8: constructing the detector learning module, and sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to the detector learning module to update the detector parameters and obtain the detector update loss;
As shown in fig. 8, the enhanced unlabeled sample $\hat{x}_u$ is sent into the detector D with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction $D(\hat{x}_u;\theta_D)$; its cross entropy with the pseudo label $y^{pl}_{u}$ gives the detector enhanced unlabeled loss, as shown in the following formula:

$$\mathcal{L}^{aug}_{D} = CE\big(y^{pl}_{u},\; D(\hat{x}_u;\theta_D)\big)$$

With the detector learning rate $\eta_D$, the detector is optimized by gradient descent on the detector enhanced unlabeled loss $\mathcal{L}^{aug}_{D}$, giving the optimized parameters $\theta'_D$, as shown in the following formula:

$$\theta'_D = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}^{aug}_{D}$$

The labeled sample $(x_l, y_l)$ is sent to the detector with the pre-optimization parameters $\theta_D$ and with the optimized parameters $\theta'_D$ respectively, giving the old detector labeled loss $\mathcal{L}^{old}_{D}$ and the new detector labeled loss $\mathcal{L}^{new}_{D}$; their difference gives the detector update loss $\mathcal{L}^{upd}_{D}$, as shown in the following formulas:

$$\mathcal{L}^{old}_{D} = CE\big(y_l,\; D(x_l;\theta_D)\big)$$
$$\mathcal{L}^{new}_{D} = CE\big(y_l,\; D(x_l;\theta'_D)\big)$$
$$\mathcal{L}^{upd}_{D} = \mathcal{L}^{old}_{D} - \mathcal{L}^{new}_{D}$$
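A sketch of one detector learning step in this meta-pseudo-label style follows; the detector update loss is returned as a scalar to be fed back to the teacher.

```python
import torch
import torch.nn.functional as F

def detector_step(detector, opt_d, x_l, y_l, x_u_aug, pseudo):
    """One detector update plus the feedback signal for the teacher (a sketch)."""
    with torch.no_grad():
        loss_old = F.cross_entropy(detector(x_l), y_l)     # labeled loss before the update
    loss_aug = F.cross_entropy(detector(x_u_aug), pseudo)  # detector enhanced unlabeled loss
    opt_d.zero_grad()
    loss_aug.backward()
    opt_d.step()                                           # theta_D -> theta_D'
    with torch.no_grad():
        loss_new = F.cross_entropy(detector(x_l), y_l)     # labeled loss after the update
    return loss_old - loss_new                             # detector update loss
```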
S9: updating the teacher network parameters using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss:
As shown in fig. 8, the detector update loss $\mathcal{L}^{upd}_{D}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}^{aug}_{T}$ and then added to the teacher semi-supervised loss $\mathcal{L}^{semi}_{T}$ to form the teacher loss $\mathcal{L}_{T}$. With the teacher network learning rate $\eta_T$, the teacher network is optimized by gradient descent, expressed as:

$$\mathcal{L}_{T} = \mathcal{L}^{upd}_{D}\,\mathcal{L}^{aug}_{T} + \mathcal{L}^{semi}_{T}$$
$$\theta'_T = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_{T}$$
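A sketch of the corresponding teacher update follows; the detector update loss enters as a detached scalar coefficient, matching the multiply-then-add form above.

```python
def teacher_step(opt_t, semi, loss_aug, update_loss):
    """Teacher loss = detector update loss x enhanced unlabeled loss + semi-supervised loss."""
    loss_t = update_loss * loss_aug + semi   # update_loss is a detached scalar here
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()
    return loss_t.item()
```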
S10: iteratively updating the network parameters of the detector and the teacher network with the optimizers according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed:
In this embodiment, the teacher network and the detector both use SGD optimizers with Nesterov momentum, where the momentum μ has a preferred value of 0.9, the initial learning rate $\epsilon_0$ has a preferred value of 0.05, and the learning rate decays with the number of training iterations;
the detector enhanced unlabeled loss $\mathcal{L}^{aug}_{D}$ in the detector learning module is optimized with the detector optimizer with the goal of minimizing the loss; the teacher loss $\mathcal{L}_{T}$ in the teacher update module is optimized with the teacher network optimizer with the goal of minimizing the loss.
S11: determining a threshold using the validation set;
in this embodiment, the specific steps include: sending the RGB color channel diagram of the face of the verification set to a detector to obtain a classification fraction p, then carrying out equidistant sampling in a value range (0, 1) to obtain different judgment thresholds, obtaining a predicted label value according to the thresholds, comparing the predicted label value with a real label, calculating a false alarm rate and a false omission rate, and taking the thresholds when the false alarm rate and the false omission rate are equal as a test judgment threshold T;
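A sketch of this threshold sweep is given below, with scores and labels as NumPy arrays; the convention that higher scores mean "live" is an assumption.

```python
import numpy as np

def eer_threshold(scores, labels, num=1000):
    """Sweep equally spaced thresholds in (0, 1); return the one where the
    false acceptance and false rejection rates are closest to equal."""
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0.0, 1.0, num + 2)[1:-1]:
        pred = (scores >= t).astype(int)   # 1 = live, 0 = spoof (assumed convention)
        far = ((pred == 1) & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        frr = ((pred == 0) & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t
```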
S12: testing the model;
In this embodiment, the specific steps include: the test-set face RGB color channel maps are sent to the detector to obtain classification scores p, the final predicted label values are obtained according to the test decision threshold T, and the benchmark indices are computed.
The performance of the living body detection method of this embodiment is evaluated with the False Acceptance Rate (FAR), the False Rejection Rate (FRR), the True Acceptance Rate (TAR), the Equal Error Rate (EER) and the Half Total Error Rate (HTER). These indices are described using the confusion matrix of Table 1:
TABLE 1 Confusion matrix

Label \ Prediction    Predicted live    Predicted spoof
Label live            TA                FR
Label spoof           FA                TR
The False Acceptance Rate (FAR) is the proportion of spoof (non-living) faces that are judged to be living faces:

$$FAR = \frac{FA}{FA + TR}$$

The False Rejection Rate (FRR) is the proportion of living faces that are judged to be spoof faces:

$$FRR = \frac{FR}{TA + FR}$$

The True Acceptance Rate (TAR) is the proportion of living faces that are correctly judged to be living faces:

$$TAR = \frac{TA}{TA + FR}$$

The Equal Error Rate (EER) is the error rate at which FRR equals FAR.

The Half Total Error Rate (HTER) is the average of FRR and FAR:

$$HTER = \frac{FAR + FRR}{2}$$
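For reference, a short sketch computing these rates directly from the confusion-matrix counts of Table 1:

```python
def rates(TA, FR, FA, TR):
    """FAR, FRR, TAR and HTER from the confusion-matrix counts of Table 1."""
    FAR = FA / (FA + TR)      # spoof faces accepted as live
    FRR = FR / (TA + FR)      # live faces rejected as spoof
    TAR = TA / (TA + FR)      # live faces correctly accepted
    HTER = (FAR + FRR) / 2    # half total error rate
    return FAR, FRR, TAR, HTER
```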
in order to prove the effectiveness of the invention and to test the generalization performance of the method, in-library experiments and cross-library experiments are respectively carried out on the CASIA-MFSD, replay-attach and MSU-MFSD databases. The in-library experimental results and the cross-library experimental results are shown in tables 2 and 3, respectively:
TABLE 2 Intra-database experimental results
[Table 2 is reproduced as an image in the original publication; it reports the intra-database half total error rates and equal error rates.]
TABLE 3 Cross-database experimental results
[Table 3 is reproduced as an image in the original publication; it reports the cross-database half total error rates.]
As can be seen from Table 2, the intra-database half total error rate and equal error rate of the method are low, showing excellent spoof detection performance within a database; as can be seen from Table 3, the half total error rate of cross-database detection is also low. Although the training set consists of labeled samples and unlabeled samples extracted from only a few frames of each training video, the diversity of the training data is enriched by the data generated with the double decoupling generator, and the model is progressively trained through meta learning, which improves the ability to learn from limited sample features and the model's generalization capacity. The experimental results prove that, even with insufficient labeled training samples, the method maintains high intra-database accuracy while greatly reducing the cross-database error rate and significantly improving generalization performance.
As shown in fig. 9, the present embodiment further provides a living body detection system based on double decoupling generation and semi-supervised learning, including: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
In this embodiment, the data preprocessing module is configured to extract a face area image from an input image to obtain an RGB color channel image, pair each true image of an original sample of the RGB color channel image to be trained with a false image of the same identity to form a true-false image pair, and label the false image with an attack type label;
in this embodiment, the double-decoupling generator building module is configured to build a real encoder, a prosthetic encoder, and a decoder to form a double-decoupling generator;
in this embodiment, the double-decoupling generator training module is configured to send a true image to a true person encoder to obtain a true person identity vector, send a paired false image to a false person encoder to obtain a false person identity vector and a false person mode vector, and combine the true person identity vector, the false person identity vector and the false person mode vector to send the true person identity vector, the false person identity vector and the false person mode vector to a decoder to output a reconstructed true and false image pair, so as to construct a double-decoupling generation loss function to optimize the true and false image pair;
in this embodiment, the generated sample construction module is configured to obtain a generated sample from the standard normal distributed sampling noise input to the trained decoder;
in this embodiment, the unsupervised data enhancing module is configured to cut a training set formed by an original sample and a generated sample, and construct a labeled sample, an unlabeled sample, and an enhanced unlabeled sample;
In this embodiment, the detector building module and the teacher network building module are respectively configured to build a detector and a teacher network;
in this embodiment, the teacher learning module is configured to send the labeled sample, the unlabeled sample, and the enhanced unlabeled sample to the teacher learning module, so as to obtain a teacher semi-supervised loss, a pseudo-label of unlabeled data, and a teacher enhanced unlabeled loss;
in this embodiment, the detector learning module is configured to send the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters, so as to obtain a detector update loss;
in this embodiment, the network parameter updating module is configured to update the parameters of the teacher network by using the semi-supervised loss of the teacher, the enhanced label-free loss of the teacher, and the updating loss of the detector, iteratively update the parameters of the detector and the teacher network by using the optimizer according to the loss function, and save the parameters of the teacher network and the detector after training is completed;
in this embodiment, the verification module is configured to send the verification set face RGB color channel diagram to the trained detector to obtain a classification score, obtain a predicted tag value according to different decision thresholds, compare the predicted tag value with a real tag, calculate a false alarm rate and a false omission rate, and take a threshold value when the two are equal as a test decision threshold value;
In this embodiment, the test module is configured to send the test-set face RGB color channel images to the trained detector to obtain classification scores, obtain the final predicted label values according to the test decision threshold, and calculate the evaluation metrics.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (6)

1. The living body detection method based on double decoupling generation and semi-supervised learning is characterized by comprising the following steps:
the face region image is extracted from the input image to obtain an RGB color channel image;
pairing each true image in the original RGB color channel image samples to be trained with a false image of the same identity to form a true-false image pair, and labeling the false image with an attack type label;
constructing a real encoder, a prosthesis encoder and a decoder;
the specific steps of constructing the real-person encoder, the prosthesis encoder and the decoder comprise:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, provided with a convolution layer, an instance normalization layer and a LeakyReLU activation, followed by five groups of units each consisting of a pooling layer and a convolution-layer stack with a skip-level residual connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone network of the decoder is built from convolution layers, upsampling layers, a Sigmoid activation layer and residual blocks: the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, then pass through six groups of units each consisting of a convolution-stacked residual block and an upsampling layer, and an image pair is finally output through a convolution layer;
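For illustration, the following is a minimal PyTorch sketch of the encoder and decoder backbones just described; the channel widths, 256×256 input resolution, latent dimension and per-group layer counts are our own assumptions rather than the patented configuration:

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Convolution stack with a skip-level (residual) connection.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
            nn.LeakyReLU(0.2), nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    # Conv + InstanceNorm + LeakyReLU stem, then five pooling/residual groups,
    # then a fully connected layer emitting the hidden-layer vector(s).
    def __init__(self, latent_dim=128, n_heads=2):  # n_heads=2: mean and variance
        super().__init__()
        layers = [nn.Conv2d(3, 32, 3, padding=1), nn.InstanceNorm2d(32), nn.LeakyReLU(0.2)]
        ch = 32
        for _ in range(5):
            layers += [nn.AvgPool2d(2), nn.Conv2d(ch, ch * 2, 1), ResBlock(ch * 2)]
            ch *= 2
        self.features = nn.Sequential(*layers)
        # the prosthesis encoder would use n_heads=4 (identity/pattern mean and variance)
        self.fc = nn.Linear(ch * 8 * 8, latent_dim * n_heads)  # assumes 256x256 input
    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class Decoder(nn.Module):
    # FC input, six groups of residual block + upsampling, Sigmoid image output.
    def __init__(self, latent_dim=3 * 128, ch=512):
        super().__init__()
        self.fc = nn.Linear(latent_dim, ch * 4 * 4)
        body = []
        for _ in range(6):
            nxt = max(ch // 2, 32)
            body += [ResBlock(ch), nn.Upsample(scale_factor=2), nn.Conv2d(ch, nxt, 3, padding=1)]
            ch = nxt
        self.body = nn.Sequential(*body)
        self.head = nn.Sequential(nn.Conv2d(ch, 6, 3, padding=1), nn.Sigmoid())  # 6 ch = RGB pair
    def forward(self, z):
        h = self.fc(z).view(z.size(0), -1, 4, 4)
        return self.head(self.body(h))  # (B, 6, 256, 256): reconstructed true-false pair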
the true image is sent to the real-person encoder to obtain a real-person identity vector, the paired false image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, the three vectors are merged and sent to the decoder to output a reconstructed true-false image pair, and a double-decoupling generation loss function is constructed for optimization;
the specific steps of sending the true image to the real-person encoder to obtain the real-person identity vector, and sending the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis pattern vector, comprise:
the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_t$ and the real-person identity vector variance $\sigma_t^2$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $(\sigma_s^{id})^2$, the prosthesis pattern vector mean $\mu_s^{p}$ and the prosthesis pattern vector variance $(\sigma_s^{p})^2$;
a real-person identity variation component $\epsilon_t$, a prosthesis identity variation component $\epsilon_s^{id}$ and a prosthesis pattern variation component $\epsilon_s^{p}$ are sampled from the standard normal distribution, and the reparameterization operation yields the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$, specifically:
$$z_t = \mu_t + \sigma_t \odot \epsilon_t, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{p} = \mu_s^{p} + \sigma_s^{p} \odot \epsilon_s^{p};$$
the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ are combined into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$;
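A minimal sketch of this reparameterization step, assuming each encoder emits mean and log-variance halves of its hidden vector (all tensor names are illustrative):

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, eps ~ N(0, I) (the sampled variation component)
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

batch, n = 4, 128
mu_t, logvar_t = torch.zeros(batch, n), torch.zeros(batch, n)        # real-person identity stats
mu_s_id, logvar_s_id = torch.zeros(batch, n), torch.zeros(batch, n)  # prosthesis identity stats
mu_s_p, logvar_s_p = torch.zeros(batch, n), torch.zeros(batch, n)    # prosthesis pattern stats

z_t = reparameterize(mu_t, logvar_t)           # real-person identity vector
z_s_id = reparameterize(mu_s_id, logvar_s_id)  # prosthesis identity vector
z_s_p = reparameterize(mu_s_p, logvar_s_p)     # prosthesis pattern vector
hidden = torch.cat([z_t, z_s_id, z_s_p], dim=1)  # merged hidden vector for the decoder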
the specific steps of merging the real-person identity vector, the prosthesis identity vector and the prosthesis pattern vector, sending them to the decoder to output the reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization, comprise:
the prosthesis pattern vector $z_s^{p}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{p})$, and the cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, expressed as:
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log\big(\mathrm{softmax}(\hat{y}_s)^{(i)}\big)$$
wherein k is the number of prosthesis categories and $Y_s$ is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:
$$\mathcal{L}_{ort} = \left( \frac{\langle z_s^{id},\, z_s^{p} \rangle}{\|z_s^{id}\|_2\,\|z_s^{p}\|_2} \right)^{2}$$
wherein $\langle z_s^{id}, z_s^{p}\rangle$ denotes the inner product of $z_s^{id}$ and $z_s^{p}$;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed real-person image $\hat{x}_t$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images $x_t$, $x_s$, expressed as:
$$\mathcal{L}_{rec} = \|\hat{x}_t - x_t\|_1 + \|\hat{x}_s - x_s\|_1$$
the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is calculated between the real-person identity vector $z_t$ and the prosthesis identity vector $z_s^{id}$, expressed as:
$$\mathcal{L}_{mmd} = \Big\| \frac{1}{m}\sum_{i=1}^{m} \phi(z_{t,i}) - \frac{1}{m}\sum_{j=1}^{m} \phi(z_{s,j}^{id}) \Big\|_{\mathcal{H}}^{2}$$
wherein $\phi(\cdot)$ is the kernel feature mapping and m is the number of samples in a batch;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$, expressed as:
$$\mathcal{L}_{pair} = \|\hat{x}_t - \hat{x}_s\|_1$$
the hidden vectors are constrained by calculating the KL divergence to the standard normal distribution, as shown in the following formula:
$$\mathcal{L}_{KL} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^{2} + \sigma_i^{2} - \log \sigma_i^{2} - 1\big)$$
applied to each of $(\mu_t, \sigma_t)$, $(\mu_s^{id}, \sigma_s^{id})$ and $(\mu_s^{p}, \sigma_s^{p})$ and summed, wherein n is the dimension of the hidden vector;
the above losses are weighted and summed to obtain the double-decoupling generator loss function $\mathcal{L}_G$, expressed as:
$$\mathcal{L}_G = \mathcal{L}_{cls} + \mathcal{L}_{KL} + \lambda_1 \mathcal{L}_{ort} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{mmd} + \lambda_4 \mathcal{L}_{pair}$$
wherein $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ represent the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with minimization of the double-decoupling generator loss function $\mathcal{L}_G$ as the target;
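As a non-authoritative sketch of the objective reconstructed above: the L1 reconstruction and pairing distances, the Gaussian-kernel MMD estimator and the placeholder weights are plausible instantiations we assume, not confirmed choices of the patent.

import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    # 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1) over the hidden dimension
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=1).mean()

def mmd(x, y, sigma=1.0):
    # Gaussian-kernel maximum mean discrepancy between two vector batches
    k = lambda a, b: torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def generator_loss(x_t, x_s, rec_t, rec_s, z_t, z_s_id, z_s_p,
                   logits_s, y_s, stats, lambdas=(1.0, 10.0, 1.0, 1.0)):
    l_cls = F.cross_entropy(logits_s, y_s)                            # attack-type classification
    l_ort = F.cosine_similarity(z_s_id, z_s_p, dim=1).pow(2).mean()   # angular orthogonality
    l_rec = F.l1_loss(rec_t, x_t) + F.l1_loss(rec_s, x_s)             # reconstruction
    l_mmd = mmd(z_t, z_s_id)                                          # align the two identity spaces
    l_pair = F.l1_loss(rec_t, rec_s)                                  # paired reconstructions stay close
    l_kl = sum(kl_to_standard_normal(mu, lv) for mu, lv in stats)     # stats: three (mu, logvar) pairs
    l1, l2, l3, l4 = lambdas
    return l_cls + l_kl + l1 * l_ort + l2 * l_rec + l3 * l_mmd + l4 * l_pair

An Adam optimizer over the parameters of both encoders and the decoder would then minimize this scalar.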
inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples;
the specific steps of inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples comprise:
let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{p}$; the reconstructed real-person identity hidden vector $\tilde{z}_t$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; true-false image pairs are then generated through the decoder and serve as the generated samples;
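A short sketch of this sampling step; decoder stands for the trained decoder and is left symbolic:

import torch

n, batch = 128, 8
z_s_id = torch.randn(batch, n)   # reconstructed prosthesis identity hidden vector
z_s_p = torch.randn(batch, n)    # reconstructed prosthesis pattern hidden vector
z_t = z_s_id.clone()             # real-person identity copied from the prosthesis identity
hidden = torch.cat([z_t, z_s_id, z_s_p], dim=1)
# generated_pair = decoder(hidden)  # the trained decoder yields a true-false image pair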
splitting the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of unlabeled data and the teacher enhanced unlabeled loss;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters and obtain the detector update loss;
updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss;
iteratively updating the parameters of the detector and the teacher network with the optimizer according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed;
sending the validation-set face RGB color channel images to the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the real labels, calculating the false alarm rate and the miss rate, and taking the threshold at which the two rates are equal as the test decision threshold;
sending the test-set face RGB color channel images to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
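The threshold search can be sketched as below, assuming higher scores mean "real" and labels use 1 for real faces and 0 for attacks (our convention):

import numpy as np

def eer_threshold(scores, labels):
    # Sweep candidate thresholds; keep the one where the false alarm rate
    # (attacks accepted) is closest to the miss rate (real faces rejected).
    best_t, best_gap = 0.5, float("inf")
    for t in np.unique(scores):
        pred = (scores >= t).astype(int)
        far = np.mean(pred[labels == 0] == 1)
        frr = np.mean(pred[labels == 1] == 0)
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), float(t)
    return best_t

scores = np.array([0.92, 0.81, 0.33, 0.12, 0.74, 0.25])
labels = np.array([1, 1, 0, 0, 1, 0])
print(eer_threshold(scores, labels))  # threshold used at test time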
2. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the detector and the teacher network have the same network structure, provided with a convolution layer, a batch normalization layer and a ReLU activation, followed by three groups of units each consisting of a convolution-layer stack with a skip-level residual connection, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
3. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the specific steps of sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of unlabeled data and the teacher enhanced unlabeled loss comprise:
the labeled sample $(x_l, y_l)$ is input to the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction result $T(x_l;\theta_T)$; the cross entropy with the real label $y_l$ gives the teacher labeled loss $\mathcal{L}_T^{l}$, specifically:
$$\mathcal{L}_T^{l} = CE\big(y_l,\, T(x_l;\theta_T)\big)$$
wherein CE represents the cross entropy loss;
the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are respectively input to the teacher network T to obtain the teacher unlabeled prediction result $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$; the cross entropy loss between the two gives the teacher unlabeled loss $\mathcal{L}_T^{u}$, specifically:
$$\mathcal{L}_T^{u} = CE\big(T(x_u;\theta_T),\, T(\hat{x}_u;\theta_T)\big)$$
the category to which the maximum value belongs is extracted from the teacher unlabeled prediction result $T(x_u;\theta_T)$ as the pseudo label $\hat{y}_u$, specifically:
$$\hat{y}_u = \arg\max T(x_u;\theta_T)$$
the teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ are weighted and summed to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, expressed as:
$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \lambda \min\!\Big(1,\, \frac{s}{s_{tl}}\Big)\, \mathcal{L}_T^{u}$$
wherein s is the current step number, $s_{tl}$ is the total number of steps, and λ is the weight of the unlabeled loss;
for the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$, the category h to which its own maximum value belongs is extracted and passed through the cross entropy function to obtain the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$, expressed as:
$$h = \arg\max T(\hat{x}_u;\theta_T), \qquad \mathcal{L}_T^{aug} = CE\big(h,\, T(\hat{x}_u;\theta_T)\big)$$
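The teacher-side computation can be sketched as follows; the linear ramp-up min(1, s/s_tl) on the unlabeled weight and the soft-target cross entropy are our reading of the formulas above, not a verbatim implementation:

import torch
import torch.nn.functional as F

def teacher_losses(T, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    l_label = F.cross_entropy(T(x_l), y_l)                 # teacher labeled loss
    logits_u, logits_u_aug = T(x_u), T(x_u_aug)
    # teacher unlabeled loss: clean prediction supervises the enhanced one
    # (soft-target cross entropy requires PyTorch >= 1.10)
    l_unlabel = F.cross_entropy(logits_u_aug, logits_u.softmax(dim=1))
    pseudo = logits_u.argmax(dim=1).detach()               # pseudo labels for the detector
    l_semi = l_label + lam * min(1.0, step / total_steps) * l_unlabel
    h = logits_u_aug.argmax(dim=1)                         # class of the maximum value
    l_aug = F.cross_entropy(logits_u_aug, h)               # teacher enhanced unlabeled loss
    return l_semi, pseudo, l_aug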
4. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the specific steps of sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters and obtain the detector update loss comprise:
the enhanced unlabeled sample $\hat{x}_u$ is fed into the detector D with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction result $D(\hat{x}_u;\theta_D)$, and the cross entropy with the pseudo label $\hat{y}_u$ gives the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$, specifically:
$$\mathcal{L}_D^{aug} = CE\big(\hat{y}_u,\, D(\hat{x}_u;\theta_D)\big)$$
the detector is optimized with the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$ by the gradient descent method, and the optimized parameters are $\theta'_D$, specifically expressed as:
$$\theta'_D = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{aug}$$
wherein $\eta_D$ denotes the learning rate of the detector and $\nabla$ denotes the gradient calculation;
the labeled sample $(x_l, y_l)$ is respectively sent to the detector with the pre-optimization parameters $\theta_D$ and the detector with the optimized parameters $\theta'_D$ to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$, and the difference between the two gives the detector update loss $\mathcal{L}_D^{upd}$, specifically:
$$\mathcal{L}_D^{old} = CE\big(y_l,\, D(x_l;\theta_D)\big), \quad \mathcal{L}_D^{new} = CE\big(y_l,\, D(x_l;\theta'_D)\big), \quad \mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}$$
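A compact sketch of this detector update in the meta-pseudo-label style the claim describes; the detector D and its optimizer are placeholders:

import torch
import torch.nn.functional as F

def detector_step(D, opt_D, x_u_aug, pseudo, x_l, y_l):
    with torch.no_grad():
        l_old = F.cross_entropy(D(x_l), y_l)     # labeled loss, old parameters
    l_aug = F.cross_entropy(D(x_u_aug), pseudo)  # detector enhanced unlabeled loss
    opt_D.zero_grad()
    l_aug.backward()
    opt_D.step()                                 # one gradient-descent step: theta_D -> theta'_D
    with torch.no_grad():
        l_new = F.cross_entropy(D(x_l), y_l)     # labeled loss, updated parameters
    return l_old - l_new                         # detector update loss, fed back to the teacher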
5. The living body detection method based on double decoupling generation and semi-supervised learning according to claim 1, wherein the specific steps of updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss comprise:
the detector update loss $\mathcal{L}_D^{upd}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$ as an overall weight and then added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$, and the teacher network is optimized by the gradient descent method, expressed as:
$$\mathcal{L}_T = \mathcal{L}_T^{semi} + \mathcal{L}_D^{upd} \cdot \mathcal{L}_T^{aug}, \qquad \theta'_T = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T$$
wherein $\theta_T$ represents the parameters before the teacher network is optimized, $\theta'_T$ represents the parameters after the teacher network is optimized, $\eta_T$ represents the teacher network learning rate, and $\nabla$ represents the gradient calculation.
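Under the same assumptions, the corresponding teacher update can be sketched as:

def teacher_step(opt_T, l_semi, l_aug, l_update):
    # the detector update loss acts as a scalar weight on the enhanced unlabeled loss
    loss_T = l_semi + l_update.detach() * l_aug
    opt_T.zero_grad()
    loss_T.backward()
    opt_T.step()
    return loss_T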
6. A living body detection system based on double decoupling generation and semi-supervised learning, comprising: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a test module;
the data preprocessing module is used for extracting face area images from the input images to obtain RGB color channel images, matching each true image of an original sample of the RGB color channel images to be trained with a false image with the same identity to form a true-false image pair, and labeling the false images with attack type labels;
the double decoupling generator construction module is used for constructing a real encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the specific steps of constructing the real-person encoder, the prosthesis encoder and the decoder comprise:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure, provided with a convolution layer, an instance normalization layer and a LeakyReLU activation, followed by five groups of units each consisting of a pooling layer and a convolution-layer stack with a skip-level residual connection;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the backbone network of the decoder is built from convolution layers, upsampling layers, a Sigmoid activation layer and residual blocks: the hidden-layer real-person vector and the hidden-layer prosthesis vector output by the two encoders are first input to a fully connected layer, then pass through six groups of units each consisting of a convolution-stacked residual block and an upsampling layer, and an image pair is finally output through a convolution layer;
the double-decoupling generator training module is used for sending the true image to the real-person encoder to obtain a real-person identity vector, sending the paired false image to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis pattern vector, merging the three vectors and sending them to the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization;
the specific steps of sending the true image to the real-person encoder to obtain the real-person identity vector, and sending the paired false image to the prosthesis encoder to obtain the prosthesis identity vector and the prosthesis pattern vector, comprise:
the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_t$ and the real-person identity vector variance $\sigma_t^2$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $(\sigma_s^{id})^2$, the prosthesis pattern vector mean $\mu_s^{p}$ and the prosthesis pattern vector variance $(\sigma_s^{p})^2$;
a real-person identity variation component $\epsilon_t$, a prosthesis identity variation component $\epsilon_s^{id}$ and a prosthesis pattern variation component $\epsilon_s^{p}$ are sampled from the standard normal distribution, and the reparameterization operation yields the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$, specifically:
$$z_t = \mu_t + \sigma_t \odot \epsilon_t, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{p} = \mu_s^{p} + \sigma_s^{p} \odot \epsilon_s^{p};$$
the real-person identity vector $z_t$, the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ are combined into a hidden vector and input to the decoder, which outputs a reconstructed true-false image pair comprising the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$;
the specific steps of merging the real-person identity vector, the prosthesis identity vector and the prosthesis pattern vector, sending them to the decoder to output the reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization, comprise:
the prosthesis pattern vector $z_s^{p}$ is sent into the fully connected layer $fc(\cdot)$ to output the prosthesis class vector $\hat{y}_s = fc(z_s^{p})$, and the cross entropy with the true prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, expressed as:
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log\big(\mathrm{softmax}(\hat{y}_s)^{(i)}\big)$$
wherein k is the number of prosthesis categories and $Y_s$ is the one-hot encoded prosthesis class label vector;
an angular orthogonality constraint is added between the prosthesis identity vector $z_s^{id}$ and the prosthesis pattern vector $z_s^{p}$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:
$$\mathcal{L}_{ort} = \left( \frac{\langle z_s^{id},\, z_s^{p} \rangle}{\|z_s^{id}\|_2\,\|z_s^{p}\|_2} \right)^{2}$$
wherein $\langle z_s^{id}, z_s^{p}\rangle$ denotes the inner product of $z_s^{id}$ and $z_s^{p}$;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed real-person image $\hat{x}_t$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images $x_t$, $x_s$, expressed as:
$$\mathcal{L}_{rec} = \|\hat{x}_t - x_t\|_1 + \|\hat{x}_s - x_s\|_1$$
the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is calculated between the real-person identity vector $z_t$ and the prosthesis identity vector $z_s^{id}$, expressed as:
$$\mathcal{L}_{mmd} = \Big\| \frac{1}{m}\sum_{i=1}^{m} \phi(z_{t,i}) - \frac{1}{m}\sum_{j=1}^{m} \phi(z_{s,j}^{id}) \Big\|_{\mathcal{H}}^{2}$$
wherein $\phi(\cdot)$ is the kernel feature mapping and m is the number of samples in a batch;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real-person image $\hat{x}_t$ and the reconstructed prosthesis image $\hat{x}_s$, expressed as:
$$\mathcal{L}_{pair} = \|\hat{x}_t - \hat{x}_s\|_1$$
the hidden vectors are constrained by calculating the KL divergence to the standard normal distribution, as shown in the following formula:
$$\mathcal{L}_{KL} = \frac{1}{2}\sum_{i=1}^{n}\big(\mu_i^{2} + \sigma_i^{2} - \log \sigma_i^{2} - 1\big)$$
applied to each of $(\mu_t, \sigma_t)$, $(\mu_s^{id}, \sigma_s^{id})$ and $(\mu_s^{p}, \sigma_s^{p})$ and summed, wherein n is the dimension of the hidden vector;
the above losses are weighted and summed to obtain the double-decoupling generator loss function $\mathcal{L}_G$, expressed as:
$$\mathcal{L}_G = \mathcal{L}_{cls} + \mathcal{L}_{KL} + \lambda_1 \mathcal{L}_{ort} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{mmd} + \lambda_4 \mathcal{L}_{pair}$$
wherein $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ represent the corresponding weight values;
an Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with minimization of the double-decoupling generator loss function $\mathcal{L}_G$ as the target;
the generated sample construction module is used for inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples;
the specific steps of inputting noise sampled from the standard normal distribution to the trained decoder to obtain generated samples comprise:
let the dimension of the hidden vector be n; random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis pattern hidden vector $\tilde{z}_s^{p}$; the reconstructed real-person identity hidden vector $\tilde{z}_t$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; true-false image pairs are then generated through the decoder and serve as the generated samples;
the unsupervised data enhancement module is used for splitting the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector construction module and the teacher network construction module are respectively used for constructing a detector and a teacher network;
the teacher learning module is used for receiving the labeled samples, the unlabeled samples and the enhanced unlabeled samples and obtaining the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;
the detector learning module is used for receiving the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples, updating the detector parameters and obtaining the detector update loss;
the network parameter updating module is used for updating the parameters of the teacher network by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, iteratively updating the parameters of the detector and the teacher network with the optimizer according to the loss functions, and saving the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the validation-set face RGB color channel images to the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the real labels, calculating the false alarm rate and the miss rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the test module is used for sending the test-set face RGB color channel images to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
CN202210329816.3A 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning Active CN114663986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329816.3A CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Publications (2)

Publication Number Publication Date
CN114663986A CN114663986A (en) 2022-06-24
CN114663986B true CN114663986B (en) 2023-06-20

Family

ID=82033819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210329816.3A Active CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Country Status (1)

Country Link
CN (1) CN114663986B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311605B (en) * 2022-09-29 2023-01-03 山东大学 Semi-supervised video classification method and system based on neighbor consistency and contrast learning
CN116152885B (en) * 2022-12-02 2023-08-01 南昌大学 Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111460931A (en) * 2020-03-17 2020-07-28 华南理工大学 Face spoofing detection method and system based on color channel difference image characteristics

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN111753595A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Living body detection method and apparatus, device, and storage medium
CN111222434A (en) * 2019-12-30 2020-06-02 深圳市爱协生科技有限公司 Method for obtaining evidence of synthesized face image based on local binary pattern and deep learning
CN114067444A (en) * 2021-10-12 2022-02-18 中新国际联合研究院 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Also Published As

Publication number Publication date
CN114663986A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN109583342B (en) Human face living body detection method based on transfer learning
Cheng et al. Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN111709408B (en) Image authenticity detection method and device
CN111460931B (en) Face spoofing detection method and system based on color channel difference image characteristics
CN114663986B (en) Living body detection method and system based on double decoupling generation and semi-supervised learning
US20230021661A1 (en) Forgery detection of face image
CN114783003B (en) Pedestrian re-identification method and device based on local feature attention
CN110516616A (en) A kind of double authentication face method for anti-counterfeit based on extensive RGB and near-infrared data set
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
Muhammad et al. Self-supervised 2d face presentation attack detection via temporal sequence sampling
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
Chen et al. SNIS: A signal noise separation-based network for post-processed image forgery detection
CN112418041A (en) Multi-pose face recognition method based on face orthogonalization
Trinh et al. An examination of fairness of ai models for deepfake detection
CN115171047A (en) Fire image detection method based on lightweight long-short distance attention transformer network
CN111476727B (en) Video motion enhancement method for face-changing video detection
CN114677722A (en) Multi-supervision human face in-vivo detection method integrating multi-scale features
Saealal et al. Three-Dimensional Convolutional Approaches for the Verification of Deepfake Videos: The Effect of Image Depth Size on Authentication Performance
CN113887573A (en) Human face forgery detection method based on visual converter
CN116188439A (en) False face-changing image detection method and device based on identity recognition probability distribution
Cai et al. Face anti-spoofing via conditional adversarial domain generalization
Patel et al. An optimized convolution neural network based inter-frame forgery detection model—a multi-feature extraction framework
CN114693607A (en) Method and system for detecting tampered video based on multi-domain block feature marker point registration
Zhang et al. A multi-scale noise-resistant feature adaptation approach for image tampering localization over Facebook

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant