CN114663986A - Liveness detection method and system based on double-decoupling generation and semi-supervised learning


Publication number
CN114663986A
Authority
CN
China
Prior art keywords
prosthesis
teacher
loss
detector
vector
Prior art date
Legal status
Granted
Application number
CN202210329816.3A
Other languages
Chinese (zh)
Other versions
CN114663986B (en)
Inventor
冯浩宇
胡永健
刘琲贝
余翔宇
葛治中
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202210329816.3A
Publication of CN114663986A
Application granted
Publication of CN114663986B
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The invention discloses a liveness detection method and system based on double-decoupling generation and semi-supervised learning. The method comprises the following steps: preprocessing the data to obtain original samples as RGB color channel images, and pairing images of the same identity to obtain real/fake image pairs; the real-person encoder outputs a real-person identity vector, and the prosthesis encoder outputs a prosthesis identity vector and a prosthesis mode vector; the three vectors are merged and fed into the decoder to obtain a reconstructed real/fake image pair, a double-decoupling generation loss function is constructed, and noise is fed into the trained decoder to obtain generated samples; labeled samples, unlabeled samples and enhanced unlabeled samples are constructed from the original and generated samples and fed into a teacher learning module to obtain the teacher semi-supervised loss, pseudo labels for the unlabeled samples and the enhanced unlabeled loss, and the network parameters of the detector and the teacher are updated; a decision threshold is determined using the validation set; test data are fed into the detector to obtain classification scores, and the classification result is decided according to the threshold. The invention improves the robustness of liveness detection models.

Description

Liveness detection method and system based on double-decoupling generation and semi-supervised learning
Technical Field
The invention relates to the technical field of anti-spoofing detection for face recognition, and in particular to a liveness detection method and system based on double-decoupling generation and semi-supervised learning.
Background
Today, business and industry applications of facial biometric technology are increasing dramatically; for example, face unlocking can protect personal privacy on electronic devices, and facial biometrics can be used to authenticate payments. However, using the face as a biometric feature for authentication is not inherently secure: facial biometric systems are vulnerable to spoofing attacks. Face spoofing attacks can generally be classified into four categories: 1) photo attacks, in which an attacker deceives the authentication system with a printed photo or a photo displayed on a screen; 2) video replay attacks, in which an attacker uses a pre-recorded video of the victim to deceive the authentication system; 3) face mask attacks, in which an attacker wears a mask carefully crafted to resemble the victim; 4) adversarial sample attacks, in which an attacker generates specific sample noise through a GAN network to interfere with the face authentication system and induce incorrect identity verification. These face spoofing attacks are not only low-cost but can also deceive the system, severely impacting and threatening the deployment of face recognition systems.
Liveness detection plays a crucial role in preventing face recognition systems from being attacked with prostheses. Benefiting from the strong feature-extraction capability of deep networks, deep-learning-based liveness detection algorithms outperform algorithms based on traditional handcrafted features. However, although most deep-learning-based liveness detection algorithms achieve good intra-database detection performance, their cross-database performance is poor. The main reason is that data inside and outside a database are collected under different conditions, for example with different capture devices, ambient lighting and attack presentation devices, so the data follow different distributions and a domain shift exists between them. When the diversity of the training data is insufficient, intra-database learning easily overfits and cross-database generalization suffers. Although the cause can be identified, the problem is not easy to solve in real-world applications: a liveness detection model can hardly collect labeled training samples in all scenarios, so most existing anti-spoofing datasets lack diversity. For example, the commonly used CASIA, Replay-Attack and MSU datasets were captured with 3, 2 and 2 devices, against 3, 2 and 1 backgrounds, respectively.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a liveness detection method based on double-decoupling generation and semi-supervised learning. Live/prosthesis features are modeled through decoupled learning, and generated samples are synthesized in the latent space to expand the dataset and improve the diversity of the training data, while the high discriminability of the generated samples is ensured and the influence of generation noise on model learning is reduced.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a living body detection method based on double decoupling generation and semi-supervised learning, which comprises the following steps:
cropping the face region image from the input image to obtain an RGB color channel image;
pairing each real image in the original samples of the RGB color channel images to be trained with a fake image of the same identity to form a real/fake image pair, and labeling the fake image with an attack category label;
constructing a real person encoder, a prosthesis encoder and a decoder;
feeding the real image into the real-person encoder to obtain a real-person identity vector, feeding the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and feeding them into the decoder to output a reconstructed real/fake image pair, and constructing a double-decoupling generation loss function for optimization;
inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
cutting a training set consisting of an original sample and a generated sample, and constructing a labeled sample, a non-labeled sample and an enhanced non-labeled sample;
constructing a detector and teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample into the teacher learning module to obtain semi-supervised loss of a teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
updating teacher network parameters by utilizing teacher semi-supervision loss, teacher enhanced label-free loss and detector updating loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
feeding the validation-set face RGB color channel images into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels to compute the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
and feeding the test-set face RGB color channel images into the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and computing the evaluation metrics.
As a preferred technical solution, constructing the real-person encoder, the prosthesis encoder and the decoder specifically comprises the following steps:
the backbone networks of the real-person encoder and the prosthesis encoder adopt the same structure: a convolutional layer, an instance normalization layer, a LeakyReLU, and five groups of units each consisting of a pooling layer and a residual block built from stacked convolutional layers with skip connections;
the real-person encoder outputs a hidden-layer real-person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks; the hidden-layer real-person vector and hidden-layer prosthesis vector output by the two encoders are first fed into a fully connected layer, then pass through six groups of units each consisting of a residual block built from stacked convolutional layers with skip connections and an upsampling layer, and finally pass through a convolutional layer that outputs an image pair.
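For illustration, the encoder description above maps onto a compact PyTorch module. The sketch below is a hypothetical reading, not the patent's code: kernel sizes, the average-pooling choice and the 1x1 skip projection are assumptions, since the text specifies only the layer types, group counts and channel widths.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: stacked convolutions with a skip connection."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.InstanceNorm2d(c_out),
            nn.LeakyReLU(0.2),
            nn.Conv2d(c_out, c_out, 3, padding=1),
        )
        self.skip = nn.Conv2d(c_in, c_out, 1)   # 1x1 conv to match channel counts

    def forward(self, x):
        return self.body(x) + self.skip(x)

class Encoder(nn.Module):
    """Shared backbone of the two encoders. out_mult is 2 for the real-person
    encoder (mean, variance) and 4 for the prosthesis encoder (identity and
    mode means/variances)."""
    def __init__(self, hdim=128, out_mult=2, img=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.InstanceNorm2d(32),
            nn.LeakyReLU(0.2),
        )
        layers, c_in = [], 32
        for c in (64, 128, 256, 512, 512):       # five pool + residual groups
            layers += [nn.AvgPool2d(2), ResBlock(c_in, c)]
            c_in = c
        self.groups = nn.Sequential(*layers)
        self.fc = nn.Linear(512 * (img // 32) ** 2, out_mult * hdim)

    def forward(self, x):
        return self.fc(self.groups(self.stem(x)).flatten(1))

real_enc = Encoder(out_mult=2)   # hidden-layer real-person vector, size 2*hdim
fake_enc = Encoder(out_mult=4)   # hidden-layer prosthesis vector, size 4*hdim
```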
As a preferred technical solution, feeding the real image into the real-person encoder to obtain the real-person identity vector and feeding the paired fake image into the prosthesis encoder to obtain the prosthesis identity vector and prosthesis mode vector specifically comprises the following steps:

the real-person encoder outputs a real-person hidden vector comprising the real-person identity vector mean $\mu_r$ and the real-person identity vector variance $\sigma_r$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s$, the prosthesis identity vector variance $\sigma_s$, the prosthesis mode vector mean $\mu_m$ and the prosthesis mode vector variance $\sigma_m$;

sampling the real-person identity variation component $\epsilon_r$, the prosthesis identity variation component $\epsilon_s$ and the prosthesis mode variation component $\epsilon_m$ from the standard normal distribution, and performing the reparameterization operation to obtain the real-person identity vector $z_r$, the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$, specifically expressed as:

$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \qquad z_s = \mu_s + \sigma_s \odot \epsilon_s, \qquad z_m = \mu_m + \sigma_m \odot \epsilon_m;$$

merging the real-person identity vector $z_r$, the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$ into one hidden vector and inputting it to the decoder, which outputs the reconstructed real/fake image pair comprising the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.
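The reparameterization step can be written in a few lines of PyTorch. This is a minimal sketch with hypothetical shapes, assuming each encoder output concatenates mean and log-variance halves (using log-variance rather than raw variance is a common numerical convenience, not mandated by the text):

```python
import torch

def reparameterize(stats: torch.Tensor) -> torch.Tensor:
    """stats: (B, 2n) tensor holding mean and log-variance halves.
    Returns z = mu + sigma * eps with eps sampled from N(0, I)."""
    mu, logvar = stats.chunk(2, dim=1)
    eps = torch.randn_like(mu)                 # the variation component
    return mu + torch.exp(0.5 * logvar) * eps

# hypothetical hidden vectors for one batch (n = 128)
real_stats = torch.randn(8, 2 * 128)           # real-person encoder output
fake_stats = torch.randn(8, 4 * 128)           # prosthesis encoder output
z_r = reparameterize(real_stats)               # real-person identity
z_s = reparameterize(fake_stats[:, :2 * 128])  # prosthesis identity
z_m = reparameterize(fake_stats[:, 2 * 128:])  # prosthesis mode
hidden = torch.cat([z_r, z_s, z_m], dim=1)     # (B, 3n), fed to the decoder
```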
As a preferred technical solution, merging the real-person identity vector, the prosthesis identity vector and the prosthesis mode vector, feeding them into the decoder to output a reconstructed real/fake image pair, and constructing a double-decoupling generation loss function for optimization specifically comprises the following steps:

feeding the prosthesis mode vector $z_m$ into the fully connected layer $fc(\cdot)$ to output the prosthesis category vector $\hat{y}_s$, and computing the cross entropy with the actual prosthesis class label $y_s$ to obtain the classification loss $\mathcal{L}_{cls}$, expressed as:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)},$$

where $k$ is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector;

adding an angular orthogonality constraint between the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s}{\lVert z_s \rVert}, \frac{z_m}{\lVert z_m \rVert} \right\rangle^2,$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product;

computing the reconstruction loss $\mathcal{L}_{rec}$ between the reconstructed real-person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images:

$$\mathcal{L}_{rec} = \lVert \hat{x}_r - x_r \rVert + \lVert \hat{x}_s - x_s \rVert;$$

computing the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ between the real-person identity vector $z_r$ and the prosthesis identity vector $z_s$:

$$\mathcal{L}_{mmd} = \mathrm{MMD}^2(z_r, z_s);$$

computing the pairing loss $\mathcal{L}_{pair}$ between the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$;

constraining the hidden vectors by computing the KL divergence, as shown in the following equation:

$$\mathcal{L}_{kl} = \frac{1}{2} \sum_{i=1}^{n} \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right),$$

where $n$ is the dimension of the hidden vector;

weighting and summing the losses to obtain the double-decoupling generator loss function $\mathcal{L}_{gen}$, a weighted sum of the classification, orthogonality, reconstruction, maximum mean discrepancy, pairing and KL losses, where $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ denote the corresponding weights;

employing an Adam optimizer to optimize the real-person encoder, the prosthesis encoder and the decoder with the objective of minimizing the double-decoupling generator loss function $\mathcal{L}_{gen}$.
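The sketch below assembles these losses in PyTorch. It is an illustrative reading of this section rather than the patent's exact formulas: the L1 norms, the single RBF kernel in the MMD term, and the assignment of the lambda weights to individual terms are all assumptions.

```python
import torch
import torch.nn.functional as F

def kl_loss(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=1).mean()

def orthogonal_loss(z_s, z_m):
    # squared cosine between prosthesis identity and mode vectors
    cos = F.cosine_similarity(z_s, z_m, dim=1)
    return (cos ** 2).mean()

def mmd_loss(z_r, z_s, bandwidth=1.0):
    # squared maximum mean discrepancy with one RBF kernel (an assumption)
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
    return k(z_r, z_r).mean() + k(z_s, z_s).mean() - 2 * k(z_r, z_s).mean()

def generator_loss(out, batch, lambdas=(10.0, 0.5, 0.1, 1.0)):
    """out: dict with reconstructions, latent vectors, stats and class logits;
    batch: dict with original images and prosthesis class labels.
    The mapping of the lambdas onto terms is a guess consistent with the text."""
    l_rec = F.l1_loss(out["x_r_hat"], batch["x_r"]) + F.l1_loss(out["x_s_hat"], batch["x_s"])
    l_cls = F.cross_entropy(out["logits_s"], batch["y_s"])
    l_ort = orthogonal_loss(out["z_s"], out["z_m"])
    l_mmd = mmd_loss(out["z_r"], out["z_s"])
    l_pair = F.l1_loss(out["x_r_hat"], out["x_s_hat"])
    l_kl = sum(kl_loss(mu, lv) for mu, lv in out["stats"])
    l1, l2, l3, l4 = lambdas
    return l_rec + l_pair + l1 * l_kl + l2 * l_cls + l3 * l_ort + l4 * l_mmd
```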
As a preferred technical solution, inputting standard normal distribution sampling noise into the trained decoder to obtain generated samples specifically comprises the following steps:

letting the hidden vector dimension be $n$, sampling $n$-dimensional random noise twice from the standard normal distribution $\mathcal{N}(0, I_n)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_m$; the reconstructed real-person identity hidden vector $\tilde{z}_r$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s$; the real/fake image pairs generated from these vectors by the decoder are taken as the generated samples.
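A minimal sketch of this sampling step; the decoder module is passed in as an argument and the sample counts follow the preferred values given later in the embodiment:

```python
import torch

@torch.no_grad()
def generate_pairs(decoder, num_samples=6400, n=128, device="cpu"):
    """Draw latent noise and decode generated real/fake image pairs.
    Copying z_s into z_r keeps the identity of each pair consistent."""
    z_s = torch.randn(num_samples, n, device=device)   # prosthesis identity noise
    z_m = torch.randn(num_samples, n, device=device)   # prosthesis mode noise
    z_r = z_s.clone()                                  # real identity copied from z_s
    pair = decoder(torch.cat([z_r, z_s, z_m], dim=1))  # (N, 6, H, W)
    real_imgs, fake_imgs = pair[:, :3], pair[:, 3:]
    return real_imgs, fake_imgs
```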
As a preferred technical solution, the detector and the teacher network have the same structure: a convolutional layer, a batch normalization layer, a ReLU, three units each consisting of a residual block built from stacked convolutional layers with skip connections, a global average pooling layer, and a fully connected layer that outputs the classification vector.
As a preferred technical solution, feeding the labeled samples, unlabeled samples and enhanced unlabeled samples into the teacher learning module to obtain the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss specifically comprises the following steps:

feeding the labeled sample $\{x_l, y_l\}$ into the teacher network $T$ with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l; \theta_T)$; computing the cross entropy with the true label $y_l$ gives the teacher labeled loss $\mathcal{L}_T^{l}$, specifically expressed as:

$$\mathcal{L}_T^{l} = \mathrm{CE}\big(y_l, T(x_l; \theta_T)\big),$$

where $\mathrm{CE}$ denotes the cross entropy loss;

feeding the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ into the teacher network $T$ respectively to obtain the teacher unlabeled prediction $T(x_u; \theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$; computing the cross entropy between the two gives the teacher unlabeled loss $\mathcal{L}_T^{u}$, specifically expressed as:

$$\mathcal{L}_T^{u} = \mathrm{CE}\big(T(x_u; \theta_T), T(\hat{x}_u; \theta_T)\big);$$

taking the category of the maximum value of the teacher unlabeled prediction $T(x_u; \theta_T)$ as the pseudo label $\hat{y}_u$, specifically expressed as:

$$\hat{y}_u = \arg\max T(x_u; \theta_T);$$

weighting and summing the teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, expressed as:

$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \lambda \, \frac{s}{s_{tl}} \, \mathcal{L}_T^{u},$$

where $s$ is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss;

for the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$, taking the category $h$ of its own maximum value and passing both through the cross entropy function gives the enhanced unlabeled loss $\mathcal{L}_T^{aug}$, expressed as:

$$h = \arg\max T(\hat{x}_u; \theta_T), \qquad \mathcal{L}_T^{aug} = \mathrm{CE}\big(h, T(\hat{x}_u; \theta_T)\big).$$
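A sketch of the teacher-side computation in PyTorch; the detached soft target in the consistency term and the linear ramp on the unlabeled weight are conventional choices assumed here, not spelled out in the text:

```python
import torch
import torch.nn.functional as F

def teacher_losses(teacher, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    """Returns the teacher semi-supervised loss, the pseudo labels for the
    unlabeled batch, and the teacher enhanced unlabeled loss."""
    loss_labeled = F.cross_entropy(teacher(x_l), y_l)

    logits_u = teacher(x_u)
    logits_u_aug = teacher(x_u_aug)
    # consistency between the clean and enhanced views (soft cross entropy)
    soft_u = F.softmax(logits_u, dim=1).detach()
    loss_unlabeled = torch.sum(-soft_u * F.log_softmax(logits_u_aug, dim=1), dim=1).mean()

    pseudo = logits_u.argmax(dim=1).detach()        # pseudo labels for the detector
    loss_semi = loss_labeled + lam * (step / total_steps) * loss_unlabeled

    h = logits_u_aug.argmax(dim=1)                  # self-predicted class
    loss_aug = F.cross_entropy(logits_u_aug, h)     # teacher enhanced unlabeled loss
    return loss_semi, pseudo, loss_aug
```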
As a preferred technical solution, feeding the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples into the detector learning module to update the detector parameters and obtain the detector update loss specifically comprises the following steps:

feeding the enhanced unlabeled sample $\hat{x}_u$ into the detector $D$ with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction $D(\hat{x}_u; \theta_D)$, and computing the cross entropy with the pseudo label $\hat{y}_u$ to obtain the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$, specifically expressed as:

$$\mathcal{L}_D^{aug} = \mathrm{CE}\big(\hat{y}_u, D(\hat{x}_u; \theta_D)\big);$$

optimizing the detector with gradient descent according to the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$ to obtain the optimized parameters $\theta_D'$, specifically expressed as:

$$\theta_D' = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{aug},$$

where $\eta_D$ denotes the learning rate of the detector and $\nabla$ denotes the gradient computation;

feeding the labeled sample $(x_l, y_l)$ into the old detector with parameters $\theta_D$ and the new detector with optimized parameters $\theta_D'$ respectively to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$, and taking their difference to obtain the detector update loss $\mathcal{L}_D^{upd}$, specifically expressed as:

$$\mathcal{L}_D^{old} = \mathrm{CE}\big(y_l, D(x_l; \theta_D)\big), \qquad \mathcal{L}_D^{new} = \mathrm{CE}\big(y_l, D(x_l; \theta_D')\big), \qquad \mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}.$$
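A sketch of the detector step; the labeled loss is evaluated before and after one optimizer step on the pseudo-labeled batch, and the difference is returned as a detached scalar feedback signal (hypothetical names throughout):

```python
import torch
import torch.nn.functional as F

def detector_step(detector, opt_d, x_u_aug, pseudo, x_l, y_l):
    """One detector update on pseudo-labeled data, plus the scalar
    detector update loss (old labeled loss minus new labeled loss)."""
    with torch.no_grad():
        loss_old = F.cross_entropy(detector(x_l), y_l)   # before the update

    loss_aug = F.cross_entropy(detector(x_u_aug), pseudo)
    opt_d.zero_grad()
    loss_aug.backward()
    opt_d.step()                                         # theta_D -> theta_D'

    with torch.no_grad():
        loss_new = F.cross_entropy(detector(x_l), y_l)   # after the update
    return (loss_old - loss_new).detach()                # detector update loss
```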
As a preferred technical solution, updating the teacher network parameters using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss specifically comprises the following steps:

multiplying the detector update loss $\mathcal{L}_D^{upd}$ by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$, adding the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$, and optimizing the teacher network with gradient descent, expressed as:

$$\mathcal{L}_T = \mathcal{L}_T^{semi} + \mathcal{L}_D^{upd} \cdot \mathcal{L}_T^{aug},$$

$$\theta_T' = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T,$$

where $\theta_T$ denotes the teacher network parameters before optimization, $\theta_T'$ the optimized parameters, $\eta_T$ the teacher network learning rate, and $\nabla$ the gradient computation.
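The teacher step ties the two previous sketches together: the detached scalar returned by detector_step scales the teacher's enhanced unlabeled loss, in the style of Meta Pseudo Labels. One training iteration would chain teacher_losses, detector_step and teacher_step in that order.

```python
def teacher_step(opt_t, loss_semi, loss_aug, update_signal):
    """Teacher update: scalar detector feedback scales the teacher's
    enhanced unlabeled loss, then the combined loss is backpropagated."""
    loss_teacher = loss_semi + update_signal * loss_aug
    opt_t.zero_grad()
    loss_teacher.backward()
    opt_t.step()
```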
The invention also provides a liveness detection system based on double-decoupling generation and semi-supervised learning, comprising: a data preprocessing module, a double-decoupling generator construction module, a double-decoupling generator training module, a generated-sample construction module, an unsupervised data enhancement module, a detector construction module, a teacher network construction module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
the data preprocessing module is used for cropping face region images from input images to obtain RGB color channel images, pairing each real image in the original samples to be trained with a fake image of the same identity to form real/fake image pairs, and labeling the fake images with attack category labels;
the double decoupling generator building module is used for building a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double-decoupling generator training module is used for feeding the real image into the real-person encoder to obtain a real-person identity vector, feeding the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and feeding them into the decoder to output a reconstructed real/fake image pair, and constructing a double-decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the unsupervised data enhancement module is used for cutting a training set formed by an original sample and a generated sample, and constructing a labeled sample, a non-labeled sample and an enhanced non-labeled sample;
the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
the teacher learning module is used for sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
the detector learning module is used for sending the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples to the detector learning module to update the detector parameters to obtain the update loss of the detector;
the network parameter updating module is used for updating teacher network parameters by utilizing teacher semi-supervised loss, teacher enhanced label-free loss and detector updating loss, iteratively updating parameters of the detector and the teacher network by using an optimizer according to a loss function, and storing the parameters of the teacher network and the detector after training is finished;
the verification module is used for feeding the validation-set face RGB color channel images into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing them with the true labels to compute the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the testing module is used for feeding the test-set face RGB color channel images into the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and computing the evaluation metrics.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the data generation stage, the invention constructs a double-decoupling generator from a real-person encoder, a prosthesis encoder and a decoder, trains it on real/fake image pairs from the original samples, and then feeds standard normal distribution sampling noise into the trained decoder to obtain generated samples. The generated samples serve as part of the unlabeled data, enriching the diversity of the training data and alleviating its insufficiency.
(2) In the training stage, a semi-supervised learning framework is adopted in which the teacher generates pseudo labels and the detector provides feedback. Specifically, the teacher network supplies the detector with pseudo labels for the unlabeled data to supervise detector learning; after the detector parameters are updated, its performance is evaluated on the labeled data, and the resulting loss is fed back to the teacher network to optimize the generated pseudo labels. This addresses model training with limited labeled data and the label uncertainty caused by blurry generated samples, a known shortcoming of variational autoencoders. Highly discriminative features of the image blocks are mined and the learning capability of the network is improved, so the model generalizes better to unseen acquisition environments.
(3) In the detection stage, the model loads the test data into the detector to obtain the corresponding classification scores, and the classification result is decided according to the threshold.
Drawings
FIG. 1 is a schematic flow chart of the liveness detection method based on double-decoupling generation and semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a network structure of a real human encoder and a prosthesis encoder according to the present invention;
FIG. 3 is a schematic diagram of a network structure of a decoder according to the present invention;
FIG. 4 is a schematic diagram of the training phase of the double-decoupling generator of the present invention;
FIG. 5 is a schematic flow chart of the generation phase of the double-decoupling generator of the present invention;
FIG. 6(a) is a schematic diagram of a live image generated by the double-decoupling generator of the present invention;
FIG. 6(b) is a schematic diagram of a prosthesis image generated by the double-decoupling generator of the present invention;
FIG. 7 is a schematic diagram of the network architecture of the detector and teacher network according to the present invention;
FIG. 8 is an overall framework diagram of the semi-supervised learning of the present invention;
FIG. 9 is a block diagram of the liveness detection system based on double-decoupling generation and semi-supervised learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
This embodiment uses the Replay-Attack, CASIA-MFSD and MSU-MFSD liveness detection datasets for training and testing as examples, and the implementation process is described in detail below. The Replay-Attack dataset comprises 1200 videos; real faces from 50 subjects and the spoof faces generated from them were captured with a MacBook camera at a resolution of 320 x 240 pixels, and the videos are divided into training, validation and test sets at a ratio of 3:3:4. The CASIA-MFSD dataset comprises 600 videos; real faces from 50 subjects and the spoof faces generated from them were captured with three cameras at resolutions of 640 x 480, 480 x 640 and 1920 x 1080 pixels, and the videos are divided into training and test sets at a ratio of 2:3. The MSU-MFSD dataset comprises 280 videos, with real faces from 35 subjects and the spoof faces generated from them; 15 subjects are used for the training set and 20 for the test set. Since the CASIA-MFSD and MSU-MFSD datasets do not contain a validation set, this embodiment uses the corresponding test set as the validation set for threshold determination on these two datasets. The videos of the datasets are then split into frames to obtain images. The embodiment runs on a Linux system and is implemented mainly on the deep learning framework PyTorch 1.6.1; the GPU used is a GTX 1080Ti, with CUDA version 10.1.105 and cuDNN version 7.6.4.
As shown in fig. 1, this embodiment provides a liveness detection method based on double-decoupling generation and semi-supervised learning, comprising the following steps:
S1: cropping the face region image from the input image to obtain an RGB color channel image;
in this embodiment, the specific steps include: detecting the face region of the input image with the MTCNN face detection algorithm, cropping it and resizing it to a uniform size to obtain a face image in RGB format with red, green and blue color channels. Each real image in the original samples of the RGB color channel images to be trained is then paired with a fake image of the same identity to form a real/fake image pair, and the fake image is labeled with an attack category label;
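A minimal sketch of this preprocessing step. The facenet-pytorch package is one common MTCNN implementation (an assumption; the patent does not name a library), and the 256-pixel crop size follows the preferred values given later in this embodiment:

```python
from facenet_pytorch import MTCNN
from PIL import Image

# detect, crop and resize the face region to a uniform size
mtcnn = MTCNN(image_size=256, margin=0, post_process=False)

img = Image.open("frame_0001.png").convert("RGB")  # hypothetical frame path
face = mtcnn(img)   # (3, 256, 256) tensor in [0, 255], or None if no face found
if face is not None:
    face = face / 255.0   # scale to [0, 1] for the networks downstream
```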
s2: constructing a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
in this embodiment, as shown in fig. 2, the backbone networks of the real-person encoder and the prosthesis encoder share the same structure. The input size is H x W x 3; after the convolutional layer, the instance normalization layer and the LeakyReLU it becomes H x W x 32, and the five groups of units, each consisting of a pooling layer and a residual block built from stacked convolutional layers with skip connections, output feature maps at 1/2, 1/4, 1/8, 1/16 and 1/32 of the original size with 64, 128, 256, 512 and 512 channels respectively, giving a feature vector of size $\frac{H}{32} \times \frac{W}{32} \times 512$. After the backbone output is obtained, the real-person encoder outputs a hidden-layer real-person vector of size 2 x hdim through the fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector of size 4 x hdim through the fully connected layer.

As shown in fig. 3, the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks. The input size is 3 x hdim; the input fully connected layer first reshapes it to $\frac{H}{64} \times \frac{W}{64} \times 512$, then six groups of units, each consisting of a residual block built from stacked convolutional layers with skip connections and an upsampling layer, output feature maps at 1/32, 1/16, 1/8, 1/4, 1/2 and 1 of the original size with 512, 256, 128 and 64 channels respectively, and finally a convolutional layer outputs an image pair of size H x W x 6.
S3: constructing a double-decoupling generation module, which consists of the double-decoupling generator and the double-decoupling generator loss function, the generator itself consisting of the real-person encoder, the prosthesis encoder and the decoder; feeding the real image into the real-person encoder to obtain a real-person identity vector, feeding the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the three vectors and feeding them into the decoder to output a reconstructed real/fake image pair, and constructing the double-decoupling generation loss function for optimization;
as shown in fig. 4, a real image and a fake image, each of size H x W x 3, are input to the real-person encoder and the prosthesis encoder respectively. Letting the hidden vector dimension be n, the real-person encoder obtains a real-person hidden vector of dimension 2n comprising the n-dimensional real-person identity vector mean $\mu_r$ and variance $\sigma_r$; the prosthesis encoder obtains a prosthesis hidden vector of dimension 4n comprising the n-dimensional prosthesis identity vector mean $\mu_s$, prosthesis identity vector variance $\sigma_s$, prosthesis mode vector mean $\mu_m$ and prosthesis mode vector variance $\sigma_m$. The real-person identity variation component $\epsilon_r$, prosthesis identity variation component $\epsilon_s$ and prosthesis mode variation component $\epsilon_m$ are then sampled from the standard normal distribution, and the reparameterization operation shown below gives the real-person identity vector $z_r$, the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$:

$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \qquad z_s = \mu_s + \sigma_s \odot \epsilon_s, \qquad z_m = \mu_m + \sigma_m \odot \epsilon_m.$$

The real-person identity vector $z_r$, prosthesis identity vector $z_s$ and prosthesis mode vector $z_m$ are merged into a hidden vector of dimension 3n and input to the decoder, which outputs a reconstructed real/fake image pair of size H x W x 6, comprising the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$, each of size H x W x 3.
To better learn the prosthesis patterns of different attack modes, the prosthesis mode vector $z_m$ is fed into the fully connected layer $fc(\cdot)$ to output the prosthesis category vector $\hat{y}_s$, and the cross entropy with the actual prosthesis class label $y_s$ gives the classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{y}_s^{(i)},$$

where k is the number of prosthesis classes and $Y_s$ is the one-hot encoded prosthesis class label vector.

To effectively separate the prosthesis identity vector $z_s$ and the prosthesis mode vector $z_m$, an angular orthogonality constraint between the two gives the orthogonality loss $\mathcal{L}_{ort}$, as shown in the following formula:

$$\mathcal{L}_{ort} = \left\langle \frac{z_s}{\lVert z_s \rVert}, \frac{z_m}{\lVert z_m \rVert} \right\rangle^2,$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product.

The reconstruction loss $\mathcal{L}_{rec}$ is computed between the reconstructed real-person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images, as shown in the following formula:

$$\mathcal{L}_{rec} = \lVert \hat{x}_r - x_r \rVert + \lVert \hat{x}_s - x_s \rVert.$$

To keep the identity consistency of the hidden vectors, the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is computed between the n-dimensional real-person identity vector $z_r$ and prosthesis identity vector $z_s$:

$$\mathcal{L}_{mmd} = \mathrm{MMD}^2(z_r, z_s).$$

To keep the identity consistency of the reconstructed images, the pairing loss $\mathcal{L}_{pair}$ is computed between the reconstructed real-person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.

To fit the distribution of the hidden vectors to a standard normal distribution, a constraint is imposed by computing the Kullback-Leibler divergence (KL divergence), as shown in the following formula:

$$\mathcal{L}_{kl} = \frac{1}{2} \sum_{i=1}^{n} \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right),$$

where n is the dimension of the hidden vector.

Finally, the losses are weighted and summed to obtain the double-decoupling generator loss function $\mathcal{L}_{gen}$, a weighted combination of the classification, orthogonality, reconstruction, maximum mean discrepancy, pairing and KL losses, where the weights $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are optimally 10, 0.5, 0.1 and 1 respectively.
An Adam optimizer is employed to optimize the real-person encoder, the prosthesis encoder and the decoder with the objective of minimizing the double-decoupling generator loss function $\mathcal{L}_{gen}$, iteratively training for T generations, where T is optimally 200. The parameter update formulas are as follows:

$$m_{t+1} = \beta_1 m_t + (1 - \beta_1) g,$$
$$v_{t+1} = \beta_2 v_t + (1 - \beta_2) g^2,$$
$$\theta_{t+1} = \theta_t - \eta \, \frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon},$$

where $\beta_1$ and $\beta_2$ are optimally set to 0.9 and 0.999, the divide-by-zero guard $\epsilon$ is optimally set to 1e-8, the learning rate $\eta$ is optimally set to 0.0002, and $\theta_t$ denotes the parameters at the t-th iteration.
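These settings map directly onto PyTorch's built-in Adam; a short sketch of the optimizer construction with the stated hyperparameters (the helper name is hypothetical):

```python
import torch
import torch.nn as nn

def make_generator_optimizer(*modules: nn.Module) -> torch.optim.Adam:
    """Adam with the hyperparameters stated above (lr 0.0002, betas 0.9/0.999,
    eps 1e-8); pass the real-person encoder, prosthesis encoder and decoder."""
    params = [p for m in modules for p in m.parameters()]
    return torch.optim.Adam(params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8)
```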
S4: inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
as shown in fig. 5, fig. 6(a) and fig. 6(b), the hidden vector dimension is n. Random noise of dimension n is sampled twice from the standard normal distribution $\mathcal{N}(0, I_n)$ to obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_m$. To ensure the identity consistency of the real/fake images, the reconstructed real-person identity hidden vector $\tilde{z}_r$ is copied directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s$; the real/fake image pairs generated by the decoder are then taken as the generated samples, as shown in fig. 6(a) and fig. 6(b). In this embodiment the dimension n is optimally 128 and the number of generated samples is optimally 6400.
S5: cutting a training set consisting of an original sample and a generated sample, dividing the training set into a labeled sample and a non-labeled sample, and performing random data enhancement on the non-labeled sample to obtain an enhanced non-labeled sample;
in this embodiment, image blocks are first randomly cropped from each RGB color channel image of size H x W to be trained. N original samples are randomly selected from the original samples to be trained as labeled data; the remaining samples together with the generated samples form the unlabeled data, and the ratio of labeled to unlabeled samples is mu. Random data enhancement is then applied to the unlabeled samples to obtain the enhanced unlabeled samples. The data enhancement methods include maximizing contrast, adjusting brightness, adjusting color balance, adjusting contrast, adjusting sharpness, cropping, histogram equalization, inversion, random rotation, zeroing the lowest 0-4 bits of the pixel values (posterization), horizontal and vertical shearing, horizontal translation, vertical translation, and inverting all pixel values above a certain threshold (solarization); for each sample, two of these methods are randomly selected for enhancement. The preferred values of H and W are 256, N is 6000, and mu is 4 in this embodiment;
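A sketch of this pick-two-random-operations policy using Pillow; the operation pool is abbreviated and the enhancement factor ranges are assumptions:

```python
import random
from PIL import Image, ImageOps, ImageEnhance

# an abbreviated pool of the enhancement operations listed above
AUG_OPS = [
    lambda im: ImageOps.autocontrast(im),                         # maximize contrast
    lambda im: ImageEnhance.Brightness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Color(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Contrast(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageEnhance.Sharpness(im).enhance(random.uniform(0.5, 1.5)),
    lambda im: ImageOps.equalize(im),                             # histogram equalization
    lambda im: ImageOps.invert(im),                               # inversion
    lambda im: ImageOps.posterize(im, 8 - random.randint(0, 4)),  # drop 0-4 low bits
    lambda im: ImageOps.solarize(im, random.randint(128, 255)),   # invert above threshold
]

def enhance(im: Image.Image) -> Image.Image:
    """Apply two randomly selected enhancement operations, as described above."""
    for op in random.sample(AUG_OPS, 2):
        im = op(im)
    return im
```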
s6: constructing a detector and teacher network;
as shown in fig. 7, the teacher network and the detector share the same structure. The input size is H x W x 3; after the convolutional layer, the batch normalization layer and the ReLU it becomes H x W x 16, and the three groups of units, each consisting of a residual block built from stacked convolutional layers with skip connections, output feature maps at 1, 1/2 and 1/4 of the original size with 32, 64 and 128 channels respectively, giving a feature vector of size $\frac{H}{4} \times \frac{W}{4} \times 128$. The global average pooling layer changes its size to 1 x 1 x 128, and finally the fully connected layer outputs a classification vector of dimension 2.
S7: a teacher learning module is constructed, the labeled sample, the unlabeled sample and the enhanced unlabeled sample are sent to the teacher learning module, and the semi-supervised loss of the teacher, the pseudo label of the unlabeled data and the enhanced unlabeled loss are obtained:
as shown in fig. 8, first agree that CE(q, p) denotes the cross entropy loss of two distributions q and p; if one argument is a label value, it is first one-hot encoded into a label vector. Letting k be the number of label classes, the cross entropy loss is expressed as:

$$\mathrm{CE}(q, p) = -\sum_{i=1}^{k} q^{(i)} \log p^{(i)}.$$

Second, agree that argmax(v) denotes the index of the maximum value in the vector v.

The labeled sample $\{x_l, y_l\}$ is fed into the teacher network T with parameters $\theta_T$, which outputs the teacher labeled prediction $T(x_l; \theta_T)$; the cross entropy with the true label $y_l$ gives the teacher labeled loss $\mathcal{L}_T^{l}$, as shown in the following formula:

$$\mathcal{L}_T^{l} = \mathrm{CE}\big(y_l, T(x_l; \theta_T)\big).$$

The unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ obtained by one round of data enhancement are fed into the teacher network T respectively to obtain the teacher unlabeled prediction $T(x_u; \theta_T)$ and the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$; the cross entropy of the two results gives the teacher unlabeled loss $\mathcal{L}_T^{u}$, and the category of the maximum value of the teacher unlabeled prediction $T(x_u; \theta_T)$ is taken as the pseudo label $\hat{y}_u$, as shown in the following formulas:

$$\mathcal{L}_T^{u} = \mathrm{CE}\big(T(x_u; \theta_T), T(\hat{x}_u; \theta_T)\big),$$
$$\hat{y}_u = \arg\max T(x_u; \theta_T).$$

The teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ are then weighted and summed to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, as shown in the following formula:

$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \lambda \, \frac{s}{s_{tl}} \, \mathcal{L}_T^{u},$$

where s is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss.

For the teacher enhanced unlabeled prediction $T(\hat{x}_u; \theta_T)$, the category h of its own maximum value is taken and passed through the cross entropy function to obtain the enhanced unlabeled loss $\mathcal{L}_T^{aug}$, as shown in the following formulas:

$$h = \arg\max T(\hat{x}_u; \theta_T),$$
$$\mathcal{L}_T^{aug} = \mathrm{CE}\big(h, T(\hat{x}_u; \theta_T)\big).$$
S8: constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
as shown in fig. 8, the enhanced unlabeled sample $\hat{x}_u$ is fed into the detector D with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction $D(\hat{x}_u; \theta_D)$, and the cross entropy with the pseudo label $\hat{y}_u$ gives the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$, as shown in the following formula:

$$\mathcal{L}_D^{aug} = \mathrm{CE}\big(\hat{y}_u, D(\hat{x}_u; \theta_D)\big).$$

Letting the learning rate of the detector be $\eta_D$, the detector is optimized with gradient descent according to the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$ to obtain the detector with optimized parameters $\theta_D'$, as shown in the following formula:

$$\theta_D' = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{aug}.$$

The labeled sample $(x_l, y_l)$ is fed into the old detector with parameters $\theta_D$ and the new detector with optimized parameters $\theta_D'$ respectively to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$, whose difference gives the detector update loss $\mathcal{L}_D^{upd}$, as shown in the following formulas:

$$\mathcal{L}_D^{old} = \mathrm{CE}\big(y_l, D(x_l; \theta_D)\big),$$
$$\mathcal{L}_D^{new} = \mathrm{CE}\big(y_l, D(x_l; \theta_D')\big),$$
$$\mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}.$$
S9: updating teacher network parameters by utilizing teacher semi-supervision loss, enhanced label-free loss and detector updating loss:
as shown in fig. 8, the detector update loss $\mathcal{L}_D^{upd}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$ and then added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$. Letting the teacher network learning rate be $\eta_T$, the teacher network is optimized with gradient descent, as shown in the following formulas:

$$\mathcal{L}_T = \mathcal{L}_T^{semi} + \mathcal{L}_D^{upd} \cdot \mathcal{L}_T^{aug},$$
$$\theta_T' = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T.$$
S10: iteratively updating the network parameters of the detector and the teacher network with an optimizer according to the loss functions, where $\nabla$ denotes the gradient computation, and saving the parameters of the teacher network and the detector after training is completed:

in this embodiment, both the teacher network and the detector use SGD optimizers with Nesterov momentum, where the momentum $\mu$ is preferably 0.9 and the initial learning rate $\epsilon_0$ is optimally 0.05, with the learning rate decaying over the training iterations;

in the detector learning module, the detector is optimized by the detector optimizer with the objective of minimizing the detector enhanced unlabeled loss $\mathcal{L}_D^{aug}$; in the teacher update module, the teacher network is optimized by its optimizer with the objective of minimizing the teacher loss $\mathcal{L}_T$.
S11: determining a threshold value by using the verification set;
in this embodiment, the specific steps include: feeding the validation-set face RGB color channel images into the detector to obtain classification scores p, sampling decision thresholds at equal intervals over the range (0, 1), obtaining predicted label values under each threshold, comparing them with the true labels to compute the false acceptance rate and the false rejection rate, and taking the threshold at which the two rates are equal as the test decision threshold T;
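A small sketch of this threshold sweep; scores and labels are assumed to be NumPy arrays with label 1 for live faces:

```python
import numpy as np

def eer_threshold(scores: np.ndarray, labels: np.ndarray, steps: int = 999) -> float:
    """Sweep thresholds over (0, 1) and return the one where FAR equals FRR."""
    best_t, best_gap = 0.5, float("inf")
    for t in np.linspace(0.001, 0.999, steps):
        pred = (scores >= t).astype(int)
        far = np.mean(pred[labels == 0] == 1)   # fake accepted as live
        frr = np.mean(pred[labels == 1] == 0)   # live rejected as fake
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t
```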
s12: testing the model;
in this embodiment, the specific steps include: feeding the test-set face RGB color channel images into the detector to obtain classification scores p, obtaining the final predicted label values according to the test decision threshold T, and computing the evaluation metrics.
The performance of the liveness detection algorithm in this embodiment is evaluated with the False Acceptance Rate (FAR), the False Rejection Rate (FRR), the True Acceptance Rate (TAR), the Equal Error Rate (EER) and the Half Total Error Rate (HTER); these metrics are described in detail using the confusion matrix of Table 1:

Table 1 Confusion matrix

Label \ Prediction | Predicted live | Predicted fake
Label live         | TA             | FR
Label fake         | FA             | TR
The False Acceptance Rate (FAR) is the ratio of the number of non-live faces judged to be live to the number of faces labeled non-live:

$$\mathrm{FAR} = \frac{FA}{FA + TR}.$$

The False Rejection Rate (FRR) is the ratio of the number of live faces judged to be non-live to the number of faces labeled live:

$$\mathrm{FRR} = \frac{FR}{TA + FR}.$$

The True Acceptance Rate (TAR) is the ratio of the number of live faces judged to be live to the number of faces labeled live:

$$\mathrm{TAR} = \frac{TA}{TA + FR}.$$

The Equal Error Rate (EER) is the error rate when FRR and FAR are equal.

The Half Total Error Rate (HTER) is the mean of FRR and FAR:

$$\mathrm{HTER} = \frac{\mathrm{FAR} + \mathrm{FRR}}{2}.$$
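These definitions translate directly into code; a sketch following the confusion matrix above:

```python
import numpy as np

def metrics(pred: np.ndarray, labels: np.ndarray) -> dict:
    """Compute FAR, FRR, TAR and HTER from binary predictions (1 = live)."""
    ta = np.sum((labels == 1) & (pred == 1))
    fr = np.sum((labels == 1) & (pred == 0))
    fa = np.sum((labels == 0) & (pred == 1))
    tr = np.sum((labels == 0) & (pred == 0))
    far = fa / (fa + tr)
    frr = fr / (ta + fr)
    return {"FAR": far, "FRR": frr, "TAR": ta / (ta + fr), "HTER": (far + frr) / 2}
```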
in order to prove the effectiveness of the invention and test the generalization performance of the method, in-library experiments and cross-library experiments are respectively carried out on CASIA-MFSD, Replay-Attack and MSU-MFSD databases. The in-library and cross-library experimental results are shown in tables 2 and 3, respectively:
table 2 library of experimental results
Figure BDA0003574807620000193
TABLE 3 Cross-Bank Experimental results
Figure BDA0003574807620000194
As can be seen from Table 2, both the half total error rate and the equal error rate of the method inside the database are low, showing excellent in-library spoofing detection performance; as can be seen from Table 3, the half total error rate of cross-database detection is also low. Although the training set consists of labeled and unlabeled samples drawn from only a small number of frames per training video, the data obtained through the double-decoupling generator enriches the diversity of the training data, and progressively training the model through meta-learning improves its ability to learn from limited sample features and its heuristic capability. The experimental results prove that, even with insufficient labeled training samples, high in-library accuracy is maintained, the cross-library error rate is greatly reduced, and the generalization performance is significantly improved.
As shown in fig. 9, this embodiment further provides a liveness detection system based on double-decoupling generation and semi-supervised learning, comprising: a data preprocessing module, a double-decoupling generator construction module, a double-decoupling generator training module, a generated-sample construction module, an unsupervised data enhancement module, a detector construction module, a teacher network construction module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
in this embodiment, the data preprocessing module is configured to extract a face region image from an input image to obtain an RGB color channel image, pair each true image of an original sample of the RGB color channel image to be trained with a false image of the same identity to form a true-false image pair, and label an attack category label for the false image;
in this embodiment, the dual decoupling generator building module is used for building a real encoder, a prosthetic encoder and a decoder to form a dual decoupling generator;
in this embodiment, the double-decoupling generator training module is configured to feed the real image into the real-person encoder to obtain a real-person identity vector, feed the paired fake image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merge the three vectors and feed them into the decoder to output a reconstructed real/fake image pair, and construct a double-decoupling generation loss function for optimization;
in this embodiment, the generated sample construction module is configured to input standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
in this embodiment, the unsupervised data enhancement module is configured to crop a training set formed by an original sample and a generated sample, and construct a labeled sample, a unlabeled sample, and an enhanced unlabeled sample;
in this embodiment, the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
in this embodiment, the teacher learning module is configured to send the labeled sample, the unlabeled sample, and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of the unlabeled data, and enhanced unlabeled loss of the teacher;
in this embodiment, the detector learning module is configured to send the labeled sample, the enhanced unlabeled sample, and the pseudo label of the unlabeled sample to the detector learning module to update the detector parameters, so as to obtain an update loss of the detector;
in this embodiment, the network parameter updating module is configured to update the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced label-free loss, and the detector updating loss, iteratively update the parameters of the detector and the teacher network by using the optimizer according to the loss function, and store the parameters of the teacher network and the detector after training is completed;
in this embodiment, the verification module is configured to send the verification-set face RGB color channel maps to the trained detector to obtain classification scores, obtain predicted label values under different decision thresholds, compare the predicted labels with the real labels to calculate the false alarm rate and the missed detection rate, and take the threshold at which the two rates are equal as the test decision threshold;
in this embodiment, the test module is configured to send the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtain the final predicted label values according to the test decision threshold, and calculate the evaluation metrics.
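For illustration, the equal-error-rate threshold selection performed by the verification module can be sketched as follows. This is a minimal NumPy sketch, assuming both classes are present in the validation set; the function name, score convention and grid search are illustrative assumptions, not part of the patent.

```python
import numpy as np

def select_eer_threshold(scores, labels, num_thresholds=1000):
    """Scan candidate thresholds and return the one where the false alarm
    rate and the missed detection rate are closest (the EER point).

    scores: detector classification scores (higher = more likely real).
    labels: ground truth (1 = real face, 0 = prosthesis/attack).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.linspace(scores.min(), scores.max(), num_thresholds)
    best_t, best_gap = thresholds[0], np.inf
    for t in thresholds:
        pred = (scores >= t).astype(int)
        far = np.mean(pred[labels == 0] == 1)   # attack accepted as real
        frr = np.mean(pred[labels == 1] == 0)   # real rejected as attack
        gap = abs(far - frr)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```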
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the scope of protection of the present invention.

Claims (10)

1. A living body detection method based on double decoupling generation and semi-supervised learning is characterized by comprising the following steps:
extracting a face region image from an input image to obtain an RGB color channel image;
matching each true image of an original sample of the RGB color channel image to be trained with a false image with the same identity to form a true-false image pair, and labeling an attack category label for the false image;
constructing a real person encoder, a prosthesis encoder and a decoder;
sending the true image into the real person encoder to obtain a real person identity vector, sending the matched false image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the real person identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the merged vector into the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization;
inputting standard normal distribution sampling noise into a trained decoder to obtain a generated sample;
cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
constructing a detector and a teacher network;
constructing a teacher learning module, and sending the labeled sample, the unlabeled sample and the enhanced unlabeled sample to the teacher learning module to obtain semi-supervised loss of the teacher, pseudo labels of unlabeled data and enhanced unlabeled loss of the teacher;
constructing a detector learning module, and sending the labeled sample, the enhanced unlabeled sample and the pseudo label of the unlabeled sample into the detector learning module to update the detector parameters to obtain the update loss of the detector;
updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss;
iteratively updating parameters of the detector and the teacher network by using an optimizer according to the loss function, and storing the parameters of the teacher network and the detector after training is completed;
sending the verification-set face RGB color channel maps into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing the predicted labels with the real labels, calculating the false alarm rate and the missed detection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
and sending the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
2. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the construction of the real-person encoder, the prosthesis encoder and the decoder comprises the following specific steps:
the backbone networks of the real person encoder and the prosthesis encoder adopt the same structure, comprising a convolutional layer, an instance normalization layer, a LeakyReLU activation, and five units each consisting of a residual block, built from stacked convolutional layers with skip connections, followed by a pooling layer;
the real person encoder outputs a hidden-layer real person vector through a fully connected layer, and the prosthesis encoder outputs a hidden-layer prosthesis vector through a fully connected layer;
the decoder backbone network is built from convolutional layers, upsampling layers, a Sigmoid activation layer and residual blocks: the hidden-layer real person vector and the hidden-layer prosthesis vector output by the two encoders are first passed through a fully connected layer, then through six units each consisting of a residual block, built from stacked convolutional layers with skip connections, followed by an upsampling layer, and finally through a convolutional layer that outputs the image pair.
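A minimal PyTorch-style sketch of the encoder backbone recited in claim 2; the channel width, pooling type and LeakyReLU slope are illustrative assumptions, and all class names are hypothetical.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block of stacked convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.body(x) + x)  # skip connection

class EncoderBackbone(nn.Module):
    """Stem conv + instance norm + LeakyReLU, then five (residual block + pool) units."""
    def __init__(self, in_ch=3, width=64):
        super().__init__()
        layers = [nn.Conv2d(in_ch, width, 3, padding=1),
                  nn.InstanceNorm2d(width), nn.LeakyReLU(0.2)]
        for _ in range(5):
            layers += [ResidualBlock(width), nn.AvgPool2d(2)]  # pooling type assumed
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```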
3. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the true image is sent to the real person encoder to obtain a real person identity vector, and the matched false image is sent to the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, the specific steps including:
the real person encoder outputs a real person hidden vector comprising the real person identity vector mean $\mu_r$ and the real person identity vector variance $\sigma_r^2$; the prosthesis encoder outputs a prosthesis hidden vector comprising the prosthesis identity vector mean $\mu_s^{id}$, the prosthesis identity vector variance $(\sigma_s^{id})^2$, the prosthesis mode vector mean $\mu_s^{m}$ and the prosthesis mode vector variance $(\sigma_s^{m})^2$;
a real person identity variation component $\epsilon_r$, a prosthesis identity variation component $\epsilon_s^{id}$ and a prosthesis mode variation component $\epsilon_s^{m}$ are sampled from the standard normal distribution, and the reparameterization operation is performed to respectively obtain the real person identity vector $z_r$, the prosthesis identity vector $z_s^{id}$ and the prosthesis mode vector $z_s^{m}$, specifically expressed as:
$$z_r = \mu_r + \sigma_r \odot \epsilon_r, \qquad z_s^{id} = \mu_s^{id} + \sigma_s^{id} \odot \epsilon_s^{id}, \qquad z_s^{m} = \mu_s^{m} + \sigma_s^{m} \odot \epsilon_s^{m};$$
the real person identity vector $z_r$, the prosthesis identity vector $z_s^{id}$ and the prosthesis mode vector $z_s^{m}$ are merged into a hidden vector and input to the decoder, which outputs the reconstructed true-false image pair comprising the reconstructed real person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$.
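A minimal PyTorch-style sketch of the reparameterization step above; parameterizing the variance as a log-variance is a common convention and an assumption here, and all variable names are hypothetical.

```python
import torch

def reparameterize(mu, log_var):
    """VAE-style reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)   # the sampled variation component
    return mu + std * eps

# Usage sketch: the three latent factors of claim 3.
# z_real_id   = reparameterize(mu_r, log_var_r)
# z_spoof_id  = reparameterize(mu_s_id, log_var_s_id)
# z_spoof_mod = reparameterize(mu_s_mod, log_var_s_mod)
# decoder_input = torch.cat([z_real_id, z_spoof_id, z_spoof_mod], dim=1)
```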
4. The in-vivo detection method based on double-decoupling generation and semi-supervised learning of claim 3, wherein the real person identity vector, the prosthesis identity vector and the prosthesis mode vector are merged and sent to the decoder to output the reconstructed true-false image pair, and a double-decoupling generation loss function is constructed for optimization, the specific steps including:
the prosthesis mode vector $z_s^{m}$ is fed into a fully connected layer $fc(\cdot)$, which outputs the prosthesis class vector $\hat{Y}_s$; cross entropy is calculated with the actual prosthesis class label $y_s$ to obtain the classification loss $\mathcal{L}_{cls}$, expressed as:
$$\mathcal{L}_{cls} = -\sum_{i=1}^{k} Y_s^{(i)} \log \hat{Y}_s^{(i)},$$
where $k$ is the number of prosthesis classes and $Y_s$ is the one-hot-encoded prosthesis class label vector;
an angular orthogonality constraint is imposed on the prosthesis identity vector $z_s^{id}$ and the prosthesis mode vector $z_s^{m}$ to obtain the orthogonality loss $\mathcal{L}_{ort}$, expressed as:
$$\mathcal{L}_{ort} = \left\langle \frac{z_s^{id}}{\lVert z_s^{id} \rVert_2}, \frac{z_s^{m}}{\lVert z_s^{m} \rVert_2} \right\rangle^2,$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product;
the reconstruction loss $\mathcal{L}_{rec}$ is calculated between the reconstructed real person image $\hat{x}_r$, the reconstructed prosthesis image $\hat{x}_s$ and the corresponding original images, expressed as:
$$\mathcal{L}_{rec} = \lVert x_r - \hat{x}_r \rVert_1 + \lVert x_s - \hat{x}_s \rVert_1;$$
the maximum mean discrepancy loss $\mathcal{L}_{mmd}$ is calculated between the real person identity vector $z_r$ and the prosthesis identity vector $z_s^{id}$, expressed as:
$$\mathcal{L}_{mmd} = \bigl\lVert \mathbb{E}[\phi(z_r)] - \mathbb{E}[\phi(z_s^{id})] \bigr\rVert_{\mathcal{H}}^2,$$
where $\phi(\cdot)$ is the kernel feature mapping;
the pairing loss $\mathcal{L}_{pair}$ is calculated between the reconstructed real person image $\hat{x}_r$ and the reconstructed prosthesis image $\hat{x}_s$, expressed as:
$$\mathcal{L}_{pair} = \lVert \hat{x}_r - \hat{x}_s \rVert_1;$$
the hidden vectors are constrained by calculating the KL divergence against the standard normal prior, as shown in the following equation:
$$\mathcal{L}_{kl} = -\frac{1}{2}\sum_{i=1}^{n}\left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right),$$
where $(\mu_i, \sigma_i^2)$ are the mean and variance of the $i$-th hidden-vector dimension and $n$ is the dimension of the hidden vector;
the losses are weighted and summed to obtain the loss function $\mathcal{L}_{G}$ of the double-decoupling generator, expressed as:
$$\mathcal{L}_{G} = \mathcal{L}_{rec} + \mathcal{L}_{pair} + \lambda_1 \mathcal{L}_{kl} + \lambda_2 \mathcal{L}_{cls} + \lambda_3 \mathcal{L}_{ort} + \lambda_4 \mathcal{L}_{mmd},$$
where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ represent the corresponding weights;
an Adam optimizer is employed to optimize the real person encoder, the prosthesis encoder and the decoder with minimization of the double-decoupling generator loss function $\mathcal{L}_{G}$ as the target.
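A minimal PyTorch-style sketch of the double-decoupling generator loss above; the $\ell_1$ reconstruction, the simplified linear-kernel MMD, the form of the pairing loss and the grouping of the weights are assumptions, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def generator_loss(x_r, x_s, x_r_hat, x_s_hat,
                   z_r, z_s_id, z_s_mod, mode_logits, y_s,
                   mu, log_var, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the six generator losses sketched from claim 4.

    x_r / x_s:            original real / prosthesis images
    x_r_hat / x_s_hat:    reconstructed real / prosthesis images
    z_r, z_s_id, z_s_mod: real-identity, prosthesis-identity, prosthesis-mode vectors
    mode_logits:          fc(z_s_mod), the prosthesis class vector
    y_s:                  prosthesis class labels; mu / log_var: concatenated posteriors
    """
    l_cls = F.cross_entropy(mode_logits, y_s)                          # classification loss
    cos = F.cosine_similarity(z_s_id, z_s_mod, dim=1)
    l_ort = (cos ** 2).mean()                                          # angular orthogonality loss
    l_rec = F.l1_loss(x_r_hat, x_r) + F.l1_loss(x_s_hat, x_s)          # reconstruction loss
    l_mmd = ((z_r.mean(0) - z_s_id.mean(0)) ** 2).sum()                # linear-kernel MMD (simplified)
    l_pair = F.l1_loss(x_r_hat, x_s_hat)                               # pairing loss (form assumed)
    l_kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())  # KL to N(0, I)
    l1, l2, l3, l4 = lambdas
    return l_rec + l_pair + l1 * l_kl + l2 * l_cls + l3 * l_ort + l4 * l_mmd
```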
5. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the generated samples are obtained by inputting standard-normal-distribution sampling noise to the trained decoder, the specific steps including:
let the hidden-vector dimension be $n$; random noise of dimension $n$ is sampled twice from the standard normal distribution $\mathcal{N}(0, I)$ to respectively obtain the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$ and the reconstructed prosthesis mode hidden vector $\tilde{z}_s^{m}$; the reconstructed real person identity hidden vector $\tilde{z}_r$ is taken directly from the reconstructed prosthesis identity hidden vector $\tilde{z}_s^{id}$; the true-false image pairs generated through the decoder are taken as the generated samples.
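A minimal PyTorch-style sketch of the sample generation step above; the decoder interface and all names are assumptions.

```python
import torch

@torch.no_grad()
def generate_pairs(decoder, batch_size, latent_dim, device="cpu"):
    """Sample noise twice (prosthesis identity and prosthesis mode); the
    real-identity latent reuses the prosthesis-identity latent so that the
    generated pair shares one identity, as described in claim 5."""
    z_spoof_id = torch.randn(batch_size, latent_dim, device=device)
    z_spoof_mod = torch.randn(batch_size, latent_dim, device=device)
    z_real_id = z_spoof_id.clone()           # identity shared across the pair
    z = torch.cat([z_real_id, z_spoof_id, z_spoof_mod], dim=1)
    return decoder(z)                        # generated (real, prosthesis) image pair
```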
6. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the detector and the teacher network have the same network structure, comprising a convolutional layer, a batch normalization layer, a ReLU activation, three units of residual blocks built from stacked convolutional layers with skip connections, a global average pooling layer and a fully connected layer, the fully connected layer outputting the classification vector.
7. The in-vivo detection method based on double-decoupling generation and semi-supervised learning of claim 1, wherein the labeled samples, the unlabeled samples and the enhanced unlabeled samples are sent to the teacher learning module to obtain the teacher semi-supervised loss, pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss, the specific steps including:
the labeled samples $\{x_l, y_l\}$ are input to the teacher network $T$ with parameters $\theta_T$, which outputs the teacher labeled prediction result $T(x_l;\theta_T)$; cross entropy is calculated with the genuine label $y_l$ to obtain the teacher labeled loss $\mathcal{L}_T^{l}$, specifically expressed as:
$$\mathcal{L}_T^{l} = \mathrm{CE}\bigl(y_l,\, T(x_l;\theta_T)\bigr),$$
where $\mathrm{CE}$ denotes the cross-entropy loss;
the unlabeled sample $x_u$ and the enhanced unlabeled sample $\hat{x}_u$ are respectively input to the teacher network $T$ to obtain the teacher unlabeled prediction result $T(x_u;\theta_T)$ and the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$; the cross-entropy loss between the two is calculated to obtain the teacher unlabeled loss $\mathcal{L}_T^{u}$, specifically expressed as:
$$\mathcal{L}_T^{u} = \mathrm{CE}\bigl(T(x_u;\theta_T),\, T(\hat{x}_u;\theta_T)\bigr);$$
from the teacher unlabeled prediction result $T(x_u;\theta_T)$, the class to which the maximum value belongs is taken as the pseudo label $\hat{y}_u$, specifically expressed as:
$$\hat{y}_u = \arg\max T(x_u;\theta_T);$$
the teacher labeled loss $\mathcal{L}_T^{l}$ and the teacher unlabeled loss $\mathcal{L}_T^{u}$ are weighted and summed to obtain the teacher semi-supervised loss $\mathcal{L}_T^{semi}$, expressed as:
$$\mathcal{L}_T^{semi} = \mathcal{L}_T^{l} + \frac{s}{s_{tl}}\,\lambda\,\mathcal{L}_T^{u},$$
where $s$ is the current step number, $s_{tl}$ is the total number of steps, and $\lambda$ is the weight of the unlabeled loss;
the teacher enhanced unlabeled prediction result $T(\hat{x}_u;\theta_T)$ and the class $h$ to which its own maximum value belongs are passed through the cross-entropy function to obtain the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$, expressed as:
$$h = \arg\max T(\hat{x}_u;\theta_T), \qquad \mathcal{L}_T^{aug} = \mathrm{CE}\bigl(h,\, T(\hat{x}_u;\theta_T)\bigr).$$
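A minimal PyTorch-style sketch of the teacher losses above; the soft-target consistency form and the stop-gradients are assumptions, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def teacher_losses(teacher, x_l, y_l, x_u, x_u_aug, step, total_steps, lam=1.0):
    """Teacher labeled loss, unlabeled consistency loss, pseudo labels and
    enhanced unlabeled loss, sketched from claim 7."""
    loss_labeled = F.cross_entropy(teacher(x_l), y_l)

    logits_u = teacher(x_u)
    logits_u_aug = teacher(x_u_aug)
    # Consistency between the plain and the enhanced unlabeled views.
    loss_unlabeled = F.cross_entropy(logits_u_aug, logits_u.softmax(dim=1).detach())

    pseudo_labels = logits_u.argmax(dim=1).detach()   # pseudo labels for the detector
    loss_semi = loss_labeled + (step / total_steps) * lam * loss_unlabeled

    # Enhanced unlabeled loss: prediction vs the class of its own maximum.
    h = logits_u_aug.argmax(dim=1).detach()
    loss_aug = F.cross_entropy(logits_u_aug, h)
    return loss_semi, pseudo_labels, loss_aug
```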
8. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples are sent to the detector learning module to update the detector parameters and obtain the detector update loss, the specific steps including:
the enhanced unlabeled sample $\hat{x}_u$ is fed into the detector $D$ with parameters $\theta_D$ to obtain the detector enhanced unlabeled prediction result $D(\hat{x}_u;\theta_D)$; cross entropy is calculated with the pseudo label $\hat{y}_u$ to obtain the detector enhanced unlabeled loss $\mathcal{L}_D^{u}$, specifically expressed as:
$$\mathcal{L}_D^{u} = \mathrm{CE}\bigl(\hat{y}_u,\, D(\hat{x}_u;\theta_D)\bigr);$$
the detector is optimized on the detector enhanced unlabeled loss $\mathcal{L}_D^{u}$ by the gradient descent method to obtain the optimized parameters $\theta'_D$, specifically expressed as:
$$\theta'_D = \theta_D - \eta_D \nabla_{\theta_D} \mathcal{L}_D^{u},$$
where $\eta_D$ denotes the detector learning rate and $\nabla$ denotes the gradient calculation;
the labeled sample $(x_l, y_l)$ is respectively fed into the detector with the pre-optimization parameters $\theta_D$ and with the optimized parameters $\theta'_D$ to obtain the old-detector labeled loss $\mathcal{L}_D^{old}$ and the new-detector labeled loss $\mathcal{L}_D^{new}$; the difference between the two is then taken as the detector update loss $\mathcal{L}_D^{upd}$, specifically expressed as:
$$\mathcal{L}_D^{old} = \mathrm{CE}\bigl(y_l,\, D(x_l;\theta_D)\bigr), \qquad \mathcal{L}_D^{new} = \mathrm{CE}\bigl(y_l,\, D(x_l;\theta'_D)\bigr), \qquad \mathcal{L}_D^{upd} = \mathcal{L}_D^{old} - \mathcal{L}_D^{new}.$$
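A minimal PyTorch-style sketch of the detector update above; an SGD-style optimizer stands in for the plain gradient-descent step of the claim, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def detector_step(detector, optimizer, x_u_aug, pseudo_labels, x_l, y_l):
    """One detector update: train on pseudo-labeled enhanced data, then
    measure the labeled-loss improvement as the detector update loss."""
    with torch.no_grad():
        loss_old = F.cross_entropy(detector(x_l), y_l)   # before the update

    loss_u = F.cross_entropy(detector(x_u_aug), pseudo_labels)
    optimizer.zero_grad()
    loss_u.backward()
    optimizer.step()                                     # theta_D -> theta_D'

    with torch.no_grad():
        loss_new = F.cross_entropy(detector(x_l), y_l)   # after the update
    return loss_old - loss_new                           # detector update loss
```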
9. The in-vivo detection method based on double-decoupling generation and semi-supervised learning as claimed in claim 1, wherein the teacher network parameters are updated by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, the specific steps including:
the detector update loss $\mathcal{L}_D^{upd}$ is multiplied by the teacher enhanced unlabeled loss $\mathcal{L}_T^{aug}$ and then added to the teacher semi-supervised loss $\mathcal{L}_T^{semi}$ to form the teacher loss $\mathcal{L}_T$; the teacher network is optimized by the gradient descent method, expressed as:
$$\mathcal{L}_T = \mathcal{L}_D^{upd}\,\mathcal{L}_T^{aug} + \mathcal{L}_T^{semi}, \qquad \theta'_T = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_T,$$
where $\theta_T$ denotes the teacher network parameters before optimization, $\theta'_T$ denotes the optimized teacher network parameters, $\eta_T$ denotes the teacher network learning rate, and $\nabla$ denotes the gradient calculation.
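A minimal sketch of the teacher update above, reusing the quantities from the two sketches before it; this is a first-order simplification, and the exact gradient pathway of the claim may differ.

```python
def teacher_step(teacher_optimizer, loss_semi, loss_aug, detector_update_loss):
    """Teacher loss sketched from claim 9: (detector update loss) x (enhanced
    unlabeled loss) + (semi-supervised loss), then one gradient step."""
    # The detector update loss enters as a scalar coefficient on the teacher loss.
    teacher_loss = float(detector_update_loss) * loss_aug + loss_semi
    teacher_optimizer.zero_grad()
    teacher_loss.backward()
    teacher_optimizer.step()
    return teacher_loss.detach()
```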
10. A living body detection system based on double decoupling generation and semi-supervised learning, characterized by comprising: the system comprises a data preprocessing module, a double decoupling generator building module, a double decoupling generator training module, a generated sample building module, an unsupervised data enhancement module, a detector building module, a teacher network building module, a teacher learning module, a detector learning module, a network parameter updating module, a verification module and a testing module;
the data preprocessing module is used for extracting face region images from input images to obtain RGB color channel images, pairing each true image of the original samples of the RGB color channel images to be trained with a false image of the same identity to form a true-false image pair, and labeling an attack category label for the false image;
the double decoupling generator building module is used for building a real person encoder, a prosthesis encoder and a decoder to form a double decoupling generator;
the double decoupling generator training module is used for sending the true image into the real person encoder to obtain a real person identity vector, sending the matched false image into the prosthesis encoder to obtain a prosthesis identity vector and a prosthesis mode vector, merging the real person identity vector, the prosthesis identity vector and the prosthesis mode vector, sending the merged vector into the decoder to output a reconstructed true-false image pair, and constructing a double-decoupling generation loss function for optimization;
the generated sample construction module is used for inputting standard normal distribution sampling noise to a trained decoder to obtain a generated sample;
the unsupervised data enhancement module is used for cropping the training set formed by the original samples and the generated samples, and constructing labeled samples, unlabeled samples and enhanced unlabeled samples;
the detector building module and the teacher network building module are respectively used for building a detector and a teacher network;
the teacher learning module is used for receiving the labeled samples, the unlabeled samples and the enhanced unlabeled samples and computing the teacher semi-supervised loss, the pseudo labels of the unlabeled data and the teacher enhanced unlabeled loss;
the detector learning module is used for receiving the labeled samples, the enhanced unlabeled samples and the pseudo labels of the unlabeled samples and updating the detector parameters, so as to obtain the detector update loss;
the network parameter updating module is used for updating the teacher network parameters by using the teacher semi-supervised loss, the teacher enhanced unlabeled loss and the detector update loss, iteratively updating the parameters of the detector and the teacher network with an optimizer according to the loss functions, and storing the parameters of the teacher network and the detector after training is completed;
the verification module is used for sending the verification-set face RGB color channel maps into the trained detector to obtain classification scores, obtaining predicted label values under different decision thresholds, comparing the predicted labels with the real labels, calculating the false alarm rate and the missed detection rate, and taking the threshold at which the two rates are equal as the test decision threshold;
the test module is used for sending the test-set face RGB color channel maps to the trained detector to obtain classification scores, obtaining the final predicted label values according to the test decision threshold, and calculating the evaluation metrics.
CN202210329816.3A 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning Active CN114663986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329816.3A CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Publications (2)

Publication Number Publication Date
CN114663986A true CN114663986A (en) 2022-06-24
CN114663986B CN114663986B (en) 2023-06-20

Family

ID=82033819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210329816.3A Active CN114663986B (en) 2022-03-31 2022-03-31 Living body detection method and system based on double decoupling generation and semi-supervised learning

Country Status (1)

Country Link
CN (1) CN114663986B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753595A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Living body detection method and apparatus, device, and storage medium
US20200364478A1 (en) * 2019-03-29 2020-11-19 Beijing Sensetime Technology Development Co., Ltd. Method and apparatus for liveness detection, device, and storage medium
WO2021134871A1 (en) * 2019-12-30 2021-07-08 深圳市爱协生科技有限公司 Forensics method for synthesized face image based on local binary pattern and deep learning
CN111460931A (en) * 2020-03-17 2020-07-28 华南理工大学 Face spoofing detection method and system based on color channel difference image characteristics
CN114067444A (en) * 2021-10-12 2022-02-18 中新国际联合研究院 Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIR MOHAMMADI et al.: "Improving cross-dataset performance of face presentation attack detection systems using face recognition datasets" *
TANG Yan: "Face liveness detection based on deep learning" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311605A (en) * 2022-09-29 2022-11-08 山东大学 Semi-supervised video classification method and system based on neighbor consistency and contrast learning
CN116152885A (en) * 2022-12-02 2023-05-23 南昌大学 Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling
CN116152885B (en) * 2022-12-02 2023-08-01 南昌大学 Cross-modal heterogeneous face recognition and prototype restoration method based on feature decoupling

Also Published As

Publication number Publication date
CN114663986B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN108537743B (en) Face image enhancement method based on generation countermeasure network
Cheng et al. Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification
CN109993072B (en) Low-resolution pedestrian re-identification system and method based on super-resolution image generation
Asnani et al. Reverse engineering of generative models: Inferring model hyperparameters from generated images
CN114663986A (en) In-vivo detection method and system based on double-decoupling generation and semi-supervised learning
CN110516616A (en) A kind of double authentication face method for anti-counterfeit based on extensive RGB and near-infrared data set
CN112734696B (en) Face changing video tampering detection method and system based on multi-domain feature fusion
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN112418041B (en) Multi-pose face recognition method based on face orthogonalization
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN112668519A (en) Abnormal face recognition living body detection method and system based on MCCAE network and Deep SVDD network
CN113537027B (en) Face depth counterfeiting detection method and system based on face division
CN114387641A (en) False video detection method and system based on multi-scale convolutional network and ViT
CN114677722A (en) Multi-supervision human face in-vivo detection method integrating multi-scale features
CN114693607A (en) Method and system for detecting tampered video based on multi-domain block feature marker point registration
Xie et al. Writer-independent online signature verification based on 2D representation of time series data using triplet supervised network
CN114241564A (en) Facial expression recognition method based on inter-class difference strengthening network
CN113887573A (en) Human face forgery detection method based on visual converter
CN116824695A (en) Pedestrian re-identification non-local defense method based on feature denoising
CN115331135A (en) Method for detecting Deepfake video based on multi-domain characteristic region standard score difference
CN114429646A (en) Gait recognition method based on deep self-attention transformation network
CN117496601B (en) Face living body detection system and method based on fine classification and antibody domain generalization
Gao et al. AONet: attentional occlusion-aware network for occluded person re-identification
Guo et al. Discriminative Prototype Learning for Few-Shot Object Detection in Remote Sensing Images
CN117612201B (en) Single-sample pedestrian re-identification method based on feature compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant