CN116403290A - Living body detection method based on self-supervision domain clustering and domain generalization - Google Patents


Info

Publication number
CN116403290A
Authority
CN
China
Prior art keywords
domain
clustering
feature
sample
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310287952.5A
Other languages
Chinese (zh)
Inventor
杨若瑜
姚凯哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202310287952.5A
Publication of CN116403290A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a living body detection method based on self-supervised domain clustering and domain generalization, comprising the steps of generating domain-information-migration samples, constructing and training a self-supervised domain clustering network, and training a domain-generalization living body detection model on samples carrying pseudo domain labels. First, positive and negative samples are generated by a data-enhancement method based on domain information migration; a self-supervised domain clustering module then assigns pseudo domain labels to the samples, yielding a source-domain data set for domain-generalization living body detection. The domain-generalization living body detection module is trained adversarially, a memory triplet loss assists model optimization, and an unbalanced memory-cache update strategy preserves features from earlier training. The invention can autonomously assign domain labels to face samples and perform domain-generalization living body detection training directly on face data without domain labels, giving more flexible generalization capability, saving labeling cost, and offering substantial application value.

Description

Living body detection method based on self-supervision domain clustering and domain generalization
Technical Field
The invention relates to a living body detection method, in particular to a living body detection method based on self-supervision domain clustering and domain generalization.
Background
Face recognition technology is widely applied in scenarios such as identity authentication, security control, social networks, and photo management. However, due to technical limitations, face recognition may fail in the face of various forgery attacks. The most common attack mode is the presentation attack, i.e., attacking the face recognition system with re-shot face photos, face video replay, fake face masks, and the like. Therefore, to improve the safety and accuracy of face recognition technology, living body (liveness) detection is receiving increasing attention from researchers.
Living body detection is a technique for distinguishing a real face from artificially synthesized face information. It usually analyzes biological characteristics in a face image or video to detect traces of forgery, distinguishing whether the input is a still picture, a video, face information synthesized via a 3D mask, or the like. Commonly used biological features include the eyes, mouth and nose, facial expression, surface texture, and facial albedo.
Living body detection based on domain generalization has been a popular research direction in recent years. Earlier deep-learning-based living body detection methods perform well on a single training set, but may encounter new, unknown forgery attacks at test time, causing the detection model to fail. Domain-generalization-based living body detection was proposed so that the model transfers better to new, unknown forgery attacks, improving the robustness and accuracy of the detection model. Its core idea is to map the face information of real and forged faces, which belong to different distributions, into different feature spaces through a training algorithm; since the feature spaces of data from different distributions overlap to a certain degree, the model learns a general face-attack representation across the information of multiple domains. Thus, when the algorithm faces a new attack sample, whether the face is real can be judged from the sample's position in the feature space.
Living body detection based on domain generalization can adapt to a variety of environments with good generalization performance and high accuracy. Such methods typically use the data set a sample belongs to as an explicit domain label to guide model learning during training. However, attack samples in real scenarios often come from many different capture devices and scenes; the samples mix multiple domains, and the specific domain label of each sample is unknown, so such data cannot be directly applied to existing domain-generalization models.
Disclosure of Invention
The invention aims to: the invention aims to solve the technical problem of providing a living body detection method based on self-supervision domain clustering and domain generalization aiming at the defects of the prior art.
To solve the above technical problems, the invention discloses a living body detection method based on self-supervised domain clustering and domain generalization, which generates pseudo domain labels for face samples by self-supervised domain clustering, then trains on the sample data with pseudo domain labels by a domain generalization method to obtain the living body detection classification result. The specific method comprises the following steps:
step 1, generating a self-supervision domain clustering sample, namely randomly selecting data belonging to different domains from images of a mixed source domain data set as a sample, and performing domain information migration between two images based on a low-frequency signal on an image frequency domain to generate a positive sample and a negative sample, wherein the specific method comprises the following steps of:
Step 1-1, randomly sampling two different face images from the mixed source domain data set as a source image and a migration target image, obtaining the frequency domain signal of each image by Fourier transform, and decomposing the amplitude component A(x) and the phase component P(x) from the frequency domain signal, specifically:

A(x) = |F(x)|

P(x) = arg(F(x))

where x is an input source image or migration target image, F(·) is the Fourier transform function, and arg(·) takes the phase value of a complex number;
step 1-2, filtering amplitude components of frequency domain signals of a source image by using a Gaussian low-pass filter to obtain low-frequency signals of the source image; filtering amplitude components of the frequency domain signals of the migration target image by using a Gaussian high-pass filter to obtain high-frequency signals of the migration target image;
step 1-3, fusing the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transforming the mixed frequency domain signal back to a two-dimensional image by inverse Fourier transform to obtain the image x_DT after domain information migration, specifically:

A_DT = λ1 · H_L ∘ A(x_s) + λ2 · H_H ∘ A(x_t)

x_DT = F^(−1)( A_DT · e^(j·P(x_s)) )

where A_DT is the amplitude component after domain information migration, x_s and x_t are the source image and the migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^(−1)(·) is the inverse Fourier transform function, λ1 and λ2 are hyperparameters, and j is the imaginary unit;
step 1-4, performing auxiliary enhancement on a source image and a domain information migration image by using a contrast learning data enhancement method to obtain positive and negative samples; wherein, the image of the positive sample through domain information migration contains domain information similar to the source image; the negative samples are other target images from different domain distributions.
Step 2, constructing and training a self-supervised domain clustering network: train a contrast-learning-based domain clustering method, construct the learning task, extract domain-related features from the positive and negative samples containing domain information obtained in step 1, and perform domain clustering according to the distribution of the samples in the feature space. The self-supervised domain clustering network consists of a dual-branch backbone network, a feature mapping head, a feature dictionary, and a feature clusterer;
the method comprises the steps that a ResNet-18 network model is adopted in the dual-branch trunk network, two branches are respectively a query coding network and a momentum coding network, and positive samples obtained in the step 1 are respectively sent into the two branches to extract domain related features; mapping the extracted domain related features by using a feature mapping head to obtain comparison features q and k + The method comprises the steps of carrying out a first treatment on the surface of the The two branches are initialized by adopting the same weight parameters, and the weight parameters are updated by using different methods, wherein the weight parameters of the query coding network are updated in the training process by using gradient back propagation; the momentum coding network updates the weight parameters by using a momentum updating mode, wherein the momentum updating mode of the weight parameters is as follows:
w_k = m · w_k + (1 − m) · w_q

where w_k are the weights of the momentum encoding network, w_q are the weights of the query encoding network, and m is a momentum coefficient with a value between 0 and 1;
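A minimal NumPy sketch of this momentum (exponential moving average) update, with plain arrays standing in for the two networks' weights; all values and names are illustrative, not the patent's implementation:

```python
import numpy as np

def momentum_update(w_k, w_q, m=0.999):
    """EMA update w_k <- m * w_k + (1 - m) * w_q, so the momentum encoder
    drifts slowly toward the gradient-trained query encoder."""
    return m * w_k + (1 - m) * w_q

# toy usage: the query "network" takes one simulated gradient step,
# and the momentum "network" tracks it slowly
w_q = np.array([1.0, 2.0]) - 0.1
w_k = momentum_update(np.array([1.0, 2.0]), w_q, m=0.9)
```

A large m (close to 1) keeps the momentum branch nearly frozen between iterations, which is what makes the features in the dictionary comparable across batches.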
the feature mapping head adopts a structure of a multi-layer perceptron and performs feature mapping on features of two similar samples extracted by the double-branch encoder;
the feature dictionary adopts a queue structure and is used for storing feature representations generated by previous batches through momentum network branches as negative sample features for comparison with the sample features of the current batches;
the feature clusterer clusters the domain-related features extracted by the momentum encoder using the K-means clustering algorithm to obtain the cluster center of each cluster;
and optimizing the self-supervision domain clustering network by using a sample comparison learning loss function and a clustering center comparison learning loss function.
The self-supervision domain clustering network is trained, and comprises the following steps:
step 2-1, calculating a cluster and a cluster center to which a sample belongs, wherein the method specifically comprises the following steps:
before each round of training starts, all training samples X = {x_1, x_2, …, x_N} are input into the momentum encoding branch to obtain the domain-related feature representation Z = {z_1, z_2, …, z_N} of each sample, where N is the number of samples; then Z is clustered with the K-means method to obtain the cluster each sample belongs to and the K cluster centers {c_1, c_2, …, c_K}.
Step 2-2, calculating the similarity between contrast features, specifically: calculate the similarity sim(q, k+) between the two positive-sample features; take the features k_i stored in the feature dictionary (feature representations generated by previous batches through the momentum branch) as negative-sample features, and calculate the similarity sim(q, k_i); similarity is computed as the inner product of the two features.
Step 2-3, constructing the comprehensive loss function of the self-supervised domain clustering network and training the network by computing it, specifically:

construct the sample contrast loss L_s and the cluster contrast loss L_c as follows:

L_s = −log( exp(sim(q, k+)/τ) / ( exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ) ) )

L_c = −log( exp(sim(q, c_q)/τ′) / Σ_{j=1}^{K} exp(sim(q, c_j)/τ′) )

where r is the number of negative samples, c_q is the cluster center of the query sample, and τ and τ′ are temperature coefficients used to control the shape of the feature distribution;

the comprehensive loss function of the self-supervised domain clustering network is constructed as:

L_CL = L_s + λ · L_c

where λ is a balance coefficient;
the self-supervision clustering network is trained by calculating the comprehensive loss function, namely, the result of self-supervision domain clustering is used as a pseudo domain label to be distributed to the belonging samples, specifically: dividing the clustering result into K parts according to the clustering clusters, wherein each part represents a domain; assigning a domain number to each domain from 1 as a pseudo domain label; a corresponding pseudo-domain label is assigned to each sample of the cluster.
Step 3, training the samples with pseudo domain labels using the domain-generalization living body detection model: the samples assigned pseudo domain labels according to the clustering result are used as the source-domain data set, and the living-body domain-invariant features of the samples are extracted by a domain generalization method based on adversarial training; the final output of the trained domain-generalization living body detection model is taken as the living body detection classification result, completing living body detection based on self-supervised domain clustering and domain generalization.
The network structure of the domain generalization living body detection model comprises: the device comprises a feature generator, a feature classifier, a domain discriminator and a feature memory cache;
Obtaining a characteristic representation of the sample by using a characteristic generator, sending the generated characteristic into a characteristic classifier, and calculating a classification cross entropy loss; inputting the feature representation into a domain discriminator to obtain a domain discrimination result, training a feature generator and the domain discriminator in a countermeasure training mode, calculating domain countermeasure loss, and reducing domain deviation contained in the feature; splicing the generated features with the features in the memory buffer, calculating the memory triplet loss of the features, and updating the feature memory buffer according to the calculation result of the loss function; finally, the domain generalization network model is optimized by using the classification cross entropy loss, the antagonism loss and the memory triplet loss.
The specific calculation of the adversarial loss and the memory triplet loss is as follows:

the network formed by the feature generator and the domain discriminator is optimized adversarially: when the adversarial loss back-propagates, the gradient flowing from the domain discriminator to the feature generator is reversed, so that the training objectives of the two are opposite. The overall adversarial objective is:

min_G max_D L_ada(G, D) = −E_{(x,y)∼(X,Y_D)} Σ_{i=1}^{K} 1[i = y] · log D(G(x))_i

where X and Y_D are the sample set and the pseudo-domain label set respectively, G is the feature generator, D is the domain discriminator, K is the number of source domains, 1[i = y] is the indicator function taking value 1 when i = y and 0 otherwise, and y is the pseudo domain label of sample x; the adversarial loss is computed with a binary cross-entropy function;
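The discriminator side of this objective is an ordinary cross-entropy over the K pseudo-domains. A NumPy sketch (the gradient reversal between G and D is a training-time mechanism and is omitted here; all names are illustrative):

```python
import numpy as np

def domain_adversarial_loss(logits, domain_labels):
    """Mean cross-entropy of the domain discriminator's softmax output
    against the pseudo-domain labels; D minimizes this, while the gradient
    reaching the feature generator G is reversed so G effectively maximizes it."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(logits)
    return float(-np.log(probs[np.arange(n), domain_labels]).mean())
```

In a real model the reversal would be implemented as a gradient-reversal layer inserted between the feature generator and the domain discriminator, so a single backward pass trains both with opposite objectives.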
a feature memory cache stores the feature representations of previous training iterations of the domain-generalization living body detection model; the features in the memory cache are concatenated with the features generated in the current iteration, and the memory triplet loss L_MemTriplet is computed as:

L_MemTriplet = Σ_{f_a ∈ F_n} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p, and f_n (with f_p and f_n drawn from F_n ∪ M) are the anchor, positive-sample, and negative-sample features respectively, and α is the margin coefficient;
an unbalanced feature-memory-cache update strategy is used: at each training iteration, the first h hardest real-face sample features (those with the largest memory triplet loss) and all forged-face sample features are added to the feature memory cache.
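A NumPy sketch of the memory triplet loss with hardest-positive/hardest-negative mining over the batch plus the cache, followed by the unbalanced cache update. The label convention (1 = real face, 0 = forged) and the mining rule are assumptions for illustration, not the patent's exact implementation:

```python
import numpy as np

def memory_triplet_losses(feats, labels, mem_feats, mem_labels, alpha=0.3):
    """Per-anchor triplet losses; positives/negatives are mined from the
    current batch concatenated with the feature memory cache."""
    all_f = np.concatenate([feats, mem_feats])
    all_y = np.concatenate([labels, mem_labels])
    out = np.zeros(len(feats))
    for i, (f_a, y_a) in enumerate(zip(feats, labels)):
        d = np.linalg.norm(all_f - f_a, axis=1)
        pos = d[(all_y == y_a) & (d > 1e-12)]   # exclude the anchor itself
        neg = d[all_y != y_a]
        if len(pos) and len(neg):
            out[i] = max(pos.max() - neg.min() + alpha, 0.0)
    return out

def update_cache(feats, labels, per_anchor_loss, h=2):
    """Unbalanced update: keep only the h hardest real-face features but
    all forged-face features (assumed convention: 1 = real, 0 = forged)."""
    real = np.where(labels == 1)[0]
    hard_real = real[np.argsort(-per_anchor_loss[real])][:h]
    keep = np.concatenate([hard_real, np.where(labels == 0)[0]])
    return feats[keep], labels[keep]
```

Keeping all forged-face features but only the hardest real-face ones biases the cache toward the diverse spoof classes, which is the rebalancing of inter-class versus intra-class attention the strategy aims at.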
The model is optimized with the classification cross-entropy loss, the adversarial loss, and the memory triplet loss; the comprehensive loss function of the domain-generalization living body detection model is constructed as:

L_dg = L_cls + λ1 · L_ada + λ2 · L_MemTriplet

where L_cls is the binary cross-entropy loss for face liveness classification, and λ1 and λ2 are hyperparameters.
The beneficial effects are that:
(1) The invention provides a data enhancement method for domain information migration, which realizes migration of image domain information by using image frequency domain transformation and frequency domain filtering and mixing methods, and can keep original structural information of a migrated image, thereby helping a model learn better domain related characteristics.
(2) Pseudo domain labels for face images are generated using methods such as contrast learning and clustering, which solves the problem of missing face domain labels in mixed data sets, so that domain-generalization living body detection training can be performed without relying on domain labels annotated in advance.
(3) The commonality of different face features is learned by adopting the domain countermeasure training and memory triplet loss function, the influence of domain deviation is reduced, the features participating in domain generalization training are enriched by utilizing the feature memory cache, and the generalization capability of the model is further improved. Meanwhile, the method also provides an unbalanced memory cache updating strategy, and the attention degree of the model to the inter-class difference and the intra-class difference between the true face and the false face is redistributed, so that the convergence speed of the model is accelerated, and the stability of the model is improved.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention.
Fig. 2 is an overall flow chart of an embodiment of the present invention.
Fig. 3 is a diagram of domain information migration data enhancement effect according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a self-supervision domain clustering network according to an embodiment of the present invention.
Detailed Description
The invention provides a living body detection method based on self-supervision domain clustering and domain generalization, which comprises the following steps:
(1) Self-supervision domain cluster sample generation
And randomly selecting data samples belonging to different domains from the mixed source domain image, and generating positive and negative samples required by a self-supervision domain clustering algorithm by using a data augmentation method of domain information migration.
The method performs domain information migration between two images based on their low-frequency signals in the frequency domain; the specific steps are as follows:

1.1 Randomly sample two different face images from the mixed source domain dataset as the source image and the migration target image. Obtain the frequency domain signal of each image by Fourier transform, and decompose the amplitude component A(x) and the phase component P(x) from the frequency domain signal. The transformation is specifically expressed as:

A(x) = |F(x)|

P(x) = arg(F(x))

where x is an input source image or migration target image, F(·) is the Fourier transform function, and arg(·) takes the phase value of a complex number.
1.2 Filter the amplitude component of the source image's frequency domain signal with a Gaussian low-pass filter to obtain the low-frequency signal of the source image; filter the amplitude component of the migration target image's frequency domain signal with a Gaussian high-pass filter to obtain the high-frequency signal of the migration target image.
1.3 Fuse the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transform the mixed frequency domain signal back to a two-dimensional image by inverse Fourier transform to obtain the domain-information-migrated image x_DT. The specific migration process can be expressed as:

A_DT = λ1 · H_L ∘ A(x_s) + λ2 · H_H ∘ A(x_t)

x_DT = F^(−1)( A_DT · e^(j·P(x_s)) )

where x_s and x_t are the source image and the migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^(−1)(·) is the inverse Fourier transform function, λ1 and λ2 are hyperparameters, and j is the imaginary unit.
1.4, auxiliary enhancement is carried out on the source image and the domain information migration image by using a common contrast learning data enhancement method, and a final result is used as an input sample of the self-supervision clustering model.
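The frequency-domain migration of steps 1.1–1.3 can be sketched in NumPy for single-channel images; the function names, the Gaussian σ, and the default λ values are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def gaussian_filter(shape, sigma, lowpass=True):
    """Centered Gaussian mask over the fftshift-ed frequency plane; the
    high-pass mask is taken as the complement of the low-pass one."""
    h, w = shape
    yy, xx = np.mgrid[-(h // 2):(h + 1) // 2, -(w // 2):(w + 1) // 2]
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g if lowpass else 1.0 - g

def domain_transfer(x_s, x_t, lam1=1.0, lam2=1.0, sigma=10.0):
    """Blend the low-frequency amplitude of the source with the high-frequency
    amplitude of the target, keep the source phase, and invert the FFT."""
    F_s = np.fft.fftshift(np.fft.fft2(x_s))
    F_t = np.fft.fftshift(np.fft.fft2(x_t))
    A_s, P_s, A_t = np.abs(F_s), np.angle(F_s), np.abs(F_t)
    H_L = gaussian_filter(x_s.shape, sigma, lowpass=True)
    H_H = gaussian_filter(x_s.shape, sigma, lowpass=False)
    A_dt = lam1 * H_L * A_s + lam2 * H_H * A_t        # mixed amplitude A_DT
    x_dt = np.fft.ifft2(np.fft.ifftshift(A_dt * np.exp(1j * P_s)))
    return np.real(x_dt)
```

Because the phase (which carries the structural content) comes entirely from the source image, the output keeps the source's structure while its amplitude spectrum, and hence its low-level "domain" statistics, is mixed with the target's. With x_s = x_t and λ1 = λ2 = 1 the two masks sum to one and the routine reproduces its input, a quick round-trip sanity check.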
(2) Constructing self-supervision domain clustering network and training
Training by using a domain clustering method based on contrast learning, wherein the network structure of the domain clustering method consists of a double-branch backbone network, a feature mapping head, a feature dictionary and a feature clustering device;
the dual-branch backbone network adopts a ResNet-18 model, two branches are a query (query) coding network and a momentum (momentum) coding network respectively, and positive samples generated by domain information migration in the step 1 are respectively sent into the two branches to extract domain related characteristics. The dual-branch backbone network is initialized by adopting the same weight parameters, and weight parameter updating is carried out by using different methods; the weight parameters of the query coding network are updated in the training process by using gradient back propagation; the momentum coding network updates the weight parameters in a momentum updating mode, so that consistency of the front and rear extracted features is ensured.
The feature mapping head adopts a structure of a multi-layer perceptron (MLP) to perform feature mapping on features of two similar samples extracted by the double-branch encoder. The use of feature mapping helps the network identify domain-specific features of each input image and improves the generalization ability of the network over different faces.
The feature dictionary adopts a queue structure and is used for storing feature representations generated by previous batches through momentum network branches as negative sample features for comparison with current batch sample features.
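The feature dictionary's first-in-first-out behavior can be sketched with a bounded deque; the cache size is an illustrative assumption:

```python
import numpy as np
from collections import deque

class FeatureDictionary:
    """Bounded FIFO of past momentum-branch features, served as negative
    samples for the current batch."""
    def __init__(self, max_size=4096):
        self.queue = deque(maxlen=max_size)   # oldest entries drop out first

    def enqueue(self, feats):
        for f in feats:
            self.queue.append(np.asarray(f))

    def negatives(self):
        return np.stack(list(self.queue))
```

Because the entries were produced by the slowly-updated momentum branch, features enqueued several batches ago remain comparable with the current query features, which is what lets the queue serve as a large negative-sample pool.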
And the feature clustering device clusters the domain related features extracted by the momentum encoder by adopting a K-means clustering algorithm (K-means) to obtain a clustering center of each clustering cluster.
And optimizing the network by using the sample comparison learning loss function and the clustering center comparison learning loss function.
The training step of the self-supervision domain clustering network in the step 2 comprises the following steps:
2.1 Before each round of training starts, input all training samples X = {x_1, x_2, …, x_N} into the momentum encoding branch to obtain the domain-related feature representation Z = {z_1, z_2, …, z_N} of each sample; then cluster Z with the K-means method to obtain the cluster each sample belongs to and K cluster centers {c_1, c_2, …, c_K}.
2.2 For each training iteration, input the two domain-information-migrated samples generated in step 1 into the query encoding network f_q(·) and the momentum encoding network f_k(·) respectively, obtaining the feature representations h and h+ of the two same-domain samples:

h = f_q(x̃_1),  h+ = f_k(x̃_2)
2.3 Map the extracted domain-related features with the feature mapping head to obtain the final two contrast features q and k+:

q = g_q(h),  k+ = g_k(h+)

where g(·) is a multilayer perceptron (MLP) containing one hidden layer.
2.4 Calculate the similarity sim(q, k+) between the two positive-sample features, and the similarity sim(q, k_i) between the query feature q and the negative-sample features k_i stored in the feature dictionary; similarity is computed as the inner product of two features:

sim(v_i, v_j) = v_i · v_j
2.5 Calculate the similarity sim(q, c_q) between the query feature q and its own cluster center c_q, and the similarity sim(q, c_j) between q and the other cluster centers c_j.
2.6 Construct the final comprehensive loss function:

L_CL = L_s + λ · L_c

L_s is the sample contrast loss, used to pull positive samples together while pushing negative samples apart; it is the main optimization target of the self-supervised domain clustering network. L_c is the cluster contrast loss, used to pull together samples belonging to the same domain and push apart samples of different domains; it is an auxiliary optimization target of the network. λ is a balance coefficient. Both losses are constructed with the InfoNCE loss commonly used in contrastive learning:

L_s = −log( exp(sim(q, k+)/τ) / ( exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ) ) )

L_c = −log( exp(sim(q, c_q)/τ′) / Σ_{j=1}^{K} exp(sim(q, c_j)/τ′) )

where r and K are the number of negative samples and the number of cluster centers respectively, and τ and τ′ are temperature coefficients used to control the shape of the feature distribution.
2.7 Train the self-supervised domain clustering network by minimizing the composite loss function, and assign the final clustering result to each sample as its pseudo domain label. Specifically: divide the clustering result into K parts by cluster, each part representing one domain; number each domain starting from 1 as its pseudo domain label; and assign the corresponding pseudo domain label to every sample in the cluster.
(3) Training samples with pseudo-domain labels using domain-generalized living body detection model
The network structure of the domain generalization living body detection model comprises a feature generator, a feature classifier, a domain discriminator and a feature memory cache.
The feature generator produces a feature representation of each sample; the generated feature is fed into the feature classifier and the classification cross-entropy loss is computed. The feature representation is also input into the domain discriminator to obtain a domain discrimination result; the feature generator and the domain discriminator are trained adversarially, and the domain adversarial loss is computed to reduce the domain bias contained in the features. The generated features are concatenated with the features in the memory cache, the memory triplet loss is computed, and the feature memory cache is updated according to the result of this loss function. Finally, the domain generalization network model is optimized with the classification cross-entropy loss, the adversarial loss, and the memory triplet loss.
The feature generator uses the ResNet-18 model as the backbone network and is initialized with weight parameters pre-trained on ImageNet. The RGB face images sampled from the source domain are input into the feature generator, and the output features are normalized with L2 regularization to obtain the feature representation of the sample:

f_i^d = Norm(G(x_i^d))

where G is the feature generator, x_i^d is an input sample, d is the domain to which the sample belongs, and Norm is the L2 normalization function.
The domain discriminator is structured as a multilayer perceptron with one hidden layer. The domain discriminator D distinguishes the domain-related information of the generated features and predicts the domain label of the input sample. Using the pseudo domain labels generated in step 2 as supervision information, the adversarial loss L_ada is computed.
The network formed by the feature generator and the domain discriminator is optimized by adversarial training, specifically: when the adversarial loss is back-propagated, the gradient flowing from the domain discriminator into the feature generator is reversed, so that the two have opposite training targets; the domain discriminator is optimized while the domain-related information carried in the generated features is suppressed, reducing the domain bias of the features. The overall optimization objective of the adversarial training is:

min_D max_G L_ada(G, D) = −E_{(x, y_D) ∈ (X, Y_D)} Σ_{d=1}^{K} 1[y_D = d] · log D(G(x))

where X and Y_D are the sample set and the pseudo domain label set respectively; the adversarial loss is computed with a binary cross-entropy function.
The feature representations from the previous few iterations of the network are stored in a feature memory cache to enrich the diversity of the features participating in training. The features in the memory cache are concatenated with the features generated in the current iteration, and the memory triplet loss is computed:

L_MemTriplet = Σ_{f_a, f_p, f_n ∈ F_n ∪ M} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p, f_n are the anchor, positive, and negative sample features respectively, and α is a margin coefficient that controls how far apart the within-class and between-class distances must be.
To make the domain generalization network focus on the inter-class differences between real faces and forged faces rather than intra-class differences, an unbalanced feature-memory-cache update strategy is used: at every training iteration, the h hardest real-face sample features and all forged-face sample features, ranked by the memory triplet loss, are added to the feature memory cache.
The composite loss function of the domain-generalized living body detection model is:

L_dg = L_cls + λ_1·L_ada + λ_2·L_MemTriplet

where λ_1 and λ_2 are hyperparameters.
Examples:
as shown in fig. 1, a preferred embodiment of the present invention provides a living body detection model based on self-supervised domain clustering and domain generalization, which can generate pseudo domain labels on a mixed living body detection dataset lacking domain labels and perform domain-generalized living body detection training. As shown in fig. 2, the method specifically includes the following steps:
(1) The data augmentation method of domain information migration is used to generate positive and negative samples required by the self-supervision domain clustering algorithm.
The method for performing domain information migration between two images based on low-frequency signals on an image frequency domain comprises the following specific steps:
1.1 Randomly sample two different face images from the mixed source-domain dataset as the source image and the migration target image. In the embodiment of the invention, the mixed source-domain dataset is composed of sample sets from several different domains, each sample being a real or forged face RGB image. Apply the Fourier transform to each image to obtain its frequency-domain signal, and decompose the amplitude component and phase component from it:

F^A(x) = |F(x)|, F^P(x) = arg(F(x))

where x is an input source image or migration target image, F is the Fourier transform function, and arg(·) takes the phase of a complex number. In this example, to speed up computation, the frequency-domain conversion is performed with the Fast Fourier Transform (FFT).
1.2 Filter the amplitude component of the source image's frequency-domain signal with a Gaussian low-pass filter to obtain the low-frequency signal of the source image; filter the amplitude component of the migration target image's frequency-domain signal with a Gaussian high-pass filter to obtain its high-frequency signal. In this example, the cut-off frequency parameter D_0 of both the Gaussian low-pass filter and the Gaussian high-pass filter takes a value in [1, 5], which effectively extracts the domain information of the source image and the structure information of the target image.
1.3 Fuse the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transform the mixed frequency-domain signal back into a two-dimensional image with the inverse Fourier transform, obtaining the domain-information-migrated image x_DT. The migration process can be expressed as:

F^A_DT = λ_1·H_L(F^A(x_s)) + λ_2·H_H(F^A(x_t))

x_DT = F^{-1}( F^A_DT · e^{j·F^P(x_t)} )

where x_s and x_t are the source image and migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^{-1} is the inverse Fourier transform function, λ_1 and λ_2 are hyperparameters, and j is the imaginary unit.
FIG. 3 shows the effect of the domain-information-migration data enhancement method: the left column shows source face images; on the right, the first row shows the migration target face images and the second row shows the face images after domain information migration. The face images in the figure are selected from four public datasets: MSU-MFSD, CASIA-SURF, OULU-NPU, and Idiap Replay-Attack.
1.4 auxiliary data augmentation is performed on the source image and the domain information migration image, in the example of the invention, random image clipping, random horizontal flipping and random blur change are used for image data augmentation to improve the robustness of subsequent model training. And taking the final result as an input sample of the self-supervision domain clustering model.
(2) Constructing self-supervision domain clustering network and training
Training was performed using a domain clustering method based on contrast learning. Fig. 4 shows a specific structure of a Self-supervised domain clustered network (Self-Supervised Domain Clustering Network, SDCN). The SDCN is composed of a query coding network, a momentum coding network, a feature mapping head, a feature dictionary and a clustering module, and the network is optimized by using a sample comparison loss function and a clustering comparison loss function, and the training steps comprise:
2.1 Before each training round starts, all training samples X = {x_1, x_2, …, x_N} are fed into the momentum encoding branch to obtain a domain-related feature representation Z = {z_1, z_2, …, z_N} for each sample; Z is then clustered with the K-means method to obtain the cluster to which each sample belongs and the K cluster centers {c_1, c_2, …, c_K}. In the embodiment of the invention, to reduce the influence of noisy data on training, outlier points in the clustering result are discarded.
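A minimal stand-in for the K-means clustering of the momentum-branch features can be sketched as below. The deterministic farthest-point initialisation is an assumption, and the embodiment's outlier discarding is omitted for brevity.

```python
import numpy as np

def kmeans(Z, k, iters=50):
    """Minimal K-means over features Z (shape N x D): returns the cluster
    index of each sample and the k cluster centers."""
    # deterministic farthest-point initialisation (an assumption)
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    assign = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        # assign each sample to its nearest center, then recompute means
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = Z[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return assign, centers
```

The cluster indices returned here play the role of the pseudo domain labels assigned in step 2.7.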
2.2 For each training iteration, the domain-information-migrated sample pair x_q and x_k generated in step 1 is fed into the query encoding network f_q(·) and the momentum encoding network f_k(·) respectively. Both encoding networks use the ResNet-18 model; the two branches are initialized with the same weight parameters but updated by different methods: the weight parameters of the query encoding network are updated by gradient back-propagation during training, while the momentum encoding network updates its weight parameters by momentum update, which keeps the features it extracts consistent over time. In the embodiment of the invention, the momentum update of the weight parameters is w_k = m·w_k + (1 − m)·w_q, with m = 0.999. The dual-branch encoding network yields the feature representations h, h+ of the two same-domain samples:

h = f_q(x_q), h+ = f_k(x_k)
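The momentum (exponential moving average) update above can be sketched per parameter tensor; the list-of-tensors representation is an illustrative simplification.

```python
def momentum_update(w_k, w_q, m=0.999):
    """EMA update of the momentum-encoder weights from the query-encoder
    weights: w_k <- m*w_k + (1 - m)*w_q, applied element-wise to each
    parameter in the two weight lists."""
    return [m * wk + (1.0 - m) * wq for wk, wq in zip(w_k, w_q)]
```

With m close to 1, the momentum encoder changes slowly, which is what keeps the negative features stored in the dictionary consistent with the current query features.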
2.3 The extracted domain-related features are mapped with a feature mapping head, maximizing the SDCN's ability to identify same-domain samples, finally obtaining the two contrast features q and k+:

q = g_q(h), k+ = g_k(h+)

where g(·) is a multilayer perceptron (MLP) with one hidden layer and an output dimension of 128.
2.4 Compute the similarity sim(q, k+) between the two positive sample features. The feature dictionary stores, as negative sample features k_i, the feature representations generated by previous batches through the momentum branch; compute the similarity sim(q, k_i) between q and each of them. Similarity is computed as the inner product of the two features:

sim(v_i, v_j) = v_i · v_j
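The queue behaviour of the feature dictionary and the inner-product similarity can be sketched as follows; the class name, capacity handling, and API are assumptions for illustration.

```python
from collections import deque
import numpy as np

class FeatureDictionary:
    """FIFO queue of momentum-branch features used as negatives: when the
    queue is full, the oldest batch's features are evicted first."""
    def __init__(self, max_size):
        self.queue = deque(maxlen=max_size)

    def enqueue(self, feats):
        for f in feats:
            self.queue.append(np.asarray(f, dtype=float))

    def negatives(self):
        """All stored negative features k_i, stacked into an array."""
        return np.stack(list(self.queue)) if self.queue else np.empty((0,))

def sim(v_i, v_j):
    """Inner-product similarity: sim(v_i, v_j) = v_i . v_j."""
    return float(np.dot(v_i, v_j))
```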
2.5 Compute the similarity sim(q, c_q) between the query sample feature q and its own cluster center c_q, and the similarity sim(q, c_j) between q and every other cluster center c_j.
2.6 Construct the final composite loss function as follows:

L_CL = L_s + λ·L_c

L_s is the sample contrast loss, used to pull positive samples closer while pushing negative samples apart, and serves as the main optimization target of the self-supervised domain clustering network; L_c is the cluster contrast loss, used to pull together samples belonging to the same domain and push apart samples from different domains, and serves as an auxiliary optimization target of the network; λ is a balance coefficient. Both contrast losses are constructed with the InfoNCE loss function commonly used in contrast learning:

L_s = −log( exp(sim(q, k+)/τ) / (exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ)) )

L_c = −log( exp(sim(q, c_q)/τ′) / Σ_{j=1}^{K} exp(sim(q, c_j)/τ′) )

where r and K are the number of negative samples and the number of cluster centers respectively; τ and τ′ are temperature coefficients used to control the shape of the feature distribution. Optimizing this contrast loss gradually enlarges the separation, in the high-dimensional feature space, between samples from different latent domains.
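A numerical sketch of the two InfoNCE terms follows; the default temperature value of 0.07 is an assumption, as the patent does not state one.

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """Sample contrast loss L_s: negative log-softmax of the positive
    similarity over {positive} U {negatives}, with temperature tau."""
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()               # subtract max for numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

def cluster_nce(q, centers, own_idx, tau=0.07):
    """Cluster contrast loss L_c: softmax over the K cluster centers, with
    the query's own cluster center c_q as the positive."""
    logits = centers @ q / tau
    logits -= logits.max()
    return -np.log(np.exp(logits[own_idx]) / np.exp(logits).sum())
```

When the positive similarity dominates the negatives, both losses approach zero, which is the behaviour the optimization target pushes toward.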
2.7 Train the self-supervised domain clustering network by minimizing the composite loss function, and assign the final clustering result to each sample as its pseudo domain label. Specifically: divide the clustering result into K parts by cluster, each part representing one domain; number each domain starting from 1 as its pseudo domain label; and assign the corresponding pseudo domain label to every sample in the cluster.
(3) Training samples with pseudo-domain labels using domain-generalized living body detection model
The face samples assigned pseudo domain labels in step 2 are used as the source-domain training dataset and fed into the domain-generalized living body detection model for training. By modeling and learning the intra-domain and inter-domain differences of data from different domains, the model adapts better to data from different domains, reduces the influence of sample domain shift, and generates more effective face liveness features, thereby improving its generalization ability.
First, the face liveness feature representation of a sample is extracted with the feature generator. The feature generator uses the ResNet-18 model as the backbone network and is initialized with weight parameters pre-trained on ImageNet. The output features are normalized with L2 regularization to obtain the feature representation of the sample:

f_i^d = Norm(G(x_i^d))

where G is the feature generator, x_i^d is an input sample, d is the domain to which the sample belongs, and Norm is the L2 normalization function.
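The Norm(·) step can be sketched as below; the epsilon guard against division by zero is an added safety, not part of the text.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each feature vector (last axis) to unit L2 length."""
    return features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
```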
Because the samples come from different latent domains, the generated face liveness features differ greatly in their distribution in the feature space. In the example of the invention, domain adversarial training is used to match the distributions among the multiple source domains, guiding the feature generator to produce domain-invariant features of the samples. The domain adversarial training module is built by adding a domain discriminator after the feature generator. The domain discriminator is a multilayer perceptron with one hidden layer; the domain discriminator D distinguishes the domain-related information in the generated features and predicts the domain label of the input sample. Then, using the pseudo domain labels generated in step 2 as supervision information, the adversarial loss L_ada is computed. The adversarial training is carried out through gradient back-propagation, specifically as follows:

The domain discriminator's ability to discriminate the domain of a feature is optimized by gradient back-propagation. A gradient reversal layer is inserted between the domain discriminator and the feature generator, reversing the gradient passed to the feature generator so that the two have opposite training targets; this suppresses the domain-related information carried in the generated features and reduces their domain bias.
The whole adversarial training process can be expressed as the following optimization problem:

min_D max_G L_ada(G, D) = −E_{(x, y_D) ∈ (X, Y_D)} Σ_{d=1}^{K} 1[y_D = d] · log D(G(x))

where X and Y_D are the sample set and the pseudo domain label set respectively; the adversarial loss is computed with a cross-entropy function. The adversarial training proceeds by minimizing the training error of the domain discriminator while maximizing the generation error of the feature generator.
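The gradient reversal layer described above can be sketched conceptually as an operation that is the identity in the forward pass and flips (and optionally scales) the gradient in the backward pass. This manual forward/backward pair is only illustrative; real frameworks implement it as a custom autograd function.

```python
import numpy as np

class GradReverse:
    """Conceptual gradient-reversal layer: forward is the identity,
    backward multiplies the incoming gradient by -beta, so the feature
    generator receives a gradient opposing the domain discriminator."""
    def __init__(self, beta=1.0):
        self.beta = beta

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.beta * grad_output
```

Inserted between generator and discriminator, this single layer turns an ordinary classification loss into the min-max objective above without needing two separate optimization loops.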
For the living body detection task, we expect the distributions of real faces and forged faces in the feature space to be as far apart as possible, so a triplet loss function is typically used to assist model training. In the prior art, only features from the same batch are used to compute the triplet loss, so earlier features cannot be exploited for optimization. To enrich the diversity of features participating in training and improve the convergence speed and stability of model training, the feature representations of previous iterations are stored in a feature memory cache; the features in the memory cache are concatenated with the features generated in the current iteration, and the memory triplet loss is computed:

L_MemTriplet = Σ_{f_a, f_p, f_n ∈ F_n ∪ M} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p, f_n are the anchor, positive, and negative sample features respectively, and α is a margin coefficient that controls how far apart the within-class and between-class distances must be.
The triplet loss helps encode same-class samples into nearby regions of the feature space and different-class samples into distant regions. However, because real faces are diverse in skin color, environment, makeup, and other factors, paying too much attention to the distances among real-face features tends to cause model overfitting. To make the model focus on the inter-class differences between real faces and forged faces rather than intra-class differences, the embodiment of the invention uses an unbalanced feature-memory-cache update strategy, specifically: at every training iteration, the h hardest real-face sample features and all forged-face sample features, ranked by the memory triplet loss, are added to the feature memory cache; in this embodiment, h = 10.
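The unbalanced cache update can be sketched as an index-selection step. The label convention (1 = real face, 0 = forged face) and the per-sample-loss ranking are assumptions for illustration.

```python
import numpy as np

def update_memory(per_sample_loss, labels, h=10):
    """Return the indices to add to the feature memory cache: the h
    highest-loss (hardest) real-face samples plus all forged-face samples."""
    real = np.where(labels == 1)[0]
    fake = np.where(labels == 0)[0]
    hardest_real = real[np.argsort(per_sample_loss[real])[::-1][:h]]
    return np.concatenate([hardest_real, fake])
```

Keeping only the hardest real faces while keeping every forged face is what biases the cache, and hence the triplet loss, toward inter-class rather than intra-class separation.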
Finally, the generated features are fed into the liveness feature classifier to predict the living body detection result, and the binary cross-entropy loss L_cls is computed.
The composite loss function of the domain generalization adversarial network in the embodiment of the invention is:

L_dg = L_cls + λ_1·L_ada + λ_2·L_MemTriplet

where λ_1 and λ_2 are hyperparameters. Domain generalization training is performed by optimizing this loss function, enabling the model to generate domain-invariant face liveness features and generalize to unknown domains.
The living body detection network after training convergence can be applied to living body detection tasks of actual RGB face images, and has better generalization capability. In the test stage, a feature generator of the domain-generalized living body detection network is used for acquiring a feature representation of a sample, and the feature representation is sent to a feature classifier to obtain a final living body detection result. The in-vivo detection effect of the present example was verified by a specific experiment as follows.
Experimental data: four published biopsy data sets were used to evaluate the effectiveness of the method of the invention, respectively: oulu-NPU (O), CASIA-MFSD (C), IDIAP ReplayAttack (I), and MSU-MFSD (M). And preprocessing the face videos provided in the four data sets by using an MTCNN model, taking a part of effective frames as experimental images, cutting out face areas from the experimental images, and obtaining 32800 pieces of data in total. In the experiment, three data sets are randomly selected and combined into a large mixed source domain data set, the data in the mixed source domain data set does not contain any domain label, and the rest is used as a target domain for testing, so that cross-data-set testing is performed. Thus, the experiment contained a total of 4 test tasks: o & C & I to M, O & M & I to C, O & C & M to I, I & C & M to O.
Experimental parameters: in the experiments, the model of the embodiment of the invention was trained and tested on 4 RTX 2080Ti GPUs. The batch size of the self-supervised domain clustering network is 256, the number of clusters K is 4, and the learning rate is set to 0.03; the batch size of the domain generalization network is set to 60 and its learning rate to 0.01. The whole model uses Stochastic Gradient Descent (SGD) as the optimizer, with the momentum value set to 0.9.
Evaluation index: the living body detection task needs to consider the respective classification error conditions of the attack sample and the real sample, so the embodiment of the invention uses a half total error rate (Half Total Error Rate, HTER) and an Area Under ROC Curve (AUC) as experimental evaluation indexes.
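HTER can be sketched as the mean of the false-acceptance rate (attack samples accepted as live) and the false-rejection rate (live samples rejected as attacks) at a fixed threshold. The score and label conventions (higher score = live, label 1 = live) and the 0.5 threshold are assumptions.

```python
import numpy as np

def hter(scores, labels, threshold=0.5):
    """Half Total Error Rate = (FAR + FRR) / 2 at the given threshold."""
    pred_live = scores >= threshold
    live = labels == 1
    attack = labels == 0
    far = np.mean(pred_live[attack]) if attack.any() else 0.0  # attacks passed
    frr = np.mean(~pred_live[live]) if live.any() else 0.0     # lives rejected
    return (far + frr) / 2.0
```

For example, with one of two attacks accepted and no live samples rejected, FAR = 0.5 and FRR = 0, giving HTER = 0.25.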
Experimental results:
(1) Comparison with common Living detection models
The method of the embodiment of the invention was compared with several common non-domain-generalized living body detection methods: MS_LBP, Binary CNN, Color Texture, LBPTOP, and Auxiliary. The experimental results are shown in Table 1.
Table 1 comparison of the experimental results of the present method with the common living detection method on four test tasks
From the experimental results in table 1, it can be seen that the method in the example of the present invention has obvious advantages in both evaluation indexes compared with the method of training only on a single domain, which indicates that our model can extract domain invariant features related to living human face from multi-source domain data, thereby generalizing to unknown domains.
(2) Comparison with a homeodomain generalization Living detection method
We chose several existing domain generalization living body detection methods: MMD-AAE, MADDG, D²AM, DRDG, ANRL, and SSDG. A comparative experiment was carried out against the method of the embodiment of the invention, and the results are shown in Table 2.
Table 2 comparison of experimental results of the present method with the existing domain generalization living body detection method on four test tasks
It can be seen that the model achieves performance close to the best results of similar models on all 4 test tasks, showing that it can efficiently extract domain-invariant liveness features and has strong generalization ability; moreover, unlike most existing domain-generalized living body detection models, it uses no pre-annotated domain labels in the training stage.
In addition, unlike existing methods that use the dataset identity as the domain label, our method uses self-supervised domain clustering to generate pseudo domain labels. On the O&C&I to M test task, our method surpasses the other existing domain generalization models on both evaluation indexes, which suggests that, to a certain extent, this module partitions the domains of different face samples more finely and accurately.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, can run the inventive content of the living body detection method based on self-supervised domain clustering and domain generalization, and some or all of the steps in each embodiment. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

It will be apparent to those skilled in the art that the technical solutions in the embodiments of the present invention may be implemented by means of a computer program and a corresponding general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied essentially in the form of a computer program, i.e., a software product, which may be stored in a storage medium and include several instructions to cause a device including a data processing unit (which may be a personal computer, a server, a single-chip microcomputer, an MCU, a network device, or the like) to perform the methods described in the embodiments or some parts of the embodiments of the present invention.
The present invention provides the idea and method of a living body detection method based on self-supervised domain clustering and domain generalization; there are many ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be pointed out that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these should also be regarded as falling within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented with existing technology.

Claims (10)

1. A living body detection method based on self-supervised domain clustering and domain generalization, characterized in that pseudo domain labels of face samples are generated by self-supervised domain clustering, the sample data with pseudo domain labels are trained with a domain generalization method, and the classification result of living body detection is obtained, specifically comprising the following steps:

step 1, generating self-supervised domain clustering samples: randomly selecting data belonging to different domains from the images of a mixed source-domain dataset as samples, and performing domain information migration between two images based on low-frequency signals in the image frequency domain to generate positive and negative samples;

step 2, constructing a self-supervised domain clustering network and training it: training with a domain clustering method based on contrast learning, constructing a learning task, extracting domain-related features from the positive and negative samples containing domain information obtained in step 1, and performing domain clustering according to the distribution of the samples in the feature space;

step 3, training the samples with pseudo domain labels using a domain-generalized living body detection model: using the samples assigned pseudo domain labels according to the clustering result as the source-domain dataset, extracting the domain-invariant liveness features of the samples with a domain generalization method based on adversarial training, and optimizing the domain-generalized living body detection model with the classification cross-entropy loss, the adversarial loss, and the memory triplet loss; the final output of the trained domain-generalized living body detection model is taken as the living body detection classification result, completing living body detection based on self-supervised domain clustering and domain generalization.
2. The living body detection method based on self-supervised domain clustering and domain generalization according to claim 1, wherein the domain information migration between two images based on low-frequency signals in the image frequency domain in step 1 specifically comprises:

step 1-1, randomly sampling two different face images from the mixed source-domain dataset as the source image and the migration target image, obtaining the frequency-domain signals of the two images with the Fourier transform, and decomposing the amplitude component F^A(x) and the phase component F^P(x) from them, specifically:

F^A(x) = |F(x)|, F^P(x) = arg(F(x))

where x is an input source image or migration target image, F is the Fourier transform function, and arg(·) takes the phase of a complex number;
step 1-2, filtering amplitude components of frequency domain signals of a source image by using a Gaussian low-pass filter to obtain low-frequency signals of the source image; filtering amplitude components of the frequency domain signals of the migration target image by using a Gaussian high-pass filter to obtain high-frequency signals of the migration target image;
step 1-3, fusing the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transforming the mixed frequency-domain signal back into a two-dimensional image with the inverse Fourier transform to obtain the domain-information-migrated image x_DT, specifically:

F^A_DT = λ_1·H_L(F^A(x_s)) + λ_2·H_H(F^A(x_t))

x_DT = F^{-1}( F^A_DT · e^{j·F^P(x_t)} )

where F^A_DT is the amplitude component after domain information migration, x_s and x_t are the source image and migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^{-1} is the inverse Fourier transform function, λ_1 and λ_2 are hyperparameters, and j is the imaginary unit;
step 1-4, performing auxiliary enhancement on a source image and a domain information migration image by using a contrast learning data enhancement method to obtain positive and negative samples; the positive sample is an image subjected to domain information migration and contains domain information similar to the source image; the negative samples are other target images from different domain distributions.
3. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 2, wherein the self-supervision domain clustering network in step 2 consists of a dual-branch backbone network, a feature mapping head, a feature dictionary and a feature clustering device;
the method comprises the steps that a ResNet-18 network model is adopted in the dual-branch trunk network, two branches are respectively a query coding network and a momentum coding network, and positive samples obtained in the step 1 are respectively sent into the two branches to extract domain related features; mapping the extracted domain related features by using a feature mapping head to obtain comparison features q and k + The method comprises the steps of carrying out a first treatment on the surface of the The two branches are initialized by adopting the same weight parameters, and the weight parameters are updated by using different methods, wherein the weight parameters of the query coding network are updated in the training process by using gradient back propagation; the momentum coding network updates the weight parameters by using a momentum updating mode, wherein the momentum updating mode of the weight parameters is as follows:
w_k = m·w_k + (1 − m)·w_q

where w_k are the weights of the momentum encoding network, w_q are the weights of the query encoding network, and m is a momentum coefficient taking a value between 0 and 1;
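The momentum update above can be sketched as a toy version over flat weight lists (real implementations apply the same rule tensor-by-tensor):

```python
def momentum_update(w_k, w_q, m=0.99):
    """EMA update of the momentum-encoder weights w_k toward the query-encoder
    weights w_q: w_k <- m*w_k + (1-m)*w_q (sketch; weights as flat floats)."""
    return [m * k + (1.0 - m) * q for k, q in zip(w_k, w_q)]
```

With m close to 1, the momentum encoder changes slowly, which keeps the negative features stored in the dictionary queue consistent across batches.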
the feature mapping head adopts a multi-layer perceptron structure and performs feature mapping on the features of the two similar samples extracted by the dual-branch encoder;
the feature dictionary adopts a queue structure and stores the feature representations generated by previous batches through the momentum branch, which serve as negative-sample features for contrast with the features of the current batch;
the feature clusterer applies the K-means clustering algorithm to the domain-related features extracted by the momentum encoder to obtain the center of each cluster;
the self-supervision domain clustering network is optimized using a sample contrastive loss function and a cluster contrastive loss function.
4. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 3, wherein training the self-supervision domain clustering network in step 2 comprises the following steps:
step 2-1, calculating the cluster to which each sample belongs and the cluster centers;
step 2-2, calculating the similarity between contrast features;
step 2-3, constructing the comprehensive loss function of the self-supervision domain clustering network and training the network by computing this loss function.
5. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 4, wherein calculating the cluster to which each sample belongs and the cluster centers in step 2-1 specifically comprises:
before each training round starts, feeding all training samples X = {x_1, x_2, …, x_N} into the momentum encoding branch to obtain the domain-related feature representation Z = {z_1, z_2, …, z_N}, where N is the number of samples; then clustering the feature representations Z with the K-means method to obtain the cluster to which each sample belongs and the K cluster centers {c_1, c_2, …, c_K}.
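The per-round clustering of step 2-1 can be sketched with a plain K-means over the momentum features. This is a simplified illustration; the function name and the deterministic initialization are not from the patent:

```python
import numpy as np

def kmeans_clusters(Z, K, iters=20):
    """Plain K-means over domain-related features Z (N x d): returns the
    cluster index of each sample and the K cluster centers (sketch)."""
    # deterministic spread initialization: evenly spaced samples as centers
    idx = np.linspace(0, len(Z) - 1, K).astype(int)
    centers = Z[idx].astype(float).copy()
    for _ in range(iters):
        # assign each sample to its nearest center
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned samples
        for k in range(K):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels, centers
```

The resulting cluster indices are what later become the pseudo domain labels, and the centers feed the cluster contrastive loss.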
6. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 5, wherein calculating the similarity between contrast features in step 2-2 specifically comprises:
calculating the similarity sim(q, k+) between the two positive-sample features q and k+; taking the feature representations generated by previous batches through the momentum branch and stored in the feature dictionary as negative-sample features k_i and calculating the similarity sim(q, k_i); the similarity is computed as the inner product of the two features.
7. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 6, wherein constructing the comprehensive loss function of the self-supervision domain clustering network in step 2-3 specifically comprises:
constructing the sample contrastive loss L_s and the cluster contrastive loss L_c as follows:

L_s = −log [ exp(sim(q, k+)/τ) / ( exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ) ) ]

L_c = −log [ exp(sim(q, c_s)/φ) / Σ_{j=1}^{K} exp(sim(q, c_j)/φ) ]

where r is the number of negative samples, c_s is the center of the cluster to which the sample belongs, and τ and φ are temperature coefficients used to control the shape of the feature distribution;
the comprehensive loss function of the self-supervision domain clustering network is then constructed as:

L_CL = L_s + λ·L_c

where λ is a balance coefficient;
the self-supervision domain clustering network is trained by computing this comprehensive loss function, and the result of the self-supervision domain clustering is then assigned to each sample as a pseudo domain label, specifically: the clustering result is divided into K parts according to the clusters, each part representing one domain; each domain is assigned a domain number starting from 1 as its pseudo domain label; and each sample of a cluster is assigned the corresponding pseudo domain label.
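The sample and cluster contrastive losses of this claim can be sketched as InfoNCE-style terms. This is a hedged reconstruction consistent with standard contrastive learning; the exact patented formulas may differ, and the function name and argument layout are illustrative:

```python
import numpy as np

def contrastive_losses(q, k_pos, k_neg, centers, own_center,
                       tau=0.07, phi=0.07, lam=1.0):
    """Sample-level and cluster-level contrastive losses (sketch).
    q, k_pos: contrast features from the two branches; k_neg: r negative
    features from the dictionary queue; centers: K cluster centers;
    own_center: index of the cluster q's sample belongs to."""
    def nce(anchor, pos, negs, t):
        # inner-product similarities scaled by a temperature coefficient
        logits = np.concatenate(([anchor @ pos], negs @ anchor)) / t
        logits -= logits.max()  # numerical stability
        return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
    L_s = nce(q, k_pos, k_neg, tau)                       # sample contrast
    others = np.delete(centers, own_center, axis=0)       # other cluster centers
    L_c = nce(q, centers[own_center], others, phi)        # cluster contrast
    return L_s + lam * L_c, L_s, L_c
```

Both terms pull q toward its positive (the migrated view, or its own cluster center) and push it away from queue negatives or the other centers.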
8. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 7, wherein the network structure of the domain-generalized living body detection model in step 3 comprises: a feature generator, a feature classifier, a domain discriminator and a feature memory cache;
the feature generator produces a feature representation of each sample; the generated features are fed into the feature classifier and the classification cross-entropy loss is computed; the feature representation is also input into the domain discriminator to obtain a domain discrimination result, the feature generator and the domain discriminator are trained in an adversarial manner, and the domain adversarial loss is computed to reduce the domain bias contained in the features; the generated features are spliced with the features in the memory cache, the memory triplet loss is computed, and the feature memory cache is updated according to the result of this loss function; finally, the domain-generalized living body detection model is optimized using the classification cross-entropy loss, the adversarial loss and the memory triplet loss.
9. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 8, wherein the adversarial loss and the memory triplet loss in step 3 are specifically calculated as follows:
the network formed by the feature generator and the domain discriminator is optimized in an adversarial training manner: when the adversarial loss is back-propagated, the gradient is reversed between the domain discriminator and the feature generator, so that the training objectives of the two are opposite; the overall adversarial objective is:

min_D max_G L_ada(G, D) = −E_{(x, y) ∼ (X, Y_D)} Σ_{i=1}^{K} 1[i = y] · log D_i(G(x))

where X and Y_D are the sample set and the pseudo-domain label set respectively, G is the feature generator, D is the domain discriminator, K is the number of source domains, 1[i = y] is the indicator function taking the value 1 when i = y and 0 otherwise, and y is the domain label of sample x; the adversarial loss is computed using a binary cross-entropy function;
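The discriminator-side loss can be sketched as follows. This matches the indicator-sum form above as a K-way categorical cross-entropy over the K pseudo-domains (a sketch; the patent states a binary cross-entropy is used, and a gradient reversal layer would multiply the upstream gradient by −1 before it reaches the generator):

```python
import numpy as np

def domain_adversarial_loss(logits, y):
    """Cross-entropy seen by the domain discriminator (sketch).
    logits: (N, K) discriminator outputs for K pseudo-domains;
    y: (N,) pseudo domain labels. The generator is trained on the
    reversed gradient of this same loss."""
    # stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # indicator 1[i == y] selects the log-probability of the true domain
    return -logp[np.arange(len(y)), y].mean()
```

Because the generator receives the negated gradient, it learns features that the discriminator cannot separate by domain, which is the intended domain-bias reduction.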
the feature memory cache stores the feature representations from previous iterations of the domain-generalized living body detection model; the features in the memory cache are spliced with the features generated in the current iteration, and the memory triplet loss L_MemTriplet is computed as:

L_MemTriplet = Σ_{f_a ∈ F_n} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p and f_n are the anchor, positive and negative sample features respectively (with positives and negatives drawn from F_n ∪ M), and α is the margin coefficient;
an unbalanced feature-memory-cache update strategy is used: at each training iteration, the top h hard real-face sample features and all spoofed-face sample features, ranked by the memory triplet loss results, are added to the feature memory cache.
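The memory triplet loss over the spliced feature set can be sketched with batch-hard mining (hardest positive and hardest negative per anchor). This is one common instantiation of a triplet loss and a simplified illustration; the patent's exact mining rule may differ:

```python
import numpy as np

def memory_triplet_loss(F_n, labels_n, M, labels_m, alpha=0.3):
    """Triplet loss over current-batch features F_n spliced with the memory
    cache M (sketch): for each current-batch anchor, take the hardest
    positive and hardest negative in the concatenated set, hinged at alpha."""
    feats = np.concatenate([F_n, M])
    labels = np.concatenate([labels_n, labels_m])
    d = np.linalg.norm(feats[None] - feats[:, None], axis=2)  # pairwise L2
    loss, n_a = 0.0, len(F_n)
    for a in range(n_a):  # anchors come from the current iteration only
        pos = labels == labels[a]
        pos[a] = False                      # exclude the anchor itself
        neg = labels != labels[a]
        if pos.any() and neg.any():
            # hardest positive (farthest) minus hardest negative (closest)
            loss += max(d[a][pos].max() - d[a][neg].min() + alpha, 0.0)
    return loss / n_a
```

Splicing in cached features gives each anchor far more positives and negatives than one mini-batch provides, which is the point of the memory cache.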
10. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 9, wherein optimizing the domain-generalized living body detection model using the classification cross-entropy loss, the adversarial loss and the memory triplet loss in step 3 means constructing the comprehensive loss function of the domain-generalized living body detection model as:

L_dg = L_cls + λ_1·L_ada + λ_2·L_MemTriplet

where L_cls is the binary cross-entropy loss for face liveness classification, and λ_1, λ_2 are hyperparameters.
CN202310287952.5A 2023-03-23 2023-03-23 Living body detection method based on self-supervision domain clustering and domain generalization Pending CN116403290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310287952.5A CN116403290A (en) 2023-03-23 2023-03-23 Living body detection method based on self-supervision domain clustering and domain generalization


Publications (1)

Publication Number Publication Date
CN116403290A true CN116403290A (en) 2023-07-07

Family

ID=87006712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310287952.5A Pending CN116403290A (en) 2023-03-23 2023-03-23 Living body detection method based on self-supervision domain clustering and domain generalization

Country Status (1)

Country Link
CN (1) CN116403290A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910556A (en) * 2023-07-24 2023-10-20 润联智能科技股份有限公司 Power plant equipment abnormality detection method, training device, equipment and medium
CN116630727A (en) * 2023-07-26 2023-08-22 苏州浪潮智能科技有限公司 Model training method, deep pseudo image detection method, device, equipment and medium
CN116630727B (en) * 2023-07-26 2023-11-03 苏州浪潮智能科技有限公司 Model training method, deep pseudo image detection method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Xu et al. Model-driven deep-learning
CN112766158B (en) Multi-task cascading type face shielding expression recognition method
CN116403290A (en) Living body detection method based on self-supervision domain clustering and domain generalization
CN111080513B (en) Attention mechanism-based human face image super-resolution method
CN112949837A (en) Target recognition federal deep learning method based on trusted network
Wang et al. Industrial cyber-physical systems-based cloud IoT edge for federated heterogeneous distillation
Du et al. Age factor removal network based on transfer learning and adversarial learning for cross-age face recognition
Ibitoye et al. Differentially private self-normalizing neural networks for adversarial robustness in federated learning
CN111259264B (en) Time sequence scoring prediction method based on generation countermeasure network
CN114282059A (en) Video retrieval method, device, equipment and storage medium
Lin et al. The design of error-correcting output codes based deep forest for the micro-expression recognition
CN112308093A (en) Air quality perception method based on image recognition, model training method and system
CN115457374B (en) Deep pseudo-image detection model generalization evaluation method and device based on reasoning mode
Duan et al. SSGD: A safe and efficient method of gradient descent
CN112668401B (en) Face privacy protection method and device based on feature decoupling
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium
Wu et al. Unsupervised domain adaptation for disguised face recognition
Wu et al. Learning domain-invariant representation for generalizing face forgery detection
CN111476267A (en) Method and electronic device for classifying drug efficacy according to cell image
Jaswanth et al. Deep learning based intelligent system for robust face spoofing detection using texture feature measurement
CN116721315B (en) Living body detection model training method, living body detection model training device, medium and electronic equipment
CN117040939B (en) Vehicle-mounted network intrusion detection method based on improved visual self-attention model
Cao et al. Federated Learning Based on Feature Fusion
Zhou et al. Deep fuzzy classification by stacked architecture for epileptic electroencephalograms signals
Sun et al. Texture-guided multiscale feature learning network for palmprint image quality assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination