CN116403290A - Living body detection method based on self-supervision domain clustering and domain generalization - Google Patents


Info

Publication number
CN116403290A
Authority
CN
China
Prior art keywords
domain
clustering
feature
sample
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310287952.5A
Other languages
Chinese (zh)
Inventor
杨若瑜
姚凯哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202310287952.5A
Publication of CN116403290A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a living body detection method based on self-supervised domain clustering and domain generalization, comprising the steps of generating domain-information-migration samples, constructing and training a self-supervised domain clustering network, and training a domain-generalization living body detection model on samples carrying pseudo domain labels. First, positive and negative samples are generated by a data-enhancement method based on domain information migration; a self-supervised domain clustering module then assigns pseudo domain labels to the samples, yielding a source-domain data set for domain-generalization living body detection. The domain-generalization living body detection module is trained adversarially, a memory triplet loss assists model optimization, and an unbalanced memory-cache update strategy preserves features from earlier training. The invention can autonomously assign domain labels to face samples and perform domain-generalization living body detection training directly on face data without domain labels, giving more flexible generalization capability, saving labeling cost, and offering substantial application value.

Description

Living body detection method based on self-supervision domain clustering and domain generalization
Technical Field
The invention relates to a living body detection method, in particular to a living body detection method based on self-supervision domain clustering and domain generalization.
Background
Face recognition technology is widely applied in scenarios such as identity authentication, security control, social networks, and photo management. However, due to technical limitations, face recognition may fail in the face of various forgery attacks. The most common attack mode is the presentation attack, i.e., attacking the face recognition system with re-shot face photos, face video replay, fake face masks, and the like. Therefore, to improve the safety and accuracy of face recognition technology, living body (liveness) detection is receiving increasing attention from researchers.
Living body detection is a technique for distinguishing a real face from artificially synthesized face information. It usually analyzes biological characteristics in a face image or video to detect traces of forgery, distinguishing whether the input is a still picture, a video, face information synthesized via a 3D mask, or the like. Commonly used biological features include the eyes, mouth and nose, facial expression, surface texture, and facial albedo.
Living body detection based on domain generalization has been a popular research direction in recent years. Earlier deep-learning-based living body detection methods perform well on a single training set, but may encounter new, unknown forgery attacks at test time, causing the detection model to fail. Domain-generalization-based living body detection was proposed so that the model transfers better to new, unknown forgery attacks, improving the robustness and accuracy of the detection model. Its core idea is to map the face information of real and forged faces, which belong to different distributions, into different feature spaces through a training algorithm; since the feature spaces of data from different distributions overlap to a certain degree, the model learns a general face-attack representation across the information of multiple domains. Thus, when the algorithm faces a new attack sample, whether the face is real can be judged from the sample's position in the feature space.
Living body detection based on domain generalization can adapt to a variety of environments with good generalization performance and high accuracy. Such methods typically use the data set a sample belongs to as an explicit domain label to guide model learning during training. However, attack samples in real scenarios often come from many different capture devices and scenes; the samples mix multiple domains, and the specific domain label of each sample is unknown, so such data cannot be directly applied to existing domain-generalization models.
Disclosure of Invention
The invention aims to: the invention aims to solve the technical problem of providing a living body detection method based on self-supervision domain clustering and domain generalization aiming at the defects of the prior art.
To solve the above technical problems, the invention discloses a living body detection method based on self-supervised domain clustering and domain generalization, which generates pseudo domain labels for face samples by self-supervised domain clustering, then trains on the sample data with pseudo domain labels by a domain generalization method to obtain the living body detection classification result. The specific method comprises the following steps:
step 1, generating a self-supervision domain clustering sample, namely randomly selecting data belonging to different domains from images of a mixed source domain data set as a sample, and performing domain information migration between two images based on a low-frequency signal on an image frequency domain to generate a positive sample and a negative sample, wherein the specific method comprises the following steps of:
Step 1-1, randomly sampling two different face images from the mixed source domain data set as a source image and a migration target image, obtaining the frequency domain signal of each image by Fourier transform, and decomposing the amplitude component A(x) and the phase component P(x) from the frequency domain signal, specifically:

A(x) = |F(x)|

P(x) = arg(F(x))

where x is an input source image or migration target image, F(·) is the Fourier transform function, and arg(·) takes the phase value of a complex number;
step 1-2, filtering amplitude components of frequency domain signals of a source image by using a Gaussian low-pass filter to obtain low-frequency signals of the source image; filtering amplitude components of the frequency domain signals of the migration target image by using a Gaussian high-pass filter to obtain high-frequency signals of the migration target image;
step 1-3, fusing the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transforming the mixed frequency domain signal back to a two-dimensional image by inverse Fourier transform to obtain the image x_DT after domain information migration, specifically:

A_DT = λ1 · H_L ∘ A(x_s) + λ2 · H_H ∘ A(x_t)

x_DT = F^(−1)( A_DT · e^(j·P(x_s)) )

where A_DT is the amplitude component after domain information migration, x_s and x_t are the source image and the migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^(−1)(·) is the inverse Fourier transform function, λ1 and λ2 are hyperparameters, and j is the imaginary unit;
step 1-4, performing auxiliary enhancement on a source image and a domain information migration image by using a contrast learning data enhancement method to obtain positive and negative samples; wherein, the image of the positive sample through domain information migration contains domain information similar to the source image; the negative samples are other target images from different domain distributions.
Step 2, constructing and training a self-supervised domain clustering network: train a contrast-learning-based domain clustering method, construct the learning task, extract domain-related features from the positive and negative samples containing domain information obtained in step 1, and perform domain clustering according to the distribution of the samples in the feature space. The self-supervised domain clustering network consists of a dual-branch backbone network, a feature mapping head, a feature dictionary, and a feature clusterer;
the method comprises the steps that a ResNet-18 network model is adopted in the dual-branch trunk network, two branches are respectively a query coding network and a momentum coding network, and positive samples obtained in the step 1 are respectively sent into the two branches to extract domain related features; mapping the extracted domain related features by using a feature mapping head to obtain comparison features q and k + The method comprises the steps of carrying out a first treatment on the surface of the The two branches are initialized by adopting the same weight parameters, and the weight parameters are updated by using different methods, wherein the weight parameters of the query coding network are updated in the training process by using gradient back propagation; the momentum coding network updates the weight parameters by using a momentum updating mode, wherein the momentum updating mode of the weight parameters is as follows:
w_k = m · w_k + (1 − m) · w_q

where w_k are the weights of the momentum encoding network, w_q are the weights of the query encoding network, and m is a momentum coefficient with a value between 0 and 1;
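A minimal NumPy sketch of this momentum (exponential moving average) update, with plain arrays standing in for the two networks' weights; all values and names are illustrative, not the patent's implementation:

```python
import numpy as np

def momentum_update(w_k, w_q, m=0.999):
    """EMA update w_k <- m * w_k + (1 - m) * w_q, so the momentum encoder
    drifts slowly toward the gradient-trained query encoder."""
    return m * w_k + (1 - m) * w_q

# toy usage: the query "network" takes one simulated gradient step,
# and the momentum "network" tracks it slowly
w_q = np.array([1.0, 2.0]) - 0.1
w_k = momentum_update(np.array([1.0, 2.0]), w_q, m=0.9)
```

A large m (close to 1) keeps the momentum branch nearly frozen between iterations, which is what makes the features in the dictionary comparable across batches.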
the feature mapping head adopts a structure of a multi-layer perceptron and performs feature mapping on features of two similar samples extracted by the double-branch encoder;
the feature dictionary adopts a queue structure and is used for storing feature representations generated by previous batches through momentum network branches as negative sample features for comparison with the sample features of the current batches;
the feature clusterer clusters the domain-related features extracted by the momentum encoder using the K-means clustering algorithm to obtain the cluster center of each cluster;
and optimizing the self-supervision domain clustering network by using a sample comparison learning loss function and a clustering center comparison learning loss function.
The self-supervision domain clustering network is trained, and comprises the following steps:
step 2-1, calculating a cluster and a cluster center to which a sample belongs, wherein the method specifically comprises the following steps:
before each round of training starts, all training samples X = {x_1, x_2, …, x_N} are input into the momentum encoding branch to obtain the domain-related feature representation Z = {z_1, z_2, …, z_N} of each sample, where N is the number of samples; then Z is clustered with the K-means method to obtain the cluster each sample belongs to and the K cluster centers {c_1, c_2, …, c_K}.
Step 2-2, calculating the similarity between contrast features, specifically: calculate the similarity sim(q, k+) between the two positive-sample features; take the features k_i stored in the feature dictionary (feature representations generated by previous batches through the momentum branch) as negative-sample features, and calculate the similarity sim(q, k_i); similarity is computed as the inner product of the two features.
Step 2-3, constructing the comprehensive loss function of the self-supervised domain clustering network and training the network by computing it, specifically:

construct the sample contrast loss L_s and the cluster contrast loss L_c as follows:

L_s = −log( exp(sim(q, k+)/τ) / ( exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ) ) )

L_c = −log( exp(sim(q, c_q)/τ′) / Σ_{j=1}^{K} exp(sim(q, c_j)/τ′) )

where r is the number of negative samples, c_q is the cluster center of the query sample, and τ and τ′ are temperature coefficients used to control the shape of the feature distribution;

the comprehensive loss function of the self-supervised domain clustering network is constructed as:

L_CL = L_s + λ · L_c

where λ is a balance coefficient;
the self-supervision clustering network is trained by calculating the comprehensive loss function, namely, the result of self-supervision domain clustering is used as a pseudo domain label to be distributed to the belonging samples, specifically: dividing the clustering result into K parts according to the clustering clusters, wherein each part represents a domain; assigning a domain number to each domain from 1 as a pseudo domain label; a corresponding pseudo-domain label is assigned to each sample of the cluster.
Step 3, training the samples with pseudo domain labels using the domain-generalization living body detection model: the samples assigned pseudo domain labels according to the clustering result are used as the source-domain data set, and the living-body domain-invariant features of the samples are extracted by a domain generalization method based on adversarial training; the final output of the trained domain-generalization living body detection model is taken as the living body detection classification result, completing living body detection based on self-supervised domain clustering and domain generalization.
The network structure of the domain generalization living body detection model comprises: the device comprises a feature generator, a feature classifier, a domain discriminator and a feature memory cache;
Obtaining a characteristic representation of the sample by using a characteristic generator, sending the generated characteristic into a characteristic classifier, and calculating a classification cross entropy loss; inputting the feature representation into a domain discriminator to obtain a domain discrimination result, training a feature generator and the domain discriminator in a countermeasure training mode, calculating domain countermeasure loss, and reducing domain deviation contained in the feature; splicing the generated features with the features in the memory buffer, calculating the memory triplet loss of the features, and updating the feature memory buffer according to the calculation result of the loss function; finally, the domain generalization network model is optimized by using the classification cross entropy loss, the antagonism loss and the memory triplet loss.
The specific calculation of the adversarial loss and the memory triplet loss is as follows:

the network formed by the feature generator and the domain discriminator is optimized adversarially: when the adversarial loss back-propagates, the gradient flowing from the domain discriminator to the feature generator is reversed, so that the training objectives of the two are opposite. The overall adversarial objective is:

min_G max_D L_ada(G, D) = −E_{(x,y)∼(X,Y_D)} Σ_{i=1}^{K} 1[i = y] · log D(G(x))_i

where X and Y_D are the sample set and the pseudo-domain label set respectively, G is the feature generator, D is the domain discriminator, K is the number of source domains, 1[i = y] is the indicator function taking value 1 when i = y and 0 otherwise, and y is the pseudo domain label of sample x; the adversarial loss is computed with a binary cross-entropy function;
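The discriminator side of this objective is an ordinary cross-entropy over the K pseudo-domains. A NumPy sketch (the gradient reversal between G and D is a training-time mechanism and is omitted here; all names are illustrative):

```python
import numpy as np

def domain_adversarial_loss(logits, domain_labels):
    """Mean cross-entropy of the domain discriminator's softmax output
    against the pseudo-domain labels; D minimizes this, while the gradient
    reaching the feature generator G is reversed so G effectively maximizes it."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(logits)
    return float(-np.log(probs[np.arange(n), domain_labels]).mean())
```

In a real model the reversal would be implemented as a gradient-reversal layer inserted between the feature generator and the domain discriminator, so a single backward pass trains both with opposite objectives.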
a feature memory cache stores the feature representations of previous training iterations of the domain-generalization living body detection model; the features in the memory cache are concatenated with the features generated in the current iteration, and the memory triplet loss L_MemTriplet is computed as:

L_MemTriplet = Σ_{f_a ∈ F_n} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p, and f_n (with f_p and f_n drawn from F_n ∪ M) are the anchor, positive-sample, and negative-sample features respectively, and α is the margin coefficient;
an unbalanced feature-memory-cache update strategy is used: at each training iteration, the first h hardest real-face sample features (those with the largest memory triplet loss) and all forged-face sample features are added to the feature memory cache.
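A NumPy sketch of the memory triplet loss with hardest-positive/hardest-negative mining over the batch plus the cache, followed by the unbalanced cache update. The label convention (1 = real face, 0 = forged) and the mining rule are assumptions for illustration, not the patent's exact implementation:

```python
import numpy as np

def memory_triplet_losses(feats, labels, mem_feats, mem_labels, alpha=0.3):
    """Per-anchor triplet losses; positives/negatives are mined from the
    current batch concatenated with the feature memory cache."""
    all_f = np.concatenate([feats, mem_feats])
    all_y = np.concatenate([labels, mem_labels])
    out = np.zeros(len(feats))
    for i, (f_a, y_a) in enumerate(zip(feats, labels)):
        d = np.linalg.norm(all_f - f_a, axis=1)
        pos = d[(all_y == y_a) & (d > 1e-12)]   # exclude the anchor itself
        neg = d[all_y != y_a]
        if len(pos) and len(neg):
            out[i] = max(pos.max() - neg.min() + alpha, 0.0)
    return out

def update_cache(feats, labels, per_anchor_loss, h=2):
    """Unbalanced update: keep only the h hardest real-face features but
    all forged-face features (assumed convention: 1 = real, 0 = forged)."""
    real = np.where(labels == 1)[0]
    hard_real = real[np.argsort(-per_anchor_loss[real])][:h]
    keep = np.concatenate([hard_real, np.where(labels == 0)[0]])
    return feats[keep], labels[keep]
```

Keeping all forged-face features but only the hardest real-face ones biases the cache toward the diverse spoof classes, which is the rebalancing of inter-class versus intra-class attention the strategy aims at.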
The model is optimized with the classification cross-entropy loss, the adversarial loss, and the memory triplet loss; the comprehensive loss function of the domain-generalization living body detection model is constructed as:

L_dg = L_cls + λ1 · L_ada + λ2 · L_MemTriplet

where L_cls is the binary cross-entropy loss for face liveness classification, and λ1 and λ2 are hyperparameters.
The beneficial effects are that:
(1) The invention provides a data enhancement method for domain information migration, which realizes migration of image domain information by using image frequency domain transformation and frequency domain filtering and mixing methods, and can keep original structural information of a migrated image, thereby helping a model learn better domain related characteristics.
(2) Pseudo domain labels for face images are generated using methods such as contrast learning and clustering, which solves the problem of missing face domain labels in mixed data sets, so that domain-generalization living body detection training can be performed without relying on domain labels annotated in advance.
(3) The commonality of different face features is learned by adopting the domain countermeasure training and memory triplet loss function, the influence of domain deviation is reduced, the features participating in domain generalization training are enriched by utilizing the feature memory cache, and the generalization capability of the model is further improved. Meanwhile, the method also provides an unbalanced memory cache updating strategy, and the attention degree of the model to the inter-class difference and the intra-class difference between the true face and the false face is redistributed, so that the convergence speed of the model is accelerated, and the stability of the model is improved.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention.
Fig. 2 is an overall flow chart of an embodiment of the present invention.
Fig. 3 is a diagram of domain information migration data enhancement effect according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a self-supervision domain clustering network according to an embodiment of the present invention.
Detailed Description
The invention provides a living body detection method based on self-supervision domain clustering and domain generalization, which comprises the following steps:
(1) Self-supervision domain cluster sample generation
And randomly selecting data samples belonging to different domains from the mixed source domain image, and generating positive and negative samples required by a self-supervision domain clustering algorithm by using a data augmentation method of domain information migration.
The method performs domain information migration between two images based on their low-frequency signals in the frequency domain; the specific steps are as follows:

1.1 Randomly sample two different face images from the mixed source domain dataset as the source image and the migration target image. Obtain the frequency domain signal of each image by Fourier transform, and decompose the amplitude component A(x) and the phase component P(x) from the frequency domain signal. The transformation is specifically expressed as:

A(x) = |F(x)|

P(x) = arg(F(x))

where x is an input source image or migration target image, F(·) is the Fourier transform function, and arg(·) takes the phase value of a complex number.
1.2 Filter the amplitude component of the source image's frequency domain signal with a Gaussian low-pass filter to obtain the low-frequency signal of the source image; filter the amplitude component of the migration target image's frequency domain signal with a Gaussian high-pass filter to obtain the high-frequency signal of the migration target image.
1.3 Fuse the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transform the mixed frequency domain signal back to a two-dimensional image by inverse Fourier transform to obtain the domain-information-migrated image x_DT. The specific migration process can be expressed as:

A_DT = λ1 · H_L ∘ A(x_s) + λ2 · H_H ∘ A(x_t)

x_DT = F^(−1)( A_DT · e^(j·P(x_s)) )

where x_s and x_t are the source image and the migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^(−1)(·) is the inverse Fourier transform function, λ1 and λ2 are hyperparameters, and j is the imaginary unit.
1.4, auxiliary enhancement is carried out on the source image and the domain information migration image by using a common contrast learning data enhancement method, and a final result is used as an input sample of the self-supervision clustering model.
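The frequency-domain migration of steps 1.1–1.3 can be sketched in NumPy for single-channel images; the function names, the Gaussian σ, and the default λ values are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def gaussian_filter(shape, sigma, lowpass=True):
    """Centered Gaussian mask over the fftshift-ed frequency plane; the
    high-pass mask is taken as the complement of the low-pass one."""
    h, w = shape
    yy, xx = np.mgrid[-(h // 2):(h + 1) // 2, -(w // 2):(w + 1) // 2]
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g if lowpass else 1.0 - g

def domain_transfer(x_s, x_t, lam1=1.0, lam2=1.0, sigma=10.0):
    """Blend the low-frequency amplitude of the source with the high-frequency
    amplitude of the target, keep the source phase, and invert the FFT."""
    F_s = np.fft.fftshift(np.fft.fft2(x_s))
    F_t = np.fft.fftshift(np.fft.fft2(x_t))
    A_s, P_s, A_t = np.abs(F_s), np.angle(F_s), np.abs(F_t)
    H_L = gaussian_filter(x_s.shape, sigma, lowpass=True)
    H_H = gaussian_filter(x_s.shape, sigma, lowpass=False)
    A_dt = lam1 * H_L * A_s + lam2 * H_H * A_t        # mixed amplitude A_DT
    x_dt = np.fft.ifft2(np.fft.ifftshift(A_dt * np.exp(1j * P_s)))
    return np.real(x_dt)
```

Because the phase (which carries the structural content) comes entirely from the source image, the output keeps the source's structure while its amplitude spectrum, and hence its low-level "domain" statistics, is mixed with the target's. With x_s = x_t and λ1 = λ2 = 1 the two masks sum to one and the routine reproduces its input, a quick round-trip sanity check.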
(2) Constructing self-supervision domain clustering network and training
Training by using a domain clustering method based on contrast learning, wherein the network structure of the domain clustering method consists of a double-branch backbone network, a feature mapping head, a feature dictionary and a feature clustering device;
the dual-branch backbone network adopts a ResNet-18 model, two branches are a query (query) coding network and a momentum (momentum) coding network respectively, and positive samples generated by domain information migration in the step 1 are respectively sent into the two branches to extract domain related characteristics. The dual-branch backbone network is initialized by adopting the same weight parameters, and weight parameter updating is carried out by using different methods; the weight parameters of the query coding network are updated in the training process by using gradient back propagation; the momentum coding network updates the weight parameters in a momentum updating mode, so that consistency of the front and rear extracted features is ensured.
The feature mapping head adopts a structure of a multi-layer perceptron (MLP) to perform feature mapping on features of two similar samples extracted by the double-branch encoder. The use of feature mapping helps the network identify domain-specific features of each input image and improves the generalization ability of the network over different faces.
The feature dictionary adopts a queue structure and is used for storing feature representations generated by previous batches through momentum network branches as negative sample features for comparison with current batch sample features.
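The feature dictionary's first-in-first-out behavior can be sketched with a bounded deque; the cache size is an illustrative assumption:

```python
import numpy as np
from collections import deque

class FeatureDictionary:
    """Bounded FIFO of past momentum-branch features, served as negative
    samples for the current batch."""
    def __init__(self, max_size=4096):
        self.queue = deque(maxlen=max_size)   # oldest entries drop out first

    def enqueue(self, feats):
        for f in feats:
            self.queue.append(np.asarray(f))

    def negatives(self):
        return np.stack(list(self.queue))
```

Because the entries were produced by the slowly-updated momentum branch, features enqueued several batches ago remain comparable with the current query features, which is what lets the queue serve as a large negative-sample pool.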
And the feature clustering device clusters the domain related features extracted by the momentum encoder by adopting a K-means clustering algorithm (K-means) to obtain a clustering center of each clustering cluster.
And optimizing the network by using the sample comparison learning loss function and the clustering center comparison learning loss function.
The training step of the self-supervision domain clustering network in the step 2 comprises the following steps:
2.1 Before each round of training starts, input all training samples X = {x_1, x_2, …, x_N} into the momentum encoding branch to obtain the domain-related feature representation Z = {z_1, z_2, …, z_N} of each sample; then cluster Z with the K-means method to obtain the cluster each sample belongs to and K cluster centers {c_1, c_2, …, c_K}.
2.2 For each training iteration, input the two domain-information-migrated samples generated in step 1 into the query encoding network f_q(·) and the momentum encoding network f_k(·) respectively, obtaining the feature representations h and h+ of the two same-domain samples:

h = f_q(x̃_1),  h+ = f_k(x̃_2)
2.3 Map the extracted domain-related features with the feature mapping head to obtain the final two contrast features q and k+:

q = g_q(h),  k+ = g_k(h+)

where g(·) is a multilayer perceptron (MLP) containing one hidden layer.
2.4 Calculate the similarity sim(q, k+) between the two positive-sample features, and the similarity sim(q, k_i) between the query feature q and the negative-sample features k_i stored in the feature dictionary; similarity is computed as the inner product of two features:

sim(v_i, v_j) = v_i · v_j
2.5 Calculate the similarity sim(q, c_q) between the query feature q and its own cluster center c_q, and the similarity sim(q, c_j) between q and the other cluster centers c_j.
2.6 Construct the final comprehensive loss function:

L_CL = L_s + λ · L_c

L_s is the sample contrast loss, used to pull positive samples together while pushing negative samples apart; it is the main optimization target of the self-supervised domain clustering network. L_c is the cluster contrast loss, used to pull together samples belonging to the same domain and push apart samples of different domains; it is an auxiliary optimization target of the network. λ is a balance coefficient. Both losses are constructed with the InfoNCE loss commonly used in contrastive learning:

L_s = −log( exp(sim(q, k+)/τ) / ( exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ) ) )

L_c = −log( exp(sim(q, c_q)/τ′) / Σ_{j=1}^{K} exp(sim(q, c_j)/τ′) )

where r and K are the number of negative samples and the number of cluster centers respectively, and τ and τ′ are temperature coefficients used to control the shape of the feature distribution.
2.7 Train the self-supervised domain clustering network by minimizing the composite loss function, and assign the final clustering result to each sample as its pseudo domain label. Specifically: divide the clustering result into K parts by cluster, each part representing one domain; number each domain starting from 1 as its pseudo domain label; and assign the corresponding pseudo domain label to every sample in the cluster.
(3) Training samples with pseudo-domain labels using domain-generalized living body detection model
The network structure of the domain generalization living body detection model comprises a feature generator, a feature classifier, a domain discriminator and a feature memory cache.
The feature generator produces a feature representation of each sample; the generated feature is fed into the feature classifier and the classification cross-entropy loss is computed. The feature representation is also input into the domain discriminator to obtain a domain discrimination result; the feature generator and the domain discriminator are trained adversarially, and the domain adversarial loss is computed to reduce the domain bias contained in the features. The generated features are concatenated with the features in the memory cache, the memory triplet loss is computed, and the feature memory cache is updated according to the result of this loss function. Finally, the domain generalization network model is optimized with the classification cross-entropy loss, the adversarial loss, and the memory triplet loss.
The feature generator uses the ResNet-18 model as the backbone network and is initialized with weight parameters pre-trained on ImageNet. The RGB face images sampled from the source domain are input into the feature generator, and the output features are normalized with L2 regularization to obtain the feature representation of the sample:

f_i^d = Norm(G(x_i^d))

where G is the feature generator, x_i^d is an input sample, d is the domain to which the sample belongs, and Norm is the L2 normalization function.
The domain discriminator is structured as a multilayer perceptron with one hidden layer. The domain discriminator D distinguishes the domain-related information of the generated features and predicts the domain label of the input sample. Using the pseudo domain labels generated in step 2 as supervision information, the adversarial loss L_ada is computed.
The network formed by the feature generator and the domain discriminator is optimized by adversarial training, specifically: when the adversarial loss is back-propagated, the gradient flowing from the domain discriminator into the feature generator is reversed, so that the two have opposite training targets; the domain discriminator is optimized while the domain-related information carried in the generated features is suppressed, reducing the domain bias of the features. The overall optimization objective of the adversarial training is:

min_D max_G L_ada(G, D) = −E_{(x, y_D) ∈ (X, Y_D)} Σ_{d=1}^{K} 1[y_D = d] · log D(G(x))

where X and Y_D are the sample set and the pseudo domain label set respectively; the adversarial loss is computed with a binary cross-entropy function.
The feature representations from the previous few iterations of the network are stored in a feature memory cache to enrich the diversity of the features participating in training. The features in the memory cache are concatenated with the features generated in the current iteration, and the memory triplet loss is computed:

L_MemTriplet = Σ_{f_a, f_p, f_n ∈ F_n ∪ M} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p, f_n are the anchor, positive, and negative sample features respectively, and α is a margin coefficient that controls how far apart the within-class and between-class distances must be.
To make the domain generalization network focus on the inter-class differences between real faces and forged faces rather than intra-class differences, an unbalanced feature-memory-cache update strategy is used: at every training iteration, the h hardest real-face sample features and all forged-face sample features, ranked by the memory triplet loss, are added to the feature memory cache.
The composite loss function of the domain-generalized living body detection model is:

L_dg = L_cls + λ_1·L_ada + λ_2·L_MemTriplet

where λ_1 and λ_2 are hyperparameters.
Examples:
as shown in fig. 1, a preferred embodiment of the present invention provides a living body detection model based on self-supervised domain clustering and domain generalization, which can generate pseudo domain labels on a mixed living body detection dataset lacking domain labels and perform domain-generalized living body detection training. As shown in fig. 2, the method specifically includes the following steps:
(1) The data augmentation method of domain information migration is used to generate positive and negative samples required by the self-supervision domain clustering algorithm.
The method for performing domain information migration between two images based on low-frequency signals on an image frequency domain comprises the following specific steps:
1.1 Randomly sample two different face images from the mixed source-domain dataset as the source image and the migration target image. In the embodiment of the invention, the mixed source-domain dataset is composed of sample sets from several different domains, each sample being a real or forged face RGB image. Apply the Fourier transform to each image to obtain its frequency-domain signal, and decompose the amplitude component and phase component from it:

F^A(x) = |F(x)|, F^P(x) = arg(F(x))

where x is an input source image or migration target image, F is the Fourier transform function, and arg(·) takes the phase of a complex number. In this example, to speed up computation, the frequency-domain conversion is performed with the Fast Fourier Transform (FFT).
1.2 Filter the amplitude component of the source image's frequency-domain signal with a Gaussian low-pass filter to obtain the low-frequency signal of the source image; filter the amplitude component of the migration target image's frequency-domain signal with a Gaussian high-pass filter to obtain its high-frequency signal. In this example, the cut-off frequency parameter D_0 of both the Gaussian low-pass filter and the Gaussian high-pass filter takes a value in [1, 5], which effectively extracts the domain information of the source image and the structure information of the target image.
1.3 Fuse the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transform the mixed frequency-domain signal back into a two-dimensional image with the inverse Fourier transform, obtaining the domain-information-migrated image x_DT. The migration process can be expressed as:

F^A_DT = λ_1·H_L(F^A(x_s)) + λ_2·H_H(F^A(x_t))

x_DT = F^{-1}( F^A_DT · e^{j·F^P(x_t)} )

where x_s and x_t are the source image and migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^{-1} is the inverse Fourier transform function, λ_1 and λ_2 are hyperparameters, and j is the imaginary unit.
FIG. 3 shows the effect of the domain-information-migration data enhancement method: the left column shows source face images; on the right, the first row shows the migration target face images and the second row shows the face images after domain information migration. The face images in the figure are selected from four public datasets: MSU-MFSD, CASIA-SURF, OULU-NPU, and Idiap Replay-Attack.
1.4 auxiliary data augmentation is performed on the source image and the domain information migration image, in the example of the invention, random image clipping, random horizontal flipping and random blur change are used for image data augmentation to improve the robustness of subsequent model training. And taking the final result as an input sample of the self-supervision domain clustering model.
(2) Constructing self-supervision domain clustering network and training
Training was performed using a domain clustering method based on contrast learning. Fig. 4 shows a specific structure of a Self-supervised domain clustered network (Self-Supervised Domain Clustering Network, SDCN). The SDCN is composed of a query coding network, a momentum coding network, a feature mapping head, a feature dictionary and a clustering module, and the network is optimized by using a sample comparison loss function and a clustering comparison loss function, and the training steps comprise:
2.1 Before each training round starts, all training samples X = {x_1, x_2, …, x_N} are fed into the momentum encoding branch to obtain a domain-related feature representation Z = {z_1, z_2, …, z_N} for each sample; Z is then clustered with the K-means method to obtain the cluster to which each sample belongs and the K cluster centers {c_1, c_2, …, c_K}. In the embodiment of the invention, to reduce the influence of noisy data on training, outlier points in the clustering result are discarded.
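A minimal stand-in for the K-means clustering of the momentum-branch features can be sketched as below. The deterministic farthest-point initialisation is an assumption, and the embodiment's outlier discarding is omitted for brevity.

```python
import numpy as np

def kmeans(Z, k, iters=50):
    """Minimal K-means over features Z (shape N x D): returns the cluster
    index of each sample and the k cluster centers."""
    # deterministic farthest-point initialisation (an assumption)
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    assign = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        # assign each sample to its nearest center, then recompute means
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = Z[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return assign, centers
```

The cluster indices returned here play the role of the pseudo domain labels assigned in step 2.7.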
2.2 For each training iteration, the domain-information-migrated sample pair x_q and x_k generated in step 1 is fed into the query encoding network f_q(·) and the momentum encoding network f_k(·) respectively. Both encoding networks use the ResNet-18 model; the two branches are initialized with the same weight parameters but updated by different methods: the weight parameters of the query encoding network are updated by gradient back-propagation during training, while the momentum encoding network updates its weight parameters by momentum update, which keeps the features it extracts consistent over time. In the embodiment of the invention, the momentum update of the weight parameters is w_k = m·w_k + (1 − m)·w_q, with m = 0.999. The dual-branch encoding network yields the feature representations h, h+ of the two same-domain samples:

h = f_q(x_q), h+ = f_k(x_k)
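The momentum (exponential moving average) update above can be sketched per parameter tensor; the list-of-tensors representation is an illustrative simplification.

```python
def momentum_update(w_k, w_q, m=0.999):
    """EMA update of the momentum-encoder weights from the query-encoder
    weights: w_k <- m*w_k + (1 - m)*w_q, applied element-wise to each
    parameter in the two weight lists."""
    return [m * wk + (1.0 - m) * wq for wk, wq in zip(w_k, w_q)]
```

With m close to 1, the momentum encoder changes slowly, which is what keeps the negative features stored in the dictionary consistent with the current query features.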
2.3 The extracted domain-related features are mapped with a feature mapping head, maximizing the SDCN's ability to identify same-domain samples, finally obtaining the two contrast features q and k+:

q = g_q(h), k+ = g_k(h+)

where g(·) is a multilayer perceptron (MLP) with one hidden layer and an output dimension of 128.
2.4 Compute the similarity sim(q, k+) between the two positive sample features. The feature dictionary stores, as negative sample features k_i, the feature representations generated by previous batches through the momentum branch; compute the similarity sim(q, k_i) between q and each of them. Similarity is computed as the inner product of the two features:

sim(v_i, v_j) = v_i · v_j
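The queue behaviour of the feature dictionary and the inner-product similarity can be sketched as follows; the class name, capacity handling, and API are assumptions for illustration.

```python
from collections import deque
import numpy as np

class FeatureDictionary:
    """FIFO queue of momentum-branch features used as negatives: when the
    queue is full, the oldest batch's features are evicted first."""
    def __init__(self, max_size):
        self.queue = deque(maxlen=max_size)

    def enqueue(self, feats):
        for f in feats:
            self.queue.append(np.asarray(f, dtype=float))

    def negatives(self):
        """All stored negative features k_i, stacked into an array."""
        return np.stack(list(self.queue)) if self.queue else np.empty((0,))

def sim(v_i, v_j):
    """Inner-product similarity: sim(v_i, v_j) = v_i . v_j."""
    return float(np.dot(v_i, v_j))
```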
2.5 Compute the similarity sim(q, c_q) between the query sample feature q and its own cluster center c_q, and the similarity sim(q, c_j) between q and every other cluster center c_j.
2.6 Construct the final composite loss function as follows:

L_CL = L_s + λ·L_c

L_s is the sample contrast loss, used to pull positive samples closer while pushing negative samples apart, and serves as the main optimization target of the self-supervised domain clustering network; L_c is the cluster contrast loss, used to pull together samples belonging to the same domain and push apart samples from different domains, and serves as an auxiliary optimization target of the network; λ is a balance coefficient. Both contrast losses are constructed with the InfoNCE loss function commonly used in contrast learning:

L_s = −log( exp(sim(q, k+)/τ) / (exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ)) )

L_c = −log( exp(sim(q, c_q)/τ′) / Σ_{j=1}^{K} exp(sim(q, c_j)/τ′) )

where r and K are the number of negative samples and the number of cluster centers respectively; τ and τ′ are temperature coefficients used to control the shape of the feature distribution. Optimizing this contrast loss gradually enlarges the separation, in the high-dimensional feature space, between samples from different latent domains.
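A numerical sketch of the two InfoNCE terms follows; the default temperature value of 0.07 is an assumption, as the patent does not state one.

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """Sample contrast loss L_s: negative log-softmax of the positive
    similarity over {positive} U {negatives}, with temperature tau."""
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()               # subtract max for numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

def cluster_nce(q, centers, own_idx, tau=0.07):
    """Cluster contrast loss L_c: softmax over the K cluster centers, with
    the query's own cluster center c_q as the positive."""
    logits = centers @ q / tau
    logits -= logits.max()
    return -np.log(np.exp(logits[own_idx]) / np.exp(logits).sum())
```

When the positive similarity dominates the negatives, both losses approach zero, which is the behaviour the optimization target pushes toward.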
2.7 Train the self-supervised domain clustering network by minimizing the composite loss function, and assign the final clustering result to each sample as its pseudo domain label. Specifically: divide the clustering result into K parts by cluster, each part representing one domain; number each domain starting from 1 as its pseudo domain label; and assign the corresponding pseudo domain label to every sample in the cluster.
(3) Training samples with pseudo-domain labels using domain-generalized living body detection model
The face samples assigned pseudo domain labels in step 2 are used as the source-domain training dataset and fed into the domain-generalized living body detection model for training. By modeling and learning the intra-domain and inter-domain differences of data from different domains, the model adapts better to data from different domains, reduces the influence of sample domain shift, and generates more effective face liveness features, thereby improving its generalization ability.
First, the face liveness feature representation of a sample is extracted with the feature generator. The feature generator uses the ResNet-18 model as the backbone network and is initialized with weight parameters pre-trained on ImageNet. The output features are normalized with L2 regularization to obtain the feature representation of the sample:

f_i^d = Norm(G(x_i^d))

where G is the feature generator, x_i^d is an input sample, d is the domain to which the sample belongs, and Norm is the L2 normalization function.
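The Norm(·) step can be sketched as below; the epsilon guard against division by zero is an added safety, not part of the text.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each feature vector (last axis) to unit L2 length."""
    return features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
```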
Because the samples come from different latent domains, the generated face liveness features differ greatly in their distribution in the feature space. In the example of the invention, domain adversarial training is used to match the distributions among the multiple source domains, guiding the feature generator to produce domain-invariant features of the samples. The domain adversarial training module is built by adding a domain discriminator after the feature generator. The domain discriminator is a multilayer perceptron with one hidden layer; the domain discriminator D distinguishes the domain-related information in the generated features and predicts the domain label of the input sample. Then, using the pseudo domain labels generated in step 2 as supervision information, the adversarial loss L_ada is computed. The adversarial training is carried out through gradient back-propagation, specifically as follows:

The domain discriminator's ability to discriminate the domain of a feature is optimized by gradient back-propagation. A gradient reversal layer is inserted between the domain discriminator and the feature generator, reversing the gradient passed to the feature generator so that the two have opposite training targets; this suppresses the domain-related information carried in the generated features and reduces their domain bias.
The whole adversarial training process can be expressed as the following optimization problem:

min_D max_G L_ada(G, D) = −E_{(x, y_D) ∈ (X, Y_D)} Σ_{d=1}^{K} 1[y_D = d] · log D(G(x))

where X and Y_D are the sample set and the pseudo domain label set respectively; the adversarial loss is computed with a cross-entropy function. The adversarial training proceeds by minimizing the training error of the domain discriminator while maximizing the generation error of the feature generator.
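The gradient reversal layer described above can be sketched conceptually as an operation that is the identity in the forward pass and flips (and optionally scales) the gradient in the backward pass. This manual forward/backward pair is only illustrative; real frameworks implement it as a custom autograd function.

```python
import numpy as np

class GradReverse:
    """Conceptual gradient-reversal layer: forward is the identity,
    backward multiplies the incoming gradient by -beta, so the feature
    generator receives a gradient opposing the domain discriminator."""
    def __init__(self, beta=1.0):
        self.beta = beta

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.beta * grad_output
```

Inserted between generator and discriminator, this single layer turns an ordinary classification loss into the min-max objective above without needing two separate optimization loops.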
For the living body detection task, we expect the distributions of real faces and forged faces in the feature space to be as far apart as possible, so a triplet loss function is typically used to assist model training. In the prior art, only features from the same batch are used to compute the triplet loss, so earlier features cannot be exploited for optimization. To enrich the diversity of features participating in training and improve the convergence speed and stability of model training, the feature representations of previous iterations are stored in a feature memory cache; the features in the memory cache are concatenated with the features generated in the current iteration, and the memory triplet loss is computed:

L_MemTriplet = Σ_{f_a, f_p, f_n ∈ F_n ∪ M} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p, f_n are the anchor, positive, and negative sample features respectively, and α is a margin coefficient that controls how far apart the within-class and between-class distances must be.
The triplet loss helps encode same-class samples into nearby regions of the feature space and different-class samples into distant regions. However, because real faces are diverse in skin color, environment, makeup, and other factors, paying too much attention to the distances among real-face features tends to cause model overfitting. To make the model focus on the inter-class differences between real faces and forged faces rather than intra-class differences, the embodiment of the invention uses an unbalanced feature-memory-cache update strategy, specifically: at every training iteration, the h hardest real-face sample features and all forged-face sample features, ranked by the memory triplet loss, are added to the feature memory cache; in this embodiment, h = 10.
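The unbalanced cache update can be sketched as an index-selection step. The label convention (1 = real face, 0 = forged face) and the per-sample-loss ranking are assumptions for illustration.

```python
import numpy as np

def update_memory(per_sample_loss, labels, h=10):
    """Return the indices to add to the feature memory cache: the h
    highest-loss (hardest) real-face samples plus all forged-face samples."""
    real = np.where(labels == 1)[0]
    fake = np.where(labels == 0)[0]
    hardest_real = real[np.argsort(per_sample_loss[real])[::-1][:h]]
    return np.concatenate([hardest_real, fake])
```

Keeping only the hardest real faces while keeping every forged face is what biases the cache, and hence the triplet loss, toward inter-class rather than intra-class separation.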
Finally, the generated features are fed into the liveness feature classifier to predict the living body detection result, and the binary cross-entropy loss L_cls is computed.
The composite loss function of the domain generalization adversarial network in the embodiment of the invention is:

L_dg = L_cls + λ_1·L_ada + λ_2·L_MemTriplet

where λ_1 and λ_2 are hyperparameters. Domain generalization training is performed by optimizing this loss function, enabling the model to generate domain-invariant face liveness features and generalize to unknown domains.
The living body detection network after training convergence can be applied to living body detection tasks of actual RGB face images, and has better generalization capability. In the test stage, a feature generator of the domain-generalized living body detection network is used for acquiring a feature representation of a sample, and the feature representation is sent to a feature classifier to obtain a final living body detection result. The in-vivo detection effect of the present example was verified by a specific experiment as follows.
Experimental data: four published biopsy data sets were used to evaluate the effectiveness of the method of the invention, respectively: oulu-NPU (O), CASIA-MFSD (C), IDIAP ReplayAttack (I), and MSU-MFSD (M). And preprocessing the face videos provided in the four data sets by using an MTCNN model, taking a part of effective frames as experimental images, cutting out face areas from the experimental images, and obtaining 32800 pieces of data in total. In the experiment, three data sets are randomly selected and combined into a large mixed source domain data set, the data in the mixed source domain data set does not contain any domain label, and the rest is used as a target domain for testing, so that cross-data-set testing is performed. Thus, the experiment contained a total of 4 test tasks: o & C & I to M, O & M & I to C, O & C & M to I, I & C & M to O.
Experimental parameters: in the experiments, the model of the embodiment of the invention was trained and tested on 4 RTX 2080Ti GPUs. The batch size of the self-supervised domain clustering network is 256, the number of clusters K is 4, and the learning rate is set to 0.03; the batch size of the domain generalization network is set to 60 and its learning rate to 0.01. The whole model uses Stochastic Gradient Descent (SGD) as the optimizer, with the momentum value set to 0.9.
Evaluation index: the living body detection task needs to consider the respective classification error conditions of the attack sample and the real sample, so the embodiment of the invention uses a half total error rate (Half Total Error Rate, HTER) and an Area Under ROC Curve (AUC) as experimental evaluation indexes.
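HTER can be sketched as the mean of the false-acceptance rate (attack samples accepted as live) and the false-rejection rate (live samples rejected as attacks) at a fixed threshold. The score and label conventions (higher score = live, label 1 = live) and the 0.5 threshold are assumptions.

```python
import numpy as np

def hter(scores, labels, threshold=0.5):
    """Half Total Error Rate = (FAR + FRR) / 2 at the given threshold."""
    pred_live = scores >= threshold
    live = labels == 1
    attack = labels == 0
    far = np.mean(pred_live[attack]) if attack.any() else 0.0  # attacks passed
    frr = np.mean(~pred_live[live]) if live.any() else 0.0     # lives rejected
    return (far + frr) / 2.0
```

For example, with one of two attacks accepted and no live samples rejected, FAR = 0.5 and FRR = 0, giving HTER = 0.25.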
Experimental results:
(1) Comparison with common Living detection models
The method of the embodiment of the invention was compared with several common non-domain-generalized living body detection methods: MS_LBP, Binary CNN, Color Texture, LBPTOP, and Auxiliary. The experimental results are shown in Table 1.
Table 1 comparison of the experimental results of the present method with the common living detection method on four test tasks
From the experimental results in table 1, it can be seen that the method in the example of the present invention has obvious advantages in both evaluation indexes compared with the method of training only on a single domain, which indicates that our model can extract domain invariant features related to living human face from multi-source domain data, thereby generalizing to unknown domains.
(2) Comparison with a homeodomain generalization Living detection method
We chose several existing domain generalization living body detection methods: MMD-AAE, MADDG, D²AM, DRDG, ANRL, and SSDG. A comparative experiment was carried out against the method of the embodiment of the invention, and the results are shown in Table 2.
Table 2 comparison of experimental results of the present method with the existing domain generalization living body detection method on four test tasks
It can be seen that the model achieves performance close to the best results of similar models on all 4 test tasks, showing that it can efficiently extract domain-invariant liveness features and has strong generalization ability; moreover, unlike most existing domain-generalized living body detection models, it uses no pre-annotated domain labels in the training stage.
In addition, unlike existing methods that use the dataset identity as the domain label, our method uses self-supervised domain clustering to generate pseudo domain labels. On the O&C&I to M test task, our method surpasses the other existing domain generalization models on both evaluation indexes, which suggests that, to a certain extent, this module partitions the domains of different face samples more finely and accurately.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, can run the inventive content of the living body detection method based on self-supervised domain clustering and domain generalization, and some or all of the steps in each embodiment. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

It will be apparent to those skilled in the art that the technical solutions in the embodiments of the present invention may be implemented by means of a computer program and a corresponding general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied essentially in the form of a computer program, i.e., a software product, which may be stored in a storage medium and include several instructions to cause a device including a data processing unit (which may be a personal computer, a server, a single-chip microcomputer, an MCU, a network device, or the like) to perform the methods described in the embodiments or some parts of the embodiments of the present invention.
The present invention provides the idea and method of a living body detection method based on self-supervised domain clustering and domain generalization; there are many ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be pointed out that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these should also be regarded as falling within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented with existing technology.

Claims (10)

1. A living body detection method based on self-supervised domain clustering and domain generalization, characterized in that pseudo domain labels of face samples are generated by self-supervised domain clustering, the sample data with pseudo domain labels are trained with a domain generalization method, and the classification result of living body detection is obtained, specifically comprising the following steps:

step 1, generating self-supervised domain clustering samples: randomly selecting data belonging to different domains from the images of a mixed source-domain dataset as samples, and performing domain information migration between two images based on low-frequency signals in the image frequency domain to generate positive and negative samples;

step 2, constructing a self-supervised domain clustering network and training it: training with a domain clustering method based on contrast learning, constructing a learning task, extracting domain-related features from the positive and negative samples containing domain information obtained in step 1, and performing domain clustering according to the distribution of the samples in the feature space;

step 3, training the samples with pseudo domain labels using a domain-generalized living body detection model: using the samples assigned pseudo domain labels according to the clustering result as the source-domain dataset, extracting the domain-invariant liveness features of the samples with a domain generalization method based on adversarial training, and optimizing the domain-generalized living body detection model with the classification cross-entropy loss, the adversarial loss, and the memory triplet loss; the final output of the trained domain-generalized living body detection model is taken as the living body detection classification result, completing living body detection based on self-supervised domain clustering and domain generalization.
2. The living body detection method based on self-supervised domain clustering and domain generalization according to claim 1, wherein the domain information migration between two images based on low-frequency signals in the image frequency domain in step 1 specifically comprises:

step 1-1, randomly sampling two different face images from the mixed source-domain dataset as the source image and the migration target image, obtaining the frequency-domain signals of the two images with the Fourier transform, and decomposing the amplitude component F^A(x) and the phase component F^P(x) from them, specifically:

F^A(x) = |F(x)|, F^P(x) = arg(F(x))

where x is an input source image or migration target image, F is the Fourier transform function, and arg(·) takes the phase of a complex number;
step 1-2, filtering amplitude components of frequency domain signals of a source image by using a Gaussian low-pass filter to obtain low-frequency signals of the source image; filtering amplitude components of the frequency domain signals of the migration target image by using a Gaussian high-pass filter to obtain high-frequency signals of the migration target image;
step 1-3, fusing the low-frequency signal of the source image and the high-frequency signal of the migration target image in proportion, and transforming the mixed frequency-domain signal back into a two-dimensional image with the inverse Fourier transform to obtain the domain-information-migrated image x_DT, specifically:

F^A_DT = λ_1·H_L(F^A(x_s)) + λ_2·H_H(F^A(x_t))

x_DT = F^{-1}( F^A_DT · e^{j·F^P(x_t)} )

where F^A_DT is the amplitude component after domain information migration, x_s and x_t are the source image and migration target image, H_L and H_H are the Gaussian low-pass and high-pass filters, F^{-1} is the inverse Fourier transform function, λ_1 and λ_2 are hyperparameters, and j is the imaginary unit;
step 1-4, performing auxiliary enhancement on a source image and a domain information migration image by using a contrast learning data enhancement method to obtain positive and negative samples; the positive sample is an image subjected to domain information migration and contains domain information similar to the source image; the negative samples are other target images from different domain distributions.
3. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 2, wherein the self-supervision domain clustering network in step 2 consists of a dual-branch backbone network, a feature mapping head, a feature dictionary and a feature clustering device;
the method comprises the steps that a ResNet-18 network model is adopted in the dual-branch trunk network, two branches are respectively a query coding network and a momentum coding network, and positive samples obtained in the step 1 are respectively sent into the two branches to extract domain related features; mapping the extracted domain related features by using a feature mapping head to obtain comparison features q and k + The method comprises the steps of carrying out a first treatment on the surface of the The two branches are initialized by adopting the same weight parameters, and the weight parameters are updated by using different methods, wherein the weight parameters of the query coding network are updated in the training process by using gradient back propagation; the momentum coding network updates the weight parameters by using a momentum updating mode, wherein the momentum updating mode of the weight parameters is as follows:
w_k = m·w_k + (1 − m)·w_q

where w_k are the weights of the momentum encoding network, w_q are the weights of the query encoding network, and m is a momentum coefficient taking a value between 0 and 1;
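The momentum update above can be sketched as a toy version over flat weight lists (real implementations apply the same rule tensor-by-tensor):

```python
def momentum_update(w_k, w_q, m=0.99):
    """EMA update of the momentum-encoder weights w_k toward the query-encoder
    weights w_q: w_k <- m*w_k + (1-m)*w_q (sketch; weights as flat floats)."""
    return [m * k + (1.0 - m) * q for k, q in zip(w_k, w_q)]
```

With m close to 1, the momentum encoder changes slowly, which keeps the negative features stored in the dictionary queue consistent across batches.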
the feature mapping head adopts a multi-layer perceptron structure and performs feature mapping on the features of the two similar samples extracted by the dual-branch encoder;
the feature dictionary adopts a queue structure and stores the feature representations generated by previous batches through the momentum branch, which serve as negative-sample features for contrast with the features of the current batch;
the feature clusterer applies the K-means clustering algorithm to the domain-related features extracted by the momentum encoder to obtain the center of each cluster;
the self-supervision domain clustering network is optimized using a sample contrastive loss function and a cluster contrastive loss function.
4. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 3, wherein training the self-supervision domain clustering network in step 2 comprises the following steps:
step 2-1, calculating the cluster to which each sample belongs and the cluster centers;
step 2-2, calculating the similarity between contrast features;
step 2-3, constructing the comprehensive loss function of the self-supervision domain clustering network and training the network by computing this loss function.
5. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 4, wherein calculating the cluster to which each sample belongs and the cluster centers in step 2-1 specifically comprises:
before each training round starts, feeding all training samples X = {x_1, x_2, …, x_N} into the momentum encoding branch to obtain the domain-related feature representation Z = {z_1, z_2, …, z_N}, where N is the number of samples; then clustering the feature representations Z with the K-means method to obtain the cluster to which each sample belongs and the K cluster centers {c_1, c_2, …, c_K}.
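The per-round clustering of step 2-1 can be sketched with a plain K-means over the momentum features. This is a simplified illustration; the function name and the deterministic initialization are not from the patent:

```python
import numpy as np

def kmeans_clusters(Z, K, iters=20):
    """Plain K-means over domain-related features Z (N x d): returns the
    cluster index of each sample and the K cluster centers (sketch)."""
    # deterministic spread initialization: evenly spaced samples as centers
    idx = np.linspace(0, len(Z) - 1, K).astype(int)
    centers = Z[idx].astype(float).copy()
    for _ in range(iters):
        # assign each sample to its nearest center
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned samples
        for k in range(K):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels, centers
```

The resulting cluster indices are what later become the pseudo domain labels, and the centers feed the cluster contrastive loss.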
6. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 5, wherein calculating the similarity between contrast features in step 2-2 specifically comprises:
calculating the similarity sim(q, k+) between the two positive-sample features q and k+; taking the feature representations generated by previous batches through the momentum branch and stored in the feature dictionary as negative-sample features k_i and calculating the similarity sim(q, k_i); the similarity is computed as the inner product of the two features.
7. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 6, wherein constructing the comprehensive loss function of the self-supervision domain clustering network in step 2-3 specifically comprises:
constructing the sample contrastive loss L_s and the cluster contrastive loss L_c as follows:

L_s = −log [ exp(sim(q, k+)/τ) / ( exp(sim(q, k+)/τ) + Σ_{i=1}^{r} exp(sim(q, k_i)/τ) ) ]

L_c = −log [ exp(sim(q, c_s)/φ) / Σ_{j=1}^{K} exp(sim(q, c_j)/φ) ]

where r is the number of negative samples, c_s is the center of the cluster to which the sample belongs, and τ and φ are temperature coefficients used to control the shape of the feature distribution;
the comprehensive loss function of the self-supervision domain clustering network is then constructed as:

L_CL = L_s + λ·L_c

where λ is a balance coefficient;
the self-supervision domain clustering network is trained by computing this comprehensive loss function, and the result of the self-supervision domain clustering is then assigned to each sample as a pseudo domain label, specifically: the clustering result is divided into K parts according to the clusters, each part representing one domain; each domain is assigned a domain number starting from 1 as its pseudo domain label; and each sample of a cluster is assigned the corresponding pseudo domain label.
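The sample and cluster contrastive losses of this claim can be sketched as InfoNCE-style terms. This is a hedged reconstruction consistent with standard contrastive learning; the exact patented formulas may differ, and the function name and argument layout are illustrative:

```python
import numpy as np

def contrastive_losses(q, k_pos, k_neg, centers, own_center,
                       tau=0.07, phi=0.07, lam=1.0):
    """Sample-level and cluster-level contrastive losses (sketch).
    q, k_pos: contrast features from the two branches; k_neg: r negative
    features from the dictionary queue; centers: K cluster centers;
    own_center: index of the cluster q's sample belongs to."""
    def nce(anchor, pos, negs, t):
        # inner-product similarities scaled by a temperature coefficient
        logits = np.concatenate(([anchor @ pos], negs @ anchor)) / t
        logits -= logits.max()  # numerical stability
        return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
    L_s = nce(q, k_pos, k_neg, tau)                       # sample contrast
    others = np.delete(centers, own_center, axis=0)       # other cluster centers
    L_c = nce(q, centers[own_center], others, phi)        # cluster contrast
    return L_s + lam * L_c, L_s, L_c
```

Both terms pull q toward its positive (the migrated view, or its own cluster center) and push it away from queue negatives or the other centers.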
8. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 7, wherein the network structure of the domain-generalized living body detection model in step 3 comprises: a feature generator, a feature classifier, a domain discriminator and a feature memory cache;
the feature generator produces a feature representation of each sample; the generated features are fed into the feature classifier and the classification cross-entropy loss is computed; the feature representation is also input into the domain discriminator to obtain a domain discrimination result, the feature generator and the domain discriminator are trained in an adversarial manner, and the domain adversarial loss is computed to reduce the domain bias contained in the features; the generated features are spliced with the features in the memory cache, the memory triplet loss is computed, and the feature memory cache is updated according to the result of this loss function; finally, the domain-generalized living body detection model is optimized using the classification cross-entropy loss, the adversarial loss and the memory triplet loss.
9. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 8, wherein the adversarial loss and the memory triplet loss in step 3 are specifically calculated as follows:
the network formed by the feature generator and the domain discriminator is optimized in an adversarial training manner: when the adversarial loss is back-propagated, the gradient is reversed between the domain discriminator and the feature generator, so that the training objectives of the two are opposite; the overall adversarial objective is:

min_D max_G L_ada(G, D) = −E_{(x, y) ∼ (X, Y_D)} Σ_{i=1}^{K} 1[i = y] · log D_i(G(x))

where X and Y_D are the sample set and the pseudo-domain label set respectively, G is the feature generator, D is the domain discriminator, K is the number of source domains, 1[i = y] is the indicator function taking the value 1 when i = y and 0 otherwise, and y is the domain label of sample x; the adversarial loss is computed using a binary cross-entropy function;
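The discriminator-side loss can be sketched as follows. This matches the indicator-sum form above as a K-way categorical cross-entropy over the K pseudo-domains (a sketch; the patent states a binary cross-entropy is used, and a gradient reversal layer would multiply the upstream gradient by −1 before it reaches the generator):

```python
import numpy as np

def domain_adversarial_loss(logits, y):
    """Cross-entropy seen by the domain discriminator (sketch).
    logits: (N, K) discriminator outputs for K pseudo-domains;
    y: (N,) pseudo domain labels. The generator is trained on the
    reversed gradient of this same loss."""
    # stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # indicator 1[i == y] selects the log-probability of the true domain
    return -logp[np.arange(len(y)), y].mean()
```

Because the generator receives the negated gradient, it learns features that the discriminator cannot separate by domain, which is the intended domain-bias reduction.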
the feature memory cache stores the feature representations from previous iterations of the domain-generalized living body detection model; the features in the memory cache are spliced with the features generated in the current iteration, and the memory triplet loss L_MemTriplet is computed as:

L_MemTriplet = Σ_{f_a ∈ F_n} max( ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α, 0 )

where F_n is the feature set of the current iteration, M is the feature set in the memory cache, f_a, f_p and f_n are the anchor, positive and negative sample features respectively (with positives and negatives drawn from F_n ∪ M), and α is the margin coefficient;
an unbalanced feature-memory-cache update strategy is used: at each training iteration, the top h hard real-face sample features and all spoofed-face sample features, ranked by the memory triplet loss results, are added to the feature memory cache.
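The memory triplet loss over the spliced feature set can be sketched with batch-hard mining (hardest positive and hardest negative per anchor). This is one common instantiation of a triplet loss and a simplified illustration; the patent's exact mining rule may differ:

```python
import numpy as np

def memory_triplet_loss(F_n, labels_n, M, labels_m, alpha=0.3):
    """Triplet loss over current-batch features F_n spliced with the memory
    cache M (sketch): for each current-batch anchor, take the hardest
    positive and hardest negative in the concatenated set, hinged at alpha."""
    feats = np.concatenate([F_n, M])
    labels = np.concatenate([labels_n, labels_m])
    d = np.linalg.norm(feats[None] - feats[:, None], axis=2)  # pairwise L2
    loss, n_a = 0.0, len(F_n)
    for a in range(n_a):  # anchors come from the current iteration only
        pos = labels == labels[a]
        pos[a] = False                      # exclude the anchor itself
        neg = labels != labels[a]
        if pos.any() and neg.any():
            # hardest positive (farthest) minus hardest negative (closest)
            loss += max(d[a][pos].max() - d[a][neg].min() + alpha, 0.0)
    return loss / n_a
```

Splicing in cached features gives each anchor far more positives and negatives than one mini-batch provides, which is the point of the memory cache.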
10. The living body detection method based on self-supervision domain clustering and domain generalization according to claim 9, wherein optimizing the domain-generalized living body detection model using the classification cross-entropy loss, the adversarial loss and the memory triplet loss in step 3 means constructing the comprehensive loss function of the domain-generalized living body detection model as:

L_dg = L_cls + λ_1·L_ada + λ_2·L_MemTriplet

where L_cls is the binary cross-entropy loss for face liveness classification, and λ_1, λ_2 are hyperparameters.
CN202310287952.5A 2023-03-23 2023-03-23 Living body detection method based on self-supervision domain clustering and domain generalization Pending CN116403290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310287952.5A CN116403290A (en) 2023-03-23 2023-03-23 Living body detection method based on self-supervision domain clustering and domain generalization


Publications (1)

Publication Number Publication Date
CN116403290A true CN116403290A (en) 2023-07-07

Family

ID=87006712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310287952.5A Pending CN116403290A (en) 2023-03-23 2023-03-23 Living body detection method based on self-supervision domain clustering and domain generalization

Country Status (1)

Country Link
CN (1) CN116403290A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910556A (en) * 2023-07-24 2023-10-20 润联智能科技股份有限公司 Power plant equipment abnormality detection method, training device, equipment and medium
CN116630727A (en) * 2023-07-26 2023-08-22 苏州浪潮智能科技有限公司 Model training method, deep pseudo image detection method, device, equipment and medium
CN116630727B (en) * 2023-07-26 2023-11-03 苏州浪潮智能科技有限公司 Model training method, deep pseudo image detection method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Xu et al. Model-driven deep-learning
CN112766158B (en) Multi-task cascading type face shielding expression recognition method
CN116403290A (en) Living body detection method based on self-supervision domain clustering and domain generalization
CN111080513B (en) Attention mechanism-based human face image super-resolution method
CN112949837A (en) Target recognition federal deep learning method based on trusted network
Wang et al. Industrial cyber-physical systems-based cloud IoT edge for federated heterogeneous distillation
Du et al. Age factor removal network based on transfer learning and adversarial learning for cross-age face recognition
Ibitoye et al. Differentially private self-normalizing neural networks for adversarial robustness in federated learning
CN111259264B (en) Time sequence scoring prediction method based on generation countermeasure network
CN114282059A (en) Video retrieval method, device, equipment and storage medium
Lin et al. The design of error-correcting output codes based deep forest for the micro-expression recognition
CN112308093A (en) Air quality perception method based on image recognition, model training method and system
CN115457374B (en) Deep pseudo-image detection model generalization evaluation method and device based on reasoning mode
Duan et al. SSGD: A safe and efficient method of gradient descent
CN112668401B (en) Face privacy protection method and device based on feature decoupling
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium
Wu et al. Unsupervised domain adaptation for disguised face recognition
Wu et al. Learning domain-invariant representation for generalizing face forgery detection
CN111476267A (en) Method and electronic device for classifying drug efficacy according to cell image
Jaswanth et al. Deep learning based intelligent system for robust face spoofing detection using texture feature measurement
CN116721315B (en) Living body detection model training method, living body detection model training device, medium and electronic equipment
CN117040939B (en) Vehicle-mounted network intrusion detection method based on improved visual self-attention model
Cao et al. Federated Learning Based on Feature Fusion
Zhou et al. Deep fuzzy classification by stacked architecture for epileptic electroencephalograms signals
Sun et al. Texture-guided multiscale feature learning network for palmprint image quality assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination