CN116189255A - Face living body detection method based on generation type domain adaptation - Google Patents
Face living body detection method based on generation type domain adaptation
- Publication number: CN116189255A
- Application: CN202211571186.7A
- Authority: CN (China)
- Prior art keywords: image, living body, body detection, loss, domain
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
- G06F17/14 — Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06N3/04 — Neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; learning methods
- G06V10/764 — Image or video recognition using classification, e.g. of video objects
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition using neural networks
- G06V40/45 — Spoof detection; detection of the body part being alive
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a face living body detection method based on generative domain adaptation, which comprises the following steps: building a living body detection model and a generator; carrying out intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images alongside the original target images; stylizing both the diversified target images and the original target images into source-style images through image conversion, with inter-domain neural-statistics consistency adopted to guide the generator to generate the source-style images; and inputting the source-style images into the living body detection model to carry out face living body detection and output the face living body detection result. The invention solves the problems that existing face living body detection methods receive insufficient supervision from unlabeled target domains, and that most prior work focuses mainly on aligning high-level semantic features while ignoring the low-level features of the face living body detection task.
Description
Technical Field
The invention relates to the technical field of face living body detection, and in particular to a face living body detection method based on generative domain adaptation.
Background
Face living body detection (face anti-spoofing, FAS) aims to determine whether a face image comes from a real person or from one of various face presentation attacks. Early works addressed this problem with hand-crafted features such as SIFT, LBP, and HOG. Several methods exploit information from different domains, such as the HSV and YCrCb color spaces, the temporal domain, and the Fourier spectrum. Recent approaches use CNNs to model FAS as binary classification or with auxiliary supervision such as depth maps, reflection maps, and rPPG signals. Other approaches employ disentanglement and customized operators to improve performance. Although these methods achieve good results in intra-dataset training, their performance on the target domain still degrades significantly due to large domain shifts.
To improve performance in cross-domain settings, domain generalization (DG) has been introduced into FAS tasks. However, DG-FAS methods aim to map samples into a common feature space and lack specific information about the unseen domain, inevitably yielding unsatisfactory results. Recent studies on UDA-FAS rely primarily on pseudo-labeling, adversarial learning, or minimizing domain discrepancy to reduce domain shift. However, they still receive insufficient supervision from the unlabeled target domain, which may lead to negative transfer of the source model. Furthermore, most of this work focuses on aligning high-level semantic features while ignoring the low-level features that are critical to FAS tasks.
Disclosure of Invention
To address these shortcomings, the invention provides a face living body detection method based on generative domain adaptation, which aims to solve the problems that existing face living body detection methods receive insufficient supervision from unlabeled target domains, and that most prior work focuses mainly on aligning high-level semantic features while ignoring the low-level features of the face living body detection task.
To achieve the purpose, the invention adopts the following technical scheme:
a face living body detection method based on generation domain adaptation comprises the following steps:
step S1: building a living body detection model and a generator;
step S2: carrying out intra-domain spectrum mixing on the images of the unlabeled target domains to generate diversified target images and original target images;
step S3: the diversified target images and the original target images are both stylized into source style images through image conversion, and inter-domain nerve statistics consistency is adopted to guide a generator to generate the source style images;
step S4: and inputting the source pattern image into a living body detection model to carry out living body detection of the human face, and outputting a result of living body detection of the human face.
Preferably, in step S2, the following steps are specifically included:
step S21: calculating the two-dimensional discrete Fourier transform F(x_t) of a target image x_t ∈ D_t, with the specific formula as follows:

F(x_t)(u,v) = Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} x_t(h,w) · e^{-j2π(hu/H + wv/W)}

where F(x_t)(u,v) is the Fourier transform of the target image x_t, u and v are the frequency variables in the frequency domain, H is the maximum height of the image, W is the maximum width of the image, h is the image height index, and w is the image width index;

step S22: decomposing the Fourier transform F(x_t) of the target image x_t ∈ D_t into its amplitude spectrum and phase spectrum, with the specific formulas as follows:

A(x_t)(u,v) = [R^2(x_t)(u,v) + I^2(x_t)(u,v)]^{1/2}
P(x_t)(u,v) = arctan[I(x_t)(u,v) / R(x_t)(u,v)]

where A(x_t)(u,v) is the amplitude, P(x_t)(u,v) is the phase, R(x_t) is the real part of F(x_t), and I(x_t) is the imaginary part of F(x_t);

step S23: interpolating the amplitude spectra of two arbitrary images x_t^a and x_t^b from the same unlabeled target domain D_t, with the specific formula as follows:

Â(x_t^a)(u,v) = (1 − λ) · A(x_t^a)(u,v) + λ · A(x_t^b)(u,v)

where Â(x_t^a) is the mixed amplitude interpolation, A(x_t^a) is the amplitude of image x_t^a, A(x_t^b) is the amplitude of image x_t^b, λ ~ U(0, η), and the hyper-parameter η controls the augmentation strength;

step S24: combining the mixed amplitude spectrum with the original phase spectrum to reconstruct a new Fourier representation:

F̂(x_t^a)(u,v) = Â(x_t^a)(u,v) · e^{j · P(x_t^a)(u,v)}

where F̂(x_t^a) is the two-dimensional discrete Fourier transform of the interpolated image, Â(x_t^a) is its mixed amplitude, and P(x_t^a) is its original phase;

step S25: generating the interpolated image from F̂(x_t^a) by the inverse Fourier transform, with the specific formula as follows:

x̂_t^a = F^{-1}(F̂(x_t^a))
Preferably, in step S3, inter-domain neural-statistics consistency is used to guide the generator to generate the source-style images, specifically comprising the following step: calculating the inter-domain gap L_stat, i.e., the inter-domain neural-statistics consistency loss, expressed as follows:

L_stat = Σ_{l=1}^{L} ( ‖μ_l^g − μ̄_l‖_2 + ‖(σ_l^g)^2 − σ̄_l^2‖_2 )

where l ∈ {1, 2, …, L} indexes the layers of the source-trained model, which comprises the feature extractor, the classifier, and the depth estimator, L is the number of layers, μ_l^g and (σ_l^g)^2 are the running mean and running variance of the source-style data at layer l, and μ̄_l and σ̄_l^2 are the stored mean and stored variance of the source model at layer l.
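The loss above reduces to a per-layer distance between two sets of batch-normalization statistics. A minimal NumPy sketch, with the per-layer statistics passed in as (mean, variance) pairs (the data layout is an assumption for illustration):

```python
import numpy as np

def stat_consistency_loss(run_stats, stored_stats):
    """Inter-domain neural-statistics consistency loss L_stat: sum over
    layers of the L2 distances between the running BN statistics of the
    generated source-style data and the statistics stored in the
    source-trained model. Each argument is a list of (mean, var) arrays."""
    loss = 0.0
    for (mu_g, var_g), (mu_s, var_s) in zip(run_stats, stored_stats):
        loss += np.linalg.norm(mu_g - mu_s, 2) + np.linalg.norm(var_g - var_s, 2)
    return loss
```

Identical statistics give a loss of zero, so minimizing L_stat drives the generated images' feature statistics toward the source model's stored ones at every layer.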
Preferably, in step S3, the content is constrained with dual semantic consistency at the feature level and the image level to ensure that semantic content is preserved during image conversion, specifically comprising the following steps:

step S31: taking the generated source-style image x̂_t^s and the original target image x_t as input, and imposing the perceptual loss L_per on the latent features of a VGG16 module pretrained on ImageNet; the specific formula of the perceptual loss L_per is as follows:

L_per = Σ_j (1/(C_j · H_j · W_j)) · ‖φ_j(x̂_t^s) − φ_j(x_t)‖_2^2

where L_per is the perceptual loss between the source-style image x̂_t^s and the original target image x_t, C_j is the number of channels of the j-th feature map, H_j is the height of the j-th feature map, W_j is the width of the j-th feature map, φ_j(x̂_t^s) is the j-th layer feature of the source-style image, and φ_j(x_t) is the j-th layer feature of the original target image;

step S32: minimizing the semantic-consistency loss L_ph to enhance the phase consistency between the original target image and the source-style image; the specific formula of L_ph is as follows:

L_ph = − Σ_j ⟨P(F(x_t))_j, P(F(x̂_t^s))_j⟩ / ( ‖P(F(x_t))_j‖_2 · ‖P(F(x̂_t^s))_j‖_2 )

where ⟨·,·⟩ is the two-dimensional dot product, ‖·‖_2 is the L2 norm, L_ph is the negative cosine distance between the original phase of the original target image and the generated phase of the source-style image, x_t is the original target image, x̂_t^s is the source-style image, F(x_t)_j is the Fourier transform of the j-th original target image, and F(x̂_t^s)_j is the Fourier transform of the j-th source-style image.
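The image-level term of step S32 is simply a negative cosine similarity between the two Fourier phase spectra. A single-image NumPy sketch (batch handling and any numerical-stability constant are assumptions here, not from the patent):

```python
import numpy as np

def phase_consistency_loss(x_t, x_s):
    """Semantic-consistency loss L_ph for one image pair: negative
    cosine similarity between the Fourier phase of the original target
    image x_t and that of the generated source-style image x_s."""
    p_t = np.angle(np.fft.fft2(x_t)).ravel()
    p_s = np.angle(np.fft.fft2(x_s)).ravel()
    return -float(np.dot(p_t, p_s) /
                  (np.linalg.norm(p_t) * np.linalg.norm(p_s) + 1e-12))
```

For an unchanged image the phases coincide and the loss attains its minimum of −1, which is why minimizing L_ph discourages the generator from altering content.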
Preferably, the method further comprises the step of training the living body detection model, specifically comprising the following steps:

step S51: calculating entropy losses through the classifier and the depth estimator to obtain the classifier entropy loss and the depth-estimator entropy loss, with the specific formulas as follows:

L_ent1 = − Σ_{c=1}^{C} p̂(c) · log p̂(c)
L_ent2 = − Σ_{h=1}^{H} Σ_{w=1}^{W} d̂(h,w) · log d̂(h,w)

where L_ent1 is the classifier entropy loss, L_ent2 is the depth-estimator entropy loss, p̂ is the label probability distribution of the source-style image, c is the channel index and C the total number of channels, h and w index pixels of the image with H the upper height limit and W the upper width limit, and d̂(h,w) is the estimated depth of pixel (h,w) of the source-style image;

step S52: adding the classifier entropy loss and the depth-estimator entropy loss to obtain the total entropy loss, with the specific formula as follows:

L_ent = L_ent1 + L_ent2

where L_ent is the total entropy loss.
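Steps S51 and S52 can be sketched as follows. The exact form of the depth-map entropy term (an unnormalized sum, with a small epsilon for numerical stability) is an assumption of this sketch, not taken from the patent:

```python
import numpy as np

def total_entropy_loss(probs, depth):
    """Total entropy loss L_ent = L_ent1 + L_ent2: Shannon entropy of
    the classifier's label distribution plus the summed per-pixel
    entropy of the estimated depth map (values assumed in (0, 1])."""
    eps = 1e-12                                      # numerical stability
    l_ent1 = -np.sum(probs * np.log(probs + eps))    # classifier entropy
    l_ent2 = -np.sum(depth * np.log(depth + eps))    # depth-map entropy
    return l_ent1 + l_ent2
```

With a uniform two-class prediction and a saturated depth map, the loss is ln 2 from the classifier term alone, matching the intuition that confident predictions minimize L_ent.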
Preferably, the total loss for generator parameter optimization is calculated from the total entropy loss of living body detection model training, the perceptual loss, the semantic-consistency loss, and the inter-domain neural-statistics consistency loss, with the specific formula as follows:

L_total = L_stat + L_per + λ_ent · L_ent + λ_ph · L_ph

where L_total is the total loss, L_ent is the total entropy loss, λ_ent is the weighting coefficient of the total entropy loss, L_ph is the semantic-consistency loss, λ_ph is the weighting coefficient of the semantic-consistency loss, L_stat is the inter-domain neural-statistics consistency loss, and L_per is the perceptual loss.
The technical scheme provided by the embodiments of the application can have the following beneficial effects:
According to the scheme, on the one hand, intra-domain spectrum mixing is adopted to expand the target data distribution, so that unlabeled target data generalize to the unseen test subset of the target domain, and the problem of insufficient supervision of the unlabeled target domain is avoided. On the other hand, inter-domain neural-statistics consistency is adopted to guide the generator to generate source-style images, the feature statistics of the target data and of the source-style data are aligned at both high and low levels, and the inter-domain gap is effectively reduced.
Drawings
Fig. 1 is a step diagram of a face in-vivo detection method based on generated domain adaptation.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
A face living body detection method based on generative domain adaptation comprises the following steps:
step S1: building a living body detection model and a generator;
step S2: carrying out intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images alongside the original target images;
step S3: stylizing both the diversified target images and the original target images into source-style images through image conversion, and adopting inter-domain neural-statistics consistency to guide the generator to generate the source-style images;
step S4: inputting the source-style images into the living body detection model to carry out face living body detection, and outputting the face living body detection result.
In the face living body detection method based on generative domain adaptation, as shown in fig. 1, the first step establishes the living body detection model and the generator, preparing for face living body detection and for the generation of source-style images. The second step carries out intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images alongside the original target images; specifically, for unlabeled target data, if adaptation is performed only on the visible training subset of the target domain and not on its unseen test subset, the image quality of the source-style domain may degrade. The third step stylizes both the diversified target images and the original target images into source-style images through image conversion and adopts inter-domain neural-statistics consistency to guide the generator; specifically, the scheme proposes inter-domain neural-statistics consistency to guide the generator to generate source-style images, so that the feature statistics of the target data and of the source-style data are aligned at both high and low levels and the inter-domain gap is effectively reduced. The fourth step inputs the source-style images into the living body detection model to carry out face living body detection and outputs the face living body detection result; specifically, the result of face living body detection obtained through the living body detection model is more accurate.
According to the scheme, on the one hand, intra-domain spectrum mixing is adopted to expand the target data distribution, so that unlabeled target data generalize to the unseen test subset of the target domain, and the problem of insufficient supervision of the unlabeled target domain is avoided. On the other hand, inter-domain neural-statistics consistency is adopted to guide the generator to generate source-style images, the feature statistics of the target data and of the source-style data are aligned at both high and low levels, and the inter-domain gap is effectively reduced.
Preferably, in step S2, the method specifically includes the following steps:

step S21: calculating the two-dimensional discrete Fourier transform F(x_t) of a target image x_t ∈ D_t, with the specific formula as follows:

F(x_t)(u,v) = Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} x_t(h,w) · e^{-j2π(hu/H + wv/W)}

where F(x_t)(u,v) is the Fourier transform of the target image x_t, u and v are the frequency variables in the frequency domain, H is the maximum height of the image, W is the maximum width of the image, h is the image height index, and w is the image width index;

step S22: decomposing the Fourier transform F(x_t) of the target image x_t ∈ D_t into its amplitude spectrum and phase spectrum, with the specific formulas as follows:

A(x_t)(u,v) = [R^2(x_t)(u,v) + I^2(x_t)(u,v)]^{1/2}
P(x_t)(u,v) = arctan[I(x_t)(u,v) / R(x_t)(u,v)]

where A(x_t)(u,v) is the amplitude, P(x_t)(u,v) is the phase, R(x_t) is the real part of F(x_t), and I(x_t) is the imaginary part of F(x_t);

step S23: interpolating the amplitude spectra of two arbitrary images x_t^a and x_t^b from the same unlabeled target domain D_t, with the specific formula as follows:

Â(x_t^a)(u,v) = (1 − λ) · A(x_t^a)(u,v) + λ · A(x_t^b)(u,v)

where Â(x_t^a) is the mixed amplitude interpolation, A(x_t^a) is the amplitude of image x_t^a, A(x_t^b) is the amplitude of image x_t^b, λ ~ U(0, η), and the hyper-parameter η controls the augmentation strength;

step S24: combining the mixed amplitude spectrum with the original phase spectrum to reconstruct a new Fourier representation:

F̂(x_t^a)(u,v) = Â(x_t^a)(u,v) · e^{j · P(x_t^a)(u,v)}

where F̂(x_t^a) is the two-dimensional discrete Fourier transform of the interpolated image, Â(x_t^a) is its mixed amplitude, and P(x_t^a) is its original phase;

step S25: generating the interpolated image from F̂(x_t^a) by the inverse Fourier transform, with the specific formula as follows:

x̂_t^a = F^{-1}(F̂(x_t^a))
In this embodiment, since the phase tends to preserve most of the content in the Fourier spectrum of a signal while the amplitude mainly contains domain-specific patterns, diversified images can be generated that preserve content in a continuous frequency space but carry new styles. In particular, by performing the above steps, unseen target samples with new styles but the original content can be efficiently generated in a continuous frequency space, and by feeding these diversified images forward to the generator, the generalization ability across different subsets within the target domain can be further enhanced.
Preferably, in step S3, inter-domain neural-statistics consistency is used to guide the generator to generate the source-style images, specifically comprising the following step: calculating the inter-domain gap L_stat, i.e., the inter-domain neural-statistics consistency loss, expressed as follows:

L_stat = Σ_{l=1}^{L} ( ‖μ_l^g − μ̄_l‖_2 + ‖(σ_l^g)^2 − σ̄_l^2‖_2 )

where l ∈ {1, 2, …, L} indexes the layers of the source-trained model, which comprises the feature extractor, the classifier, and the depth estimator, L is the number of layers, μ_l^g and (σ_l^g)^2 are the running mean and running variance of the source-style data at layer l, and μ̄_l and σ̄_l^2 are the stored mean and stored variance of the source model at layer l.
Specifically, batch Normalization (BN) normalizes each input feature within a small batch in a channel manner such that the mean μ and variance σ of the features within the small batch 2 Let the mean μ be 0, variance σ 2 1, the specific formula is as follows:
wherein B is the minimum batch size, x i Is an input feature.
Further illustratively, when training the living body detection model, the source statistics μ̄^{n+1} and (σ̄^{n+1})^2 at step n+1 are updated as follows:

μ̄^{n+1} = (1 − α) · μ̄^{n} + α · μ^{n+1}
(σ̄^{n+1})^2 = (1 − α) · (σ̄^{n})^2 + α · (σ^{n+1})^2

where μ̄^{n+1} is the exponential moving average of the source statistics at step n+1, (σ̄^{n+1})^2 is the exponential moving variance at step n+1, α is the update ratio, μ̄^{n} and (σ̄^{n})^2 are the exponential moving average and moving variance at step n, and μ^{n+1} and (σ^{n+1})^2 are the batch mean and batch variance at step n+1.
The neural statistics of source features stored in a well-trained living body detection model provide sufficient supervision over both low-level and high-level features, which represent a domain-specific style and can be fully exploited to aid distribution alignment in UDA. However, conventional methods only use the output features of higher layers for distribution alignment and cannot fully exploit the rich and discriminative liveness cues in lower-layer features. Therefore, the source-style data can easily be estimated given the stored BN statistics μ̄_l and σ̄_l^2 of each layer l.
This embodiment proposes the inter-domain neural-statistics consistency loss L_stat to match the running mean μ_l^g and running variance (σ_l^g)^2 of the source-style data with the stored statistics μ̄_l and σ̄_l^2 of the source model, thereby bridging the inter-domain gap. Under the guidance of the loss L_stat, this embodiment can approximate a source-style domain whose style is similar to that of the source domain. Unlike existing methods that generate image content from input random noise, the neural-statistics consistency of this embodiment uses BN-statistics alignment as a constraint to restyle the input image without changing its content.
Preferably, in step S3, the content is constrained with dual semantic consistency at the feature level and the image level to ensure that semantic content is preserved during image conversion, specifically comprising the following steps:

step S31: taking the generated source-style image x̂_t^s and the original target image x_t as input, and imposing the perceptual loss L_per on the latent features of a VGG16 module pretrained on ImageNet; the specific formula of the perceptual loss L_per is as follows:

L_per = Σ_j (1/(C_j · H_j · W_j)) · ‖φ_j(x̂_t^s) − φ_j(x_t)‖_2^2

where L_per is the perceptual loss between the source-style image x̂_t^s and the original target image x_t, C_j is the number of channels of the j-th feature map, H_j is the height of the j-th feature map, W_j is the width of the j-th feature map, φ_j(x̂_t^s) is the j-th layer feature of the source-style image, and φ_j(x_t) is the j-th layer feature of the original target image;

step S32: minimizing the semantic-consistency loss L_ph to enhance the phase consistency between the original target image and the source-style image; the specific formula of L_ph is as follows:

L_ph = − Σ_j ⟨P(F(x_t))_j, P(F(x̂_t^s))_j⟩ / ( ‖P(F(x_t))_j‖_2 · ‖P(F(x̂_t^s))_j‖_2 )

where ⟨·,·⟩ is the two-dimensional dot product, ‖·‖_2 is the L2 norm, L_ph is the negative cosine distance between the original phase of the original target image and the generated phase of the source-style image, x_t is the original target image, x̂_t^s is the source-style image, F(x_t)_j is the Fourier transform of the j-th original target image, and F(x̂_t^s)_j is the Fourier transform of the j-th source-style image.

In this embodiment, at the feature level, the generated source-style image x̂_t^s and the original target image x_t are taken as input, and the perceptual loss L_per is imposed on the latent features of a VGG16 module pretrained on ImageNet, thereby reducing the perceptual difference between them.

However, using this spatial perceptual loss alone is not sufficient to ensure semantic consistency. A Fourier-domain transfer from one domain to another affects only the amplitude of the spectrum, not its phase; since the phase component retains most of the content of the original signal while the amplitude component mainly contains style patterns, semantic inconsistency can be explicitly penalized by ensuring that the phase is preserved before and after image conversion. Thus, by minimizing the semantic-consistency loss L_ph, i.e., minimizing the image-level difference between the source-style image x̂_t^s and the original target image x_t over the Fourier phase spectrum, phase consistency is maintained.
Preferably, the method further comprises the step of training the living body detection model, specifically comprising the following steps:

step S51: calculating entropy losses through the classifier and the depth estimator to obtain the classifier entropy loss and the depth-estimator entropy loss, with the specific formulas as follows:

L_ent1 = − Σ_{c=1}^{C} p̂(c) · log p̂(c)
L_ent2 = − Σ_{h=1}^{H} Σ_{w=1}^{W} d̂(h,w) · log d̂(h,w)

where L_ent1 is the classifier entropy loss, L_ent2 is the depth-estimator entropy loss, p̂ is the label probability distribution of the source-style image, c is the channel index and C the total number of channels, h and w index pixels of the image with H the upper height limit and W the upper width limit, and d̂(h,w) is the estimated depth of pixel (h,w) of the source-style image;

step S52: adding the classifier entropy loss and the depth-estimator entropy loss to obtain the total entropy loss, with the specific formula as follows:

L_ent = L_ent1 + L_ent2

where L_ent is the total entropy loss.
In this embodiment, the entropy losses of the classifier and the depth estimator are calculated so that the prediction distribution obtained by the living body detection model is as close as possible to the actual distribution of the data.
Preferably, the total loss for generator parameter optimization is calculated from the total entropy loss of living body detection model training, the perceptual loss, the semantic-consistency loss, and the inter-domain neural-statistics consistency loss, with the specific formula as follows:

L_total = L_stat + L_per + λ_ent · L_ent + λ_ph · L_ph

where L_total is the total loss, L_ent is the total entropy loss, λ_ent is the weighting coefficient of the total entropy loss, L_ph is the semantic-consistency loss, λ_ph is the weighting coefficient of the semantic-consistency loss, L_stat is the inter-domain neural-statistics consistency loss, and L_per is the perceptual loss.
In this embodiment, in the pretrained living body detection model, the parameters of the feature extractor, the classifier, the depth estimator, and the VGG16 module are fixed; the resulting source-style images can therefore be further improved only by optimizing the parameters of the generator. A loss is produced during this optimization, and reducing it effectively improves the quality of the source-style images.
Furthermore, functional units in various embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations of the above embodiments may be made by those skilled in the art within the scope of the invention.
Claims (6)
1. A face living body detection method based on generative domain adaptation, characterized in that the method comprises the following steps:
step S1: building a living body detection model and a generator;
step S2: performing intra-domain spectrum mixing on the images of the unlabeled target domain to generate diversified target images in addition to the original target images;
step S3: stylizing both the diversified target images and the original target images into source-style images through image conversion, with inter-domain neural statistics consistency adopted to guide the generator in generating the source-style images;
step S4: inputting the source-style images into the living body detection model to perform face living body detection, and outputting the face living body detection result.
2. The face living body detection method based on generative domain adaptation according to claim 1, characterized in that step S2 specifically comprises the following steps:
step S21: calculating the two-dimensional discrete Fourier transform F(x_t) of a target image x_t ∈ D_t, according to the following formula:
F(x_t)(u,v) = Σ_{h=0}^{H−1} Σ_{w=0}^{W−1} x_t(h,w) · e^{−j2π(uh/H + vw/W)}
where F(x_t)(u,v) is the Fourier transform of the target image x_t, u and v are the frequency variables in the frequency domain, H is the maximum height of the image, W is the maximum width of the image, h is the image height coordinate, and w is the image width coordinate;
step S22: calculating the amplitude spectrum A(x_t) and the phase spectrum P(x_t) of the Fourier transform F(x_t) of the target image x_t ∈ D_t, according to the following formulas:
A(x_t)(u,v) = [R²(x_t)(u,v) + I²(x_t)(u,v)]^{1/2}
P(x_t)(u,v) = arctan[I(x_t)(u,v) / R(x_t)(u,v)]
where A(x_t)(u,v) is the amplitude, P(x_t)(u,v) is the phase, R(x_t) is the real part of F(x_t), and I(x_t) is the imaginary part of F(x_t);
step S23: linearly interpolating the amplitude spectra of two arbitrary images x_t and x_t′ from the same unlabeled target domain D_t, according to the following formula:
Â(x_t)(u,v) = (1 − λ)·A(x_t)(u,v) + λ·A(x_t′)(u,v)
where Â(x_t) is the mixed amplitude spectrum, A(x_t) is the amplitude spectrum of image x_t, A(x_t′) is the amplitude spectrum of image x_t′, λ ~ U(0, η), and the hyper-parameter η controls the augmentation intensity;
step S24: combining the mixed amplitude spectrum with the original phase spectrum to reconstruct a new Fourier representation:
F̂(x_t)(u,v) = Â(x_t)(u,v) · e^{j·P(x_t)(u,v)}
where F̂(x_t) is the two-dimensional discrete Fourier transform of the interpolated image, Â(x_t) is the mixed amplitude spectrum of the interpolated image, and P(x_t) is its original phase;
step S25: generating the interpolated image from F̂(x_t) by the inverse Fourier transform, according to the following formula:
x̂_t = F^{−1}(F̂(x_t)).
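Steps S21 to S25 above can be sketched end-to-end with numpy's FFT routines (a minimal single-channel sketch; the function name and grayscale simplification are assumptions, not from the patent — color images would apply the same operation per channel):

```python
import numpy as np

def spectrum_mix(x1, x2, eta=1.0, rng=None):
    """Intra-domain spectrum mixing: interpolate the amplitude spectra of two
    target-domain images x1 and x2, keep the phase of x1, and invert back
    to image space."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.uniform(0.0, eta)                # lambda ~ U(0, eta), step S23
    f1, f2 = np.fft.fft2(x1), np.fft.fft2(x2)  # 2-D DFT, step S21
    a1, a2 = np.abs(f1), np.abs(f2)            # amplitude spectra, step S22
    p1 = np.angle(f1)                          # phase spectrum of x1, step S22
    a_mix = (1.0 - lam) * a1 + lam * a2        # interpolated amplitude, step S23
    f_mix = a_mix * np.exp(1j * p1)            # recombine with original phase, step S24
    return np.real(np.fft.ifft2(f_mix))        # inverse FFT, step S25
```

A useful sanity check of the recombination is that mixing an image with itself must return the image unchanged, since the amplitude interpolation becomes the identity.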
3. The face living body detection method based on generative domain adaptation according to claim 1, characterized in that in step S3, inter-domain neural statistics consistency is adopted to guide the generator to generate the source-style image, specifically comprising: calculating the inter-domain gap L_stat, namely the inter-domain neural statistics consistency loss, according to the following formula:
L_stat = Σ_{l=1}^{L} ( ‖μ̂_l − μ_l‖₂ + ‖σ̂_l² − σ_l²‖₂ )
where l ∈ {1, 2, …, L} indexes the layers of the source training model (comprising the feature extractor, the classifier, and the depth estimator), L is the total number of layers, μ̂_l is the running average of the source-style data, σ̂_l² is the running variance of the source-style data, μ_l is the stored average of the source model, and σ_l² is the stored variance of the source model.
4. The face living body detection method based on generative domain adaptation according to claim 3, characterized in that in step S3, dual semantic consistency at the feature level and the image level is adopted to constrain the content, so that semantic content is preserved during image conversion, specifically comprising the following steps:
step S31: taking the generated source-style image x̂_t and the original target image x_t as input, imposing a perceptual loss L_per on the latent features of a VGG16 module pre-trained on ImageNet, where L_per is calculated according to the following formula:
L_per = Σ_j (1 / (C_j·H_j·W_j)) · ‖φ_j(x̂_t) − φ_j(x_t)‖₂²
where L_per is the perceptual loss between the generated source-style image x̂_t and the original target image x_t, C_j is the number of channels of the j-th layer feature map, H_j is the height of the j-th layer feature map, W_j is the width of the j-th layer feature map, φ_j(x̂_t) is the j-th layer feature map of the source-style image x̂_t, and φ_j(x_t) is the j-th layer feature map of the original target image x_t;
step S32: enhancing the phase consistency between the original target image and the source-style image by minimizing the semantic consistency loss L_ph, calculated according to the following formula:
L_ph = − Σ_j ⟨P(F(x_t)_j), P(F(x̂_t)_j)⟩ / ( ‖P(F(x_t)_j)‖₂ · ‖P(F(x̂_t)_j)‖₂ )
where ⟨·,·⟩ is the dot product in two-dimensional space, ‖·‖₂ is the L2 norm, L_ph is the negative cosine distance between the original phase of the original target image and the generated phase of the source-style image, x_t is the original target image, x̂_t is the source-style image, F(x_t)_j is the Fourier transform of the j-th original target image x_t, F(x̂_t)_j is the Fourier transform of the j-th source-style image x̂_t, and P(·) denotes the phase.
5. The face living body detection method based on generative domain adaptation according to claim 4, characterized in that the method further comprises a step of training the living body detection model, specifically comprising the following steps:
step S51: calculating the entropy loss of the classifier and the entropy loss of the depth estimator, according to the following formulas:
L_ent1 = − Σ_{c=1}^{C} p̂_c · log p̂_c
L_ent2 = − (1/(H·W)) · Σ_{h=1}^{H} Σ_{w=1}^{W} d̂_{h,w} · log d̂_{h,w}
where L_ent1 is the classifier entropy loss, L_ent2 is the depth estimator entropy loss, p̂_c is the label probability distribution of the source-style image, c is the channel index, C is the total number of channels, h and w index any pixel of the source-style image, H is the upper height limit of the image, W is the upper width limit of the image, and d̂_{h,w} is the estimated depth of the pixel (h, w) of the source-style image;
step S52: adding the entropy loss of the classifier and the entropy loss of the depth estimator to obtain the total entropy loss, according to the following formula:
L_ent = L_ent1 + L_ent2
where L_ent is the total entropy loss.
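Steps S51 and S52 can be sketched together as follows (a minimal numpy sketch; the clipping constant `eps` is a numerical-stability assumption, not from the patent):

```python
import numpy as np

def total_entropy_loss(class_probs, depth_map, eps=1e-12):
    """L_ent = L_ent1 + L_ent2 (steps S51-S52): Shannon entropy of the
    classifier's predicted label distribution plus the mean per-pixel
    entropy term of the depth estimator's predictions."""
    p = np.clip(class_probs, eps, 1.0)
    l_ent1 = -np.sum(p * np.log(p))       # classifier entropy, step S51
    d = np.clip(depth_map, eps, 1.0)
    l_ent2 = -np.mean(d * np.log(d))      # depth-map entropy, step S51
    return l_ent1 + l_ent2                # total entropy loss, step S52
```

Minimizing this quantity pushes the model toward confident (low-entropy) predictions on the unlabeled target data, which is the stated goal of making the predicted distribution close to the actual data distribution.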
6. The face living body detection method based on generative domain adaptation according to claim 5, characterized in that the total loss for generator parameter optimization is calculated from the total entropy loss, the perceptual loss, the semantic consistency loss, and the inter-domain neural statistics consistency loss of the living body detection model training, according to the following formula:
L_total = L_stat + L_per + λ_ent·L_ent + λ_ph·L_ph
where L_total is the total loss, L_ent is the total entropy loss, λ_ent is the weighting coefficient of the total entropy loss, L_ph is the semantic consistency loss, λ_ph is the weighting coefficient of the semantic consistency loss, L_stat is the inter-domain neural statistics consistency loss, and L_per is the perceptual loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211571186.7A CN116189255A (en) | 2022-12-08 | 2022-12-08 | Face living body detection method based on generation type domain adaptation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116189255A true CN116189255A (en) | 2023-05-30 |
Family
ID=86437343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211571186.7A Pending CN116189255A (en) | 2022-12-08 | 2022-12-08 | Face living body detection method based on generation type domain adaptation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116189255A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117456309A (en) * | 2023-12-20 | 2024-01-26 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Cross-domain target identification method based on intermediate domain guidance and metric learning constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||