CN116189255A - Face living body detection method based on generation type domain adaptation - Google Patents

Face living body detection method based on generation type domain adaptation

Info

Publication number
CN116189255A
CN116189255A (application CN202211571186.7A)
Authority
CN
China
Prior art keywords
image
living body
body detection
loss
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211571186.7A
Other languages
Chinese (zh)
Inventor
杨海东
付传辉
李泽辉
杨标
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority to CN202211571186.7A
Publication of CN116189255A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a face living body detection method based on generative domain adaptation, which comprises the following steps: building a living body detection model and a generator; performing intra-domain spectrum mixing on images of an unlabeled target domain to generate diversified target images alongside the original target images; stylizing both the diversified target images and the original target images into source style images through image conversion, with inter-domain neural statistic consistency adopted to guide the generator in generating the source style images; and inputting the source style images into the living body detection model to perform face living body detection and outputting the face living body detection result. The invention addresses the problems that existing face living body detection methods provide insufficient supervision on unlabeled target domains, and that most prior work focuses on aligning high-level semantic features while ignoring the low-level features of the face living body detection task.

Description

Face living body detection method based on generation type domain adaptation
Technical Field
The invention relates to the technical field of face living body detection, and in particular to a face living body detection method based on generative domain adaptation.
Background
Face anti-spoofing (FAS), i.e. face living body detection, aims to determine whether a face image comes from a real person or from one of various face presentation attacks. Early works addressed this problem with hand-crafted features such as SIFT, LBP and HOG. Several methods exploit information from different domains, such as the HSV and YCrCb color spaces, the temporal domain and the Fourier spectrum. Recent approaches use CNNs to model FAS as a binary classification problem or with additional supervision such as depth maps, reflection maps and rPPG signals. Other approaches employ decoupling and customized operators to improve performance. Although these methods achieve good results in intra-dataset training, their performance on the target domain still degrades significantly due to large domain shifts.
To improve performance under cross-domain settings, Domain Generalization (DG) has been introduced into FAS tasks. However, DG-FAS methods aim to map samples into a common feature space and lack specific information about the unseen domain, which inevitably leads to unsatisfactory results. Recent studies on UDA-FAS rely primarily on pseudo-labeling, adversarial learning, or minimizing domain discrepancy to reduce domain shift. However, they still draw insufficient supervision from the unlabeled target domain, which may lead to negative transfer of the source model. Furthermore, most of this work focuses on the alignment of high-level semantic features, while the low-level features critical to FAS tasks are ignored.
Disclosure of Invention
Aiming at the above defects, the invention provides a face living body detection method based on generative domain adaptation, which aims to solve the problems that existing face living body detection methods provide insufficient supervision on unlabeled target domains, and that most prior work focuses on aligning high-level semantic features while ignoring the low-level features of the face living body detection task.
To achieve this purpose, the invention adopts the following technical scheme:
A face living body detection method based on generative domain adaptation comprises the following steps:
Step S1: building a living body detection model and a generator;
Step S2: performing intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images in addition to the original target images;
Step S3: stylizing both the diversified target images and the original target images into source style images through image conversion, with inter-domain neural statistic consistency adopted to guide the generator in generating the source style images;
Step S4: inputting the source style images into the living body detection model to perform face living body detection, and outputting the face living body detection result.
Preferably, in step S2, the following steps are specifically included:
step S21: calculating a target image x t ∈D t Fourier transform F (x) t ) The specific formula is as follows:
Figure BDA0003988114320000021
wherein ,F(xt ) (u, v) is the target image x t U and v are frequency variations in the frequency domain, H is the maximum height of the image, W is the maximum width of the image, H is the image height, and W is the image width;
step S22: calculating a target image x t ∈D t Fourier transform F (x) t ) The specific formulas are as follows:
A(x t )(u,v)=[R 2 (x t )(u,v)+I 2 (x t )(u,v)] 1/2
Figure BDA0003988114320000022
wherein ,A(xt ) (u, v) is amplitude, P (x) t ) (u, v) is the phase, R (x) t ) Is F (x) t ) Is the real part of I (x) t ) Is F (x) t ) Is the imaginary part of (2);
step S23: computing the target domain D from the same unlabeled target domain t Is an arbitrary image of two of (a)
Figure BDA0003988114320000031
The specific formula is as follows:
Figure BDA0003988114320000032
wherein ,
Figure BDA0003988114320000033
interpolation for mixed amplitude +.>
Figure BDA0003988114320000034
For image->
Figure BDA0003988114320000035
Amplitude interpolation of>
Figure BDA0003988114320000036
For image->
Figure BDA0003988114320000037
lambda-U (0, eta), the super-parameter eta controls the enhanced intensity;
step S24: combining the mixed amplitude spectrum with the original phase spectrum to reconstruct a new fourier representation:
Figure BDA0003988114320000038
wherein ,
Figure BDA0003988114320000039
for interpolating the image +.>
Figure BDA00039881143200000310
Is a two-dimensional discrete Fourier transform of>
Figure BDA00039881143200000311
For interpolating the image +.>
Figure BDA00039881143200000312
Mixed amplitude of>
Figure BDA00039881143200000313
For interpolating the image +.>
Figure BDA00039881143200000314
Is the original phase of (a);
step S25: will be
Figure BDA00039881143200000315
An interpolated image is generated by inverse fourier transform, and the specific formula is as follows:
Figure BDA00039881143200000316
wherein ,
Figure BDA00039881143200000317
for interpolating the image.
Preferably, in step S3, the inter-domain neural statistic consistency used to guide the generator to generate the source style images specifically comprises the following step: calculating the inter-domain gap L_stat, namely the inter-domain neural statistic consistency loss, expressed as follows:

L_{stat} = \frac{1}{L} \sum_{l=1}^{L} \left( \big\| \hat{\mu}_l - \bar{\mu}_l \big\|_2 + \big\| \hat{\sigma}_l^2 - \bar{\sigma}_l^2 \big\|_2 \right)

wherein l ∈ {1, 2, …, L} indexes the l-th layer of the source-trained model, which comprises the feature extractor, classifier and depth estimator, L is the total number of layers, \hat{\mu}_l denotes the running mean of the source style data, \hat{\sigma}_l^2 denotes the running variance of the source style data, \bar{\mu}_l denotes the stored mean of the source model, and \bar{\sigma}_l^2 denotes the stored variance of the source model.
Preferably, in step S3, the content is constrained by dual semantic consistency at the feature level and the image level, to ensure that semantic content is preserved during image conversion, specifically comprising the following steps:
Step S31: taking the generated source style image \hat{x}_{t \to s} and the original target image x_t as input, and imposing a perceptual loss L_per on the latent features of a VGG16 module pre-trained on ImageNet; the specific formula of the perceptual loss L_per is as follows:

L_{per}(\hat{x}_{t \to s}, x_t) = \sum_{j} \frac{1}{C_j H_j W_j} \big\| \phi_j(\hat{x}_{t \to s}) - \phi_j(x_t) \big\|_2^2

wherein L_{per}(\hat{x}_{t \to s}, x_t) is the perceptual loss between the source style image \hat{x}_{t \to s} and the original target image x_t, C_j is the number of channels of the j-th layer feature map, H_j is the height of the j-th layer feature map, W_j is the width of the j-th layer feature map, \phi_j(\hat{x}_{t \to s}) is the j-th layer convolutional feature of the source style image, and \phi_j(x_t) is the j-th layer convolutional feature of the original target image x_t;
Step S32: enhancing phase consistency between the original target image and the source style image by minimizing the semantic consistency loss L_ph; the specific formula of L_ph is as follows:

L_{ph} = - \sum_{j} \frac{\big\langle P(F(x_t)_j), \, P(F(\hat{x}_{t \to s})_j) \big\rangle}{\big\| P(F(x_t)_j) \big\|_2 \, \big\| P(F(\hat{x}_{t \to s})_j) \big\|_2}

wherein ⟨·,·⟩ is the dot product in two-dimensional space, ‖·‖_2 is the L2 norm, L_ph is the negative cosine distance between the original phase of the original target image and the generated phase of the source style image, x_t is the original target image, \hat{x}_{t \to s} is the source style image, F(x_t)_j is the Fourier transform of the j-th original target image x_t, and F(\hat{x}_{t \to s})_j is the Fourier transform of the j-th source style image \hat{x}_{t \to s}.
Preferably, the method further comprises the step of training the living body detection model, which specifically comprises the following steps:
Step S51: calculating entropy losses through the classifier and the depth estimator to obtain the classifier entropy loss and the depth estimator entropy loss, with the specific formulas as follows:

L_{ent1} = - \sum_{c=1}^{C} \hat{p}_c \log \hat{p}_c

L_{ent2} = - \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \hat{D}(h,w) \log \hat{D}(h,w)

wherein L_{ent1} is the classifier entropy loss, L_{ent2} is the depth estimator entropy loss, \hat{p} is the predicted label probability distribution of the source style image, c is the channel index and C is the total number of channels, H is the upper height limit of the image and W is the upper width limit of the image, (h, w) is any pixel of the source style image, and \hat{D}(h,w) is the estimated depth at that pixel of the source style image;
Step S52: adding the classifier entropy loss and the depth estimator entropy loss to obtain the total entropy loss, with the specific formula as follows:

L_{ent} = L_{ent1} + L_{ent2}

wherein L_{ent} is the total entropy loss.
Preferably, the total loss for generator parameter optimization is calculated from the total entropy loss of the living body detection model training, the perceptual loss, the semantic consistency loss, and the inter-domain neural statistic consistency loss, with the specific formula as follows:

L_{total} = L_{stat} + L_{per} + \lambda_{ent} L_{ent} + \lambda_{ph} L_{ph}

wherein L_{total} is the total loss, L_{ent} is the total entropy loss, \lambda_{ent} is the weighting coefficient of the total entropy loss, L_{ph} is the semantic consistency loss, \lambda_{ph} is the weighting coefficient of the semantic consistency loss, L_{stat} is the inter-domain neural statistic consistency loss, and L_{per} is the perceptual loss.
The technical scheme provided by the embodiments of the present application can have the following beneficial effects:
On the one hand, intra-domain spectrum mixing is adopted to expand the target data distribution, so that the model generalizes from the unlabeled target data to the unseen test subset of the target domain, avoiding the problem of insufficient supervision on the unlabeled target domain. On the other hand, inter-domain neural statistic consistency is adopted to guide the generator to generate source style images, the feature statistics of the target data and the feature statistics of the source style data are aligned at both high and low levels, and the inter-domain gap is effectively reduced.
Drawings
Fig. 1 is a step diagram of the face living body detection method based on generative domain adaptation.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
A face living body detection method based on generative domain adaptation comprises the following steps:
Step S1: building a living body detection model and a generator;
Step S2: performing intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images in addition to the original target images;
Step S3: stylizing both the diversified target images and the original target images into source style images through image conversion, with inter-domain neural statistic consistency adopted to guide the generator in generating the source style images;
Step S4: inputting the source style images into the living body detection model to perform face living body detection, and outputting the face living body detection result.
In the face living body detection method based on generative domain adaptation, as shown in fig. 1, the first step is to build a living body detection model and a generator, which prepares for face living body detection and for generating source style images. The second step performs intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images in addition to the original target images; specifically, for unlabeled target data, if the generator is trained only on the visible training subset of the target domain and never sees the unseen test subset, the image quality in the source style domain may be degraded. The third step stylizes both the diversified target images and the original target images into source style images through image conversion, and adopts inter-domain neural statistic consistency to guide the generator to generate the source style images; specifically, the scheme proposes inter-domain neural statistic consistency to guide the generator so that the feature statistics of the target data and the feature statistics of the source style data are aligned at both high and low levels, effectively reducing the inter-domain gap. The fourth step inputs the source style images into the living body detection model to perform face living body detection and outputs the face living body detection result; specifically, the face living body detection result obtained through the living body detection model has higher accuracy.
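As a purely illustrative sketch (not part of the patent text), the test-time flow of steps S3 and S4 could look roughly as follows in Python, assuming a trained generator and a source-trained living body detection model composed of a feature extractor, a classifier and a depth estimator; all module and function names are hypothetical placeholders:

```python
import torch

@torch.no_grad()
def detect_face_liveness(x_t, generator, feature_extractor, classifier, depth_estimator):
    """Sketch of steps S3-S4 at inference time (assumed module names).

    x_t: target-domain face image tensor of shape (N, 3, H, W).
    Returns live/spoof probabilities and the auxiliary depth map.
    """
    # Step S3: stylize the target image into a source style image.
    x_t2s = generator(x_t)
    # Step S4: run the frozen source living body detection model on the source style image.
    feats = feature_extractor(x_t2s)
    probs = torch.softmax(classifier(feats), dim=1)  # binary live/spoof prediction
    depth = depth_estimator(feats)                   # auxiliary depth estimate
    return probs, depth
```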
According to this scheme, on the one hand, intra-domain spectrum mixing is adopted to expand the target data distribution, so that the model generalizes from the unlabeled target data to the unseen test subset of the target domain, avoiding the problem of insufficient supervision on the unlabeled target domain. On the other hand, inter-domain neural statistic consistency is adopted to guide the generator to generate source style images, the feature statistics of the target data and the feature statistics of the source style data are aligned at both high and low levels, and the inter-domain gap is effectively reduced.
Preferably, in step S2, the method specifically includes the following steps:
step S21: calculating a target image x t ∈D t Fourier transform F (x) t ) The specific formula is as follows:
Figure BDA0003988114320000071
wherein ,F(xt ) (u, v) is the target image x t U and v are frequency variations in the frequency domain, H is the maximum height of the image, W is the maximum width of the image, H is the image height, and W is the image width;
step S22: calculating a target image x t ∈D t Fourier transform F (x) t ) The specific formulas are as follows:
A(x t )(u,v)=[R 2 (x t )(u,v)+I 2 (x t )(u,v)] 1/2
Figure BDA0003988114320000072
wherein ,A(xt ) (u, v) is amplitude, P (x) t ) (u, v) is the phase, R (x) t ) Is F (x) t ) Is the real part of I (x) t ) Is F (x) t ) Is the imaginary part of (2);
step S23: computing the target domain D from the same unlabeled target domain t Is an arbitrary image of two of (a)
Figure BDA0003988114320000073
The specific formula is as follows:
Figure BDA0003988114320000081
wherein ,
Figure BDA0003988114320000082
interpolation for mixed amplitude +.>
Figure BDA0003988114320000083
For image->
Figure BDA0003988114320000084
Amplitude interpolation of>
Figure BDA0003988114320000085
For image->
Figure BDA0003988114320000086
lambda-U (0, eta), the super-parameter eta controls the enhanced intensity;
step S24: combining the mixed amplitude spectrum with the original phase spectrum to reconstruct a new fourier representation:
Figure BDA0003988114320000087
wherein ,
Figure BDA00039881143200000824
for interpolating the image +.>
Figure BDA0003988114320000089
Is a two-dimensional discrete Fourier transform of>
Figure BDA00039881143200000825
For interpolating the image +.>
Figure BDA00039881143200000811
Mixed amplitude of>
Figure BDA00039881143200000823
For interpolating the image +.>
Figure BDA00039881143200000813
Is the original phase of (a);
step S25: will be
Figure BDA00039881143200000822
An interpolated image is generated by inverse fourier transform, and the specific formula is as follows:
Figure BDA00039881143200000815
wherein ,
Figure BDA00039881143200000816
for interpolating the image.
In this embodiment, since the phase tends to preserve most of the content of a signal in its Fourier spectrum, while the amplitude mainly contains domain-specific styles, diversified images can be generated that preserve the content in a continuous frequency space while carrying new styles. Specifically, by performing the above steps, unseen target samples with new styles and the original content can be efficiently generated in a continuous frequency space, and by feeding these diversified images forward to the generator, the generalization ability across different subsets within the target domain can be further enhanced.
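As a minimal sketch of how steps S21 to S25 could be realized (an illustrative assumption, not the patent's reference implementation), the intra-domain spectrum mixing can be written per channel with NumPy as follows; the function name and the channel-wise loop are choices made here for clarity:

```python
import numpy as np

def intra_domain_spectrum_mix(x_i, x_j, eta=0.5):
    """Mix the Fourier amplitudes of two target-domain images (steps S21-S25).

    x_i, x_j: float arrays of shape (H, W, C) from the same unlabeled target domain.
    eta: upper bound of the mixing ratio lambda ~ U(0, eta).
    Returns an interpolated image with mixed amplitude and the original phase of x_i.
    """
    lam = np.random.uniform(0.0, eta)
    mixed = np.zeros_like(x_i)
    for c in range(x_i.shape[2]):
        # S21: 2-D discrete Fourier transform of each channel.
        f_i = np.fft.fft2(x_i[..., c])
        f_j = np.fft.fft2(x_j[..., c])
        # S22: amplitude and phase spectra.
        amp_i, pha_i = np.abs(f_i), np.angle(f_i)
        amp_j = np.abs(f_j)
        # S23: interpolate the amplitudes of the two images.
        amp_mix = (1.0 - lam) * amp_i + lam * amp_j
        # S24: recombine the mixed amplitude with the original phase.
        f_mix = amp_mix * np.exp(1j * pha_i)
        # S25: inverse Fourier transform back to the image domain.
        mixed[..., c] = np.real(np.fft.ifft2(f_mix))
    return mixed
```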
Preferably, in step S3, the inter-domain neural statistic consistency used to guide the generator to generate the source style images specifically comprises the following step: calculating the inter-domain gap L_stat, namely the inter-domain neural statistic consistency loss, expressed as follows:

L_{stat} = \frac{1}{L} \sum_{l=1}^{L} \left( \big\| \hat{\mu}_l - \bar{\mu}_l \big\|_2 + \big\| \hat{\sigma}_l^2 - \bar{\sigma}_l^2 \big\|_2 \right)

wherein l ∈ {1, 2, …, L} indexes the l-th layer of the source-trained model, which comprises the feature extractor, classifier and depth estimator, L is the total number of layers, \hat{\mu}_l denotes the running mean of the source style data, \hat{\sigma}_l^2 denotes the running variance of the source style data, \bar{\mu}_l denotes the stored mean of the source model, and \bar{\sigma}_l^2 denotes the stored variance of the source model.
Specifically, Batch Normalization (BN) normalizes each input feature within a mini-batch in a channel-wise manner, so that the mean μ of the features within the mini-batch becomes 0 and the variance σ² becomes 1; the specific formulas are as follows:

\mu = \frac{1}{B} \sum_{i=1}^{B} x_i, \qquad \sigma^2 = \frac{1}{B} \sum_{i=1}^{B} (x_i - \mu)^2, \qquad \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}

wherein B is the mini-batch size, x_i is an input feature, \hat{x}_i is the normalized feature, and \epsilon is a small constant for numerical stability.
Further, in training the living body detection model, the source statistics at step n+1 are updated by an exponential moving average, with the specific formulas as follows:

\bar{\mu}_s^{\,n+1} = (1 - \alpha)\, \bar{\mu}_s^{\,n} + \alpha\, \mu_s^{\,n}

(\bar{\sigma}_s^2)^{\,n+1} = (1 - \alpha)\, (\bar{\sigma}_s^2)^{\,n} + \alpha\, (\sigma_s^2)^{\,n}

wherein \bar{\mu}_s^{\,n+1} is the exponential moving mean of the source statistics at step n+1, (\bar{\sigma}_s^2)^{\,n+1} is the exponential moving variance of the source statistics at step n+1, \alpha is the update ratio, \bar{\mu}_s^{\,n} and (\bar{\sigma}_s^2)^{\,n} are the exponential moving mean and variance of the source statistics at step n, and \mu_s^{\,n} and (\sigma_s^2)^{\,n} are the mean and variance of the source statistics computed at step n.
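For illustration, the channel-wise mini-batch statistics and their exponential moving average update described above could be computed as in the following sketch (a generic rendering of the formulas under assumed tensor shapes, not code from the patent); `alpha` corresponds to the update ratio α:

```python
import torch

def update_source_statistics(features, running_mean, running_var, alpha=0.1, eps=1e-5):
    """Channel-wise BN statistics of a mini-batch and their EMA update.

    features: tensor of shape (B, C, H, W) produced by one layer of the source model.
    running_mean, running_var: stored source statistics of shape (C,).
    """
    # Mini-batch mean and variance per channel (the BN statistics mu and sigma^2).
    mu = features.mean(dim=(0, 2, 3))
    centered = features - mu[None, :, None, None]
    var = (centered ** 2).mean(dim=(0, 2, 3))
    # Normalized features (zero mean, unit variance per channel).
    normalized = centered / torch.sqrt(var[None, :, None, None] + eps)
    # Exponential moving average update of the stored source statistics.
    new_mean = (1.0 - alpha) * running_mean + alpha * mu
    new_var = (1.0 - alpha) * running_var + alpha * var
    return normalized, new_mean, new_var
```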
The neural statistics of source features stored in a well-trained living body detection model provide sufficient supervision for both low-level and high-level features, and they represent a domain-specific style that can be fully exploited to aid distribution alignment in UDA. However, conventional methods only use the output features of the higher layers for distribution alignment, and cannot fully use the abundant and discriminative liveness cues in the lower-layer features. Therefore, taking these stored BN statistics {(\bar{\mu}_l, \bar{\sigma}_l^2)}_{l=1}^{L} into account, the source style data can be easily estimated. This embodiment proposes the inter-domain neural statistic consistency loss L_stat to match the running mean \hat{\mu}_l and running variance \hat{\sigma}_l^2 of the source style data with the stored statistics \bar{\mu}_l and \bar{\sigma}_l^2 of the source model, thereby bridging the inter-domain gap. Under the guidance of the loss L_stat, this embodiment can approximate a source style domain having a style similar to that of the source domain. Unlike existing methods that generate image content from input random noise, the neural statistic consistency of this embodiment uses BN statistic alignment as a constraint to stylize the input image without changing its content.
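A rough PyTorch sketch of the inter-domain neural statistic consistency loss L_stat is given below; it assumes the source style feature statistics have already been collected for each layer (for example with forward hooks) and compares them with the BatchNorm statistics stored in the frozen source model. The helper and variable names are illustrative assumptions, not identifiers from the patent:

```python
import torch
import torch.nn as nn

def neural_statistic_consistency_loss(model, style_means, style_vars):
    """L_stat: match source style feature statistics to the stored BN statistics.

    model: frozen source-trained network containing nn.BatchNorm2d layers.
    style_means, style_vars: lists of per-layer channel statistics (mu_hat_l, sigma_hat_l^2)
    computed on the generated source style images, ordered like the BN layers of `model`.
    """
    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    loss = 0.0
    for bn, mu_hat, var_hat in zip(bn_layers, style_means, style_vars):
        # Stored statistics of the source model play the role of (mu_bar_l, sigma_bar_l^2).
        loss = loss + torch.norm(mu_hat - bn.running_mean, p=2) \
                    + torch.norm(var_hat - bn.running_var, p=2)
    return loss / max(len(bn_layers), 1)
```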
Preferably, in step S3, the content is constrained by dual semantic consistency at the feature level and the image level, to ensure that semantic content is preserved during image conversion, specifically comprising the following steps:
Step S31: taking the generated source style image \hat{x}_{t \to s} and the original target image x_t as input, and imposing a perceptual loss L_per on the latent features of a VGG16 module pre-trained on ImageNet; the specific formula of the perceptual loss L_per is as follows:

L_{per}(\hat{x}_{t \to s}, x_t) = \sum_{j} \frac{1}{C_j H_j W_j} \big\| \phi_j(\hat{x}_{t \to s}) - \phi_j(x_t) \big\|_2^2

wherein L_{per}(\hat{x}_{t \to s}, x_t) is the perceptual loss between the source style image \hat{x}_{t \to s} and the original target image x_t, C_j is the number of channels of the j-th layer feature map, H_j is the height of the j-th layer feature map, W_j is the width of the j-th layer feature map, \phi_j(\hat{x}_{t \to s}) is the j-th layer convolutional feature of the source style image, and \phi_j(x_t) is the j-th layer convolutional feature of the original target image x_t;
Step S32: enhancing phase consistency between the original target image and the source style image by minimizing the semantic consistency loss L_ph; the specific formula of L_ph is as follows:

L_{ph} = - \sum_{j} \frac{\big\langle P(F(x_t)_j), \, P(F(\hat{x}_{t \to s})_j) \big\rangle}{\big\| P(F(x_t)_j) \big\|_2 \, \big\| P(F(\hat{x}_{t \to s})_j) \big\|_2}

wherein ⟨·,·⟩ is the dot product in two-dimensional space, ‖·‖_2 is the L2 norm, L_ph is the negative cosine distance between the original phase of the original target image and the generated phase of the source style image, x_t is the original target image, \hat{x}_{t \to s} is the source style image, F(x_t)_j is the Fourier transform of the j-th original target image x_t, and F(\hat{x}_{t \to s})_j is the Fourier transform of the j-th source style image \hat{x}_{t \to s}.
In this embodiment, at the feature level, the generated source style image \hat{x}_{t \to s} and the original target image x_t are taken as input, and the perceptual loss L_per is imposed on the latent features of a VGG16 module pre-trained on ImageNet, thereby reducing the perceptual difference between them.
However, using this perceptual loss in the spatial domain alone is not sufficient to ensure semantic consistency. A Fourier-domain transfer from one domain to another affects only the amplitude of the spectrum and not its phase; since the phase component retains most of the content of the original signal while the amplitude component mainly contains style, semantic inconsistency can be explicitly penalized by ensuring that the phase is preserved before and after image conversion. Therefore, by minimizing the semantic consistency loss L_ph, i.e. minimizing the image-level difference between the source style image \hat{x}_{t \to s} and the original target image x_t over the Fourier spectrum, phase consistency is maintained.
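The dual semantic consistency constraints could be sketched as follows; the VGG16 feature extraction shown here is a simplified stand-in (a few intermediate activations of a pre-trained VGG16), and the chosen layer indices are assumptions rather than a configuration fixed by the patent:

```python
import torch
import torch.nn.functional as F

def perceptual_loss(x_styled, x_t, vgg_features, layer_ids=(3, 8, 15)):
    """L_per: feature-level difference on a VGG16 pre-trained on ImageNet (layer_ids assumed)."""
    loss, h_t, h_s = 0.0, x_t, x_styled
    for idx, layer in enumerate(vgg_features):
        h_t, h_s = layer(h_t), layer(h_s)
        if idx in layer_ids:
            # Mean squared feature difference, i.e. the per-layer term averaged over the batch.
            loss = loss + F.mse_loss(h_s, h_t)
    return loss

def phase_consistency_loss(x_styled, x_t):
    """L_ph: negative cosine similarity between the phases of the two Fourier spectra."""
    pha_t = torch.angle(torch.fft.fft2(x_t))
    pha_s = torch.angle(torch.fft.fft2(x_styled))
    return -F.cosine_similarity(pha_t.flatten(1), pha_s.flatten(1), dim=1).mean()

# Example setup (assumed):
# from torchvision.models import vgg16
# vgg_features = vgg16(weights="IMAGENET1K_V1").features.eval()  # frozen feature extractor
```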
Preferably, the method further comprises the step of training the living body detection model, which specifically comprises the following steps:
Step S51: calculating entropy losses through the classifier and the depth estimator to obtain the classifier entropy loss and the depth estimator entropy loss, with the specific formulas as follows:

L_{ent1} = - \sum_{c=1}^{C} \hat{p}_c \log \hat{p}_c

L_{ent2} = - \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \hat{D}(h,w) \log \hat{D}(h,w)

wherein L_{ent1} is the classifier entropy loss, L_{ent2} is the depth estimator entropy loss, \hat{p} is the predicted label probability distribution of the source style image, c is the channel index and C is the total number of channels, H is the upper height limit of the image and W is the upper width limit of the image, (h, w) is any pixel of the source style image, and \hat{D}(h,w) is the estimated depth at that pixel of the source style image;
Step S52: adding the classifier entropy loss and the depth estimator entropy loss to obtain the total entropy loss, with the specific formula as follows:

L_{ent} = L_{ent1} + L_{ent2}

wherein L_{ent} is the total entropy loss.
In this embodiment, the entropy loss of the classifier and the depth estimator is calculated, so that the prediction distribution obtained by the living body detection model is as close as possible to the actual distribution of the data.
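The entropy losses of steps S51 and S52 could be implemented, for instance, as in the following sketch (assumed tensor shapes and value ranges; not the patent's reference code):

```python
import torch

def total_entropy_loss(class_probs, depth_map, eps=1e-8):
    """L_ent = L_ent1 + L_ent2 on predictions for source style images.

    class_probs: softmax probabilities of shape (N, C) from the classifier.
    depth_map: per-pixel depth estimates of shape (N, 1, H, W), assumed in (0, 1).
    """
    # L_ent1: Shannon entropy of the classifier's label probability distribution.
    l_ent1 = -(class_probs * torch.log(class_probs + eps)).sum(dim=1).mean()
    # L_ent2: pixel-wise entropy of the depth estimator's output, averaged over H*W.
    d = depth_map.clamp(eps, 1.0 - eps)
    l_ent2 = -(d * torch.log(d)).flatten(1).mean(dim=1).mean()
    return l_ent1 + l_ent2
```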
Preferably, the total loss for generator parameter optimization is calculated from the total entropy loss of the living body detection model training, the perceptual loss, the semantic consistency loss, and the inter-domain neural statistic consistency loss, with the specific formula as follows:

L_{total} = L_{stat} + L_{per} + \lambda_{ent} L_{ent} + \lambda_{ph} L_{ph}

wherein L_{total} is the total loss, L_{ent} is the total entropy loss, \lambda_{ent} is the weighting coefficient of the total entropy loss, L_{ph} is the semantic consistency loss, \lambda_{ph} is the weighting coefficient of the semantic consistency loss, L_{stat} is the inter-domain neural statistic consistency loss, and L_{per} is the perceptual loss.
In this embodiment, the parameters of the feature extractor, the classifier, the depth estimator, and the VGG16 module of the pre-trained living body detection model are fixed, so the generated source style image can be further improved only by optimizing the parameters of the generator. The losses computed during this optimization are progressively reduced, which effectively improves the quality of the source style image.
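Putting the pieces together, one generator update under the total loss could look roughly like the sketch below; the weighting coefficients, optimizer settings and the `losses` dictionary of callables are illustrative assumptions standing in for the loss functions sketched earlier:

```python
import torch

def generator_step(x_t, x_div, generator, optimizer, losses, lambda_ent=1.0, lambda_ph=1.0):
    """One optimization step of the generator; the living body detection model and VGG16 stay frozen.

    x_t: original target images; x_div: diversified target images from spectrum mixing.
    losses: dict of callables {"stat", "per", "ent", "ph"} computing the individual loss terms.
    """
    x_in = torch.cat([x_t, x_div], dim=0)
    x_styled = generator(x_in)                     # target -> source style images
    l_total = (losses["stat"](x_styled)
               + losses["per"](x_styled, x_in)
               + lambda_ent * losses["ent"](x_styled)
               + lambda_ph * losses["ph"](x_styled, x_in))
    optimizer.zero_grad()
    l_total.backward()                             # gradients flow only into the generator
    optimizer.step()
    return l_total.item()
```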
Furthermore, functional units in various embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations of the above embodiments may be made by those skilled in the art within the scope of the invention.

Claims (6)

1. A face living body detection method based on generation domain adaptation is characterized in that: the method comprises the following steps:
Step S1: building a living body detection model and a generator;
Step S2: performing intra-domain spectrum mixing on images of the unlabeled target domain to generate diversified target images in addition to the original target images;
Step S3: stylizing both the diversified target images and the original target images into source style images through image conversion, with inter-domain neural statistic consistency adopted to guide the generator in generating the source style images;
Step S4: inputting the source style images into the living body detection model to perform face living body detection, and outputting the face living body detection result.
2. The face living body detection method based on the generated domain adaptation according to claim 1, wherein: in step S2, the method specifically includes the following steps:
step S21: calculating a target image x t ∈D t Fourier transform F (x) t ) The specific formula is as follows:
Figure FDA0003988114310000011
wherein ,F(xt ) (u, v) is the target image x t U and v are frequency variations in the frequency domain, H is the maximum height of the image, W is the maximum width of the image, H is the image height, and W is the image width;
step S22: calculating a target image x t ∈D t Fourier transform F (x) t ) The specific formulas are as follows:
A(x t )(u,v)=[R 2 (x t )(u,v)+I 2 (x t )(u,v)] 1/2
Figure FDA0003988114310000012
wherein ,A(xt ) (u, v) is amplitude, P (x) t ) (u, v) is the phase, R (x) t ) Is F (x) t ) Is the real part of I (x) t ) Is F (x) t ) Is the imaginary part of (2);
step S23: computing the target domain D from the same unlabeled target domain t Is an arbitrary image of two of (a)
Figure FDA0003988114310000021
The specific formula is as follows:
Figure FDA0003988114310000022
wherein ,
Figure FDA0003988114310000023
interpolation for mixed amplitude +.>
Figure FDA0003988114310000024
For image->
Figure FDA0003988114310000025
Amplitude interpolation of>
Figure FDA0003988114310000026
For image->
Figure FDA0003988114310000027
lambda-U (0, eta), the super-parameter eta controls the enhanced intensity;
step S24: combining the mixed amplitude spectrum with the original phase spectrum to reconstruct a new fourier representation:
Figure FDA0003988114310000028
wherein ,
Figure FDA0003988114310000029
for interpolating the image +.>
Figure FDA00039881143100000210
Is a two-dimensional discrete Fourier transform of>
Figure FDA00039881143100000211
For interpolating the image +.>
Figure FDA00039881143100000212
Mixed amplitude of>
Figure FDA00039881143100000213
For interpolating the image +.>
Figure FDA00039881143100000214
Is the original phase of (a);
step S25: will be
Figure FDA00039881143100000215
An interpolated image is generated by inverse fourier transform, and the specific formula is as follows:
Figure FDA00039881143100000216
wherein ,
Figure FDA00039881143100000217
for interpolating the image. />
3. The face living body detection method based on the generated domain adaptation according to claim 1, wherein: in step S3, the inter-domain neural statistic consistency used to guide the generator to generate the source style images specifically comprises the following step: calculating the inter-domain gap L_stat, namely the inter-domain neural statistic consistency loss, expressed as follows:

L_{stat} = \frac{1}{L} \sum_{l=1}^{L} \left( \big\| \hat{\mu}_l - \bar{\mu}_l \big\|_2 + \big\| \hat{\sigma}_l^2 - \bar{\sigma}_l^2 \big\|_2 \right)

wherein l ∈ {1, 2, …, L} indexes the l-th layer of the source-trained model, which comprises the feature extractor, classifier and depth estimator, L is the total number of layers, \hat{\mu}_l denotes the running mean of the source style data, \hat{\sigma}_l^2 denotes the running variance of the source style data, \bar{\mu}_l denotes the stored mean of the source model, and \bar{\sigma}_l^2 denotes the stored variance of the source model.
4. A face living body detection method based on generated domain adaptation according to claim 3, wherein: in step S3, the content is constrained by adopting dual semantic consistency of the feature level and the image level to ensure that the semantic content is preserved in the image conversion process, which specifically comprises the following steps:
step S31: source style image to be generated
Figure FDA0003988114310000031
And original target image x t As input, a perceptual penalty L is imposed on the potential features of the pre-trained VGG16 module on ImageNet per Perception loss L per The specific formula of (2) is as follows:
Figure FDA0003988114310000032
wherein ,
Figure FDA0003988114310000033
image +.>
Figure FDA0003988114310000034
And original target image x t Perceived loss between C j The number of channels of the j-th layer of the characteristic diagram is H j For the height of the j-th layer of the feature map, W j For the width of the j-th layer of the feature map, +.>
Figure FDA0003988114310000035
Image +.>
Figure FDA0003988114310000036
Convolution of layer j,>
Figure FDA0003988114310000037
for the original target image x t Convolution of layer j;
step S32: by minimizing semantic consistency loss L ph To enhance phase consistency between original target image and source pattern image, L ph The specific formula is as follows:
Figure FDA0003988114310000038
wherein,<,>is the dot product of the two-dimensional space, I.I 2 Is the L2 norm and is used to determine,
Figure FDA0003988114310000039
for the negative cosine distance, x, between the original phase of the original target image and the generated phase of the source pattern image t For the original target image->
Figure FDA00039881143100000310
For source pattern image, F (x t ) j For the jth original target image x t Fourier transform of->
Figure FDA00039881143100000311
For the j-th source pattern image +.>
Figure FDA00039881143100000312
Is a fourier transform of (a).
5. The face living body detection method based on the generated domain adaptation according to claim 4, wherein the face living body detection method is characterized in that: the method also comprises the step of training a living body detection model, and specifically comprises the following steps of:
step S51: calculating entropy loss through a classifier and a depth estimator to obtain the entropy loss of the classifier and the entropy loss of the depth estimator, wherein the specific formula is as follows:
Figure FDA0003988114310000041
/>
Figure FDA0003988114310000042
wherein ,Lent1 For classifier entropy loss, L ent2 For the depth estimator entropy loss,
Figure FDA0003988114310000043
the label probability distribution for a source pattern image, C is the channel, H is the height of the image, W is the width of the image, C is the total number of channels, H is the upper height limit of the image, W is the upper width limit of the image,/->
Figure FDA0003988114310000044
For any pixel of the source pattern image, +.>
Figure FDA0003988114310000045
Estimating the depth of any pixel on the source pattern image;
step S52: adding the entropy loss of the classifier and the entropy loss of the depth estimator to obtain the total entropy loss, wherein the specific formula is as follows:
L ent =L ent1 +L ent2
wherein ,Lent Is the total entropy loss.
6. The face living body detection method based on the generated domain adaptation according to claim 5, wherein: the total loss for generator parameter optimization is calculated from the total entropy loss of the living body detection model training, the perceptual loss, the semantic consistency loss, and the inter-domain neural statistic consistency loss, with the specific formula as follows:

L_{total} = L_{stat} + L_{per} + \lambda_{ent} L_{ent} + \lambda_{ph} L_{ph}

wherein L_{total} is the total loss, L_{ent} is the total entropy loss, \lambda_{ent} is the weighting coefficient of the total entropy loss, L_{ph} is the semantic consistency loss, \lambda_{ph} is the weighting coefficient of the semantic consistency loss, L_{stat} is the inter-domain neural statistic consistency loss, and L_{per} is the perceptual loss.
CN202211571186.7A 2022-12-08 2022-12-08 Face living body detection method based on generation type domain adaptation Pending CN116189255A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211571186.7A CN116189255A (en) 2022-12-08 2022-12-08 Face living body detection method based on generation type domain adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211571186.7A CN116189255A (en) 2022-12-08 2022-12-08 Face living body detection method based on generation type domain adaptation

Publications (1)

Publication Number Publication Date
CN116189255A true CN116189255A (en) 2023-05-30

Family

ID=86437343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211571186.7A Pending CN116189255A (en) 2022-12-08 2022-12-08 Face living body detection method based on generation type domain adaptation

Country Status (1)

Country Link
CN (1) CN116189255A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456309A (en) * 2023-12-20 2024-01-26 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Cross-domain target identification method based on intermediate domain guidance and metric learning constraint
CN117456309B (en) * 2023-12-20 2024-03-15 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Cross-domain target identification method based on intermediate domain guidance and metric learning constraint


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination