CN117727104A - Near infrared living body detection device and method based on bilateral attention - Google Patents

Publication number
CN117727104A
CN117727104A CN202410180161.7A CN202410180161A CN117727104A CN 117727104 A CN117727104 A CN 117727104A CN 202410180161 A CN202410180161 A CN 202410180161A CN 117727104 A CN117727104 A CN 117727104A
Authority
CN
China
Prior art keywords
image feature
face image
attention
feature map
bilateral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410180161.7A
Other languages
Chinese (zh)
Other versions
CN117727104B (en)
Inventor
何一凡
陈昕
张帅
王汉超
贾宝芝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Ruiwei Information Technology Co ltd
Original Assignee
Xiamen Ruiwei Information Technology Co ltd
Application filed by Xiamen Ruiwei Information Technology Co ltd
Priority to CN202410180161.7A
Publication of CN117727104A
Application granted
Publication of CN117727104B
Legal status: Active

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a near infrared living body detection device and method based on a bilateral attention mechanism. By introducing the bilateral attention mechanism, the invention makes full use of the channel and position information in an image, improves the ability to distinguish real faces from fake ones, and ultimately improves the user experience of a face recognition system. A convolutional neural network structure extracts a discriminative face image feature map from the input near infrared image and provides it as the input of the bilateral attention mechanism. The bilateral attention mechanism extracts features from the face image feature map along the channel and position dimensions to obtain finer and more discriminative features, which helps the model capture the correlation and importance among different channels and focus on key regions of the input tensor, thereby improving the discrimination capability of the living body detection model.

Description

Near infrared living body detection device and method based on bilateral attention
Technical Field
The invention relates to the technical field of computer living body detection, in particular to a near infrared living body detection device and method based on bilateral attention.
Background
Living body detection is an identity verification method that aims to confirm that a subject presents genuine physiological characteristics. It plays an important role in face recognition and can effectively resist common attacks, such as fraud using photos, videos, face swapping, masks, occlusion, 3D animation and screen recapture. Detecting whether the user is a living body ensures the authenticity of the user and thereby protects the user's interests and security.
With the development of deep learning, the performance of living body detection methods based on deep neural networks has improved greatly. Such methods use neural networks to learn highly discriminative features from large-scale face datasets; the networks used include residual learning networks, central difference convolution networks, LSTMs and generative adversarial networks. Despite significant advances in living body detection, some drawbacks remain, for example:
1. The features extracted by the network are not discriminative enough, resulting in poor defense against attacks that replace local facial features (e.g., a real face with a fake nose, or a real face with fake eyes).
2. Current living body detection algorithms have poor robustness and insufficient generalization when facing previously unseen attack scenarios.
3. Current transformer-based living body detection algorithms typically require substantial computing resources and high-performance hardware for training and inference, which is a challenge for resource-constrained devices or scenarios.
In view of the above, the present inventors studied the drawbacks and inconveniences arising from the imperfections of conventional living body detection techniques and, after active research and experimentation, developed the present invention.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a near infrared living body detection device based on bilateral attention, which makes full use of the channel and position information in an image, improves the ability to distinguish real faces from fake ones, and ultimately improves the user experience of a face recognition system.
A further aim of the invention is to overcome the defects of the prior art by providing a near infrared living body detection method based on bilateral attention, which makes full use of the channel and position information in an image, improves the ability to distinguish real faces from fake ones, and ultimately improves the user experience of a face recognition system.
In order to achieve the above object, the solution of the present invention is:
the near infrared living body detection device based on the bilateral attention mechanism mainly comprises a custom convolutional neural network structure, a bilateral attention mechanism and a classifier;
the convolutional neural network structure extracts a discriminative face image feature map from an input near infrared image and provides it as the input of the bilateral attention mechanism;
the bilateral attention mechanism comprises a channel attention sub-module and a position attention sub-module; the face image feature map output by the convolutional neural network structure is input into the channel attention sub-module and the position attention sub-module respectively; the channel attention sub-module weights the face image feature map by learning the importance of each channel, so as to reduce redundant information and emphasize channel features useful for living body detection; the position attention sub-module handles attacks, deformation and occlusion at different scales by focusing on key face regions in the face image feature map;
the classifier uses the features extracted by the custom convolutional neural network structure and the bilateral attention mechanism to classify real and fake faces, classifying the input image according to the expressive and discriminative power of the features.
Further, the bilateral attention mechanism has two branches: the first branch consists of a convolution module and the channel attention sub-module, and the second branch consists of a convolution module and the position attention sub-module; the output features of the two branches are joined by feature concatenation; finally, the concatenated features pass through a convolution module to obtain the output features.
Further, the convolution module comprises a convolution, batch normalization and a ReLU activation function.
Further, the channel feature extraction method of the channel attention sub-module comprises the following steps:
first, a 1×1 convolution module reduces the dimension of the input face image feature map: $y = \mathrm{ConvModule}_{1\times 1}(x)$, where $x$ is the face image feature map;
then, the dimension-reduced face image feature map is reshaped into two face image feature maps of shapes [N, C/2, H×W] and [N, H×W, C/2], where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, and W is the width of the face feature map;
the two reshaped face image feature maps are combined by batch matrix multiplication to obtain a tensor $E$ of shape [N, C/2, C/2]; the maximum of $E$ is taken along its last dimension to obtain a tensor $E_{\max}$ of shape [N, C/2, 1], which is expanded to the shape of $E$; the difference between $E_{\max}$ and $E$ gives a new tensor $E'$;
a normalization operation is applied along the last dimension of $E'$ to obtain the channel attention weights $A$;
the channel attention weights $A$ and the dimension-reduced face image feature map $y$ are combined by batch matrix multiplication, and the output is reshaped into a tensor of shape [N, C/2, H, W];
thus, the face image feature map adjusted by the channel attention sub-module is obtained.
Further, the method for extracting the position features by the position attention submodule comprises the following steps:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, giving a feature of shape [N, C/2, H, W]: $y = \mathrm{ConvModule}_{1\times 1}(x)$, where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, W is the width of the face feature map, and $x$ is the face image feature map;
then, average pooling is applied to the dimension-reduced face image feature map along the H dimension and the W dimension respectively, giving face image feature maps of shapes [N, C/2, 1, W] and [N, C/2, H, 1]; the second is transposed to shape [N, C/2, 1, H], yielding the face image feature maps $y_w$ and $y_h$;
the face image feature maps $y_w$ and $y_h$ are concatenated to obtain a face image feature map $f$, which is convolved to obtain finer position information;
subsequently, the face image feature map $f$ is split to obtain $f_w$ and $f_h$, and the second split face image feature map $f_h$ is transposed;
an activation operation is applied to $f_w$ and $f_h$ and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1;
finally, the face image feature map is multiplied by the two weight maps to obtain the face image feature map adjusted by the position attention sub-module.
Further, the specific classification method of the classifier comprises the following steps:
first, a convolution module performs dimension reduction on the tensor output by the bilateral attention mechanism;
then, a pooling module reduces the width and height dimensions of the tensor to 1;
then, the pooled tensor is reshaped;
further, a living body detection result of shape [N, 2] is obtained through the fully connected layer, where N is the batch size and 2 is the number of scores;
finally, the cross-entropy loss between the predicted result and the label is calculated as follows:
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i$
where $N$ is the number of input samples, $y_i$ is the label, and $\hat{y}_i$ is the predicted result.
A near infrared living body detection method based on a bilateral attention mechanism, comprising the following steps:
step S1: inputting a near infrared image;
step S2: extracting a high-quality face image feature map from the near infrared image input in step S1 by using a custom convolutional neural network structure;
step S3: inputting the face image feature map of step S2 into a bilateral attention mechanism having two branches, one consisting of a convolution module and a channel attention sub-module and the other consisting of a convolution module and a position attention sub-module; the face image feature map is input into the two branches respectively and, after passing through the respective convolution module, the channel attention sub-module extracts channel features, learns the importance of each channel and weights the face image feature map, while the position attention sub-module extracts position features, focuses on key face regions and handles attacks, deformation and occlusion at different scales;
step S4: concatenating the channel features and the position features;
step S5: passing the concatenated features through a convolution module to obtain output features;
step S6: inputting the output features into a classifier, which classifies the input face image feature map according to the expressive and discriminative power of the features.
Further, the channel feature extraction method of the channel attention sub-module comprises the following steps:
first, a 1×1 convolution module reduces the dimension of the input face image feature map: $y = \mathrm{ConvModule}_{1\times 1}(x)$, where $x$ is the face image feature map;
then, the dimension-reduced face image feature map is reshaped into two face image feature maps of shapes [N, C/2, H×W] and [N, H×W, C/2], where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, and W is the width of the face feature map;
the two reshaped face image feature maps are combined by batch matrix multiplication to obtain a tensor $E$ of shape [N, C/2, C/2]; the maximum of $E$ is taken along its last dimension to obtain a tensor $E_{\max}$ of shape [N, C/2, 1], which is expanded to the shape of $E$; the difference between $E_{\max}$ and $E$ gives a new tensor $E'$;
a normalization operation is applied along the last dimension of $E'$ to obtain the channel attention weights $A$;
the channel attention weights $A$ and the dimension-reduced face image feature map $y$ are combined by batch matrix multiplication, and the output is reshaped into a tensor of shape [N, C/2, H, W];
thus, the face image feature map adjusted by the channel attention sub-module is obtained.
Further, the method for extracting the position features by the position attention submodule comprises the following steps:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, giving a feature of shape [N, C/2, H, W]: $y = \mathrm{ConvModule}_{1\times 1}(x)$, where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, W is the width of the face feature map, and $x$ is the face image feature map;
then, average pooling is applied to the dimension-reduced face image feature map along the H dimension and the W dimension respectively, giving face image feature maps of shapes [N, C/2, 1, W] and [N, C/2, H, 1]; the second is transposed to shape [N, C/2, 1, H], yielding the face image feature maps $y_w$ and $y_h$;
the face image feature maps $y_w$ and $y_h$ are concatenated to obtain a face image feature map $f$, which is convolved to obtain finer position information;
subsequently, the face image feature map $f$ is split to obtain $f_w$ and $f_h$, and the second split face image feature map $f_h$ is transposed;
an activation operation is applied to $f_w$ and $f_h$ and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1;
finally, the face image feature map is multiplied by the two weight maps to obtain the face image feature map adjusted by the position attention sub-module.
Further, the specific classification method of the classifier comprises the following steps:
first, a convolution module performs dimension reduction on the tensor output by the bilateral attention mechanism;
then, a pooling module reduces the width and height dimensions of the tensor to 1;
then, the pooled tensor is reshaped;
further, a living body detection result of shape [N, 2] is obtained through the fully connected layer, where N is the batch size and 2 is the number of scores;
finally, the cross-entropy loss between the predicted result and the label is calculated as follows:
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i$
where $N$ is the number of input samples, $y_i$ is the label, and $\hat{y}_i$ is the predicted result.
With the above scheme, compared with the prior art, the near infrared living body detection device and method based on the bilateral attention mechanism adaptively learn the fine texture features of different faces. The channel attention sub-module automatically learns the importance of each channel and weights the feature map, so that the model focuses on channel features useful for distinguishing real faces from fake ones while suppressing the influence of redundant information such as illumination changes and noise, which improves the robustness and stability of living body detection. The position attention sub-module learns to focus on key facial regions relevant to attacks, such as the eyes, nose and mouth, which improves the sensitivity and recognition accuracy of the living body detection algorithm with respect to attack features; it also helps the model focus on key facial regions that are not occluded or deformed, reducing interference from deformed and occluded regions. As a result, the model's resistance to attacks that replace local facial features is enhanced. More importantly, compared with large networks such as transformers, the invention is based on a custom network structure and a bilateral attention mechanism, has low computing-resource requirements, and can run on resource-constrained devices or in resource-constrained scenarios.
The invention provides a bilateral attention mechanism that effectively learns detail and texture information in the face, so as to identify faces with replaced local facial features. By adopting a custom convolutional neural network structure, the invention extracts discriminative real and fake face features from different face information more effectively; before quantization, the parameter size and computation of the model are only 1021 KB and 11.95 GFLOPs, so the model can run on resource-constrained devices or in resource-constrained scenarios. With this living body detection technique, the pass rate for real persons exceeds 99%, and the defense rate against attacks using various types of paper, replaced local facial features, high-precision 3D head models and resin masks exceeds 98%.
Drawings
Fig. 1 is an overall frame diagram of a near infrared living body detection device based on a bilateral attention mechanism of the present invention.
FIG. 2 is a block diagram of a bilateral attention mechanism of the present invention.
FIG. 3 is a network architecture diagram of a channel attention sub-module of the present invention.
FIG. 4 is a network architecture diagram of a location attention sub-module of the present invention.
Fig. 5 is a block diagram of a classifier of the present invention.
Detailed Description
In order to further explain the technical scheme of the invention, the invention is explained in detail by specific examples.
As shown in fig. 1, the invention discloses a near infrared living body detection device based on a bilateral attention mechanism. The implementation idea is as follows: introducing a bilateral attention mechanism makes full use of the channel and position information in the image, improves the ability to distinguish real faces from fake ones, and ultimately improves the user experience of a face recognition system. The overall framework is shown in fig. 1. The input of the model is a near infrared image in jpg or png format. After the near infrared image is preprocessed (not shown in the figure), a custom convolutional neural network structure (convolution modules 1 to 9 in fig. 1) extracts a high-quality face image feature map from the preprocessed image as the input of the bilateral attention mechanism. The bilateral attention mechanism extracts features from the face image feature map along the channel and position dimensions, obtaining finer and more discriminative features; this helps the model capture the correlation and importance among different channels and focus on key regions of the input tensor, thereby improving the discrimination capability of the living body detection model. The classifier then uses the features extracted by the custom convolutional neural network structure and the bilateral attention mechanism to classify real and fake faces.
The near infrared living body detection device based on the bilateral attention mechanism mainly comprises a custom convolutional neural network structure, the bilateral attention mechanism and a classifier.
The custom convolutional neural network structure extracts a discriminative face image feature map from the input near infrared image. The input near infrared image first undergoes a series of preprocessing steps, specifically face detection and image augmentation (random rotation, random flipping, random patch-wise image shuffling, etc.); the custom convolutional neural network then extracts a high-quality feature representation from the preprocessed face image, and this face image feature map serves as the input of the bilateral attention mechanism. As shown in fig. 1, convolution modules 1 to 9 form the custom convolutional neural network structure: convolution modules 1 to 7 perform feature extraction and downsampling (reducing the resolution of the feature map), and convolution modules 8 and 9 perform feature extraction and dimension reduction (reducing the number of channels of the feature map).
The bilateral attention mechanism enhances the expressive power of the features. It consists of two attention sub-modules: the channel attention sub-module and the position attention sub-module. The face image feature map output by the convolutional neural network structure is input into the channel attention sub-module and the position attention sub-module respectively. The channel attention sub-module weights the face image feature map by learning the importance of each channel, so as to reduce redundant information and emphasize channel features useful for living body detection; the position attention sub-module handles attacks, deformation and occlusion at different scales by focusing on key facial regions, improving the sensitivity, robustness and accuracy of living body detection.
The bilateral attention mechanism extracts features along both the channel and position dimensions. Compared with a single-dimension attention mechanism, a multi-dimensional attention mechanism obtains finer and more discriminative features, and helps the model capture the correlation and importance among different channels and focus on key regions of the input tensor, thereby improving the discrimination capability of the living body detection model.
Fig. 2 is a schematic diagram of the bilateral attention mechanism. The bilateral attention mechanism has two branches: the first branch consists of a convolution module and the channel attention sub-module, and the second branch consists of a convolution module and the position attention sub-module. The output features of the two branches are joined by feature concatenation; finally, the concatenated features pass through a convolution module to obtain the output features. The convolution module comprises a convolution (Conv), batch normalization (BN) and a ReLU activation function.
Fig. 3 is a block diagram of the channel attention sub-module. The channel feature extraction method of the channel attention sub-module is as follows:
first, a 1×1 convolution module reduces the dimension of the input face image feature map: $y = \mathrm{ConvModule}_{1\times 1}(x)$, where $x$ is the face image feature map;
then, the dimension-reduced face image feature map is reshaped into two face image feature maps of shapes [N, C/2, H×W] and [N, H×W, C/2], where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, and W is the width of the face feature map;
the two reshaped face image feature maps are combined by batch matrix multiplication to obtain a tensor $E$ of shape [N, C/2, C/2]; the maximum of $E$ is taken along its last dimension to obtain a tensor $E_{\max}$ of shape [N, C/2, 1], which is expanded to the shape of $E$; the difference between $E_{\max}$ and $E$ gives a new tensor $E'$;
a normalization operation is applied along the last dimension of $E'$ to obtain the channel attention weights $A$;
the channel attention weights $A$ and the dimension-reduced face image feature map $y$ are combined by batch matrix multiplication, and the output is reshaped into a tensor of shape [N, C/2, H, W];
thus, the face image feature map adjusted by the channel attention sub-module is obtained.
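The channel attention steps above can be sketched for a single sample in plain Python. This is only an illustration under stated assumptions: the batch dimension N and the 1×1 convolution module are omitted, the difference is taken as the row maximum minus the energy, and the normalization is taken to be softmax; the extracted text fixes neither the sign convention nor the operator name, and the function name `channel_attention` is hypothetical.

```python
import math

def channel_attention(y):
    """y: one dimension-reduced feature map as C' rows of H*W values.

    Returns (weights, out): weights is C' x C', out is C' x (H*W).
    """
    c = len(y)
    # Energy matrix: inner product between every pair of channels,
    # i.e. the [C', H*W] x [H*W, C'] matrix product for one sample.
    energy = [[sum(a * b for a, b in zip(y[i], y[j])) for j in range(c)]
              for i in range(c)]
    weights = []
    for row in energy:
        row_max = max(row)
        # Assumed sign convention: row maximum minus energy.
        shifted = [row_max - e for e in row]
        # Assumed normalization: softmax over the last dimension.
        peak = max(shifted)
        exps = [math.exp(s - peak) for s in shifted]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Weighted recombination of channels: weights x y.
    out = [[sum(weights[i][j] * y[j][k] for j in range(c))
            for k in range(len(y[0]))] for i in range(c)]
    return weights, out

# Two channels (C' = 2) over a flattened 1 x 2 map (H*W = 2).
weights, out = channel_attention([[1.0, 0.0], [0.0, 1.0]])
print(weights)
```

Under these assumptions each row of `weights` sums to 1, and a channel puts more weight on the channels it is least similar to, because the energy is subtracted from the row maximum before normalization.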
Fig. 4 is a block diagram of the position attention sub-module. The flow by which the position attention sub-module extracts position features is as follows:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, giving a feature of shape [N, C/2, H, W]: $y = \mathrm{ConvModule}_{1\times 1}(x)$, where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, W is the width of the face feature map, and $x$ is the face image feature map;
then, average pooling is applied to the dimension-reduced face image feature map along the H dimension and the W dimension respectively, giving face image feature maps of shapes [N, C/2, 1, W] and [N, C/2, H, 1]; the second is transposed to shape [N, C/2, 1, H], yielding the face image feature maps $y_w$ and $y_h$;
the face image feature maps $y_w$ and $y_h$ are concatenated to obtain a face image feature map $f$, which is convolved to obtain finer position information;
subsequently, the face image feature map $f$ is split to obtain $f_w$ and $f_h$, and the second split face image feature map $f_h$ is transposed;
an activation operation is applied to $f_w$ and $f_h$ and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1;
finally, the face image feature map is multiplied by the two weight maps to obtain the face image feature map adjusted by the position attention sub-module.
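The position attention flow above can be sketched in the same single-sample style. The sketch assumes that the activation is a sigmoid (the text only says the weights end up between 0 and 1), that the convolution after concatenation is a plain 1×1 channel mixing without batch normalization or ReLU, and that the dimension-reduced map is the one multiplied by both weight maps. The names `position_attention` and `conv_w` are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def position_attention(y, conv_w):
    """y: feature map as [C'][H][W]; conv_w: [C'][C'] 1x1 conv weights."""
    c, h, w = len(y), len(y[0]), len(y[0][0])
    # Average over H -> one value per column, shape [C'][W].
    pool_w = [[sum(y[k][i][j] for i in range(h)) / h for j in range(w)]
              for k in range(c)]
    # Average over W -> one value per row, shape [C'][H] (already transposed).
    pool_h = [[sum(y[k][i]) / w for i in range(h)] for k in range(c)]
    # Concatenate along the last axis -> [C'][W + H].
    cat = [pool_w[k] + pool_h[k] for k in range(c)]
    # 1x1 convolution = per-position mixing of channels.
    mixed = [[sum(conv_w[o][k] * cat[k][p] for k in range(c))
              for p in range(w + h)] for o in range(c)]
    # Split back and squash to (0, 1); sigmoid is an assumption.
    att_w = [[sigmoid(v) for v in row[:w]] for row in mixed]
    att_h = [[sigmoid(v) for v in row[w:]] for row in mixed]
    # Broadcast both weight maps over the feature map.
    return [[[y[k][i][j] * att_h[k][i] * att_w[k][j] for j in range(w)]
             for i in range(h)] for k in range(c)]

# One channel, 2 x 3 map of constant 2.0, identity 1x1 convolution.
out = position_attention([[[2.0] * 3 for _ in range(2)]], [[1.0]])
print(out[0][0][0])
```

Pooling along each axis separately keeps one coordinate intact, so each weight map can mark important rows or columns; their product localizes a region of the face.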
The classifier uses the features extracted by the custom convolutional neural network structure and the bilateral attention mechanism to classify real and fake faces, classifying the input image according to the expressive and discriminative power of the features.
Fig. 5 is a structural diagram of the classifier. The specific flow of the classifier is as follows:
first, a convolution module performs dimension reduction on the tensor output by the bilateral attention mechanism, changing its shape from [N, 64, 16, 16] to [N, 48, 16, 16];
then, a pooling module reduces the width and height dimensions of the tensor to 1;
then, the pooled tensor is reshaped to shape [N, 48];
further, a living body detection result of shape [N, 2] is obtained through the fully connected layer;
finally, the cross-entropy loss between the predicted result and the label is calculated as follows:
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i$
where $N$ is the number of input samples, $y_i$ is the label, and $\hat{y}_i$ is the predicted result.
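The pooling, fully connected and loss steps of the classifier can be sketched as follows. This is a minimal illustration, not the patented implementation: the pooling type is assumed to be average pooling (the text only says a pooling module reduces width and height to 1), the labels are assumed to be class indices with 0 for a fake face and 1 for a real face, and the tiny weights and feature values are made up for the example.

```python
import math

def global_avg_pool(x):
    """x: [C][H][W] -> [C]; average pooling is an assumption."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def linear(v, weight, bias):
    """Fully connected layer: weight is [out][in], bias is [out]."""
    return [sum(wi * vi for wi, vi in zip(w_row, v)) + b
            for w_row, b in zip(weight, bias)]

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of [N][2] scores and N class indices."""
    total = 0.0
    for scores, label in zip(logits, labels):
        peak = max(scores)
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_sum - scores[label]  # -log softmax(scores)[label]
    return total / len(labels)

# One sample with 2 channels on a 2 x 2 map; weights are illustrative only.
feat = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]
pooled = global_avg_pool(feat)                      # [4.0, 1.0]
scores = linear(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
loss = cross_entropy([scores], [1])
print(pooled, scores, loss)
```

The loss here is computed from raw scores with a log-sum-exp, which equals applying softmax and then taking the negative log of the probability assigned to the labelled class.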
During model training, before the training sample images are input into the living body detection model to be trained, the input images are processed with random rotation, random flipping, random patch-wise image shuffling, Gaussian noise addition, random contrast enhancement and similar methods to increase the diversity of the training samples. The input images are then normalized so that pixel values lie between 0 and 1.
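Two of the preprocessing steps named above, patch-wise shuffling and normalization, can be sketched as follows. The patch size, the use of a seeded `random.Random`, and division by 255 (which presumes 8-bit input) are assumptions made for illustration; the text does not specify them, and the function names are hypothetical.

```python
import random

def patch_shuffle(img, patch, rng):
    """img: H x W list of lists; H and W must be multiples of `patch`."""
    h, w = len(img), len(img[0])
    # Top-left corners of all patches, in row-major order.
    corners = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    shuffled = corners[:]
    rng.shuffle(shuffled)
    out = [[0] * w for _ in range(h)]
    # Copy each source patch, intact, into its new destination slot.
    for (dr, dc), (sr, sc) in zip(corners, shuffled):
        for i in range(patch):
            for j in range(patch):
                out[dr + i][dc + j] = img[sr + i][sc + j]
    return out

def normalize(img):
    """Scale pixel values into [0, 1], assuming 8-bit input."""
    return [[p / 255.0 for p in row] for row in img]

img = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled_img = patch_shuffle(img, 2, random.Random(0))
print(normalize(shuffled_img))
```

Shuffling whole patches destroys the global face layout while preserving local texture, which fits the stated goal of learning fine texture differences between real and fake faces.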
The model was trained for 200 epochs (the model loss had converged by epoch 200) with a batch size of 1024, a learning rate of 0.01, an SGD optimizer with momentum 0.9, and a StepLR learning-rate schedule.
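A step-decay schedule of the StepLR kind multiplies the learning rate by a fixed factor every fixed number of epochs. The sketch below uses the stated initial learning rate of 0.01 and 200 epochs; the decay interval `step_size=50` and factor `gamma=0.1` are assumed values, since the text gives neither.

```python
def step_lr(epoch, base_lr=0.01, step_size=50, gamma=0.1):
    """Learning rate at a given epoch under a step-decay schedule.

    base_lr comes from the text; step_size and gamma are assumed values.
    """
    return base_lr * gamma ** (epoch // step_size)

schedule = [step_lr(e) for e in range(200)]
print(schedule[0], schedule[50], schedule[199])
```

With these assumed values the rate stays at 0.01 for the first 50 epochs and drops by a factor of ten at epochs 50, 100 and 150.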
During model inference, an image can be input into the model after the normalization operation. After processing by the convolution modules, the bilateral attention mechanism, feature reshaping and so on, the input image becomes a two-dimensional tensor of shape [N, 64]. This tensor is then matrix-multiplied with the optimized fully connected layer to obtain a living body detection result of shape [N, 2] (where N is the batch size and 2 is the number of scores); the first score is the fake-face score and the second is the real-face score, and the category with the larger score is taken as the detection result.
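The final decision rule can be written in a few lines. The score order (fake first, real second) follows the paragraph above; resolving ties to "fake" is an assumption, since the text does not say how equal scores are handled, and the function name `decide` is hypothetical.

```python
def decide(scores):
    """Map [N][2] scores (index 0 = fake, index 1 = real) to labels."""
    # Ties fall to "fake"; the text does not specify tie handling.
    return ["real" if real > fake else "fake" for fake, real in scores]

print(decide([[0.2, 0.8], [1.5, -0.3]]))  # ['real', 'fake']
```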
As shown in fig. 1 to 5, the present invention also discloses a near infrared living body detection method based on a bilateral attention mechanism, which comprises the following steps:
step S1: inputting a near infrared image;
step S2: preprocessing the image input in step S1, and extracting a high-quality face image feature map from the preprocessed image by using a custom convolutional neural network structure;
step S3: inputting the face image feature map of step S2 into a bilateral attention mechanism having two branches, one consisting of a convolution module and a channel attention sub-module and the other consisting of a convolution module and a position attention sub-module; the face image feature map is input into the two branches respectively and, after passing through the respective convolution module, the channel attention sub-module extracts channel features, learns the importance of each channel and weights the face image feature map, while the position attention sub-module extracts position features, focuses on key face regions and handles attacks, deformation and occlusion at different scales;
step S4: concatenating the channel features and the position features;
step S5: passing the concatenated features through a convolution module to obtain output features;
step S6: inputting the output features into a classifier, which classifies the input face image feature map according to the expressive and discriminative power of the features.
The above embodiments and drawings are not intended to limit the form or style of the present invention, and any suitable variation or modification made by those skilled in the art shall be regarded as not departing from the scope of the present invention.

Claims (10)

1. A near infrared living body detection device based on a bilateral attention mechanism, characterized by mainly comprising: a convolutional neural network structure, a bilateral attention mechanism, and a classifier;
the convolutional neural network structure is used to extract a discriminative face image feature map from an input near infrared image, which serves as the input of the bilateral attention mechanism;
the bilateral attention mechanism comprises a channel attention submodule and a position attention submodule; the face image feature map output by the convolutional neural network structure is input into the channel attention submodule and the position attention submodule respectively; the channel attention submodule weights the face image feature map by learning the importance of each channel, so as to reduce redundant information and emphasize channel features useful for living body detection; the position attention submodule handles attacks, deformation, and occlusion at different scales by attending to key face regions in the face image feature map;
the classifier uses the features extracted by the self-defined convolutional neural network structure and the bilateral attention mechanism to classify real and fake faces, and accurately classifies the input image according to the expressiveness and discriminative power of the features.
2. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein: the bilateral attention mechanism has two branches, the first consisting of a convolution module and the channel attention submodule, and the second consisting of a convolution module and the position attention submodule; the output features of the two branches are joined by feature concatenation; finally, the concatenated features pass through a convolution module to obtain the output features.
3. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 2, wherein: the convolution module includes convolution, batch normalization, and ReLU activation functions.
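As a rough illustration of a convolution module made of convolution, batch normalization, and ReLU, the sketch below applies the three operations to a single sample in pure Python. The 1×1 kernel size, the inference-time form of batch normalization with fixed statistics, and the name `conv_module_1x1` are assumptions chosen to keep the example small; the claim itself does not fix them.

```python
import math


def conv_module_1x1(x, weight, gamma, beta, mean, var, eps=1e-5):
    """Convolution (1x1, assumed) -> batch normalization -> ReLU for one sample.

    x:      input feature map as nested lists, shape [C_in][H][W].
    weight: 1x1 convolution weights, shape [C_out][C_in].
    gamma, beta, mean, var: per-output-channel batch-norm parameters (length C_out).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for o, w_row in enumerate(weight):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        plane = []
        for i in range(h):
            row = []
            for j in range(w):
                # 1x1 convolution: weighted sum over input channels at one pixel.
                y = sum(w_row[c] * x[c][i][j] for c in range(c_in))
                # Batch normalization with fixed (inference-time) statistics.
                y = (y - mean[o]) * scale + beta[o]
                # ReLU activation.
                row.append(max(0.0, y))
            plane.append(row)
        out.append(plane)
    return out
```

With `weight` of shape [C/2][C], the same function halves the channel count, which is how a 1×1 convolution module can perform the dimension reduction used by the attention submodules.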
4. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein the channel attention submodule extracts channel features as follows:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, $x_c = \mathrm{Conv}_{1\times 1}(x)$, where $x$ is the face image feature map;
then, the dimension-reduced face image feature map is reshaped into two face image feature maps of shapes [N, C/2, H×W] and [N, H×W, C/2], where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, and W is the width of the face feature map;
the two face image feature maps are combined by batch matrix multiplication to obtain $E$; the maximum is taken along the last dimension of $E$ to obtain a tensor $E_{\max}$ of shape [N, C/2, 1], which is expanded to shape [N, C/2, C/2]; $E$ is subtracted from $E_{\max}$ to obtain a new tensor $E' = E_{\max} - E$;
a softmax operation is applied along the last dimension of $E'$ to obtain the channel attention weights, $A = \mathrm{softmax}(E')$;
the channel attention weights $A$ and the dimension-reduced face image feature map $x_c$ are combined by batch matrix multiplication, and the output is reshaped into a tensor of shape [N, C/2, H, W];
the face image feature map adjusted by the channel attention submodule is thus obtained.
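The channel attention computation in claim 4 can be sketched for a single sample (batch dimension N dropped) in pure Python. Two points are assumptions, because the corresponding formulas did not survive in this text: the subtraction is taken as maximum minus product, and the normalizing operation is taken as softmax. The function name `channel_attention` is hypothetical, and the input is assumed to be already dimension-reduced.

```python
import math


def channel_attention(x):
    """Channel attention for one dimension-reduced feature map of shape [C'][H][W].

    Returns (attention weights [C'][C'], adjusted feature map [C'][H][W]).
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Reshape [C', H, W] -> [C', H*W].
    flat = [[v for row in plane for v in row] for plane in x]
    # Matrix product of [C', H*W] with its transpose [H*W, C'] -> [C', C'].
    energy = [[sum(a * b for a, b in zip(flat[i], flat[j])) for j in range(c)]
              for i in range(c)]
    attn = []
    for row in energy:
        row_max = max(row)                      # max along the last dimension
        shifted = [row_max - e for e in row]    # assumed order: max minus product
        peak = max(shifted)
        exps = [math.exp(s - peak) for s in shifted]  # numerically stable softmax
        total = sum(exps)
        attn.append([e / total for e in exps])
    # Attention [C', C'] times features [C', H*W], then reshape to [C', H, W].
    out_flat = [[sum(attn[i][k] * flat[k][p] for k in range(c)) for p in range(h * w)]
                for i in range(c)]
    out = [[out_flat[i][r * w:(r + 1) * w] for r in range(h)] for i in range(c)]
    return attn, out
```

Under the assumed subtraction order, a channel gives the largest weight to the channels it is least correlated with, so each output channel mixes in complementary information rather than repeating itself.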
5. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein the position attention submodule extracts position features as follows:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, giving a feature of shape [N, C/2, H, W], $x_p = \mathrm{Conv}_{1\times 1}(x)$, where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, W is the width of the face feature map, and $x$ is the face image feature map;
then, average pooling is applied to the dimension-reduced face image feature map along the H dimension and the W dimension respectively, giving face image feature maps of shapes [N, C/2, 1, W] and [N, C/2, H, 1]; the second feature map is transposed to shape [N, C/2, 1, H], giving the face image feature maps $x_w$ and $x_h$;
the face image feature maps $x_w$ and $x_h$ are concatenated to obtain the face image feature map $x_{cat}$, which is convolved to obtain finer position information;
subsequently, the convolved face image feature map is split to obtain $a_w$ and $a_h$, and the second of the separated face image feature maps, $a_h$, is transposed;
a sigmoid operation is applied to $a_w$ and $a_h$, and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1;
finally, the face image feature map $x_p$ is multiplied by $a_w$ and $a_h$ respectively to obtain the face image feature map adjusted by the position attention submodule.
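The position attention steps of claim 5 can likewise be sketched for one sample in pure Python. The sketch assumes that the convolution applied after concatenation is a plain 1×1 channel-mixing convolution (parameter `mix`, hypothetical) and that the squashing operation is a sigmoid; neither detail is recoverable from this text with certainty. Transposition and expansion are realized implicitly through indexing.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def position_attention(x, mix):
    """Position attention for one dimension-reduced feature map.

    x:   feature map as nested lists, shape [C'][H][W].
    mix: weights of an assumed 1x1 convolution, shape [C'][C'].
    Returns the adjusted feature map, shape [C'][H][W].
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Average pooling along H -> one value per column: [C'][W].
    pool_w = [[sum(x[k][i][j] for i in range(h)) / h for j in range(w)] for k in range(c)]
    # Average pooling along W -> one value per row: [C'][H] (already laid out as a row).
    pool_h = [[sum(x[k][i][j] for j in range(w)) / w for i in range(h)] for k in range(c)]
    # Concatenate along the last axis: [C'][W + H].
    cat = [pool_w[k] + pool_h[k] for k in range(c)]
    # Assumed 1x1 convolution mixing channels at every position.
    conv = [[sum(mix[o][k] * cat[k][p] for k in range(c)) for p in range(w + h)]
            for o in range(c)]
    # Split back into the W part and the H part, then squash into (0, 1).
    a_w = [[_sigmoid(v) for v in row[:w]] for row in conv]
    a_h = [[_sigmoid(v) for v in row[w:]] for row in conv]
    # Multiply each pixel by its column weight and its row weight.
    return [[[x[k][i][j] * a_w[k][j] * a_h[k][i] for j in range(w)]
             for i in range(h)] for k in range(c)]
```

Because each pixel is scaled by one weight for its row and one for its column, regions where both are high are emphasized, which is one way to realize the focus on key face regions described in the claim.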
6. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein the classifier performs classification as follows:
first, a convolution module reduces the dimension of the tensor output by the bilateral attention mechanism;
then, a pooling module reduces the width and height dimensions of the tensor to 1;
then, the pooled tensor is reshaped;
further, a living body detection result of shape [N, 2] is obtained through the fully connected layer, where N is the batch size and 2 is the number of scores;
finally, the cross-entropy loss between the predicted result and the label is calculated as
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i$
where $N$ is the number of input samples, $y_i$ is the label, and $\hat{y}_i$ is the predicted result.
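The pooling, reshaping, fully connected, and loss steps of claim 6 can be sketched as follows. Several choices here are assumptions rather than details from the source: the pooling is taken to be global average pooling, the fully connected layer has no bias, and the two scores are converted to probabilities with a softmax before the logarithm. The names `classifier_head` and `cross_entropy` are hypothetical, and the initial dimension-reducing convolution module is omitted.

```python
import math


def classifier_head(x, fc_weight):
    """Pool a [C][H][W] feature map to [C] and map it to two scores.

    fc_weight: assumed fully connected weights, shape [C][2].
    """
    # Global average pooling (assumed) reduces height and width to 1.
    pooled = [sum(v for row in plane for v in row) / (len(plane) * len(plane[0]))
              for plane in x]
    # Fully connected layer: [C] times [C, 2] -> [fake_score, real_score].
    return [sum(p * wt[k] for p, wt in zip(pooled, fc_weight)) for k in range(2)]


def cross_entropy(batch_scores, labels):
    """Mean cross-entropy over a batch.

    batch_scores: list of [fake_score, real_score] pairs.
    labels:       list of class indices (0 = fake, 1 = real).
    """
    total = 0.0
    for scores, label in zip(batch_scores, labels):
        peak = max(scores)
        # log(sum(exp(scores))) computed stably; softmax is an assumption here.
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_sum - scores[label]  # equals -log(softmax(scores)[label])
    return total / len(labels)
```

When both scores are equal, the predicted probability of either class is 0.5 and the per-sample loss is log 2; the loss approaches 0 as the score of the labelled class grows relative to the other.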
7. A near infrared living body detection method based on a bilateral attention mechanism, characterized by comprising the following steps:
Step S1: inputting a near infrared image;
Step S2: extracting a high-quality face image feature map from the near infrared image input in step S1 using a self-defined convolutional neural network structure;
Step S3: inputting the face image feature map from step S2 into the bilateral attention mechanism, which has two branches: one branch consists of a convolution module and a channel attention submodule, and the other consists of a convolution module and a position attention submodule; the face image feature map is input into both branches; after it passes through the respective convolution module, the channel attention submodule extracts channel features, learns the importance of each channel, and weights the face image feature map, while the position attention submodule extracts position features, focuses on key face regions, and handles attacks, deformation, and occlusion at different scales;
Step S4: concatenating the channel features and the position features;
Step S5: passing the concatenated features through a convolution module to obtain the output features;
Step S6: inputting the output features into the classifier, which accurately classifies the input face image feature map according to the expressiveness and discriminative power of the features.
8. The near infrared living body detection method based on the bilateral attention mechanism as claimed in claim 7, wherein the channel attention submodule extracts channel features as follows:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, $x_c = \mathrm{Conv}_{1\times 1}(x)$, where $x$ is the face image feature map;
then, the dimension-reduced face image feature map is reshaped into two face image feature maps of shapes [N, C/2, H×W] and [N, H×W, C/2], where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, and W is the width of the face feature map;
the two face image feature maps are combined by batch matrix multiplication to obtain $E$; the maximum is taken along the last dimension of $E$ to obtain a tensor $E_{\max}$ of shape [N, C/2, 1], which is expanded to shape [N, C/2, C/2]; $E$ is subtracted from $E_{\max}$ to obtain a new tensor $E' = E_{\max} - E$;
a softmax operation is applied along the last dimension of $E'$ to obtain the channel attention weights, $A = \mathrm{softmax}(E')$;
the channel attention weights $A$ and the dimension-reduced face image feature map $x_c$ are combined by batch matrix multiplication, and the output is reshaped into a tensor of shape [N, C/2, H, W];
the face image feature map adjusted by the channel attention submodule is thus obtained.
9. The near infrared living body detection method based on the bilateral attention mechanism as claimed in claim 7, wherein the position attention submodule extracts position features as follows:
first, a 1×1 convolution module reduces the dimension of the input face image feature map, giving a feature of shape [N, C/2, H, W], $x_p = \mathrm{Conv}_{1\times 1}(x)$, where N is the number of input pictures, C is the number of channels, H is the height of the face feature map, W is the width of the face feature map, and $x$ is the face image feature map;
then, average pooling is applied to the dimension-reduced face image feature map along the H dimension and the W dimension respectively, giving face image feature maps of shapes [N, C/2, 1, W] and [N, C/2, H, 1]; the second feature map is transposed to shape [N, C/2, 1, H], giving the face image feature maps $x_w$ and $x_h$;
the face image feature maps $x_w$ and $x_h$ are concatenated to obtain the face image feature map $x_{cat}$, which is convolved to obtain finer position information;
subsequently, the convolved face image feature map is split to obtain $a_w$ and $a_h$, and the second of the separated face image feature maps, $a_h$, is transposed;
a sigmoid operation is applied to $a_w$ and $a_h$, and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1;
finally, the face image feature map $x_p$ is multiplied by $a_w$ and $a_h$ respectively to obtain the face image feature map adjusted by the position attention submodule.
10. The near infrared living body detection method based on the bilateral attention mechanism as claimed in claim 7, wherein the classifier performs classification as follows:
first, a convolution module reduces the dimension of the tensor output by the bilateral attention mechanism;
then, a pooling module reduces the width and height dimensions of the tensor to 1;
then, the pooled tensor is reshaped;
further, a living body detection result of shape [N, 2] is obtained through the fully connected layer, where N is the batch size and 2 is the number of scores;
finally, the cross-entropy loss between the predicted result and the label is calculated as
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i$
where $N$ is the number of input samples, $y_i$ is the label, and $\hat{y}_i$ is the predicted result.
CN202410180161.7A 2024-02-18 2024-02-18 Near infrared living body detection device and method based on bilateral attention Active CN117727104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410180161.7A CN117727104B (en) 2024-02-18 2024-02-18 Near infrared living body detection device and method based on bilateral attention

Publications (2)

Publication Number Publication Date
CN117727104A 2024-03-19
CN117727104B 2024-05-07

Family

ID=90207425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410180161.7A Active CN117727104B (en) 2024-02-18 2024-02-18 Near infrared living body detection device and method based on bilateral attention

Country Status (1)

Country Link
CN (1) CN117727104B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461973A (en) * 2020-01-17 2020-07-28 华中科技大学 Super-resolution reconstruction method and system for image
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
WO2023273290A1 (en) * 2021-06-29 2023-01-05 山东建筑大学 Object image re-identification method based on multi-feature information capture and correlation analysis
CN115690497A (en) * 2022-10-27 2023-02-03 内蒙古工业大学 Pollen image classification method based on attention mechanism and convolutional neural network
CN116563781A (en) * 2023-04-25 2023-08-08 广西电网有限责任公司来宾供电局 Image monitoring and diagnosing method for inspection robot
CN116740669A (en) * 2023-08-16 2023-09-12 之江实验室 Multi-view image detection method, device, computer equipment and storage medium
CN116883805A (en) * 2022-12-16 2023-10-13 杭州师范大学 Image tampering detection method based on convolutional neural network
US20230351573A1 (en) * 2021-03-17 2023-11-02 Southeast University Intelligent detection method and unmanned surface vehicle for multiple type faults of near-water bridges

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈浩楠: "《基于光照一致化和上下文感知的人脸活体检测算法研究》", 《优秀博士论文》, 1 December 2019 (2019-12-01) *

Also Published As

Publication number Publication date
CN117727104B (en) 2024-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant