CN117727104A

CN117727104A - Near infrared living body detection device and method based on bilateral attention

Info

Publication number: CN117727104A
Application number: CN202410180161.7A
Authority: CN
Inventors: 何一凡; 陈昕; 张帅; 王汉超; 贾宝芝
Original assignee: Xiamen Ruiwei Information Technology Co ltd
Current assignee: Xiamen Ruiwei Information Technology Co ltd
Priority date: 2024-02-18
Filing date: 2024-02-18
Publication date: 2024-03-19
Anticipated expiration: 2044-02-18
Also published as: CN117727104B

Abstract

The invention discloses a near infrared living body detection device and method based on a bilateral attention mechanism. By introducing the bilateral attention mechanism, the invention makes full use of the channel and position information in an image, improves the ability to distinguish real faces from fake faces, and ultimately improves the user experience of a face recognition system. A convolutional neural network structure extracts a discriminative face image feature map from an input near infrared image and supplies it as the input of the bilateral attention mechanism. The bilateral attention mechanism extracts features from the face image feature map along the channel and position dimensions to obtain finer and more discriminative features, which helps the model capture the correlation and importance among different channels and attend to key areas of the input tensor, thereby improving the discrimination capability of the living body detection model.

Description

Near infrared living body detection device and method based on bilateral attention

Technical Field

The invention relates to the technical field of computer living body detection, in particular to a near infrared living body detection device and method based on bilateral attention.

Background

Living body detection is a method used in identity verification to determine whether a subject presents genuine physiological characteristics. It plays an important role in face recognition and can effectively resist common attacks, such as fraud using photos, videos, face swapping, masks, occlusion, 3D animation, and screen recapture. Detecting whether the user is a living body ensures the authenticity of the user and thereby protects the user's interests and security.

With the development of deep learning, the performance of living body detection methods based on deep neural networks has improved greatly. Such methods use deep neural networks to learn highly discriminative features from large-scale face datasets; the networks used include residual learning networks, central difference convolution networks, LSTM, and generative adversarial networks. Despite significant advances in living body detection techniques, some drawbacks remain, such as:

1. The features extracted by the network are insufficiently discriminative, which results in poor protection against attacks that replace local facial features (e.g., a real face with a fake nose, or a real face with fake eyes).

2. Current living body detection algorithms have poor robustness and insufficient generalization capability when facing attack scenarios that have not been seen before.

3. Current Transformer-based living body detection algorithms typically require a large amount of computational resources and high-performance hardware for training and inference, which is a challenge for resource-constrained devices or scenarios.

In view of the above, the present inventors have studied the drawbacks and inconveniences caused by the imperfections of conventional living body detection techniques and, after intensive research and experimentation, have developed the present invention.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a near infrared living body detection device based on bilateral attention, which makes full use of the channel and position information in an image, improves the ability to distinguish real faces from fake faces, and ultimately improves the user experience of a face recognition system.

The invention further aims to overcome the defects of the prior art and to provide a near infrared living body detection method based on bilateral attention, which makes full use of the channel and position information in an image, improves the ability to distinguish real faces from fake faces, and ultimately improves the user experience of a face recognition system.

In order to achieve the above object, the solution of the present invention is:

a near infrared living body detection device based on a bilateral attention mechanism mainly comprises a custom convolutional neural network structure, a bilateral attention mechanism, and a classifier;

the convolutional neural network structure is used for extracting a discriminative face image feature map from an input near infrared image, which serves as the input of the bilateral attention mechanism;

the bilateral attention mechanism comprises a channel attention sub-module and a position attention sub-module; the face image feature map output by the convolutional neural network structure is input into the channel attention sub-module and the position attention sub-module respectively; the channel attention sub-module weights the face image feature map by learning the importance of each channel, so as to reduce redundant information and emphasize channel features useful for living body detection; the position attention sub-module handles attacks, deformations, and occlusions of different scales by attending to key face areas in the face image feature map;

the classifier uses the features extracted by the custom convolutional neural network structure and the bilateral attention mechanism to classify real and fake faces, and classifies the input image according to the expressiveness and discriminative power of the features.

Further, the bilateral attention mechanism has two branches, wherein the first branch consists of a convolution module and a channel attention submodule, and the second branch consists of a convolution module and a position attention submodule; the output characteristics of the two branches are connected in a characteristic splicing mode; finally, the spliced features pass through a convolution module to obtain output features.

Further, the convolution module includes convolution, batch normalization, and ReLU activation functions.

Further, the channel feature extraction method of the channel attention sub-module comprises the following steps:

firstly, a 1×1 convolution module is used to reduce the dimension of the input face image feature map,

$$x' = \mathrm{Conv}_{1\times 1}(x),$$

wherein $x$ is the face image feature map;

then, the dimension-reduced face image feature map is reshaped into two face image feature maps $x_1$ and $x_2$ of shapes $[N, C/2, H \times W]$ and $[N, H \times W, C/2]$ respectively, wherein $N$ is the number of input pictures, $C$ is the number of channels, $H$ is the height of the face feature map, and $W$ is the width of the face feature map;

the two face image feature maps are combined by batch matrix multiplication to obtain $E$; the maximum value is taken along the last dimension of $E$ to obtain a tensor $E_{\max}$ of shape $[N, C/2, 1]$, which is expanded to shape $[N, C/2, C/2]$; $E$ is subtracted from $E_{\max}$ to obtain a new tensor $E'$,

$$E = x_1 x_2, \qquad E' = \mathrm{expand}(E_{\max}) - E;$$

a softmax operation is performed along the last dimension of $E'$ to obtain the channel attention weights,

$$A = \mathrm{softmax}(E');$$

the channel attention weights $A$ and the dimension-reduced face image feature map $x_1$ are combined by batch matrix multiplication, and the output is reshaped into a tensor of shape $[N, C/2, H, W]$,

$$y = \mathrm{reshape}(A\,x_1);$$

thus, the face image feature map adjusted by the channel attention sub-module is obtained.
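By way of non-limiting illustration, the channel attention steps above may be sketched in NumPy as follows. The function names `softmax` and `channel_attention` are hypothetical, the 1×1 convolution module is assumed to have been applied already, and the sketch assumes that the second reshaped feature map is the transpose of the first and that the product is subtracted from the expanded maximum before a softmax is taken.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x_reduced):
    """x_reduced: [N, C/2, H, W], assumed output of the 1x1 convolution module."""
    n, c, h, w = x_reduced.shape
    x1 = x_reduced.reshape(n, c, h * w)          # [N, C/2, H*W]
    x2 = x1.transpose(0, 2, 1)                   # [N, H*W, C/2] (assumed transpose)
    energy = x1 @ x2                             # batch matmul -> [N, C/2, C/2]
    e_max = energy.max(axis=-1, keepdims=True)   # [N, C/2, 1]
    energy_new = e_max - energy                  # broadcasting expands e_max
    attn = softmax(energy_new, axis=-1)          # channel attention weights
    out = attn @ x1                              # [N, C/2, H*W]
    return out.reshape(n, c, h, w), attn
```

Each row of the returned weights sums to 1, so every output channel is a convex combination of the input channels.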

Further, the method for extracting the position features by the position attention submodule comprises the following steps:

firstly, a 1×1 convolution module is used to reduce the dimension of the input face image feature map, the shape of the dimension-reduced feature being $[N, C/2, H, W]$,

$$x' = \mathrm{Conv}_{1\times 1}(x),$$

wherein $N$ is the number of input pictures, $C$ is the number of channels, $H$ is the height of the face feature map, $W$ is the width of the face feature map, and $x$ is the face image feature map;

then, average pooling is applied to the dimension-reduced face image feature map along the H dimension and along the W dimension respectively; the pooled face image feature maps have shapes $[N, C/2, 1, W]$ and $[N, C/2, H, 1]$; the second feature map is transposed to shape $[N, C/2, 1, H]$, giving the face image feature maps $x_1$ and $x_2$,

$$x_1 = \mathrm{AvgPool}_H(x'), \qquad x_2 = \mathrm{AvgPool}_W(x')^{\top};$$

the face image feature maps $x_1$ and $x_2$ are concatenated to obtain the face image feature map $x_{cat}$, which is convolved to obtain finer position information,

$$x_{cat} = \mathrm{Concat}(x_1, x_2), \qquad x_{cat}' = \mathrm{Conv}(x_{cat});$$

subsequently, the face image feature map $x_{cat}'$ is split to obtain $x_1'$ and $x_2'$, and the second of the separated face image feature maps, $x_2'$, is transposed,

$$x_1',\ x_2' = \mathrm{Split}(x_{cat}'), \qquad x_2'' = (x_2')^{\top};$$

a sigmoid operation is applied to $x_1'$ and $x_2''$ and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1,

$$w_1 = \mathrm{expand}(\sigma(x_1')), \qquad w_2 = \mathrm{expand}(\sigma(x_2''));$$

finally, the face image feature map $x'$ is multiplied by $w_1$ and $w_2$ respectively to obtain the face image feature map adjusted by the position attention sub-module.
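By way of non-limiting illustration, the position attention steps above may be sketched in NumPy as follows. The names `position_attention` and `mix` are hypothetical; the convolution applied to the concatenated map is assumed here to be a plain 1×1 convolution with weight matrix `mix` (batch normalization and activation are omitted), and the gating operation is assumed to be a sigmoid, which keeps the weights between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def position_attention(x_reduced, mix):
    """x_reduced: [N, C/2, H, W]; mix: [C/2, C/2] weights of an assumed 1x1 conv."""
    n, c, h, w = x_reduced.shape
    x1 = x_reduced.mean(axis=2, keepdims=True)    # pool along H -> [N, C/2, 1, W]
    x2 = x_reduced.mean(axis=3, keepdims=True)    # pool along W -> [N, C/2, H, 1]
    x2 = x2.transpose(0, 1, 3, 2)                 # transpose    -> [N, C/2, 1, H]
    cat = np.concatenate([x1, x2], axis=3)        # [N, C/2, 1, W + H]
    cat = np.einsum("oc,nchw->nohw", mix, cat)    # 1x1 convolution over channels
    a1, a2 = cat[..., :w], cat[..., w:]           # split back into W and H parts
    a2 = a2.transpose(0, 1, 3, 2)                 # [N, C/2, H, 1]
    w1, w2 = sigmoid(a1), sigmoid(a2)             # weights in (0, 1)
    # Broadcasting expands both weight maps to [N, C/2, H, W].
    return x_reduced * w1 * w2
```

With an all-zero `mix`, both weight maps equal 0.5 everywhere, so the output is exactly one quarter of the input, which gives a quick sanity check of the broadcasting.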

Further, the specific classification method of the classifier comprises the following steps:

firstly, performing dimension reduction processing on tensors output by a bilateral attention mechanism by using a convolution module;

then, reducing the width and height dimensions of the tensor to 1 through a pooling module;

then, carrying out characteristic remodelling on the tensor after pooling;

further, a living body detection result of shape [N,2] is obtained through the fully connected layer, wherein N is the batch size and 2 is the number of scores;

finally, the cross entropy loss between the prediction results and the labels is calculated by the following formula,

$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i,$$

wherein $N$ is the number of input samples; $y_i$ is the label; and $\hat{y}_i$ is the prediction result.
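By way of non-limiting illustration, the cross entropy between the [N,2] scores and integer labels may be computed as in the following sketch. It assumes that the two scores are normalized with a softmax and that the loss is averaged over the batch; neither detail is stated explicitly in this description, and the function name `cross_entropy` is hypothetical.

```python
import math

def cross_entropy(scores, labels):
    """scores: list of [fake_score, real_score] rows; labels: 0 (fake) or 1 (real)."""
    total = 0.0
    for row, label in zip(scores, labels):
        m = max(row)
        # log-sum-exp with the maximum subtracted for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum - row[label]   # equals -log(softmax(row)[label])
    return total / len(scores)          # assumed mean reduction over the batch
```

For two equal scores the loss of a single sample equals log 2, and it decreases as the score of the labelled class grows relative to the other score.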

A near infrared living body detection method based on a bilateral attention mechanism, comprising the following steps:

step S1: inputting a near infrared image;

step S2: extracting a high-quality face image feature map from the near infrared image input in the step S1 by using a self-defined convolutional neural network structure;

step S3: inputting the face image feature map from step S2 into the bilateral attention mechanism, wherein the bilateral attention mechanism has two branches, one consisting of a convolution module and a channel attention sub-module and the other consisting of a convolution module and a position attention sub-module; the face image feature map is input into the two branches respectively; after the convolution module of its branch, the channel attention sub-module extracts channel features, learns the importance of each channel, and weights the face image feature map, and the position attention sub-module extracts position features, attends to key face areas, and handles attacks, deformations, and occlusions of different scales;

step S4: concatenating the channel features and the position features;

step S5: passing the concatenated features through a convolution module to obtain output features;

step S6: inputting the output features into the classifier, which classifies the input face image feature map according to the expressiveness and discriminative power of the features.

In step S3, the channel attention sub-module extracts the channel features and the position attention sub-module extracts the position features by the same procedures as set out above for the device; in step S6, the classifier performs the same classification procedure as set out above for the device.

With the above scheme, compared with existing technical schemes, the near infrared living body detection device and method based on the bilateral attention mechanism adaptively learn the fine texture features of different faces. The channel attention sub-module automatically learns the importance of each channel and weights the feature map, so that the model attends better to the channel features that are useful for distinguishing real faces from fake faces, while effectively suppressing the influence of redundant information such as illumination changes and noise, which improves the robustness and stability of living body detection. The position attention sub-module learns to attend to key face areas related to attacks, such as the eyes, nose, and mouth, which improves the sensitivity of the living body detection algorithm to attack features and its recognition accuracy; it also helps the model attend to key face areas that are not occluded or deformed, reducing the interference caused by deformed and occluded areas. As a result, the resistance of the model to attacks that replace local facial features is enhanced. More importantly, compared with large networks such as Transformers, the invention is based on a custom network structure and a bilateral attention mechanism, has a low demand for computing resources, and can run on resource-constrained devices or in resource-constrained scenarios.

The bilateral attention mechanism provided by the invention effectively learns detail and texture information in the face, so as to identify faces whose local facial features have been replaced. By adopting the custom convolutional neural network structure, the invention extracts discriminative real and fake face features from different face information more effectively; the parameter size and the computation of the model before quantization are only 1021 KB and 11.95 GFLOPs respectively, so the model can run on resource-constrained devices or in resource-constrained scenarios. With this living body detection technology, the pass rate for real persons exceeds 99%, and the prevention rate for attacks using various types of paper, replaced local facial features, high-precision 3D head models, and resin masks exceeds 98%.

Drawings

Fig. 1 is an overall frame diagram of a near infrared living body detection device based on a bilateral attention mechanism of the present invention.

FIG. 2 is a block diagram of a bilateral attention mechanism of the present invention.

FIG. 3 is a network architecture diagram of a channel attention sub-module of the present invention.

FIG. 4 is a network architecture diagram of a location attention sub-module of the present invention.

Fig. 5 is a block diagram of a classifier of the present invention.

Detailed Description

In order to further explain the technical scheme of the invention, the invention is explained in detail by specific examples.

As shown in fig. 1, the invention discloses a near infrared living body detection device based on a bilateral attention mechanism. The implementation idea is as follows: by introducing a bilateral attention mechanism, the channel and position information in the image is fully utilized, the ability to distinguish real faces from fake faces is improved, and ultimately the user experience of the face recognition system is improved. The overall framework is shown in fig. 1. The input of the model is a near infrared image in jpg or png format. After the near infrared image is preprocessed (not shown in the figure), the custom convolutional neural network structure (convolution modules 1 to 9 in fig. 1) extracts a high-quality face image feature map from the preprocessed image as the input of the bilateral attention mechanism. The bilateral attention mechanism extracts features from the face image feature map along the channel and position dimensions to obtain finer and more discriminative features, helping the model capture the correlation and importance among different channels and attend to key areas of the input tensor, thereby improving the discrimination capability of the living body detection model. The classifier then classifies real and fake faces using the features extracted by the custom convolutional neural network structure and the bilateral attention mechanism.

The near infrared living body detection device based on the bilateral attention mechanism mainly comprises a self-defined convolutional neural network structure, the bilateral attention mechanism and a classifier.

The custom convolutional neural network structure is used for extracting a discriminative face image feature map from an input near infrared image. The input near infrared image first undergoes a series of preprocessing steps, specifically face detection and image enhancement (random rotation, random flipping, random patch-wise image shuffling, etc.); the custom convolutional neural network then extracts a high-quality feature representation from the preprocessed face image, forming the face image feature map that serves as the input of the bilateral attention mechanism. As shown in fig. 1, convolution modules 1 to 9 constitute the custom convolutional neural network structure, wherein convolution modules 1 to 7 perform feature extraction and downsampling (reducing the resolution of the feature map), and convolution modules 8 and 9 perform feature extraction and dimension reduction (reducing the number of channels of the feature map).

Bilateral attention mechanisms are used to enhance the expressive power of features. The bilateral attention mechanism consists of two attention sub-modules: the channel attention sub-module and the position attention sub-module. The face image feature images output by the convolutional neural network structure are respectively input into a channel attention sub-module and a position attention sub-module, and the channel attention sub-module weights the face image feature images by learning the importance of each channel so as to reduce redundant information and emphasize channel features useful for living body detection; the position attention submodule processes attacks, deformations and occlusions of different scales by paying attention to key facial areas, and improves sensitivity, robustness and accuracy of living body detection.

The bilateral attention mechanism extracts features along both the channel dimension and the position dimension. Compared with a single-dimensional attention mechanism, this multi-dimensional attention mechanism obtains finer and more discriminative features and helps the model capture the correlation and importance among different channels and attend to key areas of the input tensor, thereby improving the discrimination capability of the living body detection model.

FIG. 2 is a schematic diagram of the bilateral attention mechanism. The bilateral attention mechanism has two branches: the first branch consists of a convolution module and a channel attention sub-module, and the second branch consists of a convolution module and a position attention sub-module. The output features of the two branches are joined by feature concatenation; finally, the concatenated features pass through a convolution module to obtain the output features. Each convolution module comprises a convolution (Conv), batch normalization (BN), and a ReLU activation function.
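By way of non-limiting illustration, the two-branch structure of fig. 2 may be sketched in NumPy as follows. The names `conv_module` and `bilateral_attention` are hypothetical; the convolutions are assumed to be 1×1 for brevity (the kernel sizes are not specified here), batch normalization is shown without learned scale and shift, and the two attention sub-modules are passed in as callables.

```python
import numpy as np

def conv_module(x, weight, eps=1e-5):
    """Conv (assumed 1x1) -> batch normalization -> ReLU on an [N, C, H, W] tensor."""
    y = np.einsum("oc,nchw->nohw", weight, x)         # 1x1 convolution
    mean = y.mean(axis=(0, 2, 3), keepdims=True)      # per-channel batch statistics
    var = y.var(axis=(0, 2, 3), keepdims=True)
    y = (y - mean) / np.sqrt(var + eps)               # BN without scale/shift
    return np.maximum(y, 0.0)                         # ReLU

def bilateral_attention(x, w_a, w_b, w_out, channel_attn, position_attn):
    a = channel_attn(conv_module(x, w_a))             # first branch
    b = position_attn(conv_module(x, w_b))            # second branch
    fused = np.concatenate([a, b], axis=1)            # feature concatenation
    return conv_module(fused, w_out)                  # final convolution module
```

Concatenation is performed along the channel axis, so the final convolution module sees the sum of the channel counts of the two branches.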

As shown in fig. 3, which is a block diagram of a channel attention sub-module, the channel feature extraction method of the channel attention sub-module is as follows:

firstly, a 1×1 convolution module is used to reduce the dimension of the input face image feature map, $x' = \mathrm{Conv}_{1\times 1}(x)$, wherein $x$ is the face image feature map;

then, the dimension-reduced face image feature map is reshaped into face image feature maps of shapes [N, C/2, H×W] and [N, H×W, C/2], wherein N is the number of input pictures, C is the number of channels, H is the height of the face feature map, and W is the width of the face feature map;

a softmax operation is then performed along the last dimension of the tensor derived from the batch matrix product of these two feature maps, to obtain the channel attention weights.

As shown in fig. 4, which is a block diagram of the location attention sub-module, the flow of the location attention sub-module extracting the location feature is as follows:

then, average pooling is applied to the dimension-reduced face image feature map $x'$ along the H dimension and along the W dimension respectively; the pooled face image feature maps have shapes [N, C/2, 1, W] and [N, C/2, H, 1]; the second feature map is transposed to shape [N, C/2, 1, H], giving the face image feature maps $x_1$ and $x_2$:

$$x_1 = \mathrm{AvgPool}_H(x'), \qquad x_2 = \mathrm{AvgPool}_W(x')^{\top};$$

finally, the dimension-reduced face image feature map is multiplied by each of the two position attention weight maps to obtain the face image feature map adjusted by the position attention sub-module.

The classifier uses the features extracted by the custom convolutional neural network structure and the bilateral attention mechanism to classify real and fake faces. The classifier classifies the input image according to the expressiveness and discriminative power of the features.

Fig. 5 is a structural diagram of the classifier. The specific flow of the classifier is as follows:

firstly, a convolution module performs dimension reduction on the tensor output by the bilateral attention mechanism, changing its shape from [N,64,16,16] to [N,48,16,16];

then, the pooled tensor is reshaped to shape [N,48];

furthermore, a living body detection result of shape [N,2] is obtained through the fully connected layer;
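By way of non-limiting illustration, the shape flow of the classifier may be sketched in NumPy as follows. The name `classifier_head` and all weight arguments are hypothetical; the convolution is assumed to be 1×1 with batch normalization omitted, and the pooling that reduces width and height to 1 is assumed to be average pooling, since the pooling type is not specified here.

```python
import numpy as np

def classifier_head(x, w_conv, w_fc, b_fc):
    """x: [N, 64, 16, 16] output of the bilateral attention mechanism."""
    y = np.einsum("oc,nchw->nohw", w_conv, x)   # dimension reduction -> [N, 48, 16, 16]
    y = np.maximum(y, 0.0)                      # ReLU (batch normalization omitted)
    y = y.mean(axis=(2, 3), keepdims=True)      # assumed average pooling -> [N, 48, 1, 1]
    y = y.reshape(y.shape[0], -1)               # feature reshaping -> [N, 48]
    return y @ w_fc + b_fc                      # fully connected layer -> [N, 2]
```

The two output columns are the raw scores for the two classes; no softmax is applied inside the head.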

During model training, before the training sample images are input into the living body detection model to be trained, they are processed with random rotation, random flipping, random patch-wise image shuffling, Gaussian noise addition, random contrast enhancement, and similar methods to increase the diversity of the training samples. The input image is then normalized so that its pixel values lie between 0 and 1.
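By way of non-limiting illustration, part of this augmentation pipeline (random flipping, patch-wise shuffling, Gaussian noise, and normalization) may be sketched in NumPy as follows; rotation and contrast enhancement are omitted. The names `shuffle_patches` and `augment`, the patch size, the noise level, the 0.5 probabilities, and normalization by division by 255 for an 8-bit single-channel image are all assumptions made for the example.

```python
import numpy as np

def shuffle_patches(img, patch, rng):
    """Randomly permute non-overlapping patch x patch tiles of a 2-D image.
    Assumes the height and width are multiples of `patch`."""
    h, w = img.shape
    gh, gw = h // patch, w // patch
    tiles = img.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    tiles = tiles.reshape(gh * gw, patch, patch)[rng.permutation(gh * gw)]
    tiles = tiles.reshape(gh, gw, patch, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(h, w)

def augment(img_u8, rng, patch=16, noise_std=5.0):
    img = img_u8.astype(np.float32)
    if rng.random() < 0.5:
        img = img[:, ::-1]                                 # random horizontal flip
    if rng.random() < 0.5:
        img = shuffle_patches(img, patch, rng)             # random patch shuffling
    img = img + rng.normal(0.0, noise_std, img.shape)      # Gaussian noise
    return np.clip(img, 0.0, 255.0) / 255.0                # pixel values in [0, 1]
```

Patch shuffling only rearranges tiles, so the multiset of pixel values is unchanged by that step.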

The model was trained for 200 epochs (the model loss had converged by epoch 200) with a batch size of 1024, a learning rate of 0.01, the SGD optimizer with a momentum of 0.9, and the StepLR learning rate adjustment strategy.
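By way of non-limiting illustration, a step-wise learning rate schedule of the StepLR kind multiplies the learning rate by a fixed factor every fixed number of epochs. In the sketch below, the function name `step_lr` is hypothetical, and the step size of 50 epochs and the decay factor of 0.1 are assumed values chosen only for the example; only the initial learning rate of 0.01 is taken from this description.

```python
def step_lr(epoch, base_lr=0.01, step_size=50, gamma=0.1):
    """Learning rate at a given epoch under a step decay schedule.
    step_size and gamma are assumed values, not taken from the description."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate at the start of each 50-epoch stage of a 200-epoch run.
schedule = [step_lr(e) for e in range(0, 200, 50)]
```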

During model inference, the image can be input into the model after the normalization operation. After processing by the convolution modules, the bilateral attention mechanism, feature reshaping, and so on, the input image becomes a two-dimensional tensor of shape [N,64]. This two-dimensional tensor is then matrix-multiplied with the optimized fully connected layer to obtain a living body detection result of shape [N,2] (wherein N is the batch size and 2 is the number of scores); the first score is the fake face score, the second score is the real face score, and the category corresponding to the larger score is taken as the detection result.
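By way of non-limiting illustration, the final decision rule may be sketched as follows. The function name `decide` and the string labels are hypothetical, and resolving an exact tie in favour of the fake class is an assumption made for the example.

```python
def decide(scores):
    """scores: rows of [fake_score, real_score]; returns one label per row."""
    # The class with the larger score wins; ties are assumed to count as fake.
    return ["real" if real > fake else "fake" for fake, real in scores]
```

For example, `decide([[0.2, 0.8], [1.5, -0.3]])` labels the first sample as real and the second as fake.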

As shown in fig. 1 to 5, the present invention also discloses a near infrared living body detection method based on a bilateral attention mechanism, which comprises the following steps:

step S1: inputting a near infrared image;

step S2: preprocessing the image input in step S1, and extracting a high-quality face image feature map from the preprocessed image by using the custom convolutional neural network structure;

The above examples and drawings are not intended to limit the product form or style of the present invention, and any suitable variations or modifications made by those skilled in the art should be regarded as not departing from the scope of the present invention.

Claims

1. Near infrared living body detection device based on bilateral attention mechanism, characterized by mainly comprising: convolutional neural network structure, bilateral attention mechanism and classifier;

2. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein: the bilateral attention mechanism is provided with two branches, wherein the first branch consists of a convolution module and a channel attention submodule, and the second branch consists of a convolution module and a position attention submodule; the output characteristics of the two branches are connected in a characteristic splicing mode; finally, the spliced features pass through a convolution module to obtain output features.

3. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 2, wherein: the convolution module includes convolution, batch normalization, and ReLU activation functions.

4. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein: the channel feature extraction method of the channel attention sub-module comprises the following steps:

a softmax operation is performed along the last dimension of the resulting tensor to obtain the channel attention weights,

5. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein the method for extracting the position feature by the position attention submodule is as follows:


6. The near infrared living body detection device based on the bilateral attention mechanism as claimed in claim 1, wherein the specific classification method of the classifier is as follows:

then, the pooled tensor is reshaped;

7. The near infrared living body detection method based on the bilateral attention mechanism is characterized by comprising the following steps of:

step S1: inputting a near infrared image;

8. The near infrared living body detection method based on the bilateral attention mechanism as claimed in claim 7, wherein the channel feature extraction method of the channel attention sub-module is as follows:

a softmax operation is performed along the last dimension of the resulting tensor to obtain the channel attention weights,

9. The near infrared living body detection method based on the bilateral attention mechanism as set forth in claim 7, wherein the method for extracting the position features by the position attention submodule is as follows:

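the face image feature maps $x_1$ and $x_2$ are concatenated to obtain the face image feature map $x_{cat}$, which is convolved to obtain finer position information;

$$x_{cat} = \mathrm{Concat}(x_1, x_2), \qquad x_{cat}' = \mathrm{Conv}(x_{cat});$$

subsequently, the face image feature map $x_{cat}'$ is split to obtain $x_1'$ and $x_2'$, and the second of the separated face image feature maps, $x_2'$, is transposed;

$$x_1',\ x_2' = \mathrm{Split}(x_{cat}'), \qquad x_2'' = (x_2')^{\top};$$

a sigmoid operation is applied to $x_1'$ and $x_2''$ and the results are expanded, so that the weights in the two face image feature maps lie between 0 and 1;

$$w_1 = \mathrm{expand}(\sigma(x_1')), \qquad w_2 = \mathrm{expand}(\sigma(x_2''));$$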

10. The near infrared living body detection method based on the bilateral attention mechanism as claimed in claim 7, wherein the specific classification method of the classifier is as follows:

then, the pooled tensor is reshaped;