CN114612980A - Deformed face detection based on multi-azimuth fusion attention - Google Patents
- Publication number
- CN114612980A
- Authority
- CN
- China
- Prior art keywords
- attention
- face
- azimuth
- convolution
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a morphed face detection method based on multi-directional fusion attention, aimed at face morphing detection, comprising the following steps: 1) segment and normalize the face in the image according to the eye coordinates detected by the dlib landmark detector; 2) to recover the positional information that channel attention ignores, propose a new attention module; 3) fuse a dual-branch convolutional network to improve detection accuracy; 4) classify the final feature map with an SVM.
Description
Technical Field
The invention relates to the field of face morphing attack detection, and in particular to a morphed face detection technique based on multi-directional fusion attention.
Background
Face recognition technology has achieved great success in the security field. Over the past few years, however, researchers have identified various potential weaknesses of biometric systems. Recently, vulnerabilities of face and fingerprint recognition to morphed biometric images and templates have been demonstrated. Morphing techniques can be used to create artificial biometric samples that resemble the biometric information of two (or more) data subjects in both the image and feature domains. If an image or template containing the morphed feature information of several individuals is enrolled in a biometric system, each subject contributing to the morph can successfully authenticate against the single enrolled template. The unique link between an individual and his or her biometric reference data is thereby broken.
Such attacks pose a serious security risk to biometric systems, in particular to widely deployed border control systems and electronic travel documents. Several commercial face recognition systems have been shown to be highly vulnerable to such attacks. Because of the high intra-class variability of faces, face recognition systems are operated at false match rates (FMR) as high as 0.1% in order to achieve acceptable false non-match rates (FNMR). Automatic detection of morphed face images is therefore crucial to the security of an operational face recognition system.
To address these potential weaknesses of face recognition systems, the detection of face morphing attacks has become an urgent problem. Existing face morphing attack detection methods fall mainly into four algorithmic categories: morphing detection based on texture features, on image quality, on deep learning, and on hybrid features. Texture-based methods capture the changes in micro-texture introduced by the morphing process to detect morphed faces; image-quality-based methods detect morphed faces by quantifying differences in compression artifacts and noise introduced during morphing; recent deep-learning-based methods extract features with pre-trained CNN architectures and detect face morphs by classification. However, these methods still suffer from high error rates, poor robustness, and high network complexity.
Disclosure of Invention
In view of the above disadvantages of the prior art, the present invention provides a method for detecting morphed faces based on multi-directional fusion attention. The method aims to solve the problems of high error rates, poor robustness, and large parameter counts in existing methods.
To achieve the above object, the present invention provides a framework based on multi-directional fusion attention, comprising the following steps:
A1, preprocessing the input image;
A2, passing through a dual-branch convolutional network module;
A3, passing through a multi-directional fusion attention module;
A4, classification.
The invention provides a morphed face detection method based on multi-directional fusion attention. Compared with the prior art, it has the following beneficial effects:
The scheme adopts a deep learning approach and detects morphed faces mainly by combining a multi-directional fusion attention module with a dual-branch convolutional network. Morphed face detection with conventional attention mechanisms has already had some success. Unlike channel attention, which compresses the feature tensor into a single feature vector by 2-D global pooling, multi-directional fusion attention factorizes channel attention into two 1-D feature-encoding processes that aggregate features along the two spatial directions. In this way, long-range dependencies can be captured along one spatial direction while precise positional information is preserved along the other. The resulting feature maps are then encoded into a pair of direction-aware and position-sensitive attention maps, which are applied complementarily to the input feature map to enhance the representation of the object of interest. Combined with the dual-branch convolutional network, more informative features can be captured, the differences between genuine and morphed face images are better distinguished, and morphed faces can be detected reliably.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a block diagram of morphed face detection based on multi-directional fusion attention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention is described in detail below with reference to the drawings and the detailed description. As shown in Fig. 1, the morphed face detection method based on multi-directional fusion attention includes steps A1-A4:
A1, preprocessing the input image;
A2, passing through a dual-branch convolutional network module;
A3, passing through a multi-directional fusion attention module;
A4, classification.
Each step is described in detail below.
In step A1, note that in a face morphing attack the face region is usually located at the center of the image. To extract features accurately, only the largest central region of the image is retained. In the preprocessing stage, the face is segmented from the image and normalized according to the eye coordinates detected by the dlib landmark detector.
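A minimal sketch of this normalization step is given below. It assumes the eye coordinates have already been obtained (e.g. from dlib's 68-point landmark predictor, averaging each eye's six points); the target eye positions and the nearest-neighbour warp are illustrative choices, not the patent's exact procedure, while the 224×224 output size follows claim 2.

```python
import numpy as np

def normalize_face(image: np.ndarray, left_eye, right_eye, out_size=224):
    """Rotate, scale, and crop a face so the eyes land at fixed positions.

    image: H x W x 3 uint8 array; left_eye/right_eye: (x, y) pixel
    coordinates of the detected eye centers.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    # Desired eye locations in the output crop (fractions of out_size).
    dst_left = np.array([0.35 * out_size, 0.40 * out_size])
    dst_right = np.array([0.65 * out_size, 0.40 * out_size])

    # Similarity transform (rotation + uniform scale + translation)
    # mapping the detected eye pair onto the target eye pair.
    src_vec = right_eye - left_eye
    dst_vec = dst_right - dst_left
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    t = dst_left - R @ left_eye

    # Inverse-warp with nearest-neighbour sampling (keeps the sketch
    # dependency-free; a real pipeline would use cv2.warpAffine).
    Rinv = np.linalg.inv(R)
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    src = np.einsum('ij,jhw->ihw', Rinv, np.stack([xs - t[0], ys - t[1]]))
    sx = np.clip(np.round(src[0]).astype(int), 0, image.shape[1] - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, image.shape[0] - 1)
    return image[sy, sx]
```

The resulting 224×224 crop is what the dual-branch network in step A2 consumes.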
In step A2, given the input feature map X, two feature maps U1 and U2 are generated from it by a 3×3 grouped convolution and a 3×3 dilated convolution (5×5 receptive field), respectively. The two feature maps are then added to generate a new feature map, which is passed through the multi-directional fusion attention module and then through the two functions a and b, and the resulting function values are multiplied with the original U1 and U2. Since the function values of a and b sum to 1, this assigns weights to the feature maps of the two branches; and because the convolution kernels of the branches differ in size, the network can select a suitable kernel on its own. (The matrices A and B in the functions a and b are initialized before training and both have size C×d; z is the feature map output by the multi-directional fusion attention module before entering the a and b functions.) Here we have:
a_c = e^(A_c z) / (e^(A_c z) + e^(B_c z)) (1)
b_c = e^(B_c z) / (e^(A_c z) + e^(B_c z)) (2)
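The branch-weighting step above can be sketched in a few lines of numpy. The convolutions themselves are outside the sketch; as an assumption, z is stood in for by a global average pool of U1 + U2 with d = C, so that A @ z and B @ z give the per-channel logits whose softmax yields a and b with a + b = 1:

```python
import numpy as np

def selective_fusion(U1, U2, A, B):
    """SK-style two-branch fusion with channel-wise soft attention.

    U1, U2 : (C, H, W) feature maps from the grouped-3x3 and dilated-3x3
             branches.
    A, B   : (C, d) matrices as in the text (here d = C); z is a
             d-dimensional channel descriptor standing in for the
             attention-module output.
    """
    z = (U1 + U2).mean(axis=(1, 2))        # (C,) global channel descriptor
    la, lb = A @ z, B @ z                  # per-channel logits for each branch
    m = np.maximum(la, lb)                 # stabilise the two-way softmax
    ea, eb = np.exp(la - m), np.exp(lb - m)
    a = ea / (ea + eb)                     # a + b == 1 per channel
    b = eb / (ea + eb)
    # Weight each branch's feature map and sum them.
    return a[:, None, None] * U1 + b[:, None, None] * U2, a, b
```

Because a and b are a two-way softmax per channel, the network effectively chooses, channel by channel, between the 3×3 and the dilated (5×5-receptive-field) kernel.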
in step A3, a multi-orientation blending attention module is designed. The method comprises the following steps: given an input X, each channel is first encoded along a horizontal and vertical coordinate, respectively, using a posing kernel of size (H,1) or (1, W). Thus, the output of the c-th channel of height h can be expressed as:
Likewise, the output of the c-th channel at width w can be written as:
z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (4)
the 2 transformations respectively aggregate features along two spatial directions to obtain a pair of direction-sensing feature maps. After passing through the transform in the information embedding, the section subjects the above transform to a convert operation, and then subjects it to a transform operation using a convolution transform function:
f = δ(F1([z^h, z^w])) (5)
where [·, ·] denotes the concatenation operation along the spatial dimension, δ is a nonlinear activation function, and f is the intermediate feature map encoding spatial information in the horizontal and vertical directions. f is then split along the spatial dimension into two separate tensors f^h ∈ R^(C/r×H) and f^w ∈ R^(C/r×W), where r is the reduction ratio, as in the SE block. Another two 1×1 convolution transformations, F_h and F_w, transform f^h and f^w into tensors with the same number of channels as the input X, yielding:
g^h = σ(F_h(f^h)) (6)
g^w = σ(F_w(f^w)) (7)
where σ is the sigmoid activation function. To reduce the complexity and computational overhead of the model, a suitable reduction ratio r (e.g., 16) is typically used here to reduce the number of channels of f. The outputs g^h and g^w are then expanded and used as attention weights, respectively.
Finally, the output of the multi-directional fusion attention module can be written as:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (8)
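Steps (3)-(8) can be sketched end to end in numpy. As simplifying assumptions, the 1×1 convolutions F1, F_h, F_w are written as plain matrix multiplications over the channel axis (biases, batch dimension, and any normalization omitted), and δ is taken as ReLU:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_directional_attention(X, W1, Wh, Ww):
    """Direction-factorized (coordinate-style) attention, per the description.

    X  : (C, H, W) input feature map.
    W1 : (C//r, C) weights of the shared 1x1 conv F1.
    Wh : (C, C//r) weights of F_h;  Ww : (C, C//r) weights of F_w.
    """
    C, H, W = X.shape
    zh = X.mean(axis=2)                   # (C, H): average over width  -> eq. (3)
    zw = X.mean(axis=1)                   # (C, W): average over height -> eq. (4)
    z = np.concatenate([zh, zw], axis=1)  # (C, H+W): concat along spatial dim
    f = np.maximum(W1 @ z, 0.0)           # (C//r, H+W): F1 then ReLU  -> eq. (5)
    fh, fw = f[:, :H], f[:, H:]           # split back into the two directions
    gh = sigmoid(Wh @ fh)                 # (C, H) height attention    -> eq. (6)
    gw = sigmoid(Ww @ fw)                 # (C, W) width attention     -> eq. (7)
    # y_c(i, j) = x_c(i, j) * g_c^h(i) * g_c^w(j)                      -> eq. (8)
    return X * gh[:, :, None] * gw[:, None, :]
```

Since g^h and g^w lie in (0, 1), the module can only attenuate the input, re-weighting each position by its row and column attention.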
in step a4, the last 1 key step of the present invention is to determine the human face by finding the optimal classification model through a high-discrimination machine learning algorithm. The support vector machine containing the radial basis function is selected as the classifier, and the classifier not only has high classification accuracy, but also is widely applied to research subjects such as face recognition and the like. And sending the features subjected to the dimensionality reduction in the last step into the SVM, and finishing the detection of the deformed human face according to the output data of the SVM.
The invention provides a morphed face detection method based on multi-directional fusion attention; its innovations include the following:
a method for detecting a deformed human face based on multi-azimuth fusion attention is provided. The method carries out the detection of the deformed human face by the interaction of a multi-azimuth convergence attention module and a double-branch convolution network. The new attention module can better capture the difference between real and deformed human face images and is beneficial to reliably detecting deformed human faces.
A method is proposed to compensate for channel attention's neglect of positional information. Conventional channel attention only re-weights the importance of each channel using inter-channel relationships and ignores positional information, yet positional information is important for generating spatially selective attention maps. A new attention block is therefore introduced that considers not only inter-channel relationships but also the positional information of the feature space.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (5)
1. A morphed face detection method based on multi-directional fusion attention, characterized in that the method is executed by a computer and comprises the following steps:
A1, preprocessing the input image;
A2, passing through a dual-branch convolutional network module;
A3, passing through a multi-directional fusion attention module;
A4, classification.
2. The multi-directional fusion attention detection method as claimed in claim 1, wherein the normalized region is cropped to 224×224 pixels to ensure that the morph-detection algorithm is applied only to the face region, and A1 is implemented as follows: in a face morphing attack, the face region is usually located at the center of the image. To extract features accurately, only the largest central region of the image is retained. In the preprocessing stage, the face is segmented from the image and normalized according to the eye coordinates detected by the dlib landmark detector.
3. The multi-directional fusion attention detection method as claimed in claim 1, wherein A2 is implemented as follows: given the input feature map X, two feature maps U1 and U2 are generated from it by a 3×3 grouped convolution and a 3×3 dilated convolution (5×5 receptive field), respectively. The two feature maps are then added to generate a new feature map, which is passed through the multi-directional fusion attention module and then through the two functions a and b, and the resulting function values are multiplied with the original U1 and U2. Since the function values of a and b sum to 1, this assigns weights to the feature maps of the two branches; and because the convolution kernels of the branches differ in size, the network can select a suitable kernel on its own (the matrices A and B in the functions a and b are initialized before training and both have size C×d; z is the feature map output by the multi-directional fusion attention module before entering the a and b functions):
a_c = e^(A_c z) / (e^(A_c z) + e^(B_c z)) (1)
b_c = e^(B_c z) / (e^(A_c z) + e^(B_c z)) (2)
4. The multi-directional fusion attention detection method as claimed in claim 1, wherein A3 is implemented as follows: given an input X, each channel is first encoded along the horizontal and the vertical coordinate using pooling kernels of size (H, 1) and (1, W), respectively. The output of the c-th channel at height h can thus be expressed as:
z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i) (3)
Likewise, the output of the c-th channel at width w can be written as:
z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (4)
the 2 transformations respectively aggregate features along two spatial directions to obtain a pair of direction-sensing feature maps. After passing through the transform in the information embedding, the section subjects the above transform to a convert operation, and then subjects it to a transform operation using a convolution transform function:
f = δ(F1([z^h, z^w])) (5)
In the formula, [·, ·] denotes the concatenation operation along the spatial dimension, δ is a nonlinear activation function, and f is the intermediate feature map encoding spatial information in the horizontal and vertical directions. f is then split along the spatial dimension into two separate tensors f^h ∈ R^(C/r×H) and f^w ∈ R^(C/r×W), where r is the reduction ratio, as in the SE block. Another two 1×1 convolution transformations, F_h and F_w, transform f^h and f^w into tensors with the same number of channels as the input X, yielding:
g^h = σ(F_h(f^h)) (6)
g^w = σ(F_w(f^w)) (7)
where σ is the sigmoid activation function. To reduce the complexity and computational overhead of the model, a suitable reduction ratio r (e.g., 16) is typically used here to reduce the number of channels of f. The outputs g^h and g^w are then expanded and used as attention weights, respectively.
Finally, the output of the multi-directional fusion attention module can be written as:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (8)
5. The multi-directional fusion attention detection method as claimed in claim 1, wherein the dimensionality-reduced features are classified by an SVM, and A4 is implemented as follows: the last key step of the invention is to find an optimal classification model through a highly discriminative machine-learning algorithm in order to decide whether a face is genuine or morphed. A support vector machine with a radial basis function kernel is chosen as the classifier; it not only achieves high classification accuracy but is also widely used in research topics such as face recognition. The dimensionality-reduced features from the previous step are fed into the SVM, and the detection of a morphed face is completed according to the SVM output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210235051.7A CN114612980A (en) | 2022-03-11 | 2022-03-11 | Deformed face detection based on multi-azimuth fusion attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210235051.7A CN114612980A (en) | 2022-03-11 | 2022-03-11 | Deformed face detection based on multi-azimuth fusion attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114612980A true CN114612980A (en) | 2022-06-10 |
Family
ID=81863062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210235051.7A Pending CN114612980A (en) | 2022-03-11 | 2022-03-11 | Deformed face detection based on multi-azimuth fusion attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612980A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035315A (en) * | 2022-06-17 | 2022-09-09 | 佛山科学技术学院 | Tile color difference grading detection method and system based on attention mechanism |
CN117523636A (en) * | 2023-11-24 | 2024-02-06 | 北京远鉴信息技术有限公司 | Face detection method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11669607B2 (en) | ID verification with a mobile device | |
US10956719B2 (en) | Depth image based face anti-spoofing | |
CN108229427B (en) | Identity security verification method and system based on identity document and face recognition | |
JP6403233B2 (en) | User authentication method, apparatus for executing the same, and recording medium storing the same | |
CN102902959B (en) | Face recognition method and system for storing identification photo based on second-generation identity card | |
CN111191655B (en) | Object identification method and device | |
Fourati et al. | Anti-spoofing in face recognition-based biometric authentication using image quality assessment | |
US8160880B2 (en) | Generalized object recognition for portable reading machine | |
WO2015149534A1 (en) | Gabor binary pattern-based face recognition method and device | |
CN105550658A (en) | Face comparison method based on high-dimensional LBP (Local Binary Patterns) and convolutional neural network feature fusion | |
CN114612980A (en) | Deformed face detection based on multi-azimuth fusion attention | |
CN109800643A (en) | A kind of personal identification method of living body faces multi-angle | |
CN111144366A (en) | Strange face clustering method based on joint face quality assessment | |
CN107220627B (en) | Multi-pose face recognition method based on collaborative fuzzy mean discrimination analysis | |
US11238271B2 (en) | Detecting artificial facial images using facial landmarks | |
CN106485253B (en) | A kind of pedestrian of maximum particle size structured descriptor discrimination method again | |
CN110427972B (en) | Certificate video feature extraction method and device, computer equipment and storage medium | |
CN105760815A (en) | Heterogeneous human face verification method based on portrait on second-generation identity card and video portrait | |
CN111767877A (en) | Living body detection method based on infrared features | |
Querini et al. | Facial biometrics for 2D barcodes | |
Antil et al. | A two stream face anti-spoofing framework using multi-level deep features and ELBP features | |
Li et al. | A robust framework for multiview age estimation | |
Günay Yılmaz et al. | Face presentation attack detection performances of facial regions with multi-block LBP features | |
CN110276263B (en) | Face recognition system and recognition method | |
CN111368803A (en) | Face recognition method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||