CN114495210A - Posture change face recognition method based on attention mechanism - Google Patents
- Publication number
- CN114495210A (application number CN202210013502.2A)
- Authority
- CN
- China
- Prior art keywords
- module
- attention mechanism
- face recognition
- attention
- senet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a posture change face recognition method based on an attention mechanism. First, image data containing faces in natural scenes is labeled and divided into a training set and a test set. A dual attention module, LA-SENet, is then introduced into the face recognition system to highlight the most discriminative facial feature information under posture change: the LANet module automatically locates the most distinctive facial regions, while the SENet module emphasizes the more important channels. Next, a Bottleneck-attention module with an inverted multi-scale residual structure is designed based on MobileNetV2 to obtain features at three different scales. A multi-scale feature fusion method fuses the features of different layers, a Concatenate operation performs channel-wise splicing, and the result is output through a fully connected layer; the resulting local features are fused with the global information at the final network output. The invention can effectively learn facial features under posture change and improve face recognition accuracy.
Description
Technical Field
The invention relates to the technical field of pattern recognition, in particular to a posture change face recognition method based on an attention mechanism.
Background
As an important direction in computer vision, face recognition uses computer technology to identify face information in an image or video, extracting the most critical visual features from it and ultimately determining identity. Although face recognition accuracy is now generally high, the accuracy of most face recognition algorithms drops noticeably on images with large posture changes. Posture change has therefore become an important challenge for face recognition, and accurately recognizing faces affected by different posture changes has become a key problem in the field.
To overcome the influence of posture change on face recognition, researchers have pursued several approaches. The most common is to collect face data sets covering different posture changes, such as YouTube Faces; a face recognition model trained on such large data sets has better adaptability and fitting ability, but collecting them consumes large amounts of material and human resources, causing unnecessary waste. Another approach is to solve the few-training-sample problem by generating faces at different angles from frontal images. For example, the three-dimensional morphable model (3DMM) proposed by Blanz and Vetter is built on a three-dimensional face database and accounts for posture change and other factors, so the generated three-dimensional face model is highly accurate; however, this method places high demands on the precision of the three-dimensional model, the modeling time is long, and the model optimization is complex. A third approach designs a separate face recognition network for each pose, adaptively selects a suitable network for each image at test time, and then integrates the results of the different networks. This method has a significant drawback: it requires multiple views captured from multiple viewpoints of each face, which is impossible in many practical situations where only a single frontal or other-pose view is available.
Disclosure of Invention
The invention aims to provide a posture change face recognition method based on an attention mechanism, so as to solve the problems noted in the background art.
In order to achieve the above purpose, the invention provides the following technical solution: a posture change face recognition method based on an attention mechanism, comprising the following steps:
Step one: dividing the acquired data set into a training data set and a test data set, and preprocessing the selected training data set;
Step two: cropping the images of the training set in step one into 112 × 112 image blocks;
Step three: designing a convolutional neural network based on an attention mechanism;
Step four: training the convolutional neural network based on the attention mechanism.
Preferably, the third step specifically comprises the following steps:
3.1: designing the dual attention module LA-SENet: the attention module comprises the LANet spatial attention module and the SENet channel attention module, and is used for highlighting the most discriminative facial feature information under posture change;
3.2: designing an inverted residual structure: a Bottleneck-attention module with an inverted residual structure is designed based on MobileNetV2, i.e., the number of channels is first expanded and then compressed in order to reduce the amount of computation;
3.3: constructing a multi-scale feature fusion module: the multi-scale feature fusion module is formed by connecting a feature fusion module and an up-down sampling module;
3.4: constructing a global average pooling module: the global average pooling module is composed of GDConv and Linear Conv;
3.5: building the attention mechanism network with the MobileFaceNet model as the backbone; the attention-based posture change face recognition network consists of six parts: an input module, the attention module LA-SENet, the inverted residual module Bottleneck-attention, the multi-scale feature fusion module, the global average pooling module, and an output module.
Preferably, the fourth step specifically comprises the following steps:
4.1: setting the activation function and the loss function, and estimating the network parameters from the difference between the real image and the output of the attention-based convolutional neural network;
SoftMax is constructed as the loss function of the invention; it assigns a probability to each class, and the SoftMax loss is expressed as follows:

L = -(1/N) Σ_{i=1}^{N} log( e^{W_{y_i}^T x_i + b_{y_i}} / Σ_{j=1}^{n} e^{W_j^T x_i + b_j} )

where x_i denotes the deep feature of the i-th sample, which belongs to the y_i-th class; W_j denotes the j-th column of the weight matrix W; b_j is the bias term; n denotes the total number of training classes; and N denotes the batch size;
4.2: selecting an optimization function to iteratively train the attention-based convolutional neural network;
4.3: setting the training parameters, including the learning rate, the number of iterations, and the batch size;
4.4: testing the trained network using the LFW, CPLFW, AgeDB-30, and CFP image data sets as the test data sets of the invention.
Preferably, in step 3.1: the attention module LA-SENet is composed of the spatial attention mechanism LANet and the channel attention mechanism SENet. LANet is formed by two consecutive 1 × 1 convolutions, followed respectively by ReLU and Sigmoid activations; SENet is composed of two consecutive FC layers, followed respectively by ReLU and Sigmoid.
Preferably, in step 3.2: the inverted residual module is composed of structures with different Stride values.
Preferably, in step 4.2: the network is iteratively trained using the SGD algorithm.
Preferably, in step 4.3: the initial learning rate is set to 0.1, the number of iterations is set to 25, and the batch size is set to 512.
Compared with the prior art, the invention has the following beneficial effects: the invention designs a convolutional neural network module based on an attention mechanism, whose receptive field is larger than that of a conventional convolutional neural network. It can therefore extract more features from low-resolution images, and the attention module can additionally extract the high-frequency information in an image. A multi-scale feature fusion module is designed to fuse the features of different convolutional layers. An inverted residual module is designed; the convolutional neural network built on it alleviates the increase in training difficulty as the network deepens and outputs features at different scales. A global average pooling module is constructed, which minimizes overfitting by reducing the number of model parameters.
Drawings
FIG. 1 is a flow chart of feature extraction of an attention-based pose change face recognition system according to the present invention.
FIG. 2 is a schematic block diagram of the dual attention module LA-SENet of the present invention;
FIG. 3 is a schematic block diagram of a Bottleneck-attention module of an inverted residual error structure according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end", "the other end", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "connected," and the like are to be construed broadly, such as "connected," which may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to FIG. 1, the present invention provides the following technical solution: a posture change face recognition method based on an attention mechanism, comprising the following steps:
Step one: dividing the collected data set into a training data set and a test data set, and preprocessing the selected training data set: VGGFace2, a large-scale face database containing multiple postures and multiple viewing angles, is selected as the training data set of the attention-based posture change face recognition system, and the LFW, CPLFW, AgeDB-30, and CFP image data sets are selected as the test data sets;
Step two: cropping the images of the training set in step one into 112 × 112 image blocks;
Step three: designing a convolutional neural network based on an attention mechanism;
Step four: training the convolutional neural network based on the attention mechanism.
In the invention, the third step specifically comprises the following steps:
3.1: designing the dual attention module LA-SENet: the attention module comprises the LANet spatial attention module and the SENet channel attention module, and is used for highlighting the most discriminative facial feature information under posture change. As shown in FIG. 2, the LANet module contains two consecutive 1 × 1 convolutional layers; a ReLU activation follows the first convolution and a Sigmoid activation follows the second, aggregating spatial information across channels into one channel. The second convolution outputs a single channel passed through the Sigmoid function, i.e., the spatial attention map. The SENet module is divided into a squeeze part and an excitation part: it squeezes each feature map into a one-dimensional descriptor with a global receptive field, and after two FC layers the importance of each channel is predicted;
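The action of the dual attention module described above can be sketched numerically. This is an illustrative NumPy sketch, not the patented implementation: the weight shapes and function names are assumptions, and the 1 × 1 convolutions are written as per-pixel matrix products over the channel axis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def la_senet(x, w_sp1, w_sp2, w_fc1, w_fc2):
    """Toy LA-SENet for a feature map x of shape (C, H, W).

    LANet (spatial): two 1x1 convs -> ReLU -> Sigmoid -> (H, W) attention map.
    SENet (channel): global average pool -> FC -> ReLU -> FC -> Sigmoid -> (C,) weights.
    Weight shapes are illustrative assumptions.
    """
    c, h, w = x.shape
    # LANet: 1x1 convs act per pixel, so they are channel-axis matrix products
    t = np.maximum(0.0, np.einsum('dc,chw->dhw', w_sp1, x))   # first conv + ReLU
    spatial = sigmoid(np.einsum('d,dhw->hw', w_sp2, t))       # second conv -> 1 channel
    # SENet: squeeze each channel to one value, then excite with two FC layers
    squeeze = x.mean(axis=(1, 2))                             # (C,)
    channel = sigmoid(w_fc2 @ np.maximum(0.0, w_fc1 @ squeeze))  # (C,)
    # Reweight the input by both attention maps
    return x * spatial[None, :, :] * channel[:, None, None]
```

Because both attention maps pass through a Sigmoid, every weight lies in (0, 1), so the module can only attenuate, never amplify, the input features.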
3.2: designing an inverted residual structure: a Bottleneck-attention module with an inverted residual structure is designed based on MobileNetV2, i.e., the number of channels is first expanded and then compressed in order to reduce the amount of computation; the inverted residual module is composed of structures with different Stride values. As shown in FIG. 3, the multi-scale inverted residual block is composed of branches with different Stride values. When Stride = 1, the input passes through three 1 × 1 convolutions; two of the branches then apply a 3 × 3 depthwise separable convolution, and one of those additionally applies a 3 × 3 convolution; the three branches are joined with a Concat operation, and a final 1 × 1 convolution kernel adjusts the channel dimension. When Stride = 2, the channels are first expanded by a 1 × 1 convolution, and a 3 × 3 depthwise separable convolution kernel then adjusts the channel size;
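The depthwise separable convolution used in the branches above factors a standard convolution into a per-channel spatial filter followed by a 1 × 1 channel-mixing step. A plain NumPy sketch for illustration (stride 1, "same" padding; kernel shapes are assumptions, not the patented structure):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (C, H, W); dw_kernels: (C, 3, 3), one 3x3 filter per channel;
    pw_weights: (C_out, C), the 1x1 pointwise mixing matrix."""
    c, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))      # 'same' padding, stride 1
    dw = np.zeros_like(x)
    for ch in range(c):                            # depthwise: each channel filtered alone
        for i in range(h):
            for j in range(w):
                dw[ch, i, j] = np.sum(pad[ch, i:i + 3, j:j + 3] * dw_kernels[ch])
    # pointwise 1x1 convolution mixes information across channels
    return np.einsum('oc,chw->ohw', pw_weights, dw)
```

The design choice matters for mobile backbones like MobileNetV2: the factored form costs roughly C·9·H·W + C_out·C·H·W multiplications instead of C_out·C·9·H·W for a full 3 × 3 convolution.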
3.3: constructing a multi-scale feature fusion module: the multi-scale feature fusion module is formed by connecting a feature fusion module and an up-down sampling module;
as shown in fig. 1, the images of different scales output by the residual error module after multi-scale inversion are subjected to feature fusion. Let x1、 x2And x3For the features of different layers, the features of these different layers are multiplied by the weighting parameters α, β, and γ and added to obtain a new fusion feature, as shown in the following formula:
wherein the content of the first and second substances,an (i, j) th vector representing an output feature map;representing a feature vector at position (i, j) on the feature map adjusted from level n to level l;andthe spatial importance weight of feature mapping from three different levels to l levels learned by network adaptation is referred to;
the weight parameters α, β, and γ are obtained by performing 1 × 1 convolution on the feature maps of the respective layers after resize. The formula of the weight parameter after the SoftMax function is expressed as follows:
3.4: constructing a global average pooling module: the global average pooling module is composed of GDConv and Linear Conv;
3.5: building the attention mechanism network with the MobileFaceNet model as the backbone. The attention-based posture change face recognition network consists of six parts: an input module, the attention module LA-SENet, the inverted residual module Bottleneck-attention, the multi-scale feature fusion module, the global average pooling module, and an output module. The input module is formed by a 3 × 3 convolution; the attention module LA-SENet uses the attention module described in 3.1; the residual module uses the MobileNetV2 inverted residual structure described in 3.2; as shown in FIG. 1, the multi-scale feature fusion module is formed by connecting three of the multi-channel feature extraction modules of 3.3; the global average pooling module is composed of GDConv with a 7 × 7 kernel and Linear Conv with a 1 × 1 kernel, and the output module is composed of a Concat function and an FC layer.
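The global depthwise convolution (GDConv) in the pooling module can be viewed as a learned, per-channel weighted global pooling: because its 7 × 7 kernel covers the entire 7 × 7 feature map, each channel collapses to a single weighted sum. A small NumPy sketch, with shapes assumed from the description:

```python
import numpy as np

def gdconv(x, kernels):
    """Global depthwise convolution.

    x: (C, 7, 7) final feature map; kernels: (C, 7, 7), one learned
    kernel per channel. The kernel spans the whole spatial extent, so
    the output is a single value per channel: a weighted global pool."""
    return np.sum(x * kernels, axis=(1, 2))   # (C,)
```

With a uniform kernel of 1/49 this reduces exactly to global average pooling; learning the kernel lets the network weight spatial positions (e.g. face center vs. border) differently per channel.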
In the invention, the fourth step specifically comprises the following steps:
4.1: setting the activation function and the loss function, and estimating the network parameters from the difference between the real image and the output of the attention-based convolutional neural network;
SoftMax is constructed as the loss function of the invention; it assigns a probability to each class, and the SoftMax loss is expressed as follows:

L = -(1/N) Σ_{i=1}^{N} log( e^{W_{y_i}^T x_i + b_{y_i}} / Σ_{j=1}^{n} e^{W_j^T x_i + b_j} )

where x_i denotes the deep feature of the i-th sample, which belongs to the y_i-th class; W_j denotes the j-th column of the weight matrix W; b_j is the bias term; n denotes the total number of training classes; and N denotes the batch size;
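A numerically stable NumPy sketch of this SoftMax loss, following the variable definitions in the description (the function name and array shapes are illustrative):

```python
import numpy as np

def softmax_loss(features, labels, W, b):
    """Mean softmax cross-entropy loss.

    features: (N, d) deep features x_i; labels: (N,) class indices y_i;
    W: (d, n) with columns W_j; b: (n,) bias terms b_j."""
    logits = features @ W + b                        # W_j^T x_i + b_j, shape (N, n)
    logits = logits - logits.max(axis=1, keepdims=True)  # stability: shift before exp
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pick the log-probability of each sample's true class y_i and average
    return -log_probs[np.arange(len(labels)), labels].mean()
```

For all-zero features and weights, every class gets probability 1/n, so the loss is log n regardless of the labels, which is a convenient sanity check.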
4.2: selecting an optimization function to iteratively train the attention-based convolutional neural network; the network is trained iteratively using the SGD algorithm;
4.3: setting the training parameters, including the learning rate, the number of iterations, and the batch size: the initial learning rate is set to 0.1, the number of iterations to 25, and the batch size to 512;
4.4: testing the trained network using the LFW, CPLFW, AgeDB-30, and CFP image data sets as the test data sets of the invention.
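The update rule behind the training step can be sketched as plain SGD with the stated hyperparameters. Momentum, weight decay, and any learning-rate schedule are not specified in the description, so this sketch deliberately omits them:

```python
import numpy as np

# Hyperparameters stated in the description
LEARNING_RATE = 0.1
EPOCHS = 25
BATCH_SIZE = 512

def sgd_step(params, grads, lr=LEARNING_RATE):
    """One vanilla SGD update per parameter: p <- p - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In a full training loop this step would be applied once per mini-batch of 512 images for 25 epochs, with gradients supplied by backpropagation through the network.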
In summary, the invention designs a convolutional neural network module based on an attention mechanism whose receptive field is larger than that of a conventional convolutional neural network. It can therefore extract more features from low-resolution images, and the attention module can additionally extract the high-frequency information in an image. A multi-scale feature fusion module is designed to fuse the features of different convolutional layers. An inverted residual module is designed; the convolutional neural network built on it alleviates the increase in training difficulty as the network deepens and outputs features at different scales. A global average pooling module is constructed, which minimizes overfitting by reducing the number of model parameters.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (7)
1. A posture change face recognition method based on an attention mechanism, characterized in that the method comprises the following steps:
Step one: dividing the acquired data set into a training data set and a test data set, and preprocessing the selected training data set;
Step two: cropping the images of the training set in step one into 112 × 112 image blocks;
Step three: designing a convolutional neural network based on an attention mechanism;
Step four: training the convolutional neural network based on the attention mechanism.
2. The posture change face recognition method based on an attention mechanism of claim 1, characterized in that the third step specifically comprises the following steps:
2.1: designing the dual attention module LA-SENet: the attention module comprises the LANet spatial attention module and the SENet channel attention module, and is used for highlighting the most discriminative facial feature information under posture change;
2.2: designing an inverted residual structure: a Bottleneck-attention module with an inverted residual structure is designed based on MobileNetV2, i.e., the number of channels is first expanded and then compressed in order to reduce the amount of computation;
2.3: constructing a multi-scale feature fusion module: the multi-scale feature fusion module is formed by connecting a feature fusion module and an up-down sampling module;
2.4: constructing a global average pooling module: the global average pooling module is composed of GDConv and Linear Conv;
2.5: building the attention mechanism network with the MobileFaceNet model as the backbone; the attention-based posture change face recognition network consists of six parts: an input module, the attention module LA-SENet, the inverted residual module Bottleneck-attention, the multi-scale feature fusion module, the global average pooling module, and an output module.
3. The posture change face recognition method based on an attention mechanism of claim 1, characterized in that the fourth step specifically comprises the following steps:
3.1: setting the activation function and the loss function, and estimating the network parameters from the difference between the real image and the output of the attention-based convolutional neural network;
SoftMax is constructed as the loss function of the invention; it assigns a probability to each class, and the SoftMax loss is expressed as follows:

L = -(1/N) Σ_{i=1}^{N} log( e^{W_{y_i}^T x_i + b_{y_i}} / Σ_{j=1}^{n} e^{W_j^T x_i + b_j} )

where x_i denotes the deep feature of the i-th sample, which belongs to the y_i-th class; W_j denotes the j-th column of the weight matrix W; b_j is the bias term; n denotes the total number of training classes; and N denotes the batch size;
3.2: selecting an optimization function to iteratively train the attention-based convolutional neural network;
3.3: setting the training parameters, including the learning rate, the number of iterations, and the batch size;
3.4: testing the trained network using the LFW, CPLFW, AgeDB-30, and CFP image data sets as the test data sets of the invention.
4. The posture change face recognition method based on an attention mechanism of claim 1, characterized in that in step 2.1: the attention module LA-SENet is composed of the spatial attention mechanism LANet and the channel attention mechanism SENet; LANet is formed by two consecutive 1 × 1 convolutions, followed respectively by ReLU and Sigmoid activations; SENet is composed of two consecutive FC layers, followed respectively by ReLU and Sigmoid.
5. The posture change face recognition method based on an attention mechanism of claim 1, characterized in that in step 2.2: the inverted residual module is composed of structures with different Stride values.
6. The posture change face recognition method based on an attention mechanism of claim 1, characterized in that in step 3.2: the network is iteratively trained using the SGD algorithm.
7. The posture change face recognition method based on an attention mechanism of claim 1, characterized in that in step 3.3: the initial learning rate is set to 0.1, the number of iterations is set to 25, and the batch size is set to 512.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210013502.2A CN114495210A (en) | 2022-01-07 | 2022-01-07 | Posture change face recognition method based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210013502.2A CN114495210A (en) | 2022-01-07 | 2022-01-07 | Posture change face recognition method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114495210A true CN114495210A (en) | 2022-05-13 |
Family
ID=81510085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210013502.2A Pending CN114495210A (en) | 2022-01-07 | 2022-01-07 | Posture change face recognition method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114495210A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115063685A (en) * | 2022-07-11 | 2022-09-16 | 河海大学 | Remote sensing image building feature extraction method based on attention network |
CN115063685B (en) * | 2022-07-11 | 2023-10-03 | 河海大学 | Remote sensing image building feature extraction method based on attention network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11783579B2 (en) | Hyperspectral remote sensing image classification method based on self-attention context network | |
CN111563508B (en) | Semantic segmentation method based on spatial information fusion | |
CN108875076B (en) | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network | |
CN108520213B (en) | Face beauty prediction method based on multi-scale depth | |
CN112597955B (en) | Single-stage multi-person gesture estimation method based on feature pyramid network | |
CN110852393A (en) | Remote sensing image segmentation method and system | |
CN112766229B (en) | Human face point cloud image intelligent identification system and method based on attention mechanism | |
CN111259735B (en) | Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network | |
CN113870160B (en) | Point cloud data processing method based on transformer neural network | |
CN111652273A (en) | Deep learning-based RGB-D image classification method | |
CN112215157B (en) | Multi-model fusion-based face feature dimension reduction extraction method | |
CN112949740A (en) | Small sample image classification method based on multilevel measurement | |
CN111414875A (en) | Three-dimensional point cloud head attitude estimation system based on depth regression forest | |
CN112733627A (en) | Finger vein identification method based on fusion of local feature network and global feature network | |
CN114511710A (en) | Image target detection method based on convolutional neural network | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
CN112766283A (en) | Two-phase flow pattern identification method based on multi-scale convolution network | |
CN112507904A (en) | Real-time classroom human body posture detection method based on multi-scale features | |
CN116310339A (en) | Remote sensing image segmentation method based on matrix decomposition enhanced global features | |
CN110728186A (en) | Fire detection method based on multi-network fusion | |
CN114495210A (en) | Posture change face recognition method based on attention mechanism | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN113781385A (en) | Joint attention-seeking convolution method for brain medical image automatic classification | |
CN113128560A (en) | Attention module enhancement-based CNN regular script style classification method | |
CN109583406B (en) | Facial expression recognition method based on feature attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20220915; Address after: 030000 Xueyuan Road 3, Taiyuan, Shanxi; Applicant after: NORTH University OF CHINA; Applicant after: Nantong Institute for Advanced Study; Address before: 226000 building w-9, Zilang science and Technology City, central innovation District, Nantong City, Jiangsu Province; Applicant before: Nantong Institute of intelligent optics, North China University; Applicant before: NORTH University OF CHINA |