CN115240049A - Deep learning model based on attention mechanism - Google Patents
- Publication number: CN115240049A
- Application number: CN202210640395.6A
- Authority
- CN
- China
- Prior art keywords
- attention
- channel
- deep learning
- learning model
- information
- Prior art date: 2022-06-08
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          - G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computing arrangements based on biological models
        - G06N3/02—Neural networks
          - G06N3/08—Learning methods
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/20—Image preprocessing
          - G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          - G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          - G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
            - G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
              - G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V2201/00—Indexing scheme relating to image or video recognition or understanding
        - G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a deep learning model based on an attention mechanism. A data set is scaled and then divided proportionally into a training data set and a test data set; the deep learning model consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information. By combining convolution operations with a self-attention mechanism, the invention extracts local low-level features and global context features simultaneously. In addition, channel and spatial attention modules are introduced into the skip connections between the encoder and the decoder to suppress irrelevant information and make maximum use of useful information, thereby improving segmentation accuracy and achieving more accurate segmentation of cerebral hematoma.
Description
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a deep learning model based on an attention mechanism.
Background
With the wide application of deep learning in the field of medical image processing in recent years, semantic segmentation models based on convolutional neural networks have shown excellent performance in segmenting target lesion regions. A model's feature extraction capability and its ability to recover spatial information play an important role in segmentation accuracy and directly influence the final prediction result.
In the cerebral hematoma segmentation task, a hematoma appears as a high-density region in the brain CT image, but the complexity of the brain structure and the diversity of hematoma shape and location make accurate and reliable segmentation extremely difficult. Convolution is limited to linear operations and local feature extraction, so segmentation often suffers from over-segmentation and under-segmentation, particularly for irregularly shaped hematomas and hematomas close to the skull. Accurate segmentation of cerebral hematomas is therefore difficult to achieve with existing semantic segmentation models.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a deep learning model based on an attention mechanism, which uses a self-attention mechanism, a channel attention mechanism and a spatial attention mechanism to improve segmentation accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
The deep learning model based on the attention mechanism is characterized by comprising:
a data set, which is scaled and then divided proportionally into a training data set and a test data set; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
Preferably, the deep learning model is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four residual network layers and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and thereby extracts hematoma features from a global perspective;
(2.2) The decoder performs four upsampling bilinear-interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3×3 convolution, batch normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3×3 convolution and a sigmoid activation function;
(2.3) Skip connections exist between the encoder and the decoder; through the channel attention module and the spatial attention module, the encoder features learn a weight for each channel and feature region, these weights highlight useful feature information and suppress irrelevant information, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
Preferably, the self-attention network consists of multiple attention heads located at the bottom of the encoder. Its main function is to obtain a receptive field covering the whole input image by establishing relations between all pixels of the high-level feature map, so that when segmenting the cerebral hematoma region the classification decision for a particular pixel may be influenced by any other pixel. The self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
where Q, K and V are the query, key and value matrices and d_k is the key dimension.
preferably, the feature map X generated by the channel attention module from the input CT images through the convolutional layer is increased from a single channel to multiple channels. The information expressed by the feature map of each channel is different, and valid feature information may only appear in a particular channel. The role of the channel attention module is to learn weights using the relationship between each channel and then multiply by the corresponding channel.
Preferably, the channel attention coefficient is calculated as follows:
Attention_C(X) = σ(MLP(AvgPool(X)) + MLP(MaxPool(X)))
where σ denotes the sigmoid activation function, MLP denotes a multi-layer perceptron, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
Preferably, the spatial attention module focuses on the information most meaningful to the current segmentation task, and the spatial attention coefficient is calculated as follows:
Attention_S(X) = σ(f^{7×7}([AvgPool(X); MaxPool(X)]))
where σ denotes the sigmoid activation function, f^{7×7} denotes a 7×7 convolution operation, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
Preferably, the ratio of the training data set to the test data set is 9:1.
The invention has the following advantages: by combining convolution operations with a self-attention mechanism, the proposed attention-based deep learning model extracts local low-level features and global context features simultaneously. In addition, channel and spatial attention modules are introduced into the skip connections between the encoder and the decoder to suppress irrelevant information and make maximum use of useful information, thereby improving segmentation accuracy and achieving more accurate segmentation of cerebral hematoma.
Drawings
FIG. 1 is a schematic diagram of the attention-mechanism-based deep learning model of the present invention;
FIG. 2 is a schematic diagram of a self-attention network structure according to the present invention;
FIG. 3 is a schematic structural diagram of a channel attention module according to the present invention;
FIG. 4 is a schematic structural diagram of a spatial attention module according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIGS. 1 to 4, the deep learning model based on an attention mechanism provided by the present invention includes:
a data set, which is scaled and then divided proportionally into a training data set and a test data set; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
The method comprises the following specific steps:
(1) Acquire brain CT or brain MR images as the data set, and divide the data set into a training data set and a test data set at a ratio of 9:1;
(2) Construct the attention-mechanism deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information;
(3) Iteratively train the deep learning model on the training set until the loss converges;
(4) Input the test set into the model trained in step (3), and output the final prediction result.
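The data-preparation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the `split_dataset` helper, the dummy 512×512 slices, and the assumption that the proportional split is 9:1 are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_dataset(images, train_ratio=0.9):
    """Shuffle and split a list of images into train/test subsets (9:1 by default)."""
    idx = rng.permutation(len(images))
    cut = int(len(images) * train_ratio)
    return [images[i] for i in idx[:cut]], [images[i] for i in idx[cut:]]

# Hypothetical data set: 100 single-channel CT slices already scaled to 512x512.
images = [np.zeros((512, 512), dtype=np.float32) for _ in range(100)]
train, test = split_dataset(images)
print(len(train), len(test))  # 90 10
```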
The deep learning model in step (2) is divided into an encoder part and a decoder part and is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four residual network layers and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and thereby extracts hematoma features from a global perspective;
(2.2) The decoder performs four upsampling bilinear-interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3×3 convolution, batch normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3×3 convolution and a sigmoid activation function;
(2.3) Skip connections exist between the encoder and the decoder; through the channel attention module and the spatial attention module, the encoder features learn a weight for each channel and feature region, these weights highlight useful feature information and suppress irrelevant information, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
The deep learning model takes a 512×512 single-channel grayscale image as input, extracts local features through the convolutional layers in the residual networks, and doubles the number of channels of the feature map. Downsampling is a max-pooling operation; each downsampling halves the size of the feature map. The end of the encoder adopts a self-attention network to expand the receptive field to the whole image and thereby exploit global context information. Each convolutional layer of the decoder is connected to the corresponding encoder layer through the channel and spatial attention modules, whose outputs are channel-concatenated with the upsampled output of the previous decoder layer. Through layer-by-layer bilinear-interpolation upsampling and convolution, the decoder doubles the size of the feature map and halves the number of channels to restore the spatial information of the image. The final upsampling result is a single-channel feature map of the original input size, from which the segmentation result is output through a classification layer consisting of a 3×3 convolution and a sigmoid activation function. All convolutional layers have the composition 2×(Conv3×3 + BN + ReLU), where Conv3×3 is a 3×3 convolution, BN is batch normalization, and ReLU is a nonlinear activation function.
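The shape bookkeeping described above can be checked with a small script. The base width of 64 channels after the first residual block is an assumption (the patent states channel doubling but not the initial width), and `encoder_shapes` is purely illustrative.

```python
def encoder_shapes(size=512, channels=1, base=64, levels=4):
    """Trace (channels, height, width) through the encoder described above:
    four residual blocks that double channels, each followed by 2x2 max pooling."""
    shapes = [(channels, size, size)]  # 512x512 single-channel input
    c, s = base, size
    for _ in range(levels):
        shapes.append((c, s, s))       # residual block output at this level
        s //= 2                        # max pooling halves the spatial size
        c *= 2                         # next block doubles the channels
    shapes.append((c, s, s))           # bottleneck fed to the self-attention net
    return shapes

print(encoder_shapes())
# [(1, 512, 512), (64, 512, 512), (128, 256, 256), (256, 128, 128), (512, 64, 64), (1024, 32, 32)]
```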
Self-attention network:
The self-attention network consists of multiple attention heads located at the bottom of the encoder. Its main function is to obtain a receptive field covering the whole input image by establishing relations between all pixels of the high-level feature map, so that when segmenting the cerebral hematoma region the classification decision for a particular pixel is influenced by any other pixel. The self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
As shown in FIG. 2, the input to the self-attention network is a feature map X with height H, width W and C channels. To incorporate global context information, position-encoding information is added to X. Position encoding is of great importance for cerebral hematoma segmentation, because the different brain tissues lie at different, fixed positions in the CT image; after position encoding, absolute and relative positional information between brain tissues can be captured. X is then reshaped and embedded into three matrices: a query matrix Q, a key matrix K and a value matrix V. A is the attention coefficient matrix, whose entries express the correlation between a given element of Q and all elements of K. The final attention output is obtained from the self-attention formula as a weighted average of the elements of the value matrix, and the feature map Y is output after reshaping. In the cerebral hematoma segmentation task, Q, K and V have the same dimensions and are the results of different embeddings of the reshaped input; the embedding matrices are denoted Wq, Wk and Wv.
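A single attention head as described (reshape, embed into Q/K/V, weighted average of values) can be sketched in NumPy. Position encoding and the multi-head split are omitted, and the random weight matrices are stand-ins for the learned embeddings Wq, Wk and Wv.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over an (H, W, C) feature map."""
    H, W, C = X.shape
    x = X.reshape(H * W, C)            # flatten spatial positions
    Q, K, V = x @ Wq, x @ Wk, x @ Wv   # embed into query/key/value matrices
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # attention coefficient matrix
    Y = A @ V                          # weighted average of the value elements
    return Y.reshape(H, W, -1)         # reshape back to a feature map

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4, 8))     # toy 4x4 feature map with 8 channels
W = [rng.standard_normal((8, 8)) for _ in range(3)]
Y = self_attention(X, *W)
print(Y.shape)  # (4, 4, 8)
```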
Channel attention module:
As shown in FIG. 3, the feature map X, generated from the input CT image by the convolutional layers, grows from a single channel to multiple channels. The information expressed by each channel's feature map differs, and valid feature information may appear in only particular channels. The role of the channel attention module is to learn weights from the relationships between the channels and then multiply each weight with its corresponding channel.
The calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(AvgPool(X)) + MLP(MaxPool(X)))
where σ denotes the sigmoid activation function, MLP denotes a multi-layer perceptron, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
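The channel attention formula maps directly to NumPy. The two-layer ReLU bottleneck MLP with a reduction ratio of 4 is our assumption (the patent only says "multi-layer perceptron"), and the weights are random stand-ins for learned parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(X, W1, W2):
    """Attention_C(X) = sigmoid(MLP(AvgPool(X)) + MLP(MaxPool(X))), X: (C, H, W).
    The same MLP (W1, W2) is shared between the two pooled vectors."""
    avg = X.mean(axis=(1, 2))                   # global average pooling -> (C,)
    mx = X.max(axis=(1, 2))                     # global max pooling -> (C,)
    mlp = lambda v: np.maximum(v @ W1, 0) @ W2  # ReLU bottleneck MLP
    coeff = sigmoid(mlp(avg) + mlp(mx))         # per-channel weights in (0, 1)
    return X * coeff[:, None, None]             # reweight each channel

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8, 8))
W1 = rng.standard_normal((16, 4)) * 0.1  # reduction ratio 4 (assumed)
W2 = rng.standard_normal((4, 16)) * 0.1
out = channel_attention(X, W1, W2)
print(out.shape)  # (16, 8, 8)
```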
Spatial attention module:
As shown in FIG. 4, the spatial attention module focuses on the information most meaningful to the current segmentation task. In CT images, the cerebral hematoma region suffers from blurred boundaries and low contrast, so exploiting spatial attention on the skip connections improves the efficiency of spatial information aggregation. The role of the spatial attention module is to model the priority relationship between spatial locations. To learn the spatial attention weights effectively, the module reduces the dimension of the input feature map X using global average pooling and global max pooling, generating two feature maps with one value per spatial position; these are channel-concatenated and reduced to a single channel by a 7×7 convolution, a sigmoid activation yields the weight coefficients, and the coefficients are multiplied with the input to obtain the final spatial attention output. The spatial attention coefficient is calculated as follows:
Attention_S(X) = σ(f^{7×7}([AvgPool(X); MaxPool(X)]))
where σ denotes the sigmoid activation function, f^{7×7} denotes a 7×7 convolution operation, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
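The spatial attention computation can be sketched likewise. Here the average/max pooling is taken across channels, yielding one value per spatial position as described above, and the 7×7 convolution is a naive zero-padded sliding window with a random kernel standing in for the learned filter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(X, kernel):
    """Attention_S(X) = sigmoid(f7x7([AvgPool(X); MaxPool(X)])), X: (C, H, W).
    `kernel` is a (2, 7, 7) filter; the sliding window uses zero padding 3."""
    avg = X.mean(axis=0)               # channel-wise average pooling -> (H, W)
    mx = X.max(axis=0)                 # channel-wise max pooling -> (H, W)
    stacked = np.stack([avg, mx])      # concatenated maps, shape (2, H, W)
    _, H, W = X.shape
    pad = np.pad(stacked, ((0, 0), (3, 3), (3, 3)))
    conv = np.zeros((H, W))
    for i in range(H):                 # naive 7x7 sliding-window correlation
        for j in range(W):
            conv[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * kernel)
    coeff = sigmoid(conv)              # spatial weight coefficients in (0, 1)
    return X * coeff                   # broadcast the weights over channels

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8, 8))
out = spatial_attention(X, rng.standard_normal((2, 7, 7)) * 0.1)
print(out.shape)  # (4, 8, 8)
```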
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A deep learning model based on an attention mechanism, comprising:
a data set, which is scaled and then divided proportionally into a training data set and a test data set; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
2. The attention mechanism-based deep learning model of claim 1, wherein: the deep learning model is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four residual network layers and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and thereby extracts hematoma features from a global perspective;
(2.2) The decoder performs four upsampling bilinear-interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3×3 convolution, batch normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3×3 convolution and a sigmoid activation function;
(2.3) Skip connections exist between the encoder and the decoder; through the channel attention module and the spatial attention module, the encoder features learn a weight for each channel and feature region, these weights highlight useful feature information and suppress irrelevant information, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
3. The attention mechanism-based deep learning model of claim 2, wherein: the self-attention network consists of multiple attention heads located at the bottom of the encoder; its main function is to obtain a receptive field covering the whole input image by establishing relations between all pixels of the high-level feature map, so that when segmenting the cerebral hematoma region the classification decision for a particular pixel may be influenced by any other pixel; the self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
4. The attention mechanism-based deep learning model of claim 2, wherein: the feature map X, generated from the input CT image by the convolutional layers, grows from a single channel to multiple channels; the information expressed by each channel's feature map differs, and valid feature information may appear in only particular channels; the role of the channel attention module is to learn weights from the relationships between the channels and then multiply each weight with its corresponding channel.
5. The attention mechanism-based deep learning model of claim 4, wherein: the calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(AvgPool(X)) + MLP(MaxPool(X)))
where σ denotes the sigmoid activation function, MLP denotes a multi-layer perceptron, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
6. The attention-based deep learning model of any one of claims 2-5, wherein: the spatial attention module focuses on the information most meaningful for the current segmentation task, and the calculation formula of the spatial attention coefficient is as follows:
Attention_S(X) = σ(f^{7×7}([AvgPool(X); MaxPool(X)]))
where σ denotes the sigmoid activation function, f^{7×7} denotes a 7×7 convolution operation, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
7. The attention mechanism-based deep learning model of any one of claims 2 to 5, wherein: the ratio of the training data set to the test data set is 9:1.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210640395.6A | 2022-06-08 | 2022-06-08 | Deep learning model based on attention mechanism |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115240049A | 2022-10-25 |
Family
- ID=83668773

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210640395.6A | CN115240049A (Pending) | 2022-06-08 | 2022-06-08 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN115240049A (en) |
Cited By (3)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN115424023A | 2022-11-07 | 2022-12-02 | 北京精诊医疗科技有限公司 | Self-attention mechanism module for enhancing small target segmentation performance |
| CN115424023B | 2022-11-07 | 2023-04-18 | 北京精诊医疗科技有限公司 | Self-attention method for enhancing small target segmentation performance |
| CN115578404A | 2022-11-14 | 2023-01-06 | 南昌航空大学 | Liver tumor image enhancement and segmentation method based on deep learning |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |