CN115240049A - Deep learning model based on attention mechanism - Google Patents
- Publication number: CN115240049A
- Application number: CN202210640395.6A
- Authority
- CN
- China
- Prior art keywords
- attention
- channel
- deep learning
- learning model
- information
- Prior art date: 2022-06-08
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          - G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computing arrangements based on biological models
        - G06N3/02—Neural networks
          - G06N3/08—Learning methods
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/20—Image preprocessing
          - G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          - G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          - G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
            - G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
              - G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V2201/00—Indexing scheme relating to image or video recognition or understanding
        - G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a deep learning model based on an attention mechanism. A data set is scaled and then divided proportionally into a training data set and a test data set; the deep learning model consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information. By combining convolution operations with a self-attention mechanism, the invention extracts local low-level features and global context features simultaneously. In addition, channel and spatial attention modules are introduced into the skip connections between the encoder and the decoder to suppress irrelevant information and make maximum use of useful information, thereby improving segmentation accuracy and achieving more accurate segmentation of cerebral hematoma.
Description
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a deep learning model based on an attention mechanism.
Background
With the wide application of deep learning in the field of medical image processing in recent years, semantic segmentation models based on convolutional neural networks have shown excellent performance in segmenting target lesion regions. A model's feature extraction capability and its ability to recover spatial information play an important role in segmentation accuracy and directly influence the final prediction result.
In the cerebral hematoma segmentation task, a hematoma appears as a high-density region in the brain CT image, but the complexity of the brain structure and the diversity of hematoma shape and location make accurate and reliable segmentation extremely difficult. Convolution is limited to linear operations and local feature extraction, so segmentation often suffers from over-segmentation and under-segmentation, particularly for irregularly shaped hematomas and hematomas close to the skull. Accurate segmentation of cerebral hematomas is therefore difficult to achieve with existing semantic segmentation models.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a deep learning model based on an attention mechanism, which uses a self-attention mechanism, a channel attention mechanism and a spatial attention mechanism to improve segmentation accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
The deep learning model based on the attention mechanism is characterized by comprising:
a data set, which is scaled and then divided proportionally into a training data set and a test data set; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
Preferably, the deep learning model is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four residual network layers and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and thereby extracts hematoma features from a global perspective;
(2.2) The decoder performs four upsampling bilinear-interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3×3 convolution, batch normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3×3 convolution and a sigmoid activation function;
(2.3) Skip connections exist between the encoder and the decoder; through the channel attention module and the spatial attention module, the encoder features learn a weight for each channel and feature region, these weights highlight useful feature information and suppress irrelevant information, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
Preferably, the self-attention network consists of multiple attention heads located at the bottom of the encoder. Its main function is to obtain a receptive field covering the whole input image by establishing relations between all pixels of the high-level feature map, so that when segmenting the cerebral hematoma region the classification decision for a particular pixel may be influenced by any other pixel. The self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
where Q, K and V are the query, key and value matrices and d_k is the key dimension.
preferably, the feature map X generated by the channel attention module from the input CT images through the convolutional layer is increased from a single channel to multiple channels. The information expressed by the feature map of each channel is different, and valid feature information may only appear in a particular channel. The role of the channel attention module is to learn weights using the relationship between each channel and then multiply by the corresponding channel.
Preferably, the channel attention coefficient is calculated as follows:
Attention_C(X) = σ(MLP(AvgPool(X)) + MLP(MaxPool(X)))
where σ denotes the sigmoid activation function, MLP denotes a multi-layer perceptron, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
Preferably, the spatial attention module focuses on the information most meaningful to the current segmentation task, and the spatial attention coefficient is calculated as follows:
Attention_S(X) = σ(f^{7×7}([AvgPool(X); MaxPool(X)]))
where σ denotes the sigmoid activation function, f^{7×7} denotes a 7×7 convolution operation, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
Preferably, the ratio of the training data set to the test data set is 9:1.
The invention has the following advantages: by combining convolution operations with a self-attention mechanism, the proposed attention-based deep learning model extracts local low-level features and global context features simultaneously. In addition, channel and spatial attention modules are introduced into the skip connections between the encoder and the decoder to suppress irrelevant information and make maximum use of useful information, thereby improving segmentation accuracy and achieving more accurate segmentation of cerebral hematoma.
Drawings
FIG. 1 is a schematic diagram of the attention-mechanism-based deep learning model of the present invention;
FIG. 2 is a schematic diagram of a self-attention network structure according to the present invention;
FIG. 3 is a schematic structural diagram of a channel attention module according to the present invention;
FIG. 4 is a schematic structural diagram of a spatial attention module according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIGS. 1 to 4, the deep learning model based on an attention mechanism provided by the present invention includes:
a data set, which is scaled and then divided proportionally into a training data set and a test data set; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
The method comprises the following specific steps:
(1) Acquire brain CT or brain MR images as the data set, and divide the data set into a training data set and a test data set at a ratio of 9:1;
(2) Construct the attention-mechanism deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information;
(3) Iteratively train the deep learning model on the training set until the loss converges;
(4) Input the test set into the model trained in step (3), and output the final prediction result.
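The data-preparation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the `split_dataset` helper, the dummy 512×512 slices, and the assumption that the proportional split is 9:1 are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_dataset(images, train_ratio=0.9):
    """Shuffle and split a list of images into train/test subsets (9:1 by default)."""
    idx = rng.permutation(len(images))
    cut = int(len(images) * train_ratio)
    return [images[i] for i in idx[:cut]], [images[i] for i in idx[cut:]]

# Hypothetical data set: 100 single-channel CT slices already scaled to 512x512.
images = [np.zeros((512, 512), dtype=np.float32) for _ in range(100)]
train, test = split_dataset(images)
print(len(train), len(test))  # 90 10
```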
The deep learning model in step (2) is divided into an encoder part and a decoder part and is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four residual network layers and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and thereby extracts hematoma features from a global perspective;
(2.2) The decoder performs four upsampling bilinear-interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3×3 convolution, batch normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3×3 convolution and a sigmoid activation function;
(2.3) Skip connections exist between the encoder and the decoder; through the channel attention module and the spatial attention module, the encoder features learn a weight for each channel and feature region, these weights highlight useful feature information and suppress irrelevant information, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
The deep learning model takes a 512×512 single-channel grayscale image as input, extracts local features through the convolutional layers in the residual networks, and doubles the number of channels of the feature map. Downsampling is a max-pooling operation; each downsampling halves the size of the feature map. The end of the encoder adopts a self-attention network to expand the receptive field to the whole image and thereby exploit global context information. Each convolutional layer of the decoder is connected to the corresponding encoder layer through the channel and spatial attention modules, whose outputs are channel-concatenated with the upsampled output of the previous decoder layer. Through layer-by-layer bilinear-interpolation upsampling and convolution, the decoder doubles the size of the feature map and halves the number of channels to restore the spatial information of the image. The final upsampling result is a single-channel feature map of the original input size, from which the segmentation result is output through a classification layer consisting of a 3×3 convolution and a sigmoid activation function. All convolutional layers have the composition 2×(Conv3×3 + BN + ReLU), where Conv3×3 is a 3×3 convolution, BN is batch normalization, and ReLU is a nonlinear activation function.
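The shape bookkeeping described above can be checked with a small script. The base width of 64 channels after the first residual block is an assumption (the patent states channel doubling but not the initial width), and `encoder_shapes` is purely illustrative.

```python
def encoder_shapes(size=512, channels=1, base=64, levels=4):
    """Trace (channels, height, width) through the encoder described above:
    four residual blocks that double channels, each followed by 2x2 max pooling."""
    shapes = [(channels, size, size)]  # 512x512 single-channel input
    c, s = base, size
    for _ in range(levels):
        shapes.append((c, s, s))       # residual block output at this level
        s //= 2                        # max pooling halves the spatial size
        c *= 2                         # next block doubles the channels
    shapes.append((c, s, s))           # bottleneck fed to the self-attention net
    return shapes

print(encoder_shapes())
# [(1, 512, 512), (64, 512, 512), (128, 256, 256), (256, 128, 128), (512, 64, 64), (1024, 32, 32)]
```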
Self-attention network:
The self-attention network consists of multiple attention heads located at the bottom of the encoder. Its main function is to obtain a receptive field covering the whole input image by establishing relations between all pixels of the high-level feature map, so that when segmenting the cerebral hematoma region the classification decision for a particular pixel is influenced by any other pixel. The self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
As shown in FIG. 2, the input to the self-attention network is a feature map X with height H, width W and C channels. To incorporate global context information, position-encoding information is added to X. Position encoding is of great importance for cerebral hematoma segmentation, because the different brain tissues lie at different, fixed positions in the CT image; after position encoding, absolute and relative positional information between brain tissues can be captured. X is then reshaped and embedded into three matrices: a query matrix Q, a key matrix K and a value matrix V. A is the attention coefficient matrix, whose entries express the correlation between a given element of Q and all elements of K. The final attention output is obtained from the self-attention formula as a weighted average of the elements of the value matrix, and the feature map Y is output after reshaping. In the cerebral hematoma segmentation task, Q, K and V have the same dimensions and are the results of different embeddings of the reshaped input; the embedding matrices are denoted Wq, Wk and Wv.
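A single attention head as described (reshape, embed into Q/K/V, weighted average of values) can be sketched in NumPy. Position encoding and the multi-head split are omitted, and the random weight matrices are stand-ins for the learned embeddings Wq, Wk and Wv.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over an (H, W, C) feature map."""
    H, W, C = X.shape
    x = X.reshape(H * W, C)            # flatten spatial positions
    Q, K, V = x @ Wq, x @ Wk, x @ Wv   # embed into query/key/value matrices
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # attention coefficient matrix
    Y = A @ V                          # weighted average of the value elements
    return Y.reshape(H, W, -1)         # reshape back to a feature map

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4, 8))     # toy 4x4 feature map with 8 channels
W = [rng.standard_normal((8, 8)) for _ in range(3)]
Y = self_attention(X, *W)
print(Y.shape)  # (4, 4, 8)
```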
Channel attention module:
As shown in FIG. 3, the feature map X, generated from the input CT image by the convolutional layers, grows from a single channel to multiple channels. The information expressed by each channel's feature map differs, and valid feature information may appear in only particular channels. The role of the channel attention module is to learn weights from the relationships between the channels and then multiply each weight with its corresponding channel.
The calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(AvgPool(X)) + MLP(MaxPool(X)))
where σ denotes the sigmoid activation function, MLP denotes a multi-layer perceptron, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
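The channel attention formula maps directly to NumPy. The two-layer ReLU bottleneck MLP with a reduction ratio of 4 is our assumption (the patent only says "multi-layer perceptron"), and the weights are random stand-ins for learned parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(X, W1, W2):
    """Attention_C(X) = sigmoid(MLP(AvgPool(X)) + MLP(MaxPool(X))), X: (C, H, W).
    The same MLP (W1, W2) is shared between the two pooled vectors."""
    avg = X.mean(axis=(1, 2))                   # global average pooling -> (C,)
    mx = X.max(axis=(1, 2))                     # global max pooling -> (C,)
    mlp = lambda v: np.maximum(v @ W1, 0) @ W2  # ReLU bottleneck MLP
    coeff = sigmoid(mlp(avg) + mlp(mx))         # per-channel weights in (0, 1)
    return X * coeff[:, None, None]             # reweight each channel

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8, 8))
W1 = rng.standard_normal((16, 4)) * 0.1  # reduction ratio 4 (assumed)
W2 = rng.standard_normal((4, 16)) * 0.1
out = channel_attention(X, W1, W2)
print(out.shape)  # (16, 8, 8)
```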
Spatial attention module:
As shown in FIG. 4, the spatial attention module focuses on the information most meaningful to the current segmentation task. In CT images, the cerebral hematoma region suffers from blurred boundaries and low contrast, so exploiting spatial attention on the skip connections improves the efficiency of spatial information aggregation. The role of the spatial attention module is to model the priority relationship between spatial locations. To learn the spatial attention weights effectively, the module reduces the dimension of the input feature map X using global average pooling and global max pooling, generating two feature maps with one value per spatial position; these are channel-concatenated and reduced to a single channel by a 7×7 convolution, a sigmoid activation yields the weight coefficients, and the coefficients are multiplied with the input to obtain the final spatial attention output. The spatial attention coefficient is calculated as follows:
Attention_S(X) = σ(f^{7×7}([AvgPool(X); MaxPool(X)]))
where σ denotes the sigmoid activation function, f^{7×7} denotes a 7×7 convolution operation, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
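The spatial attention computation can be sketched likewise. Here the average/max pooling is taken across channels, yielding one value per spatial position as described above, and the 7×7 convolution is a naive zero-padded sliding window with a random kernel standing in for the learned filter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(X, kernel):
    """Attention_S(X) = sigmoid(f7x7([AvgPool(X); MaxPool(X)])), X: (C, H, W).
    `kernel` is a (2, 7, 7) filter; the sliding window uses zero padding 3."""
    avg = X.mean(axis=0)               # channel-wise average pooling -> (H, W)
    mx = X.max(axis=0)                 # channel-wise max pooling -> (H, W)
    stacked = np.stack([avg, mx])      # concatenated maps, shape (2, H, W)
    _, H, W = X.shape
    pad = np.pad(stacked, ((0, 0), (3, 3), (3, 3)))
    conv = np.zeros((H, W))
    for i in range(H):                 # naive 7x7 sliding-window correlation
        for j in range(W):
            conv[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * kernel)
    coeff = sigmoid(conv)              # spatial weight coefficients in (0, 1)
    return X * coeff                   # broadcast the weights over channels

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8, 8))
out = spatial_attention(X, rng.standard_normal((2, 7, 7)) * 0.1)
print(out.shape)  # (4, 8, 8)
```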
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A deep learning model based on an attention mechanism, comprising:
a data set, which is scaled and then divided proportionally into a training data set and a test data set; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
2. The attention mechanism-based deep learning model of claim 1, wherein: the deep learning model is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four residual network layers and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and thereby extracts hematoma features from a global perspective;
(2.2) The decoder performs four upsampling bilinear-interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3×3 convolution, batch normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3×3 convolution and a sigmoid activation function;
(2.3) Skip connections exist between the encoder and the decoder; through the channel attention module and the spatial attention module, the encoder features learn a weight for each channel and feature region, these weights highlight useful feature information and suppress irrelevant information, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
3. The attention mechanism-based deep learning model of claim 2, wherein: the self-attention network consists of multiple attention heads located at the bottom of the encoder; its main function is to obtain a receptive field covering the whole input image by establishing relations between all pixels of the high-level feature map, so that when segmenting the cerebral hematoma region the classification decision for a particular pixel may be influenced by any other pixel; the self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
4. The attention mechanism-based deep learning model of claim 2, wherein: the feature map X, generated from the input CT image by the convolutional layers, grows from a single channel to multiple channels; the information expressed by each channel's feature map differs, and valid feature information may appear in only particular channels; the role of the channel attention module is to learn weights from the relationships between the channels and then multiply each weight with its corresponding channel.
5. The attention mechanism-based deep learning model of claim 4, wherein: the calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(AvgPool(X)) + MLP(MaxPool(X)))
where σ denotes the sigmoid activation function, MLP denotes a multi-layer perceptron, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
6. The attention-based deep learning model of any one of claims 2-5, wherein: the spatial attention module focuses on the information most meaningful for the current segmentation task, and the calculation formula of the spatial attention coefficient is as follows:
Attention_S(X) = σ(f^{7×7}([AvgPool(X); MaxPool(X)]))
where σ denotes the sigmoid activation function, f^{7×7} denotes a 7×7 convolution operation, and AvgPool and MaxPool denote global average pooling and global max pooling, respectively.
7. The attention mechanism-based deep learning model of any one of claims 2 to 5, wherein: the ratio of the training data set to the test data set is 9:1.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210640395.6A | 2022-06-08 | 2022-06-08 | Deep learning model based on attention mechanism |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115240049A | 2022-10-25 |
Family
- ID=83668773

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210640395.6A | CN115240049A (Pending) | 2022-06-08 | 2022-06-08 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN115240049A (en) |
Cited By (3)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN115424023A | 2022-11-07 | 2022-12-02 | 北京精诊医疗科技有限公司 | Self-attention mechanism module for enhancing small target segmentation performance |
| CN115424023B | 2022-11-07 | 2023-04-18 | 北京精诊医疗科技有限公司 | Self-attention method for enhancing small target segmentation performance |
| CN115578404A | 2022-11-14 | 2023-01-06 | 南昌航空大学 | Liver tumor image enhancement and segmentation method based on deep learning |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |