CN115240049A - Deep learning model based on attention mechanism - Google Patents

Deep learning model based on attention mechanism

Info

Publication number
CN115240049A
Authority
CN
China
Prior art keywords
attention
channel
deep learning
learning model
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210640395.6A
Other languages
Chinese (zh)
Inventor
李垚
余南南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University
Priority to CN202210640395.6A
Publication of CN115240049A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an attention mechanism-based deep learning model, which comprises a data set that is scaled and then divided into a training data set and a testing data set in proportion; the deep learning model consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information. The invention extracts both local low-level features and global context features by combining convolution operations with a self-attention mechanism. In addition, channel and spatial attention modules are introduced into the skip connections between the encoder and the decoder to suppress irrelevant information and make maximum use of useful information, thereby improving segmentation precision and achieving more accurate segmentation of cerebral hematoma.

Description

Deep learning model based on attention mechanism
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a deep learning model based on an attention mechanism.
Background
With the wide application of deep learning technology in the field of medical image processing in recent years, semantic segmentation models based on convolutional neural networks have shown excellent performance in segmenting target lesion regions. The feature extraction capability and the spatial information recovery capability of a model play an important role in segmentation precision and directly influence the final prediction result.
In the brain hematoma segmentation task, the hematoma in a brain CT image is a high-density region, but because of the complexity of the brain structure and the diversity of hematoma morphology and location, segmenting the hematoma accurately and reliably is extremely difficult. The convolution operation is limited to linear operations and local feature extraction, so segmentation often suffers from over-segmentation and under-segmentation, particularly for hematomas with irregular shapes and hematomas close to the skull. Accurate segmentation of cerebral hematoma with existing semantic segmentation models is therefore difficult to achieve.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a deep learning model based on an attention mechanism, which uses a self-attention mechanism, a channel attention mechanism and a spatial attention mechanism to improve segmentation accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
the deep learning model based on the attention mechanism is characterized by comprising the following steps:
a data set, which is scaled and then divided into a training data set and a testing data set in proportion; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
Preferably, the deep learning model is specifically designed as follows:
(2.1) the encoder performs four downsampling max-pooling operations and consists of four layers of residual networks and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and extracts hematoma features from a global perspective;
(2.2) the decoder performs four upsampling bilinear interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3 × 3 convolution, Batch Normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3 × 3 convolution and a sigmoid activation function;
(2.3) skip connections exist between the encoder and the decoder; the encoder features pass through the channel attention module and the spatial attention module to learn weights for each channel and feature region, useful feature information is highlighted by these weights, irrelevant information is suppressed, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
Preferably, the self-attention network consists of a plurality of attention heads located at the bottom of the encoder. Its main function is to obtain a receptive field covering the whole input image by establishing relationships between all pixels in the high-level feature map, so that when segmenting the brain hematoma region the classification decision for a particular pixel of the input image may be influenced by any other pixel. The self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T / √d)·V, where d is the dimension of the query, key and value vectors.
preferably, the feature map X generated by the channel attention module from the input CT images through the convolutional layer is increased from a single channel to multiple channels. The information expressed by the feature map of each channel is different, and valid feature information may only appear in a particular channel. The role of the channel attention module is to learn weights using the relationship between each channel and then multiply by the corresponding channel.
Preferably, the calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(Avgpool(X)) + MLP(Maxpool(X)))
where σ denotes the sigmoid activation function, MLP denotes the multi-layer perceptron, and Avgpool and Maxpool denote global average pooling and global maximum pooling, respectively.
Preferably, the spatial attention module focuses on the information most meaningful to the current segmentation task, and the calculation formula of the spatial attention coefficient is as follows:
Attention_S(X) = σ(f_{7×7}([Avgpool(X); Maxpool(X)]))
where σ denotes the sigmoid activation function, f_{7×7} denotes a 7 × 7 convolution operation, and Avgpool and Maxpool denote global average pooling and global maximum pooling, respectively.
Preferably, the ratio of the training data set to the test data set is 9:1.
The invention has the following advantages: the attention-based deep learning model provided by the invention extracts local low-level features and global context features simultaneously by combining convolution operations with a self-attention mechanism. In addition, channel and spatial attention modules are introduced into the skip connections between the encoder and the decoder to suppress irrelevant information and make maximum use of useful information, thereby improving segmentation precision and achieving more accurate segmentation of cerebral hematoma.
Drawings
FIG. 1 is a schematic diagram of the attention mechanism-based deep learning model of the present invention;
FIG. 2 is a schematic diagram of a self-attention network structure according to the present invention;
FIG. 3 is a schematic structural diagram of a channel attention module according to the present invention;
FIG. 4 is a schematic structural diagram of a spatial attention module according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIGS. 1 to 4, the deep learning model based on an attention mechanism provided by the present invention includes:
a data set, which is scaled and then divided into a training data set and a testing data set in proportion; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
The method comprises the following specific steps:
(1) Acquiring brain CT or brain MR images as a data set, and dividing the data set into a training data set and a test data set in a ratio of 9:1;
(2) Constructing an attention-mechanism-based deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information;
(3) Taking the training set as input and iteratively training the deep learning model until the loss converges;
(4) Inputting the test set into the model trained in step (3) and outputting the final prediction result.
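The following is a minimal PyTorch sketch of steps (1)-(4). It is not part of the patent and assumes the CT/MR slices and hematoma masks have already been loaded and scaled to 512 × 512 single-channel tensors, and that model is any segmentation network with a sigmoid output, such as the encoder-decoder described below; the batch size, learning rate and epoch count are illustrative assumptions.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

def run_pipeline(images: torch.Tensor, masks: torch.Tensor, model: nn.Module,
                 epochs: int = 50, lr: float = 1e-4) -> float:
    # images, masks: (N, 1, 512, 512) tensors already scaled to the input size.
    dataset = TensorDataset(images, masks)
    n_train = int(0.9 * len(dataset))                       # 9:1 split, step (1)
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=4)

    criterion = nn.BCELoss()                                # the model ends in a sigmoid
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    model.train()                                           # iterative training, step (3)
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

    model.eval()                                            # prediction on the test set, step (4)
    with torch.no_grad():
        test_loss = sum(criterion(model(x), y).item() for x, y in test_loader)
    return test_loss / max(len(test_loader), 1)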
The deep learning model in step (2) is divided into an encoder part and a decoder part and is specifically designed as follows:
(2.1) The encoder performs four downsampling max-pooling operations and consists of four layers of residual networks and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and extracts hematoma features from a global perspective.
(2.2) The decoder performs four upsampling bilinear interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3 × 3 convolution, Batch Normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3 × 3 convolution and a sigmoid activation function.
(2.3) Skip connections exist between the encoder and the decoder; the encoder features pass through the channel attention module and the spatial attention module to learn weights for each channel and feature region, useful feature information is highlighted by these weights, irrelevant information is suppressed, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
The deep learning model takes a 512 × 512 single-channel original grey-scale image as input, extracts local features through the convolutional layers of the residual networks, and doubles the number of channels of the feature map at each stage. Downsampling is a max-pooling operation, and the size of the feature map is halved after each downsampling. The end of the encoder adopts a self-attention network to expand the receptive field to the whole image and thereby exploit global context information. Each convolutional layer of the decoder is connected to the corresponding encoder layer through the channel and spatial attention modules, and the outputs of these modules are channel-concatenated with the upsampled output of the previous decoder layer. The decoder doubles the size of the feature map and halves the number of channels layer by layer through bilinear-interpolation upsampling and convolutional layers, restoring the spatial information of the image. The final upsampling result is a single-channel feature map of the original input size, and the segmentation result is then output through a classification layer consisting of a 3 × 3 convolution and a sigmoid activation function. All convolutional layers have the composition 2 × (Conv3×3 + BN + ReLU), where Conv3×3 is a 3 × 3 convolution, BN is Batch Normalization, and ReLU is the nonlinear activation function.
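As an illustration of the convolutional-layer composition 2 × (Conv3×3 + BN + ReLU) and the final classification layer described above, a minimal PyTorch sketch follows. It is not the patent's reference implementation, and the channel counts in the usage example are assumptions.

import torch
from torch import nn

class DoubleConv(nn.Module):
    # Two groups of 3 x 3 convolution + Batch Normalization + ReLU,
    # the repeated block used in every convolutional layer of the model.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class ClassificationHead(nn.Module):
    # 3 x 3 convolution followed by a sigmoid, producing a single-channel mask.
    def __init__(self, in_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.conv(x))

# Usage example with a 512 x 512 single-channel grey-scale input, as in the text;
# the intermediate channel count of 64 is an assumption.
x = torch.randn(1, 1, 512, 512)
y = ClassificationHead(64)(DoubleConv(1, 64)(x))            # y has shape (1, 1, 512, 512)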
Self-attention network:
The self-attention network consists of a plurality of attention heads located at the bottom of the encoder. Its main function is to obtain a receptive field covering the whole input image by establishing relationships between all pixels in the high-level feature map, so that when segmenting the brain hematoma region the classification decision for a particular pixel of the input image can be influenced by any other pixel. The self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T / √d)·V, where d is the dimension of the query, key and value vectors.
As shown in FIG. 2, the input to the self-attention network is a feature map X with height H, width W, and C channels. To make use of global context information, position-encoding information is added to X. Position encoding is of great importance for cerebral hematoma segmentation, because different brain tissues lie at different, fixed positions in the CT image; after position encoding, absolute and relative positional information between brain tissues can be captured. X is then reshaped and projected into three matrices: a query matrix Q, a key matrix K and a value matrix V. A is the attention coefficient matrix, whose actual meaning is the correlation between a given element of Q and all elements of K. The final attention output is obtained as the weighted average of the elements of the value matrix according to the self-attention formula, and the feature map Y is output after reshaping. In the brain hematoma segmentation task, the three matrices Q, K and V have the same dimensions and are obtained by applying different embeddings to the flattened one-dimensional vector; the embedding matrices are denoted W_q, W_k and W_v.
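A single-head PyTorch sketch of this computation is given below, assuming a learnable position encoding and scaled dot-product attention; the number of attention heads and the embedding dimensions actually used are not specified here, so the sketch is an illustrative approximation rather than the patent's implementation.

import math
import torch
from torch import nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, channels, height, width))  # position encoding added to X
        self.w_q = nn.Linear(channels, channels)     # embedding matrices W_q, W_k, W_v
        self.w_k = nn.Linear(channels, channels)
        self.w_v = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        t = (x + self.pos).flatten(2).transpose(1, 2)        # reshape to (B, H*W, C)
        q, k, v = self.w_q(t), self.w_k(t), self.w_v(t)      # query, key and value matrices
        a = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(c), dim=-1)  # attention coefficient matrix A
        y = a @ v                                            # weighted average of the value matrix
        return y.transpose(1, 2).reshape(b, c, h, w)         # reshape back to the feature map Y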
Channel attention module:
As shown in FIG. 3, the feature map X generated from the input CT image by the convolutional layers increases from a single channel to multiple channels. The information expressed by the feature map of each channel is different, and valid feature information may appear only in particular channels. The role of the channel attention module is to learn a weight from the relationships between the channels and then multiply it with the corresponding channel.
The calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(Avgpool(X)) + MLP(Maxpool(X)))
where σ denotes the sigmoid activation function, MLP denotes the multi-layer perceptron, and Avgpool and Maxpool denote global average pooling and global maximum pooling, respectively.
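A minimal PyTorch sketch of this channel attention coefficient follows: a shared MLP is applied to the globally average-pooled and max-pooled channel descriptors, the two results are summed, passed through a sigmoid and used to re-weight the channels. The reduction ratio of 16 is an assumed hyper-parameter, not a value taken from the patent.

import torch
from torch import nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(                            # shared multi-layer perceptron
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                   # MLP(Avgpool(X))
        mx = self.mlp(x.amax(dim=(2, 3)))                    # MLP(Maxpool(X))
        weight = torch.sigmoid(avg + mx).view(b, c, 1, 1)    # channel attention coefficients
        return x * weight                                    # multiply each channel by its weight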
Spatial attention module:
As shown in FIG. 4, the spatial attention module focuses on the information most meaningful to the current segmentation task. In CT images, the cerebral hematoma region suffers from blurred boundaries and low contrast, so applying spatial attention on the skip connections improves the efficiency of spatial information aggregation. The role of the spatial attention module is to model the priority relationship between spatial locations. To learn the spatial attention weights effectively, the module reduces the dimensionality of the input feature map X with global average pooling and global maximum pooling, generating two feature maps over the spatial positions; the two maps are channel-concatenated, reduced to a single channel by a 7 × 7 convolution, and activated by a sigmoid function to obtain the weight coefficients, which are multiplied with the input to give the final spatial attention output. The spatial attention coefficient is calculated as follows:
Attention_S(X) = σ(f_{7×7}([Avgpool(X); Maxpool(X)]))
where σ denotes the sigmoid activation function, f_{7×7} denotes a 7 × 7 convolution operation, and Avgpool and Maxpool denote global average pooling and global maximum pooling, respectively.
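A corresponding PyTorch sketch of the spatial attention coefficient is shown below: the channel-wise average and maximum maps are concatenated, reduced to a single channel by a 7 × 7 convolution, activated by a sigmoid and multiplied with the input. It is an illustrative reconstruction, not the patent's reference code.

import torch
from torch import nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)    # the 7 x 7 convolution f_{7x7}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)                         # average pooling over channels
        mx, _ = x.max(dim=1, keepdim=True)                        # max pooling over channels
        weight = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * weight                                         # re-weight every spatial position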
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A deep learning model based on an attention mechanism, comprising:
a data set, which is scaled and then divided into a training data set and a testing data set in proportion; and
the deep learning model, which consists of an encoder responsible for extracting feature information and a decoder responsible for recovering spatial information.
2. The attention mechanism-based deep learning model of claim 1, wherein: the deep learning model is specifically designed as follows:
(2.1) the encoder performs four downsampling max-pooling operations and consists of four layers of residual networks and a self-attention network; the residual networks avoid the vanishing-gradient problem and improve the propagation of features, while the self-attention network, located at the bottom of the encoder, extracts global feature correlations of the image through higher-order operations and extracts hematoma features from a global perspective;
(2.2) the decoder performs four upsampling bilinear interpolation operations and consists of four convolutional layers and a final classification layer; each convolutional layer comprises two groups of 3 × 3 convolution, Batch Normalization and the nonlinear activation function ReLU, and the classification layer consists of a 3 × 3 convolution and a sigmoid activation function;
(2.3) skip connections exist between the encoder and the decoder; the encoder features pass through the channel attention module and the spatial attention module to learn weights for each channel and feature region, useful feature information is highlighted by these weights, irrelevant information is suppressed, and the result is channel-concatenated with the upsampled output of the previous decoder layer to improve the efficiency of spatial information recovery.
3. The attention mechanism-based deep learning model of claim 2, wherein: the self-attention network consists of a plurality of attention heads located at the bottom of the encoder; its main function is to obtain a receptive field covering the whole input image by establishing relationships between all pixels in the high-level feature map, so that when segmenting the brain hematoma region the classification decision for a particular pixel of the input image may be influenced by any other pixel; the self-attention calculation formula is as follows:
Attention(Q, K, V) = softmax(QK^T / √d)·V, where d is the dimension of the query, key and value vectors.
4. The attention mechanism-based deep learning model of claim 2, wherein: in the channel attention module, the feature map X generated from the input CT image by the convolutional layers increases from a single channel to multiple channels; the information expressed by the feature map of each channel is different, and valid feature information may appear only in particular channels; the role of the channel attention module is to learn a weight from the relationships between the channels and then multiply it with the corresponding channel.
5. The attention mechanism-based deep learning model of claim 4, wherein: the calculation formula of the channel attention coefficient is as follows:
Attention_C(X) = σ(MLP(Avgpool(X)) + MLP(Maxpool(X)))
where σ denotes the sigmoid activation function, MLP denotes the multi-layer perceptron, and Avgpool and Maxpool denote global average pooling and global maximum pooling, respectively.
6. The attention-based deep learning model of any one of claims 2-5, wherein: the spatial attention module focuses on the information most meaningful for the current segmentation task, and the calculation formula of the spatial attention coefficient is as follows:
Attention_S(X) = σ(f_{7×7}([Avgpool(X); Maxpool(X)]))
where σ denotes the sigmoid activation function, f_{7×7} denotes a 7 × 7 convolution operation, and Avgpool and Maxpool denote global average pooling and global maximum pooling, respectively.
7. The attention mechanism-based deep learning model of any one of claims 2 to 5, wherein: the ratio of the training data set to the test data set is 9:1.
CN202210640395.6A 2022-06-08 2022-06-08 Deep learning model based on attention mechanism Pending CN115240049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210640395.6A CN115240049A (en) 2022-06-08 2022-06-08 Deep learning model based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210640395.6A CN115240049A (en) 2022-06-08 2022-06-08 Deep learning model based on attention mechanism

Publications (1)

Publication Number Publication Date
CN115240049A true CN115240049A (en) 2022-10-25

Family

ID=83668773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210640395.6A Pending CN115240049A (en) 2022-06-08 2022-06-08 Deep learning model based on attention mechanism

Country Status (1)

Country Link
CN (1) CN115240049A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424023A (en) * 2022-11-07 2022-12-02 北京精诊医疗科技有限公司 Self-attention mechanism module for enhancing small target segmentation performance
CN115424023B (en) * 2022-11-07 2023-04-18 北京精诊医疗科技有限公司 Self-attention method for enhancing small target segmentation performance
CN115578404A (en) * 2022-11-14 2023-01-06 南昌航空大学 Liver tumor image enhancement and segmentation method based on deep learning

Similar Documents

Publication Publication Date Title
CN110782462B (en) Semantic segmentation method based on double-flow feature fusion
CN108830813B (en) Knowledge distillation-based image super-resolution enhancement method
CN111126453B (en) Fine-grained image classification method and system based on attention mechanism and cut filling
CN110728682B (en) Semantic segmentation method based on residual pyramid pooling neural network
CN115240049A (en) Deep learning model based on attention mechanism
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN112258526B (en) CT kidney region cascade segmentation method based on dual attention mechanism
CN113743269B (en) Method for recognizing human body gesture of video in lightweight manner
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN117058160B (en) Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network
CN112819876A (en) Monocular vision depth estimation method based on deep learning
CN116469100A (en) Dual-band image semantic segmentation method based on Transformer
CN115393289A (en) Tumor image semi-supervised segmentation method based on integrated cross pseudo label
CN115330813A (en) Image processing method, device and equipment and readable storage medium
CN116468732A (en) Lung CT image segmentation method and imaging method based on deep learning
CN111242839B (en) Image scaling and clipping method based on scale level
CN117593199A (en) Double-flow remote sensing image fusion method based on Gaussian prior distribution self-attention
CN116385265B (en) Training method and device for image super-resolution network
CN117218508A (en) Ball screw fault diagnosis method based on channel parallel fusion multi-attention mechanism
CN116934820A (en) Cross-attention-based multi-size window Transformer network cloth image registration method and system
CN116311052A (en) Crowd counting method and device, electronic equipment and storage medium
CN114529564A (en) Lightweight infant brain tissue image segmentation method based on context information
CN113762241A (en) Training method of scene character recognition model, recognition method and device
CN114022363A (en) Image super-resolution reconstruction method, device and computer-readable storage medium
CN117456560B (en) Pedestrian re-identification method based on foreground perception dynamic part learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination