CN115424023B - Self-attention method for enhancing small target segmentation performance - Google Patents
- Publication number: CN115424023B (application CN202211381902.5A)
- Authority: CN (China)
- Prior art keywords: phase, characteristic, channel, characteristic diagram, expression
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a self-attention mechanism module for enhancing small target segmentation performance, comprising the following steps: a multi-phase feature map X is divided into C branches and input into the self-attention mechanism module; each branch has the same structure, and the feature map input to each branch is named the i-th phase feature map. Each i-th phase feature map passes through a single-phase attention enhancement unit to obtain the i-th phase feature expression, and the 1st through C-th feature expressions are concatenated to obtain the final feature expression of size H × W × D × C. Channels containing small-target feature information can thus be weighted, enhancing the small-target segmentation capability.
Description
Technical Field
The invention belongs to the technical field of deep learning for medical image classification, and relates to a self-attention method for enhancing small target segmentation performance.
Background
An attention mechanism helps a model assign different weights to each part of its input and extract the most critical information, so that the model can make more accurate judgments, while adding little overhead to the model's computation and storage. Because an attention mechanism is simple yet gives the model stronger discrimination capability, current deep learning network architectures generally include one.
An attention mechanism helps improve the feature expression of a small-target segmentation network: it attends to essential features and suppresses unnecessary ones, and integrating an attention mechanism into convolution blocks can effectively improve performance on computer vision tasks such as image classification, target segmentation and instance segmentation. In detection algorithms based on a spatial attention mechanism, however, the ratio of the small-object bounding-box area to the image area (RBI) is only between 0.08% and 0.58%; edge features are blurred or even lost, the resolution and the available feature information are limited, and successive multi-layer downsampling convolutions lose small-target feature information, so the performance of target segmentation algorithms based on spatial attention is limited.
To address these defects in the prior art, and in order to capture the position of a tiny object and perceive its global spatial structure without increasing computational complexity, the invention provides a self-attention method for enhancing small target segmentation performance.
Disclosure of Invention
The invention aims to provide a self-attention method that enhances small target segmentation performance.
The technical scheme adopted by the invention is as follows:
A self-attention method for enhancing small target segmentation performance, wherein the self-attention mechanism module comprises the following steps:
A multi-phase feature map X ∈ R^(C×H×W×D) is input into the self-attention mechanism module, where C, H, W and D respectively denote the number of channels (phases), spatial height, spatial width and spatial depth. On input, the multi-phase feature map X is divided into C branches, named the phase-1 branch, phase-2 branch, and so on, the last being the phase-C branch. Each branch has the same structure, and the feature map input to each branch is named the i-th phase feature map X_i ∈ R^(H×W×D), 2 ≤ i ≤ C. Each i-th phase feature map passes through a single-phase attention enhancement unit to obtain the i-th phase feature expression of size H × W × D; the phase-1 through phase-C feature expressions are spliced together by concat to obtain the final feature expression of size H × W × D × C.
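As a minimal sketch of the branch split and concatenation just described (shape handling only; the single-phase attention enhancement unit is stubbed out as identity, and all names are illustrative, not from the patent):

```python
import numpy as np

def single_phase_unit(x_i):
    # Placeholder for the single-phase attention enhancement unit
    # (detailed in steps A-E below); identity, for shape illustration only.
    return x_i

def multi_phase_attention(X):
    # X: multi-phase feature map of shape (C, H, W, D)
    C = X.shape[0]
    # split into C phase branches, each of shape (H, W, D)
    outputs = [single_phase_unit(X[i]) for i in range(C)]
    # splice the C phase feature expressions along a new last axis -> (H, W, D, C)
    return np.stack(outputs, axis=-1)

X = np.random.rand(4, 8, 8, 16)   # C=4 phases, H=W=8, D=16
Y = multi_phase_attention(X)
print(Y.shape)                    # (8, 8, 16, 4)
```

Each branch is processed independently with an identical structure, which is why the per-phase unit can be applied in a simple loop before the final concat.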
Further, the single-phase attention enhancement unit comprises the following steps:
A. The i-th phase feature map X_i ∈ R^(H×W×D) first passes through a convolution block; the result is divided into D branches along the channel dimension D of the phase, and each branch is input in turn to 2 standard convolution layers with kernel size k = 1, giving the i-th phase standard convolution feature map;
B. The i-th phase standard convolution feature map is input to a global average pooling (GAP) module to obtain the compressed feature map x ∈ R^(1×D) of the i-th phase, x = [x_1, x_2, ..., x_D], where each feature x_i gradually captures a specific feature response during training;
C. The compressed feature maps x_1 ~ x_(D-1) of the first D-1 branches have their weights normalized by dot product to obtain the channel autocorrelation weight matrix X_T. The normalization uses a sigmoid activation function, and the process can be expressed as equation (2):
[a, b, ..., d_(D-1)] = σ(W_2 δ(W_1 [x_1, x_2, ..., x_(D-1)]))   (2)
where a, b, ..., d_(D-1) denote the normalized weights of x_1, x_2, ..., x_(D-1) respectively, σ is the sigmoid activation function, δ denotes the ReLU activation function, and W_1 and W_2 are two one-dimensional convolution layers;
D. The channel autocorrelation weight matrix X_T and x_D are combined by dot product to obtain the channel adaptive weight X_s:
X_s = X_T · x_D
E. To capture the long-distance dependence between channels and obtain an effective small-target semantic feature representation, the channel adaptive weight X_s is multiplied with the given i-th phase feature map X_i ∈ R^(H×W×D):
X_D = X_s · X_i
where X_D, of size H × W × D, is the feature expression of the i-th phase.
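Steps B through E can be sketched in NumPy under assumptions the patent leaves implicit: the two one-dimensional convolution layers W_1 and W_2 are approximated as dense matrices, X_T and X_s are treated as length D-1 weight vectors, and the D-th channel is assumed to pass through unscaled. This is an interpretive sketch, not the patented implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_phase_unit(X_i, W1, W2):
    """Single-phase attention enhancement, steps B-E.
    X_i: single-phase feature map of shape (H, W, D).
    W1, W2: dense stand-ins for the two 1-D convolution layers of eq. (2)."""
    # B. global average pooling over the spatial dims -> compressed map x in R^(1xD)
    x = X_i.mean(axis=(0, 1))                          # shape (D,)
    # C. eq. (2): sigma(W2 . ReLU(W1 . [x_1, ..., x_{D-1}]))
    X_T = sigmoid(W2 @ np.maximum(W1 @ x[:-1], 0.0))   # shape (D-1,)
    # D. channel adaptive weight X_s = X_T . x_D (scaling by the last entry)
    X_s = X_T * x[-1]                                  # shape (D-1,)
    # E. channel-wise reweighting of X_i; the weight of the D-th channel is
    #    left implicit by the patent, so this sketch passes it through unscaled
    w = np.concatenate([X_s, [1.0]])                   # shape (D,)
    return X_i * w                                     # broadcasts over (H, W, D)

rng = np.random.default_rng(0)
D = 6
X_i = rng.random((8, 8, D))
W1 = 0.1 * rng.standard_normal((D - 1, D - 1))
W2 = 0.1 * rng.standard_normal((D - 1, D - 1))
out = single_phase_unit(X_i, W1, W2)
print(out.shape)   # (8, 8, 6)
```

The sigmoid keeps every channel weight in (0, 1), so channels that carry small-target responses are emphasized relative to the rest without changing the feature map's shape.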
Drawings
FIG. 1 is a schematic diagram of the overall structure of a self-attention mechanism module according to the present invention;
FIG. 2 is a schematic diagram of a single-phase attention enhancement unit according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
A self-attention method for enhancing small target segmentation performance: as shown in fig. 1, the self-attention mechanism module comprises the following steps:
A multi-phase feature map X ∈ R^(C×H×W×D) is input into the self-attention mechanism module, where C, H, W and D respectively denote the number of channels (phases), spatial height, spatial width and spatial depth. On input, the multi-phase feature map X is divided into C branches, named the phase-1 branch, phase-2 branch, and so on, the last being the phase-C branch. Each branch has the same structure, and the feature map input to each branch is named the i-th phase feature map X_i ∈ R^(H×W×D), 2 ≤ i ≤ C. Each i-th phase feature map passes through a single-phase attention enhancement unit to obtain the i-th phase feature expression of size H × W × D; the phase-1 through phase-C feature expressions are spliced by concat to obtain the final feature expression of size H × W × D × C.
Further, as shown in fig. 2, the single-phase attention enhancement unit comprises the following steps:
A. The channel weight calculation formula (1) of the single-phase attention enhancement unit is:
ω = σ(F(X_i))   (1)
where ω represents the channel weight, σ is the sigmoid activation function, and F is the convolution operation.
As can be seen from fig. 2, for each single phase, i.e. for the i-th phase, the spatial depth D of the multi-phase feature map serves as the channel dimension of that phase. The i-th phase feature map X_i ∈ R^(H×W×D) first passes through a convolution block; the result is divided into D branches along the channel dimension D of the phase, and each branch is input in turn to 2 standard convolution layers with kernel size k = 1, giving the i-th phase standard convolution feature map;
B. The i-th phase standard convolution feature map is input to a global average pooling (GAP) module to obtain the compressed feature map x ∈ R^(1×D) of the i-th phase, x = [x_1, x_2, ..., x_D], where each feature x_i gradually captures a specific feature response during training;
C. The compressed feature maps x_1 ~ x_(D-1) of the first D-1 branches have their weights normalized by dot product to obtain the channel autocorrelation weight matrix X_T. The normalization uses a sigmoid activation function, and the process can be expressed as equation (2):
[a, b, ..., d_(D-1)] = σ(W_2 δ(W_1 [x_1, x_2, ..., x_(D-1)]))   (2)
where a, b, ..., d_(D-1) denote the normalized weights of x_1, x_2, ..., x_(D-1) respectively, σ is the sigmoid activation function, δ denotes the ReLU activation function, and W_1 and W_2 are two one-dimensional convolution layers;
D. The channel autocorrelation weight matrix X_T and x_D are combined by dot product to obtain the channel adaptive weight X_s:
X_s = X_T · x_D
E. To capture the long-distance dependence between channels and obtain an effective small-target semantic feature representation, the channel adaptive weight X_s is multiplied with the given i-th phase feature map X_i ∈ R^(H×W×D):
X_D = X_s · X_i
where X_D, of size H × W × D, is the feature expression of the i-th phase.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (1)
1. A self-attention method for enhancing small target segmentation performance, characterized by comprising a self-attention mechanism module, the method specifically comprising the following steps:
A multi-phase feature map X ∈ R^(C×H×W×D) is input into the self-attention mechanism module, where C, H, W and D respectively denote the number of channels (phases), spatial height, spatial width and spatial depth; on input, the multi-phase feature map X is divided into C channels, named the phase-1 channel, phase-2 channel, and so on, the last being the phase-C channel; each channel has the same structure, and the feature map input to each channel is named the i-th phase feature map X_i ∈ R^(H×W×D), 2 ≤ i ≤ C; each i-th phase feature map passes through the single-phase attention enhancement unit to obtain the i-th phase feature expression of size H × W × D, and the phase-1 through phase-C feature expressions are spliced by concat to obtain the final feature expression of size H × W × D × C;
the single-phase attention enhancement unit comprises the following steps:
S1. For each single phase, the i-th phase feature map X_i ∈ R^(H×W×D) first passes through a convolution block; the result is divided into D branches along the spatial depth D of the phase, and each branch is input in turn to 2 standard convolution layers with kernel size k = 1, giving the i-th phase standard convolution feature map;
S2. The i-th phase standard convolution feature map is input to a global average pooling (GAP) module to obtain the compressed feature map x ∈ R^(1×D) of the i-th phase, x = [x_1, x_2, ..., x_D], where each feature x_i gradually captures a specific feature response during training;
S3. The compressed feature maps x_1 ~ x_(D-1) of the first D-1 branches have their weights normalized by a dot product operation to obtain the channel autocorrelation weight matrix X_T; the normalization uses a sigmoid activation function, expressed as [a, b, ..., d_(D-1)] = σ(W_2 δ(W_1 [x_1, x_2, ..., x_(D-1)])), where a, b, ..., d_(D-1) denote the normalized weights of the compressed feature maps x_1, x_2, ..., x_(D-1) respectively, σ is the sigmoid activation function, δ denotes the ReLU activation function, and W_1 and W_2 are two one-dimensional convolution layers;
S4. The channel autocorrelation weight matrix X_T and x_D are combined by a dot product operation to obtain the channel adaptive weight X_s, i.e. X_s = X_T · x_D;
S5. To capture the long-distance dependence between channels and obtain the small-target semantic feature representation, the channel adaptive weight X_s is multiplied with the i-th phase feature map X_i ∈ R^(H×W×D):
X_D = X_s · X_i
where X_D, of size H × W × D, is the feature expression of the i-th phase.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211381902.5A CN115424023B (en) | 2022-11-07 | 2022-11-07 | Self-attention method for enhancing small target segmentation performance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115424023A CN115424023A (en) | 2022-12-02 |
CN115424023B true CN115424023B (en) | 2023-04-18 |
Family
ID=84207858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211381902.5A Active CN115424023B (en) | 2022-11-07 | 2022-11-07 | Self-attention method for enhancing small target segmentation performance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115424023B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743505A (en) * | 2021-09-06 | 2021-12-03 | 辽宁工程技术大学 | Improved SSD target detection method based on self-attention and feature fusion |
CN114119993A (en) * | 2021-10-30 | 2022-03-01 | 南京理工大学 | Salient object detection method based on self-attention mechanism |
CN115240049A (en) * | 2022-06-08 | 2022-10-25 | 江苏师范大学 | Deep learning model based on attention mechanism |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950467B (en) * | 2020-08-14 | 2021-06-25 | 清华大学 | Fusion network lane line detection method based on attention mechanism and terminal equipment |
CN113283435B (en) * | 2021-05-14 | 2023-08-22 | 陕西科技大学 | Remote sensing image semantic segmentation method based on multi-scale attention fusion |
CN114897780B (en) * | 2022-04-12 | 2023-04-07 | 南通大学 | MIP sequence-based mesenteric artery blood vessel reconstruction method |
CN114881962B (en) * | 2022-04-28 | 2024-04-19 | 桂林理工大学 | Retina image blood vessel segmentation method based on improved U-Net network |
CN114943876A (en) * | 2022-06-20 | 2022-08-26 | 南京信息工程大学 | Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639692B (en) | Shadow detection method based on attention mechanism | |
Xu et al. | Inter/intra-category discriminative features for aerial image classification: A quality-aware selection model | |
CN112926396B (en) | Action identification method based on double-current convolution attention | |
CN110188705B (en) | Remote traffic sign detection and identification method suitable for vehicle-mounted system | |
CN110569738B (en) | Natural scene text detection method, equipment and medium based on densely connected network | |
CN111160249A (en) | Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion | |
CN113642634A (en) | Shadow detection method based on mixed attention | |
CN109522831B (en) | Real-time vehicle detection method based on micro-convolution neural network | |
CN111899203A (en) | Real image generation method based on label graph under unsupervised training and storage medium | |
CN115565043A (en) | Method for detecting target by combining multiple characteristic features and target prediction method | |
CN116091979A (en) | Target tracking method based on feature fusion and channel attention | |
CN115222998A (en) | Image classification method | |
CN115880495A (en) | Ship image target detection method and system under complex environment | |
CN118212417A (en) | Medical image segmentation model based on light attention module and training method of model | |
CN114066844A (en) | Pneumonia X-ray image analysis model and method based on attention superposition and feature fusion | |
CN115424023B (en) | Self-attention method for enhancing small target segmentation performance | |
CN114723733B (en) | Class activation mapping method and device based on axiom explanation | |
CN114863132A (en) | Method, system, equipment and storage medium for modeling and capturing image spatial domain information | |
CN110689071B (en) | Target detection system and method based on structured high-order features | |
CN113780305A (en) | Saliency target detection method based on interaction of two clues | |
CN117808784B (en) | Flexible film fold prediction method, system, electronic equipment and medium | |
CN114240991B (en) | Instance segmentation method of RGB image | |
CN118212496B (en) | Image fusion method based on noise reduction and complementary information enhancement | |
CN116503603B (en) | Training method of inter-class shielding target detection network model based on weak supervision semantic segmentation and feature compensation | |
CN114694119B (en) | Traffic sign detection method and related device based on heavy parameterization and feature weighting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||