CN115424023B - Self-attention method for enhancing small target segmentation performance - Google Patents
- Publication number: CN115424023B (application CN202211381902.5A)
- Authority: CN (China)
- Prior art keywords: phase, characteristic, channel, characteristic diagram, expression
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a self-attention mechanism module for enhancing small target segmentation performance, comprising the following steps: a multi-phase feature map X is divided into C branches and input into the self-attention mechanism module; each branch has the same structure, and the feature map input to each branch is named the i-th phase feature map. Each i-th phase feature map passes through a single-phase attention enhancement unit to obtain the i-th phase feature expression, and the 1st through C-th feature expressions are concatenated to obtain the final feature expression of size H × W × D × C. Channels containing small-target feature information can thus be weighted, enhancing the small-target segmentation capability.
Description
Technical Field
The invention belongs to the technical field of deep learning for medical image classification, and relates to a self-attention method for enhancing small target segmentation performance.
Background
An attention mechanism helps a model assign different weights to each part of its input and extract the most critical information, so that the model can make more accurate judgments, while adding little overhead to the model's computation and storage. Because an attention mechanism is simple yet gives the model stronger discrimination capability, current deep learning network architectures generally include one.
An attention mechanism helps improve the feature expression of a small-target segmentation network: it attends to essential features and suppresses unnecessary ones, and integrating an attention mechanism into convolution blocks can effectively improve performance on computer vision tasks such as image classification, target segmentation and instance segmentation. In detection algorithms based on a spatial attention mechanism, however, the ratio of the small-object bounding-box area to the image area (RBI) is only between 0.08% and 0.58%; edge features are blurred or even lost, the resolution and the available feature information are limited, and successive multi-layer downsampling convolutions lose small-target feature information, so the performance of target segmentation algorithms based on spatial attention is limited.
To address these defects in the prior art, and in order to capture the position of a tiny object and perceive its global spatial structure without increasing computational complexity, the invention provides a self-attention method for enhancing small target segmentation performance.
Disclosure of Invention
The invention aims to provide a self-attention method that enhances small target segmentation performance.
The technical scheme adopted by the invention is as follows:
A self-attention method for enhancing small target segmentation performance, wherein the self-attention mechanism module comprises the following steps:
A multi-phase feature map X ∈ R^(C×H×W×D) is input into the self-attention mechanism module, where C, H, W and D respectively denote the number of channels (phases), spatial height, spatial width and spatial depth. On input, the multi-phase feature map X is divided into C branches, named the phase-1 branch, phase-2 branch, and so on, the last being the phase-C branch. Each branch has the same structure, and the feature map input to each branch is named the i-th phase feature map X_i ∈ R^(H×W×D), 2 ≤ i ≤ C. Each i-th phase feature map passes through a single-phase attention enhancement unit to obtain the i-th phase feature expression of size H × W × D; the phase-1 through phase-C feature expressions are spliced together by concat to obtain the final feature expression of size H × W × D × C.
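As a minimal sketch of the branch split and concatenation just described (shape handling only; the single-phase attention enhancement unit is stubbed out as identity, and all names are illustrative, not from the patent):

```python
import numpy as np

def single_phase_unit(x_i):
    # Placeholder for the single-phase attention enhancement unit
    # (detailed in steps A-E below); identity, for shape illustration only.
    return x_i

def multi_phase_attention(X):
    # X: multi-phase feature map of shape (C, H, W, D)
    C = X.shape[0]
    # split into C phase branches, each of shape (H, W, D)
    outputs = [single_phase_unit(X[i]) for i in range(C)]
    # splice the C phase feature expressions along a new last axis -> (H, W, D, C)
    return np.stack(outputs, axis=-1)

X = np.random.rand(4, 8, 8, 16)   # C=4 phases, H=W=8, D=16
Y = multi_phase_attention(X)
print(Y.shape)                    # (8, 8, 16, 4)
```

Each branch is processed independently with an identical structure, which is why the per-phase unit can be applied in a simple loop before the final concat.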
Further, the single-phase attention enhancement unit comprises the following steps:
A. The i-th phase feature map X_i ∈ R^(H×W×D) first passes through a convolution block; the result is divided into D branches along the channel dimension D of the phase, and each branch is input in turn to 2 standard convolution layers with kernel size k = 1, giving the i-th phase standard convolution feature map;
B. The i-th phase standard convolution feature map is input to a global average pooling (GAP) module to obtain the compressed feature map x ∈ R^(1×D) of the i-th phase, x = [x_1, x_2, ..., x_D], where each feature x_i gradually captures a specific feature response during training;
C. The compressed feature maps x_1 ~ x_(D-1) of the first D-1 branches have their weights normalized by dot product to obtain the channel autocorrelation weight matrix X_T. The normalization uses a sigmoid activation function, and the process can be expressed as equation (2):
[a, b, ..., d_(D-1)] = σ(W_2 δ(W_1 [x_1, x_2, ..., x_(D-1)]))   (2)
where a, b, ..., d_(D-1) denote the normalized weights of x_1, x_2, ..., x_(D-1) respectively, σ is the sigmoid activation function, δ denotes the ReLU activation function, and W_1 and W_2 are two one-dimensional convolution layers;
D. The channel autocorrelation weight matrix X_T and x_D are combined by dot product to obtain the channel adaptive weight X_s:
X_s = X_T · x_D
E. To capture the long-distance dependence between channels and obtain an effective small-target semantic feature representation, the channel adaptive weight X_s is multiplied with the given i-th phase feature map X_i ∈ R^(H×W×D):
X_D = X_s · X_i
where X_D, of size H × W × D, is the feature expression of the i-th phase.
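Steps B through E can be sketched in NumPy under assumptions the patent leaves implicit: the two one-dimensional convolution layers W_1 and W_2 are approximated as dense matrices, X_T and X_s are treated as length D-1 weight vectors, and the D-th channel is assumed to pass through unscaled. This is an interpretive sketch, not the patented implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_phase_unit(X_i, W1, W2):
    """Single-phase attention enhancement, steps B-E.
    X_i: single-phase feature map of shape (H, W, D).
    W1, W2: dense stand-ins for the two 1-D convolution layers of eq. (2)."""
    # B. global average pooling over the spatial dims -> compressed map x in R^(1xD)
    x = X_i.mean(axis=(0, 1))                          # shape (D,)
    # C. eq. (2): sigma(W2 . ReLU(W1 . [x_1, ..., x_{D-1}]))
    X_T = sigmoid(W2 @ np.maximum(W1 @ x[:-1], 0.0))   # shape (D-1,)
    # D. channel adaptive weight X_s = X_T . x_D (scaling by the last entry)
    X_s = X_T * x[-1]                                  # shape (D-1,)
    # E. channel-wise reweighting of X_i; the weight of the D-th channel is
    #    left implicit by the patent, so this sketch passes it through unscaled
    w = np.concatenate([X_s, [1.0]])                   # shape (D,)
    return X_i * w                                     # broadcasts over (H, W, D)

rng = np.random.default_rng(0)
D = 6
X_i = rng.random((8, 8, D))
W1 = 0.1 * rng.standard_normal((D - 1, D - 1))
W2 = 0.1 * rng.standard_normal((D - 1, D - 1))
out = single_phase_unit(X_i, W1, W2)
print(out.shape)   # (8, 8, 6)
```

The sigmoid keeps every channel weight in (0, 1), so channels that carry small-target responses are emphasized relative to the rest without changing the feature map's shape.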
Drawings
FIG. 1 is a schematic diagram of the overall structure of a self-attention mechanism module according to the present invention;
FIG. 2 is a schematic diagram of a single-phase attention enhancement unit according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
A self-attention method for enhancing small target segmentation performance: as shown in fig. 1, the self-attention mechanism module comprises the following steps:
A multi-phase feature map X ∈ R^(C×H×W×D) is input into the self-attention mechanism module, where C, H, W and D respectively denote the number of channels (phases), spatial height, spatial width and spatial depth. On input, the multi-phase feature map X is divided into C branches, named the phase-1 branch, phase-2 branch, and so on, the last being the phase-C branch. Each branch has the same structure, and the feature map input to each branch is named the i-th phase feature map X_i ∈ R^(H×W×D), 2 ≤ i ≤ C. Each i-th phase feature map passes through a single-phase attention enhancement unit to obtain the i-th phase feature expression of size H × W × D; the phase-1 through phase-C feature expressions are spliced by concat to obtain the final feature expression of size H × W × D × C.
Further, as shown in fig. 2, the single-phase attention enhancement unit comprises the following steps:
A. The channel weight calculation formula (1) of the single-phase attention enhancement unit is:
ω = σ(F(X_i))   (1)
where ω represents the channel weight, σ is the sigmoid activation function, and F is the convolution operation.
As can be seen from fig. 2, for each single phase, i.e. for the i-th phase, the spatial depth D of the multi-phase feature map serves as the channel dimension of that phase. The i-th phase feature map X_i ∈ R^(H×W×D) first passes through a convolution block; the result is divided into D branches along the channel dimension D of the phase, and each branch is input in turn to 2 standard convolution layers with kernel size k = 1, giving the i-th phase standard convolution feature map;
B. The i-th phase standard convolution feature map is input to a global average pooling (GAP) module to obtain the compressed feature map x ∈ R^(1×D) of the i-th phase, x = [x_1, x_2, ..., x_D], where each feature x_i gradually captures a specific feature response during training;
C. The compressed feature maps x_1 ~ x_(D-1) of the first D-1 branches have their weights normalized by dot product to obtain the channel autocorrelation weight matrix X_T. The normalization uses a sigmoid activation function, and the process can be expressed as equation (2):
[a, b, ..., d_(D-1)] = σ(W_2 δ(W_1 [x_1, x_2, ..., x_(D-1)]))   (2)
where a, b, ..., d_(D-1) denote the normalized weights of x_1, x_2, ..., x_(D-1) respectively, σ is the sigmoid activation function, δ denotes the ReLU activation function, and W_1 and W_2 are two one-dimensional convolution layers;
D. The channel autocorrelation weight matrix X_T and x_D are combined by dot product to obtain the channel adaptive weight X_s:
X_s = X_T · x_D
E. To capture the long-distance dependence between channels and obtain an effective small-target semantic feature representation, the channel adaptive weight X_s is multiplied with the given i-th phase feature map X_i ∈ R^(H×W×D):
X_D = X_s · X_i
where X_D, of size H × W × D, is the feature expression of the i-th phase.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (1)
1. A self-attention method for enhancing small target segmentation performance, characterized by comprising a self-attention mechanism module, the method specifically comprising the following steps:
A multi-phase feature map X ∈ R^(C×H×W×D) is input into the self-attention mechanism module, where C, H, W and D respectively denote the number of channels (phases), spatial height, spatial width and spatial depth; on input, the multi-phase feature map X is divided into C channels, named the phase-1 channel, phase-2 channel, and so on, the last being the phase-C channel; each channel has the same structure, and the feature map input to each channel is named the i-th phase feature map X_i ∈ R^(H×W×D), 2 ≤ i ≤ C; each i-th phase feature map passes through the single-phase attention enhancement unit to obtain the i-th phase feature expression of size H × W × D, and the phase-1 through phase-C feature expressions are spliced by concat to obtain the final feature expression of size H × W × D × C;
the single-phase attention enhancement unit comprises the following steps:
S1. For each single phase, the i-th phase feature map X_i ∈ R^(H×W×D) first passes through a convolution block; the result is divided into D branches along the spatial depth D of the phase, and each branch is input in turn to 2 standard convolution layers with kernel size k = 1, giving the i-th phase standard convolution feature map;
S2. The i-th phase standard convolution feature map is input to a global average pooling (GAP) module to obtain the compressed feature map x ∈ R^(1×D) of the i-th phase, x = [x_1, x_2, ..., x_D], where each feature x_i gradually captures a specific feature response during training;
S3. The compressed feature maps x_1 ~ x_(D-1) of the first D-1 branches have their weights normalized by a dot product operation to obtain the channel autocorrelation weight matrix X_T; the normalization uses a sigmoid activation function, expressed as [a, b, ..., d_(D-1)] = σ(W_2 δ(W_1 [x_1, x_2, ..., x_(D-1)])), where a, b, ..., d_(D-1) denote the normalized weights of the compressed feature maps x_1, x_2, ..., x_(D-1) respectively, σ is the sigmoid activation function, δ denotes the ReLU activation function, and W_1 and W_2 are two one-dimensional convolution layers;
S4. The channel autocorrelation weight matrix X_T and x_D are combined by a dot product operation to obtain the channel adaptive weight X_s, i.e. X_s = X_T · x_D;
S5. To capture the long-distance dependence between channels and obtain the small-target semantic feature representation, the channel adaptive weight X_s is multiplied with the i-th phase feature map X_i ∈ R^(H×W×D):
X_D = X_s · X_i
where X_D, of size H × W × D, is the feature expression of the i-th phase.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211381902.5A CN115424023B (en) | 2022-11-07 | 2022-11-07 | Self-attention method for enhancing small target segmentation performance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115424023A CN115424023A (en) | 2022-12-02 |
CN115424023B true CN115424023B (en) | 2023-04-18 |
Family
ID=84207858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211381902.5A Active CN115424023B (en) | 2022-11-07 | 2022-11-07 | Self-attention method for enhancing small target segmentation performance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115424023B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743505A (en) * | 2021-09-06 | 2021-12-03 | 辽宁工程技术大学 | Improved SSD target detection method based on self-attention and feature fusion |
CN114119993A (en) * | 2021-10-30 | 2022-03-01 | 南京理工大学 | Salient object detection method based on self-attention mechanism |
CN115240049A (en) * | 2022-06-08 | 2022-10-25 | 江苏师范大学 | Deep learning model based on attention mechanism |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950467B (en) * | 2020-08-14 | 2021-06-25 | 清华大学 | Fusion network lane line detection method based on attention mechanism and terminal equipment |
CN113283435B (en) * | 2021-05-14 | 2023-08-22 | 陕西科技大学 | Remote sensing image semantic segmentation method based on multi-scale attention fusion |
CN114897780B (en) * | 2022-04-12 | 2023-04-07 | 南通大学 | MIP sequence-based mesenteric artery blood vessel reconstruction method |
CN114881962B (en) * | 2022-04-28 | 2024-04-19 | 桂林理工大学 | Retina image blood vessel segmentation method based on improved U-Net network |
CN114943876A (en) * | 2022-06-20 | 2022-08-26 | 南京信息工程大学 | Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639692B (en) | Shadow detection method based on attention mechanism | |
Xu et al. | Inter/intra-category discriminative features for aerial image classification: A quality-aware selection model | |
CN112926396B (en) | Action identification method based on double-current convolution attention | |
CN110188705B (en) | Remote traffic sign detection and identification method suitable for vehicle-mounted system | |
CN110569738B (en) | Natural scene text detection method, equipment and medium based on densely connected network | |
CN111160249A (en) | Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion | |
CN113642634A (en) | Shadow detection method based on mixed attention | |
CN109522831B (en) | Real-time vehicle detection method based on micro-convolution neural network | |
CN111899203A (en) | Real image generation method based on label graph under unsupervised training and storage medium | |
CN115565043A (en) | Method for detecting target by combining multiple characteristic features and target prediction method | |
CN116091979A (en) | Target tracking method based on feature fusion and channel attention | |
CN115222998A (en) | Image classification method | |
CN115880495A (en) | Ship image target detection method and system under complex environment | |
CN118212417A (en) | Medical image segmentation model based on light attention module and training method of model | |
CN114066844A (en) | Pneumonia X-ray image analysis model and method based on attention superposition and feature fusion | |
CN115424023B (en) | Self-attention method for enhancing small target segmentation performance | |
CN114723733B (en) | Class activation mapping method and device based on axiom explanation | |
CN114863132A (en) | Method, system, equipment and storage medium for modeling and capturing image spatial domain information | |
CN110689071B (en) | Target detection system and method based on structured high-order features | |
CN113780305A (en) | Saliency target detection method based on interaction of two clues | |
CN117808784B (en) | Flexible film fold prediction method, system, electronic equipment and medium | |
CN114240991B (en) | Instance segmentation method of RGB image | |
CN118212496B (en) | Image fusion method based on noise reduction and complementary information enhancement | |
CN116503603B (en) | Training method of inter-class shielding target detection network model based on weak supervision semantic segmentation and feature compensation | |
CN114694119B (en) | Traffic sign detection method and related device based on heavy parameterization and feature weighting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||