CN116129207B - Image data processing method with multi-scale channel attention - Google Patents

Image data processing method with multi-scale channel attention

Info

Publication number
CN116129207B
CN116129207B (application CN202310414590.1A)
Authority
CN
China
Prior art keywords
data
channels
global
input data
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310414590.1A
Other languages
Chinese (zh)
Other versions
CN116129207A (en)
Inventor
刘刚
王冰冰
周杰
王磊
史魁杰
曾辉
张金烁
胡莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202310414590.1A priority Critical patent/CN116129207B/en
Publication of CN116129207A publication Critical patent/CN116129207A/en
Application granted granted Critical
Publication of CN116129207B publication Critical patent/CN116129207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The invention discloses an image data processing method with multi-scale channel attention. Global features and local features are extracted from the input data, so that the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the problems of target aggregation and target occlusion in complex scenes.

Description

Image data processing method with multi-scale channel attention
Technical Field
The invention relates to the field of computer vision, and in particular to an image data processing method with multi-scale channel attention.
Background
The channel attention mechanism can remarkably improve the expressive power and generalization ability of a model, has low computational cost, and is easy to integrate into existing convolutional neural network structures. Because of these advantages, channel attention has been widely used in deep learning applications such as image classification, object detection and semantic segmentation.
The essence of the channel attention mechanism is to compute a weighted average of the features of different channels, thereby obtaining richer, more stable and reliable feature representations.
Existing channel attention mechanisms, including SE, ECA and CA, attend only to the detail information in some local feature or to the semantic information in the global feature, but not to both, resulting in insufficient feature expression along the channel dimension.
Disclosure of Invention
The invention aims to provide an image data processing method with multi-scale channel attention.
The problem the invention aims to solve is the following:
to provide an image data processing method with multi-scale channel attention in which global features and local features are extracted from the input data, so that the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the problems of target aggregation and target occlusion in complex scenes.
The image data processing method with multi-scale channel attention adopts the following technical scheme.
An image data processing method with multi-scale channel attention includes the following steps:
S21: the input data (an original image or a feature map) are digitized: the extracted features are converted into numerical values stored in tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: a method combining a global channel attention mechanism and a local channel attention mechanism is used to perform feature extraction and feature fusion on the input data;
S22.1: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer whose kernel size is selected adaptively, and a Sigmoid activation function. By applying global average pooling and an element-wise transformation to the feature map, the global channel attention adaptively adjusts the weights of the different channels, so that the model attends to the more important features and the classification performance and robustness of the model are improved. The global average pooling is computed as:

y = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} X_{i,j}

where y denotes the global average pooling result, X is the input image of size W × H × C (W, H and C are the width, height and number of channels of the input image), and i and j index the pixel positions along the width and height;
the adaptive selection is computed as:

k = ψ(C) = |(log₂(C) + b) / γ|_odd

where k is the kernel size of the one-dimensional convolution, C is the number of channels, |t|_odd denotes the odd number nearest to t (i.e. k can only be odd), and γ and b control the ratio between C and k; in the present invention γ and b are set to 2 and 1, respectively;
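As an illustrative sketch of this adaptive selection (a hedged reading of the formula above, with γ = 2 and b = 1 as stated; the rounding-to-odd convention is an assumption), the kernel size can be computed as:

```python
import math

def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptively choose an odd 1-D convolution kernel size k from the
    channel count C, as k = |(log2(C) + b) / gamma| rounded to an odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1  # force k to be odd

# e.g. 64 channels -> k = 3, 256 channels -> k = 5
```

This way the one-dimensional convolution covers more neighboring channels as the channel count grows, without a fixed hyperparameter per layer.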
the Sigmoid activation function is also called an S-shaped growth curve, and the calculation formula is:
wherein x is input;
S22.2: the local channel attention mechanism adopts a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP architecture consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation; such a convolution changes only the number of channels of the input data. The number of output channels of the first convolution operation is one sixteenth of the number of input channels, and the number of output channels of the second convolution operation matches the number of channels at the embedding position. The local channel attention helps the model better capture local information in the input features. The ReLU function retains only positive elements and discards all negative elements by setting their activation values to 0;
S22.3: the outputs of the global attention and the local attention are fused; the fused data are activated with a Sigmoid function to obtain the final attention weights, and the activated data are then multiplied pixel by pixel with the input data;
S22.4: the Sigmoid function compresses the existing data according to their range, mapping any input to a value in the interval (0, 1), which ensures normalization;
S22.5: the input data are multiplied pixel by pixel with the activated data, applying position-dependent weighting so that the network focuses more on the global features and the local features.
Further, the two-dimensional convolutions in step S22.2 change only the number of channels of the input data, and the MLP architecture as a whole estimates the attention among channels by first shrinking and then expanding the channels of the input data: with shrinkage coefficient r, the feature scale after shrinkage is H × W × C/r; a ReLU activation function is applied, and the feature scale after expansion is H × W × C.
Further, in steps S22.1 and S22.2, the global features and the local features of the input data are extracted through the global channel attention mechanism and the local channel attention mechanism, respectively, and in step S22.3 the outputs of the global attention and the local attention are fused, that is, feature fusion is performed on the different features, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
The invention has the following beneficial effects: the multi-scale channel attention image data processing method extracts global features and local features from the data and fuses the different features, so that the convolutional neural network attends to both the overall information and the local detail features of the input data. This alleviates the problems of target aggregation and target occlusion in complex scenes, and thereby the low detection accuracy and high miss rate caused by the heavy aggregation and severe occlusion that characterize small-target detection in such scenes.
Drawings
FIG. 1 is a schematic diagram of the multi-scale channel attention image data processing method of the present invention;
FIG. 2 is a diagram of the ReLU rectified linear function in the present invention;
FIG. 3 is a schematic diagram of Sigmoid data normalization in the present invention.
Description of the embodiments
The invention will be further clarified and fully described below with reference to the accompanying drawings; the scope of protection of the invention is not limited thereto.
Examples
As shown in figs. 1 to 3, an image data processing method with multi-scale channel attention includes the following steps:
S21: the input data (an original image or a feature map) are digitized: the extracted features are converted into numerical values stored in tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: a method combining a global channel attention mechanism and a local channel attention mechanism, shown in fig. 1, is used to perform feature extraction and feature fusion on the input data;
S22.1: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer whose kernel size is selected adaptively, and a Sigmoid activation function, as shown in the left column of fig. 1. By applying global average pooling and an element-wise transformation to the feature map, the global channel attention adaptively adjusts the weights of the different channels, so that the model attends to the more important features and the classification performance and robustness of the model are improved. The global average pooling is computed as:

y = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} X_{i,j}

where y denotes the global average pooling result, X is the input image of size W × H × C (W, H and C are the width, height and number of channels of the input image), and i and j index the pixel positions along the width and height;
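A minimal NumPy sketch of this pooling step (a channel-first (C, H, W) layout is assumed for illustration):

```python
import numpy as np

def global_avg_pool(x: np.ndarray) -> np.ndarray:
    """Global average pooling: x has shape (C, H, W); the result y has
    shape (C,), the mean of each channel over all spatial positions."""
    return x.mean(axis=(1, 2))

x = np.arange(2 * 2 * 2, dtype=float).reshape(2, 2, 2)
y = global_avg_pool(x)  # channel 0 mean = 1.5, channel 1 mean = 5.5
```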
the adaptive selection is computed as:

k = ψ(C) = |(log₂(C) + b) / γ|_odd

where k is the kernel size of the one-dimensional convolution, C is the number of channels, |t|_odd denotes the odd number nearest to t (i.e. k can only be odd), and γ and b control the ratio between C and k; in this embodiment γ and b are set to 2 and 1, respectively;
the Sigmoid activation function, also called the S-shaped growth curve, is shown in fig. 3 and is computed as:

S(x) = 1 / (1 + e^(−x))

where x is the input;
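Putting the three ingredients together, the global branch can be sketched as follows. This is an illustrative sketch only: the learned one-dimensional convolution is simulated with a fixed kernel supplied by the caller, and edge padding is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_channel_attention(x, kernel):
    """x: (C, H, W) feature map; kernel: odd-length 1-D conv weights.
    Returns (C,) per-channel attention weights in (0, 1)."""
    y = x.mean(axis=(1, 2))                  # global average pooling
    k = len(kernel)
    yp = np.pad(y, k // 2, mode="edge")      # pad so output length stays C
    conv = np.array([np.dot(yp[i:i + k], kernel) for i in range(len(y))])
    return sigmoid(conv)                     # squash weights into (0, 1)

w = global_channel_attention(np.zeros((4, 8, 8)), np.array([0.0, 1.0, 0.0]))
# zero input -> pooled zeros -> sigmoid(0) = 0.5 for every channel
```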
S22.2: the local channel attention mechanism adopts a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP architecture consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation. The ReLU activation sets the output of some neurons to 0, which reduces the interdependence of parameters and alleviates overfitting. Such a convolution changes only the number of channels of the input data: the number of output channels of the first convolution operation is one sixteenth of the number of input channels, and the number of output channels of the second convolution operation matches the number of channels at the embedding position. The local channel attention helps the model better capture local information in the input features, as shown in the right column of fig. 1; the ReLU function retains only positive elements and discards all negative elements by setting their activation values to 0, as shown in fig. 2;
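The shrink-ReLU-expand channel MLP can be sketched as below. A 1×1 convolution over a (C, H, W) map is just a matrix multiply over the channel axis at each pixel, which is how it is written here; the random weights are placeholders for learned parameters.

```python
import numpy as np

def local_channel_attention(x, w1, w2):
    """Channel MLP via 1x1 convolutions.
    x: (C, H, W); w1: (C//r, C) shrinks channels; w2: (C, C//r) expands.
    Returns a (C, H, W) local attention map."""
    c, h, w = x.shape
    flat = x.reshape(c, -1)              # (C, H*W): one column per pixel
    hidden = np.maximum(w1 @ flat, 0.0)  # first 1x1 conv, then ReLU
    out = w2 @ hidden                    # second 1x1 conv restores C channels
    return out.reshape(c, h, w)

rng = np.random.default_rng(0)
C, r = 32, 16                            # r = 16 matches the 1/16 shrink above
x = rng.standard_normal((C, 4, 4))
out = local_channel_attention(x, rng.standard_normal((C // r, C)),
                              rng.standard_normal((C, C // r)))
# out has shape (32, 4, 4): channels shrink to C/r internally, then expand back
```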
S22.3: the outputs of the global attention and the local attention are fused; the fused data are activated with a Sigmoid function to obtain the final attention weights, and the activated data are then multiplied pixel by pixel with the input data;
S22.4: the Sigmoid function compresses the existing data, mapping them according to their range to values in the interval (0, 1) to ensure normalization, as shown in fig. 1;
S22.5: the input data are multiplied pixel by pixel with the activated data, applying position-dependent weighting so that the network focuses more on the global features and the local features, as shown in fig. 1.
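Steps S22.3 to S22.5 can be sketched as follows. The patent does not spell out the fusion operation, so element-wise (broadcast) addition of the two branches is assumed here for illustration:

```python
import numpy as np

def apply_multiscale_attention(x, g, l):
    """x: (C, H, W) input; g: (C,) global attention logits; l: (C, H, W)
    local attention logits. Fuses the two branches (broadcast add, an
    assumption), squashes to (0, 1) with a Sigmoid, and reweights x."""
    fused = g[:, None, None] + l              # fusion of global and local
    weights = 1.0 / (1.0 + np.exp(-fused))    # final attention weights
    return x * weights                        # pixel-by-pixel reweighting

x = np.ones((2, 3, 3))
out = apply_multiscale_attention(x, np.zeros(2), np.zeros((2, 3, 3)))
# zero logits -> weights of 0.5 everywhere -> the output halves the input
```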
The two-dimensional convolutions in step S22.2 change only the number of channels of the input data, and in the whole MLP architecture the attention among channels of the input data is estimated by first shrinking and then expanding the channels: with shrinkage coefficient r, the feature scale after shrinkage is H × W × C/r; a ReLU activation function is applied, and the feature scale after expansion is H × W × C.
In steps S22.1 and S22.2, the global features and the local features of the input data are extracted through the global channel attention mechanism and the local channel attention mechanism, respectively, and in step S22.3 the outputs of the global attention and the local attention are fused, that is, feature fusion is performed on the different features, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
The embodiment of the invention is disclosed as a preferred embodiment, but the invention is not limited thereto; those skilled in the art will readily appreciate from the foregoing description that various extensions and modifications can be made without departing from the spirit of the invention.

Claims (2)

1. An image data processing method with multi-scale channel attention, comprising the following steps:
S21: the input data, namely an original image or a feature map, are digitized: the extracted features are converted into numerical values stored in tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
s22: the method of combining the global channel attention mechanism and the local channel attention mechanism is used for carrying out feature extraction and feature fusion on input data;
S22.1: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer whose kernel size is selected adaptively, and a Sigmoid activation function, wherein the global average pooling is computed as:

y = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} X_{i,j}

where y denotes the global average pooling result, X is the input image of size W × H × C (W, H and C are the width, height and number of channels of the input image), and i and j index the pixel positions along the width and height;
the adaptive selection is computed as:

k = ψ(C) = |(log₂(C) + b) / γ|_odd

where k is the kernel size of the one-dimensional convolution, C is the number of channels, |t|_odd denotes the odd number nearest to t (i.e. k can only be odd), and γ and b are used to vary the ratio between C and k;
the Sigmoid activation function, also called the S-shaped growth curve, is computed as:

S(x) = 1 / (1 + e^(−x))

where x is the input;
S22.2: the local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP architecture consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation; such a convolution changes only the number of channels of the input data. The number of output channels of the first convolution operation is one sixteenth of the number of input channels, and the number of output channels of the second convolution operation matches the number of channels at the embedding position. The ReLU function retains only positive elements and discards all negative elements by setting their activation values to 0;
S22.3: the outputs of the global attention and the local attention are fused; the fused data are activated with a Sigmoid function to obtain the final attention weights, and the activated data are then multiplied pixel by pixel with the input data;
S22.4: the Sigmoid function compresses the existing data according to their range, mapping any input to a value in the interval (0, 1), which ensures normalization;
S22.5: the input data are multiplied pixel by pixel with the activated data, applying position-dependent weighting so that the network focuses more on the global features and the local features.
2. The image data processing method with multi-scale channel attention as recited in claim 1, wherein
the two-dimensional convolutions in step S22.2 change only the number of channels of the input data, and in the whole MLP architecture the attention among channels of the input data is estimated by first shrinking and then expanding the channels: with shrinkage coefficient r, the feature scale after shrinkage is H × W × C/r; a ReLU activation function is used, and the feature scale after expansion is H × W × C.
CN202310414590.1A 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention Active CN116129207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention

Publications (2)

Publication Number Publication Date
CN116129207A CN116129207A (en) 2023-05-16
CN116129207B true CN116129207B (en) 2023-08-04

Family

ID=86301329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414590.1A Active CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention

Country Status (1)

Country Link
CN (1) CN116129207B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 Three-dimensional point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 Chinese character generation method for alleviating network mode collapse problem by utilizing Chinese character skeleton information
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157307B (en) * 2016-06-27 2018-09-11 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN110853051B (en) * 2019-10-24 2022-06-03 北京航空航天大学 Cerebrovascular image segmentation method based on multi-attention dense connection generation countermeasure network
CN113627295A (en) * 2021-07-28 2021-11-09 中汽创智科技有限公司 Image processing method, device, equipment and storage medium
CN114842553A (en) * 2022-04-18 2022-08-02 安庆师范大学 Behavior detection method based on residual shrinkage structure and non-local attention


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a voiceprint recognition model based on multi-scale feature joint attention; Zhang Yuxi; China Master's Theses Full-text Database, Information Science and Technology Series; pp. I136-362 *

Also Published As

Publication number Publication date
CN116129207A (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN113011329B (en) Multi-scale feature pyramid network-based and dense crowd counting method
CN112329658A (en) Method for improving detection algorithm of YOLOV3 network
US20220230282A1 (en) Image processing method, image processing apparatus, electronic device and computer-readable storage medium
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN111340844A (en) Multi-scale feature optical flow learning calculation method based on self-attention mechanism
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN112801027A (en) Vehicle target detection method based on event camera
CN114387521B (en) Remote sensing image building extraction method based on attention mechanism and boundary loss
CN114882530A (en) Pedestrian detection-oriented lightweight convolutional neural network model
CN115984747A (en) Video saliency target detection method based on dynamic filter
CN113361493B (en) Facial expression recognition method robust to different image resolutions
CN116129207B (en) Image data processing method for attention of multi-scale channel
CN116434039B (en) Target detection method based on multiscale split attention mechanism
CN111488839B (en) Target detection method and target detection system
CN111402140A (en) Single image super-resolution reconstruction system and method
CN115409991B (en) Target identification method and device, electronic equipment and storage medium
CN116246109A (en) Multi-scale hole neighborhood attention computing backbone network model and application thereof
CN113810597B (en) Rapid image and scene rendering method based on semi-predictive filtering
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN111489361B (en) Real-time visual target tracking method based on deep feature aggregation of twin network
CN113673271B (en) Double-layer labeling calculation method for secondary loss based on pet detection
CN108629737B (en) Method for improving JPEG format image space resolution
CN113327254A (en) Image segmentation method and system based on U-type network
CN116152580B (en) Data training method for small target in complex scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant