CN116129207A - Image data processing method for attention of multi-scale channel - Google Patents

Image data processing method for attention of multi-scale channel

Info

Publication number
CN116129207A
Authority
CN
China
Prior art keywords
global
input data
data
channels
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310414590.1A
Other languages
Chinese (zh)
Other versions
CN116129207B (en)
Inventor
刘刚
王冰冰
周杰
王磊
史魁杰
曾辉
张金烁
胡莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202310414590.1A priority Critical patent/CN116129207B/en
Publication of CN116129207A publication Critical patent/CN116129207A/en
Application granted granted Critical
Publication of CN116129207B publication Critical patent/CN116129207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The invention discloses an image data processing method of multi-scale channel attention. Global features and local features are extracted from the input data, so that a convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the problems of target aggregation and target occlusion in complex scenes.

Description

Image data processing method for attention of multi-scale channel
Technical Field
The invention relates to the field of computer vision, in particular to an image data processing method of multi-scale channel attention.
Background
The channel attention mechanism can significantly improve the expressiveness and generalization ability of a model, has low computational cost, and is easy to integrate into existing convolutional neural network structures. Owing to these advantages, channel attention has been widely used in deep learning applications such as image classification, object detection and semantic segmentation.
The essence of the channel attention mechanism is to compute a weighted average over the features of different channels, thereby obtaining a richer, more stable and more reliable feature expression.
Existing channel attention mechanisms, including SE, ECA and CA, focus only on the detail information of some local feature or on the semantic information of the global feature, but not on both, resulting in insufficient feature expression along the channel dimension.
Disclosure of Invention
The invention aims to provide an image data processing method of multi-scale channel attention.
The invention aims to solve the following problem:
an image data processing method of multi-scale channel attention is provided, in which global features and local features are extracted from the input data, so that the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the problems of target aggregation and target occlusion in complex scenes.
The image data processing method of multi-scale channel attention adopts the following technical scheme, comprising the steps of:
S21: the input data (an original image or a feature map) is digitized: the extracted features are converted into numerical values and stored as tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: feature extraction and feature fusion are performed on the input data by combining a global channel attention mechanism and a local channel attention mechanism;
S23: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer with an adaptively selected kernel size, and a Sigmoid activation function. Through global average pooling and element-wise transformation of the feature map, global channel attention adaptively adjusts the weights of different channels, so that the model attends to more important features, improving its classification performance and robustness. The calculation formula of global average pooling is as follows:
g(X) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{ij}

wherein g(X) represents the global average pooling result, X is the input image of dimensions W×H×C, W, H and C represent the width, height and channels of the input image respectively, and i and j represent the pixel positions on the width and height respectively;
the calculation formula of the self-adaptive selection is as follows:
k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}

wherein k represents the convolution kernel size of the one-dimensional convolution, C indicates the number of channels, |\cdot|_{odd} means that k can only be odd, and \gamma and b are used for changing the ratio between C and k; in the present invention \gamma and b take 2 and 1 respectively;
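With γ and b taken as 2 and 1, the adaptive selection of the kernel size can be sketched in Python. The reading of |·|_odd used here (truncate to an integer, then round up to the nearest odd number) follows the common ECA-Net convention and is an assumption, not something the text spells out:

```python
import math

def adaptive_kernel_size(C: int, gamma: int = 2, b: int = 1) -> int:
    """Kernel size k = |log2(C)/gamma + b/gamma|_odd for the 1-D convolution.
    |.|_odd is read as: truncate to an integer, then force the result odd,
    since the formula restricts k to odd values."""
    t = int(abs(math.log2(C) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1
```

So a 256-channel input would use a kernel of size 5, while 64 channels would use size 3: the kernel grows slowly (logarithmically) with the channel count.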
the Sigmoid activation function is also called an S-shaped growth curve, and the calculation formula is:
\sigma(x) = \frac{1}{1 + e^{-x}}

wherein x is the input;
S24: the local channel attention mechanism adopts a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation; passing the input data through these convolutions changes only the number of channels. The first convolution outputs one sixteenth of the number of input channels, and the second restores the channel count to match that of the embedding position. Local channel attention helps the model better capture the local information in the input features;
S25: the ReLU function retains only the positive elements and discards all negative elements by setting their activation values to 0;
S26: the output of the global channel attention and the output of the local channel attention are fused; the Sigmoid function activates the fused data to obtain the final attention weights, and the activated data is then multiplied pixel by pixel with the input data;
S27: the Sigmoid function compresses data according to its range, mapping any input to a value in the interval (0, 1), which ensures normalization;
S28: multiplying the input data pixel by pixel with the activated data applies different weights at different positions of the input data, so that global features and local features receive more attention.
Further, passing the input data through the two-dimensional convolutions of step S24 changes only the number of channels. The whole MLP architecture estimates the attention among channels by first shrinking and then expanding the channels of the input data: with shrinkage coefficient r, the feature scale after shrinking is H×W×C/r and, after a ReLU activation and expansion, the feature scale returns to H×W×C.
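The shrink-then-expand MLP just described can be sketched in NumPy. A 1×1 two-dimensional convolution acts per pixel as a linear map over channels, so it reduces to a matrix product on the flattened feature map; the function name and the random weights below are illustrative stand-ins for the learned parameters:

```python
import numpy as np

def local_channel_attention(x, r=16, seed=0):
    """Shrink-then-expand channel MLP of step S24, sketched in NumPy.
    The weights are random placeholders for the learned parameters."""
    rng = np.random.default_rng(seed)
    C, H, W = x.shape
    Cr = max(C // r, 1)                       # shrink: C -> C/r channels
    W1 = rng.standard_normal((Cr, C)) * 0.1   # first 1x1 conv as a matrix
    W2 = rng.standard_normal((C, Cr)) * 0.1   # second 1x1 conv: expand back to C
    flat = x.reshape(C, H * W)                # each column is one pixel's channels
    hidden = np.maximum(W1 @ flat, 0.0)       # ReLU keeps positives, zeroes negatives
    return (W2 @ hidden).reshape(C, H, W)     # back to the H x W x C scale
```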
Further, steps S23 and S24 extract the global features and the local features of the input data, respectively, via the global average pooling of the global channel attention mechanism and the multi-layer perceptron MLP of the local channel attention mechanism. Step S26 then fuses the outputs of the two mechanisms, i.e. performs feature fusion on the different features, so that the convolutional neural network focuses on both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
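The fusion and gating of steps S26 to S28 amount to a few lines. This NumPy sketch assumes the global branch yields a (C, 1, 1) tensor and the local branch a (C, H, W) tensor, which then broadcast together; both shapes are assumptions consistent with the description rather than values stated in the text:

```python
import numpy as np

def fuse_and_gate(x, global_att, local_att):
    """Steps S26-S28, sketched: sum the two branch outputs, squash the sum
    into (0, 1) with Sigmoid, then reweight the input pixel by pixel."""
    weights = 1.0 / (1.0 + np.exp(-(global_att + local_att)))  # Sigmoid in (0, 1)
    return x * weights  # pixel-wise product; (C,1,1) broadcasts over (C,H,W)
```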
The invention has the beneficial effects that the multi-scale channel attention image data processing method further alleviates the low detection precision, high miss rate and similar problems caused by the heavy aggregation and severe occlusion typical of small-target detection in complex scenes. By extracting the global and local features of the data and fusing these different features, the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating target aggregation and target occlusion in complex scenes.
Drawings
FIG. 1 is a schematic diagram of the image data processing method of multi-scale channel attention according to the invention;
FIG. 2 is a diagram of the ReLU rectified linear function in the invention;
FIG. 3 is a schematic diagram of Sigmoid-function data normalization in the invention.
Detailed Description
The invention will be further clarified and fully described below in connection with the accompanying drawings; the scope of protection of the invention is not limited thereto.
Examples
As shown in fig. 1 to 3, an image data processing method of multi-scale channel attention includes the following steps:
S21: the input data (an original image or a feature map) is digitized: the extracted features are converted into numerical values and stored as tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: feature extraction and feature fusion are performed on the input data by combining a global channel attention mechanism and a local channel attention mechanism, as shown in fig. 1;
S23: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer with an adaptively selected kernel size, and a Sigmoid activation function, as shown in the left column of fig. 1. Through global average pooling and element-wise transformation of the feature map, global channel attention adaptively adjusts the weights of different channels, so that the model attends to more important features, improving its classification performance and robustness. The calculation formula of global average pooling is as follows:
g(X) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{ij}

wherein g(X) represents the global average pooling result, X is the input image of dimensions W×H×C, W, H and C represent the width, height and channels of the input image respectively, and i and j represent the pixel positions on the width and height respectively;
the calculation formula of the self-adaptive selection is as follows:
k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}

wherein k represents the convolution kernel size of the one-dimensional convolution, C indicates the number of channels, |\cdot|_{odd} means that k can only be odd, and \gamma and b are used for changing the ratio between C and k; in the present invention \gamma and b take 2 and 1 respectively;
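The global branch of step S23 (global average pooling, a one-dimensional convolution of size k across the channel descriptor, then a Sigmoid) can be sketched in NumPy. The uniform kernel and the edge padding are assumptions made for illustration; in practice the 1-D kernel is learned:

```python
import numpy as np

def global_channel_attention(x, k=3):
    """Global branch of step S23, sketched: global average pooling yields one
    descriptor per channel; a 1-D convolution of (adaptively chosen) size k
    mixes each channel with its k - 1 neighbours; Sigmoid maps the result
    into (0, 1). The uniform kernel is a placeholder for learned weights."""
    C = x.shape[0]
    g = x.mean(axis=(1, 2))                  # global average pooling -> shape (C,)
    padded = np.pad(g, k // 2, mode='edge')  # pad so the output length stays C
    kernel = np.full(k, 1.0 / k)             # placeholder 1-D convolution kernel
    mixed = np.array([padded[i:i + k] @ kernel for i in range(C)])
    weights = 1.0 / (1.0 + np.exp(-mixed))   # Sigmoid -> per-channel weights
    return weights[:, None, None]            # shape (C, 1, 1) for broadcasting
```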
the Sigmoid activation function is also called an S-shaped growth curve, and as shown in fig. 3, the calculation formula is:
\sigma(x) = \frac{1}{1 + e^{-x}}

wherein x is the input;
S24: the local channel attention mechanism adopts a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation; the ReLU activation sets the output of some neurons to 0, which reduces the interdependence of parameters and alleviates overfitting. Passing the input data through these convolutions changes only the number of channels: the first convolution outputs one sixteenth of the number of input channels, and the second restores the channel count to match that of the embedding position. Local channel attention helps the model better capture the local information in the input features, as shown in the right column of fig. 1;
S25: the ReLU function retains only the positive elements and discards all negative elements by setting their activation values to 0, as shown in fig. 2;
S26: the output of the global channel attention and the output of the local channel attention are fused; the Sigmoid function activates the fused data to obtain the final attention weights, and the activated data is then multiplied pixel by pixel with the input data;
S27: the Sigmoid function compresses the data according to its range, mapping any input to a value in the interval (0, 1) to ensure normalization, as shown in fig. 1;
S28: multiplying the input data pixel by pixel with the activated data applies different weights at different positions of the input data, so that global features and local features receive more attention, as shown in fig. 1.
Passing the input data through the two-dimensional convolutions of step S24 changes only the number of channels. The whole MLP architecture estimates the attention among channels by first shrinking and then expanding the channels of the input data: with shrinkage coefficient r, the feature scale after shrinking is H×W×C/r and, after a ReLU activation and expansion, the feature scale returns to H×W×C.
Steps S23 and S24 extract the global features and the local features of the input data, respectively, via the global average pooling of the global channel attention mechanism and the multi-layer perceptron MLP of the local channel attention mechanism. Step S26 then fuses the outputs of the two mechanisms, i.e. performs feature fusion on the different features, so that the convolutional neural network focuses on both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
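Putting the pieces together, a minimal NumPy sketch of the whole block described above (global branch with adaptive kernel size, local shrink-then-expand MLP with r = 16, Sigmoid gating and pixel-wise reweighting) might look as follows. All weights are placeholders, not the trained parameters of the invention:

```python
import math
import numpy as np

def multi_scale_channel_attention(x, r=16, seed=0):
    """End-to-end sketch of the described block: a global branch (global
    average pooling followed by a 1-D convolution of adaptive size k) and a
    local branch (1x1-convolution MLP with shrink factor r and ReLU) are
    summed, Sigmoid-activated, and used to reweight the input pixel by
    pixel. All weights below are placeholders, not learned values."""
    rng = np.random.default_rng(seed)
    C, H, W = x.shape

    # Global branch: one weight per channel, shape (C, 1, 1).
    t = int(abs(math.log2(C) / 2 + 1 / 2))   # gamma = 2, b = 1
    k = t if t % 2 == 1 else t + 1           # k is restricted to odd values
    g = x.mean(axis=(1, 2))                  # global average pooling
    padded = np.pad(g, k // 2, mode='edge')
    kernel = np.full(k, 1.0 / k)             # placeholder 1-D kernel
    g_att = np.array([padded[i:i + k] @ kernel for i in range(C)])[:, None, None]

    # Local branch: per-pixel channel MLP, shape (C, H, W).
    Cr = max(C // r, 1)
    W1 = rng.standard_normal((Cr, C)) * 0.1  # shrink 1x1 conv
    W2 = rng.standard_normal((C, Cr)) * 0.1  # expand 1x1 conv
    l_att = (W2 @ np.maximum(W1 @ x.reshape(C, H * W), 0.0)).reshape(C, H, W)

    # Fuse, activate, and reweight (steps S26-S28).
    weights = 1.0 / (1.0 + np.exp(-(g_att + l_att)))
    return x * weights
```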
The embodiments of the invention are disclosed as preferred embodiments but are not limited thereto; those skilled in the art will readily appreciate from the foregoing description that various extensions and modifications can be made without departing from the spirit of the invention.

Claims (3)

1. A method of processing image data with multi-scale channel attention, comprising the steps of:
S21: digitizing the input data, namely an original image or a feature map: the extracted features are converted into numerical values and stored as tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: performing feature extraction and feature fusion on the input data by combining a global channel attention mechanism and a local channel attention mechanism;
S23: using, in the global channel attention mechanism, global average pooling, a one-dimensional convolution layer with an adaptively selected kernel size, and a Sigmoid activation function, wherein the calculation formula of the global average pooling is as follows:
g(X) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{ij}

wherein g(X) represents the global average pooling result, X is the input image of dimensions W×H×C, W, H and C represent the width, height and channels of the input image respectively, and i and j represent the pixel positions on the width and height respectively;
the calculation formula of the self-adaptive selection is as follows:
k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}

wherein k represents the convolution kernel size of the one-dimensional convolution, C indicates the number of channels, |\cdot|_{odd} means that k can only be odd, and \gamma and b are used for changing the ratio between C and k;
the Sigmoid activation function is also called an S-shaped growth curve, and the calculation formula is:
\sigma(x) = \frac{1}{1 + e^{-x}}

wherein x is the input;
S24: adopting, in the local channel attention mechanism, a multi-layer perceptron MLP implemented with two-dimensional convolutions to extract local features, the MLP consisting of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation, wherein passing the input data through these convolutions changes only the number of channels, the first convolution outputting one sixteenth of the number of input channels and the second restoring the channel count to match that of the embedding position;
S25: the ReLU function retaining only the positive elements and discarding all negative elements by setting their activation values to 0;
S26: fusing the output of the global channel attention and the output of the local channel attention, activating the fused data with the Sigmoid function to obtain the final attention weights, and multiplying the activated data pixel by pixel with the input data;
S27: compressing the data with the Sigmoid function according to its range, mapping any input to a value in the interval (0, 1) to ensure normalization;
S28: multiplying the input data pixel by pixel with the activated data to apply different weights at different positions of the input data, so that global features and local features receive more attention.
2. The method of processing image data with multi-scale channel attention according to claim 1, wherein
steps S23 and S24 extract the global features and the local features of the input data, respectively, via the global average pooling of the global channel attention mechanism and the multi-layer perceptron MLP of the local channel attention mechanism, and step S26 fuses the outputs of the two mechanisms, i.e. performs feature fusion on the different features, so that the convolutional neural network focuses on both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
3. The method according to claim 1, wherein passing the input data through the two-dimensional convolutions of step S24 changes only the number of channels, and the whole MLP architecture estimates the attention among channels by first shrinking and then expanding the channels of the input data, wherein the shrinkage coefficient is r, the feature scale after shrinking is H×W×C/r and, after a ReLU activation and expansion, the feature scale returns to H×W×C.
CN202310414590.1A 2023-04-18 2023-04-18 Image data processing method for attention of multi-scale channel Active CN116129207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method for attention of multi-scale channel


Publications (2)

Publication Number Publication Date
CN116129207A 2023-05-16
CN116129207B 2023-08-04

Family

ID=86301329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414590.1A Active CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method for attention of multi-scale channel

Country Status (1)

Country Link
CN (1) CN116129207B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094343A (en) * 2024-04-23 2024-05-28 安徽大学 Attention mechanism-based LSTM machine residual service life prediction method
CN118397281A (en) * 2024-06-24 2024-07-26 湖南工商大学 Image segmentation model training method, segmentation method and device based on artificial intelligence

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180231871A1 (en) * 2016-06-27 2018-08-16 Zhejiang Gongshang University Depth estimation method for monocular image based on multi-scale CNN and continuous CRF
CN110853051A (en) * 2019-10-24 2020-02-28 北京航空航天大学 Cerebrovascular image segmentation method based on multi-attention dense connection generation countermeasure network
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 Three-dimensional point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN113627295A (en) * 2021-07-28 2021-11-09 中汽创智科技有限公司 Image processing method, device, equipment and storage medium
CN114842553A (en) * 2022-04-18 2022-08-02 安庆师范大学 Behavior detection method based on residual shrinkage structure and non-local attention
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 Chinese character generation method for alleviating network mode collapse problem by utilizing Chinese character skeleton information
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ABHINAV SAGAR ET AL.: "DMSANet: Dual Multi Scale Attention Network", arXiv:2106.08382v2 [cs.CV], pages 1-10
GANG LIU ET AL.: "Multiple Dirac Points and Hydrogenation-Induced Magnetism of Germanene Layer on Al (111) Surface", Journal of Physical Chemistry Letters, pages 4936-4942
ZHANG YUXI: "Research on a voiceprint recognition model based on multi-scale feature joint attention", China Master's Theses Full-text Database, Information Science and Technology, pages 136-362
GAO DAN; CHEN JIANYING; XIE YING: "A-PSPNet: a PSPNet image semantic segmentation model fused with an attention mechanism", Journal of China Academy of Electronics and Information Technology, no. 06, pages 28-33


Also Published As

Publication number Publication date
CN116129207B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN116129207B (en) Image data processing method for attention of multi-scale channel
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
US20220230282A1 (en) Image processing method, image processing apparatus, electronic device and computer-readable storage medium
CN113011329B (en) Multi-scale feature pyramid network-based and dense crowd counting method
CN111652081B (en) Video semantic segmentation method based on optical flow feature fusion
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN114387521B (en) Remote sensing image building extraction method based on attention mechanism and boundary loss
CN110569851A (en) real-time semantic segmentation method for gated multi-layer fusion
CN113361493B (en) Facial expression recognition method robust to different image resolutions
CN115620118A (en) Saliency target detection method based on multi-scale expansion convolutional neural network
Wang et al. TF-SOD: a novel transformer framework for salient object detection
Mo et al. PVDet: Towards pedestrian and vehicle detection on gigapixel-level images
CN113327254A (en) Image segmentation method and system based on U-type network
CN115984747A (en) Video saliency target detection method based on dynamic filter
CN112581423A (en) Neural network-based rapid detection method for automobile surface defects
CN112950505B (en) Image processing method, system and medium based on generation countermeasure network
CN111488839B (en) Target detection method and target detection system
CN117557779A (en) YOLO-based multi-scale target detection method
CN111402140A (en) Single image super-resolution reconstruction system and method
CN116597144A (en) Image semantic segmentation method based on event camera
CN116246109A (en) Multi-scale hole neighborhood attention computing backbone network model and application thereof
CN113810597A (en) Rapid image and scene rendering method based on semi-prediction filtering
CN113222016A (en) Change detection method and device based on cross enhancement of high-level and low-level features
CN108629737B (en) Method for improving JPEG format image space resolution
CN118397192B (en) Point cloud analysis method based on double-geometry learning and self-adaptive sparse attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant