CN116129207B - Image data processing method with multi-scale channel attention - Google Patents

Image data processing method with multi-scale channel attention

Info

Publication number
CN116129207B
CN116129207B (application CN202310414590.1A)
Authority
CN
China
Prior art keywords
data
channels
global
input data
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310414590.1A
Other languages
Chinese (zh)
Other versions
CN116129207A (en)
Inventor
刘刚
王冰冰
周杰
王磊
史魁杰
曾辉
张金烁
胡莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202310414590.1A priority Critical patent/CN116129207B/en
Publication of CN116129207A publication Critical patent/CN116129207A/en
Application granted granted Critical
Publication of CN116129207B publication Critical patent/CN116129207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The invention discloses an image data processing method with multi-scale channel attention. Global features and local features are extracted from the input data, so that the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the problems of target aggregation and target occlusion in complex scenes.

Description

Image data processing method with multi-scale channel attention
Technical Field
The invention relates to the field of computer vision, and in particular to an image data processing method with multi-scale channel attention.
Background
The channel attention mechanism can remarkably improve the expressive power and generalization ability of a model, has low computational cost, and is easy to integrate into existing convolutional neural network structures. Because of these advantages, channel attention has been widely used in deep learning applications such as image classification, object detection and semantic segmentation.
The essence of the channel attention mechanism is to compute a weighted average of the features of different channels, thereby obtaining richer, more stable and reliable feature representations.
Existing channel attention mechanisms, including SE, ECA and CA, attend only to the detail information in some local feature or to the semantic information in the global feature, but not to both, resulting in insufficient feature expression along the channel dimension.
Disclosure of Invention
The invention aims to provide an image data processing method with multi-scale channel attention.
The problem the invention aims to solve is the following:
to provide an image data processing method with multi-scale channel attention in which global features and local features are extracted from the input data, so that the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the problems of target aggregation and target occlusion in complex scenes.
The image data processing method with multi-scale channel attention adopts the following technical scheme.
An image data processing method with multi-scale channel attention includes the following steps:
S21: the input data (an original image or a feature map) are digitized: the extracted features are converted into numerical values stored in tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: a method combining a global channel attention mechanism and a local channel attention mechanism is used to perform feature extraction and feature fusion on the input data;
S22.1: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer whose kernel size is selected adaptively, and a Sigmoid activation function. By applying global average pooling and an element-wise transformation to the feature map, the global channel attention adaptively adjusts the weights of the different channels, so that the model attends to the more important features and the classification performance and robustness of the model are improved. The global average pooling is computed as:

y = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} X_{i,j}

where y denotes the global average pooling result, X is the input image of size W × H × C (W, H and C are the width, height and number of channels of the input image), and i and j index the pixel positions along the width and height;
the adaptive selection is computed as:

k = ψ(C) = |(log₂(C) + b) / γ|_odd

where k is the kernel size of the one-dimensional convolution, C is the number of channels, |t|_odd denotes the odd number nearest to t (i.e. k can only be odd), and γ and b control the ratio between C and k; in the present invention γ and b are set to 2 and 1, respectively;
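As an illustrative sketch of this adaptive selection (a hedged reading of the formula above, with γ = 2 and b = 1 as stated; the rounding-to-odd convention is an assumption), the kernel size can be computed as:

```python
import math

def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptively choose an odd 1-D convolution kernel size k from the
    channel count C, as k = |(log2(C) + b) / gamma| rounded to an odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1  # force k to be odd

# e.g. 64 channels -> k = 3, 256 channels -> k = 5
```

This way the one-dimensional convolution covers more neighboring channels as the channel count grows, without a fixed hyperparameter per layer.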
the Sigmoid activation function is also called an S-shaped growth curve, and the calculation formula is:
wherein x is input;
S22.2: the local channel attention mechanism adopts a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP architecture consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation; such a convolution changes only the number of channels of the input data. The number of output channels of the first convolution operation is one sixteenth of the number of input channels, and the number of output channels of the second convolution operation matches the number of channels at the embedding position. The local channel attention helps the model better capture local information in the input features. The ReLU function retains only positive elements and discards all negative elements by setting their activation values to 0;
S22.3: the outputs of the global attention and the local attention are fused; the fused data are activated with a Sigmoid function to obtain the final attention weights, and the activated data are then multiplied pixel by pixel with the input data;
S22.4: the Sigmoid function compresses the existing data according to their range, mapping any input to a value in the interval (0, 1), which ensures normalization;
S22.5: the input data are multiplied pixel by pixel with the activated data, applying position-dependent weighting so that the network focuses more on the global features and the local features.
Further, the two-dimensional convolutions in step S22.2 change only the number of channels of the input data, and the MLP architecture as a whole estimates the attention among channels by first shrinking and then expanding the channels of the input data: with shrinkage coefficient r, the feature scale after shrinkage is H × W × C/r; a ReLU activation function is applied, and the feature scale after expansion is H × W × C.
Further, in steps S22.1 and S22.2, the global features and the local features of the input data are extracted through the global channel attention mechanism and the local channel attention mechanism, respectively, and in step S22.3 the outputs of the global attention and the local attention are fused, that is, feature fusion is performed on the different features, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
The invention has the following beneficial effects: the multi-scale channel attention image data processing method extracts global features and local features from the data and fuses the different features, so that the convolutional neural network attends to both the overall information and the local detail features of the input data. This alleviates the problems of target aggregation and target occlusion in complex scenes, and thereby the low detection accuracy and high miss rate caused by the heavy aggregation and severe occlusion that characterize small-target detection in such scenes.
Drawings
FIG. 1 is a schematic diagram of the multi-scale channel attention image data processing method of the present invention;
FIG. 2 is a diagram of the ReLU rectified linear function in the present invention;
FIG. 3 is a schematic diagram of Sigmoid data normalization in the present invention.
Description of the embodiments
The invention will be further clarified and fully described below with reference to the accompanying drawings; the scope of protection of the invention is not limited thereto.
Examples
As shown in figs. 1 to 3, an image data processing method with multi-scale channel attention includes the following steps:
S21: the input data (an original image or a feature map) are digitized: the extracted features are converted into numerical values stored in tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
S22: a method combining a global channel attention mechanism and a local channel attention mechanism, shown in fig. 1, is used to perform feature extraction and feature fusion on the input data;
S22.1: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer whose kernel size is selected adaptively, and a Sigmoid activation function, as shown in the left column of fig. 1. By applying global average pooling and an element-wise transformation to the feature map, the global channel attention adaptively adjusts the weights of the different channels, so that the model attends to the more important features and the classification performance and robustness of the model are improved. The global average pooling is computed as:

y = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} X_{i,j}

where y denotes the global average pooling result, X is the input image of size W × H × C (W, H and C are the width, height and number of channels of the input image), and i and j index the pixel positions along the width and height;
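A minimal NumPy sketch of this pooling step (a channel-first (C, H, W) layout is assumed for illustration):

```python
import numpy as np

def global_avg_pool(x: np.ndarray) -> np.ndarray:
    """Global average pooling: x has shape (C, H, W); the result y has
    shape (C,), the mean of each channel over all spatial positions."""
    return x.mean(axis=(1, 2))

x = np.arange(2 * 2 * 2, dtype=float).reshape(2, 2, 2)
y = global_avg_pool(x)  # channel 0 mean = 1.5, channel 1 mean = 5.5
```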
the adaptive selection is computed as:

k = ψ(C) = |(log₂(C) + b) / γ|_odd

where k is the kernel size of the one-dimensional convolution, C is the number of channels, |t|_odd denotes the odd number nearest to t (i.e. k can only be odd), and γ and b control the ratio between C and k; in this embodiment γ and b are set to 2 and 1, respectively;
the Sigmoid activation function, also called the S-shaped growth curve, is shown in fig. 3 and is computed as:

S(x) = 1 / (1 + e^(−x))

where x is the input;
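Putting the three ingredients together, the global branch can be sketched as follows. This is an illustrative sketch only: the learned one-dimensional convolution is simulated with a fixed kernel supplied by the caller, and edge padding is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_channel_attention(x, kernel):
    """x: (C, H, W) feature map; kernel: odd-length 1-D conv weights.
    Returns (C,) per-channel attention weights in (0, 1)."""
    y = x.mean(axis=(1, 2))                  # global average pooling
    k = len(kernel)
    yp = np.pad(y, k // 2, mode="edge")      # pad so output length stays C
    conv = np.array([np.dot(yp[i:i + k], kernel) for i in range(len(y))])
    return sigmoid(conv)                     # squash weights into (0, 1)

w = global_channel_attention(np.zeros((4, 8, 8)), np.array([0.0, 1.0, 0.0]))
# zero input -> pooled zeros -> sigmoid(0) = 0.5 for every channel
```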
S22.2: the local channel attention mechanism adopts a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP architecture consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation. The ReLU activation sets the output of some neurons to 0, which reduces the interdependence of parameters and alleviates overfitting. Such a convolution changes only the number of channels of the input data: the number of output channels of the first convolution operation is one sixteenth of the number of input channels, and the number of output channels of the second convolution operation matches the number of channels at the embedding position. The local channel attention helps the model better capture local information in the input features, as shown in the right column of fig. 1; the ReLU function retains only positive elements and discards all negative elements by setting their activation values to 0, as shown in fig. 2;
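The shrink-ReLU-expand channel MLP can be sketched as below. A 1×1 convolution over a (C, H, W) map is just a matrix multiply over the channel axis at each pixel, which is how it is written here; the random weights are placeholders for learned parameters.

```python
import numpy as np

def local_channel_attention(x, w1, w2):
    """Channel MLP via 1x1 convolutions.
    x: (C, H, W); w1: (C//r, C) shrinks channels; w2: (C, C//r) expands.
    Returns a (C, H, W) local attention map."""
    c, h, w = x.shape
    flat = x.reshape(c, -1)              # (C, H*W): one column per pixel
    hidden = np.maximum(w1 @ flat, 0.0)  # first 1x1 conv, then ReLU
    out = w2 @ hidden                    # second 1x1 conv restores C channels
    return out.reshape(c, h, w)

rng = np.random.default_rng(0)
C, r = 32, 16                            # r = 16 matches the 1/16 shrink above
x = rng.standard_normal((C, 4, 4))
out = local_channel_attention(x, rng.standard_normal((C // r, C)),
                              rng.standard_normal((C, C // r)))
# out has shape (32, 4, 4): channels shrink to C/r internally, then expand back
```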
S22.3: the outputs of the global attention and the local attention are fused; the fused data are activated with a Sigmoid function to obtain the final attention weights, and the activated data are then multiplied pixel by pixel with the input data;
S22.4: the Sigmoid function compresses the existing data, mapping them according to their range to values in the interval (0, 1) to ensure normalization, as shown in fig. 1;
S22.5: the input data are multiplied pixel by pixel with the activated data, applying position-dependent weighting so that the network focuses more on the global features and the local features, as shown in fig. 1.
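Steps S22.3 to S22.5 can be sketched as follows. The patent does not spell out the fusion operation, so element-wise (broadcast) addition of the two branches is assumed here for illustration:

```python
import numpy as np

def apply_multiscale_attention(x, g, l):
    """x: (C, H, W) input; g: (C,) global attention logits; l: (C, H, W)
    local attention logits. Fuses the two branches (broadcast add, an
    assumption), squashes to (0, 1) with a Sigmoid, and reweights x."""
    fused = g[:, None, None] + l              # fusion of global and local
    weights = 1.0 / (1.0 + np.exp(-fused))    # final attention weights
    return x * weights                        # pixel-by-pixel reweighting

x = np.ones((2, 3, 3))
out = apply_multiscale_attention(x, np.zeros(2), np.zeros((2, 3, 3)))
# zero logits -> weights of 0.5 everywhere -> the output halves the input
```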
The two-dimensional convolutions in step S22.2 change only the number of channels of the input data, and in the whole MLP architecture the attention among channels of the input data is estimated by first shrinking and then expanding the channels: with shrinkage coefficient r, the feature scale after shrinkage is H × W × C/r; a ReLU activation function is applied, and the feature scale after expansion is H × W × C.
In steps S22.1 and S22.2, the global features and the local features of the input data are extracted through the global channel attention mechanism and the local channel attention mechanism, respectively, and in step S22.3 the outputs of the global attention and the local attention are fused, that is, feature fusion is performed on the different features, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, alleviating the problems of target aggregation and target occlusion in complex scenes.
The embodiment of the invention is disclosed as a preferred embodiment, but the invention is not limited thereto; those skilled in the art will readily appreciate from the foregoing description that various extensions and modifications can be made without departing from the spirit of the invention.

Claims (2)

1. An image data processing method with multi-scale channel attention, comprising the following steps:
S21: the input data, namely an original image or a feature map, are digitized: the extracted features are converted into numerical values stored in tensor matrices, and normalization is applied to accelerate the convergence of the convolutional neural network;
s22: the method of combining the global channel attention mechanism and the local channel attention mechanism is used for carrying out feature extraction and feature fusion on input data;
S22.1: the global channel attention mechanism uses global average pooling, a one-dimensional convolution layer whose kernel size is selected adaptively, and a Sigmoid activation function, wherein the global average pooling is computed as:

y = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} X_{i,j}

where y denotes the global average pooling result, X is the input image of size W × H × C (W, H and C are the width, height and number of channels of the input image), and i and j index the pixel positions along the width and height;
the adaptive selection is computed as:

k = ψ(C) = |(log₂(C) + b) / γ|_odd

where k is the kernel size of the one-dimensional convolution, C is the number of channels, |t|_odd denotes the odd number nearest to t (i.e. k can only be odd), and γ and b are used to vary the ratio between C and k;
the Sigmoid activation function, also called the S-shaped growth curve, is computed as:

S(x) = 1 / (1 + e^(−x))

where x is the input;
S22.2: the local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP architecture consists of two two-dimensional convolutions with kernel size 1 and an intermediate ReLU activation; such a convolution changes only the number of channels of the input data. The number of output channels of the first convolution operation is one sixteenth of the number of input channels, and the number of output channels of the second convolution operation matches the number of channels at the embedding position. The ReLU function retains only positive elements and discards all negative elements by setting their activation values to 0;
S22.3: the outputs of the global attention and the local attention are fused; the fused data are activated with a Sigmoid function to obtain the final attention weights, and the activated data are then multiplied pixel by pixel with the input data;
S22.4: the Sigmoid function compresses the existing data according to their range, mapping any input to a value in the interval (0, 1), which ensures normalization;
S22.5: the input data are multiplied pixel by pixel with the activated data, applying position-dependent weighting so that the network focuses more on the global features and the local features.
2. The image data processing method with multi-scale channel attention as recited in claim 1, wherein
the two-dimensional convolutions in step S22.2 change only the number of channels of the input data, and in the whole MLP architecture the attention among channels of the input data is estimated by first shrinking and then expanding the channels: with shrinkage coefficient r, the feature scale after shrinkage is H × W × C/r; a ReLU activation function is used, and the feature scale after expansion is H × W × C.
CN202310414590.1A 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention Active CN116129207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention

Publications (2)

Publication Number Publication Date
CN116129207A CN116129207A (en) 2023-05-16
CN116129207B true CN116129207B (en) 2023-08-04

Family

ID=86301329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414590.1A Active CN116129207B (en) 2023-04-18 2023-04-18 Image data processing method with multi-scale channel attention

Country Status (1)

Country Link
CN (1) CN116129207B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 Three-dimensional point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 Chinese character generation method for alleviating network mode collapse problem by utilizing Chinese character skeleton information
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157307B (en) * 2016-06-27 2018-09-11 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN110853051B (en) * 2019-10-24 2022-06-03 北京航空航天大学 Cerebrovascular image segmentation method based on multi-attention dense connection generation countermeasure network
CN113627295A (en) * 2021-07-28 2021-11-09 中汽创智科技有限公司 Image processing method, device, equipment and storage medium
CN114842553A (en) * 2022-04-18 2022-08-02 安庆师范大学 Behavior detection method based on residual shrinkage structure and non-local attention


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a voiceprint recognition model based on multi-scale feature joint attention; Zhang Yuxi; China Master's Theses Full-text Database, Information Science and Technology Series; pp. I136-362 *

Also Published As

Publication number Publication date
CN116129207A (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN113011329B (en) Multi-scale feature pyramid network-based and dense crowd counting method
CN112329658A (en) Method for improving detection algorithm of YOLOV3 network
US20220230282A1 (en) Image processing method, image processing apparatus, electronic device and computer-readable storage medium
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN111340844A (en) Multi-scale feature optical flow learning calculation method based on self-attention mechanism
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN112801027A (en) Vehicle target detection method based on event camera
CN114387521B (en) Remote sensing image building extraction method based on attention mechanism and boundary loss
CN114882530A (en) Pedestrian detection-oriented lightweight convolutional neural network model
CN115984747A (en) Video saliency target detection method based on dynamic filter
CN113361493B (en) Facial expression recognition method robust to different image resolutions
CN116129207B (en) Image data processing method for attention of multi-scale channel
CN116434039B (en) Target detection method based on multiscale split attention mechanism
CN111488839B (en) Target detection method and target detection system
CN111402140A (en) Single image super-resolution reconstruction system and method
CN115409991B (en) Target identification method and device, electronic equipment and storage medium
CN116246109A (en) Multi-scale hole neighborhood attention computing backbone network model and application thereof
CN113810597B (en) Rapid image and scene rendering method based on semi-predictive filtering
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN111489361B (en) Real-time visual target tracking method based on deep feature aggregation of twin network
CN113673271B (en) Double-layer labeling calculation method for secondary loss based on pet detection
CN108629737B (en) Method for improving JPEG format image space resolution
CN113327254A (en) Image segmentation method and system based on U-type network
CN116152580B (en) Data training method for small target in complex scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant