CN113408577A - Image classification method based on attention mechanism - Google Patents

Image classification method based on attention mechanism

Info

Publication number
CN113408577A
Authority
CN
China
Prior art keywords
channel
attention
attention mechanism
feature map
image classification
Prior art date
Legal status
Pending
Application number
CN202110517855.1A
Other languages
Chinese (zh)
Inventor
徐智
宁文昌
李智
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202110517855.1A
Publication of CN113408577A

Classifications

    • G06F18/241 Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/048 Neural networks: activation functions
    • G06N3/08 Neural networks: learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image data processing and discloses an image classification method based on an attention mechanism. The method performs frequency decomposition on each channel of a feature map using the discrete cosine transform, represents the channel's global information jointly with several frequency components, and then computes channel attention weight information; each channel of the feature map is weighted with this information to obtain a channel attention mechanism. A spatial attention weight is then computed for each pixel of the feature map, and the spatial pixels of the feature map are weighted and summed to obtain a spatial attention mechanism. The channel attention mechanism and the spatial attention mechanism are embedded into ResNet to obtain an image classification convolutional neural network, which is then trained. By combining multiple frequency components in the channel attention, the invention represents the channel's global information better; in the spatial attention, a self-attention mechanism acquires global information along the spatial dimension of the feature map, yielding a spatial weight distribution superior to spatial attention implemented with conventional convolutions.

Description

Image classification method based on attention mechanism
Technical Field
The invention relates to the field of image data processing, in particular to an attention mechanism-based image classification method.
Background
The invention designs an image classification method based on a convolutional neural network and embeds a novel attention mechanism in it. The backbone of the convolutional neural network is implemented with a residual network, and the attention mechanism comprises channel attention and spatial attention, so the related background art mainly comprises three items: the residual network, the channel attention mechanism, and the spatial attention mechanism.
A residual network is a neural network characterized by shortcut connections, where a shortcut connection adds the outputs of layers at different depths and uses the sum as the input of a subsequent layer. This connection pattern makes it easier for the network to fit complex functions and also allows identity mappings to be realized, so that performance does not degrade as depth increases and deeper network structures can be trained. Because residual networks have good feature-extraction ability, many deep-learning tasks, such as object detection, image classification and video understanding, use them as the backbone network for feature extraction. The residual network used in the present invention is ResNet.
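The shortcut connection described above can be sketched in a few lines (a minimal illustration with a toy linear layer, not the patent's actual network):

```python
import numpy as np

def layer(x, w):
    """A stand-in for a learned layer: linear map followed by ReLU."""
    return np.maximum(w @ x, 0.0)

def residual_block(x, w1, w2):
    """Shortcut connection: the block outputs F(x) + x, so the
    layers only have to learn the residual F(x)."""
    out = layer(x, w1)
    out = w2 @ out                   # second layer, no activation yet
    return np.maximum(out + x, 0.0)  # add the shortcut, then activate

# If both weight matrices are zero, the block reduces to an identity
# mapping (for non-negative inputs), so adding depth cannot hurt.
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
print(residual_block(x, w_zero, w_zero))  # → [1. 2. 3.]
```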
The feature map output by each layer of a convolutional neural network comprises a plurality of channels, and each channel captures one visual feature of the input image. For many deep-learning tasks, including image classification, different visual features in the input image contribute differently to the task. If a convolutional neural network can give more attention to important features, it can handle more complex learning tasks with limited network capacity. The channel attention mechanism gives different weights to different channels of the feature map, thereby realizing different degrees of attention to different visual features. The mainstream channel attention mechanism generally calculates the global information of each channel, models the importance of each channel based on this global information, computes weights for the channels according to their importance, and finally weights the channels, thereby highlighting important features and suppressing irrelevant ones.
The feature map output by each layer of a convolutional neural network also contains spatial information: each "pixel" of the feature map corresponds to a region of the input image. When a visual feature appears in some region of the input image, a large activation value appears at the corresponding "pixel" of the corresponding channel of the feature map, so different spatial positions of the feature map reflect features at different spatial positions of the input image. As with channel attention, features at different spatial locations of the input image have different importance to the learning task, and if the convolutional neural network can give more attention to important regions of the image, it can handle more complex learning tasks with limited network capacity. The spatial attention mechanism gives different weights to different spatial positions of the feature map, thereby giving different degrees of attention to different regions of the input image. The mainstream spatial attention mechanism generally computes the global information of each spatial position of the feature map along the channel dimension, uses an additional convolutional layer to generate a spatial attention distribution map in which each pixel represents the weight of one spatial position, and finally weights the different spatial positions of the feature map with this distribution map, thereby highlighting the features of important regions of the image and weakening those of irrelevant regions.
There are many image classification methods to improve the classification effect of the model by embedding the channel attention mechanism and the spatial attention mechanism into the neural network. In the existing channel attention mechanism, a common method for extracting global information is global average pooling or global maximum pooling, but both methods have information loss and cannot sufficiently extract the global information of one channel, so that a weight distribution scheme of the channel attention mechanism is not optimal, and the expression capability of the features extracted by the convolutional neural network is limited.
In the existing spatial attention mechanism, a common convolutional layer is commonly used for calculating spatial attention distribution, but the spatial attention distribution is limited by the size of a convolutional kernel, and global information on a spatial dimension cannot be extracted, so that a weight distribution scheme of the spatial attention mechanism is not globally optimal, and the expression capability of features extracted by a convolutional neural network is also limited.
Disclosure of Invention
The invention aims to provide an attention-mechanism-based image classification method that designs better global-information representation methods for the channel attention mechanism and the spatial attention mechanism respectively, embeds the two attention mechanisms into ResNet simultaneously to improve the image classification performance of ResNet, and balances the improvement of classification performance against the increase in computation by optimizing the way the attention mechanisms are embedded.
In order to achieve the above object, the present invention provides an attention-based image classification method, including: carrying out frequency decomposition on each channel of the characteristic diagram based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing channel global information by using the frequency components;
calculating channel attention weight information based on the channel global information, and weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism;
calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism;
embedding a channel attention mechanism and a space attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
The specific steps of performing frequency decomposition on each channel of the feature map based on the discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel's global information with the plurality of frequency components, are as follows:
calculating two-dimensional discrete cosine transform for each channel of the characteristic diagram to obtain a plurality of frequency components;
the 3 frequency components are selected to be spliced into a vector.
The specific steps of calculating the channel attention weight information based on the channel global information and weighting each channel of the feature map based on the weight information to obtain the channel attention mechanism are as follows:
reducing the dimension of the vector by using one-dimensional convolution;
performing dimension reduction on the vector obtained by the one-dimensional convolution again by using the full-connection layer;
processing the vectors subjected to dimension reduction of the full-connection layer by a nonlinear activation function;
the vector output by the nonlinear activation function is raised, through a fully-connected layer, to a dimensionality equal to the number of feature-map channels, and is normalized with the sigmoid function, so as to obtain the channel attention distribution;
and weighting the characteristic diagram according to the channel attention distribution to obtain the output of the channel attention module.
The specific steps of calculating a spatial attention weight for each pixel of the feature map based on the self-attention mechanism, and then weighting and summing the spatial pixels of the feature map to obtain the spatial attention mechanism, are as follows:
calculating three vectors of query, key and value for each pixel of the feature map;
traversing each pixel of the input feature map, and calculating the correlation between each query vector and the key vectors of all pixels of the input feature map to obtain a correlation distribution map;
and carrying out weighted summation on the value vectors of all the pixels of the input feature map based on the correlation distribution map to obtain the pixel value at the corresponding position in the output feature map.
The query is a query vector and represents information related to a learning task, the key is a key vector and represents the attribute of the pixel, and the value is a value vector and represents the feature representation of the pixel.
Embedding a channel attention mechanism and a space attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network specifically comprises the following steps:
embedding channel attention into the shallow building-block groups of the network (conv2_x, conv3_x, conv4_x) and spatial attention into the deep building-block group (conv5_x);
the channel attention is connected after the convolution module of the residual block, and the spatial attention replaces the 3×3 convolutional layer in the convolution module of the residual block, so as to obtain the image classification convolutional neural network;
and training the image classification convolutional neural network.
The invention discloses an attention mechanism-based image classification method, which comprises the following steps: carrying out frequency decomposition on each channel of the characteristic diagram based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing channel global information by using the frequency components; calculating channel attention weight information based on the channel global information, and weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism; calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism; embedding a channel attention mechanism and a space attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
Thereby having the following advantages:
1. in the channel attention, a plurality of frequency components are obtained through discrete cosine transform, and due to complementarity among the frequency components, the global information of the channel can be better represented by combining the frequency components;
2. in spatial attention, a self-attention mechanism is employed to obtain global information in the feature map spatial dimension. Since each output neuron of the self-attention mechanism has a global receptive field, a spatial weight distribution that is superior to the spatial attention of conventional convolution implementations can be obtained.
3. Channel attention and spatial attention are embedded into the shallow and deep layers of the convolutional neural network, respectively. Because the shallow layers of the network have few channels and the deep layers have small spatial dimensions, embedding the two attention mechanisms in this way does not add much extra computation, while the network can still exploit the advantages of both mechanisms, improving its image classification performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a channel attention calculation method of the present invention;
FIG. 2 is a block diagram of the spatial attention of the present invention;
FIG. 3 is a graph of the comparison between the residual block of the present invention after embedding the attention module and the ResNet original residual block;
FIG. 4 is a schematic illustration of the embedded position of the channel attention and spatial attention in ResNet of the present invention;
FIG. 5 is a flow chart of an attention-based image classification method of the present invention;
FIG. 6 is a flowchart of the present invention, in which each channel of the feature map is frequency-decomposed based on discrete cosine transform to obtain a plurality of frequency components, and the global information of the channel is jointly represented by the plurality of frequency components;
FIG. 7 is a flowchart of the present invention for computing channel attention weight information based on channel global information, and weighting each channel of a feature map based on the weight information to obtain a channel attention mechanism;
FIG. 8 is a flow chart of computing a spatial attention weight for each pixel of the feature map based on a self-attention mechanism and then weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism according to the present invention;
FIG. 9 is a flow chart of embedding a channel attention mechanism and a spatial attention mechanism into ResNet to obtain an image classification convolutional neural network and training the image classification convolutional neural network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1 to 9, the present invention provides an image classification method based on attention mechanism, including:
s101, carrying out frequency decomposition on each channel of the characteristic diagram based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing channel global information by using the frequency components;
the method comprises the following specific steps:
s201, calculating two-dimensional discrete cosine transform for each channel of the characteristic diagram to obtain a plurality of frequency components;
in order to calculate the attention of the channels, global information in one channel needs to be acquired firstly, and the invention performs discrete cosine transform on each channel of the characteristic diagram and then jointly represents the global information of one channel by using a plurality of frequency components. The two-dimensional discrete cosine transform can be written as:
Figure BDA0003062452780000051
wherein F represents a feature map, C, W, H represents the number of channels, width and height of the feature map, respectively, Fk(i, j) is the ith, j position in the kth channel of the feature map,
Figure BDA0003062452780000052
then represents channel FkThe h, w components in the spectrum of the discrete cosine transform of (1).
In existing channel attention mechanisms, global average pooling is commonly used to obtain the global information of a channel. Global average pooling is defined as:

$$\mathrm{GAP}(F_k) = \frac{1}{HW}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1} F_k(i,j) \tag{2}$$

Combining equations (1) and (2), the lowest frequency component of the discrete cosine transform (the $(h,w)=(0,0)$ component, written $X_k^{0,0}$) is:

$$X_k^{0,0} = \sum_{i=0}^{H-1}\sum_{j=0}^{W-1} F_k(i,j) = HW \cdot \mathrm{GAP}(F_k) \tag{3}$$
as can be seen from equation (3), the lowest frequency component of the discrete cosine transform is proportional to the global average pooling result, which means that the global information extracted from each channel of the feature map by the existing channel attention mechanism is only the lowest frequency component of the channel.
S202, selecting 3 frequency components to splice into a vector;
in calculating the channel attention, the optimal result will be theoretically obtained if all frequency components are taken into account. But the number of frequency components resulting from the discrete cosine transform is the same as the number of dimensions of the original signal, i.e. for one F e RC×H×WThe obtained frequency components will have the number of C × H × W, the calculation of all frequency components will make the calculation complexity too high, and many frequency components are small in the signal and can be ignored when calculating the attention of the channel.
And because the low-frequency component of the image contributes more to the classification than the high-frequency component, when the method is used for calculating the attention of channels, each channel only uses the component with lower frequency in discrete cosine transform
Figure BDA0003062452780000063
And
Figure BDA0003062452780000064
the channel attention calculation method is shown in fig. 1. Feature F is the input to the attention module and feature a is the output of the attention module. The channel attention is divided into two steps of global information extraction and attention distribution calculation. When global information is extracted, the k channel of the feature map is calculated according to the formula (1)
Figure BDA0003062452780000065
And
Figure BDA0003062452780000066
three frequency components, and the three frequency components of all channels are combined into three vectors, which are denoted as T0,0、T0,1And T1,0Then, againWill T0,0、T0,1And T1,0And (5) splicing to obtain the output with the dimension of 3 × C.
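The global-information extraction step can be sketched as follows (an illustrative implementation of equation (1) restricted to the three components used by the method; the function names are hypothetical):

```python
import numpy as np

def dct2_component(F, h, w):
    """Equation (1) applied to every channel of F (shape C×H×W) at once."""
    C, H, W = F.shape
    i = np.arange(H)[:, None]
    j = np.arange(W)[None, :]
    basis = (np.cos(np.pi * h / H * (i + 0.5)) *
             np.cos(np.pi * w / W * (j + 0.5)))
    return (F * basis).sum(axis=(1, 2))   # one scalar per channel -> length C

def channel_global_info(F):
    """Concatenate the three lowest-frequency components of every channel."""
    T00 = dct2_component(F, 0, 0)   # vector T^{0,0}, length C
    T01 = dct2_component(F, 0, 1)   # vector T^{0,1}
    T10 = dct2_component(F, 1, 0)   # vector T^{1,0}
    return np.concatenate([T00, T01, T10])  # dimension 3*C

F = np.random.default_rng(1).standard_normal((16, 8, 8))
print(channel_global_info(F).shape)  # → (48,)
```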
S102, calculating channel attention weight information based on the channel global information; weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism;
the method comprises the following specific steps:
s301, reducing the dimension of the vector by using one-dimensional convolution;
wherein the kernel size of the one-dimensional convolutional layer is $C$, its stride is also $C$, and it has $C/r_1$ filters in total, where $r_1$ is a hyperparameter with $r_1 > 1$. Applied to the $3 \times C$ input vector, it produces a one-dimensional vector of dimension $3C/r_1$. This step reduces the redundancy of the channel information.
S302, performing dimension reduction again on the vector obtained from the one-dimensional convolution, using a fully-connected layer;
A fully-connected layer further reduces the vector to dimension $3C/(r_1 r_2)$, where $r_2$ is also a hyperparameter and is a multiple of 3.
S303, processing the dimension-reduced vector with a nonlinear activation function;
The vector output by the fully-connected dimension-reduction step is processed by the nonlinear activation function ReLU.
S304, raising the dimensionality of the vector output by the nonlinear activation function, through a fully-connected layer, to the number of feature-map channels, and normalizing it with the sigmoid function to obtain the channel attention distribution;
The vector output by the nonlinear activation function is raised to dimension $1 \times C$ by a fully-connected layer, and all of its elements are normalized to $[0, 1]$ with a sigmoid function. The normalized vector is the channel attention distribution, and each of its elements represents the weight of one channel of the feature map.
S305, weighting the characteristic diagram according to the channel attention distribution to obtain the output of the channel attention module.
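Steps S301 to S305 can be sketched with randomly initialized weights standing in for the learned parameters (a minimal illustration, not the trained module; r1 = 4 and r2 = 3 are example values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, v, r1=4, r2=3, rng=np.random.default_rng(2)):
    """F: feature map (C, H, W); v: global-information vector of length 3*C."""
    C = F.shape[0]
    # S301: 1-D convolution, kernel size C, stride C, C/r1 filters.
    # The length-3C vector is cut into 3 windows of length C, each mapped
    # through the same (C/r1, C) filter bank -> a vector of length 3C/r1.
    W_conv = rng.standard_normal((C // r1, C)) * 0.1
    h = (W_conv @ v.reshape(3, C).T).T.reshape(-1)       # length 3C/r1
    # S302 + S303: fully-connected reduction, then ReLU.
    W_fc1 = rng.standard_normal((h.size // r2, h.size)) * 0.1
    h = np.maximum(W_fc1 @ h, 0.0)
    # S304: fully-connected layer back up to C, then sigmoid.
    W_fc2 = rng.standard_normal((C, h.size)) * 0.1
    a = sigmoid(W_fc2 @ h)                               # weights in (0, 1)
    # S305: weight each channel of the feature map.
    return F * a[:, None, None], a

C, H, W = 16, 8, 8
F = np.random.default_rng(3).standard_normal((C, H, W))
v = np.random.default_rng(4).standard_normal(3 * C)
out, a = channel_attention(F, v)
print(out.shape, a.min() > 0, a.max() < 1)  # → (16, 8, 8) True True
```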
S103, calculating a spatial attention weight of each pixel of the feature map based on the self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism;
the invention adopts a self-attention mechanism to realize a space attention mechanism, and the method comprises the following specific steps:
s401, calculating three vectors of query, key and value for each pixel of the feature map based on a self-attention mechanism;
the query is a query vector representing information related to a learning task, the key is a key vector representing an attribute of the pixel itself, and the value is a value vector representing a feature representation of the pixel.
Let $F$ be the input feature map of the self-attention mechanism and $A$ the output feature map, where $F_j$ is the feature vector at the $j$-th spatial position of $F$. To compute the query, key and value vectors of each feature vector, each feature vector is multiplied by three matrices $W_\theta$, $W_\phi$ and $W_g$, as shown in equation (5):

$$\theta(F_j) = W_\theta F_j, \quad \phi(F_j) = W_\phi F_j, \quad g(F_j) = W_g F_j \tag{5}$$

where $\theta(F_j)$, $\phi(F_j)$ and $g(F_j)$ denote the query, key and value vectors of $F_j$, respectively. $W_\theta$, $W_\phi$ and $W_g$ are all learnable matrices, implemented as $1 \times 1$ convolutions in the convolutional neural network.
S402, traversing each pixel of the input feature map, and calculating the correlation between the query vector of each pixel and the key vectors of all pixels of the input feature map to obtain a correlation distribution map;
as shown in equation (6), the correlation distribution map is denoted as M for the pixel at the ith position of the feature mapi∈RH×WThen of MThe j-th position is:
Figure BDA0003062452780000081
in the formula (6), exp (θ (F))i)Tφ(Fj) Compute the correlation between the query vector for the ith location and the key vector for the jth location in the feature map.
Figure BDA0003062452780000083
It is the normalized coefficient of the correlation.
S403, carrying out weighted summation on the value vectors of all the pixels of the input feature map based on the correlation distribution map to obtain pixel values at corresponding positions in the output feature map.
As shown in equation (7), the pixel value at the $i$-th position of the output feature map is the weighted sum of the value vectors of all pixels:

$$A_i = \sum_{j=1}^{HW} M_i(j)\, g(F_j) \tag{7}$$
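Equations (5) to (7) together amount to a softmax-weighted sum over all pixels, which can be sketched as follows (random matrices stand in for the learned 1×1 convolutions; the query/key dimension d is an illustrative choice):

```python
import numpy as np

def spatial_self_attention(F, rng=np.random.default_rng(5)):
    """F: feature map (C, H, W). W_theta/W_phi/W_g play the role of the
    learnable 1x1 convolutions of equation (5)."""
    C, H, W = F.shape
    X = F.reshape(C, H * W).T                 # one C-dim vector per pixel
    d = C // 2                                # query/key dimension (illustrative)
    W_theta = rng.standard_normal((d, C)) * 0.1
    W_phi = rng.standard_normal((d, C)) * 0.1
    W_g = rng.standard_normal((C, C)) * 0.1
    Q = X @ W_theta.T                          # queries, shape (HW, d)
    K = X @ W_phi.T                            # keys,    shape (HW, d)
    V = X @ W_g.T                              # values,  shape (HW, C)
    S = Q @ K.T                                # theta(F_i)^T phi(F_j)
    M = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)  # equation (6)
    A = M @ V                                  # equation (7), shape (HW, C)
    return A.T.reshape(C, H, W)

F = np.random.default_rng(6).standard_normal((8, 4, 4))
print(spatial_self_attention(F).shape)  # → (8, 4, 4)
```

Every output pixel depends on every input pixel through the rows of M, which is what gives each output neuron a global receptive field.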
S105, embedding a channel attention mechanism and a spatial attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
The method comprises the following specific steps:
s601 embeds channel attention into the shallow building block set of the network: conv2_ x, conv3_ x, conv4_ x, set of deep building blocks that embed spatial attention into the network: conv5_ x;
the location of the attention mechanism embedding of the present invention into the ResNet network is shown in FIG. 4. conv1 is the first convolutional layer of ResNet, the shallow blocks in the figure refer to conv2_ x, conv3_ x and conv4_ x in ResNet, and the deep blocks refer to conv5_ x, XNsRepresenting the structural block repetition N within the dotted linesSub, NsThe value of (c) is determined by the ordinal number of the structure block group and the depth of ResNet, such as N in ResNet50, conv2_ x, conv3_ x, conv4_ x and conv5_ xsTake 3, 4, 6 and 3, respectively.
S602, the channel attention is connected after the convolution module of the residual block, while the spatial attention replaces the 3×3 convolutional layer in the convolution module of the residual block.
The manner in which the attention mechanisms of the present invention are embedded in the ResNet network is shown in FIG. 3, which compares the residual block structures after embedding channel attention and spatial attention, respectively, with the original residual block: (a) is the original residual block, (b) is the residual block after embedding channel attention, and (c) is the residual block after embedding spatial attention.
S603, training the image classification convolutional neural network obtained in the preceding steps.
The ResNet with the embedded attention mechanisms is trained using a training method commonly used for convolutional neural networks in image classification, thereby realizing the attention-mechanism-based image classification method.
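The embedding scheme of steps S601 and S602 can be sketched structurally as follows (the convolution and attention modules are shape-preserving stand-ins, not the real layers; only the placement logic follows the method):

```python
import numpy as np

def conv1x1(x, C_out, rng):
    """Placeholder learned layer: a 1x1 convolution as a channel mixing."""
    w = rng.standard_normal((C_out, x.shape[0])) * 0.1
    return np.einsum('oc,chw->ohw', w, x)

def channel_attention(x, rng):
    """Stand-in: weight each channel by a value in (0, 1)."""
    a = 1.0 / (1.0 + np.exp(-rng.standard_normal(x.shape[0])))
    return x * a[:, None, None]

def spatial_attention(x, rng):
    """Stand-in for the self-attention module (shape-preserving)."""
    return x

def attn_residual_block(x, rng, deep=False):
    C = x.shape[0]
    out = conv1x1(x, C, rng)
    if deep:
        # deep blocks (conv5_x): spatial attention replaces the 3x3 conv (Fig. 3c)
        out = spatial_attention(out, rng)
    else:
        # shallow blocks: keep the convs, channel attention follows them (Fig. 3b)
        out = conv1x1(out, C, rng)
        out = channel_attention(out, rng)
    return out + x  # shortcut connection

rng = np.random.default_rng(7)
x = rng.standard_normal((16, 8, 8))
print(attn_residual_block(x, rng, deep=False).shape,
      attn_residual_block(x, rng, deep=True).shape)  # → (16, 8, 8) (16, 8, 8)
```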
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An attention mechanism-based image classification method is characterized in that,
the method comprises the following steps: carrying out frequency decomposition on each channel of the characteristic diagram based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing channel global information by using the frequency components;
calculating channel attention weight information based on the channel global information, and weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism;
calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism;
embedding a channel attention mechanism and a space attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
2. The attention-mechanism-based image classification method according to claim 1, wherein the specific steps of performing frequency decomposition on each channel of the feature map based on the discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel global information by the plurality of frequency components, are:
computing a two-dimensional discrete cosine transform for each channel of the feature map to obtain a plurality of frequency components;
selecting 3 frequency components and concatenating them into a vector.
3. The attention-mechanism-based image classification method according to claim 2, wherein the specific steps of calculating the channel attention weight information based on the channel global information and weighting each channel of the feature map based on the weight information to obtain the channel attention mechanism are:
reducing the dimension of the vector with a one-dimensional convolution;
reducing the dimension of the vector produced by the one-dimensional convolution again with a fully connected layer;
processing the vector output by the fully connected layer with a nonlinear activation function;
raising the dimension of the vector output by the nonlinear activation function through another fully connected layer to match the number of channels of the feature map, and normalizing it with a sigmoid function to obtain the channel attention distribution;
weighting the feature map according to the channel attention distribution to obtain the output of the channel attention module.
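The weighting pipeline of claim 3 can be sketched in numpy as below. This is a hedged illustration, not the patented implementation: the one-dimensional convolution step is folded into the first dimension-reducing matrix `w1` for brevity, ReLU stands in for the unspecified nonlinear activation, and all weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, desc, w1, w2):
    """Weight each channel of feat (C x H x W) by an attention score computed
    from the per-channel descriptor vector desc.

    w1 (C*k -> r) plays the role of the dimension-reducing layers and
    w2 (r -> C) the dimension-raising fully connected layer of claim 3."""
    hidden = np.maximum(0.0, w1 @ desc)    # dimension reduction + ReLU
    weights = sigmoid(w2 @ hidden)         # raise to C dims, normalize to (0, 1)
    return feat * weights[:, None, None]   # one weight per channel, broadcast

rng = np.random.default_rng(0)
C, H, W, k, r = 4, 8, 8, 3, 2
feat = rng.random((C, H, W))
desc = rng.random(C * k)                   # e.g. 3 DCT components per channel
out = channel_attention(feat, desc,
                        rng.random((r, C * k)), rng.random((C, r)))
print(out.shape)                           # (4, 8, 8)
```

Because the sigmoid keeps every weight in (0, 1), each channel of the output is a scaled-down copy of the corresponding input channel, which is the weighting behavior the claim describes.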
4. The attention-mechanism-based image classification method according to claim 1, wherein the specific steps of calculating the spatial attention weight of each pixel of the feature map based on the self-attention mechanism and then performing a weighted summation over the spatial pixels of the feature map to obtain the spatial attention mechanism are:
computing three vectors, query, key and value, for each pixel of the feature map;
traversing each pixel of the input feature map, and computing the correlation between its query vector and the key vectors of all pixels of the input feature map to obtain a correlation distribution map;
performing a weighted summation over the value vectors of all pixels of the input feature map based on the correlation distribution map to obtain the pixel value at the corresponding position of the output feature map.
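The self-attention computation of claim 4 (query/key correlations followed by a weighted sum of value vectors) can be sketched in numpy as follows. The 1×1-convolution weight matrices `wq`, `wk`, `wv`, the softmax normalization and the scaling by the key dimension are common conventions assumed here, not details fixed by the claim.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(feat, wq, wk, wv):
    """Self-attention over the spatial positions of feat (C x H x W): every
    output pixel is the correlation-weighted sum of the value vectors of
    all input pixels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                    # N x C, one row per pixel
    q, k, v = x @ wq, x @ wk, x @ wv                # query/key/value, N x d each
    corr = softmax(q @ k.T / np.sqrt(k.shape[1]))   # N x N correlation map
    out = corr @ v                                  # weighted sum of values
    return out.T.reshape(-1, h, w)

rng = np.random.default_rng(0)
feat = rng.random((4, 8, 8))
out = spatial_self_attention(feat, *(rng.random((4, 4)) for _ in range(3)))
print(out.shape)                                    # (4, 8, 8)
```

Each row of the correlation map sums to 1, so every output pixel is a convex combination of the value vectors, matching the weighted-summation step of the claim.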
5. The attention-mechanism-based image classification method according to claim 4, wherein the query is a query vector representing information relevant to the learning task, the key is a key vector representing attributes of the pixel itself, and the value is a value vector representing the feature representation of the pixel.
6. The attention-mechanism-based image classification method according to claim 1, wherein the specific steps of embedding the channel attention mechanism and the spatial attention mechanism into ResNet to obtain the image classification convolutional neural network, and training the image classification convolutional neural network, are:
embedding channel attention into the shallow building-block stages of the network, conv2_x, conv3_x and conv4_x, and embedding spatial attention into the deep building-block stage of the network, conv5_x;
connecting the channel attention after the convolution module of the residual block, and replacing the 3×3 convolutional layer in the convolution module of the residual block with the spatial attention, to obtain the image classification convolutional neural network;
training the image classification convolutional neural network.
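The placement rule of claim 6 can be illustrated with a small symbolic sketch in which residual-block contents are just lists of layer names; the stage dictionary and layer labels below are hypothetical, standing in for ResNet's actual modules.

```python
def build_attended_resnet(stages):
    """Apply the claim's placement rule to symbolic residual-block stages:
    shallow stages get channel attention appended after the convolution
    module, while in the deep stage the 3x3 convolution is replaced by
    spatial attention."""
    shallow = {"conv2_x", "conv3_x", "conv4_x"}
    deep = {"conv5_x"}
    out = {}
    for name, block in stages.items():
        if name in shallow:
            out[name] = block + ["channel_attention"]
        elif name in deep:
            out[name] = ["spatial_attention" if layer == "conv3x3" else layer
                         for layer in block]
        else:
            out[name] = list(block)
    return out

# Hypothetical bottleneck blocks, one per stage, named after ResNet's stages.
stages = {"conv2_x": ["conv1x1", "conv3x3", "conv1x1"],
          "conv5_x": ["conv1x1", "conv3x3", "conv1x1"]}
attended = build_attended_resnet(stages)
print(attended["conv2_x"])  # ['conv1x1', 'conv3x3', 'conv1x1', 'channel_attention']
print(attended["conv5_x"])  # ['conv1x1', 'spatial_attention', 'conv1x1']
```

This division mirrors the intuition stated in the claim: channel attention augments the shallower stages, while self-attention, which is quadratic in the number of pixels, is only substituted into the deepest, lowest-resolution stage.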
CN202110517855.1A 2021-05-12 2021-05-12 Image classification method based on attention mechanism Pending CN113408577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110517855.1A CN113408577A (en) 2021-05-12 2021-05-12 Image classification method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110517855.1A CN113408577A (en) 2021-05-12 2021-05-12 Image classification method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN113408577A true CN113408577A (en) 2021-09-17

Family

ID=77678423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110517855.1A Pending CN113408577A (en) 2021-05-12 2021-05-12 Image classification method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113408577A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084794A (en) * 2019-04-22 2019-08-02 华南理工大学 A kind of cutaneum carcinoma image identification method based on attention convolutional neural networks
WO2019153908A1 (en) * 2018-02-11 2019-08-15 北京达佳互联信息技术有限公司 Image recognition method and system based on attention model
CN110309800A (en) * 2019-07-05 2019-10-08 中国科学技术大学 A kind of forest fires smoke detection method and device
CN111353539A (en) * 2020-02-29 2020-06-30 武汉大学 Cervical OCT image classification method and system based on double-path attention convolutional neural network
CN111651504A (en) * 2020-06-03 2020-09-11 湖南大学 Multi-element time sequence multilayer space-time dependence modeling method based on deep learning
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
CN112767451A (en) * 2021-02-01 2021-05-07 福州大学 Crowd distribution prediction method and system based on double-current convolutional neural network


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ARAVIND SRINIVAS et al.: "Bottleneck Transformers for Visual Recognition", HTTPS://ARXIV.ORG/ABS/2101.11605V1 *
朱迎新: "Research on Person Re-identification Technology Based on Spatial and Channel Attention Mechanisms", China Master's Theses Full-text Database, Information Science and Technology *
李娜 et al.: "Pedestrian Attribute Recognition Algorithm Based on Multi-scale Attention Network", Laser & Optoelectronics Progress *
湃森: "Rethinking the Attention Mechanism from a Frequency-Domain Perspective: FcaNet", HTTPS://ZHUANLAN.ZHIHU.COM/P/339215696 *
陶威: "Research on EEG Emotion Recognition Methods Based on Attention Mechanism", China Doctoral and Master's Dissertations Full-text Database (Master's), Medicine and Health Sciences *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118140A (en) * 2021-10-29 2022-03-01 新黎明科技股份有限公司 Multi-view intelligent fault diagnosis method and system for explosion-proof motor bearing
CN113822246A (en) * 2021-11-22 2021-12-21 山东交通学院 Vehicle re-identification method based on global reference attention mechanism
CN113822246B (en) * 2021-11-22 2022-02-18 山东交通学院 Vehicle re-identification method based on global reference attention mechanism
CN114064954A (en) * 2022-01-18 2022-02-18 北京中科开迪软件有限公司 Method and system for cleaning images in optical disk library
CN114064954B (en) * 2022-01-18 2022-05-10 北京中科开迪软件有限公司 Method and system for cleaning images in optical disk library
CN115067945A (en) * 2022-08-22 2022-09-20 深圳市海清视讯科技有限公司 Fatigue detection method, device, equipment and storage medium
CN117422939A (en) * 2023-12-15 2024-01-19 武汉纺织大学 Breast tumor classification method and system based on ultrasonic feature extraction
CN117422939B (en) * 2023-12-15 2024-03-08 武汉纺织大学 Breast tumor classification method and system based on ultrasonic feature extraction
CN117635962A (en) * 2024-01-25 2024-03-01 云南大学 Multi-frequency fusion-based channel attention image processing method
CN117635962B (en) * 2024-01-25 2024-04-12 云南大学 Multi-frequency fusion-based channel attention image processing method

Similar Documents

Publication Publication Date Title
CN113408577A (en) Image classification method based on attention mechanism
CN106991646B (en) Image super-resolution method based on dense connection network
CN109949255B (en) Image reconstruction method and device
CN112132023A (en) Crowd counting method based on multi-scale context enhanced network
US11132392B2 (en) Image retrieval method, image retrieval apparatus, image retrieval device and medium
CN110796166B (en) Attention mechanism-based multitask image processing method
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN113011329A (en) Pyramid network based on multi-scale features and dense crowd counting method
CN108197669B (en) Feature training method and device of convolutional neural network
WO2021164725A1 (en) Method and device for removing moiré patterns
CN109685772B (en) No-reference stereo image quality evaluation method based on registration distortion representation
CN113610146A (en) Method for realizing image classification based on knowledge distillation enhanced by interlayer feature extraction
CN111340077A (en) Disparity map acquisition method and device based on attention mechanism
CN111339862A (en) Remote sensing scene classification method and device based on channel attention mechanism
CN111353988A (en) KNN dynamic self-adaptive double-image convolution image segmentation method and system
JP2019197445A (en) Image recognition device, image recognition method, and program
CN105160679A (en) Local three-dimensional matching algorithm based on combination of adaptive weighting and image segmentation
CN117058235A (en) Visual positioning method crossing various indoor scenes
CN111598781B (en) Image super-resolution method based on hybrid high-order attention network
CN114830168A (en) Image reconstruction method, electronic device, and computer-readable storage medium
CN111695470A (en) Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition
CN112634161B (en) Reflected light removing method based on two-stage reflected light eliminating network and pixel loss
CN114119698B (en) Unsupervised monocular depth estimation method based on attention mechanism
CN116486203B (en) Single-target tracking method based on twin network and online template updating
CN111223120B (en) Point cloud semantic segmentation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917
