CN113408577A - Image classification method based on attention mechanism - Google Patents
- Publication number: CN113408577A (application number CN202110517855.1A)
- Authority: CN (China)
- Prior art keywords: channel; attention; attention mechanism; feature map; image classification
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045—Neural networks; combinations of networks
- G06N3/048—Neural networks; activation functions
- G06N3/08—Neural networks; learning methods
Abstract
The invention relates to the field of image data processing and discloses an image classification method based on an attention mechanism. The method decomposes each channel of the feature map into frequency components by a discrete cosine transform, jointly represents the global information of each channel with several of these components, and computes channel attention weight information from that information; weighting each channel of the feature map with this information yields the channel attention mechanism. A spatial attention weight is then computed for each pixel of the feature map, and a weighted sum over the spatial pixels of the feature map yields the spatial attention mechanism. Both the channel attention mechanism and the spatial attention mechanism are embedded into ResNet to obtain an image classification convolutional neural network, which is then trained. By combining several frequency components, the channel attention represents the global information of a channel better than a single component; the spatial attention uses a self-attention mechanism to capture global information along the spatial dimensions of the feature map, yielding a spatial weight distribution superior to spatial attention implemented with conventional convolutions.
Description
Technical Field
The invention relates to the field of image data processing, and in particular to an image classification method based on an attention mechanism.
Background
The invention designs an image classification method based on a convolutional neural network and embeds a novel attention mechanism in it. The backbone of the convolutional neural network is a residual network, and the attention mechanism comprises channel attention and spatial attention, so the relevant background art covers three topics: residual networks; channel attention mechanisms; and spatial attention mechanisms.
A residual network is a neural network characterised by shortcut connections, in which the outputs of layers at different depths are added together and used as the input of a subsequent layer. On the one hand, this connection pattern makes it easier for the network to fit complex functions; on the other hand, it can realise an identity mapping, so the performance of the network does not degrade as depth increases and deeper architectures can be trained. Because residual networks have good feature extraction capability, many deep learning tasks, such as object detection, image classification and video understanding, use them as backbone networks for feature extraction. The residual network used in the present invention is ResNet.
The feature map output by each layer of a convolutional neural network comprises several channels, each of which captures one visual feature of the input image. For many deep learning tasks, including image classification, different visual features of the input image contribute differently to the task. If a convolutional neural network can pay more attention to the important features, it can handle more complex learning tasks with limited network capacity. A channel attention mechanism assigns different weights to different channels of the feature map, thereby attending to different visual features to different degrees. Mainstream channel attention mechanisms generally compute the global information of each channel, model the importance of each channel based on that information, derive the weights of the channels from their importance, and finally weight the channels, thereby highlighting important features and suppressing irrelevant ones.
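The mainstream pipeline described above (global pooling, an importance model, then per-channel weighting) can be sketched in a few lines of numpy. The bottleneck matrices W1 and W2 below are random stand-ins for weights that would normally be learned end to end, and all shapes are illustrative rather than taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
F = rng.standard_normal((C, H, W))   # a toy feature map

# Step 1: global information per channel via global average pooling.
z = F.mean(axis=(1, 2))              # shape (C,)

# Step 2: model channel importance with a small bottleneck; W1 and W2 are
# random stand-ins for learned parameters.
W1 = rng.standard_normal((C // 2, C))
W2 = rng.standard_normal((C, C // 2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))   # per-channel weights in (0, 1)

# Step 3: weight each channel of the feature map by its attention weight.
A = F * w[:, None, None]
```

In a trained network the pooling, bottleneck and weighting would run inside each block on learned features; the sketch only shows the data flow.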
The feature map output by each layer of a convolutional neural network also contains spatial information: each "pixel" of the feature map corresponds to a region of the input image. When a visual feature appears in some region of the input image, a large activation value appears at the corresponding pixel of the corresponding channel of the feature map, so different spatial positions of the feature map reflect features at different spatial positions of the input image. As with channel attention, features at different spatial locations of the input image have different importance to the learning task, and if the convolutional neural network can pay more attention to the important regions of the image, it can handle more complex learning tasks with limited network capacity. A spatial attention mechanism assigns different weights to different spatial positions of the feature map, thereby attending to different regions of the input image to different degrees. Mainstream spatial attention mechanisms generally compute the global information of each spatial position of the feature map along the channel dimension, then use an additional convolutional layer to generate a spatial attention map, in which each pixel represents the weight of one spatial position, and finally weight the spatial positions of the feature map with this map, highlighting the features of important regions of the image and weakening those of irrelevant regions.
Many image classification methods improve the classification performance of a model by embedding channel attention mechanisms and spatial attention mechanisms into the neural network. In existing channel attention mechanisms, global information is commonly extracted with global average pooling or global max pooling, but both methods lose information and cannot fully extract the global information of a channel; the weight distribution of the channel attention mechanism is therefore suboptimal, which limits the expressive power of the features extracted by the convolutional neural network.
In existing spatial attention mechanisms, an ordinary convolutional layer is commonly used to compute the spatial attention distribution, but such a layer is limited by the size of its kernel and cannot extract global information along the spatial dimensions; the weight distribution of the spatial attention mechanism is therefore not globally optimal, which also limits the expressive power of the features extracted by the convolutional neural network.
Disclosure of Invention
The aim of the invention is to provide an image classification method based on an attention mechanism. It designs better representations of global information for the channel attention mechanism and the spatial attention mechanism respectively, embeds both attention mechanisms into ResNet to improve ResNet's image classification performance, and balances the gain in classification performance against the increase in computation by optimising where the attention mechanisms are embedded.
In order to achieve the above aim, the present invention provides an image classification method based on an attention mechanism, comprising: performing frequency decomposition on each channel of the feature map based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel global information by the plurality of frequency components;
calculating channel attention weight information based on the channel global information, and weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism;
calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism;
embedding the channel attention mechanism and the spatial attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
The specific steps of performing frequency decomposition on each channel of the feature map based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel global information by the plurality of frequency components, are as follows:
calculating the two-dimensional discrete cosine transform of each channel of the feature map to obtain a plurality of frequency components;
selecting three frequency components and concatenating them into a vector.
The specific steps of calculating the channel attention weight information based on the channel global information and weighting each channel of the feature map based on the weight information to obtain the channel attention mechanism are as follows:
reducing the dimension of the vector by a one-dimensional convolution;
reducing the dimension of the vector obtained by the one-dimensional convolution again by a fully-connected layer;
processing the vector reduced by the fully-connected layer with a nonlinear activation function;
expanding the vector output by the nonlinear activation function, through another fully-connected layer, to a dimension equal to the number of feature map channels, and normalising it with a sigmoid function to obtain the channel attention distribution;
weighting the feature map according to the channel attention distribution to obtain the output of the channel attention module.
The specific steps of calculating a spatial attention weight of each pixel of the feature map based on the self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain the spatial attention mechanism, are as follows:
calculating three vectors of query, key and value for each pixel of the feature map;
traversing each pixel of the input feature map, and calculating the correlation between each query vector and the key vectors of all pixels of the input feature map to obtain a correlation distribution map;
and carrying out weighted summation on the value vectors of all the pixels of the input feature map based on the correlation distribution map to obtain the pixel value at the corresponding position in the output feature map.
The query is a query vector and represents information related to a learning task, the key is a key vector and represents the attribute of the pixel, and the value is a value vector and represents the feature representation of the pixel.
Embedding a channel attention mechanism and a space attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network specifically comprises the following steps:
embedding channel attention into the shallow building block groups of the network, conv2_x, conv3_x and conv4_x, and spatial attention into the deep building block group, conv5_x;
attaching channel attention after the convolution module of the residual block, and replacing the 3×3 convolutional layer in the convolution module of the residual block with spatial attention, to obtain the image classification convolutional neural network;
and training the image classification convolutional neural network.
The invention discloses an image classification method based on an attention mechanism, comprising the following steps: performing frequency decomposition on each channel of the feature map based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel global information by the plurality of frequency components; calculating channel attention weight information based on the channel global information, and weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism; calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism; embedding the channel attention mechanism and the spatial attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
Thereby having the following advantages:
1. In the channel attention, a plurality of frequency components are obtained by the discrete cosine transform; because these components are complementary, combining them represents the global information of a channel better than any single component.
2. In the spatial attention, a self-attention mechanism is adopted to obtain global information along the spatial dimensions of the feature map. Since every output neuron of the self-attention mechanism has a global receptive field, a spatial weight distribution superior to spatial attention implemented with conventional convolutions can be obtained.
3. Channel attention and spatial attention are embedded into the shallow and deep layers of the convolutional neural network respectively. Because the shallow layers of the network have few channels and the deep layers have small spatial dimensions, the two embedded attention mechanisms add little extra computation, yet the network can exploit the advantages of both attention mechanisms and improve its image classification performance.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a channel attention calculation method of the present invention;
FIG. 2 is a block diagram of the spatial attention of the present invention;
FIG. 3 is a graph of the comparison between the residual block of the present invention after embedding the attention module and the ResNet original residual block;
FIG. 4 is a schematic illustration of the embedded position of the channel attention and spatial attention in ResNet of the present invention;
FIG. 5 is a flow chart of an attention-based image classification method of the present invention;
FIG. 6 is a flowchart of the present invention, in which each channel of the feature map is frequency-decomposed based on discrete cosine transform to obtain a plurality of frequency components, and the global information of the channel is jointly represented by the plurality of frequency components;
FIG. 7 is a flowchart of the present invention for computing channel attention weight information based on channel global information, and weighting each channel of a feature map based on the weight information to obtain a channel attention mechanism;
FIG. 8 is a flow chart of calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and then weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism, according to the present invention;
FIG. 9 is a flow chart of embedding a channel attention mechanism and a spatial attention mechanism into ResNet to obtain an image classification convolutional neural network and training the image classification convolutional neural network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1 to 9, the present invention provides an image classification method based on attention mechanism, including:
S101, performing frequency decomposition on each channel of the feature map based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel global information by the plurality of frequency components;
The specific steps are as follows:
S201, computing the two-dimensional discrete cosine transform of each channel of the feature map to obtain a plurality of frequency components;
To compute channel attention, the global information within each channel must first be obtained. The invention applies a discrete cosine transform to each channel of the feature map and then jointly represents the global information of a channel with several frequency components. The two-dimensional discrete cosine transform can be written as:

$$X_{h,w}^{k}=\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}F_k(i,j)\cos\left(\frac{\pi h}{H}\left(i+\frac{1}{2}\right)\right)\cos\left(\frac{\pi w}{W}\left(j+\frac{1}{2}\right)\right)\quad(1)$$

where F denotes the feature map; C, W and H denote its number of channels, width and height respectively; F_k(i, j) is the value at position (i, j) of the k-th channel of the feature map; and X_{h,w}^{k} is the (h, w) component of the discrete cosine transform spectrum of channel F_k.
In existing channel attention mechanisms, global average pooling is commonly used to obtain the global information of a channel; it is defined as:

$$\mathrm{gap}(F_k)=\frac{1}{HW}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}F_k(i,j)\quad(2)$$

Combining equations (1) and (2), the lowest frequency component of the discrete cosine transform, X_{0,0}^{k}, is:

$$X_{0,0}^{k}=\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}F_k(i,j)=HW\cdot\mathrm{gap}(F_k)\quad(3)$$

As equation (3) shows, the lowest frequency component of the discrete cosine transform is proportional to the result of global average pooling, which means that the global information extracted from each channel of the feature map by existing channel attention mechanisms is only the lowest frequency component of that channel.
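The proportionality stated in equation (3) can be checked numerically. The sketch below assumes the unnormalised DCT-II convention of equation (1), with a random channel standing in for real feature data:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 6, 6
Fk = rng.standard_normal((H, W))     # one channel of a feature map

def dct2_component(x, h, w):
    """(h, w) frequency component of the unnormalised 2-D DCT-II of x, as in equation (1)."""
    H, W = x.shape
    i = np.arange(H)[:, None]
    j = np.arange(W)[None, :]
    basis = np.cos(np.pi * h * (i + 0.5) / H) * np.cos(np.pi * w * (j + 0.5) / W)
    return float((x * basis).sum())

lowest = dct2_component(Fk, 0, 0)    # at (0, 0) every cosine factor equals 1
gap = Fk.mean()                      # global average pooling of the channel

# Equation (3): the lowest frequency component is H*W times the pooled value.
assert np.isclose(lowest, H * W * gap)
```
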
S202, selecting 3 frequency components to splice into a vector;
In computing channel attention, taking all frequency components into account would in theory give the optimal result. However, the discrete cosine transform produces as many frequency components as the original signal has dimensions, i.e. for a feature map F ∈ R^{C×H×W} there are C×H×W frequency components; computing all of them would make the computation prohibitively expensive, and many of the components are small in magnitude and can be ignored when computing channel attention.
Moreover, because the low-frequency components of an image contribute more to classification than the high-frequency ones, when computing channel attention the method uses for each channel only the lowest-frequency components of the discrete cosine transform, X_{0,0}^{k}, X_{0,1}^{k} and X_{1,0}^{k}. The channel attention computation is shown in FIG. 1. Feature map F is the input of the attention module and feature map A is its output. Channel attention is divided into two steps: global information extraction and attention distribution computation. During global information extraction, the three frequency components X_{0,0}^{k}, X_{0,1}^{k} and X_{1,0}^{k} of the k-th channel of the feature map are computed according to equation (1); the components of all channels are collected into three vectors, denoted T_{0,0}, T_{0,1} and T_{1,0}, which are then concatenated to obtain an output of dimension 3×C.
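The global information extraction step described above can be sketched as follows; the feature map is random test data, and the helper function assumes the unnormalised DCT-II of equation (1):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 6, 6
F = rng.standard_normal((C, H, W))   # toy feature map

def dct2_component(x, h, w):
    """(h, w) frequency component of the unnormalised 2-D DCT-II of x."""
    H, W = x.shape
    i = np.arange(H)[:, None]
    j = np.arange(W)[None, :]
    basis = np.cos(np.pi * h * (i + 0.5) / H) * np.cos(np.pi * w * (j + 0.5) / W)
    return float((x * basis).sum())

# Collect the three lowest frequencies of every channel into T_{0,0}, T_{0,1}
# and T_{1,0}, then concatenate them into a 3*C global-information descriptor.
freqs = [(0, 0), (0, 1), (1, 0)]
T = np.array([[dct2_component(F[k], h, w) for k in range(C)]
              for (h, w) in freqs])   # shape (3, C)
descriptor = T.reshape(-1)           # dimension 3*C
```

The descriptor plays the role of the 3×C output that is fed into the weight computation of step S102.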
S102, calculating channel attention weight information based on the channel global information; weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism;
the method comprises the following specific steps:
s301, reducing the dimension of the vector by using one-dimensional convolution;
wherein the one-dimensional convolution layer has kernel size C, stride C and C/r1 filters, so the result is a one-dimensional vector of dimension 3C/r1, where r1 is a hyperparameter with r1 > 1. This step reduces the redundancy of the channel information.
S302, reducing the dimension of the vector obtained by the one-dimensional convolution again with a fully-connected layer;
the fully-connected layer reduces the vector to a dimension of 3C/(r1·r2), where r2 is also a hyperparameter and is a multiple of 3.
S303, processing the vectors subjected to the dimension reduction of the full connection layer by a nonlinear activation function;
S304, expanding the vector output by the nonlinear activation function, through another fully-connected layer, to a dimension equal to the number of channels of the feature map, and normalising it with a sigmoid function to obtain the channel attention distribution;
the vector output by the nonlinear activation function is expanded to dimension 1×C by a fully-connected layer, and a sigmoid function normalises all its elements to [0, 1]; the normalised vector is the channel attention distribution, each element of which represents the weight of one channel of the feature map.
S305, weighting the characteristic diagram according to the channel attention distribution to obtain the output of the channel attention module.
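Steps S301 to S305 can be sketched end to end as follows. The convolution kernels and fully-connected matrices are random stand-ins for learned parameters, the descriptor t stands in for the concatenated 3×C DCT components, and the hyperparameter values r1 = 2, r2 = 3 are chosen only so that the toy dimensions divide evenly:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
r1, r2 = 2, 3                         # hyperparameters (illustrative values)
F = rng.standard_normal((C, H, W))
t = rng.standard_normal(3 * C)        # stand-in for the 3*C DCT descriptor

# S301: 1-D convolution with kernel size C, stride C and C/r1 filters.
# Each of the three C-long segments of t maps to C/r1 outputs: 3*C/r1 in total.
K = rng.standard_normal((C // r1, C))            # hypothetical learned kernels
conv = np.concatenate([K @ t[s:s + C] for s in range(0, 3 * C, C)])

# S302-S304: FC reduction to 3*C/(r1*r2), a ReLU, then FC expansion back to C
# followed by a sigmoid to get per-channel weights in (0, 1).
W1 = rng.standard_normal((3 * C // (r1 * r2), conv.size))
W2 = rng.standard_normal((C, 3 * C // (r1 * r2)))
hidden = np.maximum(W1 @ conv, 0.0)
weights = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))

# S305: weight the feature map channel by channel.
A = F * weights[:, None, None]
```

A learned version would train K, W1 and W2 jointly with the backbone; the sketch only demonstrates the shapes and data flow.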
S103, calculating a spatial attention weight of each pixel of the feature map based on the self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism;
the invention adopts a self-attention mechanism to realize a space attention mechanism, and the method comprises the following specific steps:
s401, calculating three vectors of query, key and value for each pixel of the feature map based on a self-attention mechanism;
the query is a query vector representing information related to a learning task, the key is a key vector representing an attribute of the pixel itself, and the value is a value vector representing a feature representation of the pixel.
Let F be the feature map input to the self-attention mechanism and A the output feature map, where F_j is the feature vector at the j-th spatial position of F. To compute the query, key and value vectors of each feature vector, each one is multiplied by three matrices W_θ, W_φ and W_g respectively, as shown in equation (5):

$$\theta(F_j)=W_\theta F_j,\qquad \phi(F_j)=W_\phi F_j,\qquad g(F_j)=W_g F_j\quad(5)$$

where θ(F_j), φ(F_j) and g(F_j) denote the query, key and value vectors of F_j respectively. W_θ, W_φ and W_g are all learnable matrices, implemented in a convolutional neural network as 1×1 convolutions.
S402, traversing each pixel of the input feature map, and calculating the correlation between the query vector of each pixel and the key vectors of all pixels of the input feature map to obtain a correlation distribution map;
As shown in equation (6), for the pixel at the i-th position of the feature map the correlation distribution map is denoted M_i ∈ R^{H×W}, and its j-th position is:

$$M_i(j)=\frac{\exp\left(\theta(F_i)^{T}\phi(F_j)\right)}{\sum_{j'}\exp\left(\theta(F_i)^{T}\phi(F_{j'})\right)}\quad(6)$$

In equation (6), exp(θ(F_i)^T φ(F_j)) computes the correlation between the query vector at the i-th position and the key vector at the j-th position of the feature map, and the denominator is the normalisation coefficient of the correlations.
S403, carrying out weighted summation on the value vectors of all the pixels of the input feature map based on the correlation distribution map to obtain pixel values at corresponding positions in the output feature map.
As shown in equation (7), the pixel value at the ith position of the output feature map is:
S105, embedding the channel attention mechanism and the spatial attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
The method comprises the following specific steps:
S601, embedding channel attention into the shallow building block groups of the network, conv2_x, conv3_x and conv4_x, and spatial attention into the deep building block group, conv5_x;
The positions at which the attention mechanisms of the invention are embedded into the ResNet network are shown in FIG. 4. conv1 is the first convolutional layer of ResNet; the shallow blocks in the figure refer to conv2_x, conv3_x and conv4_x of ResNet, and the deep blocks to conv5_x. ×N_s indicates that the structure inside the dashed box is repeated N_s times, where the value of N_s is determined by the ordinal number of the building block group and the depth of the ResNet; for example, in ResNet50 the values of N_s for conv2_x, conv3_x, conv4_x and conv5_x are 3, 4, 6 and 3 respectively.
S602, attaching channel attention after the convolution module of the residual block, while spatial attention replaces the 3×3 convolutional layer in the convolution module of the residual block.
The manner in which the attention mechanisms of the invention are embedded into the ResNet network is shown in FIG. 3, which compares the residual block structures after embedding channel attention and spatial attention respectively with the original residual block structure: (a) is the original residual block, (b) is the residual block after embedding channel attention, and (c) is the residual block after embedding spatial attention.
S603, training the image classification convolution neural network obtained in the step.
Training the ResNet with the embedded attention mechanisms, using an ordinary training method for convolutional neural networks in image classification, realises the image classification method based on the attention mechanism.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. An attention mechanism-based image classification method is characterized in that,
the method comprises the following steps: carrying out frequency decomposition on each channel of the characteristic diagram based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing channel global information by using the frequency components;
calculating channel attention weight information based on the channel global information, and weighting each channel of the feature map based on the weight information to obtain a channel attention mechanism;
calculating a spatial attention weight of each pixel of the feature map based on a self-attention mechanism, and weighting and summing the spatial pixels of the feature map to obtain a spatial attention mechanism;
embedding a channel attention mechanism and a space attention mechanism into ResNet to obtain an image classification convolutional neural network, and training the image classification convolutional neural network.
2. The method of image classification based on attention mechanism as claimed in claim 1,
the specific steps of performing frequency decomposition on each channel of the characteristic diagram based on discrete cosine transform to obtain a plurality of frequency components, and jointly representing the channel global by using the plurality of frequency components are as follows:
calculating two-dimensional discrete cosine transform for each channel of the characteristic diagram to obtain a plurality of frequency components;
the 3 frequency components are selected to be spliced into a vector.
3. The method of image classification based on an attention mechanism as claimed in claim 2, wherein the specific steps of calculating channel attention weight information from the channel global information, and weighting each channel of the feature map by that weight information to obtain the channel attention mechanism, are as follows:
reducing the dimension of the vector with a one-dimensional convolution;
further reducing the dimension of the vector produced by the one-dimensional convolution with a fully-connected layer;
passing the dimension-reduced vector through a nonlinear activation function;
raising the vector output by the nonlinear activation function, through another fully-connected layer, to a dimension equal to the number of feature-map channels, and normalizing with a sigmoid function to obtain the channel attention distribution;
and weighting the feature map by the channel attention distribution to obtain the output of the channel attention module.
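The steps of claim 3 can be sketched in NumPy as follows. This is an illustration only: the random matrices stand in for learned parameters, and the specific shapes (a kernel-3, stride-3 one-dimensional convolution that maps the 3C descriptor to C values, and a reduction ratio of 2 in the bottleneck) are assumptions not fixed by the claim.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, desc, rng=np.random.default_rng(0)):
    """feat: (C, H, W) feature map; desc: (3C,) frequency descriptor
    from claim 2. Returns the reweighted feature map and the weights."""
    c = feat.shape[0]
    # 1) one-dimensional convolution (kernel 3, stride 3): 3C -> C
    kernel = rng.standard_normal(3)
    reduced = np.array([desc[i * 3:(i + 1) * 3] @ kernel for i in range(c)])
    # 2) fully-connected layer reduces dimension (assumed ratio r = 2)
    w1 = rng.standard_normal((c // 2, c))
    hidden = w1 @ reduced
    # 3) nonlinear activation (ReLU assumed)
    hidden = np.maximum(hidden, 0.0)
    # 4) fully-connected layer restores dimension C; sigmoid normalizes
    w2 = rng.standard_normal((c, c // 2))
    attn = sigmoid(w2 @ hidden)              # (C,), one weight per channel
    # 5) weight each channel of the feature map
    return feat * attn[:, None, None], attn
```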
4. The method of image classification based on an attention mechanism as claimed in claim 1, wherein the specific steps of calculating the spatial attention weight of each pixel of the feature map based on the self-attention mechanism and then taking a weighted sum over the spatial pixels of the feature map to obtain the spatial attention mechanism are as follows:
computing three vectors, query, key, and value, for each pixel of the feature map;
traversing each pixel of the input feature map, and computing the correlation between its query vector and the key vectors of all pixels of the input feature map to obtain a correlation map;
and taking a weighted sum of the value vectors of all pixels of the input feature map, based on the correlation map, to obtain the pixel value at the corresponding position of the output feature map.
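The spatial self-attention of claim 4 can be sketched as below. The random projection matrices stand in for the learned 1×1 convolutions that would produce the query, key, and value vectors, and the use of scaled dot-product with a softmax as the "correlation" is an assumption; the claim itself only requires a correlation map followed by a weighted sum.

```python
import numpy as np

def spatial_self_attention(feat, d_k=8, rng=np.random.default_rng(0)):
    """feat: (C, H, W). Treats every spatial position as a token and
    returns an output feature map of the same shape."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                 # (N, C), one row per pixel
    wq = rng.standard_normal((c, d_k))
    wk = rng.standard_normal((c, d_k))
    wv = rng.standard_normal((c, c))
    q, k, v = x @ wq, x @ wk, x @ wv             # per-pixel query/key/value
    # correlation of each query with the keys of all pixels
    scores = q @ k.T / np.sqrt(d_k)              # (N, N)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax -> correlation map
    out = attn @ v                               # weighted sum of values
    return out.T.reshape(c, h, w)
```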
5. The method of image classification based on an attention mechanism as claimed in claim 4, wherein the query is a query vector representing information relevant to the learning task, the key is a key vector representing the attributes of the pixel itself, and the value is a value vector representing the feature representation of the pixel.
6. The method of image classification based on an attention mechanism as claimed in claim 1, wherein the specific steps of embedding the channel attention mechanism and the spatial attention mechanism into ResNet to obtain an image classification convolutional neural network and training the image classification convolutional neural network are as follows:
embedding channel attention into the shallow building-block stages of the network, conv2_x, conv3_x, and conv4_x, and embedding spatial attention into the deep building-block stage of the network, conv5_x;
appending the channel attention after the convolution module of each residual block, and replacing the 3×3 convolutional layer in the convolution module of the residual block with the spatial attention, to obtain the image classification convolutional neural network;
and training the image classification convolutional neural network.
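The placement rule of claim 6 can be expressed as a small helper, assuming the standard ResNet stage names conv2_x through conv5_x; the returned strings are illustrative descriptions, not API of any library.

```python
def attention_for_stage(stage: str) -> str:
    """Placement rule from claim 6: channel attention is appended after
    the convolution module of each residual block in the shallow stages;
    spatial attention replaces the 3x3 convolution in the deep stage."""
    shallow = {"conv2_x", "conv3_x", "conv4_x"}
    if stage in shallow:
        return "channel attention after the residual block's convolutions"
    if stage == "conv5_x":
        return "spatial attention replacing the 3x3 convolution"
    return "no attention"
```

The split follows the usual intuition: shallow stages have many spatial positions but comparatively coarse channels, so cheap channel attention is used there, while the quadratic-cost spatial self-attention is reserved for the deep stage, where the spatial resolution is small.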
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110517855.1A CN113408577A (en) | 2021-05-12 | 2021-05-12 | Image classification method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113408577A true CN113408577A (en) | 2021-09-17 |
Family
ID=77678423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110517855.1A Pending CN113408577A (en) | 2021-05-12 | 2021-05-12 | Image classification method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113408577A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084794A (en) * | 2019-04-22 | 2019-08-02 | 华南理工大学 | A kind of cutaneum carcinoma image identification method based on attention convolutional neural networks |
WO2019153908A1 (en) * | 2018-02-11 | 2019-08-15 | 北京达佳互联信息技术有限公司 | Image recognition method and system based on attention model |
CN110309800A (en) * | 2019-07-05 | 2019-10-08 | 中国科学技术大学 | A kind of forest fires smoke detection method and device |
CN111353539A (en) * | 2020-02-29 | 2020-06-30 | 武汉大学 | Cervical OCT image classification method and system based on double-path attention convolutional neural network |
CN111651504A (en) * | 2020-06-03 | 2020-09-11 | 湖南大学 | Multi-element time sequence multilayer space-time dependence modeling method based on deep learning |
CN111898709A (en) * | 2020-09-30 | 2020-11-06 | 中国人民解放军国防科技大学 | Image classification method and device |
CN112767451A (en) * | 2021-02-01 | 2021-05-07 | 福州大学 | Crowd distribution prediction method and system based on double-current convolutional neural network |
Non-Patent Citations (5)
Title |
---|
ARAVIND SRINIVAS et al.: "Bottleneck Transformers for Visual Recognition", https://arxiv.org/abs/2101.11605v1 * |
朱迎新: "Research on person re-identification technology based on spatial and channel attention mechanisms" (in Chinese), China Master's Theses Full-text Database, Information Science & Technology * |
李娜 et al.: "Pedestrian attribute recognition algorithm based on a multi-scale attention network" (in Chinese), Laser & Optoelectronics Progress * |
湃森: "Rethinking the attention mechanism from a frequency-domain perspective: FcaNet" (in Chinese), https://zhuanlan.zhihu.com/p/339215696 * |
陶威: "Research on EEG-based emotion recognition methods using attention mechanisms" (in Chinese), China Doctoral and Master's Dissertations Full-text Database (Master's), Medicine & Health Sciences * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118140A (en) * | 2021-10-29 | 2022-03-01 | 新黎明科技股份有限公司 | Multi-view intelligent fault diagnosis method and system for explosion-proof motor bearing |
CN113822246A (en) * | 2021-11-22 | 2021-12-21 | 山东交通学院 | Vehicle weight identification method based on global reference attention mechanism |
CN113822246B (en) * | 2021-11-22 | 2022-02-18 | 山东交通学院 | Vehicle weight identification method based on global reference attention mechanism |
CN114064954A (en) * | 2022-01-18 | 2022-02-18 | 北京中科开迪软件有限公司 | Method and system for cleaning images in optical disk library |
CN114064954B (en) * | 2022-01-18 | 2022-05-10 | 北京中科开迪软件有限公司 | Method and system for cleaning images in optical disk library |
CN115067945A (en) * | 2022-08-22 | 2022-09-20 | 深圳市海清视讯科技有限公司 | Fatigue detection method, device, equipment and storage medium |
CN117422939A (en) * | 2023-12-15 | 2024-01-19 | 武汉纺织大学 | Breast tumor classification method and system based on ultrasonic feature extraction |
CN117422939B (en) * | 2023-12-15 | 2024-03-08 | 武汉纺织大学 | Breast tumor classification method and system based on ultrasonic feature extraction |
CN117635962A (en) * | 2024-01-25 | 2024-03-01 | 云南大学 | Multi-frequency fusion-based channel attention image processing method |
CN117635962B (en) * | 2024-01-25 | 2024-04-12 | 云南大学 | Multi-frequency fusion-based channel attention image processing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113408577A (en) | Image classification method based on attention mechanism | |
CN106991646B (en) | Image super-resolution method based on dense connection network | |
CN109949255B (en) | Image reconstruction method and device | |
CN112132023A (en) | Crowd counting method based on multi-scale context enhanced network | |
US11132392B2 (en) | Image retrieval method, image retrieval apparatus, image retrieval device and medium | |
CN110796166B (en) | Attention mechanism-based multitask image processing method | |
CN112489164B (en) | Image coloring method based on improved depth separable convolutional neural network | |
CN113011329A (en) | Pyramid network based on multi-scale features and dense crowd counting method | |
CN108197669B (en) | Feature training method and device of convolutional neural network | |
WO2021164725A1 (en) | Method and device for removing moiré patterns | |
CN109685772B (en) | No-reference stereo image quality evaluation method based on registration distortion representation | |
CN113610146A (en) | Method for realizing image classification based on knowledge distillation enhanced by interlayer feature extraction | |
CN111340077A (en) | Disparity map acquisition method and device based on attention mechanism | |
CN111339862A (en) | Remote sensing scene classification method and device based on channel attention mechanism | |
CN111353988A (en) | KNN dynamic self-adaptive double-image convolution image segmentation method and system | |
JP2019197445A (en) | Image recognition device, image recognition method, and program | |
CN105160679A (en) | Local three-dimensional matching algorithm based on combination of adaptive weighting and image segmentation | |
CN117058235A (en) | Visual positioning method crossing various indoor scenes | |
CN111598781B (en) | Image super-resolution method based on hybrid high-order attention network | |
CN114830168A (en) | Image reconstruction method, electronic device, and computer-readable storage medium | |
CN111695470A (en) | Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition | |
CN112634161B (en) | Reflected light removing method based on two-stage reflected light eliminating network and pixel loss | |
CN114119698B (en) | Unsupervised monocular depth estimation method based on attention mechanism | |
CN116486203B (en) | Single-target tracking method based on twin network and online template updating | |
CN111223120B (en) | Point cloud semantic segmentation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210917 |