CN113537243A - Image classification method based on SE module and self-attention mechanism network - Google Patents
Image classification method based on SE module and self-attention mechanism network
- Publication number: CN113537243A
- Application number: CN202110839024.6A
- Authority: CN (China)
- Prior art keywords: Patch, module, attention mechanism, output, layer
- Prior art date: 2021-07-23
- Legal status: Withdrawn
Classifications
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 — Neural network architectures; combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
Abstract
The invention discloses an image classification method based on an SE module and a self-attention mechanism network. A picture is first sliced into a plurality of patches and a Patch0 is added as the classification feature; because identical patches at different positions can correspond to pictures of different categories, position information is also added. A convolutional neural network with the SE module extracts the internal features of each Patch, the self-attention mechanism then extracts the global features between patches, and the result is input into a multi-layer perceptron. The SE module, the self-attention mechanism and the multi-layer perceptron are stacked in series into a minimum unit layer, and L minimum unit layers are stacked to extract higher-level global features, giving the picture a richer feature representation. Finally, the Patch0 output vector of each minimum unit layer is taken out, the output vectors are given different weights and fused by weighted summation, so that higher-level local and global features of the whole picture are extracted; these features are used to classify the picture and obtain the category label corresponding to the picture.
Description
Technical Field
The invention relates to the technical field of image identification and deep learning, in particular to an image classification method based on an SE module and a self-attention mechanism network.
Background
Image classification is a very active research direction in the fields of computer vision, machine learning and deep learning, and is widely applied to face recognition, pedestrian detection, traffic scene object recognition, license plate recognition, automatic album classification and the like.
Image classification is an important basic task in the field of artificial intelligence and computer vision, and is also the basis of target detection; the accuracy of image classification influences the performance of subsequent tasks. At present there are machine-learning approaches such as support vector machine image classifiers, while mainstream deep learning image classification methods fall into two main categories. One is based on convolutional neural networks, including classical networks such as AlexNet, VGG, GoogLeNet, and ResNet; the other is based on self-attention mechanisms, such as the Vision Transformer and the Transformer in Transformer.
The method most similar to the present invention is the Vision Transformer, based on a self-attention mechanism: a whole picture is first sliced into a plurality of patches, global features between the patches are then extracted by the self-attention mechanism, and the features are passed on through a multi-layer perceptron. The self-attention mechanism and the multi-layer perceptron stack into an encoder layer; several encoder layers stack into the Vision Transformer architecture, and the Patch0 output of the last encoder layer is input to the softmax layer to obtain the image category prediction result.
The invention is also relatively similar to the Transformer in Transformer, likewise based on the self-attention mechanism: the encoder layers of two nested Transformers respectively extract the features between and inside each Patch, the two encoder layers form a module, and several such modules stack into the Transformer in Transformer architecture.
Disclosure of Invention
The invention aims to utilize the local features of each Patch as well as the global features of the Patch0 of each minimum unit layer, so that more features are exploited to improve classification accuracy; to this end it provides an image classification method based on an SE module and a self-attention mechanism network.
In order to realize the above purpose of the invention, the following technical scheme is adopted:
an image classification method based on an SE module and a self-attention mechanism network comprises the following steps:
S1: the input picture is converted into a matrix of a specified size, the data is converted into the tensor data type, and the data is then input into the model.
S2: the picture is sliced into a plurality of patches, Patch0 is added as the classification feature, and position information is added to enrich the feature representation.
S3: the internal features of each Patch are extracted using the SE module.
S4: the features between the patches are extracted using the self-attention mechanism.
S5: the output of the self-attention mechanism is fed into a two-layer MLP (multi-layer perceptron).
S6: the modules formed by S3, S4 and S5 are stacked in series to form the minimum unit layer of the method, and L minimum unit layers are stacked so that higher-level local and global features are extracted.
S7: the global features obtained in the above steps are used for classification.
Preferably, step S1 preprocesses the pictures; if the input pictures are few, data enhancement is applied, and the pictures are then converted into the tensor data type.
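As an illustration of this preprocessing step, the following is a minimal sketch assuming PyTorch/torchvision; the 224×224 target size is taken from the embodiment below, and the specific augmentation operations (flip, crop) are our assumptions, since the text only speaks of data enhancement in general:

```python
from torchvision import transforms

# Resize to the specified size, optionally augment, then convert to a tensor.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),          # matrix of the specified size
    transforms.RandomHorizontalFlip(),      # data enhancement (assumed op)
    transforms.RandomCrop(224, padding=4),  # data enhancement (assumed op)
    transforms.ToTensor(),                  # tensor data type, values in [0, 1]
])
```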
Preferably, in step S2 a picture of C×H×W pixels is sliced into N = (H/H1)×(W/W1) patches, each of size C×H1×W1, and each Patch is then flattened into a 1×CH1W1 vector. The slicing is implemented with convolution and Flatten operations, with input dimension (B, C, H, W) and output dimension (B, N, CH1W1), where C is the number of picture channels and B is the batch size. In addition, Patch0 is added as the classification feature, i.e. M = N + 1 patches, so the output dimension of this step is (B, M, CH1W1). Position information is added to each Patch (including Patch0), so that the self-attention mechanism can learn that, even for the same picture content, patches at different positions can yield different classification results.
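A minimal PyTorch sketch of this slicing step, under the dimensions stated above; the class name PatchEmbed, the zero initialization of Patch0, and the learnable position table are illustrative choices, not fixed by the text:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Slice (B, C, H, W) into M = N + 1 tokens of dimension C*H1*W1."""
    def __init__(self, c=3, h=224, w=224, h1=16, w1=16):
        super().__init__()
        self.dim = c * h1 * w1                        # CH1W1
        n = (h // h1) * (w // w1)                     # N patches
        # A conv whose kernel and stride equal the patch size implements the slicing.
        self.proj = nn.Conv2d(c, self.dim, kernel_size=(h1, w1), stride=(h1, w1))
        self.patch0 = nn.Parameter(torch.zeros(1, 1, self.dim))   # classification feature
        self.pos = nn.Parameter(torch.zeros(1, n + 1, self.dim))  # position information

    def forward(self, x):                             # x: (B, C, H, W)
        t = self.proj(x).flatten(2).transpose(1, 2)   # convolution + Flatten -> (B, N, CH1W1)
        t = torch.cat([self.patch0.expand(t.size(0), -1, -1), t], dim=1)  # (B, M, CH1W1)
        return t + self.pos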
Preferably, the internal feature extraction of the Patch in step S3 adopts a convolutional neural network, so the output dimension (B, M, CH1W1) of the previous step is reshaped to (B, M, C, H1, W1). To keep the input and output width and height the same, zero padding is used together with CH1W1 convolution kernels; the internal features of the Patch are extracted along CH1W1 dimensions, giving CH1W1 feature maps of size H1×W1, i.e. dimension (CH1W1, H1, W1). Global average pooling is then applied to each feature map, giving (CH1W1, 1, 1). Through the first linear layer the output dimension is set to dim = CH1W1/β, where β is a scaling factor, with ReLU activation; the formula is given as formula (1):

X2 = ReLU(X1W1 + b1)   formula (1)

where W1 ∈ R^(CH1W1×CH1W1/β) and b1 are trainable parameters, X1 is the input and X2 is the output. A second linear layer has input dimension dim = CH1W1/β and output dimension dim = CH1W1, with a softmax activation that yields the weight of each channel; the formula is given as formula (2):

X3 = softmax(X2W2 + b2)   formula (2)

where W2 ∈ R^(CH1W1/β×CH1W1) and b2 are trainable parameters, X2 is the input and X3 is the output. The feature map of each channel is multiplied by its weight and all feature maps are then summed, giving a fused feature map f of size 1×H1×W1 that contains the internal Patch feature information extracted along the CH1W1 dimensions; the formula is given as formula (3):

f = x31·c1 + x32·c2 + … + x3i·ci,  i = 1, 2, …, CH1W1   formula (3)

where x3i is an element of X3 and ci is a feature map extracted inside the Patch by the convolutional neural network. After Flatten, a linear layer raises the dimension back to CH1W1. The input and output dimensions of the SE module are therefore both (B, M, CH1W1); every picture in the batch shares the SE module, which reduces the number of parameters.
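A hedged PyTorch sketch of this SE module as just described; the class name SEPatchModule is ours, β = 16 and the 3×3 convolution kernel follow the embodiment below, and the final linear layer that raises f back to CH1W1 follows the text:

```python
import torch
import torch.nn as nn

class SEPatchModule(nn.Module):
    """Extract internal Patch features; input/output dims are (B, M, C*H1*W1)."""
    def __init__(self, c=3, h1=16, w1=16, beta=16):
        super().__init__()
        d = c * h1 * w1                                       # CH1W1 channels
        self.c, self.h1, self.w1, self.d = c, h1, w1, d
        self.conv = nn.Conv2d(c, d, 3, padding=1, bias=True)  # keeps H1 x W1 output
        self.fc1 = nn.Linear(d, d // beta)                    # formula (1), ReLU
        self.fc2 = nn.Linear(d // beta, d)                    # formula (2), softmax
        self.up = nn.Linear(h1 * w1, d)                       # raise f back to CH1W1

    def forward(self, x):                                     # x: (B, M, CH1W1)
        b, m, _ = x.shape
        p = x.reshape(b * m, self.c, self.h1, self.w1)
        maps = self.conv(p)                                   # (B*M, CH1W1, H1, W1)
        s = maps.mean(dim=(2, 3))                             # global average pooling
        w = torch.softmax(self.fc2(torch.relu(self.fc1(s))), dim=-1)
        f = (maps * w[:, :, None, None]).sum(dim=1)           # formula (3): (B*M, H1, W1)
        return self.up(f.flatten(1)).reshape(b, m, self.d)
```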
Preferably, step S4 extracts global features from different dimensions using a multi-head self-attention mechanism, which can be described by the following process:

First, three tensors Q, K, V are initialized through linear layers and dimension conversion; the goal is to train these three tensors. Their dimensions are all (B, H, M, D/H), where B denotes the batch size, H denotes the number of heads of the multi-head attention mechanism, M denotes the number of patches input to the attention mechanism (including Patch0), and D = CH1W1 denotes the dimension of each Patch. By formula (4),

W = softmax(QK^T/√(D/H))   formula (4)

the weight tensor W has dimension (B, H, M, M), where the element in row i, column j of the last two dimensions represents the weight of the i-th Patch with respect to the j-th Patch. Then

A = WV   formula (5)

so the dimension of A is (B, H, M, D/H); A aggregates the feature information of the whole picture, and the dimension of A is then converted to (B, M, D) and output to the next layer.
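A sketch of this multi-head self-attention step under the stated dimensions; the scaled dot-product form of formula (4), with scale √(D/H), is the standard construction assumed here:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d=768, heads=8):
        super().__init__()
        self.h, self.dk = heads, d // heads
        self.qkv = nn.Linear(d, 3 * d)                   # initialize Q, K, V via a linear layer
        self.out = nn.Linear(d, d)

    def forward(self, x):                                # x: (B, M, D)
        b, m, d = x.shape
        q, k, v = self.qkv(x).reshape(b, m, 3, self.h, self.dk) \
                             .permute(2, 0, 3, 1, 4)     # each (B, H, M, D/H)
        w = torch.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)  # formula (4)
        a = w @ v                                        # formula (5): (B, H, M, D/H)
        return self.out(a.transpose(1, 2).reshape(b, m, d))  # back to (B, M, D)
```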
Preferably, the two-layer perceptron of step S5 can be expressed by the following formulas:

First layer:

X5 = ReLU(X4W4 + b4)   formula (6)

where X4 is the input, X5 is the output, W4 ∈ R^(D×D/α) is a weight matrix, b4 is a bias, and α is a reduction factor; W4 and b4 are training parameters.

Second layer:

X6 = X5W5 + b5   formula (7)

where X5 is the input, X6 is the output, W5 ∈ R^(D/α×D) is a weight matrix, and b5 is a bias; W5 and b5 are training parameters. The output dimension of this step is also (B, M, D).
Preferably, as described in step S6, a normalization operation is applied before the input of each module, and each module then adds a Shortcut connection.
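A sketch of one minimum unit layer composed as described, assuming LayerNorm for the normalization and residual addition for the Shortcut connection; SEPatchModule, MultiHeadSelfAttention and TwoLayerMLP refer to the hypothetical sketches above:

```python
import torch.nn as nn

class MinimumUnitLayer(nn.Module):
    """SE module -> self-attention -> MLP, each with pre-normalization and Shortcut."""
    def __init__(self, d=768):
        super().__init__()
        self.n1, self.n2, self.n3 = nn.LayerNorm(d), nn.LayerNorm(d), nn.LayerNorm(d)
        self.se = SEPatchModule()
        self.attn = MultiHeadSelfAttention(d)
        self.mlp = TwoLayerMLP(d)

    def forward(self, x):                # x: (B, M, D)
        x = x + self.se(self.n1(x))      # normalize, apply module, add Shortcut
        x = x + self.attn(self.n2(x))
        return x + self.mlp(self.n3(x))
```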
Preferably, in step S7 the range attended to by each minimum unit layer's Patch0 is different, so the extracted global information differs; the Patch0 output by the multi-layer perceptron of each minimum unit layer is therefore taken out and recorded as ui ∈ R^(1×D), i = 1, 2, …, L. Then

P = k1u1 + k2u2 + … + kiui,  i = 1, 2, …, L   formula (9)

out = softmax(P)   formula (10)

where ki denotes the weight of ui, obtained from a trainable parameter e ∈ R^(D×1), and P represents the global feature fused from the Patch0 outputs of the minimum unit layers according to their different weights. Finally P is input into a softmax layer to obtain classification confidences, and the class with the highest confidence is taken as the prediction result.
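A sketch of this weighted fusion; the softmax-normalized weights from the trainable vector e follow the description, while the linear head mapping P to the class count (and num_classes itself) is our assumption, since the text only says P is input to a softmax layer:

```python
import torch
import torch.nn as nn

class Patch0Fusion(nn.Module):
    """Fuse the Patch0 outputs u_1..u_L of all minimum unit layers (formulas (9)-(10))."""
    def __init__(self, d=768, num_classes=1000):
        super().__init__()
        self.e = nn.Parameter(torch.zeros(d))        # trainable e in R^{D x 1}
        self.head = nn.Linear(d, num_classes)        # assumed linear head before softmax

    def forward(self, us):                           # us: (L, B, D) stacked Patch0 outputs
        k = torch.softmax(us @ self.e, dim=0)        # (L, B) weights k_i
        p = (k.unsqueeze(-1) * us).sum(dim=0)        # P: (B, D), formula (9)
        return torch.softmax(self.head(p), dim=-1)   # out, formula (10)
```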
Compared with the prior art, the invention has the following beneficial effects:
firstly, before the input of the self-attention mechanism, the SE module extracts internal features of each Patch, so the Patch vector representation input to the self-attention mechanism is richer; more features are utilized, the classification accuracy is improved, and the amount of computation is smaller than that of the Transformer in Transformer architecture.
secondly, the Patch0 output of each minimum unit layer is taken out and assigned a corresponding weight obtained by automatic learning; the Patch0 output of each minimum unit layer is multiplied by its weight and the results are added, so that the global features extracted by every minimum unit layer are utilized, the features input to the softmax layer are richer, and the classification accuracy is improved.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and the drawings are only for illustrative purposes and should not be construed as limiting the invention.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow diagram of the SE module of the present invention;
FIG. 3 is a flow chart of the minimum unit layer of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example (b):
an image classification method based on an SE module and a self-attention mechanism network is shown in FIG. 1; the model comprises 3 parts. The first part slices a picture into several patches and adds a Patch0 for the classification feature, together with position information. The second part is the minimum unit layer, which includes the SE module that extracts local features inside each Patch, the self-attention mechanism that extracts global features between patches, and the multi-layer perceptron; L minimum unit layers are stacked as needed. The third part takes the Patch0 output of each minimum unit layer, gives each a different weight, fuses them, and inputs the result into the softmax layer to obtain the prediction result.
A first part: assume a color picture of 3×224×224 pixels and patches of 3×16×16 pixels, giving N = (224/16)×(224/16) = 196 patches. The slicing is done with a convolution operation whose parameters are set as: convolution kernel 3×16×16, step size (16,16), no bias, and the number of convolution kernels set to CH1W1 = 768. The input dimension is (B,3,224,224); the feature map obtained through the convolution operation has dimension (B,768,14,14); after the Flatten operation the dimension is (B,768,196), and after exchanging the 1st and 2nd dimensions it is (B,196,768). Here C is the number of picture channels (3 for a color picture), H1 and W1 are the height and width of each Patch (16 in this embodiment), and B is the batch size. Patch0 is added, i.e. M = N + 1 = 197 patches; Patch0 begins as an all-zero vector and is trained into a vector representing the global feature, so the output dimension of this part is (B,197,768). Position information is added to each Patch (including Patch0) so that the self-attention mechanism can learn that, even for the same picture content, patches at different positions can yield different classification results.
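A quick numerical check of the dimension bookkeeping in this part (note that the stride equals the patch size, which is what yields the 14×14 grid):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 768, kernel_size=16, stride=16, bias=False)
x = torch.randn(2, 3, 224, 224)                 # (B, C, H, W), B = 2
tokens = conv(x).flatten(2).transpose(1, 2)     # (2, 196, 768): N = (224/16)^2
cls = torch.zeros(2, 1, 768)                    # Patch0, initialized as all zeros
print(torch.cat([cls, tokens], dim=1).shape)    # torch.Size([2, 197, 768]): M = 197
```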
A second part: first the SE module, whose flow chart is shown in FIG. 2. Because a convolutional neural network is used, the output dimension (B,197,768) of the previous step is reshaped to (B,197,3,16,16). The convolution parameters are set as: convolution kernels 3×3×3, step size (1,1), with bias, and zero padding of one row up and down and one column left and right so that the input and output width and height are the same; with 768 convolution kernels, the internal features of each Patch are extracted along 768 dimensions, giving 768 feature maps of size 16×16, i.e. dimension (768,16,16). Global average pooling is then applied to each feature map, giving (768,1,1). Through the first linear layer the output dimension is set to dim = 768/16 = 48, where the scaling factor is 16, with ReLU activation; the formula is given as formula (1):

X2 = ReLU(X1W1 + b1)   formula (1)

where W1 ∈ R^(768×48) and b1 are trainable parameters, X1 is the input and X2 is the output. The second linear layer, with input dimension dim = 48 and output dimension dim = 768, applies the softmax function to obtain the weight of each channel; the formula is given as formula (2):

X3 = softmax(X2W2 + b2)   formula (2)

where W2 ∈ R^(48×768) and b2 are trainable parameters, X2 is the input and X3 is the output. The feature map of each channel is multiplied by its weight and all feature maps are summed, giving a 1×16×16 feature map f that contains the fusion of the internal Patch feature information extracted along the 768 dimensions; the formula is given as formula (3):

f = x31·c1 + x32·c2 + … + x3i·ci,  i = 1, 2, …, 768   formula (3)

where x3i is an element of X3 and ci is a feature map extracted inside the Patch by the convolutional neural network. After Flatten, a linear layer raises the dimension to 768. The input and output dimensions of the SE module are therefore both (B,197,768); every picture in the batch shares the SE module, which reduces the number of parameters.
Then a multi-head self-attention mechanism extracts global features from different dimensions. The number of heads is set to 8, and three tensors Q, K, V are initialized through linear mapping and dimension conversion, each with dimension (B,8,197,96); the goal is to train these three tensors, where B denotes the batch size. By formula (4),

W = softmax(QK^T/√96)   formula (4)

a weight tensor W is derived with dimension (B,8,197,197), where the element in row i, column j of the last two dimensions represents the weight of the i-th Patch with respect to the j-th Patch. Then

A = WV   formula (5)

the dimension of A is (B,8,197,96); A aggregates the local and global features of the whole picture, and the dimension of A is then converted to (B,197,768) and output to the next layer.
Then a two-layer multi-layer perceptron follows. First layer:

X5 = ReLU(X4W4 + b4)   formula (6)

where X4 is the input, X5 is the output, W4 ∈ R^(768×48), b4 is a bias, and the reduction factor is set to 16; W4 and b4 are training parameters. Second layer:

X6 = X5W5 + b5   formula (7)

where X5 is the input, X6 is the output, W5 ∈ R^(48×768), and b5 is a bias; W5 and b5 are training parameters. The output dimension of this step is also (B,197,768).
As shown in FIG. 3, a normalization operation is performed before the input to the SE module, the self-attention mechanism, and the multi-layer perceptron, and a Shortcut connection is then added; this serial stack constitutes the minimum unit layer provided by the present invention, and different numbers of layers can be stacked according to requirements.
And a third part: because the range attended to by each minimum unit layer's Patch0 is different, the extracted global features differ, and these features need to be fused when used. Six minimum unit layers are stacked, and the Patch0 output by the multi-layer perceptron of each minimum unit layer is taken out and recorded as ui ∈ R^(1×768), i = 1,2,…,6. The weights are obtained with a softmax function; the higher the weight, the more important the global feature of that minimum unit layer. The formulas are as follows:

P = k1u1 + k2u2 + k3u3 + k4u4 + k5u5 + k6u6   formula (9)

out = softmax(P)   formula (10)

where ki denotes the weight of ui, obtained from a trainable parameter e ∈ R^(768×1), and P represents the global feature fused from the Patch0 outputs of the minimum unit layers according to their different weights. Finally P is input into a softmax layer to obtain classification confidences, and the class with the highest confidence is taken as the prediction result.
The above example is merely illustrative, given to clearly explain the present invention, and is not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (8)
1. An image classification method based on an SE module and a self-attention mechanism network is characterized by comprising the following steps:
S1: the input picture is converted into a matrix of a specified size, the data is converted into the tensor data type, and the data is then input into the model.
S2: the picture is sliced into a plurality of patches, Patch0 is added as the classification feature, and position information is added to enrich the feature representation.
S3: the internal features of each Patch are extracted using the SE module.
S4: the features between the patches are extracted using the self-attention mechanism.
S5: the output of the self-attention mechanism is fed into a two-layer MLP (multi-layer perceptron).
S6: the modules formed by S3, S4 and S5 are stacked in series to form the minimum unit layer of the method, and L minimum unit layers are stacked so that higher-level local and global features are extracted.
S7: the global features obtained in the above steps are used for classification.
2. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that step S1 preprocesses the pictures; if the input pictures are few, data enhancement is applied, and the pictures are then converted into the tensor data type.
3. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that in step S2 a picture of C×H×W pixels is sliced into N = (H/H1)×(W/W1) patches, each of size C×H1×W1, and each Patch is then flattened into a 1×CH1W1 vector; the slicing is implemented with convolution and Flatten operations, with input dimension (B, C, H, W) and output dimension (B, N, CH1W1), where C is the number of picture channels and B is the batch size. In addition, Patch0 is added as the classification feature, i.e. M = N + 1 patches, so the output dimension of this step is (B, M, CH1W1). Position information is added to each Patch (including Patch0), so that the self-attention mechanism can learn that, even for the same picture content, patches at different positions can yield different classification results.
4. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that the Patch internal feature extraction in step S3 adopts a convolutional neural network, so the output dimension (B, M, CH1W1) of the previous step is reshaped to (B, M, C, H1, W1). To keep the input and output width and height the same, zero padding is used together with CH1W1 convolution kernels; the internal features of the Patch are extracted along CH1W1 dimensions, giving CH1W1 feature maps of size H1×W1, i.e. dimension (CH1W1, H1, W1). Global average pooling is then applied to each feature map, giving (CH1W1, 1, 1). Through the first linear layer the output dimension is set to dim = CH1W1/β, where β is a scaling factor, with ReLU activation; the formula is given as formula (1):

X2 = ReLU(X1W1 + b1)   formula (1)

where W1 ∈ R^(CH1W1×CH1W1/β) and b1 are trainable parameters, X1 is the input and X2 is the output. A second linear layer has input dimension dim = CH1W1/β and output dimension dim = CH1W1, with a softmax activation that yields the weight of each channel; the formula is given as formula (2):

X3 = softmax(X2W2 + b2)   formula (2)

where W2 ∈ R^(CH1W1/β×CH1W1) and b2 are trainable parameters, X2 is the input and X3 is the output. The feature map of each channel is multiplied by its weight and all feature maps are then summed, giving a fused feature map f of size 1×H1×W1 that contains the internal Patch feature information extracted along the CH1W1 dimensions; the formula is given as formula (3):

f = x31·c1 + x32·c2 + … + x3i·ci,  i = 1, 2, …, CH1W1   formula (3)

where x3i is an element of X3 and ci is a feature map extracted inside the Patch by the convolutional neural network. After Flatten, a linear layer raises the dimension back to CH1W1. The input and output dimensions of the SE module are therefore both (B, M, CH1W1); every picture in the batch shares the SE module, which reduces the number of parameters.
5. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that step S4 extracts global features from different dimensions using a multi-head self-attention mechanism, which can be described by the following process:

First, three tensors Q, K, V are initialized through linear layers and dimension conversion; the goal is to train these three tensors. Their dimensions are all (B, H, M, D/H), where B denotes the batch size, H denotes the number of heads of the multi-head attention mechanism, M denotes the number of patches input to the attention mechanism (including Patch0), and D = CH1W1 denotes the dimension of each Patch. By formula (4),

W = softmax(QK^T/√(D/H))   formula (4)

the weight tensor W has dimension (B, H, M, M), where the element in row i, column j of the last two dimensions represents the weight of the i-th Patch with respect to the j-th Patch. Then

A = WV   formula (5)

so the dimension of A is (B, H, M, D/H), which is then converted to (B, M, D) and output to the next layer.
6. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that the two-layer perceptron of step S5 can be expressed by the following formulas:

First layer:

X5 = ReLU(X4W4 + b4)   formula (6)

where X4 is the input, X5 is the output, W4 ∈ R^(D×D/α) is a weight matrix, b4 is a bias, and α is a reduction factor; W4 and b4 are training parameters.

Second layer:

X6 = X5W5 + b5   formula (7)

where X5 is the input, X6 is the output, W5 ∈ R^(D/α×D) is a weight matrix, and b5 is a bias; W5 and b5 are training parameters. The output dimension of this step is also (B, M, D).
7. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that in step S6 a normalization operation is performed before the input of each module, and each module then adds a Shortcut connection.
8. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, characterized in that in step S7 the range attended to by each minimum unit layer's Patch0 is different, so the extracted global information differs; the Patch0 output by the multi-layer perceptron of each minimum unit layer is therefore taken out and recorded as ui ∈ R^(1×D), i = 1, 2, …, L. Then

P = k1u1 + k2u2 + … + kiui,  i = 1, 2, …, L   formula (9)

out = softmax(P)   formula (10)

where ki denotes the weight of ui, obtained from a trainable parameter e ∈ R^(D×1), and P represents the global feature fused from the Patch0 outputs of the minimum unit layers according to their different weights; finally P is input into a softmax layer to obtain classification confidences, and the class with the highest confidence is taken as the prediction result.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110839024.6A | 2021-07-23 | 2021-07-23 | Image classification method based on SE module and self-attention mechanism network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN113537243A | 2021-10-22 |
Family

- Family ID: 78089425

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110839024.6A | Image classification method based on SE module and self-attention mechanism network | 2021-07-23 | 2021-07-23 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113537243A (en) |
Cited By (8)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113951866A | 2021-10-28 | 2022-01-21 | 北京深睿博联科技有限责任公司 | Deep learning-based uterine fibroid diagnosis method and device |
| CN114998653A | 2022-05-24 | 2022-09-02 | 电子科技大学 | ViT network-based small sample remote sensing image classification method, medium and equipment |
| CN114998653B | 2022-05-24 | 2024-04-26 | 电子科技大学 | ViT network-based small sample remote sensing image classification method, medium and equipment |
| CN117173562A | 2023-08-23 | 2023-12-05 | 哈尔滨工程大学 | SAR image ship identification method based on latent layer diffusion model technology |
| CN117173562B | 2023-08-23 | 2024-06-04 | 哈尔滨工程大学 | SAR image ship identification method based on latent layer diffusion model technology |
| CN117496225A | 2023-10-17 | 2024-02-02 | 南昌大学 | Image data evidence obtaining method and system |
| CN117496225B | 2023-10-17 | 2024-09-06 | 南昌大学 | Image data evidence obtaining method and system |
| CN117746209A | 2023-12-13 | 2024-03-22 | 山东浪潮超高清智能科技有限公司 | Image recognition method and device based on efficient multi-type convolution aggregation convolution |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20211022 |