CN113537243A - Image classification method based on SE module and self-attention mechanism network

Info

Publication number
CN113537243A
Authority
CN
China
Prior art keywords
patch
module
attention mechanism
output
layer
Prior art date
Legal status
Withdrawn
Application number
CN202110839024.6A
Other languages
Chinese (zh)
Inventor
梁俊雄
肖明
郑坚燚
曾旺旺
廖泽宇
陈俊文
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
2021-07-23 Application filed by Guangdong University of Technology
2021-07-23 Priority to CN202110839024.6A
2021-10-22 Publication of CN113537243A

Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Abstract

The invention discloses an image classification method based on an SE module and a self-attention mechanism network. A picture is first sliced into a plurality of Patch, and Patch0 is added as a classification feature; because the same patches at different positions can correspond to pictures of different categories, position information is also added. A convolutional neural network with an SE module extracts the internal features of each Patch, a self-attention mechanism then extracts the global features between the Patch, and the result is input into a multi-layer perceptron. The SE module, the self-attention mechanism and the multi-layer perceptron are stacked in series into a minimum unit layer, and L minimum unit layers are stacked to extract higher-level global features, giving the picture a richer feature representation. Finally, the Patch0 output vector of each minimum unit layer is taken, the output vectors are given different weights, and the weighted Patch0 vectors are fused, so that higher-level local and global features of the whole picture are extracted; these features are used to classify the picture and obtain its corresponding category label.

Description

Image classification method based on SE module and self-attention mechanism network
Technical Field
The invention relates to the technical field of image recognition and deep learning, in particular to an image classification method based on an SE module and a self-attention mechanism network.
Background
Image classification is a very active research direction in the fields of computer vision, machine learning and deep learning, and is widely applied to face recognition, pedestrian detection, traffic scene object recognition, license plate recognition, automatic album classification and the like.
Image classification is an important basic task in the field of artificial-intelligence computer vision and is also the basis of target detection; the accuracy of image classification influences the performance of subsequent tasks. At present there are support vector machine image classification methods based on machine learning, while mainstream deep learning image classification methods fall into two main categories. One category is based on convolutional neural networks, such as the classical networks AlexNet, VGG, GoogLeNet and ResNet; the other category is based on self-attention mechanisms, such as Vision Transformer and Transformer in Transformer.
The method most similar to the present invention is the Vision Transformer, based on the self-attention mechanism: a whole picture is first sliced into a number of Patch, the global features between the Patch are then extracted by the self-attention mechanism, and the features are passed onward through a multi-layer perceptron. The self-attention mechanism and the multi-layer perceptron are stacked to form an encoder layer, several encoder layers are stacked to form the Vision Transformer architecture, and the Patch0 output of the last encoder layer is input to the softmax layer to obtain the image category prediction.
The invention is also relatively similar to the Transformer in Transformer, based on the self-attention mechanism, which uses the encoder layers of two Transformers to extract the features inside and between the Patch respectively; the two encoder layers form a module, and several modules are stacked to form the Transformer in Transformer architecture.
Disclosure of Invention
The invention aims to utilize the local features of each Patch as well as the global features of the Patch0 of each minimum unit layer, so that more features are used to improve classification accuracy, and provides an image classification method based on an SE module and a self-attention mechanism network.
In order to realize the above purpose of the invention, the following technical scheme is adopted:
an image classification method based on an SE module and a self-attention mechanism network comprises the following steps:
S1: the input picture is converted into a matrix of a specified size, the data is then converted into a tensor data type, and the data is input into the model.
S2: a picture is sliced into a plurality of Patch, Patch0 is added as a classification feature, and position information is added to enrich the feature representation.
S3: the SE module is used to extract the features inside each Patch.
S4: a self-attention mechanism is used to extract the features between the Patch.
S5: the output data of the self-attention mechanism is fed into a two-layer MLP.
S6: the module formed by S3, S4 and S5 is stacked in series as the minimum unit layer of the method, and L minimum unit layers are stacked, so that higher-level local and global features are extracted.
S7: the global features obtained in the above steps are used for classification.
Preferably, step S1 pre-processes the pictures; if the input pictures are few, data enhancement is applied before the pictures are converted into the tensor data type.
Preferably, in step S2 a picture with C × H × W pixels is sliced into N = (H × W)/(H₁ × W₁) Patch, each of size C × H₁ × W₁, and each Patch is then flattened into a 1 × CH₁W₁ vector. The slicing is implemented with convolution and Flatten operations, with input dimension (B, C, H, W) and output dimension (B, N, CH₁W₁), where C is the number of picture channels and B is the Batch Size. In addition, Patch0 is added as a classification feature, i.e. M = N + 1 Patch, so the output dimension of this step is (B, M, CH₁W₁). Position information is added to each Patch (including Patch0), so that the self-attention mechanism can better learn that, even for the same picture, patches at different positions give different classification results.
Preferably, the internal feature extraction of the Patch in step S3 uses a convolutional neural network, so the output dimension (B, M, CH₁W₁) of the previous step is reshaped to (B, M, C, H₁, W₁). To make the input and output width and height the same, 0 padding is used together with CH₁W₁ convolution kernels, and internal Patch features are extracted in CH₁W₁ dimensions, giving CH₁W₁ feature maps of size H₁ × W₁, i.e. dimension (CH₁W₁, H₁, W₁). Global average pooling is then applied to each feature map, giving (CH₁W₁, 1, 1). The first linear layer sets the output dimension to dim = CH₁W₁/β, where β is a scaling factor, with activation function Relu; the formula is expressed as formula (1),

X₂ = Relu(X₁W₁ + b₁)    formula (1)

where W₁ ∈ R^(CH₁W₁ × CH₁W₁/β) and b₁ are trainable parameters, X₁ is the input and X₂ is the output. The second linear layer has input dimension dim = CH₁W₁/β and output dimension dim = CH₁W₁, with activation function softmax, giving the weight of each channel; the formula is expressed as formula (2),

X₃ = softmax(X₂W₂ + b₂)    formula (2)

where W₂ ∈ R^(CH₁W₁/β × CH₁W₁) and b₂ are trainable parameters, X₂ is the input and X₃ is the output. The feature map of each channel is multiplied by its weight and all feature maps are then added, giving a feature map f of size 1 × H₁ × W₁ that contains the fusion of the internal Patch feature information extracted in the CH₁W₁ dimensions; the formula is expressed as formula (3),

f = x₃₁c₁ + x₃₂c₂ + … + x₃ᵢcᵢ,  i = 1, 2, …, CH₁W₁    formula (3)

where x₃ᵢ is an element of X₃ and cᵢ is a feature map extracted inside the Patch by the convolutional neural network. After Flatten, the dimension is raised back to CH₁W₁ through a linear layer. It can be seen that the input and output dimensions of the SE module are both (B, M, CH₁W₁); every picture in the Batch shares the SE module, which reduces the number of parameters.
Preferably, step S4 uses a multi-head self-attention mechanism to extract global features from different dimensions, and the self-attention process can be represented as follows:
First, three tensors Q, K, V are initialized through linear-layer and dimension-conversion operations, with the aim of training these three tensors; their dimensions are all (B, H, M, D/H), where B denotes the Batch Size, H denotes the number of heads of the multi-head attention mechanism, M denotes the number of Patch input to the attention mechanism (including Patch0), and D = CH₁W₁ represents the dimension of each Patch.

W = softmax(QKᵀ / √(D/H))    formula (4)

The dimension of W is therefore (B, H, M, M), where the element in row i, column j of the last two dimensions represents the weight of the i-th Patch with respect to the j-th Patch.

A = WV    formula (5)

The dimension of A is therefore (B, H, M, D/H); A aggregates the feature information of the whole picture, and its dimension is then converted to (B, M, D) and output to the next layer.
Preferably, the two-layer perceptron of step S5 can be expressed by the following formulas:
First layer:

X₅ = σ(X₄W₄ + b₄)    formula (6)

where X₄ is the input, X₅ is the output, σ is the activation function, W₄ ∈ R^(D × D/α) and b₄ is an offset, α is a reduction factor, and W₄ and b₄ are training parameters.
Second layer:

X₆ = X₅W₅ + b₅    formula (7)

where X₅ is the input, X₆ is the output, W₅ ∈ R^(D/α × D) and b₅ is an offset, and W₅ and b₅ are training parameters. The output dimension of this step is also (B, M, D).
Preferably, in step S6 a normalization operation is applied before the input of each module, and each module then adds a Shortcut connection.
Preferably, in step S7 the range of attention paid by the Patch0 of each minimum unit layer is different, so the extracted global information is different; therefore the Patch0 output by the multi-layer perceptron of each minimum unit layer is extracted and recorded as uᵢ ∈ R^(1×D), i = 1, 2, …, L. Then:

kᵢ = exp(uᵢe) / Σⱼ exp(uⱼe),  j = 1, 2, …, L    formula (8)

P = k₁u₁ + k₂u₂ + … + k_L u_L    formula (9)

out = softmax(P)    formula (10)

where kᵢ represents the weight of uᵢ, e ∈ R^(D×1) is a training parameter, and P represents the global features of the Patch0 output of each minimum unit layer fused according to the different weights. Finally, P is input to a softmax layer to obtain the classification confidence, and the class with the highest confidence is taken as the prediction result.
Compared with the prior art, the invention has the following beneficial effects:
Firstly, before the input of the self-attention mechanism, the SE module extracts the internal features of each Patch, so that the Patch vector representation input to the self-attention mechanism is richer; more features are utilized, classification accuracy improves, and the amount of computation is smaller than that of the Transformer in Transformer architecture.
Secondly, the Patch0 output of each minimum unit layer is taken out and assigned a corresponding weight obtained by automatic learning; the Patch0 output of each minimum unit layer is multiplied by its weight and the results are added, so that the global features extracted by every minimum unit layer are utilized, the features input to the softmax layer are richer, and classification accuracy improves.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and the drawings are only for illustrative purposes and should not be construed as limiting the invention.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow diagram of the SE module of the present invention;
FIG. 3 is a flow chart of the minimum cell level of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example:
An image classification method based on an SE module and a self-attention mechanism network is shown in FIG. 1; the model comprises 3 parts. The first part slices a picture into several Patch and adds Patch0 as a classification feature, together with position information. The second part is the minimum unit layer, which includes the SE module to extract the local features inside each Patch, the self-attention mechanism to extract the global features between the Patch, and the multi-layer perceptron; L minimum unit layers are stacked as needed. The third part takes the Patch0 output of each minimum unit layer, gives each a different weight, fuses them, and inputs the result into the softmax layer to obtain the prediction result.
A first part: assume that a color picture has 3 × 224 × 224 pixels and each Patch has 3 × 16 × 16 pixels, giving N = (224 × 224)/(16 × 16) = 196 Patch. The slicing is done with a convolution operation whose parameters are set to: convolution kernel 3 × 16 × 16, step size (16,16), no offset, and number of convolution kernels CH₁W₁ = 768. The input dimension is (B,3,224,224), the feature map dimension obtained through the convolution operation is (B,768,14,14), the dimension after the Flatten operation is (B,768,196), and after exchanging the 1st and 2nd dimensions it is (B,196,768). Here C is the number of picture channels (3 for a color picture), H₁ and W₁ are the height and width of each Patch (16 in this embodiment), and B is the Batch Size. Patch0 is added, i.e. M = N + 1 = 197 Patch; Patch0 begins as an all-0 vector and is trained to become a vector representing global features, so the output dimension of this part is (B,197,768). Position information is added to each Patch (including Patch0), so that the self-attention mechanism can better learn that, even for the same picture, patches at different positions give different classification results.
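As a concrete illustration of this first part, the following is a minimal PyTorch sketch of the slicing, Patch0 and position-information step under the dimensions of this embodiment; the class and attribute names (PatchEmbedding, proj, patch0, pos) are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Slice a 3x224x224 picture into 196 Patch of 3x16x16, prepend Patch0,
    and add learnable position information (dimensions from the embodiment)."""
    def __init__(self, channels=3, img_size=224, patch_size=16):
        super().__init__()
        dim = channels * patch_size * patch_size        # CH1W1 = 768
        num_patches = (img_size // patch_size) ** 2     # N = 196
        # A convolution with kernel = stride = patch size performs the slicing.
        self.proj = nn.Conv2d(channels, dim, kernel_size=patch_size,
                              stride=patch_size, bias=False)
        self.patch0 = nn.Parameter(torch.zeros(1, 1, dim))             # starts as all 0
        self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, dim))  # M = N + 1 = 197

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)     # Flatten, swap dims 1 and 2: (B, 196, 768)
        cls = self.patch0.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend Patch0: (B, 197, 768)
        return x + self.pos                  # add position information
```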
A second part: first the SE module, whose flowchart is shown in FIG. 2. Because a convolutional neural network is used, the output dimension (B,197,768) of the previous step is reshaped into (B,197,3,16,16). The convolution parameters are set as: convolution kernels 3 × 3 × 3, step size (1,1), with bias, 0 padding of one row up and down and one column left and right (so that the input and output width and height are the same), and 768 convolution kernels; internal Patch features are extracted in 768 dimensions, giving 768 feature maps of size 16 × 16, i.e. dimension (768,16,16). Global average pooling is then applied to each feature map, giving (768,1,1). The first linear layer sets the output dimension to dim = 768/16 = 48, where the scaling factor is 16, with activation function Relu; the formula is expressed as formula (1),

X₂ = Relu(X₁W₁ + b₁)    formula (1)

where W₁ ∈ R^(768×48) and b₁ are trainable parameters, X₁ is the input and X₂ is the output. The second linear layer has input dimension dim = 48 and output dimension dim = 768, with activation function softmax, giving the weight of each channel; the formula is expressed as formula (2),

X₃ = softmax(X₂W₂ + b₂)    formula (2)

where W₂ ∈ R^(48×768) and b₂ are trainable parameters, X₂ is the input and X₃ is the output. The feature map of each channel is multiplied by its weight and all feature maps are then added, giving a feature map f of size 1 × 16 × 16 that contains the fusion of the internal Patch feature information extracted in the 768 dimensions; the formula is expressed as formula (3),

f = x₃₁c₁ + x₃₂c₂ + … + x₃ᵢcᵢ,  i = 1, 2, …, 768    formula (3)

where x₃ᵢ is an element of X₃ and cᵢ is a feature map extracted inside the Patch by the convolutional neural network. After Flatten, the dimension is raised back to 768 through a linear layer. It can be seen that the input and output dimensions of the SE module are both (B,197,768); every picture in the Batch shares the SE module, which reduces the number of parameters.
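A minimal PyTorch sketch of this SE module, assuming the dimensions of the embodiment (197 Patch of 3 × 16 × 16, 768 channels, scaling factor 16); the module and layer names are illustrative:

```python
import torch
import torch.nn as nn

class SEPatchModule(nn.Module):
    """SE module of the embodiment: a 3x3 convolution gives 768 feature maps per
    Patch, softmax channel weights fuse them into one 16x16 map f (formula (3)),
    and a final linear layer raises the dimension back to 768."""
    def __init__(self, channels=3, patch_size=16, dim=768, beta=16):
        super().__init__()
        self.channels, self.patch_size, self.dim = channels, patch_size, dim
        self.conv = nn.Conv2d(channels, dim, kernel_size=3, padding=1)  # same H, W
        self.fc1 = nn.Linear(dim, dim // beta)   # 768 -> 48, Relu: formula (1)
        self.fc2 = nn.Linear(dim // beta, dim)   # 48 -> 768, softmax: formula (2)
        self.out = nn.Linear(patch_size * patch_size, dim)  # 256 -> 768 after Flatten

    def forward(self, x):                        # x: (B, M, 768)
        B, M, _ = x.shape
        p = x.reshape(B * M, self.channels, self.patch_size, self.patch_size)
        c = self.conv(p)                         # (B*M, 768, 16, 16)
        s = c.mean(dim=(2, 3))                   # global average pooling: (B*M, 768)
        w = torch.softmax(self.fc2(torch.relu(self.fc1(s))), dim=-1)  # channel weights
        f = (c * w[:, :, None, None]).sum(dim=1)        # formula (3): (B*M, 16, 16)
        return self.out(f.flatten(1)).reshape(B, M, self.dim)  # (B, M, 768)
```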
Then a multi-head self-attention mechanism extracts global features from different dimensions. The number of heads is set to 8, and three tensors Q, K, V are initialized through linear mapping and dimension conversion, each with dimension (B,8,197,96), the aim being to train these three tensors, where B denotes the Batch Size. Through formula (4),

W = softmax(QKᵀ / √96)    formula (4)

a weight tensor W of dimension (B,8,197,197) is obtained, where the element in row i, column j of the last two dimensions represents the weight of the i-th Patch with respect to the j-th Patch.

A = WV    formula (5)

The dimension of A is (B,8,197,96); A aggregates the local and global features of the whole picture, and its dimension is then converted to (B,197,768) and output to the next layer.
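A minimal PyTorch sketch of this multi-head self-attention step; computing Q, K, V with a single fused linear layer and adding an output projection are implementation assumptions, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Multi-head self-attention over the 197 Patch vectors: 8 heads of 96
    dimensions, W = softmax(QK^T / sqrt(96)) (formula (4)), A = WV (formula (5))."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads   # 8, 96
        self.qkv = nn.Linear(dim, dim * 3, bias=False)    # initialises Q, K, V
        self.proj = nn.Linear(dim, dim)                   # maps A back to (B, M, D)

    def forward(self, x):                                 # x: (B, 197, 768)
        B, M, D = x.shape
        q, k, v = (self.qkv(x)
                   .view(B, M, 3, self.heads, self.head_dim)
                   .permute(2, 0, 3, 1, 4))               # each: (B, 8, 197, 96)
        w = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5,
                          dim=-1)                         # formula (4): (B, 8, 197, 197)
        a = w @ v                                         # formula (5): (B, 8, 197, 96)
        return self.proj(a.transpose(1, 2).reshape(B, M, D))  # (B, 197, 768)
```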
Then a two-layer multi-layer perceptron follows. First layer:

X₅ = σ(X₄W₄ + b₄)    formula (6)

where X₄ is the input, X₅ is the output, σ is the activation function, W₄ ∈ R^(768×48), b₄ is an offset, the reduction factor is set to 16, and W₄ and b₄ are training parameters. Second layer:

X₆ = X₅W₅ + b₅    formula (7)

where X₅ is the input, X₆ is the output, W₅ ∈ R^(48×768), b₅ is an offset, and W₅ and b₅ are training parameters. The output dimension of this step is also (B,197,768).
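A minimal PyTorch sketch of the two-layer perceptron; the patent does not name the activation function of the first layer, so GELU is assumed here:

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerMLP(nn.Module):
    """Two-layer perceptron with reduction factor 16: 768 -> 48 -> 768
    (formulas (6) and (7)); the GELU activation is an assumption."""
    def __init__(self, dim=768, alpha=16):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim // alpha)   # W4: 768 -> 48, formula (6)
        self.fc2 = nn.Linear(dim // alpha, dim)   # W5: 48 -> 768, formula (7)

    def forward(self, x):                         # x: (B, 197, 768)
        return self.fc2(F.gelu(self.fc1(x)))      # output: (B, 197, 768)
```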
As shown in FIG. 3, a normalization operation is applied before the input to the SE module, the self-attention mechanism and the multi-layer perceptron, and a Shortcut connection is then added around each; their serial stacking constitutes the minimum unit layer provided by the invention, and different numbers of layers can be stacked according to requirements.
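Composing the three sketches above gives one minimum unit layer; pre-normalization with LayerNorm is an assumption, since the patent only specifies "a normalization operation" and a Shortcut connection:

```python
import torch.nn as nn

class MinimumUnitLayer(nn.Module):
    """One minimum unit layer: normalization before each sub-module and a
    Shortcut (residual) connection around each, stacked SE -> attention -> MLP."""
    def __init__(self, dim=768):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.se = SEPatchModule(dim=dim)             # from the SE sketch above
        self.attn = MultiHeadSelfAttention(dim=dim)  # from the attention sketch above
        self.mlp = TwoLayerMLP(dim=dim)              # from the MLP sketch above

    def forward(self, x):                            # x: (B, 197, 768)
        x = x + self.se(self.norm1(x))               # SE module with Shortcut
        x = x + self.attn(self.norm2(x))             # self-attention with Shortcut
        x = x + self.mlp(self.norm3(x))              # perceptron with Shortcut
        return x
```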
And a third part: because the attention range of the Patch0 of each minimum unit layer is different, the extracted global features are different, and these features need to be fused before use. 6 minimum unit layers are stacked, and the Patch0 output by the multi-layer perceptron of each minimum unit layer is extracted and recorded as uᵢ ∈ R^(1×768), i = 1, 2, …, 6; the weights are obtained with a softmax function, where a higher weight indicates that the global features of that minimum unit layer are more important. The formulas are as follows:

kᵢ = exp(uᵢe) / Σⱼ exp(uⱼe),  j = 1, 2, …, 6    formula (8)

P = k₁u₁ + k₂u₂ + k₃u₃ + k₄u₄ + k₅u₅ + k₆u₆    formula (9)

out = softmax(P)    formula (10)

where kᵢ represents the weight of uᵢ, e ∈ R^(768×1) is a training parameter, and P represents the global features of the Patch0 output of each minimum unit layer fused according to the different weights. Finally, P is input to a softmax layer to obtain the classification confidence, and the class with the highest confidence is taken as the prediction result.
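A minimal PyTorch sketch of this third part; the linear classifier ahead of the final softmax and the class count are assumptions, since the patent only states that P is input to a softmax layer:

```python
import torch
import torch.nn as nn

class WeightedPatch0Head(nn.Module):
    """Weight the Patch0 of each minimum unit layer with softmax(u_i e)
    (formula (8)), fuse them into P (formula (9)), and classify (formula (10))."""
    def __init__(self, dim=768, num_classes=10):        # class count is an assumption
        super().__init__()
        self.e = nn.Parameter(torch.randn(dim, 1) * 0.02)  # e in R^(768x1)
        self.classifier = nn.Linear(dim, num_classes)      # assumed linear layer

    def forward(self, patch0_list):                 # list of L tensors of shape (B, 768)
        u = torch.stack(patch0_list, dim=1)         # (B, L, 768)
        k = torch.softmax(u @ self.e, dim=1)        # formula (8): (B, L, 1)
        p = (k * u).sum(dim=1)                      # formula (9): fused feature P, (B, 768)
        return torch.softmax(self.classifier(p), dim=-1)  # formula (10): confidences

# Hypothetical end-to-end use of the sketches above:
embed = PatchEmbedding()
layers = nn.ModuleList(MinimumUnitLayer() for _ in range(6))
head = WeightedPatch0Head()
x = embed(torch.randn(2, 3, 224, 224))              # (2, 197, 768)
patch0s = []
for layer in layers:
    x = layer(x)
    patch0s.append(x[:, 0])                         # Patch0 after each minimum unit layer
probs = head(patch0s)                               # (2, 10) class confidences
```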
The above embodiment is merely illustrative, serving to clearly explain the present invention, and is not intended to limit the embodiments of the invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (8)

1. An image classification method based on an SE module and a self-attention mechanism network is characterized by comprising the following steps:
S1: the input picture is converted into a matrix of a specified size, the data is then converted into a tensor data type, and the data is input into the model.
S2: a picture is sliced into a plurality of Patch, Patch0 is added as a classification feature, and position information is added to enrich the feature representation.
S3: the SE module is used to extract the features inside each Patch.
S4: a self-attention mechanism is used to extract the features between the Patch.
S5: the output data of the self-attention mechanism is fed into a two-layer MLP (multi-layer perceptron).
S6: the module formed by S3, S4 and S5 is stacked in series as the minimum unit layer of the method, and L minimum unit layers are stacked, so that higher-level local and global features are extracted.
S7: the global features obtained in the above steps are used for classification.
2. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein step S1 pre-processes the pictures; if the input pictures are few, data enhancement is applied before the pictures are converted into the tensor data type.
3. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein in step S2 a picture with C × H × W pixels is sliced into N = (H × W)/(H₁ × W₁) Patch, each of size C × H₁ × W₁, and each Patch is then flattened into a 1 × CH₁W₁ vector; the slicing is implemented with convolution and Flatten operations, with input dimension (B, C, H, W) and output dimension (B, N, CH₁W₁), where C is the number of picture channels and B is the Batch Size; in addition, Patch0 is added as a classification feature, i.e. M = N + 1 Patch, so the output dimension of this step is (B, M, CH₁W₁); position information is added to each Patch (including Patch0), so that the self-attention mechanism can better learn that, even for the same picture, patches at different positions give different classification results.
4. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein the Patch internal feature extraction in step S3 adopts a convolutional neural network, so the output dimension (B, M, CH₁W₁) of the previous step is reshaped to (B, M, C, H₁, W₁); to make the input and output width and height the same, 0 padding is used together with CH₁W₁ convolution kernels, and internal Patch features are extracted in CH₁W₁ dimensions, giving CH₁W₁ feature maps of size H₁ × W₁, i.e. dimension (CH₁W₁, H₁, W₁); global average pooling is then applied to each feature map, giving (CH₁W₁, 1, 1); the first linear layer sets the output dimension to dim = CH₁W₁/β, where β is a scaling factor, with activation function Relu, the formula being expressed as formula (1),

X₂ = Relu(X₁W₁ + b₁)    formula (1)

where W₁ ∈ R^(CH₁W₁ × CH₁W₁/β) and b₁ are trainable parameters, X₁ is the input and X₂ is the output; the second linear layer has input dimension dim = CH₁W₁/β and output dimension dim = CH₁W₁, with activation function softmax, giving the weight of each channel, the formula being expressed as formula (2),

X₃ = softmax(X₂W₂ + b₂)    formula (2)

where W₂ ∈ R^(CH₁W₁/β × CH₁W₁) and b₂ are trainable parameters, X₂ is the input and X₃ is the output; the feature map of each channel is multiplied by its weight and all feature maps are then added, giving a feature map f of size 1 × H₁ × W₁ that contains the fusion of the internal Patch feature information extracted in the CH₁W₁ dimensions, the formula being expressed as formula (3),

f = x₃₁c₁ + x₃₂c₂ + … + x₃ᵢcᵢ,  i = 1, 2, …, CH₁W₁    formula (3)

where x₃ᵢ is an element of X₃ and cᵢ is a feature map extracted inside the Patch by the convolutional neural network; after Flatten, the dimension is raised back to CH₁W₁ through a linear layer; the input and output dimensions of the SE module are therefore both (B, M, CH₁W₁), and every picture in the Batch shares the SE module, which reduces the number of parameters.
5. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein step S4 uses a multi-head self-attention mechanism to extract global features from different dimensions, and the self-attention process can be represented as follows:
first, three tensors Q, K, V are initialized through linear-layer and dimension-conversion operations, with the aim of training these three tensors; their dimensions are all (B, H, M, D/H), where B denotes the Batch Size, H denotes the number of heads of the multi-head attention mechanism, M denotes the number of Patch input to the attention mechanism (including Patch0), and D = CH₁W₁ represents the dimension of each Patch;

W = softmax(QKᵀ / √(D/H))    formula (4)

the dimension of W is therefore (B, H, M, M), where the element in row i, column j of the last two dimensions represents the weight of the i-th Patch with respect to the j-th Patch;

A = WV    formula (5)

the dimension of A is therefore (B, H, M, D/H); A aggregates the feature information of the whole picture, and its dimension is then converted to (B, M, D) and output to the next layer.
6. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein the two-layer perceptron of step S5 can be expressed by the following formulas:
first layer:

X₅ = σ(X₄W₄ + b₄)    formula (6)

where X₄ is the input, X₅ is the output, σ is the activation function, W₄ ∈ R^(D × D/α) and b₄ is an offset, α is a reduction factor, and W₄ and b₄ are training parameters;
second layer:

X₆ = X₅W₅ + b₅    formula (7)

where X₅ is the input, X₆ is the output, W₅ ∈ R^(D/α × D) and b₅ is an offset, and W₅ and b₅ are training parameters; the output dimension of this step is also (B, M, D).
7. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein in step S6 a normalization operation is applied before the input of each module, and each module then adds a Shortcut connection.
8. The image classification method based on the SE module and the self-attention mechanism network as claimed in claim 1, wherein in step S7 the range of attention of the Patch0 of each minimum unit layer is different, so the extracted global information is different; therefore the Patch0 output by the multi-layer perceptron of each minimum unit layer is extracted and recorded as uᵢ ∈ R^(1×D), i = 1, 2, …, L; then

kᵢ = exp(uᵢe) / Σⱼ exp(uⱼe),  j = 1, 2, …, L    formula (8)

P = k₁u₁ + k₂u₂ + … + k_L u_L    formula (9)

out = softmax(P)    formula (10)

where kᵢ represents the weight of uᵢ, e ∈ R^(D×1) is a training parameter, and P represents the global features of the Patch0 output of each minimum unit layer fused according to the different weights; finally, P is input to a softmax layer to obtain the classification confidence, and the class with the highest confidence is taken as the prediction result.
CN202110839024.6A 2021-07-23 2021-07-23 Image classification method based on SE module and self-attention mechanism network Withdrawn CN113537243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110839024.6A CN113537243A (en) 2021-07-23 2021-07-23 Image classification method based on SE module and self-attention mechanism network


Publications (1)

Publication Number Publication Date
CN113537243A 2021-10-22

Family

ID=78089425


Country Status (1)

Country Link
CN (1) CN113537243A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113951866A (en) * 2021-10-28 2022-01-21 北京深睿博联科技有限责任公司 Deep learning-based uterine fibroid diagnosis method and device
CN114998653A (en) * 2022-05-24 2022-09-02 电子科技大学 ViT network-based small sample remote sensing image classification method, medium and equipment
CN114998653B (en) * 2022-05-24 2024-04-26 电子科技大学 ViT network-based small sample remote sensing image classification method, medium and equipment
CN117173562A (en) * 2023-08-23 2023-12-05 哈尔滨工程大学 SAR image ship identification method based on latent layer diffusion model technology
CN117173562B (en) * 2023-08-23 2024-06-04 哈尔滨工程大学 SAR image ship identification method based on latent layer diffusion model technology
CN117496225A (en) * 2023-10-17 2024-02-02 南昌大学 Image data evidence obtaining method and system
CN117496225B (en) * 2023-10-17 2024-09-06 南昌大学 Image data evidence obtaining method and system
CN117746209A (en) * 2023-12-13 2024-03-22 山东浪潮超高清智能科技有限公司 Image recognition method and device based on efficient multi-type convolution aggregation convolution

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20211022