CN113111970B - Method for classifying images by constructing global embedded attention residual network - Google Patents
- Publication number: CN113111970B (application CN202110487497.4A)
- Authority: CN (China)
- Prior art keywords
- feature matrix
- global
- attention
- transformation
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
Abstract
The present disclosure provides a method of classifying images by building a global embedded attention residual network, comprising: preprocessing the image data to be classified; constructing a global embedded attention residual network containing a global embedded attention module, wherein the network comprises 1 input layer, 1 convolution layer with a convolution kernel size of 7×7, 1 max pooling layer, a global embedded attention module, 2 fully connected layers and 1 output layer, and the global embedded attention module comprises a spatial attention sub-module based on global context and a channel attention sub-module based on coordinates; and inputting the preprocessed image data to be classified into the global embedded attention residual network for classification.
Description
Technical Field
The present disclosure relates to an image classification method, and more particularly, to a method of classifying images by constructing a global embedded attention residual network.
Background
Image classification is an important task in the field of computer vision. At present, many scholars improve the network structure by adding an attention mechanism so that image classification can be performed better. The most classical squeeze-and-excitation network, widely regarded as a milestone of the attention mechanism, works by a two-step squeeze-and-excitation operation: it first uses global average pooling to squeeze global spatial features into channel descriptors, then applies a simple gating mechanism with a sigmoid excitation, and finally multiplies each channel by its corresponding weight. By modeling the inter-dependencies between channels with the aid of a 2D global pool, this method adaptively recalibrates the channel feature responses, providing significant performance improvements at relatively low computational cost. However, it only considers the encoding of inter-channel information and ignores position information, which is critical for capturing object structures in computer vision tasks. Later scholars tried to combine spatial attention with channel attention, but using only the position information of the local space brings limited benefit; hence, local position information of the channel can be used effectively while global position information is added to the neural network.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method for classifying images by constructing a global embedded attention residual network, which improves image classification by embedding global position information into channel information to extract image features effectively.
In order to achieve the above object, the present disclosure provides the following technical solutions:
A method of classifying images by constructing a global embedded attention residual network, comprising the steps of:
S100: preprocessing the image data to be classified;
S200: constructing a global embedded attention residual network containing a global embedded attention module, wherein the global embedded attention residual network comprises 1 input layer, 1 convolution layer with a convolution kernel size of 7×7, 1 max pooling layer, a global embedded attention module, 2 fully connected layers and 1 output layer, and the global embedded attention module comprises a spatial attention sub-module based on global context and a channel attention sub-module based on coordinates;
S300: inputting the preprocessed image data to be classified into the global embedded attention residual network for classification.
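The stem of the network in step S200 can be sketched by tracing spatial sizes through its first layers. The stride and padding values below are assumptions in the style of a standard residual-network stem (7×7 convolution with stride 2 and padding 3, then 3×3 max pooling with stride 2 and padding 1); the disclosure does not specify them:

```python
def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    """Output spatial size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

def stem_trace(size: int = 224) -> list:
    """Trace the input size through the assumed stem:
    7x7 conv (stride 2, pad 3), then 3x3 max pool (stride 2, pad 1)."""
    after_conv = conv_out(size, kernel=7, stride=2, pad=3)
    after_pool = conv_out(after_conv, kernel=3, stride=2, pad=1)
    return [size, after_conv, after_pool]
```

Under these assumed hyper-parameters, a 224×224 preprocessed image becomes 112×112 after the 7×7 convolution and 56×56 after max pooling, which is then fed to the global embedded attention module.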
Preferably, after the global embedded attention residual network is built, a training sample is selected and preprocessed to train the network, a verification sample is selected and preprocessed to adjust the parameters of the trained network, and a test sample is selected to test the performance of the trained network.
Preferably, in step S200, the spatial attention submodule based on the global context includes:
the first subunit is used for inputting the preprocessed training samples, verification samples and test samples into the convolution layer and the pooling layer for processing and then performing global average pooling operation so as to obtain a feature matrix containing global information;
the second subunit is used for performing linear transformation on the feature matrix containing global information by adopting convolution and reshape functions with convolution kernel size of 1×1 so as to obtain a feature matrix subjected to dimension transformation processing;
the third subunit is configured to perform adaptive selection on the feature matrix subjected to the dimensional transformation processing by using a softmax function, obtain a corresponding weight of each different element on the feature matrix, and multiply the corresponding weight of each different element with the feature matrix containing global information to obtain a feature matrix containing global context feature information;
and a fourth subunit, configured to perform nonlinear transformation on the feature matrix containing the global context feature information by using batch normalization and a ReLU activation function and perform dimensional transformation by using 1×1 convolution.
Preferably, the global context-based spatial attention submodule is expressed as:

y = BN(ReLU(K(Σ_{j=1}^{N} (e^{t·x_j} / Σ_{m=1}^{N} e^{t·x_m}) · x_j)))

where x represents the output of global average pooling, y represents the output of the global context features, H and W represent the height and width of the input image, X represents the input image, K represents a 1×1 convolution, ReLU represents the ReLU activation function, BN represents the batch normalization function, N represents the number of elements in the feature matrix, e represents the base of the natural logarithm, i, j and m index the possible positions of elements in the feature matrix, x_j and x_m represent the values of element information in the feature matrix, t represents the weight of the x matrix, and t·x_j and t·x_m represent the output values computed from the feature matrix after the global average pooling operation.
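As an illustration of this sub-module, the following NumPy sketch computes a softmax weight per spatial position and takes the weighted sum of features as the global-context vector. Several simplifications are assumptions for illustration: the weight t is a single scalar, the per-position score is taken as the channel mean, and the patent's batch normalization and 1×1 convolution (which carry learned parameters) are reduced to a plain ReLU:

```python
import numpy as np

def global_context_spatial_attention(X, t=1.0):
    """Illustrative sketch of the global-context spatial attention
    sub-module: a softmax over per-position scores selects a weight for
    each spatial location, and the weighted sum of features gives a
    global-context vector.  X: feature map of shape (C, H, W)."""
    C, H, W = X.shape
    flat = X.reshape(C, H * W)               # N = H*W elements per channel
    scores = t * flat.mean(axis=0)           # one score per spatial position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    context = flat @ weights                 # (C,) global-context feature
    return np.maximum(context, 0.0)          # ReLU stands in for BN+ReLU+1x1 conv
```

On a uniform input every position receives the same softmax weight 1/N, so the context vector reduces to the per-channel mean.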
Preferably, in step S200, the coordinate-based channel attention submodule includes:
a fifth subunit, configured to decompose the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation by adopting average pooling along the W 'direction and the H' direction, to obtain a one-dimensional feature matrix along the W 'direction and a one-dimensional feature matrix along the H' direction, where the one-dimensional feature matrix along the W 'direction includes local position information of the channel, and the one-dimensional feature matrix along the H' direction includes long-term dependency information;
a sixth subunit, configured to concatenate the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction, and perform feature transformation with a convolution of 1×1 to obtain a feature matrix subjected to dimension transformation;
and a seventh subunit, configured to perform weight distribution on the feature matrix subjected to the dimension transformation by using a softmax function to obtain feature matrices with different weights, and perform feature transformation on the feature matrices with different weights by using 1×1 convolution, so as to obtain an output of the global embedded attention module.
Preferably, the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation is decomposed by adopting average pooling along the W′ direction and the H′ direction, and the obtained one-dimensional feature matrices along the H′ direction and the W′ direction are respectively expressed as:

Z_H(i) = (1/W′) Σ_{0≤j<W′} X(i, j)

Z_W(j) = (1/H′) Σ_{0≤i<H′} X(i, j)

where H′ and W′ represent the height and width of the image output by the global-context spatial attention sub-module, Z_H and Z_W represent the one-dimensional feature matrices along the H′ and W′ directions, respectively, and i and j index the rows along H′ and the columns along W′.
Preferably, the cascading of the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction is performed by:
g = K(z_w + z_h)
where g represents the output of the cascade operation and K represents a 1 x 1 convolution.
Preferably, the output of the global embedded attention module is expressed as:
z = X(i, j) × a_c + X(i, j) × b_c

where a_c and b_c are obtained by softmax weighting; A and B represent random numbers, A_c and B_c represent initial values, a_c and b_c represent the weights corresponding to the feature matrices with different weights, e represents the base of the natural logarithm, and g_H and g_W are obtained by processing the dimension-transformed feature matrix of the sixth subunit with a ReLU activation function and then splitting it into two matrices along the spatial dimension; the dimension-transformed feature matrix has dimension R^(C×(W+H)), and after segmentation g_H and g_W have dimensions R^(C×H) and R^(C×W), respectively.
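The coordinate-based channel attention sub-module can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: the 1×1 convolutions and the A, B, A_c, B_c parameters are omitted, and the per-channel weights a_c and b_c are formed directly by a two-way softmax over the pooled branches:

```python
import numpy as np

def coordinate_channel_attention(X):
    """Illustrative sketch of the coordinate-based channel attention
    sub-module.  X: (C, H, W).  Average pooling along each direction
    yields z_h (C, H) and z_w (C, W); they are concatenated into a
    C x (H + W) matrix, passed through ReLU, and split back into g_H
    and g_W, from which per-channel weights a_c and b_c are formed."""
    C, H, W = X.shape
    z_h = X.mean(axis=2)                    # pooled along the W direction
    z_w = X.mean(axis=1)                    # pooled along the H direction
    g = np.concatenate([z_h, z_w], axis=1)  # shape (C, H + W)
    g = np.maximum(g, 0.0)                  # ReLU feature transformation
    g_h, g_w = g[:, :H], g[:, H:]           # split along the spatial dimension
    e_h = np.exp(g_h.mean(axis=1))          # (C,) branch logits
    e_w = np.exp(g_w.mean(axis=1))
    a_c = e_h / (e_h + e_w)                 # softmax weights per channel
    b_c = e_w / (e_h + e_w)
    return X * a_c[:, None, None] + X * b_c[:, None, None]
```

Note that under this simplification a_c + b_c = 1 for every channel, so the output keeps the scale of the input; in the full module the learned 1×1 convolutions would let the two branches contribute differently.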
Preferably, the image data to be classified and the training sample, the verification sample and the test sample are preprocessed according to the following steps:
S201: flipping the image data to be classified and the image data in the training, verification and test samples horizontally and vertically;
S202: rotating the flipped image data clockwise or counterclockwise;
S203: scaling the rotated image data;
S204: performing mean-reduction processing on the scaled image data.
Preferably, in step S204, the mean-reduction processing is performed on the scaled image data by the following formula:

Z = v − (1/n) Σ_{i=1}^{n} v_i

where Z is the image after the mean is subtracted, v is the pixel matrix of the image being processed, v_i is the pixel matrix of the ith of the n images, and n = 100000 images.
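The mean-reduction step can be sketched as follows, assuming the n sample images are stacked into an array of shape (n, H, W); colour channels would be handled the same way by broadcasting:

```python
import numpy as np

def subtract_dataset_mean(images):
    """Mean-reduction step S204: Z = v - (1/n) * sum_i v_i, i.e. remove
    the per-pixel mean over the n sample images from each image.
    `images` is assumed to have shape (n, H, W)."""
    mean = images.mean(axis=0)  # (1/n) * sum of the n pixel matrices
    return images - mean
```

After this step, the per-pixel mean of the processed sample set is zero, which centers the data before training.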
Compared with the prior art, the present disclosure brings the following beneficial effects:
A method of embedding global position information into channel information is provided to construct the global embedded attention residual network, and the memory burden is greatly reduced through effective data preprocessing.
Drawings
FIG. 1 is a flow chart of a method of classifying images by building a global embedded attention residual network, provided by one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a global embedded attention residual network provided by another embodiment of the present disclosure;
Figs. 3(a), 3(b), 3(c) and 3(d) are schematic diagrams comparing the global embedded attention residual network with existing classification methods, according to another embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure will be described in detail below with reference to fig. 1 to 3 (d). While specific embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should be noted that certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will understand that the same component may be referred to by different names; this specification and the claims distinguish components by function rather than by name. As used throughout the specification and claims, the terms "include" and "comprise" are open-ended and should be interpreted to mean "including, but not limited to". The description hereinafter sets forth preferred embodiments for practicing the invention, but is not intended to limit its scope; the scope of the present disclosure is defined by the appended claims.
To promote an understanding of the embodiments of the disclosure, reference will now be made to the embodiments illustrated in the drawings and to specific examples, without intending to limit the embodiments of the disclosure.
In one embodiment, as shown in fig. 1, an image classification method based on a global embedded attention residual network includes the following steps:
S100: preprocessing the image data to be classified;
S200: constructing a global embedded attention residual network containing a global embedded attention module, wherein the global embedded attention residual network comprises 1 input layer, 1 convolution layer with a convolution kernel size of 7×7, 1 max pooling layer, a global embedded attention module, 2 fully connected layers and 1 output layer, and the global embedded attention module comprises a spatial attention sub-module based on global context and a channel attention sub-module based on coordinates;
S300: inputting the preprocessed image data to be classified into the global embedded attention residual network for classification.
Compared with existing methods, this method reduces the model parameters of the deep neural network, makes full use of context information by embedding it into channel information, realizes global feature modeling, improves the classification effect, and addresses the problems that existing networks have low classification accuracy and struggle to combine position information with channel information.
In another embodiment, after the global embedded attention residual network is built, a training sample is required to be selected and preprocessed to train the network, a verification sample is required to be selected and preprocessed to adjust parameters of the trained network, and a test sample is required to test performance of the trained network.
In this embodiment, a plurality of image data are first selected from any image dataset, including COCO, ImageNet and ADNI, and sorted into different subsets used respectively as training, verification and test samples. The selected training, verification and test samples are then preprocessed. Finally, the preprocessed training samples are input into the global embedded attention residual network for training; after training is completed, the preprocessed verification samples are input into the trained network to adjust the network parameters, and the preprocessed test samples are input into the trained network to test its performance, thereby realizing image classification.
In another embodiment, in step S200, the global context-based spatial attention submodule includes:
the first subunit is used for inputting the preprocessed training samples, verification samples and test samples into the convolution layer and the pooling layer for processing and then performing global average pooling operation so as to obtain a feature matrix containing global information;
the second subunit performs linear transformation on the feature matrix containing global information by adopting convolution and reshape functions with convolution kernel size of 1 multiplied by 1 so as to obtain a feature matrix subjected to dimension transformation processing;
the third subunit performs self-adaptive selection on the feature matrix subjected to dimension transformation processing by using a softmax function to obtain the corresponding weight of each different element on the feature matrix, and multiplies the corresponding weight of each different element by the feature matrix containing global information to obtain the feature matrix containing global context feature information;
and a fourth subunit, configured to perform nonlinear transformation on the feature matrix containing the global context feature information by using batch normalization and a ReLU activation function and perform dimensional transformation by using 1×1 convolution.
In another embodiment, the global context-based spatial attention submodule is expressed as:

y = BN(ReLU(K(Σ_{j=1}^{N} (e^{t·x_j} / Σ_{m=1}^{N} e^{t·x_m}) · x_j)))

where x represents the output of global average pooling, y represents the output of the global context features, H and W represent the height and width of the input image, X represents the input image, K represents a 1×1 convolution, ReLU represents the ReLU activation function, BN represents the batch normalization function, N represents the number of elements in the feature matrix, e represents the base of the natural logarithm, i, j and m index the possible positions of elements in the feature matrix, x_j and x_m represent the values of element information in the feature matrix, t represents the weight of the x matrix, and t·x_j and t·x_m represent the output values computed from the feature matrix after the global average pooling operation.
In another embodiment, in step S200, the coordinate-based channel attention submodule includes:
a fifth subunit, configured to decompose the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation by adopting average pooling along the W 'direction and the H' direction, to obtain a one-dimensional feature matrix along the W 'direction and a one-dimensional feature matrix along the H' direction, where the one-dimensional feature matrix along the W 'direction includes local position information of the channel, and the one-dimensional feature matrix along the H' direction includes long-term dependency information;
a sixth subunit, configured to concatenate the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction, and perform feature transformation with a convolution of 1×1 to obtain a feature matrix subjected to dimension transformation;
and a seventh subunit, wherein the feature matrixes with different weights are obtained by carrying out weight distribution on the feature matrixes subjected to dimension transformation by utilizing a softmax function, and the feature matrixes with different weights are respectively subjected to feature transformation by utilizing 1×1 convolution, so that the output of the global embedded attention module is obtained.
In another embodiment, the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation is decomposed by adopting average pooling along the W′ direction and the H′ direction, and the obtained one-dimensional feature matrices along the H′ direction and the W′ direction are respectively expressed as:

Z_H(i) = (1/W′) Σ_{0≤j<W′} X(i, j)

Z_W(j) = (1/H′) Σ_{0≤i<H′} X(i, j)

where H′ and W′ represent the height and width of the image output by the global-context spatial attention sub-module, Z_H and Z_W represent the one-dimensional feature matrices along the H′ and W′ directions, respectively, and i and j index the rows along H′ and the columns along W′.
In another embodiment, the cascading of the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction is performed by:
g = K(z_w + z_h)
where g represents the output of the cascade operation and K represents a 1 x 1 convolution.
In another embodiment, the output of the global embedded attention module is expressed as:
z = X(i, j) × a_c + X(i, j) × b_c

where a_c and b_c are obtained by softmax weighting; A and B represent random numbers, A_c and B_c represent initial values, a_c and b_c represent the weights corresponding to the feature matrices with different weights, e represents the base of the natural logarithm, and g_H and g_W are obtained by processing the dimension-transformed feature matrix of the sixth subunit with a ReLU activation function and then splitting it into two matrices along the spatial dimension; the dimension-transformed feature matrix has dimension R^(C×(W+H)), and after segmentation g_H and g_W have dimensions R^(C×H) and R^(C×W), respectively.
In another embodiment, the image data to be classified and the training sample, the verification sample and the test sample are preprocessed according to the following steps:
s201: performing horizontal and vertical overturning on the image data to be classified and the image data in the training sample, the verification sample and the test sample;
s202: rotating the flipped image data clockwise or anticlockwise;
in this step of the process, the process is carried out,
s203: scaling the rotated image data;
s204: and carrying out average reduction processing on the zoomed image data.
In this step, the mean-reduction processing is performed on the scaled image data by the following formula:

Z = v − (1/n) Σ_{i=1}^{n} v_i

where Z is the image after the mean is subtracted, v is the pixel matrix of the image being processed, v_i is the pixel matrix of the ith of the n images, and n = 100000 images.
In another embodiment, the training of the preprocessed training samples by inputting them into the global embedded attention residual network is performed by:
S501: performing linear and nonlinear operations on the preprocessed training samples in a forward-propagation manner;
S502: performing chain-rule differentiation on the linearly and nonlinearly transformed training samples in a back-propagation manner, and updating the weight information of the network according to a preset learning rate until the maximum number of iterations is reached.
In this embodiment, training samples are processed sequentially through the input layer, convolution layer, max pooling layer, global embedded attention module, fully connected layers and output layer in a forward-propagation manner; chain-rule differentiation is then performed in a back-propagation manner, and the weight information of the network is updated from the output layer back through the fully connected layers, the global embedded attention module, the max pooling layer, the convolution layer and the input layer according to a preset learning rate a (a = 0.01, decreasing gradually as training proceeds: a = a/5 for every additional 20 iterations). This process repeats until the maximum number of iterations is reached.
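The learning-rate schedule described in this embodiment (a = 0.01, divided by 5 for every 20 additional iterations) is a simple step decay; it can be sketched as below (the function name is illustrative):

```python
def learning_rate(iteration: int, a0: float = 0.01, drop: float = 5.0,
                  every: int = 20) -> float:
    """Step-decay schedule from the embodiment: start at a = 0.01 and
    apply a = a / 5 once for every additional 20 iterations."""
    return a0 / (drop ** (iteration // every))
```

For example, iterations 0-19 use 0.01, iterations 20-39 use 0.002, and so on, so the step size shrinks geometrically as training converges.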
The method of the present disclosure is further described below in connection with specific examples.
Specific example 1:
1. 100000 sample images are selected from the ImageNet image classification dataset as training samples, 10000 sample images as verification samples and 30000 sample images as test samples, with no overlap between the images in the training and test samples.
2. 100000 images in the training sample are preprocessed, which comprises the following steps:
a. flipping the image horizontally and vertically;
b. rotating the flipped image by 20 degrees clockwise or counterclockwise;
c. scaling the rotated image to obtain a 224×224 training sample image;
d. performing mean-reduction processing on the training sample image through formula (1), where formula (1) is expressed as:

Z = v − (1/n) Σ_{i=1}^{n} v_i    (1)

where Z is the image after the mean is subtracted, v is the pixel matrix of the image being processed, v_i is the pixel matrix of the ith of the n images, and n = 100000 images.
The preprocessing steps of the images in the verification sample and the test sample are the same as the above steps, and will not be repeated here.
3. Constructing a global embedded attention residual network containing a global embedded attention module, as shown in fig. 2, wherein the global embedded attention residual network comprises 1 input layer, 1 convolution layer with a convolution kernel size of 7×7, 1 max pooling layer, a global embedded attention module, 2 fully connected layers and 1 output layer, and the global embedded attention module comprises a spatial attention sub-module based on global context and a channel attention sub-module based on coordinates;
wherein the global context based spatial attention submodule comprises:
the first subunit is used for inputting the preprocessed training samples, verification samples and test samples into the convolution layer and the pooling layer for processing and then performing global average pooling operation so as to obtain a feature matrix containing global information;
the second subunit performs linear transformation on the feature matrix containing global information by adopting convolution and reshape functions with convolution kernel size of 1 multiplied by 1 so as to obtain a feature matrix subjected to dimension transformation processing;
the third subunit performs self-adaptive selection on the feature matrix subjected to dimension transformation processing by using a softmax function to obtain the corresponding weight of each different element on the feature matrix, and multiplies the corresponding weight of each different element by the feature matrix containing global information to obtain the feature matrix containing global context feature information;
and a fourth subunit, configured to perform nonlinear transformation on the feature matrix containing the global context feature information by using batch normalization and a ReLU activation function and perform dimensional transformation by using 1×1 convolution.
Wherein the first subunit is implemented by formula (2), where formula (2) is expressed as:
the second to fourth sub-units are realized by the formula (3), and the formula (3) is expressed as:
where X represents the output of global average pooling, y represents the output of the global context feature, H and W represent the height and width of the input image, respectively, x represents the input image, K represents a 1×1 convolution, ReLU represents the ReLU activation function, BN represents the batch normalization function, N represents the number of elements in the feature matrix, e represents the base of the natural logarithm, i, j, and m represent the possible positions of elements in the feature matrix, x_j and x_m represent the values of the element information in the feature matrix, t represents the weight of the x matrix, and tx_j and tx_m represent the output values computed from the feature matrix after the global average pooling operation.
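The images of formulas (2) and (3) are not reproduced in this text. Based on the symbol definitions above, a plausible reconstruction is the following; the exact ordering of K, BN, and ReLU in (3) is an assumption:

```latex
X = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x(i,j) \tag{2}

y = K\!\left(\mathrm{ReLU}\!\left(\mathrm{BN}\!\left(
      \sum_{j=1}^{N}\frac{e^{t x_j}}{\sum_{m=1}^{N} e^{t x_m}}\, x_j
    \right)\right)\right) \tag{3}
```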
The coordinate-based channel attention submodule includes:
a fifth subunit, configured to decompose the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation by adopting average pooling along the W 'direction and the H' direction, to obtain a one-dimensional feature matrix along the W 'direction and a one-dimensional feature matrix along the H' direction, where the one-dimensional feature matrix along the W 'direction includes local position information of the channel, and the one-dimensional feature matrix along the H' direction includes long-term dependency information;
a sixth subunit, configured to concatenate the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction, and perform feature transformation with a convolution of 1×1 to obtain a feature matrix subjected to dimension transformation;
and a seventh subunit, wherein the feature matrixes with different weights are obtained by carrying out weight distribution on the feature matrixes subjected to dimension transformation by utilizing a softmax function, and the feature matrixes with different weights are respectively subjected to feature transformation by utilizing 1×1 convolution, so that the output of the global embedded attention module is obtained.
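Subunits five through seven can likewise be sketched in numpy. This is a sketch under stated assumptions, not the patent's code: the 1×1 convolutions are modelled as identity maps, and the per-branch weight assignment (a softmax in the patent) is modelled here with a sigmoid gate per branch for simplicity.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def coordinate_channel_attention(x):
    C, H, W = x.shape
    # Subunit 5: average pooling along each spatial direction
    z_h = x.mean(axis=2)                       # (C, H): long-term dependencies
    z_w = x.mean(axis=1)                       # (C, W): per-channel positional info
    # Subunit 6: concatenate to R^(C x (H+W)); the shared 1x1 convolution K
    # is modelled as an identity map here (an assumption)
    g = np.maximum(np.concatenate([z_h, z_w], axis=1), 0.0)   # ReLU
    g_h, g_w = g[:, :H], g[:, H:]              # split back along the spatial dim
    # Subunit 7: directional weights, then recombination with the input,
    # cf. z = X(i,j) x a_c + X(i,j) x b_c
    a = sigmoid(g_h)[:, :, None]               # (C, H, 1)
    b = sigmoid(g_w)[:, None, :]               # (C, 1, W)
    return x * a + x * b
```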
Wherein the fifth subunit is implemented by formulas (4) and (5), where formulas (4) and (5) are expressed as:
the sixth subunit is implemented by equation (6), equation (6) being expressed as:
g = K(z_w + z_h) (6)
the seventh subunit is realized by formulas (7) - (9), formulas (7) - (9) being expressed as:
z = X(i,j) × a_c + X(i,j) × b_c (9)
where H′ and W′ represent the height and width of the image output by the global-context spatial attention submodule, Z_H and Z_W represent the one-dimensional feature matrices along the H′ direction and the W′ direction, respectively, i and j index the i-th row along H′ and the j-th column along W′, g represents the output of the concatenation operation, K represents a 1×1 convolution, A and B represent random numbers, A_c and B_c represent initial values, a_c and b_c represent the weights corresponding to the feature matrices with different weights, e represents the base of the natural logarithm, and g_H and g_W are obtained by processing the dimension-transformed feature matrix of step S306 with the ReLU activation function and then splitting it into two matrices along the spatial dimension; the dimension-transformed feature matrix of step S306 has dimension R^(C×(W+H)), and after splitting the dimensions of g_H and g_W are R^(C×H) and R^(C×W), respectively.
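The images of formulas (4), (5), (7), and (8) are not reproduced in this text. A plausible reconstruction consistent with the symbol definitions above is directional average pooling for (4)-(5) and a softmax-style weight assignment for (7)-(8); the exact role of the initial values A_c and B_c in (7)-(8) is an assumption:

```latex
Z_H(i) = \frac{1}{W'}\sum_{j=1}^{W'} X(i,j) \tag{4}

Z_W(j) = \frac{1}{H'}\sum_{i=1}^{H'} X(i,j) \tag{5}

a_c = \frac{e^{A_c\, g_H}}{e^{A_c\, g_H} + e^{B_c\, g_W}} \tag{7}

b_c = \frac{e^{B_c\, g_W}}{e^{A_c\, g_H} + e^{B_c\, g_W}} \tag{8}
```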
It should be noted that the number of global embedded attention modules is 16, with each global embedded attention module embedded in a corresponding residual structure; the convolution layers outside the global embedded attention mechanism may optionally apply a batch normalization function and an activation function for nonlinear transformation, and the output layer uses a fully connected layer and a softmax function to output the probability of the category to which each input image belongs, taking the category with the maximum probability as the predicted category.
It should be further noted that 16 global embedded attention modules are selected because the attention module can be embedded in a variety of networks, such as ResNet-50; the present disclosure is described on the basis of the ResNet-50 structure, whose stages contain (3, 4, 6, 3) residual blocks, 16 in total, so the disclosure selects 16 global embedded attention modules.
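The module count follows directly from the ResNet-50 stage layout:

```python
# ResNet-50 arranges its bottleneck residual blocks in four stages of
# (3, 4, 6, 3) blocks; embedding one attention module per residual
# block therefore yields 16 modules.
resnet50_stages = (3, 4, 6, 3)
num_gea_modules = sum(resnet50_stages)
print(num_gea_modules)  # 16
```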
4. The preprocessed training sample containing 100000 images is input into the global embedded attention residual network for training, and the network weights are updated repeatedly through forward propagation and backward propagation until the maximum number of iterations (70-120) is reached, at which point training ends and a trained residual network model is obtained. The preprocessed verification sample containing 100000 images and the preprocessed test sample containing 30000 images are then input into the trained network for verification and testing, thereby realizing image classification.
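The train-until-max-iterations loop described above can be illustrated with a minimal forward/backward sketch. A linear softmax classifier stands in for the attention residual network here; the learning rate and the toy data are assumptions for illustration only.

```python
import numpy as np

def train_softmax_classifier(X, y, n_classes, max_iter=100, lr=0.5):
    # Repeat forward and backward propagation until the maximum number of
    # iterations is reached (the patent uses 70-120 iterations on the full
    # network); this stand-in trains a linear softmax classifier instead.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_classes)) * 0.01
    onehot = np.eye(n_classes)[y]
    for _ in range(max_iter):
        logits = X @ W                                   # forward propagation
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)                # softmax probabilities
        grad = X.T @ (p - onehot) / len(X)               # backward propagation
        W -= lr * grad                                   # weight update
    return W
```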
The test samples were used for testing, and the Top-1 and Top-5 accuracies on the COCO and ImageNet datasets, respectively, were compared with those of existing classification methods, including the CA, SE, BAM, and CBAM attention mechanisms; the results are shown in Tables 1 and 2:
TABLE 1
TABLE 2
As can be seen from Table 1, the global embedded attention residual network with the added global embedded attention module performs well on both the COCO and ImageNet datasets. With ResNet-50 as the baseline, the highest Top-1 accuracy reaches 75.9 and the highest Top-5 accuracy reaches 86.6, and on the ImageNet dataset the highest Top-1 and Top-5 accuracies reach 75.8 and 83.1, respectively, an improvement over the other models.
As can be seen from Table 2, when ResNet-101 is used, the Top-1 and Top-5 accuracies improve further, indicating that the disclosed method generalizes well and classifies images effectively.
Specific example 2:
327 structural magnetic resonance images were selected from the ADNI dataset, including 119 brain MRIs of patients with mild cognitive impairment, 101 brain MRIs of Alzheimer's patients, and 107 brain MRIs of normal subjects. The data were divided into training sample data and test sample data at a ratio of 7:3, and the training sample data were trained and verified using 10-fold cross-verification, specifically: the training sample data are divided into 10 parts numbered 0-9; first, parts 0-8 are taken as the training set and part 9 as the verification set; after that round of training and verification, part 8 is taken as the verification set and the remaining parts as the training set, and so on, for 10 rounds in total. Finally, the test samples were used for testing, and the accuracy, recall, and precision on the ADNI dataset were compared with those of existing classification methods, including the CA, SE, BAM, and CBAM attention mechanisms; the comparison results are shown in Tables 3 and 4:
TABLE 3
TABLE 4
As can be seen from tables 3 and 4, the accuracy of the global embedded attention residual network added with the global embedded attention module is up to 88.5 when the Resnet-50 is taken as a basic model, and is up to 90.5 when the Resnet-101 is taken as a basic model.
To further verify the technical effect of the disclosed method, the method was applied to three datasets, and part of the test results were selected for visual display, as shown in figs. 3(a) to 3(d), where fig. 3(a) shows no attention mechanism, fig. 3(b) the SE attention mechanism, fig. 3(c) the CA attention mechanism, and fig. 3(d) the GEA attention mechanism (i.e., the global embedded attention residual network). The results show that the proposed GEA mechanism constrains the network more effectively, so that it concentrates on the most prominent features of the image rather than the whole image: the network focuses more sharply on the region of interest and finds the most salient features, which greatly improves its classification accuracy on images.
The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not intended to be limited to the details disclosed herein as such.
Claims (7)
1. A method of classifying images by constructing a global embedded attention residual network, comprising the steps of:
s100: preprocessing the image data to be classified;
s200: constructing a global embedded attention residual network containing a global embedded attention module, wherein the global embedded attention residual network comprises 1 input layer, 1 convolution layer with a convolution kernel size of 7×7, 1 max pooling layer, a global embedded attention module, 2 fully connected layers and 1 output layer, and the global embedded attention module comprises a spatial attention sub-module based on global context and a channel attention sub-module based on coordinates;
the global context-based spatial attention submodule includes:
the first subunit is used for inputting the preprocessed training samples, verification samples and test samples into the convolution layer and the pooling layer for processing and then performing global average pooling operation so as to obtain a feature matrix containing global information;
the second subunit performs linear transformation on the feature matrix containing global information by adopting a convolution with a kernel size of 1×1 and a reshape function, so as to obtain a feature matrix subjected to dimension transformation processing;
the third subunit performs self-adaptive selection on the feature matrix subjected to dimension transformation processing by using a softmax function to obtain the corresponding weight of each different element on the feature matrix, and multiplies the corresponding weight of each different element by the feature matrix containing global information to obtain the feature matrix containing global context feature information;
a fourth subunit, configured to perform nonlinear transformation on the feature matrix including the global context feature information by using batch normalization and a ReLU activation function, and perform dimensional transformation by using 1×1 convolution; the global context based spatial attention submodule is expressed as:
wherein X represents the output of global average pooling, y represents the output of the global context feature, H and W represent the height and width of the input image, respectively, x represents the input image, K represents a 1×1 convolution, ReLU represents the ReLU activation function, BN represents the batch normalization function, N represents the number of elements in the feature matrix, e represents the base of the natural logarithm, i, j, and m represent the possible positions of elements in the feature matrix, x_j and x_m represent the values of the element information in the feature matrix, and t represents the weight of the x matrix, with tx_j and tx_m representing the output values computed from the feature matrix after the global average pooling operation;
The coordinate-based channel attention submodule includes:
a fifth subunit, configured to decompose the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation by adopting average pooling along the W 'direction and the H' direction, to obtain a one-dimensional feature matrix along the W 'direction and a one-dimensional feature matrix along the H' direction, where the one-dimensional feature matrix along the W 'direction includes local position information of the channel, and the one-dimensional feature matrix along the H' direction includes long-term dependency information;
a sixth subunit, configured to concatenate the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction, and perform feature transformation with a convolution of 1×1 to obtain a feature matrix subjected to dimension transformation;
a seventh subunit, performing weight distribution on the feature matrix subjected to the dimension transformation by using a softmax function to obtain feature matrices with different weights, and performing feature transformation on the feature matrices with different weights by using 1×1 convolution respectively to obtain the output of the global embedded attention module;
s300: and inputting the preprocessed image data to be classified into a global embedded attention residual error network for classification.
2. The method of claim 1, wherein after the global embedded attention residual network construction is completed, a training sample is selected and preprocessed to train the network, a verification sample is selected and preprocessed to adjust parameters of the trained network, and a test sample is selected to perform performance test on the trained network.
3. The method of claim 1, wherein the feature matrix containing the global context feature information after the nonlinear transformation and the dimensional transformation is decomposed by adopting average pooling along the W 'and the H' directions, and the obtained one-dimensional feature matrix along the W 'direction and the obtained one-dimensional feature matrix along the H' direction are respectively expressed as:
where H′ and W′ represent the height and width of the image output by the global-context spatial attention submodule, Z_H and Z_W represent the one-dimensional feature matrices along the H′ direction and the W′ direction, respectively, and i and j index the i-th row along H′ and the j-th column along W′, respectively.
4. The method of claim 1, wherein the cascading of the one-dimensional feature matrix along the W 'direction and the one-dimensional feature matrix along the H' direction is performed by:
g = K(z_w + z_h)
where g represents the output of the cascade operation and K represents a 1 x 1 convolution.
5. The method of claim 1, wherein the output of the global embedded attention module is represented as:
z = X(i,j) × a_c + X(i,j) × b_c
and is also provided with
Wherein A and B represent random numbers, A_c and B_c represent initial values, a_c and b_c represent the weights corresponding to the feature matrices with different weights, e represents the base of the natural logarithm, and g_H and g_W are obtained by processing the dimension-transformed feature matrix of step S306 with the ReLU activation function and then splitting it into two matrices along the spatial dimension; the dimension-transformed feature matrix of step S306 has dimension R^(C×(W+H)), and after splitting the dimensions of g_H and g_W are R^(C×H) and R^(C×W), respectively.
6. The method according to claim 2, wherein the image data to be classified and the training, validation and test samples are preprocessed according to the following steps:
s201: performing horizontal and vertical overturning on the image data to be classified and the image data in the training sample, the verification sample and the test sample;
s202: rotating the flipped image data clockwise or anticlockwise;
s203: scaling the rotated image data;
s204: and carrying out average reduction processing on the zoomed image data.
7. The method according to claim 6, wherein in step S204, the process of reducing the mean value of the scaled image data is performed by:
wherein Z is the image after mean subtraction, v_i is the pixel matrix of the i-th image among the n images, and n is the integer 100000.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110487497.4A CN113111970B (en) | 2021-04-30 | 2021-04-30 | Method for classifying images by constructing global embedded attention residual network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113111970A CN113111970A (en) | 2021-07-13 |
CN113111970B true CN113111970B (en) | 2023-12-26 |
Family
ID=76720844
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115034375B (en) * | 2022-08-09 | 2023-06-27 | 北京灵汐科技有限公司 | Data processing method and device, neural network model, equipment and medium |
CN115203380B (en) * | 2022-09-19 | 2022-12-20 | 山东鼹鼠人才知果数据科技有限公司 | Text processing system and method based on multi-mode data fusion |
CN116958711B (en) * | 2023-09-19 | 2023-12-15 | 华东交通大学 | Lead-zinc ore image classification model construction method, system, storage medium and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199214A (en) * | 2020-01-04 | 2020-05-26 | 西安电子科技大学 | Residual error network multispectral image ground feature classification method |
CN111259982A (en) * | 2020-02-13 | 2020-06-09 | 苏州大学 | Premature infant retina image classification method and device based on attention mechanism |
WO2020140633A1 (en) * | 2019-01-04 | 2020-07-09 | 平安科技(深圳)有限公司 | Text topic extraction method, apparatus, electronic device, and storage medium |
CN112163601A (en) * | 2020-09-14 | 2021-01-01 | 华南理工大学 | Image classification method, system, computer device and storage medium |
Non-Patent Citations (1)
Title |
---|
Image super-resolution reconstruction via a hierarchical feature fusion attention network; Lei Pengcheng, Liu Cong, Tang Jiangang, Peng Dunlu; Journal of Image and Graphics (Issue 09); full text *
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |