CN113743484A - Image classification method and system based on space and channel attention mechanism - Google Patents

Image classification method and system based on space and channel attention mechanism

Info

Publication number
CN113743484A
CN113743484A (application CN202110961232.3)
Authority
CN
China
Prior art keywords
attention mechanism
image
image classification
data set
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110961232.3A
Other languages
Chinese (zh)
Inventor
Yang Jun (杨军)
Liu Mengxin (刘孟鑫)
Ma Liya (马利亚)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningxia University
Original Assignee
Ningxia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningxia University filed Critical Ningxia University
Priority to CN202110961232.3A priority Critical patent/CN113743484A/en
Publication of CN113743484A publication Critical patent/CN113743484A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks


Abstract

The invention discloses an image classification method and system based on a spatial and channel attention mechanism, comprising the following steps: S1, acquiring a sample data set; S2, extracting a number of images from the sample data set and generating fake samples with a deep convolutional generative adversarial network (DCGAN) to obtain an extended data set; S3, processing the extended data set, including dimensionality reduction, denoising and data enhancement; S4, dividing the extended data set proportionally into a training set and a test set; S5, inputting the images in the training set into the constructed image classification network model for parameter-tuning training, during which image features are extracted, and finally saving the trained image classification network model; and S6, loading the images in the test set into the trained image classification network model, whose output is the classification result. The invention achieves accurate classification of images of the target type.

Description

Image classification method and system based on space and channel attention mechanism
Technical Field
The invention relates to the technical field of image classification, in particular to an image classification method and system based on a space and channel attention mechanism.
Background
Image classification is an image processing task in which features are extracted from an input image and the image is assigned to a category. The main stages are image preprocessing, feature extraction and classifier design. Feature extraction is the most critical part of the image classification task: traditional image classification is built on hand-crafted feature coding, while modern image classification is built on deep learning.
In recent years, image classification has received widespread attention and plays an important role in many fields; it is now widely applied in agriculture, environmental monitoring, medicine and elsewhere. Image classification technology has made great progress, and deep convolutional neural networks (CNNs) in particular have been highly successful: through a series of methods they reduce the dimensionality of image recognition problems with huge data volumes, and they are currently the best approach to image feature extraction. The CNN was first proposed by Yann LeCun as LeNet-5 and applied to handwriting recognition, followed by AlexNet, VGG and GoogLeNet, up to the now widely used ResNet and DenseNet. Convolutional neural networks play an important role in computer vision.
Whether modern or traditional methods are used, feature extraction is the difficult point of the whole classification pipeline; once good features are found, classification becomes easy. Modern image classification extracts higher-dimensional, more abstract features than traditional methods, and these features are closely tied to the classifier. However, when faced with huge volumes of image data, image interference and other degraded data, existing feature extraction cannot meet practical requirements, and the needed classification accuracy cannot be reached. A method of extracting higher-dimensional image features is therefore proposed.
Disclosure of Invention
The first purpose of the present invention is to overcome the disadvantages and shortcomings of the prior art, and to provide an image classification method based on a spatial and channel attention mechanism, which can realize accurate classification of target type images.
It is a second object of the present invention to provide an image classification system based on spatial and channel attention mechanisms.
The first purpose of the invention is realized by the following technical scheme: the image classification method based on the spatial and channel attention mechanism comprises the following steps:
S1, obtaining image samples of the target type to be distinguished and classified, and constructing a corresponding sample data set;
S2, extracting a certain number of images from the acquired sample data set and generating fake samples with a deep convolutional generative adversarial network (DCGAN), thereby expanding the acquired sample data set into an extended data set;
S3, processing the extended data set, including dimensionality reduction, denoising and data enhancement;
S4, dividing the extended data set processed in step S3 proportionally, with most of the data forming the training set and a small part forming the test set;
S5, inputting the images in the training set into the constructed image classification network model for parameter-tuning training, during which image features are extracted, and finally saving the trained image classification network model; the constructed image classification network model consists of a DenseNet, a spatial attention mechanism, a channel attention mechanism SE-Net and a classification submodule, wherein the DenseNet is the backbone network of the model and is used for extracting global image features and reusing them; the spatial attention mechanism is embedded into the DenseNet, applies a corresponding spatial transformation to the spatial-domain information of the picture, attends only to the positions of interest, and extracts key information; the channel attention mechanism SE-Net models the importance of each feature channel and suppresses or enhances different channels according to the importance of their features; the classification submodule uses Softmax as its core to classify the various images accurately; the image classification network model obtains the regions of interest through the spatial attention mechanism, obtains feature weights with the channel attention mechanism SE-Net so as to emphasize useful information and suppress useless information, and feeds the result to the classification submodule;
and S6, loading the images in the test set into the trained image classification network model for discrimination; the result output by the model is the classification result.
Further, in step S2, a certain number of images are extracted from the acquired sample data set to generate fake samples using an unsupervised deep convolutional generative adversarial network (DCGAN); the generator G and the discriminator D of the DCGAN play a game against each other until Nash equilibrium is finally reached, at which point the samples produced by the generator G can deceive the discriminator D and are judged to be real; the specific process of generating the fake samples is as follows:
S21, the generator G generates synthetic data from given noise, which follows a uniform or normal distribution, and converts this high-level representation into a pixel image through upsampling and deconvolution operations; no fully connected layers or pooling layers are used in the whole process;
S22, the discriminator D judges whether the output of the generator G is real data; the generator attempts to produce data ever closer to the real data while, correspondingly, the discriminator attempts to distinguish real data from generated data ever more accurately; generator G and discriminator D thus improve through confrontation and, as they keep competing, the data produced by the generator G approaches the real data more and more closely, so that the desired images can be generated; the optimization objective function is as follows:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]
where V(D, G) is the value function of the minimax game between generator G and discriminator D; x denotes a real picture; x~p_data(x) denotes that real pictures follow the data distribution; z~p_z(z) denotes that the noise follows a uniform or normal distribution; D(x) is the probability with which the discriminator D judges a real picture to be real; G(z) is the picture generated by the generator G; and D(G(z)) is the probability with which the discriminator D judges the picture generated by the generator G to be real.
Further, in step S3, Principal Component Analysis (PCA) is used for dimensionality reduction and denoising of the extended data set, so as to reduce data redundancy; it comprises the following steps:
a. converting each image into a matrix;
b. removing the mean from all features;
c. computing the covariance matrix;
d. computing the eigenvalues of the covariance matrix and the corresponding eigenvectors;
e. sorting the eigenvalues;
f. keeping the eigenvectors corresponding to the N largest eigenvalues;
g. projecting the original features into the new space spanned by the N retained eigenvectors, thereby achieving dimensionality reduction.
Further, in step S3, the images in the extended data set are rotated, flipped, cropped and translated, and their brightness and contrast are adjusted, to enhance the image data.
Further, in step S5, the convolution layer in the DenseNet consists of BN, ReLU and 1×1 Conv; the 1×1 Conv not only reduces the dimensionality of the image but also reduces the number of output feature maps. The Dense Block module, an important component of the DenseNet, consists of BN, ReLU, 1×1 Conv and 3×3 Conv and improves inter-layer information flow; a Bottleneck design is adopted inside the Dense Block to reduce the amount of computation, adding a 1×1 Conv to the original structure. The Transition layer in the DenseNet lies between two Dense Blocks and is used to change the size of the feature maps; it consists of BN, ReLU, 1×1 Conv and 2×2 average pooling.
Further, in step S5, the spatial attention mechanism first reduces the dimensionality along the channel axis, obtaining a max-pooling result and a mean-pooling result, splices them into a feature map, and then learns from it with a convolution layer.
Further, in step S5, the channel attention mechanism SE-Net consists of five parts, namely global average pooling, a fully connected layer, ReLU, a second fully connected layer and Sigmoid. Squeeze is realized by global average pooling, which generates channel statistics by turning each two-dimensional feature channel into a single real number; this real number has, to some extent, a global receptive field, and the number of output values matches the number of input feature channels. Excitation multiplies the squeezed result by the first fully connected layer, passes it through a ReLU layer without changing its dimension, multiplies it by the second fully connected layer, and then applies a Sigmoid function; excitation helps capture channel-wise dependencies while greatly reducing parameters and computation.
Further, in step S5, the classification submodule consists of global average pooling, a fully connected layer, BN and a Softmax function, and is used to reduce the number of parameters and output the image category.
The second purpose of the invention is realized by the following technical scheme: an image classification system based on spatial and channel attention mechanisms, comprising:
the image acquisition module is used for acquiring image samples of the target types to be distinguished and classified and constructing corresponding sample data sets;
the data set processing module is used for extracting a certain number of images from the acquired sample data set to generate fake samples, thereby expanding the sample data set into an extended data set; then performing dimensionality reduction, denoising and data enhancement on the extended data set; and finally dividing the processed extended data set proportionally, with most of the data forming the training set and a small part forming the test set;
the model training module is used for first constructing an image classification network model and then performing parameter-tuning training of the constructed model with the training set; the constructed image classification network model consists of a DenseNet, a spatial attention mechanism, a channel attention mechanism SE-Net and a classification submodule, wherein the DenseNet is the backbone network of the model and is used for extracting global image features and reusing them; the spatial attention mechanism is embedded into the DenseNet, applies a corresponding spatial transformation to the spatial-domain information of the picture, attends only to the positions of interest, and extracts key information; the channel attention mechanism SE-Net models the importance of each feature channel and suppresses or enhances different channels according to the importance of their features; the classification submodule uses Softmax as its core to classify the various images accurately; the model obtains the regions of interest through the spatial attention mechanism, obtains feature weights with the channel attention mechanism SE-Net so as to emphasize useful information and suppress useless information, and feeds the result to the classification submodule;
and the image classification module is used for inputting the test set into the trained image classification network model to obtain the final classification result of the image.
Compared with the prior art, the invention has the following advantages and beneficial effects:
in order to improve the accuracy of image classification, the invention particularly designs an image classification network model combining a space and channel attention mechanism, adds the space attention mechanism, performs corresponding space transformation on the space domain information of the picture, only concerns the interested position, extracts the key information, simultaneously adds the channel attention mechanism Se-Net, can inhibit the uninteresting part aiming at the importance of the characteristic by modeling the importance degree of each characteristic channel, and enhances the interested part; in addition, a data set is expanded by utilizing a DCGAN generation technology aiming at image samples which are difficult to obtain, so that an image classification network model is trained better, the stability and robustness of the model performance are improved finally, the generalization capability of the model is improved, the overfitting phenomenon is relieved, the high accuracy of image classification is realized, and the method is applicable to various image classifications, has practical application value and is worthy of popularization.
Drawings
Fig. 1 is a sample diagram of the classified images in this embodiment.
Fig. 2 is a structural diagram of an image classification network model.
Fig. 3 is an architecture diagram of the system of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The embodiment discloses a brain tumor image classification method based on a space and channel attention mechanism, which specifically comprises the following steps:
1) A total of 3540 samples of the target type are obtained and form the corresponding sample data set, as shown in fig. 1; the samples include gliomas, meningiomas, pituitary tumors and angioreticular tumors.
2) 120 images are extracted from the acquired sample data set, and fake samples are generated with an unsupervised deep convolutional generative adversarial network (DCGAN), thereby expanding the acquired sample data set into an extended data set; the generator G and the discriminator D of the DCGAN play a game against each other until Nash equilibrium is finally reached, at which point the samples produced by the generator G deceive the discriminator D and are judged to be real. The specific process of generating a fake sample is as follows:
21) the generator G generates synthetic data from given noise (generally following a uniform or normal distribution) and converts this high-level representation into a pixel image through a series of upsampling and deconvolution operations; no fully connected layers or pooling layers are used in the whole process;
22) the discriminator D judges whether the output of the generator G is real data; the generator attempts to produce data ever closer to the real data while, correspondingly, the discriminator attempts to distinguish real data from generated data ever more accurately; generator G and discriminator D thus improve through confrontation and, as they keep competing, the data produced by the generator G approaches the real data more and more closely, so that the desired images can be generated; the optimization objective function is as follows:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]
where V(D, G) is the value function of the minimax game between generator G and discriminator D; x denotes a real picture; x~p_data(x) denotes that real pictures follow the data distribution; z~p_z(z) denotes that the noise follows a uniform or normal distribution; D(x) is the probability with which the discriminator D judges a real picture to be real; G(z) is the picture generated by the generator G; and D(G(z)) is the probability with which the discriminator D judges the picture generated by the generator G to be real.
Random noise is fed into the generator G, passes through a projection-and-reshape step (equivalent to a fully connected layer), and is then turned into a fake sample by a series of upsampling and deconvolution operations; each upsampling halves the number of channels of the feature map and doubles its spatial size.
The input of the discriminator D is an image, which is processed by downsampling and fully connected layers and then fed into a Sigmoid function that outputs the probability of the image being real: 1 for real and 0 for fake.
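As an illustration of the objective function above, the value V(D, G) can be estimated from discriminator outputs on a batch of real and generated images. The following numpy sketch is ours, not part of the patent:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator probabilities on real images, in (0, 1).
    d_fake: discriminator probabilities on generated images, in (0, 1).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# At Nash equilibrium the discriminator outputs 0.5 everywhere,
# giving the well-known optimum V = -2 log 2.
v_eq = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
```

The discriminator maximizes this quantity while the generator minimizes it; training alternates gradient steps between the two networks.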
3) And processing the extended data set, including dimension reduction, denoising and data enhancement.
Principal Component Analysis (PCA) is adopted for dimensionality reduction and denoising, reducing data redundancy; it comprises the following steps:
a. converting each image into a matrix;
b. removing the mean from all features;
c. computing the covariance matrix;
d. computing the eigenvalues of the covariance matrix and the corresponding eigenvectors;
e. sorting the eigenvalues;
f. keeping the eigenvectors corresponding to the N largest eigenvalues;
g. projecting the original features into the new space spanned by the N retained eigenvectors, thereby achieving dimensionality reduction.
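Steps a to g can be sketched with numpy as follows; flattening each image to a row vector and the toy data are our illustrative assumptions, not part of the patent:

```python
import numpy as np

def pca_reduce(images, n_components):
    """PCA dimensionality reduction following steps a-g above."""
    X = images.reshape(len(images), -1).astype(float)  # a. each image -> row of a matrix
    X = X - X.mean(axis=0)                             # b. remove the mean of each feature
    cov = np.cov(X, rowvar=False)                      # c. covariance matrix
    vals, vecs = np.linalg.eigh(cov)                   # d. eigenvalues and eigenvectors
    order = np.argsort(vals)[::-1]                     # e. sort eigenvalues (descending)
    W = vecs[:, order[:n_components]]                  # f. keep the top-N eigenvectors
    return X @ W                                       # g. project into the new space

rng = np.random.default_rng(0)
imgs = rng.normal(size=(50, 8, 8))                     # 50 toy 8x8 "images"
reduced = pca_reduce(imgs, n_components=10)
```

The projected components are mutually uncorrelated, which is what removes the redundancy mentioned above.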
Data enhancement is realized by rotating, flipping, cropping and translating the images and by adjusting their brightness and contrast.
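A minimal array-level sketch of these augmentations; a production pipeline would typically use a library such as torchvision or Albumentations, and the margin sizes here are illustrative:

```python
import numpy as np

def augment(img):
    """Return simple augmented views of a 2-D image array."""
    h, w = img.shape
    m = h // 8                                   # illustrative crop/shift margin
    return [
        np.rot90(img),                           # rotation by 90 degrees
        np.fliplr(img),                          # horizontal flip
        img[m:h - m, m:w - m],                   # center crop
        np.roll(img, shift=m, axis=1),           # horizontal translation
        np.clip(1.2 * img + 0.05, 0.0, 1.0),     # brightness/contrast adjustment
    ]

img = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
views = augment(img)
```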
4) The processed extended data set is divided in a 7:3 ratio into a training set and a test set.
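The 7:3 split can be sketched as follows; the shuffling and the fixed seed are illustrative choices not specified in the embodiment:

```python
import numpy as np

def split_7_3(samples, seed=0):
    """Shuffle a sample list and split it roughly 70% / 30%."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    cut = int(round(0.7 * len(samples)))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

train, test = split_7_3(list(range(3540)))   # 3540 samples as in step 1)
```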
5) The images in the training set are input into the constructed image classification network model for parameter-tuning training, during which image features are extracted, and the trained image classification network model is finally saved; referring to fig. 2, the constructed image classification network model consists of a DenseNet, a spatial attention mechanism, a channel attention mechanism SE-Net and a classification submodule, wherein the DenseNet is the backbone network of the model and is used for extracting global image features and reusing them; the spatial attention mechanism is embedded into the DenseNet, applies a corresponding spatial transformation to the spatial-domain information of the picture, attends only to the positions of interest, and extracts key information; the channel attention mechanism SE-Net models the importance of each feature channel and suppresses or enhances different channels according to the importance of their features; the classification submodule uses Softmax as its core to classify the various images accurately; the model obtains the regions of interest through the spatial attention mechanism, obtains feature weights with the channel attention mechanism SE-Net so as to emphasize useful information and suppress useless information, and feeds the result to the classification submodule.
The details of the image classification network model are as follows:
First, the image is convolved by the convolution layer in the DenseNet; using a 1×1 convolution kernel reduces both the dimensionality of the image and the number of output feature maps.
The convolved result is fed into a Dense Block; because the input to the later layers is very large, a Bottleneck is adopted inside the Dense Block to reduce the amount of computation, adding a 1×1 Conv to the original structure. The operations BN, ReLU, 1×1 Conv and 3×3 Conv are performed in the Dense Block to improve inter-layer information flow.
The output of the Dense Block is fed into the Transition layer located between two Dense Blocks, which consists of BN, ReLU, 1×1 Conv and 2×2 average pooling and is used to change the size of the feature maps.
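The channel bookkeeping of Dense Blocks and Transition layers can be sketched as follows; the growth rate of 32, the compression factor of 0.5 and the block sizes are illustrative DenseNet-121-style assumptions, not values given in the patent:

```python
def dense_block_channels(c_in, n_layers, growth_rate=32):
    """Each BN-ReLU-1x1Conv (Bottleneck) + BN-ReLU-3x3Conv layer adds
    `growth_rate` feature maps, concatenated onto everything before it."""
    return c_in + n_layers * growth_rate

def transition_channels(c_in, compression=0.5):
    """The Transition layer's 1x1 Conv reduces the channels (its 2x2 average
    pooling halves the spatial size); compression 0.5 is the DenseNet-BC
    convention, assumed here."""
    return int(c_in * compression)

c = 64                           # channels after the initial convolution (assumed)
for n_layers in (6, 12, 24):     # illustrative block sizes
    c = transition_channels(dense_block_channels(c, n_layers))
```

This concatenate-then-compress pattern is what keeps the channel count bounded while still reusing all earlier features.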
The spatial attention mechanism then reduces the dimensionality along the channel axis, obtains a max-pooling result and a mean-pooling result, splices them into a feature map, and learns from it with a convolution layer.
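A dependency-free numpy sketch of this spatial attention step; the learned convolution is replaced here by a fixed 1×1 kernel for illustration (CBAM-style implementations use a learned 7×7 convolution):

```python
import numpy as np

def spatial_attention(feat):
    """feat: (C, H, W). Channel-wise max pooling and mean pooling are
    spliced into a 2-channel map; a fixed 1x1 convolution (weights 0.5/0.5,
    a stand-in for the learned layer) plus a sigmoid yields the (H, W)
    spatial attention map."""
    mx = feat.max(axis=0)                               # max pooling over channels
    mn = feat.mean(axis=0)                              # mean pooling over channels
    stacked = np.stack([mx, mn])                        # spliced (2, H, W) map
    logits = np.tensordot([0.5, 0.5], stacked, axes=1)  # 1x1 "convolution"
    return 1.0 / (1.0 + np.exp(-logits))                # sigmoid -> weights in (0, 1)

feat = np.zeros((8, 4, 4))
feat[:, 1, 2] = 5.0                                     # one salient spatial position
attn = spatial_attention(feat)
```

Multiplying the feature map by `attn` would emphasize the salient position while damping the rest.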
The channel attention mechanism consists of five parts, namely global average pooling, a fully connected layer, ReLU, a second fully connected layer and Sigmoid, and comprises the following two steps:
Squeeze compresses the features. The compression is realized by global average pooling, which generates channel statistics by turning each two-dimensional feature channel into a single real number; this real number has, to some extent, a global receptive field, and the number of output values matches the number of input feature channels. The expression is as follows:
Z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)
where Z_c is the result of compressing u_c; F_sq denotes the squeeze operation; c indexes the channels; u_c is the c-th feature map of the input U of spatial dimensions H×W×C (height × width × channels); and u_c(i, j) is the element at position (i, j) of the c-th H×W two-dimensional matrix in the three-dimensional tensor U, with i indexing the height H and j indexing the width W.
Excitation multiplies the squeezed result by the first fully connected layer, passes it through a ReLU layer without changing its dimension, multiplies it by the second fully connected layer, and then applies a Sigmoid function. Excitation thus captures channel-wise dependencies while greatly reducing parameters and computation. The expression is as follows:
S = F_ex(Z, W) = σ(W_2 δ(W_1 Z))
where S describes the weights of the C feature maps in U; the parameters W generate a weight for each feature channel and explicitly model the correlation between feature channels; F_ex(Z, W) denotes the excitation of the squeezed result Z with the parameters W; W_1, the weight of the first fully connected layer, has dimensions (C/r) × C; W_2, the weight of the second fully connected layer, has dimensions C × (C/r); r is a scaling (reduction) ratio and C is the number of channels; W_1 Z is the output of the first fully connected layer and δ(·) denotes ReLU; the multiplication by W_2 is the second fully connected layer and σ(·) denotes the Sigmoid function.
Finally, the input channels are multiplied by their respective weights.
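The squeeze, excitation and reweighting steps above can be sketched end to end with numpy; the random weights stand in for the learned fully connected layers and are not from the patent:

```python
import numpy as np

def se_block(feat, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    Squeeze: global average pooling gives z in R^C.
    Excitation: s = sigmoid(w2 @ relu(w1 @ z)), with w1 of shape (C/r, C)
    and w2 of shape (C, C/r), r being the reduction ratio.
    Scale: every channel is multiplied by its weight s_c.
    """
    z = feat.mean(axis=(1, 2))                                   # squeeze
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))    # excitation
    return feat * s[:, None, None]                               # channel reweighting

C, r = 8, 4
rng = np.random.default_rng(1)
w1 = rng.normal(size=(C // r, C))        # stand-in for the first FC layer
w2 = rng.normal(size=(C, C // r))        # stand-in for the second FC layer
feat = rng.normal(size=(C, 5, 5))
out = se_block(feat, w1, w2)
```

Because the Sigmoid output lies in (0, 1), every channel is attenuated by its weight, suppressing unimportant channels and (relatively) enhancing important ones.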
The classification submodule mainly consists of global average pooling, a fully connected layer, BN and a Softmax function; the Softmax function outputs the class of each brain tumor image, so the submodule both reduces the number of parameters and accurately distinguishes the brain tumor classes.
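A numpy sketch of this classification submodule (BatchNorm omitted for brevity; the random weights and the 16-channel input are illustrative assumptions):

```python
import numpy as np

def classify(feat, w, b, labels):
    """Global average pooling over a (C, H, W) feature map, a fully
    connected layer, then a numerically stable Softmax."""
    pooled = feat.mean(axis=(1, 2))           # global average pooling: (C,)
    logits = w @ pooled + b                   # fully connected layer
    e = np.exp(logits - logits.max())         # stable softmax
    probs = e / e.sum()
    return labels[int(np.argmax(probs))], probs

labels = ["glioma", "meningioma", "pituitary tumor", "angioreticular tumor"]
rng = np.random.default_rng(2)
w, b = rng.normal(size=(4, 16)), np.zeros(4)
pred, probs = classify(rng.normal(size=(16, 3, 3)), w, b, labels)
```

Replacing the final flatten-plus-large-FC head with global average pooling is what reduces the parameter count mentioned above.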
6) The images in the test set are loaded into the trained image classification network model for discrimination; the result output by the model is the classification result.
Referring to fig. 3, the present embodiment also provides a brain tumor image classification system based on spatial and channel attention mechanism, including:
an image acquisition module: the method is used for acquiring MRI brain tumor images of target types, including 4 types of images of glioma, meningioma, pituitary tumor and angioreticular cell tumor, and corresponding sample data sets are formed.
A data set processing module: extracting a certain number of images from the acquired sample data set to generate false samples, so as to expand the sample data set to obtain an expanded data set; then, performing dimension reduction, denoising and data enhancement processing on the expansion data set; and finally, dividing the processed expansion data set according to a ratio of 7:3 to obtain a training set and a test set.
A model training module: firstly, an image classification network model is constructed, and then a training set is used for parameter adjustment training of the constructed image classification network model; the constructed image classification network model consists of a DenseNet, a space attention mechanism, a channel attention mechanism SE-Net and classification submodules, wherein the DenseNet is a main network of the model and is used for extracting global features of images and multiplexing the features; embedding a space attention mechanism into the DenseNet, performing corresponding space transformation on the space domain information of the picture, only concerning interested positions, and extracting key information; the channel attention mechanism SE-Net inhibits or enhances different channels aiming at the importance of the characteristics by modeling the importance degree of each characteristic channel; the classification submodule uses Softmax as a core to accurately classify various images; the image classification network model acquires an interested part through a space attention mechanism, acquires the weight of the characteristics by utilizing a channel attention mechanism SE-Net, emphasizes useful information to inhibit useless information and inputs the useful information to a classification submodule;
an image classification module: and inputting the test set into the trained image classification network model to obtain the final classification result of the MRI brain tumor image.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent and is intended to be included within the scope of the present invention.

Claims (9)

1. The image classification method based on the spatial and channel attention mechanism is characterized by comprising the following steps of:
s1, obtaining an image sample of the target type to be distinguished and classified, and constructing a corresponding sample data set;
s2, extracting a certain number of images from the acquired sample data set, and generating a countermeasure network DCGAN by utilizing depth convolution to generate false samples, so as to expand the acquired sample data set to obtain an expanded data set;
s3, processing the expansion data set, including dimension reduction, denoising and data enhancement;
s4, dividing the expansion data set processed in the step S3 according to the proportion, dividing most of data into training sets and dividing a small part of data into test sets;
S5, inputting the images in the training set into the constructed image classification network model for parameter-tuning training, extracting the features of the images, and finally saving the trained image classification network model; the constructed image classification network model consists of a DenseNet, a spatial attention mechanism, a channel attention mechanism SE-Net, and a classification submodule, wherein the DenseNet is the backbone network of the model and is used for extracting global image features and reusing them; the spatial attention mechanism is embedded into the DenseNet and performs corresponding spatial transformations on the spatial-domain information of the image, attending only to positions of interest and extracting key information; the channel attention mechanism SE-Net models the importance of each feature channel and suppresses or enhances different channels according to the importance of their features; the classification submodule uses Softmax as its core to accurately classify the images; the image classification network model obtains the regions of interest through the spatial attention mechanism, obtains feature weights through the channel attention mechanism SE-Net, emphasizes useful information while suppressing useless information, and feeds the result into the classification submodule;
S6, loading the images in the test set into the trained image classification network model for judgment; the output of the model is the classification result.
2. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S2 a certain number of images are extracted from the acquired sample data set and false samples are generated with an unsupervised deep convolutional generative adversarial network (DCGAN); through the mutual game between the generator G and the discriminator D of the DCGAN, a Nash equilibrium is finally reached, so that samples produced by the generator G can deceive the discriminator D and are ultimately judged as real; the specific process of generating false samples is as follows:
S21, the generator G generates synthetic data from given noise and finally converts the high-level representation into a low-resolution pixel image through up-sampling and deconvolution operations; no fully connected layer or pooling layer is used in the whole process, and the given noise follows a uniform or normal distribution;
S22, the discriminator D judges whether the output of the generator G is real data; the generator tries to produce data that is closer to the real data, and the discriminator correspondingly tries to better distinguish real data from generated data; thus the generator G and the discriminator D improve through this adversarial process and keep competing after each improvement, so the data produced by the generator G becomes ever more realistic and approaches the real data, allowing the desired images to be generated; the optimization objective function is as follows:
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
where V(D, G) is the value function of the generator G and the discriminator D; x denotes a real image; x ∼ p_data(x) indicates that the real images follow the data distribution; z ∼ p_z(z) indicates that the noise follows a uniform or normal distribution; D(x) is the probability that the discriminator D judges a real image to be real; G(z) is the image generated by the generator G; and D(G(z)) is the probability that the discriminator D judges the image generated by G to be real.
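As an illustration of the objective above, the value function can be estimated from sample batches. The sketch below is a minimal NumPy example with a hypothetical fixed logistic discriminator (the weights `w` and `b` are assumptions, not part of the patent); it only shows how the two expectation terms of V(D, G) combine.

```python
import numpy as np

# V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
# Hypothetical toy discriminator: a fixed logistic model over 1-D samples.

def discriminator(x, w=2.0, b=-1.0):
    """Toy D(x): probability that sample x is real (weights are illustrative)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def gan_value(real_samples, fake_samples):
    """Monte-Carlo estimate of the GAN value function from sample batches."""
    d_real = discriminator(real_samples)
    d_fake = discriminator(fake_samples)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

rng = np.random.default_rng(0)
real = rng.normal(loc=1.0, scale=0.2, size=1000)   # stand-in for real data
fake = rng.normal(loc=-1.0, scale=0.2, size=1000)  # stand-in for generator output
v = gan_value(real, fake)
```

When the generator's samples move toward the real distribution, the second term shrinks and the value function decreases, which is what drives the min-max game.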
3. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S3 Principal Component Analysis (PCA) is used to perform dimensionality reduction and denoising on the expanded data set so as to reduce data redundancy, with the following steps:
a. converting the image into a matrix;
b. removing the mean from all features;
c. computing the covariance matrix;
d. computing the eigenvalues of the covariance matrix and the corresponding eigenvectors;
e. sorting the eigenvalues;
f. keeping the eigenvectors corresponding to the N largest eigenvalues;
g. projecting the original features into the new space constructed from the N retained eigenvectors, finally achieving dimensionality reduction.
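Steps a–g above can be sketched in a few lines of NumPy; the function name `pca_reduce` and the toy data are illustrative, not part of the claim.

```python
import numpy as np

def pca_reduce(images, n_components):
    """PCA dimensionality reduction following steps a-g.
    images: (num_samples, num_features) matrix, each row a flattened image (step a)."""
    # b. remove the mean from every feature
    mean = images.mean(axis=0)
    centered = images - mean
    # c. covariance matrix of the features
    cov = np.cov(centered, rowvar=False)
    # d. eigenvalues and eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # e./f. sort eigenvalues in descending order, keep the top-N eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # g. project the centered data into the new N-dimensional space
    return centered @ components

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 64))      # 100 flattened 8x8 "images"
reduced = pca_reduce(data, 16)         # 64 features -> 16 components
```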
4. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S3 the images are rotated, flipped, cropped, and translated, and their brightness and contrast are adjusted, so as to enhance the image data.
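A minimal NumPy sketch of the augmentations this claim lists; the probability threshold, translation range, and brightness/contrast ranges are assumed values for illustration, and cropping is omitted to keep the output shape fixed.

```python
import numpy as np

def augment(image, rng):
    """Randomly rotate, flip, translate, and adjust brightness/contrast.
    image: 2-D array with values in [0, 1]."""
    out = np.rot90(image, k=rng.integers(0, 4))            # rotate 0/90/180/270 degrees
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)                         # horizontal flip
    out = np.roll(out, shift=rng.integers(-2, 3), axis=0)  # small vertical translation
    alpha = rng.uniform(0.8, 1.2)                          # contrast factor (assumed range)
    beta = rng.uniform(-0.1, 0.1)                          # brightness offset (assumed range)
    return np.clip(alpha * out + beta, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
aug = augment(img, rng)
```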
5. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S5 the convolution layer in the DenseNet consists of BN, ReLU, and 1×1 Conv, and the 1×1 Conv can be used both to reduce the dimensionality of the image and to reduce the number of feature-map outputs; the Dense Block module is an important component of the DenseNet, consists of BN, ReLU, 1×1 Conv, and 3×3 Conv, and is used to improve the flow of information between layers, wherein a Bottleneck is adopted in the Dense Block to reduce the amount of computation by adding a 1×1 Conv to the original structure; the Transition layer in the DenseNet is located between two Dense Blocks and is used to change the size of the feature maps, consisting of BN, ReLU, 1×1 Conv, and 2×2 average pooling.
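The dense connectivity this claim describes can be illustrated with a toy NumPy model: each layer concatenates its new feature maps onto everything that came before it, so the channel count grows by the growth rate per layer. The 3×3 convolution is simplified here to a second 1×1 map, and the random weights are purely illustrative.

```python
import numpy as np

def conv1x1(x, out_channels, rng):
    """Toy 1x1 convolution: a per-pixel linear map over the channel axis."""
    w = rng.normal(size=(x.shape[0], out_channels)) * 0.1
    return np.einsum('chw,co->ohw', x, w)

def dense_block(x, num_layers, growth_rate, rng):
    """Dense connectivity: every layer sees the concatenation of all preceding
    feature maps and contributes growth_rate new channels."""
    for _ in range(num_layers):
        # Bottleneck: 1x1 conv to 4*growth_rate channels, then ReLU
        bottleneck = np.maximum(conv1x1(x, 4 * growth_rate, rng), 0)
        # Second conv (simplified from 3x3 to 1x1) producing the new features
        new_features = np.maximum(conv1x1(bottleneck, growth_rate, rng), 0)
        x = np.concatenate([x, new_features], axis=0)  # channel-wise concat
    return x

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 8, 8))  # (channels, height, width)
out = dense_block(feat, num_layers=4, growth_rate=12, rng=rng)
```

Starting from 16 channels, four layers with growth rate 12 yield 16 + 4×12 = 64 output channels, which is exactly the feature reuse the Transition layer then compresses.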
6. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S5 the spatial attention mechanism first reduces the dimensionality along the channel axis, obtaining a max-pooling result and a mean-pooling result respectively, then concatenates them into a feature map, and finally learns from it with a convolutional layer.
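A minimal NumPy sketch of this spatial attention step; a weighted sum of the two pooled maps stands in for the convolutional layer the claim describes, and the weights `w_max`, `w_mean`, `bias` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, w_max=1.0, w_mean=1.0, bias=0.0):
    """Reduce over the channel axis with max- and mean-pooling, combine the two
    maps (standing in for the learned conv), and gate the input with a sigmoid mask."""
    max_map = x.max(axis=0)              # (H, W) channel-wise max pooling
    mean_map = x.mean(axis=0)            # (H, W) channel-wise mean pooling
    attn = sigmoid(w_max * max_map + w_mean * mean_map + bias)  # values in (0, 1)
    return x * attn[None, :, :]          # broadcast the spatial mask over channels

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))      # (channels, height, width)
out = spatial_attention(feat)
```

Because the mask lies in (0, 1), every spatial position is attenuated in proportion to how little attention it receives, leaving the positions of interest dominant.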
7. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S5 the channel attention mechanism SE-Net consists of five parts: global average pooling, a fully connected layer, ReLU, a second fully connected layer, and Sigmoid; the squeeze is implemented by global average pooling, which generates channel statistics by turning each two-dimensional feature channel into a single real number that has, to some extent, a global receptive field, with the output dimension matching the number of input feature channels; the excitation multiplies the result of the squeeze by the first fully connected layer, passes it through a ReLU layer without changing the dimension, multiplies it by the second fully connected layer, and then applies a Sigmoid function; the excitation helps capture channel-wise dependencies while greatly reducing the number of parameters and the amount of computation.
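The squeeze-and-excitation pipeline of this claim (global average pooling → FC → ReLU → FC → Sigmoid → channel reweighting) can be sketched in NumPy; the weight matrices and the reduction ratio r = 4 are illustrative assumptions.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation: squeeze with global average pooling, excite with
    FC -> ReLU -> FC -> Sigmoid, then rescale each channel of the input."""
    z = x.mean(axis=(1, 2))                      # squeeze: (C,) channel descriptor
    s = np.maximum(z @ w1, 0.0)                  # first FC (dimension reduction) + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))          # second FC (expansion) + Sigmoid
    return x * s[:, None, None]                  # reweight channels by their importance

rng = np.random.default_rng(0)
C, r = 16, 4                                     # channels and reduction ratio (assumed)
feat = rng.normal(size=(C, 8, 8))
w1 = rng.normal(size=(C, C // r)) * 0.1          # reduction FC weights
w2 = rng.normal(size=(C // r, C)) * 0.1          # expansion FC weights
out = se_block(feat, w1, w2)
```

The bottleneck of C/r units between the two FC layers is what keeps the parameter count low while still modelling cross-channel dependencies.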
8. The image classification method based on spatial and channel attention mechanisms according to claim 1, wherein in step S5 the classification submodule consists of global average pooling, a fully connected layer, BN, and a Softmax function, and is used to reduce the number of parameters and to classify the images.
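A minimal NumPy sketch of this classification head (global average pooling followed by a fully connected layer and Softmax); BN is omitted for brevity and the class count is an assumed example.

```python
import numpy as np

def classify(x, w, b):
    """Classification head: global average pooling -> fully connected -> Softmax."""
    pooled = x.mean(axis=(1, 2))                 # GAP collapses (C, H, W) to (C,)
    logits = pooled @ w + b                      # fully connected layer
    exp = np.exp(logits - logits.max())          # numerically stable Softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
num_classes = 4                                  # e.g. tumor categories (assumed)
feat = rng.normal(size=(16, 8, 8))
probs = classify(feat, rng.normal(size=(16, num_classes)), np.zeros(num_classes))
```

Pooling before the fully connected layer is what keeps the head small: the FC operates on a C-dimensional vector instead of the full C×H×W feature map.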
9. An image classification system based on spatial and channel attention mechanisms, characterized by comprising:
an image acquisition module, used to acquire image samples of the target types to be distinguished and classified and to construct a corresponding sample data set;
a data set processing module, used to extract a certain number of images from the acquired sample data set to generate false samples, thereby expanding the sample data set into an expanded data set; then to perform dimensionality reduction, denoising, and data enhancement on the expanded data set; and finally to divide the processed expanded data set by proportion, assigning most of the data to a training set and a small part to a test set;
a model training module, used to first construct an image classification network model and then perform parameter-tuning training on it with the training set; the constructed image classification network model consists of a DenseNet, a spatial attention mechanism, a channel attention mechanism SE-Net, and a classification submodule, wherein the DenseNet is the backbone network of the model and is used for extracting global image features and reusing them; the spatial attention mechanism is embedded into the DenseNet and performs corresponding spatial transformations on the spatial-domain information of the image, attending only to positions of interest and extracting key information; the channel attention mechanism SE-Net models the importance of each feature channel and suppresses or enhances different channels according to the importance of their features; the classification submodule uses Softmax as its core to accurately classify the images; the image classification network model obtains the regions of interest through the spatial attention mechanism, obtains feature weights through the channel attention mechanism SE-Net, emphasizes useful information while suppressing useless information, and feeds the result into the classification submodule;
an image classification module, used to input the test set into the trained image classification network model to obtain the final classification result of the images.
CN202110961232.3A 2021-08-20 2021-08-20 Image classification method and system based on space and channel attention mechanism Pending CN113743484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110961232.3A CN113743484A (en) 2021-08-20 2021-08-20 Image classification method and system based on space and channel attention mechanism

Publications (1)

Publication Number Publication Date
CN113743484A true CN113743484A (en) 2021-12-03

Family

ID=78732000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110961232.3A Pending CN113743484A (en) 2021-08-20 2021-08-20 Image classification method and system based on space and channel attention mechanism

Country Status (1)

Country Link
CN (1) CN113743484A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633513A (en) * 2017-09-18 2018-01-26 天津大学 The measure of 3D rendering quality based on deep learning
CN110414601A (en) * 2019-07-30 2019-11-05 南京工业大学 Photovoltaic module fault diagnosis method, system and equipment based on deep convolution countermeasure network
CN111709265A (en) * 2019-12-11 2020-09-25 深学科技(杭州)有限公司 Camera monitoring state classification method based on attention mechanism residual error network
WO2020244108A1 (en) * 2019-06-05 2020-12-10 Boe Technology Group Co., Ltd. Methods and apparatuses for semantically segmenting input image, and computer-program product
CN112308830A (en) * 2020-10-27 2021-02-02 苏州大学 Attention mechanism and deep supervision strategy-based automatic division identification method for retinopathy of prematurity
CN113269799A (en) * 2021-05-18 2021-08-17 哈尔滨理工大学 Cervical cell segmentation method based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dang Jisheng, Yang Jun: "Recognition and Segmentation of 3D Models with Multi-Feature Fusion", Journal of Xidian University, vol. 47, no. 4, page 149 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298234A (en) * 2021-12-31 2022-04-08 深圳市铱硙医疗科技有限公司 Brain medical image classification method and device, computer equipment and storage medium
CN114298234B (en) * 2021-12-31 2022-10-04 深圳市铱硙医疗科技有限公司 Brain medical image classification method and device, computer equipment and storage medium
CN114463587A (en) * 2022-01-30 2022-05-10 中国农业银行股份有限公司 Abnormal data detection method, device, equipment and storage medium
CN115439702A (en) * 2022-11-08 2022-12-06 武昌理工学院 Weak noise image classification method based on frequency domain processing
CN115439702B (en) * 2022-11-08 2023-03-24 武昌理工学院 Weak noise image classification method based on frequency domain processing
CN115601821A (en) * 2022-12-05 2023-01-13 中国汽车技术研究中心有限公司(Cn) Interaction method based on expression recognition
CN115601821B (en) * 2022-12-05 2023-04-07 中国汽车技术研究中心有限公司 Interaction method based on expression recognition
CN116612339A (en) * 2023-07-21 2023-08-18 中国科学院宁波材料技术与工程研究所 Construction device and grading device of nuclear cataract image grading model
CN116612339B (en) * 2023-07-21 2023-11-14 中国科学院宁波材料技术与工程研究所 Construction device and grading device of nuclear cataract image grading model
CN117315377A (en) * 2023-11-29 2023-12-29 山东理工职业学院 Image processing method and device based on machine vision and electronic equipment
CN117315377B (en) * 2023-11-29 2024-02-27 山东理工职业学院 Image processing method and device based on machine vision and electronic equipment
CN117893763A (en) * 2024-01-22 2024-04-16 内蒙古工业大学 ResCo-UNet-based buckwheat grain image segmentation method

Similar Documents

Publication Publication Date Title
CN113743484A (en) Image classification method and system based on space and channel attention mechanism
CN112084362B (en) Image hash retrieval method based on hierarchical feature complementation
CN106599883B (en) CNN-based multilayer image semantic face recognition method
CN111415316A (en) Defect data synthesis algorithm based on generation of countermeasure network
CN110363215A (en) The method that SAR image based on production confrontation network is converted into optical imagery
CN112580590A (en) Finger vein identification method based on multi-semantic feature fusion network
CN110211127B (en) Image partition method based on bicoherence network
CN110490265B (en) Image steganalysis method based on double-path convolution and feature fusion
CN111860124B (en) Remote sensing image classification method based on space spectrum capsule generation countermeasure network
CN115222998B (en) Image classification method
CN113870157A (en) SAR image synthesis method based on cycleGAN
CN114155371A (en) Semantic segmentation method based on channel attention and pyramid convolution fusion
CN111709313A (en) Pedestrian re-identification method based on local and channel combination characteristics
Xu et al. LMO-YOLO: A ship detection model for low-resolution optical satellite imagery
CN116453199B (en) GAN (generic object model) generation face detection method based on fake trace of complex texture region
CN117037004B (en) Unmanned aerial vehicle image detection method based on multi-scale feature fusion and context enhancement
CN111639697B (en) Hyperspectral image classification method based on non-repeated sampling and prototype network
CN112580480A (en) Hyperspectral remote sensing image classification method and device
CN109740552A (en) A kind of method for tracking target based on Parallel Signature pyramid neural network
CN113850182B (en) DAMR _ DNet-based action recognition method
Gao et al. Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition
Fan et al. Attention-modulated triplet network for face sketch recognition
Li et al. A new algorithm of vehicle license plate location based on convolutional neural network
CN113343953B (en) FGR-AM method and system for remote sensing scene recognition
CN114005002B (en) Image recognition method of kernel fully-connected neural network based on kernel operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination