CN115953622A - Image classification method combining attention mutual exclusion regularization - Google Patents

Image classification method combining attention mutual exclusion regularization Download PDF

Info

Publication number
CN115953622A
CN115953622A CN202211576853.0A
Authority
CN
China
Prior art keywords
attention
image
channel
mutual exclusion
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211576853.0A
Other languages
Chinese (zh)
Other versions
CN115953622B (en)
Inventor
陆靖桥
宾炜
麦广柱
陶彦百
罗志鹏
陈银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Original Assignee
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine filed Critical Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority to CN202211576853.0A priority Critical patent/CN115953622B/en
Publication of CN115953622A publication Critical patent/CN115953622A/en
Application granted granted Critical
Publication of CN115953622B publication Critical patent/CN115953622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method combining attention mutual exclusion regularization, which relates to the technical field of image processing and comprises the following steps: importing an image; extracting image features, which includes picking a specified number of attention channels in an attention map; and classifying the image features, which includes updating model parameters with a final loss function of the image classification model. By restricting the model to a plurality of candidate key channels whose attention regions do not overlap with each other, the method guides the model to attend to different regions of the image target and to integrate information from different key regions, so that the performance of the model is improved and the accuracy of image classification is improved.

Description

Image classification method combining attention mutual exclusion regularization
Technical Field
The invention relates to the technical field of image processing, in particular to an image classification method combining attention mutual exclusion regularization.
Background
Image classification is an image processing method that distinguishes targets of different types according to the different features reflected in their image information. Current image classification methods usually adopt a neural network model, but during training the attention that the model pays to the multiple target regions of a picture is still not ideal; for example, the models in the second column of fig. 3 attend to only a single region, so the performance of such models needs to be optimized.
Disclosure of Invention
To address one or more of the above issues, an image classification method incorporating attention mutual exclusion regularization is provided.
According to one aspect of the invention, an image classification method combining attention mutual exclusion regularization is provided, which comprises the following steps:
importing an image;
extracting image features;
classifying the image features;
the classifying the image features includes updating model parameters using a final loss function of the image classification model. The beneficial effects are as follows: the training phase of the image classification method combining attention mutual exclusion regularization is also the process of constructing the image classification model; in the testing phase, once an image is input into the model, the class of the input image can be obtained.
In some embodiments, importing the image includes inputting the image dataset for training into an image classification model. The imported training image dataset needs to comprise different classes of training images. The beneficial effects are as follows: and enabling the image classification model to learn different types of training image patterns in a training stage.
In some embodiments, the extracting the image features includes extracting the image features from a training image through a CNN network, thereby obtaining a feature map. The different classes of training images need to be distinguished according to the extracted image features. The beneficial effects are as follows: helping to distinguish between different classes of training images.
In some embodiments, the extracting the image features comprises turning the feature map into an attention map and selecting a specified number of attention channels in the attention map. The attention map indicates which regions of the training image are attended to. The beneficial effects are as follows: this helps express attention to different areas of the training image.
In some embodiments, the extracting image features further comprises:
judging whether the attention channel is a candidate key channel, and limiting the candidate key attention channels to focus on regions that do not overlap. The beneficial effects are as follows: images can be better classified according to their features.
In some embodiments, the determining whether the attention channel is a candidate key channel includes:
selecting a value as a threshold, and if the weight of the attention channel is greater than the threshold, judging that attention channel to be a candidate key channel. The larger the weight corresponding to an attention channel, the more important the corresponding training image area. The beneficial effects are as follows: the candidate key areas are selected.
In some embodiments, the limiting attention channels focusing on non-overlapping regions comprises:
calculating an attention mutual exclusion regularized loss function, which is calculated according to the following formula:
L_AME = (1/(W·H)) Σ_{i=1..W} Σ_{j=1..H} M_c1(i,j)·M_c2(i,j) + (1/(W·H)) |Σ_{i,j} M_c1(i,j) - Σ_{i,j} M_c2(i,j)|
wherein L_AME is the attention mutual exclusion regularized loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map. Different attention channels correspond to different regions of the image. The beneficial effects are as follows: different areas can be focused on, and the information of each candidate key area can be integrated.
In some embodiments, the classifying the image features further comprises:
performing a feature fusion operation on the attention map and the feature map to obtain the final features of the image;
performing multi-classification operation on the final features of the image to obtain the category of the input image;
a final loss function of the image classification model is calculated. The beneficial effects are as follows: the images are classified according to the obtained image features.
In some embodiments, the final loss function of the image classification model includes an attention-exclusive regular loss function and a cross-entropy loss function. The beneficial effects are as follows: the calculation of the total loss function may be used to update parameters of the image classification model.
According to another aspect of the application, a storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program carries out the steps of the image classification method combining attention mutual exclusion regularization. The beneficial effects are as follows: the images are classified using the computer program.
In the image classification method combining attention mutual exclusion regularization, the model is restricted to a plurality of candidate key regions whose attention channels do not overlap with each other, so it attends to different regions of the image target and integrates the information of the candidate key regions; this improves the performance of the model and, in turn, the accuracy of image classification.
Drawings
FIG. 1 is a schematic diagram of the training process of the image classification method combining attention mutual exclusion regularization according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the testing process of the image classification method combining attention mutual exclusion regularization according to an embodiment of the present invention;
FIG. 3 is a comparison of the attention heat region map obtained for the same original image using the method of the present invention and the prior art.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
FIGS. 1-2 schematically illustrate an image classification method incorporating attention mutual exclusion regularization according to one embodiment of the present invention. As shown, the method includes:
importing an image;
extracting image features;
extracting image features comprises selecting a number of attention channels of a specified number in an attention map;
classifying the image features;
classifying the image features includes updating model parameters with a final loss function of the image classification model.
Importing images is to import the screened image data set into a training model. Categories of image datasets include, but are not limited to: images of automobiles, birds and airplanes.
The image feature extraction is to extract features of one image in the image data set through a backbone network to obtain a feature map. Alternatively, the backbone network may be a VGG network or a Resnet network or other CNN network.
In this embodiment, the training image size is 200 × 200, and since the image is in RGB format, the image is actually a 200 × 200 × 3 matrix in the model training process.
Because computing directly on the matrix corresponding to the full image is expensive, layer-by-layer convolution operations compress the image into a feature map of much smaller size; the matrix corresponding to the compressed feature map is only a small fraction of the size of the matrix corresponding to the original image.
In this embodiment, the image is compressed into a 7 × 7 × 256 feature map after the above processing.
Picking a specified number of attention channels in the attention map includes:
passing the feature map through a convolution layer and a ReLU activation function layer in turn to obtain the attention map. The convolution kernels of the convolution layer have a stride of 1, a count of 64, and a size of 3 × 3.
The attention map corresponds to a three-dimensional matrix having a width, a height, and a depth; the number of channels is the depth of this three-dimensional matrix.
A specified number of attention channels are then picked, with the individual channel weights of the attention map used as sampling probabilities. The number of attention channels selected should be less than the number of channels of the feature map; it is an integer in the range 3 to 10. Each attention channel corresponds to a different region of the training image, and the higher an attention channel's weight, the more important that channel is.
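The weighted picking described above might be implemented, for instance, as sampling without replacement with the channel weights as probabilities. This is a hedged sketch: the patent does not fix the exact sampling procedure, and `pick_attention_channels` is a hypothetical helper.

```python
import random

def pick_attention_channels(weights, num_pick, rng=None):
    """Sample num_pick distinct channel indices, with each channel's
    attention weight acting as its (unnormalized) selection probability."""
    rng = rng or random.Random(0)
    indices = list(range(len(weights)))
    picked = []
    for _ in range(num_pick):
        total = sum(weights[i] for i in indices)
        r = rng.uniform(0.0, total)
        acc = 0.0
        for i in indices:
            acc += weights[i]
            if r <= acc:              # roulette-wheel selection
                picked.append(i)
                indices.remove(i)     # without replacement
                break
    return picked

# e.g. 64 attention channels, pick 3 (the patent specifies 3 to 10)
weights = [0.1] * 61 + [0.9, 0.8, 0.7]
chosen = pick_attention_channels(weights, 3)
print(chosen)  # three distinct indices, biased toward the heavy channels
```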
Extracting the image features further includes determining whether the attention channel is a candidate key channel and limiting the candidate key attention channel to focus on regions that do not overlap with each other.
Determining whether the attention channel is a candidate key channel further comprises: selecting a value from a set range as a threshold, and if the weight of the attention channel is greater than the threshold, judging that attention channel to be a candidate key channel.
Specifically, the judgment is performed according to the following formula:
M_c(i,j) = 1 if A_c(i,j) > θ, and M_c(i,j) = 0 otherwise,
wherein A_c is the c-th attention channel of attention map A, (i,j) is a position coordinate on the attention map, θ is the selected threshold, and M_c is the attention heat region map.
In this embodiment, the threshold is selected from the set range [0.5, 0.8]. The weights of an attention channel have a maximum value of 1 and a minimum value of 0, and a larger value indicates that the attention channel corresponding to the matrix is more important. Because an attention channel corresponds to a candidate key region of the training image, the specific value of the threshold needs to be greater than 0.5. Alternatively, the threshold may be selected from [0.5, 0.9].
First, a random value is drawn from the range [0.5, 0.8] as the threshold; then the weight of each attention channel is compared against it, and every channel whose weight is greater than the threshold is a candidate key channel.
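The threshold test can be sketched as follows. This is illustrative Python: the helper names are assumptions, and treating "the weight of the attention channel" as the channel's per-position weights is one plausible reading of the description.

```python
import random

def binarize_channel(channel, threshold):
    """Turn one attention channel (2-D grid of weights in [0, 1]) into a
    binary heat-region map M_c: 1 where the weight exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in channel]

def is_candidate_key_channel(channel, threshold):
    """A channel is a candidate key channel if some weight exceeds the
    threshold (an assumed reading of 'weight greater than the threshold')."""
    return any(v > threshold for row in channel for v in row)

rng = random.Random(0)
threshold = rng.uniform(0.5, 0.8)          # random threshold from [0.5, 0.8]
a_c = [[0.9, 0.2], [0.4, 0.85]]            # toy 2x2 attention channel
m_c = binarize_channel(a_c, threshold)
print(is_candidate_key_channel(a_c, threshold))  # True: 0.9 > 0.8 >= threshold
```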
Limiting the attention channels to regions that do not overlap further comprises:
an attention mutual exclusion regularized loss function is calculated.
The attention mutual exclusion regularized loss function is calculated according to the following formula:
L_AME = (1/(W·H)) Σ_{i=1..W} Σ_{j=1..H} M_c1(i,j)·M_c2(i,j) + (1/(W·H)) |Σ_{i,j} M_c1(i,j) - Σ_{i,j} M_c2(i,j)|
wherein L_AME is the attention mutual exclusion regularized loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map. The attention mutual exclusion regularization loss must consider both that the difference in area between candidate key regions cannot be too large and that different candidate key regions should focus on different regions as much as possible.
In this formula, the term |Σ M_c1 - Σ M_c2| requires the areas of the corresponding candidate key regions to be consistent, while the term M_c1·M_c2 requires the candidate key regions not to overlap. Making the attention heat regions mutually exclusive, that is, keeping the candidate key regions from overlapping, allows several key parts of the target to be identified in a targeted manner, classifies the target image more efficiently and accurately, and improves the generalization ability of the model.
The attention mutual exclusion regularized loss function may be used to update the parameters of the image classification model. Its value measures the degree of overlap between two attention channels: the smaller the value, the less the two channels overlap, and the larger the value, the more they overlap.
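A minimal sketch of this loss, assuming it is the normalized per-pixel overlap of the two binary heat-region maps plus their normalized area difference (this decomposition is an assumption reconstructed from the two requirements stated above):

```python
def attention_mutual_exclusion_loss(m1, m2):
    """Sketch of an attention mutual-exclusion loss for two binary
    heat-region maps of size H x W: normalized overlap (smaller means less
    overlap) plus normalized area difference (keeps the two candidate key
    regions of comparable size). The exact weighting is an assumption."""
    h, w = len(m1), len(m1[0])
    overlap = sum(m1[i][j] * m2[i][j] for i in range(h) for j in range(w))
    area1 = sum(sum(row) for row in m1)
    area2 = sum(sum(row) for row in m2)
    return (overlap + abs(area1 - area2)) / (w * h)

m_c1 = [[1, 1, 0], [0, 0, 0]]   # candidate key region 1
m_c2 = [[0, 0, 1], [1, 0, 0]]   # candidate key region 2, same area, disjoint
print(attention_mutual_exclusion_loss(m_c1, m_c2))  # 0.0: equal areas, no overlap
```

Minimizing this value pushes the two channels toward equal-sized, non-overlapping regions, matching the stated goal of the regularizer.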
Classifying the image features further comprises:
and fusing the attention diagram and the feature diagram to obtain the final image feature.
In this embodiment, the method for obtaining the final image feature by fusing the attention map and the feature map is bilinear attention pooling in the prior art.
Performing a multi-classification operation on the final features of the image to obtain the class of the input image, wherein the multi-classification operation includes computing a cross entropy loss function.
In this embodiment, a multi-classification operation is performed on the final features of the image, and the final features of the image are actually classified by a softmax classifier to obtain different classes.
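The softmax classification step can be sketched as follows (a minimal Python sketch; the three-class logits are made-up values standing in for the final image features' class scores):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.1]        # e.g. scores for car / bird / airplane
probs = softmax(logits)         # probabilities summing to 1
predicted = probs.index(max(probs))
print(predicted)  # 0 -> the highest-scoring class is chosen
```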
A final loss function of the image classification model is calculated.
The cross entropy loss function is calculated according to the following formula:
L_ce = - Σ_{k=1..K} l_k · log(p_k)
wherein L_ce is the cross entropy loss function, K is the number of target classes, k indexes the k-th class, l_k is the actual label of the current target class image, coded as 0 or 1, and p_k is the predicted probability of the current input image for class k, a decimal between 0 and 1.
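A direct transcription of the cross entropy loss for one sample (Python sketch; the small `eps` guard against log(0) is an added implementation detail, not part of the patent's formula):

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """L_ce = -sum_k l_k * log(p_k) with 0-1 coded labels l and predicted
    probabilities p; eps guards against log(0) for numerical safety."""
    return -sum(l * math.log(p + eps) for l, p in zip(labels, probs))

labels = [0, 1, 0]        # 0-1 coded actual label (class 2 of K=3)
probs = [0.2, 0.7, 0.1]   # model's predicted probabilities
print(round(cross_entropy(labels, probs), 4))  # 0.3567, i.e. -log(0.7)
```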
And obtaining a final loss function of the image classification model by combining the attention mutual exclusion regular loss function and the cross entropy loss function.
The final loss function of the image classification model is calculated according to the following formula:
L = α·L_ce + β·L_AME, where L is the final loss function, α is the tuning parameter of the cross entropy loss function, and β is the tuning parameter of the attention mutual exclusion regularized loss function. Each tuning parameter corresponds to the weight of its loss term: the larger its value, the more important the corresponding loss. The final loss function of the image classification model can be used to update the parameters of the image classification model.
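Combining the two losses as described can be sketched in one line (the values of α and β are not disclosed in the patent, so the numbers below are placeholders):

```python
def final_loss(l_ce, l_ame, alpha=1.0, beta=1.0):
    """L = alpha * L_ce + beta * L_AME; alpha and beta weight the two loss
    terms (their actual values are not disclosed, 1.0 is a placeholder)."""
    return alpha * l_ce + beta * l_ame

# e.g. a cross entropy of 0.36 and a mutual-exclusion loss of 0.05
print(round(final_loss(0.36, 0.05, alpha=1.0, beta=0.5), 3))  # 0.385
```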
The above steps are the training phase of the image classification model shown in fig. 1, and fig. 2 is the testing phase of the image classification model.
In the testing stage of the image classification model, only the images are input, the probability of belonging to each class can be obtained, and further the class of the input image is obtained.
The following table shows the test accuracy obtained on the Stanford Cars test set, a public data set, when it is used as input to the model of the present invention and to prior-art models.
TABLE 1
Method                       Accuracy on the Stanford Cars test set (%)
B-CNN                        91.3
OSME                         93.0
WS-DAN                       94.5
CSE                          93.90
Resnet50                     90.9
The method of the invention  95.5
The data source for the Stanford Cars test set is: Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization [C] // Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013: 554-561.
The data source for the B-CNN method in Table 1 is: Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition [C] // Proceedings of the IEEE International Conference on Computer Vision. 2015: 1449-1457.
The data source for the OSME method in Table 1 is: Zhang Wenxuan, Wu Qi. Fine-grained image classification based on multi-branch attention enhancement [J]. Computer Science, 49(5): 105-112.
The data source for the WS-DAN method in Table 1 is: Hu J, Shen L, Sun G. Squeeze-and-excitation networks [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
The data source for the CSE method in Table 1 is: Sun M, Yuan Y, Zhou F, et al. Multi-attention multi-class constraint for fine-grained image recognition [C] // Proceedings of the European Conference on Computer Vision (ECCV). 2018.
The data source for the Resnet50 method in Table 1 is: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
In this embodiment, testing with the test set of the public data set Stanford Cars as input to the method of the present invention yields the accuracy shown in Table 1. As Table 1 shows, the accuracy of the method of the present invention is higher than that of the other prior-art methods; here the number of attention channels of the method of the invention is specified as three.
Fig. 3 schematically shows a comparison of the attention heat region maps derived for the same original image by the method of the present invention and by other prior-art methods. In fig. 3, the first column is the original target class image, the second column is the attention heat region map of the prior art, and the third column is the attention heat region map of the method of the present invention. It can be seen that the method of the present invention makes the model focus on different regions of the target class image.
What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (10)

1. An image classification method combining attention mutual exclusion regularization is characterized by comprising the following steps:
importing an image;
extracting image features;
classifying the image features;
the classifying the image features includes updating model parameters using a final loss function of the image classification model.
2. The method of claim 1, wherein importing the image comprises inputting an image dataset for training into an image classification model.
3. The method as claimed in claim 1, wherein the extracting image features includes extracting image features from a training image through a CNN network, so as to obtain a feature map.
4. The image classification method combining attention mutual exclusion regularization as recited in claim 1, wherein the extracting image features comprises converting the feature map into an attention map and selecting a specified number of attention channels in the attention map.
5. The method of image classification in conjunction with attention mutual exclusion regularization as claimed in claim 1 wherein said extracting image features further comprises:
judging whether the attention channel is a candidate key channel, and limiting the candidate key attention channels to focus on regions that do not overlap.
6. The method according to claim 5, wherein the determining whether the attention channel is a candidate key channel comprises:
selecting a value as a threshold, and if the weight of the attention channel is greater than the threshold, judging that attention channel to be a candidate key channel.
7. The method of claim 5, wherein the restricting attention channels to regions that do not overlap with each other comprises:
calculating an attention mutual exclusion regularized loss function, which is calculated according to the following formula:
L_AME = (1/(W·H)) Σ_{i=1..W} Σ_{j=1..H} M_c1(i,j)·M_c2(i,j) + (1/(W·H)) |Σ_{i,j} M_c1(i,j) - Σ_{i,j} M_c2(i,j)|
wherein L_AME is the attention mutual exclusion regularized loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map.
8. The method of claim 4, wherein the classifying the image features comprises:
performing a feature fusion operation on the attention map and the feature map to obtain the final feature of the image;
performing multi-classification operation on the final features of the image to obtain the category of the input image;
a final loss function of the image classification model is calculated.
9. The method as claimed in claim 8, wherein the final loss function of the image classification model includes an attention mutual exclusion regularization loss function and a cross entropy loss function.
10. A storage medium having stored thereon a computer program, wherein the computer program is adapted to, when executed by a processor, perform the steps of a method for image classification in conjunction with attention mutual exclusion regularization as claimed in any one of claims 1 to 9.
CN202211576853.0A 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules Active CN115953622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Publications (2)

Publication Number Publication Date
CN115953622A true CN115953622A (en) 2023-04-11
CN115953622B CN115953622B (en) 2024-01-30

Family

ID=87289932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211576853.0A Active CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Country Status (1)

Country Link
CN (1) CN115953622B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
WO2021143267A1 (en) * 2020-09-07 2021-07-22 平安科技(深圳)有限公司 Image detection-based fine-grained classification model processing method, and related devices

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
WO2021143267A1 (en) * 2020-09-07 2021-07-22 平安科技(深圳)有限公司 Image detection-based fine-grained classification model processing method, and related devices

Also Published As

Publication number Publication date
CN115953622B (en) 2024-01-30

Similar Documents

Publication Publication Date Title
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110443818B (en) Graffiti-based weak supervision semantic segmentation method and system
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN110826596A (en) Semantic segmentation method based on multi-scale deformable convolution
CN111368636B (en) Object classification method, device, computer equipment and storage medium
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN111986126B (en) Multi-target detection method based on improved VGG16 network
CN110414616B (en) Remote sensing image dictionary learning and classifying method utilizing spatial relationship
CN111783841A (en) Garbage classification method, system and medium based on transfer learning and model fusion
CN113420669B (en) Document layout analysis method and system based on multi-scale training and cascade detection
CN112801097B (en) Training method and device of text detection model and readable storage medium
CN110533068B (en) Image object identification method based on classification convolutional neural network
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN117237733A (en) Breast cancer full-slice image classification method combining self-supervision and weak supervision learning
CN115661777A (en) Semantic-combined foggy road target detection algorithm
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN112329771A (en) Building material sample identification method based on deep learning
CN111027472A (en) Video identification method based on fusion of video optical flow and image space feature weight
CN111582057B (en) Face verification method based on local receptive field
CN112418358A (en) Vehicle multi-attribute classification method for strengthening deep fusion network
CN115953622B (en) Image classification method combining attention mutual exclusion rules
CN114202694A (en) Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant