CN115953622B - Image classification method combining attention mutual exclusion rules - Google Patents


Info

Publication number
CN115953622B
CN115953622B CN202211576853.0A
Authority
CN
China
Prior art keywords
attention
image
channel
loss function
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211576853.0A
Other languages
Chinese (zh)
Other versions
CN115953622A (en)
Inventor
陆靖桥
宾炜
麦广柱
陶彦百
罗志鹏
陈银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Original Assignee
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine filed Critical Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority to CN202211576853.0A priority Critical patent/CN115953622B/en
Publication of CN115953622A publication Critical patent/CN115953622A/en
Application granted granted Critical
Publication of CN115953622B publication Critical patent/CN115953622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method combining attention mutual exclusion regularization, which relates to the technical field of image processing and comprises the following steps: importing an image; extracting image features, which includes selecting a specified number of attention channels of the attention map; and classifying the image features, which includes updating the model parameters with a final loss function of the image classification model. By constraining several candidate key attention channels not to overlap one another, the model is guided to focus on different regions of the image target, and the information of the different key regions is integrated, thereby improving the performance of the model and the accuracy of image classification.

Description

Image classification method combining attention mutual exclusion rules
Technical Field
The invention relates to the technical field of image processing, in particular to an image classification method combining attention mutual exclusion regularization.
Background
Image classification is an image processing method that distinguishes objects of different categories according to the different features reflected in their image information. Current image classification methods generally employ a neural network model, but during training the attention such a model pays to the multiple target regions of a picture is still not ideal; for example, the prior-art model in the second column of Fig. 3 focuses on only a single region, so model performance still needs to be optimized.
Disclosure of Invention
To address one or more of the above problems, an image classification method that incorporates attention-mutex regularization is provided.
According to one aspect of the present invention, there is provided an image classification method in combination with attention mutual exclusion, including:
importing an image;
extracting image features;
classifying the image features;
The classifying of the image features includes updating the model parameters with a final loss function of the image classification model. The beneficial effects are as follows: the training stage of the image classification method combining attention mutual exclusion is also the process of constructing the image classification model, and in the test stage the category of an input image can be obtained after it is input into the model.
In some embodiments, importing the image includes inputting an image dataset for training into an image classification model. The imported training image dataset needs to include different classes of training images. The beneficial effects are as follows: the image classification model learns training image modes of different categories in a training stage.
In some embodiments, extracting the image features includes extracting them from a training image through a CNN network to obtain a feature map, since training images of different categories must be distinguished according to the extracted features. The beneficial effects are as follows: this helps distinguish between different classes of training images.
In some embodiments, extracting the image features includes first converting the feature map into an attention map and selecting a specified number of attention channels of the attention map. The attention map indicates which regions of the training image are attended to. The beneficial effects are as follows: this helps reflect attention to different regions of the training image.
In some implementations, the extracting image features further includes:
judging whether the attention channel is a candidate key channel or not and limiting the candidate key attention channel to pay attention to the non-overlapping area. The beneficial effects are as follows: the images may be better classified according to image characteristics.
In some embodiments, the determining whether the attention channel is a candidate key channel comprises:
A value is selected as a threshold, and an attention channel is judged to be a candidate key channel if its weight is greater than the threshold. The larger the weight of an attention channel, the more important the corresponding region of the training image. The beneficial effects are as follows: this helps select candidate key regions.
In some embodiments, the limiting the attention channel to areas that do not overlap each other comprises:
calculating an attention mutual exclusion regularization loss function, wherein the attention mutual exclusion regularization loss function is calculated according to the following formula:
where L_AME is the attention mutual exclusion regularization loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is its height. Different attention channels correspond to different regions of the image. The beneficial effects are as follows: different regions can be focused on, and the information of each candidate key region can be integrated.
In some embodiments, the classifying the image features further comprises:
performing feature fusion operation on the attention map and the feature map to obtain final features of the image;
performing multi-classification operation on final features of the image to obtain categories of the input image;
a final loss function of the image classification model is calculated. The beneficial effects are as follows: the images are classified according to the obtained image features.
In some implementations, the final loss function of the image classification model includes an attention mutual exclusion regularization loss function and a cross entropy loss function. The beneficial effects are as follows: the total loss function may be used to update the parameters of the image classification model.
According to another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image classification method combining attention mutual exclusion. The beneficial effects are as follows: the training images are classified using the computer program.
According to the image classification method combining attention mutual exclusion regularization, the model is guided to focus on different regions of the image target by constraining several candidate key attention channels not to overlap one another; integrating the information of these candidate key regions improves the performance of the model and the accuracy of image classification.
Drawings
FIG. 1 is a schematic diagram of a training flow of an image classification method combining attention mutual exclusion according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a test flow of an image classification method combining attention mutual exclusion according to an embodiment of the invention;
FIG. 3 is a comparison of the attention heat region maps obtained from the same original image by the method of the present invention and by the prior art.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
FIGS. 1-2 schematically illustrate an image classification method incorporating attention mutual exclusion in accordance with an embodiment of the invention. As shown, the method includes:
importing an image;
extracting image features;
extracting image features includes selecting a specified number of attention channels of the attention map;
classifying the image features;
classifying the image features includes updating model parameters with a final loss function of the image classification model.
Importing the image is importing the screened image data set into a training model. Categories of image datasets include, but are not limited to: images of automobiles, birds, and airplanes.
The image feature extraction is to extract the feature of one image in the image data set through a backbone network to obtain a feature map. Alternatively, the backbone network may be a VGG network or a Resnet network or other CNN network.
In this embodiment, the training image size is 200×200, and since the image is in RGB format, the image is actually a 200×200×3 matrix during the model training process.
Because the matrix corresponding to the image involves a large amount of computation, the image is compressed into a smaller feature map through layer-by-layer convolution operations, so that the matrix of the compressed feature map is much smaller than that of the original image. In this embodiment, the image is compressed into a 7×7×256 feature map after the above processing.
Selecting a specified number of attention channels in the attention map includes:
passing the feature map through a convolution layer and a ReLU activation layer in sequence to obtain the attention map. The convolution layer uses 64 convolution kernels of size 3×3 with a stride of 1.
The attention map is in fact a three-dimensional matrix with a length, a width, and a height. The number of attention map channels is the height of this three-dimensional matrix.
Extracting the image features further includes determining whether the attention channel is a candidate key channel and limiting the candidate key attention channel to regions that do not overlap with each other.
Determining whether an attention channel is a candidate key channel further comprises: treating the weight of each attention channel of the attention map as a probability, with a maximum of 1 and a minimum of 0; selecting a value from the set range as a threshold; and selecting a specified number of candidate key attention channels whose weights are greater than the threshold. The number of selected attention channels must be smaller than the number of channels of the feature map; it ranges from 3 to 10 and is an integer. Each attention channel is embodied as a different region of the training image. The greater the weight of an attention channel, the more important it is.
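The channel-selection step above can be sketched as follows. This is an illustrative NumPy sketch; the definition of a channel's weight as its mean activation, the function name, and the array shapes are assumptions not stated in the patent:

```python
import numpy as np

def select_candidate_channels(attention_map, threshold=0.6, max_channels=3):
    """Pick candidate key channels whose weight exceeds the threshold.

    attention_map: array of shape (C, H, W) with values in [0, 1].
    Each channel's weight is taken here as its mean activation
    (an assumption; the patent only says weights lie in [0, 1]).
    Returns at most `max_channels` channel indices, strongest first.
    """
    weights = attention_map.reshape(attention_map.shape[0], -1).mean(axis=1)
    candidates = np.where(weights > threshold)[0]           # weight > threshold
    order = candidates[np.argsort(weights[candidates])[::-1]]  # sort descending
    return order[:max_channels]
```

In line with the description, the number of kept channels (3 to 10) stays below the number of feature-map channels.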
Limiting candidate key attention channels to focus on mutually non-overlapping regions first requires obtaining the attention heat region map, which is determined according to the following formula:
M_c(i, j) = 1 if A_c(i, j) > θ_c, otherwise M_c(i, j) = 0,
where A_c is the c-th candidate key attention channel of attention map A, (i, j) are position coordinates in the attention map, and M_c is the attention heat region map.
In the present embodiment, the threshold θ_c used to compute M_c is selected from the set range [0.5, 0.8]. An attention channel corresponds to a candidate key region of the training image, so the specific threshold value must be greater than 0.5. Alternatively, the threshold may be selected from [0.5, 0.9].
First, a random value is selected from the range [0.5, 0.8] as the threshold; then, whether each value of the attention channel is greater than the threshold is judged; if so, the positions exceeding the threshold form a candidate key region.
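The thresholding step above can be sketched as a simple binarization; the function name and the NumPy array representation are illustrative assumptions:

```python
import numpy as np

def heat_region_map(channel, theta):
    """Binarize one attention channel A_c into its heat region map M_c.

    M_c(i, j) = 1 where A_c(i, j) > theta, else 0, with theta drawn
    from the set range [0.5, 0.8] as described above.
    """
    return (channel > theta).astype(np.float32)
```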
Limiting the attention channel from focusing on mutually non-overlapping areas further comprises:
and calculating the attention mutual exclusion canonical loss function.
The attention mutual exclusion canonical loss function is calculated according to the following formula:
wherein L is AME To the attention mutual exclusion canonical loss function, M c1 To pay attention to heatFirst attention channel of region map, M c2 For the second attention channel of the attention deficit map, W is the width of the attention deficit map and H is the height of the attention deficit map. The attention-exclusion regularization penalty requires that the candidate key region area difference be considered simultaneously not too large and that different candidate key regions be focused on different regions as much as possible.
Wherein,the areas of the partial corresponding candidate key areas need to be consistent; m is M c1 M c2 The parts corresponding to the different candidate key regions are not overlapped. Mutual exclusion among a plurality of attention areas, namely non-overlapping among candidate key areas, can specifically identify a plurality of key parts of a target, more effectively and accurately classify the category of the target class image, and is beneficial to improving the generalization capability of the model.
The attention-mutex canonical loss function may be used to update parameters of the image classification model. The attention mutual exclusion canonical loss function is a degree value of a non-overlapping region on any two attention channels, wherein a smaller value represents a non-overlapping region between different regions, and a larger value represents a overlapping region between different regions.
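Since the exact formula appears only as an image in the source, the following NumPy sketch is a hedged reconstruction of the described behavior: it penalizes overlap between two heat region maps (the M_c1·M_c2 term) and a large difference in their areas, both normalized by W·H. The specific combination of the two terms is an assumption:

```python
import numpy as np

def attention_mutex_loss(m1, m2):
    """Hedged sketch of the attention mutual exclusion regularization loss.

    m1, m2: binary heat region maps M_c1, M_c2 of shape (H, W).
    overlap    -> penalizes pixels attended by both channels
    area_diff  -> penalizes inconsistent candidate key region areas
    Both terms are normalized by the map size W * H.
    """
    h, w = m1.shape
    overlap = np.sum(m1 * m2) / (w * h)
    area_diff = abs(m1.sum() - m2.sum()) / (w * h)
    return overlap + area_diff
```

Consistent with the text, the value shrinks toward 0 when the two regions are disjoint and equal in area, and grows as they overlap.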
Classifying the image features further includes:
attention will be paid to the fusion of the force diagram and the feature diagram to obtain the final image features.
In this embodiment, the method used to fuse the attention map and the feature map to obtain the final image feature is bilinear attention pooling operation in the prior art.
And performing multi-classification operation on the final characteristics of the image to obtain the category of the input image. Wherein the multi-classification operation includes calculating a cross entropy loss function.
In this embodiment, a multi-classification operation is performed on the final features of the image, and the final features of the image are actually classified by a softmax classifier to obtain different classes.
A final loss function of the image classification model is calculated.
The cross entropy loss function is calculated according to the following formula:
L_ce = −Σ_{k=1}^{K} l_k · log(p_k)
where L_ce is the cross entropy loss function, K is the number of target-class image categories, k indexes the k-th category, l_k is the actual label of the current target-class image (a 0-1 one-hot code), and p_k is the predicted probability for the current input image (a value between 0 and 1).
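The cross entropy formula above can be checked with a small NumPy sketch; the epsilon guard against log(0) is an implementation detail, not part of the patent text:

```python
import numpy as np

def cross_entropy_loss(labels_onehot, probs, eps=1e-12):
    """L_ce = -sum_k l_k * log(p_k) for one sample.

    labels_onehot: 0-1 code l_k over the K categories.
    probs:         predicted probabilities p_k, each between 0 and 1.
    eps guards against log(0) for numerical safety (an assumption).
    """
    return -np.sum(labels_onehot * np.log(probs + eps))
```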
And combining the attention mutual exclusion regular loss function and the cross entropy loss function to obtain a final loss function of the image classification model.
The final loss function of the image classification model is calculated according to the following formula:
L = αL_CE + βL_AME
where L is the final loss function, α is the adjustment parameter of the cross entropy loss function, and β is the adjustment parameter of the attention mutual exclusion regularization loss function. The larger an adjustment parameter, the more important the corresponding loss. The final loss function of the image classification model may be used to update the parameters of the image classification model.
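The combination of the two losses can be sketched directly; the default values of α and β below are assumptions, since the patent does not state them:

```python
def final_loss(l_ce, l_ame, alpha=1.0, beta=1.0):
    """Final loss L = alpha * L_CE + beta * L_AME.

    alpha and beta weight the cross entropy and attention mutual
    exclusion regularization terms; their values here are assumed.
    """
    return alpha * l_ce + beta * l_ame
```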
The above is a training stage of the image classification model shown in fig. 1, and fig. 2 is a testing stage of the image classification model.
In the test stage of the image classification model, only an image needs to be input; the model outputs the probability of each category, from which the category of the input image is obtained.
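The test-stage flow (a probability for each category, then the most probable category) can be sketched as follows; the function name and the use of raw class scores as input are illustrative assumptions:

```python
import numpy as np

def predict_class(logits):
    """Test-stage prediction: softmax over class scores, then argmax."""
    z = logits - np.max(logits)            # stabilize the exponentials
    probs = np.exp(z) / np.sum(np.exp(z))  # probability of each category
    return int(np.argmax(probs)), probs    # predicted category and probabilities
```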
The following table shows the test accuracies obtained on the public Stanford Cars test set using the model of the invention and prior-art methods.
TABLE 1
Method / Accuracy on Stanford Cars test set (%)
B-CNN 91.3
OSME 93.0
WS-DAN 94.5
CSE 93.90
Resnet50 90.9
The method of the invention 95.5
The data source for the Stanford Cars test set is: Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013: 554-561.
The data source for the B-CNN method in Table 1 is: Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1449-1457.
The data source for the OSME method in Table 1 is: Zhang Wenxuan, Wu Qin. Fine-grained image classification based on multi-branch attention enhancement[J]. Computer Science, 49(5): 105-112.
The data source for the WS-DAN method in Table 1 is: Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
The data source for the CSE method in Table 1 is: Sun M, Yuan Y, Zhou F, et al. Multi-attention multi-class constraint for fine-grained image recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 805-821.
The data source for the ResNet50 method in Table 1 is: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
In this embodiment, the method of the present invention is tested with the test set of the public Stanford Cars dataset as input, yielding the accuracy shown in Table 1. As shown in Table 1, the accuracy of the method of the present invention is higher than that of the other prior-art methods. The number of candidate key attention channels in the method of the present invention is specified as 3.
Fig. 3 schematically shows a comparison of a plot of the thermal area of attention taken from the same original image using the method of the present invention with other methods of the prior art. Meanwhile, as can be seen in fig. 3, the first column is an original object class image, the second column is an attention heat area map in the prior art, and the third column is an attention heat area map using the method of the present invention. It can be seen that the method of the present invention allows the model to focus on different regions of the object-class image.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (8)

1. An image classification method combining attention mutual exclusion regularization is characterized by comprising the following steps:
importing an image;
extracting image features; extracting image features includes:
judging whether the attention channel is a candidate key channel or not and limiting the attention of the candidate key attention channel to the non-overlapping area;
judging whether the attention channel is a candidate key channel, comprising:
selecting a specified number of attention channels in an attention map according to the limit of candidate key areas which are not overlapped with each other;
limiting candidate critical attention channels to regions that do not overlap with each other includes:
according to a first attention channel and a second attention channel of the attention heat region map, constraining the first attention channel and the second attention channel not to overlap each other, and calculating an attention mutual exclusion regularization loss function; the attention mutual exclusion regularization loss function is calculated according to the following formula:
where L_AME is the attention mutual exclusion regularization loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map;
classifying the image features;
the classifying of the image features includes updating the model parameters using a final loss function of the image classification model; the final loss function includes the attention mutual exclusion regularization loss function.
2. An image classification method in combination with attention-mutex as claimed in claim 1 wherein said importing the image includes inputting an image dataset for training into an image classification model.
3. The method of claim 1, wherein extracting image features comprises extracting image features from a training image via a CNN network, thereby obtaining a feature map.
4. A method of classifying images in combination with attention mutual exclusion according to claim 3, wherein extracting image features includes first converting a feature map into an attention map, and selecting a specified number of attention channels in the attention map.
5. The method of claim 1, wherein determining whether the attention channel is a candidate key channel comprises:
and selecting a value as a threshold value, and judging the attention channel as a candidate key channel if the weight of the attention channel is greater than the threshold value.
6. The method of image classification in combination with attention-mutex as claimed in claim 4, wherein said classifying image features comprises:
performing feature fusion operation on the attention map and the feature map to obtain final features of the image;
performing multi-classification operation on final features of the image to obtain categories of the input image;
a final loss function of the image classification model is calculated.
7. The method of claim 6, wherein the final loss function of the image classification model comprises an attention mutual exclusion regularization loss function and a cross entropy loss function.
8. A storage medium having stored thereon a computer program, which when executed by a processor implements the steps of a method of image classification in combination with attention-mutex as claimed in any of claims 1 to 7.
CN202211576853.0A 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules Active CN115953622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Publications (2)

Publication Number Publication Date
CN115953622A CN115953622A (en) 2023-04-11
CN115953622B true CN115953622B (en) 2024-01-30

Family

ID=87289932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211576853.0A Active CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Country Status (1)

Country Link
CN (1) CN115953622B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
WO2021143267A1 (en) * 2020-09-07 2021-07-22 平安科技(深圳)有限公司 Image detection-based fine-grained classification model processing method, and related devices


Also Published As

Publication number Publication date
CN115953622A (en) 2023-04-11

Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN109840531B (en) Method and device for training multi-label classification model
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN110837836A (en) Semi-supervised semantic segmentation method based on maximized confidence
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN105469080B (en) A kind of facial expression recognizing method
CN111783819B (en) Improved target detection method based on region of interest training on small-scale data set
CN111986126B (en) Multi-target detection method based on improved VGG16 network
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN112801097B (en) Training method and device of text detection model and readable storage medium
CN108596240B (en) Image semantic segmentation method based on discriminant feature network
CN112200186B (en) Vehicle logo identification method based on improved YOLO_V3 model
CN111461039A (en) Landmark identification method based on multi-scale feature fusion
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113420669B (en) Document layout analysis method and system based on multi-scale training and cascade detection
CN111126401A (en) License plate character recognition method based on context information
CN107958219A (en) Image scene classification method based on multi-model and Analysis On Multi-scale Features
CN112329771A (en) Building material sample identification method based on deep learning
CN110533068B (en) Image object identification method based on classification convolutional neural network
CN111738237B (en) Heterogeneous convolution-based target detection method for multi-core iteration RPN
CN111582057B (en) Face verification method based on local receptive field
CN115953622B (en) Image classification method combining attention mutual exclusion rules

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant