CN115953622B - Image classification method combining attention mutual exclusion rules - Google Patents


Info

Publication number
CN115953622B
CN115953622B CN202211576853.0A
Authority
CN
China
Prior art keywords
attention
image
channel
loss function
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211576853.0A
Other languages
Chinese (zh)
Other versions
CN115953622A (en)
Inventor
陆靖桥
宾炜
麦广柱
陶彦百
罗志鹏
陈银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Original Assignee
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine filed Critical Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority to CN202211576853.0A priority Critical patent/CN115953622B/en
Publication of CN115953622A publication Critical patent/CN115953622A/en
Application granted granted Critical
Publication of CN115953622B publication Critical patent/CN115953622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method combining attention mutual exclusion regularization, which relates to the technical field of image processing and comprises the following steps: importing an image; extracting image features, which includes selecting a specified number of attention channels of the attention map; and classifying the image features, which includes updating the model parameters with a final loss function of the image classification model. By constraining several candidate key attention channels not to overlap one another, the model is guided to focus on different regions of the image target, and the information of the different key regions is integrated, thereby improving the performance of the model and the accuracy of image classification.

Description

Image classification method combining attention mutual exclusion rules
Technical Field
The invention relates to the technical field of image processing, in particular to an image classification method combining attention mutual exclusion regularization.
Background
Image classification is an image processing method that distinguishes objects of different categories according to the different features reflected in their image information. Current image classification methods generally employ a neural network model, but during training the attention such a model pays to the multiple target regions of a picture is still not ideal; for example, the prior-art model in the second column of Fig. 3 focuses on only a single region, so model performance still needs to be optimized.
Disclosure of Invention
To address one or more of the above problems, an image classification method that incorporates attention-mutex regularization is provided.
According to one aspect of the present invention, there is provided an image classification method in combination with attention mutual exclusion, including:
importing an image;
extracting image features;
classifying the image features;
The classifying of the image features includes updating the model parameters with a final loss function of the image classification model. The beneficial effects are as follows: the training stage of the image classification method combining attention mutual exclusion is also the process of constructing the image classification model, and in the test stage the category of an input image can be obtained after it is input into the model.
In some embodiments, importing the image includes inputting an image dataset for training into an image classification model. The imported training image dataset needs to include different classes of training images. The beneficial effects are as follows: the image classification model learns training image modes of different categories in a training stage.
In some embodiments, extracting the image features includes extracting them from a training image through a CNN network to obtain a feature map, since training images of different categories must be distinguished according to the extracted features. The beneficial effects are as follows: this helps distinguish between different classes of training images.
In some embodiments, extracting the image features includes first converting the feature map into an attention map and selecting a specified number of attention channels of the attention map. The attention map indicates which regions of the training image are attended to. The beneficial effects are as follows: this helps reflect attention to different regions of the training image.
In some implementations, the extracting image features further includes:
judging whether the attention channel is a candidate key channel or not and limiting the candidate key attention channel to pay attention to the non-overlapping area. The beneficial effects are as follows: the images may be better classified according to image characteristics.
In some embodiments, the determining whether the attention channel is a candidate key channel comprises:
A value is selected as a threshold, and an attention channel is judged to be a candidate key channel if its weight is greater than the threshold. The larger the weight of an attention channel, the more important the corresponding region of the training image. The beneficial effects are as follows: this helps select candidate key regions.
In some embodiments, the limiting the attention channel to areas that do not overlap each other comprises:
calculating an attention mutual exclusion regularization loss function, wherein the attention mutual exclusion regularization loss function is calculated according to the following formula:
where L_AME is the attention mutual exclusion regularization loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is its height. Different attention channels correspond to different regions of the image. The beneficial effects are as follows: different regions can be focused on, and the information of each candidate key region can be integrated.
In some embodiments, the classifying the image features further comprises:
performing feature fusion operation on the attention map and the feature map to obtain final features of the image;
performing multi-classification operation on final features of the image to obtain categories of the input image;
a final loss function of the image classification model is calculated. The beneficial effects are as follows: the images are classified according to the obtained image features.
In some implementations, the final loss function of the image classification model includes an attention mutual exclusion regularization loss function and a cross entropy loss function. The beneficial effects are as follows: the total loss function may be used to update the parameters of the image classification model.
According to another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image classification method combining attention mutual exclusion. The beneficial effects are as follows: the training images are classified using the computer program.
According to the image classification method combining attention mutual exclusion regularization, the model is guided to focus on different regions of the image target by constraining several candidate key attention channels not to overlap one another; integrating the information of these candidate key regions improves the performance of the model and the accuracy of image classification.
Drawings
FIG. 1 is a schematic diagram of a training flow of an image classification method combining attention mutual exclusion according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a test flow of an image classification method combining attention mutual exclusion according to an embodiment of the invention;
FIG. 3 is a comparison of the attention heat region maps obtained from the same original image by the method of the present invention and by the prior art.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
FIGS. 1-2 schematically illustrate an image classification method incorporating attention mutual exclusion in accordance with an embodiment of the invention. As shown, the method includes:
importing an image;
extracting image features;
extracting image features includes selecting a specified number of attention channels of the attention map;
classifying the image features;
classifying the image features includes updating model parameters with a final loss function of the image classification model.
Importing the image is importing the screened image data set into a training model. Categories of image datasets include, but are not limited to: images of automobiles, birds, and airplanes.
The image feature extraction is to extract the feature of one image in the image data set through a backbone network to obtain a feature map. Alternatively, the backbone network may be a VGG network or a Resnet network or other CNN network.
In this embodiment, the training image size is 200×200, and since the image is in RGB format, the image is actually a 200×200×3 matrix during the model training process.
Because the matrix corresponding to the image involves a large amount of computation, the image is compressed into a smaller feature map through layer-by-layer convolution operations, so that the matrix of the compressed feature map is much smaller than that of the original image. In this embodiment, the image is compressed into a 7×7×256 feature map after the above processing.
Selecting a specified number of attention channels in the attention map includes:
passing the feature map through a convolution layer and a ReLU activation layer in sequence to obtain the attention map. The convolution layer uses 64 convolution kernels of size 3×3 with a stride of 1.
The attention map is in fact a three-dimensional matrix with a length, a width, and a height. The number of attention map channels is the height of this three-dimensional matrix.
Extracting the image features further includes determining whether the attention channel is a candidate key channel and limiting the candidate key attention channel to regions that do not overlap with each other.
Determining whether an attention channel is a candidate key channel further comprises: treating the weight of each attention channel of the attention map as a probability, with a maximum of 1 and a minimum of 0; selecting a value from the set range as a threshold; and selecting a specified number of candidate key attention channels whose weights are greater than the threshold. The number of selected attention channels must be smaller than the number of channels of the feature map; it ranges from 3 to 10 and is an integer. Each attention channel is embodied as a different region of the training image. The greater the weight of an attention channel, the more important it is.
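The channel-selection step above can be sketched as follows. This is an illustrative NumPy sketch; the definition of a channel's weight as its mean activation, the function name, and the array shapes are assumptions not stated in the patent:

```python
import numpy as np

def select_candidate_channels(attention_map, threshold=0.6, max_channels=3):
    """Pick candidate key channels whose weight exceeds the threshold.

    attention_map: array of shape (C, H, W) with values in [0, 1].
    Each channel's weight is taken here as its mean activation
    (an assumption; the patent only says weights lie in [0, 1]).
    Returns at most `max_channels` channel indices, strongest first.
    """
    weights = attention_map.reshape(attention_map.shape[0], -1).mean(axis=1)
    candidates = np.where(weights > threshold)[0]           # weight > threshold
    order = candidates[np.argsort(weights[candidates])[::-1]]  # sort descending
    return order[:max_channels]
```

In line with the description, the number of kept channels (3 to 10) stays below the number of feature-map channels.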
Limiting candidate key attention channels to focus on mutually non-overlapping regions first requires obtaining the attention heat region map, which is determined according to the following formula:
M_c(i, j) = 1 if A_c(i, j) > θ_c, otherwise M_c(i, j) = 0,
where A_c is the c-th candidate key attention channel of attention map A, (i, j) are position coordinates in the attention map, and M_c is the attention heat region map.
In the present embodiment, the threshold θ_c used to compute M_c is selected from the set range [0.5, 0.8]. An attention channel corresponds to a candidate key region of the training image, so the specific threshold value must be greater than 0.5. Alternatively, the threshold may be selected from [0.5, 0.9].
First, a random value is selected from the range [0.5, 0.8] as the threshold; then, whether each value of the attention channel is greater than the threshold is judged; if so, the positions exceeding the threshold form a candidate key region.
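The thresholding step above can be sketched as a simple binarization; the function name and the NumPy array representation are illustrative assumptions:

```python
import numpy as np

def heat_region_map(channel, theta):
    """Binarize one attention channel A_c into its heat region map M_c.

    M_c(i, j) = 1 where A_c(i, j) > theta, else 0, with theta drawn
    from the set range [0.5, 0.8] as described above.
    """
    return (channel > theta).astype(np.float32)
```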
Limiting the attention channel from focusing on mutually non-overlapping areas further comprises:
and calculating the attention mutual exclusion canonical loss function.
The attention mutual exclusion canonical loss function is calculated according to the following formula:
wherein L is AME To the attention mutual exclusion canonical loss function, M c1 To pay attention to heatFirst attention channel of region map, M c2 For the second attention channel of the attention deficit map, W is the width of the attention deficit map and H is the height of the attention deficit map. The attention-exclusion regularization penalty requires that the candidate key region area difference be considered simultaneously not too large and that different candidate key regions be focused on different regions as much as possible.
Wherein,the areas of the partial corresponding candidate key areas need to be consistent; m is M c1 M c2 The parts corresponding to the different candidate key regions are not overlapped. Mutual exclusion among a plurality of attention areas, namely non-overlapping among candidate key areas, can specifically identify a plurality of key parts of a target, more effectively and accurately classify the category of the target class image, and is beneficial to improving the generalization capability of the model.
The attention-mutex canonical loss function may be used to update parameters of the image classification model. The attention mutual exclusion canonical loss function is a degree value of a non-overlapping region on any two attention channels, wherein a smaller value represents a non-overlapping region between different regions, and a larger value represents a overlapping region between different regions.
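Since the exact formula appears only as an image in the source, the following NumPy sketch is a hedged reconstruction of the described behavior: it penalizes overlap between two heat region maps (the M_c1·M_c2 term) and a large difference in their areas, both normalized by W·H. The specific combination of the two terms is an assumption:

```python
import numpy as np

def attention_mutex_loss(m1, m2):
    """Hedged sketch of the attention mutual exclusion regularization loss.

    m1, m2: binary heat region maps M_c1, M_c2 of shape (H, W).
    overlap    -> penalizes pixels attended by both channels
    area_diff  -> penalizes inconsistent candidate key region areas
    Both terms are normalized by the map size W * H.
    """
    h, w = m1.shape
    overlap = np.sum(m1 * m2) / (w * h)
    area_diff = abs(m1.sum() - m2.sum()) / (w * h)
    return overlap + area_diff
```

Consistent with the text, the value shrinks toward 0 when the two regions are disjoint and equal in area, and grows as they overlap.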
Classifying the image features further includes:
attention will be paid to the fusion of the force diagram and the feature diagram to obtain the final image features.
In this embodiment, the method used to fuse the attention map and the feature map to obtain the final image feature is bilinear attention pooling operation in the prior art.
And performing multi-classification operation on the final characteristics of the image to obtain the category of the input image. Wherein the multi-classification operation includes calculating a cross entropy loss function.
In this embodiment, a multi-classification operation is performed on the final features of the image, and the final features of the image are actually classified by a softmax classifier to obtain different classes.
A final loss function of the image classification model is calculated.
The cross entropy loss function is calculated according to the following formula:
L_ce = −Σ_{k=1}^{K} l_k · log(p_k)
where L_ce is the cross entropy loss function, K is the number of target-class image categories, k indexes the k-th category, l_k is the actual label of the current target-class image (a 0-1 one-hot code), and p_k is the predicted probability for the current input image (a value between 0 and 1).
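The cross entropy formula above can be checked with a small NumPy sketch; the epsilon guard against log(0) is an implementation detail, not part of the patent text:

```python
import numpy as np

def cross_entropy_loss(labels_onehot, probs, eps=1e-12):
    """L_ce = -sum_k l_k * log(p_k) for one sample.

    labels_onehot: 0-1 code l_k over the K categories.
    probs:         predicted probabilities p_k, each between 0 and 1.
    eps guards against log(0) for numerical safety (an assumption).
    """
    return -np.sum(labels_onehot * np.log(probs + eps))
```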
And combining the attention mutual exclusion regular loss function and the cross entropy loss function to obtain a final loss function of the image classification model.
The final loss function of the image classification model is calculated according to the following formula:
L = αL_CE + βL_AME
where L is the final loss function, α is the adjustment parameter of the cross entropy loss function, and β is the adjustment parameter of the attention mutual exclusion regularization loss function. The larger an adjustment parameter, the more important the corresponding loss. The final loss function of the image classification model may be used to update the parameters of the image classification model.
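The combination of the two losses can be sketched directly; the default values of α and β below are assumptions, since the patent does not state them:

```python
def final_loss(l_ce, l_ame, alpha=1.0, beta=1.0):
    """Final loss L = alpha * L_CE + beta * L_AME.

    alpha and beta weight the cross entropy and attention mutual
    exclusion regularization terms; their values here are assumed.
    """
    return alpha * l_ce + beta * l_ame
```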
The above is a training stage of the image classification model shown in fig. 1, and fig. 2 is a testing stage of the image classification model.
In the test stage of the image classification model, only an image needs to be input; the model outputs the probability of each category, from which the category of the input image is obtained.
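The test-stage flow (a probability for each category, then the most probable category) can be sketched as follows; the function name and the use of raw class scores as input are illustrative assumptions:

```python
import numpy as np

def predict_class(logits):
    """Test-stage prediction: softmax over class scores, then argmax."""
    z = logits - np.max(logits)            # stabilize the exponentials
    probs = np.exp(z) / np.sum(np.exp(z))  # probability of each category
    return int(np.argmax(probs)), probs    # predicted category and probabilities
```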
The following table shows the test accuracies obtained on the public Stanford Cars test set using the model of the invention and prior-art methods.
TABLE 1
Method / Accuracy on Stanford Cars test set (%)
B-CNN 91.3
OSME 93.0
WS-DAN 94.5
CSE 93.90
Resnet50 90.9
The method of the invention 95.5
The data source for the Stanford Cars test set is: Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013: 554-561.
The data source for the B-CNN method in Table 1 is: Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1449-1457.
The data source for the OSME method in Table 1 is: Zhang Wenxuan, Wu Qin. Fine-grained image classification based on multi-branch attention enhancement[J]. Computer Science, 49(5): 105-112.
The data source for the WS-DAN method in Table 1 is: Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
The data source for the CSE method in Table 1 is: Sun M, Yuan Y, Zhou F, et al. Multi-attention multi-class constraint for fine-grained image recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 805-821.
The data source for the ResNet50 method in Table 1 is: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
In this embodiment, the method of the present invention is tested with the test set of the public Stanford Cars dataset as input, yielding the accuracy shown in Table 1. As shown in Table 1, the accuracy of the method of the present invention is higher than that of the other prior-art methods. The number of candidate key attention channels in the method of the present invention is specified as 3.
Fig. 3 schematically shows a comparison of a plot of the thermal area of attention taken from the same original image using the method of the present invention with other methods of the prior art. Meanwhile, as can be seen in fig. 3, the first column is an original object class image, the second column is an attention heat area map in the prior art, and the third column is an attention heat area map using the method of the present invention. It can be seen that the method of the present invention allows the model to focus on different regions of the object-class image.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (8)

1. An image classification method combining attention mutual exclusion regularization is characterized by comprising the following steps:
importing an image;
extracting image features; extracting image features includes:
judging whether the attention channel is a candidate key channel or not and limiting the attention of the candidate key attention channel to the non-overlapping area;
judging whether the attention channel is a candidate key channel, comprising:
selecting a specified number of attention channels in an attention map according to the limit of candidate key areas which are not overlapped with each other;
limiting candidate critical attention channels to regions that do not overlap with each other includes:
according to a first attention channel and a second attention channel of the attention heat region map, constraining the first attention channel and the second attention channel not to overlap each other, and calculating an attention mutual exclusion regularization loss function; the attention mutual exclusion regularization loss function is calculated according to the following formula:
where L_AME is the attention mutual exclusion regularization loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map;
classifying the image features;
the classifying of the image features includes updating the model parameters using a final loss function of the image classification model; the final loss function includes the attention mutual exclusion regularization loss function.
2. An image classification method in combination with attention-mutex as claimed in claim 1 wherein said importing the image includes inputting an image dataset for training into an image classification model.
3. The method of claim 1, wherein extracting image features comprises extracting image features from a training image via a CNN network, thereby obtaining a feature map.
4. A method of classifying images in combination with attention mutual exclusion according to claim 3, wherein extracting image features includes first converting a feature map into an attention map, and selecting a specified number of attention channels in the attention map.
5. The method of claim 1, wherein determining whether the attention channel is a candidate key channel comprises:
and selecting a value as a threshold value, and judging the attention channel as a candidate key channel if the weight of the attention channel is greater than the threshold value.
6. The method of image classification in combination with attention-mutex as claimed in claim 4, wherein said classifying image features comprises:
performing feature fusion operation on the attention map and the feature map to obtain final features of the image;
performing multi-classification operation on final features of the image to obtain categories of the input image;
a final loss function of the image classification model is calculated.
7. The method of claim 6, wherein the final loss function of the image classification model comprises an attention mutual exclusion regularization loss function and a cross entropy loss function.
8. A storage medium having stored thereon a computer program, which when executed by a processor implements the steps of a method of image classification in combination with attention-mutex as claimed in any of claims 1 to 7.
CN202211576853.0A 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules Active CN115953622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Publications (2)

Publication Number Publication Date
CN115953622A CN115953622A (en) 2023-04-11
CN115953622B true CN115953622B (en) 2024-01-30

Family

ID=87289932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211576853.0A Active CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Country Status (1)

Country Link
CN (1) CN115953622B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
WO2021143267A1 (en) * 2020-09-07 2021-07-22 平安科技(深圳)有限公司 Image detection-based fine-grained classification model processing method, and related devices


Also Published As

Publication number Publication date
CN115953622A (en) 2023-04-11

Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN109840531B (en) Method and device for training multi-label classification model
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN110837836A (en) Semi-supervised semantic segmentation method based on maximized confidence
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN105469080B (en) A kind of facial expression recognizing method
CN111783819B (en) Improved target detection method based on region of interest training on small-scale data set
CN111986126B (en) Multi-target detection method based on improved VGG16 network
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN112801097B (en) Training method and device of text detection model and readable storage medium
CN108596240B (en) Image semantic segmentation method based on discriminant feature network
CN112200186B (en) Vehicle logo identification method based on improved YOLO_V3 model
CN111461039A (en) Landmark identification method based on multi-scale feature fusion
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113420669B (en) Document layout analysis method and system based on multi-scale training and cascade detection
CN111126401A (en) License plate character recognition method based on context information
CN107958219A (en) Image scene classification method based on multi-model and Analysis On Multi-scale Features
CN112329771A (en) Building material sample identification method based on deep learning
CN110533068B (en) Image object identification method based on classification convolutional neural network
CN111738237B (en) Heterogeneous convolution-based target detection method for multi-core iteration RPN
CN111582057B (en) Face verification method based on local receptive field
CN115953622B (en) Image classification method combining attention mutual exclusion rules

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant