CN115953622A - Image classification method combining attention mutual exclusion regularization - Google Patents

Image classification method combining attention mutual exclusion regularization Download PDF

Info

Publication number
CN115953622A
CN115953622A CN202211576853.0A
Authority
CN
China
Prior art keywords
attention
image
channel
mutual exclusion
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211576853.0A
Other languages
Chinese (zh)
Other versions
CN115953622B (en)
Inventor
陆靖桥
宾炜
麦广柱
陶彦百
罗志鹏
陈银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Original Assignee
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine filed Critical Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority to CN202211576853.0A priority Critical patent/CN115953622B/en
Publication of CN115953622A publication Critical patent/CN115953622A/en
Application granted granted Critical
Publication of CN115953622B publication Critical patent/CN115953622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method combining attention mutual exclusion regularization, which relates to the technical field of image processing and comprises the following steps: importing an image; extracting image features, which includes picking a specified number of attention channels in an attention map; and classifying the image features, which includes updating model parameters with a final loss function of the image classification model. By restricting the model to a plurality of candidate key channels whose attention regions do not overlap with each other, the method guides the model to attend to different regions of the image target and to integrate information from different key regions, so that the performance of the model is improved and the accuracy of image classification is improved.

Description

Image classification method combining attention mutual exclusion regularization
Technical Field
The invention relates to the technical field of image processing, in particular to an image classification method combining attention mutual exclusion regularization.
Background
Image classification is an image processing method that distinguishes targets of different types according to the different features reflected in their image information. Current image classification methods usually adopt a neural network model, but during training the attention that the model pays to the multiple target regions of a picture is still not ideal; for example, the models in the second column of fig. 3 attend to only a single region, so the performance of such models needs to be optimized.
Disclosure of Invention
To address one or more of the above issues, an image classification method incorporating attention mutual exclusion regularization is provided.
According to one aspect of the invention, an image classification method combining attention mutual exclusion regularization is provided, which comprises the following steps:
importing an image;
extracting image features;
classifying the image features;
the classifying the image features includes updating model parameters using a final loss function of the image classification model. The beneficial effects are as follows: the training phase of the image classification method combining attention mutual exclusion regularization is also the process of constructing the image classification model; in the testing phase, once an image is input into the model, the class of the input image can be obtained.
In some embodiments, importing the image includes inputting the image dataset for training into an image classification model. The imported training image dataset needs to comprise different classes of training images. The beneficial effects are as follows: and enabling the image classification model to learn different types of training image patterns in a training stage.
In some embodiments, the extracting the image features includes extracting the image features from a training image through a CNN network, thereby obtaining a feature map. The different classes of training images need to be distinguished according to the extracted image features. The beneficial effects are as follows: helping to distinguish between different classes of training images.
In some embodiments, the extracting the image features comprises turning the feature map into an attention map and selecting a specified number of attention channels in the attention map. The attention map indicates which regions of the training image are attended to. The beneficial effects are as follows: this helps express attention to different areas of the training image.
In some embodiments, the extracting image features further comprises:
judging whether the attention channel is a candidate key channel, and limiting the candidate key attention channels to focus on regions that do not overlap. The beneficial effects are as follows: images can be better classified according to their features.
In some embodiments, the determining whether the attention channel is a candidate key channel includes:
selecting a value as a threshold, and if the weight of the attention channel is greater than the threshold, judging that attention channel to be a candidate key channel. The larger the weight corresponding to an attention channel, the more important the corresponding training image area. The beneficial effects are as follows: the candidate key areas are selected.
In some embodiments, the limiting attention channels focusing on non-overlapping regions comprises:
calculating an attention mutual exclusion regularized loss function, which is calculated according to the following formula:
L_AME = (1/(W·H)) Σ_{i=1..W} Σ_{j=1..H} M_c1(i,j)·M_c2(i,j) + (1/(W·H)) |Σ_{i,j} M_c1(i,j) - Σ_{i,j} M_c2(i,j)|
wherein L_AME is the attention mutual exclusion regularized loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map. Different attention channels correspond to different regions of the image. The beneficial effects are as follows: different areas can be focused on, and the information of each candidate key area can be integrated.
In some embodiments, the classifying the image features further comprises:
performing a feature fusion operation on the attention map and the feature map to obtain the final features of the image;
performing multi-classification operation on the final features of the image to obtain the category of the input image;
a final loss function of the image classification model is calculated. The beneficial effects are as follows: the images are classified according to the obtained image features.
In some embodiments, the final loss function of the image classification model includes an attention-exclusive regular loss function and a cross-entropy loss function. The beneficial effects are as follows: the calculation of the total loss function may be used to update parameters of the image classification model.
According to another aspect of the application, a storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program carries out the steps of the image classification method combining attention mutual exclusion regularization. The beneficial effects are as follows: the images are classified using the computer program.
In the image classification method combining attention mutual exclusion regularization, the model is restricted to a plurality of candidate key regions whose attention channels do not overlap with each other, so it attends to different regions of the image target and integrates the information of the candidate key regions; this improves the performance of the model and, in turn, the accuracy of image classification.
Drawings
FIG. 1 is a schematic diagram of the training process of the image classification method combining attention mutual exclusion regularization according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the testing process of the image classification method combining attention mutual exclusion regularization according to an embodiment of the present invention;
FIG. 3 is a comparison of the attention heat region map obtained for the same original image using the method of the present invention and the prior art.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
FIGS. 1-2 schematically illustrate an image classification method incorporating attention mutual exclusion regularization according to one embodiment of the present invention. As shown, the method includes:
importing an image;
extracting image features;
extracting image features comprises selecting a number of attention channels of a specified number in an attention map;
classifying the image features;
classifying the image features includes updating model parameters with a final loss function of the image classification model.
Importing images is to import the screened image data set into a training model. Categories of image datasets include, but are not limited to: images of automobiles, birds and airplanes.
The image feature extraction is to extract features of one image in the image data set through a backbone network to obtain a feature map. Alternatively, the backbone network may be a VGG network or a Resnet network or other CNN network.
In this embodiment, the training image size is 200 × 200, and since the image is in RGB format, the image is actually a 200 × 200 × 3 matrix in the model training process.
Because computing directly on the matrix corresponding to the full image is expensive, layer-by-layer convolution operations compress the image into a feature map of much smaller size; the matrix corresponding to the compressed feature map is only a small fraction of the size of the matrix corresponding to the original image.
In this embodiment, the image is compressed into a 7 × 7 × 256 feature map after the above processing.
Picking a specified number of attention channels in the attention map includes:
passing the feature map through a convolution layer and a ReLU activation function layer in turn to obtain the attention map. The convolution kernels of the convolution layer have a stride of 1, a count of 64, and a size of 3 × 3.
The attention map corresponds to a three-dimensional matrix having a width, a height, and a depth; the number of channels is the depth of this three-dimensional matrix.
A specified number of attention channels are then picked, with the individual channel weights of the attention map used as sampling probabilities. The number of attention channels selected should be less than the number of channels of the feature map; it is an integer in the range 3 to 10. Each attention channel corresponds to a different region of the training image, and the higher an attention channel's weight, the more important that channel is.
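The weighted picking described above might be implemented, for instance, as sampling without replacement with the channel weights as probabilities. This is a hedged sketch: the patent does not fix the exact sampling procedure, and `pick_attention_channels` is a hypothetical helper.

```python
import random

def pick_attention_channels(weights, num_pick, rng=None):
    """Sample num_pick distinct channel indices, with each channel's
    attention weight acting as its (unnormalized) selection probability."""
    rng = rng or random.Random(0)
    indices = list(range(len(weights)))
    picked = []
    for _ in range(num_pick):
        total = sum(weights[i] for i in indices)
        r = rng.uniform(0.0, total)
        acc = 0.0
        for i in indices:
            acc += weights[i]
            if r <= acc:              # roulette-wheel selection
                picked.append(i)
                indices.remove(i)     # without replacement
                break
    return picked

# e.g. 64 attention channels, pick 3 (the patent specifies 3 to 10)
weights = [0.1] * 61 + [0.9, 0.8, 0.7]
chosen = pick_attention_channels(weights, 3)
print(chosen)  # three distinct indices, biased toward the heavy channels
```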
Extracting the image features further includes determining whether the attention channel is a candidate key channel and limiting the candidate key attention channel to focus on regions that do not overlap with each other.
Determining whether the attention channel is a candidate key channel further comprises: selecting a value from a set range as a threshold, and if the weight of the attention channel is greater than the threshold, judging that attention channel to be a candidate key channel.
Specifically, the judgment is performed according to the following formula:
M_c(i,j) = 1 if A_c(i,j) > θ, and M_c(i,j) = 0 otherwise,
wherein A_c is the c-th attention channel of attention map A, (i,j) is a position coordinate on the attention map, θ is the selected threshold, and M_c is the attention heat region map.
In this embodiment, the threshold is selected from the set range [0.5, 0.8]. The weights of an attention channel have a maximum value of 1 and a minimum value of 0, and a larger value indicates that the attention channel corresponding to the matrix is more important. Because an attention channel corresponds to a candidate key region of the training image, the specific value of the threshold needs to be greater than 0.5. Alternatively, the threshold may be selected from [0.5, 0.9].
First, a random value is drawn from the range [0.5, 0.8] as the threshold; then the weight of each attention channel is compared against it, and every channel whose weight is greater than the threshold is a candidate key channel.
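The threshold test can be sketched as follows. This is illustrative Python: the helper names are assumptions, and treating "the weight of the attention channel" as the channel's per-position weights is one plausible reading of the description.

```python
import random

def binarize_channel(channel, threshold):
    """Turn one attention channel (2-D grid of weights in [0, 1]) into a
    binary heat-region map M_c: 1 where the weight exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in channel]

def is_candidate_key_channel(channel, threshold):
    """A channel is a candidate key channel if some weight exceeds the
    threshold (an assumed reading of 'weight greater than the threshold')."""
    return any(v > threshold for row in channel for v in row)

rng = random.Random(0)
threshold = rng.uniform(0.5, 0.8)          # random threshold from [0.5, 0.8]
a_c = [[0.9, 0.2], [0.4, 0.85]]            # toy 2x2 attention channel
m_c = binarize_channel(a_c, threshold)
print(is_candidate_key_channel(a_c, threshold))  # True: 0.9 > 0.8 >= threshold
```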
Limiting the attention channels to regions that do not overlap further comprises:
an attention mutual exclusion regularized loss function is calculated.
The attention mutual exclusion regularized loss function is calculated according to the following formula:
L_AME = (1/(W·H)) Σ_{i=1..W} Σ_{j=1..H} M_c1(i,j)·M_c2(i,j) + (1/(W·H)) |Σ_{i,j} M_c1(i,j) - Σ_{i,j} M_c2(i,j)|
wherein L_AME is the attention mutual exclusion regularized loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map. The attention mutual exclusion regularization loss must consider both that the difference in area between candidate key regions cannot be too large and that different candidate key regions should focus on different regions as much as possible.
In this formula, the term |Σ M_c1 - Σ M_c2| requires the areas of the corresponding candidate key regions to be consistent, while the term M_c1·M_c2 requires the candidate key regions not to overlap. Making the attention heat regions mutually exclusive, that is, keeping the candidate key regions from overlapping, allows several key parts of the target to be identified in a targeted manner, classifies the target image more efficiently and accurately, and improves the generalization ability of the model.
The attention mutual exclusion regularized loss function may be used to update the parameters of the image classification model. Its value measures the degree of overlap between two attention channels: the smaller the value, the less the two channels overlap, and the larger the value, the more they overlap.
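A minimal sketch of this loss, assuming it is the normalized per-pixel overlap of the two binary heat-region maps plus their normalized area difference (this decomposition is an assumption reconstructed from the two requirements stated above):

```python
def attention_mutual_exclusion_loss(m1, m2):
    """Sketch of an attention mutual-exclusion loss for two binary
    heat-region maps of size H x W: normalized overlap (smaller means less
    overlap) plus normalized area difference (keeps the two candidate key
    regions of comparable size). The exact weighting is an assumption."""
    h, w = len(m1), len(m1[0])
    overlap = sum(m1[i][j] * m2[i][j] for i in range(h) for j in range(w))
    area1 = sum(sum(row) for row in m1)
    area2 = sum(sum(row) for row in m2)
    return (overlap + abs(area1 - area2)) / (w * h)

m_c1 = [[1, 1, 0], [0, 0, 0]]   # candidate key region 1
m_c2 = [[0, 0, 1], [1, 0, 0]]   # candidate key region 2, same area, disjoint
print(attention_mutual_exclusion_loss(m_c1, m_c2))  # 0.0: equal areas, no overlap
```

Minimizing this value pushes the two channels toward equal-sized, non-overlapping regions, matching the stated goal of the regularizer.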
Classifying the image features further comprises:
and fusing the attention diagram and the feature diagram to obtain the final image feature.
In this embodiment, the method for obtaining the final image feature by fusing the attention map and the feature map is bilinear attention pooling in the prior art.
Performing a multi-classification operation on the final features of the image to obtain the class of the input image, wherein the multi-classification operation includes computing a cross entropy loss function.
In this embodiment, a multi-classification operation is performed on the final features of the image, and the final features of the image are actually classified by a softmax classifier to obtain different classes.
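The softmax classification step can be sketched as follows (a minimal Python sketch; the three-class logits are made-up values standing in for the final image features' class scores):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.1]        # e.g. scores for car / bird / airplane
probs = softmax(logits)         # probabilities summing to 1
predicted = probs.index(max(probs))
print(predicted)  # 0 -> the highest-scoring class is chosen
```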
A final loss function of the image classification model is calculated.
The cross entropy loss function is calculated according to the following formula:
L_ce = - Σ_{k=1..K} l_k · log(p_k)
wherein L_ce is the cross entropy loss function, K is the number of target classes, k indexes the k-th class, l_k is the actual label of the current target class image, coded as 0 or 1, and p_k is the predicted probability of the current input image for class k, a decimal between 0 and 1.
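A direct transcription of the cross entropy loss for one sample (Python sketch; the small `eps` guard against log(0) is an added implementation detail, not part of the patent's formula):

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """L_ce = -sum_k l_k * log(p_k) with 0-1 coded labels l and predicted
    probabilities p; eps guards against log(0) for numerical safety."""
    return -sum(l * math.log(p + eps) for l, p in zip(labels, probs))

labels = [0, 1, 0]        # 0-1 coded actual label (class 2 of K=3)
probs = [0.2, 0.7, 0.1]   # model's predicted probabilities
print(round(cross_entropy(labels, probs), 4))  # 0.3567, i.e. -log(0.7)
```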
And obtaining a final loss function of the image classification model by combining the attention mutual exclusion regular loss function and the cross entropy loss function.
The final loss function of the image classification model is calculated according to the following formula:
L = α·L_ce + β·L_AME, where L is the final loss function, α is the tuning parameter of the cross entropy loss function, and β is the tuning parameter of the attention mutual exclusion regularized loss function. Each tuning parameter corresponds to the weight of its loss term: the larger its value, the more important the corresponding loss. The final loss function of the image classification model can be used to update the parameters of the image classification model.
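Combining the two losses as described can be sketched in one line (the values of α and β are not disclosed in the patent, so the numbers below are placeholders):

```python
def final_loss(l_ce, l_ame, alpha=1.0, beta=1.0):
    """L = alpha * L_ce + beta * L_AME; alpha and beta weight the two loss
    terms (their actual values are not disclosed, 1.0 is a placeholder)."""
    return alpha * l_ce + beta * l_ame

# e.g. a cross entropy of 0.36 and a mutual-exclusion loss of 0.05
print(round(final_loss(0.36, 0.05, alpha=1.0, beta=0.5), 3))  # 0.385
```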
The above steps are the training phase of the image classification model shown in fig. 1, and fig. 2 is the testing phase of the image classification model.
In the testing stage of the image classification model, only the images are input, the probability of belonging to each class can be obtained, and further the class of the input image is obtained.
The following table shows the test accuracy obtained on the Stanford Cars test set, a public data set, when it is used as input to the model of the present invention and to prior-art models.
TABLE 1
Method                       Accuracy on the Stanford Cars test set (%)
B-CNN                        91.3
OSME                         93.0
WS-DAN                       94.5
CSE                          93.90
Resnet50                     90.9
The method of the invention  95.5
The data source for the Stanford Cars test set is: Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization [C] // Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013: 554-561.
The data source for the B-CNN method in Table 1 is: Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition [C] // Proceedings of the IEEE International Conference on Computer Vision. 2015: 1449-1457.
The data source for the OSME method in Table 1 is: Zhang Wenxuan, Wu Qi. Fine-grained image classification based on multi-branch attention enhancement [J]. Computer Science, 49(5): 105-112.
The data source for the WS-DAN method in Table 1 is: Hu J, Shen L, Sun G. Squeeze-and-excitation networks [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
The data source for the CSE method in Table 1 is: Sun M, Yuan Y, Zhou F, et al. Multi-attention multi-class constraint for fine-grained image recognition [C] // Proceedings of the European Conference on Computer Vision (ECCV). 2018.
The data source for the Resnet50 method in Table 1 is: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
In this embodiment, testing with the test set of the public data set Stanford Cars as input to the method of the present invention yields the accuracy shown in Table 1. As Table 1 shows, the accuracy of the method of the present invention is higher than that of the other prior-art methods; here the number of attention channels of the method of the invention is specified as three.
Fig. 3 schematically shows a comparison of the attention heat region maps derived for the same original image by the method of the present invention and by other prior-art methods. In fig. 3, the first column is the original target class image, the second column is the attention heat region map of the prior art, and the third column is the attention heat region map of the method of the present invention. It can be seen that the method of the present invention makes the model focus on different regions of the target class image.
What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (10)

1. An image classification method combining attention mutual exclusion regularization is characterized by comprising the following steps:
importing an image;
extracting image features;
classifying the image features;
the classifying the image features includes updating model parameters using a final loss function of the image classification model.
2. The method of claim 1, wherein importing the image comprises inputting an image dataset for training into an image classification model.
3. The method as claimed in claim 1, wherein the extracting image features includes extracting image features from a training image through a CNN network, so as to obtain a feature map.
4. The image classification method combining attention mutual exclusion regularization as recited in claim 1, wherein the extracting image features comprises converting the feature map into an attention map and selecting a specified number of attention channels in the attention map.
5. The method of image classification in conjunction with attention mutual exclusion regularization as claimed in claim 1 wherein said extracting image features further comprises:
judging whether the attention channel is a candidate key channel, and limiting the candidate key attention channels to focus on regions that do not overlap.
6. The method according to claim 5, wherein the determining whether the attention channel is a candidate key channel comprises:
selecting a value as a threshold, and if the weight of the attention channel is greater than the threshold, judging that attention channel to be a candidate key channel.
7. The method of claim 5, wherein the restricting attention channels to regions that do not overlap with each other comprises:
calculating an attention mutual exclusion regularized loss function, which is calculated according to the following formula:
L_AME = (1/(W·H)) Σ_{i=1..W} Σ_{j=1..H} M_c1(i,j)·M_c2(i,j) + (1/(W·H)) |Σ_{i,j} M_c1(i,j) - Σ_{i,j} M_c2(i,j)|
wherein L_AME is the attention mutual exclusion regularized loss function, M_c1 is the first attention channel of the attention heat region map, M_c2 is the second attention channel of the attention heat region map, W is the width of the attention heat region map, and H is the height of the attention heat region map.
8. The method of claim 4, wherein the classifying the image features comprises:
performing a feature fusion operation on the attention map and the feature map to obtain the final feature of the image;
performing multi-classification operation on the final features of the image to obtain the category of the input image;
a final loss function of the image classification model is calculated.
9. The method as claimed in claim 8, wherein the final loss function of the image classification model includes an attention mutual exclusion regularization loss function and a cross entropy loss function.
10. A storage medium having stored thereon a computer program, wherein the computer program is adapted to, when executed by a processor, perform the steps of a method for image classification in conjunction with attention mutual exclusion regularization as claimed in any one of claims 1 to 9.
CN202211576853.0A 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules Active CN115953622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211576853.0A CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Publications (2)

Publication Number Publication Date
CN115953622A true CN115953622A (en) 2023-04-11
CN115953622B CN115953622B (en) 2024-01-30

Family

ID=87289932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211576853.0A Active CN115953622B (en) 2022-12-07 2022-12-07 Image classification method combining attention mutual exclusion rules

Country Status (1)

Country Link
CN (1) CN115953622B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
WO2021143267A1 (en) * 2020-09-07 2021-07-22 平安科技(深圳)有限公司 Image detection-based fine-grained classification model processing method, and related devices

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110458829A (en) * 2019-08-13 2019-11-15 腾讯医疗健康(深圳)有限公司 Image quality control method, device, equipment and storage medium based on artificial intelligence
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
WO2021143267A1 (en) * 2020-09-07 2021-07-22 平安科技(深圳)有限公司 Image detection-based fine-grained classification model processing method, and related devices

Also Published As

Publication number Publication date
CN115953622B (en) 2024-01-30

Similar Documents

Publication Publication Date Title
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110443818B (en) Graffiti-based weak supervision semantic segmentation method and system
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN110826596A (en) Semantic segmentation method based on multi-scale deformable convolution
CN111368636B (en) Object classification method, device, computer equipment and storage medium
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN111986126B (en) Multi-target detection method based on improved VGG16 network
CN110414616B (en) Remote sensing image dictionary learning and classifying method utilizing spatial relationship
CN111783841A (en) Garbage classification method, system and medium based on transfer learning and model fusion
CN113420669B (en) Document layout analysis method and system based on multi-scale training and cascade detection
CN112801097B (en) Training method and device of text detection model and readable storage medium
CN110533068B (en) Image object identification method based on classification convolutional neural network
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN117237733A (en) Breast cancer full-slice image classification method combining self-supervision and weak supervision learning
CN115661777A (en) Semantic-combined foggy road target detection algorithm
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN112329771A (en) Building material sample identification method based on deep learning
CN111027472A (en) Video identification method based on fusion of video optical flow and image space feature weight
CN111582057B (en) Face verification method based on local receptive field
CN112418358A (en) Vehicle multi-attribute classification method for strengthening deep fusion network
CN115953622B (en) Image classification method combining attention mutual exclusion rules
CN114202694A (en) Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant