CN116051948A - Fine-grained image recognition method based on attention interaction and counterfactual attention - Google Patents

Fine-grained image recognition method based on attention interaction and counterfactual attention

Info

Publication number
CN116051948A
CN116051948A CN202310212744.9A CN202310212744A CN116051948A CN 116051948 A CN116051948 A CN 116051948A CN 202310212744 A CN202310212744 A CN 202310212744A CN 116051948 A CN116051948 A CN 116051948A
Authority
CN
China
Prior art keywords
attention
feature
map
representing
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310212744.9A
Other languages
Chinese (zh)
Other versions
CN116051948B (en)
Inventor
魏志强
安辰
黄磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202310212744.9A priority Critical patent/CN116051948B/en
Publication of CN116051948A publication Critical patent/CN116051948A/en
Application granted granted Critical
Publication of CN116051948B publication Critical patent/CN116051948B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention belongs to the technical field of image processing and discloses a fine-grained image recognition method based on attention interaction and counterfactual attention. After image features are extracted, the spatial distribution of each part of the object is learned through a spatial attention mechanism; complementary features are captured by a self-channel feature interaction fusion module and fused with the key features to improve recognition performance; a counterfactual region is located by an enhanced counterfactual attention mechanism module, the difference between the prediction results of the key discriminative region and of the counterfactual region is computed, and this difference is used as a strong attention supervision signal, improving the network's ability to learn effective attention. The method provided by the invention can effectively improve the recognition accuracy of fine-grained images.

Description

Fine-grained image recognition method based on attention interaction and counterfactual attention
Technical Field
The invention belongs to the technical field of image processing, relates to deep learning and fine-grained image recognition techniques, and particularly relates to a fine-grained image recognition method based on attention interaction and counterfactual attention.
Background
Fine-grained image recognition, also referred to as sub-category image recognition, differs from traditional image recognition in that it aims to distinguish between different sub-categories belonging to the same category. The sub-categories are often highly similar, and, owing to interference factors such as pose, illumination, occlusion and background, fine-grained images share similar appearance and shape, with small inter-class differences and large intra-class differences. In view of the high accuracy requirements of real-world image recognition, fine-grained image recognition has become an important research direction in computer vision.
Early fine-grained image recognition methods addressed this problem with human-annotated bounding boxes or region annotations (e.g., a bird's head or body) to build region-based feature representations. However, such labeling requires specialized knowledge and a great deal of annotation time, so strongly supervised approaches, which consume considerable time and resources for annotation, are not ideal for practical fine-grained image recognition tasks. To address this, research has turned to weakly supervised methods that rely only on class labels and learn discriminative features by locating different parts. Current research on fine-grained image recognition focuses on enlarging and cropping locally discriminative regions. Specifically, an attention branch network is added to the feature extraction network to learn attention weights: after the feature extraction network extracts features from an input image, the feature map is fed into the attention branch network to obtain an attention feature map, the attention feature map is fused with the original feature map to strengthen the key features, and the key features are then enlarged and cropped so that fine-grained features that are more useful for the recognition task are enhanced.
This common approach of enlarging and cropping key regions with an attention mechanism, while achieving some results, still has several key issues. Specifically, existing fine-grained image recognition methods mainly assign weights to the features of different channels through an attention mechanism and strengthen the most discriminative channels to locate key regions, ignoring the complementarity among channels; moreover, the attention mechanism module is supervised only by the loss function, lacks a strong supervision signal to guide the learning process, and ignores the causal relationship between the prediction result and the attention.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a fine-grained image recognition method based on attention interaction and counterfactual attention, which optimizes the attention mechanism by maximizing the difference between counterfactual attention and factual attention and effectively uses discriminative features and complementary information jointly for recognition, so as to improve recognition accuracy. Specifically: (1) First, aiming at the problem that existing methods ignore finer complementary information and do not effectively use discriminative features and complementary information jointly for recognition, a self-channel feature interaction fusion module is provided; the module models the interaction between different channels of an image, captures the complementary features of each channel, and fuses the complementary features with the key features to obtain fusion features; secondly, a ranking loss function is introduced so that the key features and the fusion features are both effectively used for recognition, thereby improving recognition accuracy. (2) Aiming at the problem that the attention mechanism lacks a strong supervision signal to guide the learning process and ignores the causal relationship between the prediction result and attention, the invention designs an enhanced counterfactual attention mechanism module, which quantifies the quality of attention by comparing the effects of facts (the learned attention) and counterfactuals (irrelevant attention) on the final prediction result; this difference is then maximized, prompting the network to learn more effective attention, reducing the one-sided influence of the training set and improving recognition accuracy.
In order to solve the technical problems, the invention adopts the following technical scheme:
A fine-grained image recognition method based on attention interaction and counterfactual attention comprises the following steps:
Step 1: Feature extraction:
The image I is input into a feature extraction network to obtain a feature map F ∈ R^(C×H×W), where C, H and W are respectively the number of channels, height and width of the feature map.
Step 2: The spatial distribution of each part of the object is learned through a spatial attention mechanism:
The feature map F obtained in step 1 is passed through a spatial attention mechanism to learn the spatial distribution of each part of the object, expressed as an attention map A ∈ R^(M×H×W), where M denotes the number of attention maps. The attention map A is calculated as:
A = f_s(F) = [a_1, a_2, ..., a_M]
where a_k denotes the attention map covering one local region, and f_s(·) denotes the spatial attention mechanism, consisting of a convolution layer and a ReLU activation function.
Step 3: Capturing complementary features through a self-channel feature interaction fusion module and fusing the complementary features with the key features:
The attention map A obtained in step 2 is input into the self-channel feature interaction fusion module, which extracts complementary features by exploring the channel correlations in the image and fuses them with the key features. The specific method is as follows:
First, the attention map A is compressed into a feature matrix X ∈ R^(M×(H·W)), whose i-th row x_i is the flattened i-th attention map.
Then X and its transpose X^T are combined by a bilinear operation to obtain the bilinear matrix B = XX^T ∈ R^(M×M). A negative sign is applied to the bilinear matrix, and the softmax function is used to obtain the weight matrix W:
W = softmax(-XX^T)
where X^T denotes the transpose of X, and the (i, j)-th entry of XX^T represents the spatial relationship between channel i and channel j.
The weight matrix W is multiplied by the feature matrix X to obtain a feature matrix Y containing the complementary features:
Y = W·X
The feature matrix Y is converted back into an attention map A_c containing the complementary features, and A_c is fused with the attention map A to obtain the fusion attention map A_f, which contains both the key features and the complementary features.
Step 4: Construct a counterfactual attention map A_cf from the attention map A obtained in step 2:
The key regions in the attention map A are masked to obtain a mask map A_mask, in which the positions of the key regions are blocked; the mask map A_mask is then used to construct the counterfactual attention map A_cf.
Step 5: Converting the feature maps into feature vectors:
The attention map, the fusion attention map and the counterfactual attention map obtained in steps 2, 3 and 4 are each converted into a feature matrix; each feature matrix is then converted into a feature vector through a fully connected layer.
Step 6: Calculating the losses:
The losses are calculated from the feature vectors obtained in step 5, and the model is optimized.
Steps 2 to 6 are repeated for training.
Further, in step 2, the feature map F obtained in step 1 is input into an attention mechanism module to obtain the attention map, where the attention mechanism module comprises a channel attention mechanism module and a spatial attention mechanism module; the specific steps are as follows:
First, the feature map F is input into the channel attention mechanism module to obtain a channel attention map F_channel. A feature descriptor z_c is computed from F_c, where F_c denotes the feature map of the c-th channel, z_c denotes the feature descriptor of the c-th channel, and z denotes the feature vector formed by all channels.
The feature vector z is weighted to obtain the weight vector s:
s = W_2 · δ(W_1 · z)
where δ denotes the ReLU activation function, W_1 and W_2 are learnable parameters with W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r)), and r denotes the channel dimension-reduction hyperparameter.
After the weight vector s is obtained, the feature map F and the weight vector s are fused to obtain the channel attention map F_channel:
F_channel = s ⊙ F
where ⊙ denotes channel-wise multiplication of the weight vector s with the feature map F.
The channel attention map F_channel is input into the spatial attention module to capture attention in the spatial dimension and obtain the attention map A:
A = f_s(F_channel)
where f_s(·) comprises a 1×1 convolution kernel, a normalization layer and a ReLU activation function; through f_s(·), the attention map A, which incorporates both the channel and the spatial dimensions, is obtained.
Further, in step 4, the specific steps of constructing the counterfactual attention map are as follows:
The key regions in the attention map A are masked to obtain the mask map A_mask:
A_mask(i, j) = γ · A(i, j), if A(i, j) > θ
A_mask(i, j) = A(i, j), otherwise
where A(i, j) denotes the value of the attention map A at spatial position (i, j), θ is a set threshold and γ is a suppression factor, which is a hyperparameter: if A(i, j) is greater than the threshold θ, the value at that position is multiplied by the suppression factor γ so as to block it; if A(i, j) is less than or equal to the threshold θ, the value at that position is unchanged.
By the above method, the mask map A_mask is obtained, in which the positions of the key regions have been blocked; A_mask is then used to construct the counterfactual attention map A_cf:
random_map = random(A)
where random(A) denotes generating a random feature map of the same size as the attention map A, and random_map denotes that random feature map, in which the key regions and non-key regions are random.
After the random feature map random_map is obtained, random_map is multiplied by A_mask to obtain the counterfactual attention map A_cf:
A_cf = random_map ⊙ A_mask
In the counterfactual attention map A_cf, the key regions are blocked because of A_mask, so random_map can act only on the non-key regions; the 'key' regions in A_cf are therefore irrelevant regions.
Further, in step 5, the specific steps of converting the feature maps into feature vectors are as follows:
The attention map, the fusion attention map and the counterfactual attention map obtained in steps 2, 3 and 4 are each converted into a feature matrix:
feature_matrix = normal(einsum(A, F))
feature_complement_matrix = normal(einsum(A_f, F))
feature_counterfactual_matrix = normal(einsum(A_cf, F))
where feature_matrix denotes the feature matrix of the attention map A, feature_complement_matrix denotes the feature matrix of the fusion attention map A_f, feature_counterfactual_matrix denotes the feature matrix of the counterfactual attention map A_cf, normal(·) denotes the normalization operation, and einsum(·) denotes multiplying the attention maps A, A_f and A_cf with the feature map F and converting the results into feature matrices.
After the corresponding feature matrices are obtained, they are converted into feature vectors through a fully connected layer:
y_A = FC(feature_matrix)
y_Af = FC(feature_complement_matrix)
y_cf = FC(feature_matrix) - FC(feature_counterfactual_matrix)
where y_A denotes the feature vector of the attention map, y_Af denotes the feature vector of the fusion attention map, and y_cf denotes the feature vector of the difference between the attention map and the counterfactual attention map.
In step 6, the loss function is divided into two parts to optimize the model. First, a ranking loss function is introduced so that the key features and the fusion features are effectively and jointly used for recognition; the calculation formula is:
L_rank = max(0, p_A - p_Af + ξ)
where ξ is a hyperparameter (the margin), L_rank denotes the ranking loss, and p_A and p_Af denote the prediction scores of the true class obtained from y_A and y_Af respectively; through the ranking loss L_rank, the model promotes the priority of y_Af, i.e. p_Af ≥ p_A + ξ.
Secondly, cross-entropy loss functions are introduced to optimize the model; the calculation formulas are:
L_A = CE(y_A) = -y_true^T · log(softmax(y_A))
L_Af = CE(y_Af) = -y_true^T · log(softmax(y_Af))
L_cf = CE(y_cf) = -y_true^T · log(softmax(y_cf))
where CE denotes the cross-entropy loss function, L_A denotes the loss of the attention map, L_Af denotes the loss of the fusion attention map, L_cf denotes the loss of the difference between the attention map and the counterfactual attention map, and y_true^T denotes the transpose of the true label vector.
The above loss functions are combined into an overall loss function L, which is used to optimize the model.
Compared with the prior art, the invention has the following advantages:
(1) The invention uses the attention mechanism to attend to the key regions and designs a self-channel feature interaction fusion module to attend to the complementary regions; the module models the correlations among the channels of the feature map to extract complementary features and fuses the complementary features with the key features to obtain a fusion attention map containing both, so as to improve recognition performance.
(2) The invention designs an enhanced counterfactual attention mechanism module to locate counterfactual regions, takes the difference between the prediction results of the key discriminative region and of the counterfactual region, and uses this difference as a strong supervision signal for attention; this signal guides the network model to learn more effective attention, improving the ability to learn effective attention and the recognition accuracy, which is not considered in the prior art.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a system architecture diagram of the present invention;
FIG. 3 is a schematic diagram of a self-channel feature interaction fusion module according to the present invention;
FIG. 4 is a schematic diagram of the enhanced counterfactual attention mechanism of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
Referring to fig. 1 and 2, the present embodiment provides a fine-grained image recognition method based on attention interaction and counterfactual attention, which comprises the following steps:
Step 1: Feature extraction:
The image I is input into a feature extraction network to obtain a feature map F ∈ R^(C×H×W), where C, H and W are respectively the number of channels, height and width of the feature map.
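By way of illustration only, a minimal PyTorch sketch of step 1 follows. The choice of a ResNet-50 backbone truncated before its pooling and classification layers, and the 448×448 input size, are assumptions made for this example; the patent does not name a specific feature extraction network.

```python
# Sketch of step 1 (feature extraction). Assumption: a ResNet-50 backbone; the patent
# does not specify which feature extraction network is used.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)
# Keep only the convolutional stages so the output is a spatial feature map F,
# not a classification score.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

image = torch.randn(1, 3, 448, 448)   # input image I (batch of one)
F = feature_extractor(image)          # feature map F of shape (1, C, H, W)
print(F.shape)                        # e.g. torch.Size([1, 2048, 14, 14])
```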
Step 2: The spatial distribution of each part of the object is learned through a spatial attention mechanism:
The feature map F obtained in step 1 is passed through a spatial attention mechanism to learn the spatial distribution of each part of the object, expressed as an attention map A ∈ R^(M×H×W), where M denotes the number of attention maps. The attention map A is calculated as:
A = f_s(F) = [a_1, a_2, ..., a_M]
where a_k denotes the attention map covering one local region, and f_s(·) denotes the spatial attention mechanism, consisting of a convolution layer and a ReLU activation function.
In a preferred embodiment, in step 2, the feature map F obtained in step 1 is input into an attention mechanism module to obtain the attention map, where the attention mechanism module comprises a channel attention mechanism module and a spatial attention mechanism module; the specific steps are as follows:
First, the feature map F is input into the channel attention mechanism module to obtain a channel attention map F_channel. A feature descriptor z_c is computed from F_c, where F_c denotes the feature map of the c-th channel, z_c denotes the feature descriptor of the c-th channel, and z denotes the feature vector formed by all channels.
The feature vector z is weighted to obtain the weight vector s:
s = W_2 · δ(W_1 · z)
where δ denotes the ReLU activation function, W_1 and W_2 are learnable parameters with W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r)), and r denotes the channel dimension-reduction hyperparameter.
After the weight vector s is obtained, the feature map F and the weight vector s are fused to obtain the channel attention map F_channel:
F_channel = s ⊙ F
where ⊙ denotes channel-wise multiplication of the weight vector s with the feature map F.
The channel attention map F_channel is input into the spatial attention module to capture attention in the spatial dimension and obtain the attention map A:
A = f_s(F_channel)
where f_s(·) comprises a 1×1 convolution kernel, a normalization layer and a ReLU activation function; through f_s(·), the attention map A, which incorporates both the channel and the spatial dimensions, is obtained.
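The following PyTorch sketch illustrates one possible reading of the attention mechanism module described above: squeeze-and-excitation-style channel attention (a per-channel descriptor, two weight matrices W_1 and W_2 with reduction ratio r, and a ReLU) followed by a spatial attention branch built from a 1×1 convolution, a normalization layer and a ReLU that outputs M attention maps. Global average pooling as the channel descriptor and BatchNorm2d as the normalization layer are assumptions.

```python
# Sketch of the step-2 attention module (channel attention followed by spatial attention).
# Global average pooling and BatchNorm2d are assumptions made for illustration.
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, in_channels: int, num_attentions: int, reduction: int = 16):
        super().__init__()
        # Channel attention: descriptor z -> W1 -> ReLU (delta) -> W2 -> weight vector s
        self.w1 = nn.Linear(in_channels, in_channels // reduction, bias=False)
        self.w2 = nn.Linear(in_channels // reduction, in_channels, bias=False)
        # Spatial attention f_s: 1x1 convolution + normalization layer + ReLU -> M maps
        self.spatial = nn.Sequential(
            nn.Conv2d(in_channels, num_attentions, kernel_size=1),
            nn.BatchNorm2d(num_attentions),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = feat.shape
        z = feat.mean(dim=(2, 3))                 # per-channel descriptor z (B, C)
        s = self.w2(torch.relu(self.w1(z)))       # weight vector s (B, C)
        f_channel = feat * s.view(b, c, 1, 1)     # channel attention map F_channel
        return self.spatial(f_channel)            # attention map A (B, M, H, W)

attn_module = AttentionModule(in_channels=2048, num_attentions=32)
A = attn_module(torch.randn(1, 2048, 14, 14))
print(A.shape)  # torch.Size([1, 32, 14, 14])
```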
Step 3: Capturing complementary features through a self-channel feature interaction fusion module and fusing the complementary features with the key features:
The attention map A obtained in step 2 is input into the self-channel feature interaction fusion module, which explores the channel correlations in the image to extract fine complementary features and fuses them with the key features.
With reference to the self-channel feature interaction fusion module shown in fig. 3, the specific method is as follows:
First, the attention map A is compressed into a feature matrix X ∈ R^(M×(H·W)), whose i-th row x_i is the flattened i-th attention map.
Then X and its transpose X^T are combined by a bilinear operation to obtain the bilinear matrix B = XX^T ∈ R^(M×M). A negative sign is applied to the bilinear matrix, and the softmax function is used to obtain the weight matrix W:
W = softmax(-XX^T)
where X^T denotes the transpose of X, and the (i, j)-th entry of XX^T represents the spatial relationship between channel i and channel j. According to the definition of the weight matrix W, the channels with larger weights tend to be semantically complementary to channel x_i. For example, if x_i focuses on the bird's head, the channels that highlight complementary parts, such as the bird's wings, receive larger weights, while the channels that also highlight the bird's head receive smaller weights.
The weight matrix W is multiplied by the feature matrix X to obtain a feature matrix Y containing the complementary features:
Y = W·X
The feature matrix Y is converted back into an attention map A_c containing the complementary features, and A_c is fused with the attention map A to obtain the fusion attention map A_f, which contains both the key features and the complementary features.
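A minimal PyTorch sketch of the self-channel feature interaction fusion module follows, under the assumption that the complementary attention map is fused with A by element-wise addition; the patent does not state the exact fusion operation, so that choice is an assumption.

```python
# Sketch of step 3 (self-channel feature interaction fusion).
# Assumption: the complementary attention map is fused with A by element-wise addition.
import torch

def self_channel_fusion(A: torch.Tensor) -> torch.Tensor:
    b, m, h, w = A.shape
    X = A.reshape(b, m, h * w)                  # compress A into the feature matrix X
    bilinear = torch.bmm(X, X.transpose(1, 2))  # bilinear matrix X X^T, shape (B, M, M)
    W = torch.softmax(-bilinear, dim=-1)        # weight matrix: softmax of the negated matrix
    Y = torch.bmm(W, X)                         # complementary features Y = W X
    A_comp = Y.reshape(b, m, h, w)              # complementary attention map
    return A + A_comp                           # fusion attention map A_f (assumed addition)

A_f = self_channel_fusion(torch.rand(1, 32, 14, 14))
print(A_f.shape)  # torch.Size([1, 32, 14, 14])
```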
Step 4: Construct a counterfactual attention map A_cf from the attention map A obtained in step 2:
The key regions in the attention map A are masked to obtain a mask map A_mask, in which the positions of the key regions are blocked; the mask map A_mask is then used to construct the counterfactual attention map A_cf.
The enhanced counterfactual attention mechanism module shown in fig. 4 works as follows:
The key regions in the attention map A are masked to obtain the mask map A_mask:
A_mask(i, j) = γ · A(i, j), if A(i, j) > θ
A_mask(i, j) = A(i, j), otherwise
where A(i, j) denotes the value of the attention map A at spatial position (i, j), θ is a set threshold and γ is a suppression factor, which is a hyperparameter: if A(i, j) is greater than the threshold θ, the value at that position is multiplied by the suppression factor γ so as to block it; if A(i, j) is less than or equal to the threshold θ, the value at that position is unchanged.
By the above method, the mask map A_mask is obtained, in which the positions of the key regions have been blocked; A_mask is then used to construct the counterfactual attention map A_cf:
random_map = random(A)
where random(A) denotes generating a random feature map of the same size as the attention map A, and random_map denotes that random feature map, in which the key regions and non-key regions are random.
After the random feature map random_map is obtained, random_map is multiplied by A_mask to obtain the counterfactual attention map A_cf:
A_cf = random_map ⊙ A_mask
In the counterfactual attention map A_cf, the key regions are blocked because of A_mask, so random_map can act only on the non-key regions; the 'key' regions in A_cf are therefore irrelevant regions.
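For illustration, a PyTorch sketch of the enhanced counterfactual attention construction follows; drawing random_map from a uniform distribution and the particular example values of the threshold θ and suppression factor γ are assumptions.

```python
# Sketch of step 4 (counterfactual attention map). The uniform random map and the
# example values of theta and gamma are assumptions.
import torch

def counterfactual_attention(A: torch.Tensor, theta: float = 0.5,
                             gamma: float = 0.1) -> torch.Tensor:
    # Mask the key regions: positions whose value exceeds the threshold theta are
    # suppressed by the factor gamma; the rest are left unchanged.
    A_mask = torch.where(A > theta, A * gamma, A)
    # random(A): a random feature map with the same shape as A.
    random_map = torch.rand_like(A)
    # Counterfactual attention map: random attention acting only on non-key regions.
    return random_map * A_mask

A_cf = counterfactual_attention(torch.rand(1, 32, 14, 14))
print(A_cf.shape)  # torch.Size([1, 32, 14, 14])
```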
Step 5: Converting the feature maps into feature vectors:
The attention map, the fusion attention map and the counterfactual attention map obtained in steps 2, 3 and 4 are each converted into a feature matrix:
feature_matrix = normal(einsum(A, F))
feature_complement_matrix = normal(einsum(A_f, F))
feature_counterfactual_matrix = normal(einsum(A_cf, F))
where feature_matrix denotes the feature matrix of the attention map A, feature_complement_matrix denotes the feature matrix of the fusion attention map A_f, feature_counterfactual_matrix denotes the feature matrix of the counterfactual attention map A_cf, normal(·) denotes the normalization operation, and einsum(·) denotes multiplying the attention maps A, A_f and A_cf with the feature map F and converting the results into feature matrices.
After the corresponding feature matrices are obtained, they are converted into feature vectors through a fully connected layer:
y_A = FC(feature_matrix)
y_Af = FC(feature_complement_matrix)
y_cf = FC(feature_matrix) - FC(feature_counterfactual_matrix)
where y_A denotes the feature vector of the attention map, y_Af denotes the feature vector of the fusion attention map, and y_cf denotes the feature vector of the difference between the attention map and the counterfactual attention map.
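A sketch of step 5 in PyTorch follows. The einsum subscripts, the use of L2 normalization for normal(·), the number of classes, and forming y_cf as the difference of the two fully connected outputs are assumptions consistent with, but not stated verbatim in, the description.

```python
# Sketch of step 5 (attention pooling via einsum, then a fully connected layer).
# The normalization choice, class count and difference formulation are assumptions.
import torch
import torch.nn.functional as fn

def attention_pool(attn: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
    # Multiply each of the M attention maps with the C feature channels and sum over
    # spatial positions: (B, M, H, W) x (B, C, H, W) -> (B, M, C).
    matrix = torch.einsum('bmhw,bchw->bmc', attn, feat)
    matrix = fn.normalize(matrix, dim=-1)     # normal(.): assumed L2 normalization
    return matrix.flatten(1)                  # one feature vector per sample

feat = torch.randn(2, 2048, 14, 14)           # feature map F
A, A_f, A_cf = (torch.rand(2, 32, 14, 14) for _ in range(3))

fc = torch.nn.Linear(32 * 2048, 200)          # fully connected layer (200 classes assumed)
y_a = fc(attention_pool(A, feat))             # feature vector of the attention map
y_af = fc(attention_pool(A_f, feat))          # feature vector of the fusion attention map
y_cf = y_a - fc(attention_pool(A_cf, feat))   # difference w.r.t. the counterfactual map
print(y_a.shape, y_af.shape, y_cf.shape)
```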
Step 6: Calculating the losses:
The losses are calculated from the feature vectors obtained in step 5, and the model is optimized. The loss function is divided into two parts. First, a ranking loss function is introduced so that the key features and the fusion features are effectively and jointly used for recognition; the calculation formula is:
L_rank = max(0, p_A - p_Af + ξ)
where ξ is a hyperparameter (the margin), L_rank denotes the ranking loss, and p_A and p_Af denote the prediction scores of the true class obtained from y_A and y_Af respectively; through the ranking loss L_rank, the model promotes the priority of y_Af, i.e. p_Af ≥ p_A + ξ. The purpose of this design is to force the fusion attention map to produce more accurate predictions by referring to the predictions produced by the attention map. With this regularization, the network learns to recognize fine-grained images by adaptively considering feature priorities.
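A sketch of the ranking loss follows; implementing it as a pairwise hinge on the true-class probabilities with margin ξ is an assumed formulation, since the patent gives the formula only as an image.

```python
# Sketch of the ranking loss (step 6, part one). The hinge on true-class probabilities
# with margin xi is an assumed formulation.
import torch

def ranking_loss(y_a: torch.Tensor, y_af: torch.Tensor,
                 labels: torch.Tensor, xi: float = 0.05) -> torch.Tensor:
    p_a = torch.softmax(y_a, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    p_af = torch.softmax(y_af, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    # Encourage the fusion attention map to score the true class at least xi higher
    # than the plain attention map does.
    return torch.clamp(p_a - p_af + xi, min=0).mean()

labels = torch.tensor([3, 7])
print(ranking_loss(torch.randn(2, 200), torch.randn(2, 200), labels))
```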
Secondly, cross-entropy loss functions are introduced to optimize the model; the calculation formulas are:
L_A = CE(y_A) = -y_true^T · log(softmax(y_A))
L_Af = CE(y_Af) = -y_true^T · log(softmax(y_Af))
L_cf = CE(y_cf) = -y_true^T · log(softmax(y_cf))
where CE denotes the cross-entropy loss function, L_A denotes the loss of the attention map, L_Af denotes the loss of the fusion attention map, L_cf denotes the loss of the difference between the attention map and the counterfactual attention map, and y_true^T denotes the transpose of the true label vector.
The above loss functions are combined into an overall loss function L, which is used to optimize the model.
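Finally, a sketch of the cross-entropy terms and their combination into the overall loss L follows; summing the four terms with equal weights is an assumption, since the patent does not state the exact combination formula.

```python
# Sketch of step 6, part two: cross-entropy losses and the combined objective L.
# Equal weighting of the four terms is an assumption.
import torch
import torch.nn.functional as fn

def total_loss(y_a, y_af, y_cf, labels, loss_rank):
    loss_a = fn.cross_entropy(y_a, labels)     # loss of the attention map
    loss_af = fn.cross_entropy(y_af, labels)   # loss of the fusion attention map
    loss_cf = fn.cross_entropy(y_cf, labels)   # loss of the counterfactual difference
    return loss_a + loss_af + loss_cf + loss_rank

labels = torch.tensor([3, 7])
L = total_loss(torch.randn(2, 200), torch.randn(2, 200), torch.randn(2, 200),
               labels, torch.tensor(0.1))
print(L)
```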
Steps 2 to 6 are repeated for training.
After model training is completed, an image to be recognized is input into the model, and high-accuracy recognition can be achieved.
It should be understood that the above description is not intended to limit the invention to the particular embodiments disclosed, and that various changes, modifications, additions and substitutions can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (5)

1. A fine-grained image recognition method based on attention interaction and counterfactual attention, characterized by comprising the following steps:
step 1: feature extraction:
inputting the image I into a feature extraction network to obtain a feature map F ∈ R^(C×H×W), where C, H and W are respectively the number of channels, height and width of the feature map;
step 2: the spatial distribution of each part of the object is learned through a spatial attention mechanism:
the feature map F obtained in step 1 is passed through a spatial attention mechanism to learn the spatial distribution of each part of the object, expressed as an attention map A ∈ R^(M×H×W), where M denotes the number of attention maps, and the attention map A is calculated as:
A = f_s(F) = [a_1, a_2, ..., a_M]
where a_k denotes the attention map covering one local region, and f_s(·) denotes the spatial attention mechanism, consisting of a convolution layer and a ReLU activation function;
step 3: capturing complementary features through a self-channel feature interaction fusion module and fusing the complementary features with the key features:
inputting the attention map A obtained in step 2 into the self-channel feature interaction fusion module, extracting complementary features by exploring the channel correlations in the image, and fusing the complementary features with the key features; the specific method is as follows:
first, the attention map A is compressed into a feature matrix X ∈ R^(M×(H·W)), whose i-th row x_i is the flattened i-th attention map;
then X and its transpose X^T are combined by a bilinear operation to obtain the bilinear matrix B = XX^T ∈ R^(M×M); a negative sign is applied to the bilinear matrix, and the softmax function is used to obtain the weight matrix W:
W = softmax(-XX^T)
where X^T denotes the transpose of X, and the (i, j)-th entry of XX^T represents the spatial relationship between channel i and channel j;
the weight matrix W is multiplied by the feature matrix X to obtain a feature matrix Y containing the complementary features:
Y = W·X
the feature matrix Y is converted back into an attention map A_c containing the complementary features, and A_c is fused with the attention map A to obtain the fusion attention map A_f, which contains both the key features and the complementary features;
step 4: constructing a counterfactual attention map A_cf from the attention map A obtained in step 2:
masking the key regions in the attention map A to obtain a mask map A_mask, in which the positions of the key regions are blocked, and using A_mask to construct the counterfactual attention map A_cf;
step 5: converting the feature maps into feature vectors:
converting the attention map, the fusion attention map and the counterfactual attention map obtained in steps 2, 3 and 4 into feature matrices respectively; after the corresponding feature matrices are obtained, converting them into feature vectors through a fully connected layer;
step 6: calculating the losses:
calculating the losses from the feature vectors obtained in step 5, and optimizing the model;
repeating steps 2 to 6 for training.
2. The fine-grained image recognition method based on attention interaction and counterfactual attention according to claim 1, wherein in step 2, the feature map F obtained in step 1 is input into an attention mechanism module to obtain the attention map, the attention mechanism module comprising a channel attention mechanism module and a spatial attention mechanism module, and the specific steps are as follows:
first, the feature map F is input into the channel attention mechanism module to obtain a channel attention map F_channel; a feature descriptor z_c is computed from F_c, where F_c denotes the feature map of the c-th channel, z_c denotes the feature descriptor of the c-th channel, and z denotes the feature vector formed by all channels;
the feature vector z is weighted to obtain the weight vector s:
s = W_2 · δ(W_1 · z)
where δ denotes the ReLU activation function, W_1 and W_2 are learnable parameters with W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r)), and r denotes the channel dimension-reduction hyperparameter;
after the weight vector s is obtained, the feature map F and the weight vector s are fused to obtain the channel attention map F_channel:
F_channel = s ⊙ F
where ⊙ denotes channel-wise multiplication of the weight vector s with the feature map F;
the channel attention map F_channel is input into the spatial attention module to capture attention in the spatial dimension and obtain the attention map A:
A = f_s(F_channel)
where f_s(·) comprises a 1×1 convolution kernel, a normalization layer and a ReLU activation function; through f_s(·), the attention map A, which incorporates both the channel and the spatial dimensions, is obtained.
3. The fine-grained image recognition method based on attention interaction and counterfactual attention according to claim 1, wherein in step 4, the specific steps of constructing the counterfactual attention map are as follows:
masking the key regions in the attention map A to obtain the mask map A_mask:
A_mask(i, j) = γ · A(i, j), if A(i, j) > θ
A_mask(i, j) = A(i, j), otherwise
where A(i, j) denotes the value of the attention map A at spatial position (i, j), θ is a set threshold and γ is a suppression factor, which is a hyperparameter: if A(i, j) is greater than the threshold θ, the value at that position is multiplied by the suppression factor γ so as to block it; if A(i, j) is less than or equal to the threshold θ, the value at that position is unchanged;
by the above method, the mask map A_mask is obtained, in which the positions of the key regions have been blocked; A_mask is then used to construct the counterfactual attention map A_cf:
random_map = random(A)
where random(A) denotes generating a random feature map of the same size as the attention map A, and random_map denotes that random feature map, in which the key regions and non-key regions are random;
after the random feature map random_map is obtained, random_map is multiplied by A_mask to obtain the counterfactual attention map A_cf:
A_cf = random_map ⊙ A_mask
in the counterfactual attention map A_cf, the key regions are blocked because of A_mask, so random_map can act only on the non-key regions; the 'key' regions in A_cf are therefore irrelevant regions.
4. The fine-grained image recognition method based on attention interaction and counterfactual attention according to claim 3, wherein in step 5, the specific steps of converting the feature maps into feature vectors are as follows:
converting the attention map, the fusion attention map and the counterfactual attention map obtained in steps 2, 3 and 4 into feature matrices respectively:
feature_matrix = normal(einsum(A, F))
feature_complement_matrix = normal(einsum(A_f, F))
feature_counterfactual_matrix = normal(einsum(A_cf, F))
where feature_matrix denotes the feature matrix of the attention map A, feature_complement_matrix denotes the feature matrix of the fusion attention map A_f, feature_counterfactual_matrix denotes the feature matrix of the counterfactual attention map A_cf, normal(·) denotes the normalization operation, and einsum(·) denotes multiplying the attention maps A, A_f and A_cf with the feature map F and converting the results into feature matrices;
after the corresponding feature matrices are obtained, converting them into feature vectors through a fully connected layer:
y_A = FC(feature_matrix)
y_Af = FC(feature_complement_matrix)
y_cf = FC(feature_matrix) - FC(feature_counterfactual_matrix)
where y_A denotes the feature vector of the attention map, y_Af denotes the feature vector of the fusion attention map, and y_cf denotes the feature vector of the difference between the attention map and the counterfactual attention map.
5. The fine-grained image recognition method based on attention interaction and counterfactual attention according to claim 4, wherein in step 6, the loss function is divided into two parts to optimize the model; first, a ranking loss function is introduced so that the key features and the fusion features are effectively and jointly used for recognition, and the calculation formula is:
L_rank = max(0, p_A - p_Af + ξ)
where ξ is a hyperparameter (the margin), L_rank denotes the ranking loss, and p_A and p_Af denote the prediction scores of the true class obtained from y_A and y_Af respectively; through the ranking loss L_rank, the model promotes the priority of y_Af, i.e. p_Af ≥ p_A + ξ;
secondly, cross-entropy loss functions are introduced to optimize the model, and the calculation formulas are:
L_A = CE(y_A) = -y_true^T · log(softmax(y_A))
L_Af = CE(y_Af) = -y_true^T · log(softmax(y_Af))
L_cf = CE(y_cf) = -y_true^T · log(softmax(y_cf))
where CE denotes the cross-entropy loss function, L_A denotes the loss of the attention map, L_Af denotes the loss of the fusion attention map, L_cf denotes the loss of the difference between the attention map and the counterfactual attention map, and y_true^T denotes the transpose of the true label vector;
the above loss functions are combined into an overall loss function L, which is used to optimize the model.
CN202310212744.9A 2023-03-08 2023-03-08 Fine-grained image recognition method based on attention interaction and counterfactual attention Active CN116051948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310212744.9A CN116051948B (en) 2023-03-08 2023-03-08 Fine-grained image recognition method based on attention interaction and counterfactual attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310212744.9A CN116051948B (en) 2023-03-08 2023-03-08 Fine-grained image recognition method based on attention interaction and counterfactual attention

Publications (2)

Publication Number Publication Date
CN116051948A true CN116051948A (en) 2023-05-02
CN116051948B CN116051948B (en) 2023-06-23

Family

ID=86123960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310212744.9A Active CN116051948B (en) Fine-grained image recognition method based on attention interaction and counterfactual attention

Country Status (1)

Country Link
CN (1) CN116051948B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228749A (en) * 2023-05-04 2023-06-06 昆山润石智能科技有限公司 Wafer defect detection method and system based on counterfactual explanation
CN116665019A (en) * 2023-07-31 2023-08-29 山东交通学院 Multi-axis interaction multi-dimensional attention network for vehicle re-identification
CN117078920A (en) * 2023-10-16 2023-11-17 昆明理工大学 Infrared-visible light target detection method based on deformable attention mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325237A (en) * 2020-01-21 2020-06-23 中国科学院深圳先进技术研究院 Image identification method based on attention interaction mechanism
US20210133479A1 (en) * 2019-11-05 2021-05-06 Beijing University Of Posts And Telecommunications Fine-grained image recognition method, electronic device and storage medium
CN113592023A (en) * 2021-08-11 2021-11-02 杭州电子科技大学 High-efficiency fine-grained image classification model based on depth model framework
CN113642571A (en) * 2021-07-12 2021-11-12 中国海洋大学 Fine-grained image identification method based on saliency attention mechanism
CN114882534A (en) * 2022-05-31 2022-08-09 合肥工业大学 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210133479A1 (en) * 2019-11-05 2021-05-06 Beijing University Of Posts And Telecommunications Fine-grained image recognition method, electronic device and storage medium
CN111325237A (en) * 2020-01-21 2020-06-23 中国科学院深圳先进技术研究院 Image identification method based on attention interaction mechanism
CN113642571A (en) * 2021-07-12 2021-11-12 中国海洋大学 Fine-grained image identification method based on saliency attention mechanism
CN113592023A (en) * 2021-08-11 2021-11-02 杭州电子科技大学 High-efficiency fine-grained image classification model based on depth model framework
CN114882534A (en) * 2022-05-31 2022-08-09 合肥工业大学 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YONGMING RAO ET AL.: "Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification", 《2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *
MA Yao (马瑶): "A survey of the application of CNN and Transformer in fine-grained image recognition", 《Computer Engineering and Applications (计算机工程与应用)》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228749A (en) * 2023-05-04 2023-06-06 昆山润石智能科技有限公司 Wafer defect detection method and system based on counterfactual explanation
CN116228749B (en) * 2023-05-04 2023-10-27 昆山润石智能科技有限公司 Wafer defect detection method and system based on counterfactual explanation
CN116665019A (en) * 2023-07-31 2023-08-29 山东交通学院 Multi-axis interaction multi-dimensional attention network for vehicle re-identification
CN116665019B (en) * 2023-07-31 2023-09-29 山东交通学院 Multi-axis interaction multi-dimensional attention network for vehicle re-identification
CN117078920A (en) * 2023-10-16 2023-11-17 昆明理工大学 Infrared-visible light target detection method based on deformable attention mechanism
CN117078920B (en) * 2023-10-16 2024-01-23 昆明理工大学 Infrared-visible light target detection method based on deformable attention mechanism

Also Published As

Publication number Publication date
CN116051948B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN116051948B (en) Fine-grained image recognition method based on attention interaction and counterfactual attention
Zhang et al. PC-RGNN: Point cloud completion and graph neural network for 3D object detection
Ren et al. Salient object detection by fusing local and global contexts
CN113609896B (en) Object-level remote sensing change detection method and system based on dual-related attention
CN114758288A (en) Power distribution network engineering safety control detection method and device
Wang et al. Multiscale deep alternative neural network for large-scale video classification
Li et al. Detection-friendly dehazing: Object detection in real-world hazy scenes
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN116342601B (en) Image tampering detection method based on edge guidance and multi-level search
Zhang et al. Local–global attentive adaptation for object detection
Yu et al. The multi-level classification and regression network for visual tracking via residual channel attention
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
Luo et al. Exploring point-bev fusion for 3d point cloud object tracking with transformer
CN111368637B (en) Transfer robot target identification method based on multi-mask convolutional neural network
CN111444913A (en) License plate real-time detection method based on edge-guided sparse attention mechanism
CN116311353A (en) Intensive pedestrian multi-target tracking method based on feature fusion, computer equipment and storage medium
Hou et al. A novel UAV aerial vehicle detection method based on attention mechanism and multi-scale feature cross fusion
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
Qi et al. TCNet: A novel triple-cooperative network for video object detection
Liu et al. Adversarial erasing attention for person re-identification in camera networks under complex environments
CN112668643A (en) Semi-supervised significance detection method based on lattice tower rule
Li et al. Multi-adversarial partial transfer learning with object-level attention mechanism for unsupervised remote sensing scene classification
Chen et al. Delving into the scale variance problem in object detection
Yang et al. An Effective and Lightweight Hybrid Network for Object Detection in Remote Sensing Images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant