CN115063655A - Class activation mapping graph generation method fusing supercolumns - Google Patents

Class activation mapping graph generation method fusing supercolumns

Info

Publication number
CN115063655A
Authority
CN
China
Prior art keywords
feature
region
map
feature maps
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111655904.4A
Other languages
Chinese (zh)
Inventor
刘晶晶
吕学强
游新冬
韩晶
刘国明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aerospace Automatic Control Research Institute
Original Assignee
Beijing Aerospace Automatic Control Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aerospace Automatic Control Research Institute filed Critical Beijing Aerospace Automatic Control Research Institute
Priority to CN202111655904.4A priority Critical patent/CN115063655A/en
Publication of CN115063655A publication Critical patent/CN115063655A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for generating a class activation map fusing supercolumns, which comprises the following steps. The first step: divide the network convolution layers into low, middle and high regions according to the filter channels, and extract the last convolution block of each region as the information of the three different levels. The second step: up-sample the output features of the d1 and d2 levels to the feature dimension of the low level, then deeply splice the three levels to obtain a feature map, and normalize it so that the elements it contains lie in the [0,1] interval. The third step: process the feature maps obtained in the second step in grouped batches, using a confidence algorithm on each group to obtain the confidence of that group of feature maps. The fourth step: splice the confidence results of all groups into a multi-dimensional vector, apply softmax to the vector, and take the result as the contribution degree of each feature map. The fifth step: multiply each feature map by its contribution degree and sum the results to obtain the final class activation map.

Description

Class activation mapping graph generation method fusing supercolumns
Technical Field
The invention belongs to the field of artificial intelligence.
Background
As deep learning has matured, the interpretability of neural networks has become a hot research topic. Interpretability is often tied to model visualization, which helps us understand which features guide a model when it classifies images. Many visualization techniques are known, such as visualizing the intermediate outputs of a convolutional neural network (intermediate activations), visualizing its filters, and visualizing class activation heatmaps over images. The basic principle of Class Activation Mapping (CAM) is to find the weight corresponding to each channel of the last convolutional layer's feature map by back propagation; the greater the weight, the more important the corresponding feature map. The weights are then multiplied with the feature maps to obtain the final class activation map. Although the CAM method can localize the basis for the network's judgment of a certain class, and its theoretical derivation is sound, it has a major drawback: the network must be trained a second time to obtain the weight of each feature map. The Grad-CAM algorithm combines the discriminativeness of CAM with gradient-based pixel-space visualization to obtain a high-resolution class prediction interpretation map; the technique is not limited to fully convolutional networks and can be used with ordinary CNN structures. The Grad-CAM++ algorithm builds on this to optimize the Grad-CAM result, making localization more accurate and better suited to images containing more than one object of the target class. However, because algorithms such as Grad-CAM and Grad-CAM++ use gradients to obtain the feature weights, and gradients in deep networks are prone to noise and saturation, their effect suffers. The Score-CAM algorithm was the first to break the dependence on gradients, measuring the linear weight of each feature map by the model's global confidence score for it. The Ablation-CAM and SS-CAM algorithms are also gradient-free; their visualization results are more focused and background noise is greatly reduced. However, the CAMs generated by Score-CAM, Ablation-CAM and SS-CAM depend mainly on the features of the last convolutional layer of the network and pay little attention to the features of its middle and lower layers, which easily leads to generated feature maps whose important information is incomplete and whose edge information is lost.
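By way of illustration, the following is a minimal sketch of the weighted-sum principle shared by the CAM family; the tensors are illustrative stand-ins, and how the per-channel weights are obtained is exactly what distinguishes CAM, Grad-CAM and Score-CAM.

```python
import torch
import torch.nn.functional as F

def cam_from_weights(feature_maps: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # feature_maps: (K, H, W) activations of the last convolutional layer.
    # weights:      (K,) per-channel importance scores (a second training pass
    #               in CAM, gradients in Grad-CAM, confidence scores in Score-CAM).
    cam = torch.einsum('k,khw->hw', weights, feature_maps)  # weighted sum over channels
    return F.relu(cam)                                      # keep positive evidence only
```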
Disclosure of Invention
The technical problem solved by the invention is as follows: to overcome the defects of the prior art and provide a method for generating a class activation map fusing supercolumns.
The technical scheme of the invention is as follows: a method for generating a class activation map fusing supercolumns comprises the following steps:
The first step: divide the network convolution layers into three regions according to the filter channels, denoted in order the low, middle and high regions, and extract the last convolution block of each region as the information of the three different levels, where the feature dimension of the low level is denoted d0 (a0 × a0 × b0), that of the middle level d1 (a1 × a1 × b1), and that of the high level d2 (a2 × a2 × b2);
The second step: up-sample the output features of the d1 and d2 levels to the spatial size a0 × a0, then deeply splice the three levels of output features with unified dimensions to obtain a feature map, and normalize it so that the elements it contains lie in the [0,1] interval;
The third step: process the feature maps obtained in the second step in grouped batches, using a confidence algorithm on each group to obtain the confidence of that group of feature maps;
The fourth step: splice the confidence results of all groups into a multi-dimensional vector, apply softmax to the vector, and take the result as the contribution degree of each feature map;
The fifth step: multiply each feature map by the contribution degree obtained in the fourth step and sum the results to obtain the final class activation map.
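The following is a condensed PyTorch sketch of the five steps on random stand-in tensors; all shapes, and the placeholder confidence scores in step 3, are illustrative assumptions (concrete versions of each step are sketched in the detailed description below).

```python
import torch
import torch.nn.functional as F

d0, d1, d2 = torch.rand(4, 8, 8), torch.rand(6, 4, 4), torch.rand(8, 2, 2)  # step 1 stand-ins
ups = [F.interpolate(d.unsqueeze(0), size=(8, 8), mode='bilinear',
                     align_corners=False).squeeze(0) for d in (d1, d2)]      # step 2: up-sample
fused = torch.cat([d0, *ups], dim=0)                                         # step 2: deep splicing
fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)           # normalize to [0, 1]
confs = torch.rand(fused.shape[0])      # step 3 placeholder: per-map confidences from the network
contrib = torch.softmax(confs, dim=0)                                        # step 4: contributions
cam = torch.relu(torch.einsum('k,khw->hw', contrib, fused))                  # step 5: weighted sum
```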
Preferably, a0 ≠ a1 ≠ a2.
Preferably, the network convolution is divided into three levels in the first step. The features learned by a neural network are discriminative features: the first few layers generally learn low-level features such as colors and edges; the middle of the network learns texture features; and the last few layers learn distinctive, complete features with discriminating key characteristics. The invention divides the network into levels according to the different features learned: the levels at which the network learns low-level features are regarded as the low level, those that learn texture features as the middle level, and those that learn key semantic features as the high level.
Preferably, the group size in batch processing is generally of the form 2^n; the specific setting is related to the experimental effect and the memory utilization at the time, and the most appropriate value needs to be determined by trial and error.
Preferably, the feature activation map calculation process is as follows:
w_K^C = softmax(N(C(D_K)))    (4)
L^C = ReLU(Σ_K w_K^C · D_K)    (5)
where w_K^C represents the contribution degree of the K-th feature map to class C; D_K represents the K-th feature map; ReLU() is the linear rectification function; N() is the normalization that maps matrix values to the [0,1] interval; and L^C represents the resulting class activation map.
Compared with the prior art, the invention has the beneficial effects that:
since the feature activation map correlation algorithm is usually used to explain the image classification task, it tends to activate certain important areas in the image, ignoring other important areas that may exist. To solve this problem, many algorithms deliberately hide or erase object regions, forcing the model to look for more different parts, but these algorithms either hide fixed-size patches randomly or require repeated model training and response aggregation steps. There are also algorithms that extend the attention of the algorithm to non-target areas through a competing erasure strategy in an end-to-end training manner, but such strategies may gradually extend their attention to non-target areas, creating the problem of inaccurate attention. In addition, the current feature activation maps have the problem of covering edges with insufficient accuracy. Aiming at the problems existing in the current generation of the feature activation graph, the invention provides an HCscore-CAM algorithm for combining network multilayer information and generating the feature activation graph by using a batch processing mode in combination with the idea of supercolumn, wherein the algorithm is integrated into the idea of supercolumn, convolution layers at the front end, the middle end and the tail end of a trained network model form a more representative feature graph by a deep connection mode, and then the feature activation graph with wider coverage and more accurate edge information is generated by using the batch processing mode. When the number of the same target in the image is more than one, the generated class activation mapping has better effect than other algorithms.
Drawings
FIG. 1 is a schematic diagram of a supercolumn feature fusion method;
FIG. 2 is a flow chart of the HCScore-CAM algorithm of the present invention;
FIG. 3 shows a comparison of the feature activation maps generated by the HCScore-CAM algorithm and by other algorithms.
Detailed Description
The invention is further illustrated by the following examples.
1. Supercolumn feature fusion of multiple feature maps
The supercolumn idea is, on the feature map of each intermediate convolutional layer between the CNN input layer and the output layer, to take out the activation values of all units at each pixel position of the input picture and form them into a vector, thereby effectively utilizing the information of the intermediate layers of the neural network. Following the supercolumn idea, the output features of the low-, middle- and high-end convolutional layers in the CNN are extracted and composed, by deep connection, into a composite feature containing all three, as shown in FIG. 1. In a CNN, deep features have a large receptive field and rich semantic information, but their reduced resolution loses edge information; shallow features, in contrast, contain rich edge detail. Extracting both deep and shallow features to generate a composite feature, and using that composite feature to generate the feature activation map, can effectively alleviate the problem of unclear edges and highlight more of the important information.
Because a CNN has many convolutional layers and adjacent layers are strongly correlated, the invention divides the network convolution into high, middle and low regions according to the number of filter channels and extracts only the last convolution block of each region as the information of the three levels. Because the dimensions of the low, middle and high levels of output features differ, the invention first up-samples the middle and high levels to match the dimension of the low level, and then normalizes the three levels of output features with unified dimensions so that the elements they contain lie in the [0,1] interval. The specific calculation is shown in formula (1), where d_k represents the feature layer to be converted, Up() represents the up-sampling calculation, and N() represents the normalization calculation.
D_k = N(Up(d_k))    (1)
D = concat(D_low, D_mid, D_high)    (2)
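A PyTorch sketch of formulas (1) and (2) follows, assuming (C, H, W) tensors and per-channel min-max normalization for N(); the patent does not spell out the normalization, so min-max is an assumption, and the helper name fuse_hypercolumn is illustrative.

```python
import torch
import torch.nn.functional as F

def fuse_hypercolumn(d0: torch.Tensor, d1: torch.Tensor, d2: torch.Tensor) -> torch.Tensor:
    h, w = d0.shape[-2:]                       # low-level spatial size a0 x a0
    maps = []
    for d in (d0, d1, d2):
        up = F.interpolate(d.unsqueeze(0), size=(h, w), mode='bilinear',
                           align_corners=False).squeeze(0)   # Up(): match low-level size
        lo = up.amin(dim=(-2, -1), keepdim=True)
        hi = up.amax(dim=(-2, -1), keepdim=True)
        maps.append((up - lo) / (hi - lo + 1e-8))            # N(): map each channel to [0, 1]
    return torch.cat(maps, dim=0)              # deep connection along the channel axis
```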
2. Generation of feature activation graphs
To generate the feature activation map, the contribution degree of each feature map to the current classification result is first obtained; the feature activation map is then obtained by multiplying each feature map by its corresponding contribution degree and summing linearly. The specific steps are as follows:
The first step: taking the feature maps contained in the supercolumn fusion feature map as masks, cover the original image with each in turn; put each covered image into the same network again to obtain the score corresponding to the class of the original image; the difference between this score and the score of the original image is taken as the confidence of the current feature map, as shown in formula (3).
C(A_l) = f(X_0 * H_l) - f(X_0)    (3)
where X_0 represents the input image, H_l represents the l-th feature map after convolutional-layer fusion, C(A_l) represents the confidence corresponding to that feature map, and the f() function corresponds to the network model.
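A sketch of the confidence computation of formula (3) follows, assuming the fused maps act as multiplicative masks on the input; the masking convention and the function name group_confidence are assumptions.

```python
import torch

def group_confidence(model, image, masks, target_class):
    # image: (1, 3, H, W) preprocessed input X_0; masks: (G, H, W), one
    # batch group of fused feature maps H_l used as masks.
    with torch.no_grad():
        base = model(image)[0, target_class]          # f(X_0)
        masked = image * masks.unsqueeze(1)           # X_0 * H_l, broadcast over RGB
        scores = model(masked)[:, target_class]       # f(X_0 * H_l) for the whole group
    return scores - base                              # (G,) confidences C(A_l)
```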
The second step: splice the confidence results of all the feature maps computed in the first step into one vector as the confidence of the whole supercolumn fusion feature map, then apply normalization and softmax to obtain the overall contribution degrees. Each entry of the vector corresponds to the contribution of one feature map in the supercolumn fusion feature map to the whole picture. The overall contribution vector is then point-multiplied with the supercolumn fusion feature map to obtain the feature activation map of HCScore-CAM. The calculation is shown in formulas (4) and (5).
w_K^C = softmax(N(C(D_K)))    (4)
L^C = ReLU(Σ_K w_K^C · D_K)    (5)
where w_K^C represents the contribution degree of the K-th feature map to class C; D_K represents the K-th feature map; ReLU() is the linear rectification function; N() is the normalization that maps matrix values to the [0,1] interval; and L^C represents the resulting class activation map.
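Formulas (4) and (5) as a sketch, reusing the same min-max assumption for N():

```python
import torch

def activation_map(fused: torch.Tensor, confidences: torch.Tensor) -> torch.Tensor:
    # fused: (K, H, W) supercolumn fusion feature map; confidences: (K,)
    c = (confidences - confidences.min()) / (confidences.max() - confidences.min() + 1e-8)  # N()
    w = torch.softmax(c, dim=0)                              # formula (4): contributions w_K^C
    return torch.relu(torch.einsum('k,khw->hw', w, fused))   # formula (5): ReLU(sum w * D_K)
```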
The flow chart of the HCScore-CAM algorithm of the invention is shown in FIG. 2.
Examples
A method for generating a class activation map fusing supercolumns comprises the following steps:
The first step: divide the network convolution into three regions, low, middle and high, according to the filter channels, and extract the last convolution block of each region as the information of the three different levels. The feature dimension of the low level is d0 (224 × 224 × 64), that of the middle level is d1 (56 × 56 × 256), and that of the high level is d2 (14 × 14 × 512).
The second step: up-sample the output features of the d1 and d2 levels to the spatial size 224 × 224, then deeply splice the three levels of output features with unified dimensions to obtain a feature map of dimension 224 × 224 × 832, and normalize it so that the elements it contains lie in the [0,1] interval.
The third step: and the feature map set obtained by the second step of deep stitching comprises 832 feature maps of 224 x 1, in order to increase the calculation speed, the 832 feature maps are processed in batch, each 32 feature maps are in a group, each group of feature maps adopts a confidence coefficient algorithm, the original image is covered by using the feature maps as masks, the covered images are put into the same network again to obtain scores corresponding to the classes of the original image, the difference value of the scores corresponding to the classes of the original image is recorded as the confidence coefficient of the group of feature maps, and the result obtained by each group is a vector with the dimension of (32 x 1).
The fourth step: and (3) splicing the confidence results of all the groups to obtain a vector with the dimension of (832 × 1), wherein each value in the vector represents the confidence of the corresponding feature map. This vector is then soft-maximized to improve the connection of each feature map to the whole, while the result is still a (832 × 1) vector representing the contribution of the feature map.
The fifth step: and multiplying the contribution degree obtained in the fourth step by the corresponding feature maps, and adding the multiplied feature maps to obtain the final class activation map.
An effect comparison experiment was performed on the feature activation maps generated under multi-target conditions by the HCScore-CAM algorithm and by other algorithms. As shown in FIG. 3, the HCScore-CAM algorithm locates multiple objects of the same class better than the Score-CAM and SS-CAM algorithms. When an image contains several similar objects that are relatively separated, the Score-CAM, SS-CAM and HCScore-CAM algorithms can each locate the objects; when the similar objects are distributed too densely, HCScore-CAM's localization is markedly better than that of the other two. This is because HCScore-CAM incorporates the low-level features of the network, which tend to contain edge features; thanks to these features, the HCScore-CAM algorithm works better for dense object localization.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to limit the present invention, and those skilled in the art can make variations and modifications of the present invention without departing from the spirit and scope of the present invention by using the methods and technical contents disclosed above.

Claims (5)

1. A method for generating a class activation map fusing supercolumns, characterized by comprising the following steps:
the first step: dividing the network convolution layers into three regions according to the filter channels, denoted in order the low, middle and high regions, and extracting the last convolution block of each region as the information of the three different levels, wherein the feature dimension of the low level is denoted d0 (a0 × a0 × b0), that of the middle level d1 (a1 × a1 × b1), and that of the high level d2 (a2 × a2 × b2);
the second step: up-sampling the output features of the d1 and d2 levels to the spatial size a0 × a0, then deeply splicing the three levels of output features with unified dimensions to obtain a feature map, and normalizing it so that the elements it contains lie in the [0,1] interval;
the third step: processing the feature maps obtained in the second step in grouped batches, each group of feature maps using a confidence algorithm to obtain the confidence of that group;
the fourth step: splicing the confidence results of all groups into a multi-dimensional vector, applying softmax to the vector, and taking the result as the contribution degree of each feature map;
the fifth step: multiplying each feature map by the contribution degree obtained in the fourth step and summing the results to obtain the final class activation map.
2. The method of claim 1, wherein: a0 ≠ a1 ≠ a2.
3. The method of claim 1, wherein: in the first step, the network is divided into different levels according to the different features it learns; the levels at which the network learns low-level features are regarded as the low layer, the levels that learn texture features as the middle layer, and the levels that learn key semantic features as the high layer.
4. The method of claim 1, wherein: the preferred group size in batch processing is generally of the form 2^n, the specific setting being related to the experimental effect and the memory utilization used.
5. The method of claim 1, wherein: the feature activation graph calculation process is as follows:
w_K^C = softmax(N(C(D_K)))    (4)
L^C = ReLU(Σ_K w_K^C · D_K)    (5)
where w_K^C represents the contribution degree of the K-th feature map to class C; D_K represents the K-th feature map; ReLU() is the linear rectification function; N() is the normalization that maps matrix values to the [0,1] interval; and L^C represents the resulting class activation map.
CN202111655904.4A 2021-12-30 2021-12-30 Class activation mapping graph generation method fusing supercolumns Pending CN115063655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111655904.4A CN115063655A (en) 2021-12-30 2021-12-30 Class activation mapping graph generation method fusing supercolumns

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111655904.4A CN115063655A (en) 2021-12-30 2021-12-30 Class activation mapping graph generation method fusing supercolumns

Publications (1)

Publication Number Publication Date
CN115063655A true CN115063655A (en) 2022-09-16

Family

ID=83196650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111655904.4A Pending CN115063655A (en) 2021-12-30 2021-12-30 Class activation mapping graph generation method fusing supercolumns

Country Status (1)

Country Link
CN (1) CN115063655A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908296A (en) * 2022-11-10 2023-04-04 深圳大学 Medical image class activation mapping evaluation method and device, computer equipment and storage medium
CN115908296B (en) * 2022-11-10 2023-09-22 深圳大学 Medical image class activation mapping evaluation method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115601549B (en) River and lake remote sensing image segmentation method based on deformable convolution and self-attention model
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN106981080A (en) Night unmanned vehicle scene depth method of estimation based on infrared image and radar data
CN107016409A (en) A kind of image classification method and system based on salient region of image
CN106203430A (en) A kind of significance object detecting method based on foreground focused degree and background priori
CN110047139B (en) Three-dimensional reconstruction method and system for specified target
CN106228185A (en) A kind of general image classifying and identifying system based on neutral net and method
CN113240691A (en) Medical image segmentation method based on U-shaped network
CN109409240A (en) A kind of SegNet remote sensing images semantic segmentation method of combination random walk
CN114565860B (en) Multi-dimensional reinforcement learning synthetic aperture radar image target detection method
CN105787948A (en) Quick graph cutting method based on multiple deformation resolutions
CN112489054A (en) Remote sensing image semantic segmentation method based on deep learning
CN110222760A (en) A kind of fast image processing method based on winograd algorithm
CN110807485B (en) Method for fusing two-classification semantic segmentation maps into multi-classification semantic map based on high-resolution remote sensing image
CN105956570A (en) Lip characteristic and deep learning based smiling face recognition method
CN116402851A (en) Infrared dim target tracking method under complex background
CN105740917A (en) High-resolution remote sensing image semi-supervised multi-view feature selection method with tag learning
CN113361496B (en) City built-up area statistical method based on U-Net
CN115063655A (en) Class activation mapping graph generation method fusing supercolumns
CN114299101A (en) Method, apparatus, device, medium, and program product for acquiring target region of image
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN111353525A (en) Modeling and missing value filling method for unbalanced incomplete data set
CN115830537A (en) Crowd counting method
CN112990336B (en) Deep three-dimensional point cloud classification network construction method based on competitive attention fusion
CN115482463A (en) Method and system for identifying land cover of mine area of generated confrontation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination