CN109858506B - Visualization algorithm for classification result of convolutional neural network - Google Patents

Visualization algorithm for classification result of convolutional neural network

Info

Publication number
CN109858506B
CN109858506B (Application No. CN201810519569.7A)
Authority
CN
China
Prior art keywords
layer
output
neuron
correlation
neurons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810519569.7A
Other languages
Chinese (zh)
Other versions
CN109858506A (en)
Inventor
周连科
谢晓东
褚慈
王红滨
李秀明
王念滨
赵昱杰
薛冬梅
王勇军
何茜茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Three Cup Tea Technology Co.,Ltd.
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201810519569.7A priority Critical patent/CN109858506B/en
Publication of CN109858506A publication Critical patent/CN109858506A/en
Application granted granted Critical
Publication of CN109858506B publication Critical patent/CN109858506B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a visualization algorithm for convolutional neural network classification results, and belongs to the technical field of computer vision and digital image processing. The invention uses a correlation propagation algorithm in the fully connected layers to obtain the contribution of each neuron in the last convolutional layer to the final output result, and calculates the class activation map of the convolutional neural network from these contributions. After the class activation map is obtained, the positions of the neurons in the last convolutional layer that contribute to the classification result are obtained; according to a proposed propagation algorithm based on position information, the positions of the neurons supporting the classification are then traced back through the convolutional layers, layer by layer, until the input layer is reached. This yields the set of pixel positions in the input image that contribute to the output result, and finally a visualization image that explains the classification basis of the convolutional neural network.

Description

Visualization algorithm for classification result of convolutional neural network
Technical Field
The invention belongs to the technical field of computer vision and digital image processing, and particularly relates to a visualization algorithm for a convolutional neural network classification result.
Background
At present, the convolutional neural network, as the most commonly used deep learning model, has been widely applied in many fields, such as image classification, speech recognition and natural language processing, achieving good results and in some cases matching or even exceeding human performance. With this wider application, understanding the network model becomes more and more important; especially in fields such as autonomous driving and medical diagnosis, understanding and verifying the decisions of the neural network is essential, since people do not want all decisions to be made by an opaque 'black box' model. Studying the interpretation of convolutional neural network classification results is therefore of great significance.
Visualization algorithms for explaining convolutional neural network classification results mainly comprise the class activation mapping algorithm and the sensitivity analysis algorithm. The former can only be used in fully convolutional neural networks without fully connected layers, but has the advantage that it accurately explains the classification basis of the network; the latter is more widely applicable and can be used in most convolutional neural networks, but when explaining the classification basis it does not handle inputs containing more than one type of target well, and its accuracy still needs to be improved.
Aiming at the two problems that the class activation mapping algorithm cannot be applied to convolutional neural networks of general structure and that the accuracy of the sensitivity analysis algorithm in explaining classification results is insufficient, the invention uses different propagation algorithms in the fully connected layers and the convolutional layers respectively, and provides a correlation-based class activation map visualization algorithm. The method first uses a correlation propagation algorithm in the fully connected layers to obtain the contribution of each neuron in the last convolutional layer to the final output result, and then calculates the class activation map of the convolutional neural network from these contributions. After the class activation map is obtained, the positions of the neurons in the last convolutional layer that contribute to the classification result are obtained; according to the proposed propagation algorithm based on position information, the positions of the neurons supporting the classification are traced back through the convolutional layers, layer by layer, until the input layer is reached, yielding the set of pixel positions in the input image that contribute to the output result, and finally a visualization image that explains the classification basis of the convolutional neural network.
Disclosure of Invention
The invention aims to provide a visualization algorithm oriented to convolutional neural network classification results, so as to solve the problem that the prior art does not explain the classification of a convolutional neural network satisfactorily.
The purpose of the invention is realized by the following steps:
The invention discloses a visualization algorithm for convolutional neural network classification results, which is specifically realized by the following steps:
(1) Extracting a data set of input images, and training the convolutional neural network with this data set as the training set to obtain trained model parameters;
(2) According to the calculation method of the Rel-CAM algorithm in the fully connected layers, using the output result and the model parameters to calculate, layer by layer, the contribution of each neural unit in the fully connected layers to the output, until the last convolutional layer is reached;
(3) According to the contributions to the output of all neural units in the last convolutional layer obtained in step (2), calculating the weight between each channel and the output result of that layer, thereby obtaining the class activation map of the network model;
(4) Recording the neural units with positive values in the class activation map; their positions are taken as the positions of the pixels in this layer that contribute to the output result, and these neural units are added to the set of neurons in this layer that contribute to the output;
(5) Taking out each neuron in the set in turn, computing the Hadamard product of all neurons in its receptive field in the previous layer with the corresponding weights, summing the Hadamard products within each channel, taking the channel with the largest sum as the contributing channel, adding the neurons with positive values in that channel to the set of neurons in that layer that contribute to the output, and removing duplicate neurons from the set;
(6) Repeating the propagation process of step (5) until the neuron set of the input layer is obtained; the neural units in this set indicate that the pixels at those positions contribute to the output result.
For a convolutional neural network classification result-oriented visualization algorithm, the step (2) is implemented by the following steps:
(2.1) Assume a trained CNN model and a given input picture which the model classifies into class c; assume that the c node in the output layer is the output node of this class and that the score at this node is $S_c$. The output before the Softmax layer is selected as the class score in the algorithm, so that the output maps to the positions of features related only to class c:
$$R_c^{(out)} = S_c$$
where $R_c^{(out)}$ represents the relevance of the neuron in the output layer predicted as class c, i.e., the distribution of the relevance of the prediction result over the output layer;
(2.2) Assume that the layer before the output layer is l. The contribution of each neural unit in this layer to the final output, i.e., the correlation of each neuron with the prediction result, is defined as:
$$R_i^{(l)} = \frac{a_i^{(l)} w_{ic}}{\sum_{i'} a_{i'}^{(l)} w_{i'c}}\, S_c$$
where $a_i^{(l)}$ represents the activation value of the i-th neuron in the l-th layer, and $w_{ic}$ represents the weighted connection between this neural unit and the class-c neuron of the next layer, the output layer;
(2.3) Because only the class-c output node carries relevance in the last layer, only the relevance of each neuron to node c is considered there; for the propagation between intermediate layers, the correlation between each neuron in the previous layer and every neuron in the next layer has to be considered, giving:
$$R_{i \leftarrow j}^{(l-1,\,l)} = \frac{a_i^{(l-1)} w_{ij}}{\sum_{i'} a_{i'}^{(l-1)} w_{i'j}}\, R_j^{(l)}$$
where $R_j^{(l)}$ represents the correlation between the j-th neuron in the l-th layer and the class-c prediction output, and $R_{i \leftarrow j}^{(l-1,\,l)}$ represents the correlation between the i-th neuron in layer l-1 and the j-th neuron in the next layer l;
(2.4) According to the conservation law, the sum of the correlations of all neurons in the l-th layer is equal to the correlation of the output layer, so the correlation of the i-th neuron with the next layer, which equals its correlation with the prediction result, is:
$$R_i^{(l-1)} = \sum_j R_{i \leftarrow j}^{(l-1,\,l)}$$
where $R_i^{(l-1)}$ represents the correlation between the i-th neuron in layer l-1 and the prediction result;
meanwhile, according to the conservation law of the propagation, the following holds:
$$\sum_i R_i^{(l-1)} = \sum_j R_j^{(l)} = \cdots = S_c$$
For the visualization algorithm for convolutional neural network classification results, in step (3), in order to obtain the CAM map of the category, the correlation of the prediction result must first be transferred backwards to the last convolutional layer, because the spatial information of the input image is stored in the convolutional layers; the correlation is therefore transferred layer by layer until the last convolutional layer, in preparation for computing the CAM map in the next step. In a typical CNN structure, the output of the last convolutional layer is converted from a three-dimensional tensor into a one-dimensional vector so as to connect to the following fully connected layers. The specific implementation steps include:
(3.1) Assume that the output of the last convolutional layer is located at the m-th layer of the network. According to the conservation law of the correlation:
$$\sum_i R_i^{(m)} = \sum_j R_j^{(m+1)} = \cdots = S_c$$
that is, the sum of the correlations of the neurons output by the last convolutional layer is equal to the final class score:
$$\sum_i R_i^{(m)} = S_c$$
(3.2) During the forward propagation for classification prediction, the feature map output by the m-th layer, i.e., the corresponding three-dimensional tensor, is converted into a one-dimensional vector, and this conversion discards the spatial information in the extracted features. Therefore, when the correlation is propagated backwards to compute the CAM map, the one-dimensional vector representing the correlation of the neurons in the m-th layer must first be converted back into the three-dimensional tensor used in the forward propagation, i.e., into the spatial structure of the feature map of that layer;
the algorithm first converts the one-dimensional correlation vector of the m-th layer into a three-dimensional correlation tensor with the spatial structure of the feature map; since the values correspond one to one, the sum of the correlations remains unchanged:
$$\sum_k \sum_{i,j} R_k^{(m)}(i,j) = S_c$$
where $R_k^{(m)}(i,j)$ represents the correlation between the neuron at coordinate (i, j) in the k-th channel of the m-th layer correlation tensor and the predicted classification result;
(3.3) If the output features of each channel are globally average pooled, the result is:
$$F_k = \frac{1}{Z}\sum_{i,j} f_k(i,j)$$
where $f_k(i,j)$ represents the activation value of the neuron at coordinate (i, j) in the k-th channel of the feature map of the last convolutional layer and Z is the number of spatial positions in a channel, so that:
$$S_c = \sum_k \sum_{i,j} R_k^{(m)}(i,j) = \sum_k \frac{\sum_{i,j} R_k^{(m)}(i,j)}{F_k}\, F_k$$
Comparing this with the calculation formula of the CAM map, $S_c = \sum_k w_k^c F_k$, the weight can be obtained as:
$$w_k^c = \frac{\sum_{i,j} R_k^{(m)}(i,j)}{F_k}$$
i.e., $w_k^c$ is the weight between each feature map after global average pooling and the final output. As shown above, after weighted summation, the CAM map of the CNN model containing fully connected layers is obtained as:
$$M_c(i,j) = \sum_k w_k^c\, f_k(i,j)$$
for a visualization algorithm for the classification result of the convolutional neural network, in the step (4), if the used CNN model has N convolutional layers, the index of each layer is 1,2, \ 8230; N, and in the l layer, the matrix A is used l Represents the activation value of all neurons in this layer, W l A weight matrix connecting this layer and the previous layer is represented,
Figure RE-GDA0001812829770000044
denotes the kth neuron in layer l, X l Representing neurons in layer I contributing to the last decision in the feature mapPosition, i.e. the bit of the neuron whose correlation with the final output result is positive, m represents the number of neurons therein; the position of the pixel in the input that supports this CNN decision will be obtained below based on the previously obtained CAM map in conjunction with the new propagation method proposed.
For a visualization algorithm for a convolutional neural network classification result, the step (5) specifically includes the following steps:
(5.1) For each neuron $X_l[i]$ in $X_l$, extracting the activation values $A_{l-1}^{X_l[i]}$ in the corresponding receptive field in layer l-1;
(5.2) calculating the Hadamard product $P = A_{l-1}^{X_l[i]} \odot W_l^{X_l[i]}$ of these activation values and the corresponding convolution kernel weights;
(5.3) obtaining the channel with the largest contribution to the next-layer neuron by summing the Hadamard products within each channel; the neurons with positive Hadamard product in this channel are recorded by the algorithm into the set of neurons contributing to the classification;
(5.4) removing duplicate neurons.
The invention has the beneficial effects that: the Rel-CAM algorithm has higher accuracy in explaining the classification of the convolutional neural network and can distinguish the features of different classes when explaining a classification decision, thereby helping people better understand the classification basis of the convolutional neural network and solving the problem that the prior art does not explain the classification of a convolutional neural network satisfactorily.
Drawings
FIG. 1 is a schematic flow diagram of a convolutional neural network classification result-oriented visualization algorithm in the present invention;
FIG. 2 is a qualitative comparison graph of Rel-CAM algorithm and Backprop and LRP algorithm in the present invention;
FIG. 3 is a diagram showing the results of the Rel-CAM, Backprop and LRP algorithms on the average drop in classification confidence, the increase in classification confidence, and the proportion of least confidence drop;
FIG. 4 is a diagram illustrating an exemplary image processing structure according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
With reference to fig. 1, the invention discloses a visualization algorithm for a convolutional neural network classification result, which is implemented by the following steps:
Step one: training the convolutional neural network with a data set containing the input images to be interpreted as the training set, to obtain trained model parameters;
Step two: according to the calculation method of the Rel-CAM algorithm in the fully connected layers, using the output result and the model parameters to calculate, layer by layer, the contribution of each neural unit in the fully connected layers to the output, until the last convolutional layer is reached;
Step three: according to the contributions to the output of all neural units in the last convolutional layer obtained in step two, calculating the weight between each channel and the output result of that layer, thereby obtaining the class activation map of the network model;
Step four: recording the neural units with positive values in the class activation map; their positions are taken as the positions of the pixels in this layer that contribute to the output result, and these neural units are added to the set of neurons in this layer that contribute to the output;
Step five: taking out each neuron in the set in turn, computing the Hadamard product of all neurons in its receptive field in the previous layer with the corresponding weights, summing the Hadamard products within each channel, taking the channel with the largest sum as the contributing channel, adding the neurons with positive values in that channel to the set of neurons in that layer that contribute to the output, and removing duplicate neurons from the set;
Step six: repeating the propagation process of step five until the neuron set of the input layer is obtained; the neural units in this set indicate that the pixels at those positions contribute to the output result.
At present, visualization methods for explaining convolutional neural network classification results are a popular direction of machine learning research, and scholars at home and abroad have proposed a variety of model methods and corresponding algorithms, each with its own characteristics, for different network models and specific practical problems. Building on this previous research, and aiming at the shortcomings of existing sensitivity analysis visualization algorithms in the accuracy of their explanations and the efficiency of the algorithms, the invention combines the advantages of the class activation mapping algorithm and proposes a correlation-based class activation map visualization algorithm. The main points and contents are as follows:
(1) The calculation method of the Rel-CAM algorithm in the fully connected layers. The correlation propagation algorithm is one of the algorithms commonly used to explain the classification of convolutional neural networks. Its general idea is to understand the contribution of each pixel to the final prediction result, and it uses the structure of the network to propagate the correlation backwards. The algorithm starts from the output layer of the network and redistributes the score of the predicted class at each layer along the direction of backward propagation, up to the input layer. The redistribution process obeys a conservation law: the sum of the correlations of each layer remains unchanged. Here the correlation is denoted R(x), where x denotes a single pixel or a neuron in an intermediate layer. To obtain the CAM map of a certain class, the prediction result of the last layer first needs to be transferred to the last convolutional layer.
First, assume a trained CNN model and a given input picture that the model classifies into class c; assume that the c node in the output layer is the output node of this class and that the score at this node is $S_c$. The output before the Softmax layer is selected as the class score in the algorithm, because this output maps to the locations of features that are relevant only to class c. If the Softmax output were chosen instead, the normalized output would map to locations that contain features of other classes, and the resulting visualization would be inaccurate because it would contain features belonging to other classes, even though the picture is classified as those classes only with small probability. Taken together, the algorithm therefore uses the output value before Softmax as the start of the correlation propagation. Thus:
$$R_c^{(out)} = S_c$$
where $R_c^{(out)}$ represents the relevance of the neuron in the output layer predicted as class c, i.e., the distribution of the relevance of the prediction result over the output layer. Because only one node in the output layer is related to class c, this distribution has only one value, $R_c^{(out)} = S_c$; consequently, the sum of the correlations of the neurons in each of the preceding layers is also $S_c$.
Assume that the layer before the output layer is l. The contribution of each neural unit in this layer to the final output, i.e., the correlation of each neuron with the prediction result, is defined as:
$$R_i^{(l)} = \frac{a_i^{(l)} w_{ic}}{\sum_{i'} a_{i'}^{(l)} w_{i'c}}\, S_c$$
where $a_i^{(l)}$ represents the activation value of the i-th neuron in the l-th layer, and $w_{ic}$ represents the weighted connection between this neural unit and the neurons of the next layer (the output layer). Because only the class-c output node in the last layer carries correlation, only the correlation of each neuron to node c is considered there. For the propagation between intermediate layers, however, the correlation between each neuron in the previous layer and every neuron in the next layer must be considered, giving:
$$R_{i \leftarrow j}^{(l-1,\,l)} = \frac{a_i^{(l-1)} w_{ij}}{\sum_{i'} a_{i'}^{(l-1)} w_{i'j}}\, R_j^{(l)}$$
where $R_j^{(l)}$ represents the correlation between the j-th neuron in the l-th layer and the class-c prediction output, and $R_{i \leftarrow j}^{(l-1,\,l)}$ represents the correlation between the i-th neuron in layer l-1 and the j-th neuron in the next layer l, i.e., the contribution of neuron i to neuron j. The sum of the contributions of neuron i to all neurons in the next layer is its contribution to the next layer, and, according to the conservation law, the sum of the correlations of all neurons in layer l is equal to the correlation of the output layer, so the correlation of neuron i with the next layer, which equals its correlation with the prediction result, is:
$$R_i^{(l-1)} = \sum_j R_{i \leftarrow j}^{(l-1,\,l)}$$
where $R_i^{(l-1)}$ represents the correlation between the i-th neuron in layer l-1 and the prediction result. Meanwhile, according to the conservation law of the propagation:
$$\sum_i R_i^{(l-1)} = \sum_j R_j^{(l)} = \cdots = S_c$$
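As an illustration of the propagation rule above, the following is a minimal NumPy sketch, assuming plain weight matrices and activation vectors have already been extracted from the trained model; the array layout and the function name are illustrative assumptions rather than part of the original disclosure, and biases are ignored for simplicity.

import numpy as np

def propagate_relevance_fc(weights, activations, class_index, eps=1e-9):
    # weights[k]     : weight matrix of fully connected layer k, shape (n_in, n_out)
    # activations[k] : input activation vector of layer k, shape (n_in,)
    # The relevance starts at the class-c output node with value S_c and is
    # redistributed backwards so that its sum stays (approximately) equal to S_c.
    logits = activations[-1] @ weights[-1]            # pre-Softmax class scores
    relevance = np.zeros_like(logits)
    relevance[class_index] = logits[class_index]      # R_c = S_c
    for W, a in zip(reversed(weights), reversed(activations)):
        z = a[:, None] * W                            # a_i * w_ij
        denom = z.sum(axis=0)
        denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)   # avoid division by zero
        relevance = (z / denom) @ relevance           # R_i = sum_j (a_i w_ij / sum_i' a_i' w_i'j) R_j
    return relevance                                  # relevance of the flattened last conv output

# Toy usage: two fully connected layers on top of a flattened 8-neuron feature vector.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(4, 3))]
a0 = rng.random(8)
a1 = np.maximum(a0 @ weights[0], 0.0)                 # ReLU activation of the hidden layer
R = propagate_relevance_fc(weights, [a0, a1], class_index=2)
print(R.shape, R.sum())                               # the sum is approximately the class score S_c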
in order to obtain the CAM map of the category, it is necessary to first reversely transfer the correlation of the prediction result to the last convolutional layer, because the spatial information in the input image is stored in the convolutional layer, so that the correlation is first transferred layer by layer until the last convolutional layer, and preparation is made for calculating the CAM map in the next step. In a general CNN structure, the output of the last convolutional layer is converted from a three-dimensional tensor into a one-dimensional vector so as to connect the following fully connected layers. Assuming that the output of the last convolutional layer is located at the mth layer of the network, according to the conservation law of correlation:
Figure RE-GDA0001812829770000077
according to the conservation law of the correlation, the sum of the correlations of each neuron output by the last convolutional layer is equal to the final class score:
Figure RE-GDA0001812829770000078
because the feature map output from the m-th layer in the convolutional layer, i.e. the corresponding three-dimensional tensor, is converted into a one-dimensional vector in the forward propagation for the classification prediction, so as to facilitate the forward propagation in the fully-connected layer. This conversion discards spatial information in the extracted features, so when performing the reverse correlation back propagation to calculate the CAM map, it is necessary to first convert the one-dimensional vector representing the correlation of the neurons in the mth layer into the three-dimensional tensor in the forward propagation, that is, the spatial structure of the feature map in the layer.
The algorithm firstly converts the one-dimensional correlation vector of the mth layer into a three-dimensional tensor with the correlation of the feature mapping space structure, and the sum of the three-dimensional tensor is kept unchanged because the values of the three-dimensional tensor are in one-to-one correspondence. Thus for the transformed correlation tensor, there are also:
Figure RE-GDA0001812829770000079
wherein, the first and the second end of the pipe are connected with each other,
Figure RE-GDA0001812829770000081
and (3) representing the correlation between the neuron with the coordinate (i, j) in the kth channel in the correlation tensor of the mth layer in the network and the prediction classification result. If the global average pooling is performed on the output characteristics of each channel, the obtained result is:
$$F_k = \frac{1}{Z}\sum_{i,j} f_k(i,j)$$
where $f_k(i,j)$ represents the activation value of the neuron at coordinate (i, j) in the k-th channel of the feature map of the last convolutional layer and Z is the number of spatial positions in a channel, so that:
$$S_c = \sum_k \sum_{i,j} R_k^{(m)}(i,j) = \sum_k \frac{\sum_{i,j} R_k^{(m)}(i,j)}{F_k}\, F_k$$
Comparing this with the calculation formula of the CAM map, $S_c = \sum_k w_k^c F_k$, yields:
$$w_k^c = \frac{\sum_{i,j} R_k^{(m)}(i,j)}{F_k}$$
i.e., $w_k^c$ is the weight between each feature map after global average pooling and the final output. As shown above, after weighted summation, the CAM map of the CNN model containing fully connected layers is obtained as:
$$M_c(i,j) = \sum_k w_k^c\, f_k(i,j)$$
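To make the reshaping and the channel weighting above concrete, here is a minimal NumPy sketch; the channel-first flattening order and the division used to recover the weights $w_k^c$ are assumptions read off the comparison above, not a verbatim implementation from the disclosure.

import numpy as np

def rel_cam_map(relevance_vec, feature_maps, eps=1e-9):
    # relevance_vec : 1-D relevance vector of the flattened last conv output
    # feature_maps  : activations of the last conv layer, shape (K, H, W)
    K, H, W = feature_maps.shape
    # Restore the spatial structure of the relevance; the reshape is one-to-one,
    # so the total relevance (the class score S_c) is unchanged.  The flattening
    # order must match the one used in the forward pass (channel-first assumed here).
    R = relevance_vec.reshape(K, H, W)
    # Global average pooling of every channel: F_k = (1/Z) * sum_ij f_k(i, j)
    F = feature_maps.mean(axis=(1, 2))
    # Channel weights obtained by comparing with the CAM formula S_c = sum_k w_k^c F_k
    w = R.sum(axis=(1, 2)) / (F + eps)
    # Weighted sum of the feature maps gives the CAM of the model, shape (H, W).
    cam = np.tensordot(w, feature_maps, axes=(0, 0))
    return cam

# Toy usage with random arrays standing in for a real network's activations.
rng = np.random.default_rng(1)
fmaps = rng.random((16, 7, 7))
rel = rng.normal(size=16 * 7 * 7)
cam = rel_cam_map(rel, fmaps)
print(cam.shape)   # (7, 7)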
(2) The calculation method of the Rel-CAM algorithm in the convolutional layers. In the convolutional layers, the Rel-CAM algorithm uses an algorithm based on the propagation of position information. Its core idea is as follows: if a neuron in the current layer supports the final classification result, i.e., it is positively correlated with the final output, then the neurons in the previous layer that are positively correlated with it can be regarded as evidence supporting that neuron in the current layer, and therefore also as evidence supporting the final classification result. Being positively correlated here means that the product of the activation of the neuron in the previous layer and the weight between the two neurons is positive. This is the core idea by which the Rel-CAM algorithm propagates layer by layer through the convolutional layers.
First, if the CNN model used has N convolutional layers, the layers are indexed 1, 2, …, N. In layer l, the matrix $A_l$ represents the activation values of all neurons in this layer, $W_l$ represents the weight matrix connecting this layer and the previous layer, $x_k^{(l)}$ denotes the k-th neuron in layer l, $X_l$ represents the positions in the feature map of the neurons in layer l that contribute to the final decision, i.e., the positions of the neurons whose correlation with the final output result is positive, and m represents the number of such neurons. The positions of the pixels in the input that support the CNN decision are obtained below from the previously obtained CAM map in conjunction with the proposed new propagation method.
The CAM map obtained in the previous section is located at the m-th layer of the network, and the neurons with positive values in the CAM map are those that contribute to the final decision result in this layer; the initial set $X_l$ for this layer is therefore the set of positions of the elements of the CAM map $M_c$ whose value is greater than 0. This position information is then passed layer by layer back to the input layer.
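A short NumPy snippet illustrating how this initial position set can be taken from the CAM map; the array below is only a toy example.

import numpy as np

cam = np.array([[0.3, -0.1],
                [0.0,  0.8]])                         # toy CAM map of the last convolutional layer
X_init = [tuple(p) for p in np.argwhere(cam > 0)]     # positions with positive CAM value
print(X_init)                                         # [(0, 0), (1, 1)]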
After reaching the convolutional layers, $X_l$ is a set of three-dimensional indices, each index identifying the position of a neuron in that layer that contributes to the final classification decision. How the method locates the discriminative neurons in the previous layer in reverse is explained below. It should be noted that the receptive field of a pooling layer, which performs the pooling operation, is typically a two-dimensional plane, whereas the receptive field of a convolutional layer, which performs the convolution operation, is a three-dimensional volume. Therefore, for each neuron $X_l[i]$ in $X_l$, the activation values $A_{l-1}^{X_l[i]}$ in the corresponding receptive field in layer l-1 need to be extracted. The Hadamard product of these activation values and the corresponding weights of the convolution kernel, $P = A_{l-1}^{X_l[i]} \odot W_l^{X_l[i]}$, is then computed. By summing the Hadamard products within each channel, the channel with the largest contribution to the next-layer neuron is obtained, and the neurons with positive Hadamard product in that channel are recorded by the algorithm into the set of neurons contributing to the classification.
Algorithm 1 below describes the process of obtaining the positions of the neurons supporting the classification in a convolutional layer. In Algorithm 1, $A_{l-1}^{X_l[i]}$ represents the activation values in the receptive field of the previous layer and is therefore a three-dimensional tensor. When its Hadamard product with the weights $W_l^{X_l[i]}$ of the corresponding neuron is computed, the result is also a three-dimensional tensor of the same size. The algorithm first sums the products along the x-axis and y-axis directions to locate the most discriminative feature map. If the convolutional layer does not perform any down-sampling operation, the spatial position of a decision neuron does not change during this conversion; that is, the position (x, y) in the later layer is shifted to the channel with the largest contribution in the current layer, which completes the propagation of the position information between layers. The algorithm could further select the neuron with the largest activation value within the most contributing channel, but the results of both variants were almost the same in the experiments, so the algorithm still selects the elements of the most contributing channel as the decision neurons.
The algorithm steps of the position update are as follows:
Algorithm 1: propagation of the positions of the neurons supporting the classification decision in a convolutional layer
Input: $X_l$, the positions of the neurons in the higher layer that contribute to the classification, derived from the CAM: $X_l[1] \ldots X_l[m]$
$W_l$, the weights of the l-th layer
$A_{l-1}$, the activation values of the neurons of layer l-1
Output: $X_{l-1}$, the positions of the neurons supporting the classification in layer l-1
1. Let $X_{l-1} = \varnothing$
2. for i = 1 : m do
3. take the convolution kernel weights $W_l^{X_l[i]}$ corresponding to neuron $X_l[i]$
4. take the activation values $A_{l-1}^{X_l[i]}$ in the receptive field corresponding to neuron $X_l[i]$
5. compute the Hadamard product of the activation values and the weights, $P = A_{l-1}^{X_l[i]} \odot W_l^{X_l[i]}$
6. compute the contribution value of each channel from the Hadamard product, i.e., sum each channel of the product tensor and assign the result to C: $C = S(P)$, where S(·) is the summation over the plane elements of each channel
7. store the positions of the neurons with positive Hadamard product in channel argmax(C) into the position set of decision neurons in the current layer, i.e., add them to $X_{l-1}$
8. end for
9. for positions in $X_{l-1}$ with the same value, retain only one of them.
For pooling layers, the algorithm extracts the neurons within the two-dimensional receptive field in the previous layer and finds the location with the largest activation value among them, because most CNN structures downsample the feature map using max pooling; the activation in the next layer therefore comes from the maximum activation in the corresponding receptive field of the previous layer. Consequently, when the algorithm traces activations back through a downsampling layer, the neuron with the largest activation value within the receptive field of the corresponding neuron is selected.
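For illustration, a minimal NumPy sketch of Algorithm 1 and of the max-pooling backtracking just described is given below; it assumes stride-1 'valid' convolutions and non-overlapping pooling windows so that the receptive-field bookkeeping stays simple, and the function and variable names are illustrative assumptions rather than part of the original disclosure.

import numpy as np

def propagate_positions_conv(X_l, W_l, A_prev):
    # X_l    : list of (channel, x, y) positions in layer l supporting the decision
    # W_l    : convolution kernels, shape (out_ch, in_ch, kh, kw)
    # A_prev : activations of layer l-1, shape (in_ch, H, W)
    # Assumes a stride-1 'valid' convolution, so the receptive field of output
    # position (x, y) is A_prev[:, x:x+kh, y:y+kw].
    out_ch, in_ch, kh, kw = W_l.shape
    X_prev = set()
    for (c, x, y) in X_l:
        field = A_prev[:, x:x + kh, y:y + kw]       # activations in the receptive field
        P = field * W_l[c]                          # Hadamard product, shape (in_ch, kh, kw)
        C = P.sum(axis=(1, 2))                      # contribution of each channel
        k = int(np.argmax(C))                       # channel contributing the most
        for (dx, dy) in np.argwhere(P[k] > 0):      # neurons with positive product
            X_prev.add((k, x + dx, y + dy))
    return list(X_prev)                             # duplicates removed by the set

def backtrack_max_pool(X_l, A_prev, pool=2):
    # At a non-overlapping max-pooling layer, trace each position back to the
    # location of the maximum activation inside its pooling window.
    X_prev = set()
    for (c, x, y) in X_l:
        window = A_prev[c, x * pool:(x + 1) * pool, y * pool:(y + 1) * pool]
        dx, dy = np.unravel_index(np.argmax(window), window.shape)
        X_prev.add((c, x * pool + dx, y * pool + dy))
    return list(X_prev)

# Toy usage: one 3x3 convolution over a 2-channel 5x5 input, then a pooling backtrack.
rng = np.random.default_rng(2)
A_prev = rng.random((2, 5, 5))
W = rng.normal(size=(4, 2, 3, 3))
X_l = [(1, 0, 0), (3, 2, 2)]
print(propagate_positions_conv(X_l, W, A_prev))
print(backtrack_max_pool([(0, 1, 1)], A_prev, pool=2))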
Thus, for a CNN trained for recognition, the Rel-CAM algorithm can start from the prediction result of the last layer, first using the correlation-propagation-based algorithm in the fully connected layers to generate a class activation map capable of locating the classification features. The map is then converted into a set of positions, and another propagation algorithm based on position information is used in the convolutional layers to trace the positions of the decision neurons back to the input layer. Finally, the localization of the features that determine the classification is obtained on the input image. Although the input picture usually contains three RGB channels, the algorithm only considers the x-y plane, i.e., only the positioning of the pixels in two-dimensional space is of interest.
When explaining the classification result of a convolutional neural network, the method draws on the concept of the class activation map, which identifies in the input picture the regions that cause it to be classified into a given class, and it shares this advantage. The qualitative experimental results in FIG. 2 show that when the image is classified as cat or as dog, the Rel-CAM algorithm identifies only the pixel region of the corresponding class in the image and does not mark the pixel regions of other classes or of the background; the Backprop method identifies the features of all classes in both cases, indicating that it cannot distinguish features between classes when explaining a classification decision. The LRP method is similar to the Rel-CAM algorithm proposed here and, unlike the Backprop method, is class-discriminative, but it marks more non-critical feature regions and background pixels, and its computation is heavier than that of the Rel-CAM algorithm. The Rel-CAM algorithm of the invention therefore explains the classification decisions of a CNN better, especially when there is more than one type of object in the image.
In addition, quantitative experimental comparison of (a) the drop in classification confidence and (b) the increase in classification confidence of the three methods on the same data set shows that the Rel-CAM algorithm explains the classification of the convolutional neural network more accurately. The evaluation criteria use the concept of an interpretation map: the interpretation E of an image is defined as the element-wise product of the generated heat map H and the input picture I:
$$E = H \odot I$$
where $\odot$ denotes the Hadamard (element-wise) product, I is the input picture, and H is the heat map that determines the classification. In the experiment on each picture, c denotes the class into which the model classifies it. In short, the interpretation map is the importance of each pixel for the model decision obtained according to the algorithm, and it covers a part of the input image.
Average drop in classification confidence: a good interpretation map should mark the parts that are most important for the classification. A deep CNN model makes its final decision from all features of the input image, so occluding part of the image inevitably lowers the confidence of the decision. On the other hand, this drop should be small, since the parts of the input image most important for the classification decision are retained in the interpretation map. This metric therefore compares the drop in the model's confidence for a particular class when the masked picture (the interpretation map) is used for prediction. For example, if a model predicts an image as a tiger with confidence 0.8, and the confidence drops to 0.4 when the prediction is made on the interpretation map, then the drop in model confidence is 50%. In the experiments, the top 50 images of each predicted category in the data set were selected and the average drop of the several algorithms was compared.
Increase in classification confidence: sometimes all the features the CNN relies on are exactly the parts identified in the interpretation map, while the other features are unnecessary and do not help the classification decision. In this case, the confidence of the model for the particular class will instead increase. This metric counts, over the whole data set, the number of times the model's confidence increases when the interpretation map is used for prediction, expressed as a percentage.
Least drop in classification confidence: the first two criteria evaluate whether the interpretation map generated by a visualization method correctly identifies the regions of the image that affect the classification; this criterion explicitly compares how good the interpretation maps generated by different methods are. For each image in a given data set, the confidence drops produced by the interpretation maps of the different visualization methods are compared, and the method with the smallest drop has its count incremented by 1. The smaller the confidence drop, the more of the important classification features of the class are identified in the interpretation map generated by that method, i.e., the better the explanation. The final output is a percentage, i.e., the proportion of images on which each method achieves the smallest drop among all algorithms.
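A minimal NumPy sketch of these three criteria follows; the predict callable, the data layout and the relative-drop normalisation are illustrative assumptions, not the exact experimental protocol of the invention.

import numpy as np

def interpretation_map(image, heatmap):
    # E = H (element-wise product) I : keep only the pixels the explanation marks.
    return image * heatmap[..., None]                 # heatmap broadcast over the RGB channels

def evaluate_explanations(predict, images, heatmaps_per_method, class_ids):
    # predict(image, c)   -> confidence of class c for the image
    # heatmaps_per_method -> {method name: list of heatmaps in [0, 1]}
    # Returns the average confidence drop, the rate of confidence increase, and the
    # proportion of images on which each method causes the least drop.
    methods = list(heatmaps_per_method)
    drops = {m: [] for m in methods}
    increases = {m: 0 for m in methods}
    least_drop_wins = {m: 0 for m in methods}
    for idx, (img, c) in enumerate(zip(images, class_ids)):
        full_conf = predict(img, c)
        per_method_drop = {}
        for m in methods:
            expl = interpretation_map(img, heatmaps_per_method[m][idx])
            conf = predict(expl, c)
            drop = max(full_conf - conf, 0.0) / full_conf     # relative decrease in confidence
            per_method_drop[m] = drop
            drops[m].append(drop)
            if conf > full_conf:                              # confidence increased
                increases[m] += 1
        least_drop_wins[min(per_method_drop, key=per_method_drop.get)] += 1
    n = len(images)
    return ({m: float(np.mean(drops[m])) for m in methods},
            {m: increases[m] / n for m in methods},
            {m: least_drop_wins[m] / n for m in methods})

# Toy usage with a dummy classifier that just averages the marked pixels.
rng = np.random.default_rng(3)
imgs = [rng.random((4, 4, 3)) for _ in range(2)]
hms = {"Rel-CAM": [rng.random((4, 4)) for _ in range(2)],
       "Backprop": [rng.random((4, 4)) for _ in range(2)]}
dummy_predict = lambda im, c: float(im.mean())
print(evaluate_explanations(dummy_predict, imgs, hms, class_ids=[0, 1]))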
Experimental analysis, shown in FIG. 3, indicates that the Rel-CAM algorithm has a lower average drop in classification confidence than the other two existing algorithms, and is comparable to them on the confidence-increase metric, with a slight advantage. The experimental results also show that localizing the features can help improve classifier performance, which may offer deep learning researchers a new angle for improving neural network performance: add a feature-recognition component to the model and then guide training according to the recognized features, so as to improve the network performance.
In terms of the least confidence drop, the Rel-CAM algorithm accounts for the largest proportion; that is, over the whole data set the Rel-CAM algorithm is in many cases the method that identifies the features with the greatest influence on the classification, which indicates that Rel-CAM is better than the other two methods.
In conclusion, through qualitative and quantitative analysis, the Rel-CAM algorithm has higher accuracy in explaining the classification of the convolutional neural network, and can distinguish the characteristics between the classifications when explaining the classification decision, thereby helping people to better understand the classification basis of the convolutional neural network.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A visualization method for a convolutional neural network classification result is characterized by specifically comprising the following steps of:
(1) Extracting a data set of input images, and training the convolutional neural network with this data set as the training set to obtain trained model parameters;
(2) According to the calculation method of the Rel-CAM algorithm in the fully connected layers, using the output result and the model parameters to calculate, layer by layer, the contribution of each neural unit in the fully connected layers to the output, until the last convolutional layer is reached;
(3) According to the contributions to the output of all neural units in the last convolutional layer obtained in step (2), calculating the weight between each channel and the output result of that layer, thereby obtaining the class activation map of the network model;
(4) Recording the neural units with positive values in the class activation map; their positions are taken as the positions of the pixels in this layer that contribute to the output result, and these neural units are added to the set of neurons in this layer that contribute to the output;
(5) Taking out each neuron in the set in turn, computing the Hadamard product of all neurons in its receptive field in the previous layer with the corresponding weights, summing the Hadamard products within each channel, taking the channel with the largest sum as the contributing channel, adding the neurons with positive values in that channel to the set of neurons in that layer that contribute to the output, and removing duplicate neurons from the set;
(6) Repeating the propagation process of step (5) until the neuron set of the input layer is obtained; the neural units in this set indicate that the pixels at those positions contribute to the output result;
the step (2) is realized by the following steps:
(2.1) Assume a trained CNN model and a given input picture which the model classifies into class c; assume that the c node in the output layer is the output node of this class and that the score at this node is $S_c$. The output before the Softmax layer is selected as the class score in the algorithm, so that the output maps to the positions of features related only to class c:
$$R_c^{(out)} = S_c$$
where $R_c^{(out)}$ represents the relevance of the neuron in the output layer predicted as class c, i.e., the distribution of the relevance of the prediction result over the output layer;
(2.2) Assume that the layer before the output layer is l. The contribution of each neural unit in this layer to the final output, i.e., the correlation of each neuron with the prediction result, is defined as:
$$R_i^{(l)} = \frac{a_i^{(l)} w_{ic}}{\sum_{i'} a_{i'}^{(l)} w_{i'c}}\, S_c$$
where $a_i^{(l)}$ represents the activation value of the i-th neuron in the l-th layer, and $w_{ic}$ represents the weighted connection between this neural unit and the class-c neuron of the next layer, the output layer;
(2.3) Because only the class-c output node carries relevance in the last layer, only the relevance of each neuron to node c is considered there; for the propagation between intermediate layers, the correlation between each neuron in the previous layer and every neuron in the next layer has to be considered, giving:
$$R_{i \leftarrow j}^{(l-1,\,l)} = \frac{a_i^{(l-1)} w_{ij}}{\sum_{i'} a_{i'}^{(l-1)} w_{i'j}}\, R_j^{(l)}$$
where $R_j^{(l)}$ represents the correlation between the j-th neuron in the l-th layer and the class-c prediction output, and $R_{i \leftarrow j}^{(l-1,\,l)}$ represents the correlation between the i-th neuron in layer l-1 and the j-th neuron in the next layer l;
(2.4) According to the conservation law, the sum of the correlations of all neurons in the l-th layer is equal to the correlation of the output layer, so the correlation of the i-th neuron with the next layer, which equals its correlation with the prediction result, is:
$$R_i^{(l-1)} = \sum_j R_{i \leftarrow j}^{(l-1,\,l)}$$
where $R_i^{(l-1)}$ represents the correlation between the i-th neuron in layer l-1 and the prediction result;
meanwhile, according to the conservation law of the propagation, the following holds:
$$\sum_i R_i^{(l-1)} = \sum_j R_j^{(l)} = \cdots = S_c$$
in step (3), in order to obtain the CAM map of the category, the correlation of the prediction result must first be transferred backwards to the last convolutional layer, because the spatial information of the input image is stored in the convolutional layers; the correlation is therefore transferred layer by layer until the last convolutional layer, in preparation for computing the CAM map in the next step. In a typical CNN structure, the output of the last convolutional layer is converted from a three-dimensional tensor into a one-dimensional vector so as to connect to the following fully connected layers. The specific implementation steps include:
(3.1) Assume that the output of the last convolutional layer is located at the m-th layer of the network. According to the conservation law of the correlation:
$$\sum_i R_i^{(m)} = \sum_j R_j^{(m+1)} = \cdots = S_c$$
that is, the sum of the correlations of the neurons output by the last convolutional layer is equal to the final class score:
$$\sum_i R_i^{(m)} = S_c$$
(3.2) During the forward propagation for classification prediction, the feature map output by the m-th layer, i.e., the corresponding three-dimensional tensor, is converted into a one-dimensional vector, and this conversion discards the spatial information in the extracted features. Therefore, when the correlation is propagated backwards to compute the CAM map, the one-dimensional vector representing the correlation of the neurons in the m-th layer must first be converted back into the three-dimensional tensor used in the forward propagation, i.e., into the spatial structure of the feature map of that layer;
the algorithm first converts the one-dimensional correlation vector of the m-th layer into a three-dimensional correlation tensor with the spatial structure of the feature map; since the values correspond one to one, the sum of the correlations remains unchanged:
$$\sum_k \sum_{i,j} R_k^{(m)}(i,j) = S_c$$
where $R_k^{(m)}(i,j)$ represents the correlation between the neuron at coordinate (i, j) in the k-th channel of the m-th layer correlation tensor and the predicted classification result;
(3.3) If the output features of each channel are globally average pooled, the result is:
$$F_k = \frac{1}{Z}\sum_{i,j} f_k(i,j)$$
where $f_k(i,j)$ represents the activation value of the neuron at coordinate (i, j) in the k-th channel of the feature map of the last convolutional layer and Z is the number of spatial positions in a channel, so that:
$$S_c = \sum_k \sum_{i,j} R_k^{(m)}(i,j) = \sum_k \frac{\sum_{i,j} R_k^{(m)}(i,j)}{F_k}\, F_k$$
Comparing this with the calculation formula of the CAM map, $S_c = \sum_k w_k^c F_k$, yields:
$$w_k^c = \frac{\sum_{i,j} R_k^{(m)}(i,j)}{F_k}$$
i.e., $w_k^c$ is the weight between each feature map after global average pooling and the final output. As shown above, after weighted summation, the CAM map of the CNN model containing fully connected layers is obtained as:
$$M_c(i,j) = \sum_k w_k^c\, f_k(i,j)$$
in step (4), if the CNN model used has N convolutional layers, the layers are indexed 1, 2, …, N. In layer l, the matrix $A_l$ represents the activation values of all neurons in this layer, $W_l$ represents the weight matrix connecting this layer and the previous layer, $x_k^{(l)}$ denotes the k-th neuron in layer l, $X_l$ represents the positions in the feature map of the neurons in layer l that contribute to the final decision, i.e., the positions of the neurons whose correlation with the final output result is positive, and m represents the number of such neurons; the positions of the pixels in the input that support the CNN decision are obtained from the CAM map obtained above in conjunction with the new propagation method;
the step (5) specifically comprises the following steps:
(5.1) For each neuron $X_l[i]$ in $X_l$, extracting the activation values $A_{l-1}^{X_l[i]}$ in the corresponding receptive field in layer l-1;
(5.2) calculating the Hadamard product $P = A_{l-1}^{X_l[i]} \odot W_l^{X_l[i]}$ of these activation values and the corresponding convolution kernel weights;
(5.3) obtaining the channel with the largest contribution to the next-layer neuron by summing the Hadamard products within each channel; the neurons with positive Hadamard product in this channel are recorded by the algorithm into the set of neurons contributing to the classification;
(5.4) removing duplicate neurons.
CN201810519569.7A 2018-05-28 2018-05-28 Visualization algorithm for classification result of convolutional neural network Active CN109858506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810519569.7A CN109858506B (en) 2018-05-28 2018-05-28 Visualization algorithm for classification result of convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810519569.7A CN109858506B (en) 2018-05-28 2018-05-28 Visualization algorithm for classification result of convolutional neural network

Publications (2)

Publication Number Publication Date
CN109858506A CN109858506A (en) 2019-06-07
CN109858506B true CN109858506B (en) 2022-11-18

Family

ID=66889621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810519569.7A Active CN109858506B (en) 2018-05-28 2018-05-28 Visualization algorithm for classification result of convolutional neural network

Country Status (1)

Country Link
CN (1) CN109858506B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533093A (en) * 2019-08-24 2019-12-03 大连理工大学 A kind of automobile front face brand family analysis method
CN110769258A (en) * 2019-11-05 2020-02-07 山东浪潮人工智能研究院有限公司 Image compression method and system for multi-semantic region of specific scene
CN110852394B (en) * 2019-11-13 2022-03-25 联想(北京)有限公司 Data processing method and device, computer system and readable storage medium
CN111046939B (en) * 2019-12-06 2023-08-04 中国人民解放军战略支援部队信息工程大学 Attention-based CNN class activation graph generation method
CN111553462A (en) * 2020-04-08 2020-08-18 哈尔滨工程大学 Class activation mapping method
CN111401472B (en) * 2020-04-09 2023-11-24 中国人民解放军国防科技大学 Infrared target classification method and device based on deep convolutional neural network
CN111582376B (en) * 2020-05-09 2023-08-15 抖音视界有限公司 Visualization method and device for neural network, electronic equipment and medium
CN111666861A (en) * 2020-06-01 2020-09-15 浙江工业大学 Wireless signal modulation classifier visualization method based on convolutional neural network
CN112347252B (en) * 2020-11-04 2024-02-27 吉林大学 Interpretability analysis method based on CNN text classification model
CN112651407B (en) * 2020-12-31 2023-10-20 中国人民解放军战略支援部队信息工程大学 CNN visualization method based on discriminative deconvolution
CN112735514B (en) * 2021-01-18 2022-09-16 清华大学 Training and visualization method and system for neural network extraction regulation and control DNA combination mode
CN115829005B (en) * 2022-12-09 2023-06-27 之江实验室 Automatic defect diagnosis and repair method and device for convolutional neural classification network
CN116561752B (en) * 2023-07-07 2023-09-15 华测国软技术服务南京有限公司 Safety testing method for application software

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3025344B1 (en) * 2014-08-28 2017-11-24 Commissariat Energie Atomique NETWORK OF CONVOLUTIONAL NEURONS
CN106909945A (en) * 2017-03-01 2017-06-30 中国科学院电子学研究所 The feature visualization and model evaluation method of deep learning
CN107392085B (en) * 2017-05-26 2021-07-02 上海精密计量测试研究所 Method for visualizing a convolutional neural network
CN107766933B (en) * 2017-10-24 2021-04-23 天津大学 Visualization method for explaining convolutional neural network

Also Published As

Publication number Publication date
CN109858506A (en) 2019-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231225

Address after: Room 2106-501, Building 4, Innovation and Entrepreneurship Square, Science and Technology Innovation City, Harbin High tech Industrial Development Zone, Heilongjiang Province, 150001 (No. 689 Shize Road, Songbei District)

Patentee after: Harbin Three Cup Tea Technology Co.,Ltd.

Address before: 150001 Intellectual Property Office, Harbin Engineering University science and technology office, 145 Nantong Avenue, Nangang District, Harbin, Heilongjiang

Patentee before: HARBIN ENGINEERING University