CN112287999A

CN112287999A - Weak supervision target positioning method utilizing convolutional neural network to correct gradient

Info

Publication number: CN112287999A
Application number: CN202011166826.7A
Authority: CN
Inventors: 王菡子; 程林; 张辽; 梁艳杰
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2021-01-29
Anticipated expiration: 2040-10-27
Also published as: CN112287999B

Abstract

A weak supervision target positioning method utilizing a convolutional neural network to correct gradient relates to the technical field of computer vision. Training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, then specifying the class of a target to be positioned, carrying out reverse transmission on the convolutional neural network to correct gradients, namely reversely transmitting the gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation. The reverse transfer of the convolutional neural network correction gradient comprises correction of the gradient transferred by a full connection layer, a convolutional layer and the like in the network. The generated heat map has clear target outline and high positioning accuracy, and simultaneously can distinguish different types of targets, and the positioned area contains less irrelevant backgrounds. And the method is robust to the model containing the negative value characteristic.

Description

Weak supervision target positioning method utilizing convolutional neural network to correct gradient

Technical Field

The invention relates to the technical field of computer vision, in particular to a weak supervision target positioning method for correcting gradient by using a convolutional neural network.

Background

In the field of computer vision, great success has been achieved in target localization using convolutional neural networks. However, a large category of existing target positioning methods is based on supervised target positioning methods, and such methods require a large amount of labeled data to train a convolutional neural network, where the training data needs to label target categories and position information of targets, and especially, a large amount of manpower and material resource costs are consumed for labeling target position information. Another method is a target positioning method based on weak supervision, for example, a convolutional neural network for classification tasks is obtained by training only using class label information of a target, then an approximate heat map focused on the target is obtained by transforming using internal characteristics of the trained classification network, and finally target positioning is realized. Chinese patent application CN202010405216.1 discloses a fine-grained image weak supervision target positioning method based on deep learning, which directly performs fine-grained semantic alignment between modalities on the pixel level of an image and a word described by a language. And inputting the image into a convolutional neural network to extract a characteristic vector, and simultaneously coding the language description to extract the characteristic vector of the language description. And performing feature matching on the convolution feature map and the language description feature vector, processing the feature matching map to obtain a saliency map of the target, and obtaining a final positioning result according to the feature matching map. Chinese patent application CN201810407386.6 discloses a data enhancement based weak supervision target positioning method, which mainly comprises the following contents: the construction of a reference network, the positioning of a target and the optimization of performance are carried out, for an input picture, a pre-activation residual error network is used for realizing the function of a classification network to serve as the reference network, then the classification network is trained by a network data set, meanwhile, the positioning performance is optimized through data enhancement, small batch processing scale and deep network depth, then a Class Activation Mapping (CAM) algorithm is applied to generate a heat map, and the classification (namely, object label) and positioning (namely, boundary box) results are output by the reference network through controlling the threshold value of the heat map. Currently, the positioning heatmap obtained by the target positioning method based on weak supervision either contains a lot of noises or cannot distinguish different targets, so that the positioning accuracy is far lower than that of the supervised target positioning method.

Disclosure of Invention

The invention aims to suppress noise, improve the discrimination capability of different targets, obtain higher target positioning accuracy, solve the gradient of a convolutional neural network, correct the gradient of each module, generate a high-quality target positioning heat map and realize a high-accuracy target, and provides a weakly supervised target positioning method for correcting the gradient by using the convolutional neural network.

The method comprises the following specific steps:

training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, outputting classification scores of all classes, then manually specifying the classes of targets to be positioned, or obtaining the top m classes as the classes of the targets to be positioned according to the classification scores output by the network, selecting one class of the targets to be positioned each time, carrying out convolutional neural network correction gradient reverse transmission, namely, reversely transmitting gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation.

The convolutional neural network correction gradient reverse transfer comprises the following steps:

1) initializing output layer gradients

According to the selected object class c to be positioned_k(where k is 1,2, …, m), its initial gradient value is set to 1, i.e. it is set to

The initial gradient of the other classes is set to 0, i.e.

Wherein

The gradient of the jth cell of the l +1 th layer is shown.

2) Gradient transfer of fully connected layers

2.1) correcting the gradient of the last full-connection layer in the convolutional neural network, enhancing the weight of the negative connection according to the contribution ratio of the positive connection to the negative connection, wherein the gradient transfer formula is

Wherein w_ijTo connect the weight of the ith cell of the l-th layer with the jth cell of the l + 1-th layer,

indicating that the weight value truncates negative values to 0,

indicating that the weight value truncates a positive value to 0, |, indicates an absolute value operation.

2.2) other fully connected layers are counter-propagated using the original gradient, with the transfer formula

3) Gradient delivery of convolutional layers

Correcting the gradient of the convolution layer, and obtaining a corrected gradient transfer formula by utilizing the proportion of the sum of the output characteristic value and the absolute value of the input characteristic in the convolution receptive field:

wherein the content of the first and second substances,

the ith characteristic of the ith layer is shown,

express get

Symbol of (a), u_ijIs a Boolean variable when

In that

When it is in the receptive field, u_ij1, otherwise u_ij＝0。

4) Correcting the gradients of the batch normalization layer, the local response normalization layer and the average pooling layer with the input features containing negative values, wherein the transfer formula is as follows:

5) the original gradient is used for the other layers to carry out reverse transfer, and the transfer formula is the same as the formula 2.

6) Transferring the corrected gradient to an intermediate feature layer or an input layer, multiplying the gradient by the input feature element by element, and summing in the channel direction to obtain the contribution value of each input to the output, wherein the formula is as follows:

a two-dimensional spatial heat map is obtained according to equation 5, which may be represented as a one-dimensional vector, S ═ S₁,s₂,...,s_n]And n is the number of spatial pixels.

7) Taking a threshold value from the heat map obtained in the step 6), and taking the area higher than the threshold value as the positioning area of the target.

And (4) positioning the next target each time the step 1 to the step 7 are completed, wherein k is k +1, repeating the steps 1) to 7), ending the circulation until k is m +1, and completing positioning of the m types of targets.

The invention provides a weak supervision target positioning method for correcting gradient by using a convolutional neural network. The invention trains a convolutional neural network for classification function on a given data set only containing class labels, firstly transmits the network in the forward direction, then specifies the class of a target to be positioned, performs reverse transmission of the convolutional neural network to correct the gradient, namely reversely transmits the gradient from an output layer to an input layer by layer, and performs corresponding correction operation. The reverse transfer of the convolutional neural network correction gradient in the invention comprises correction of the gradient transferred by a full connection layer, a convolutional layer and the like in the network. The heat map generated by the invention has clear target outline and high acquired positioning precision, and can distinguish different types of targets, and the positioned area contains less irrelevant backgrounds. The heat map generated by the invention has clear target outline and high acquired positioning precision, and can distinguish different types of targets, and the positioned area contains less irrelevant backgrounds. In addition, the method disclosed by the invention has robustness on a model containing a negative value characteristic.

Drawings

FIG. 1 is an overall flow chart of the present invention.

FIG. 2 is a flow chart of the reverse transfer portion of the convolutional neural network corrective gradient of the present invention.

Detailed Description

The following examples will further illustrate the present invention with reference to the accompanying drawings.

Referring to fig. 1, an embodiment of the present invention includes the steps of:

training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, outputting classification scores of all classes, then manually specifying the classes of targets to be positioned, or obtaining the top m classes as the classes of the targets to be positioned according to the classification scores output by the network, selecting one class of the targets to be positioned each time, carrying out convolutional neural network correction gradient reverse transmission, namely, reversely transmitting gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation. The gradient size of the output target to the intermediate layer or each dimension variable of the input layer reflects the importance degree of the variable to the output target, and then the variables which are important bases for the classification and prediction of the output target can be found out, so that the positioning area of the target is determined in the spatial dimension. However, the gradient obtained by direct solution is used for positioning, a certain deviation exists, correction operation needs to be performed on a special module in the convolutional neural network, and then the precision based on gradient positioning is improved.

Referring to fig. 2, the convolutional neural network correction gradient reverse transfer part specifically includes the following steps:

1) initializing output layer gradients

The initial gradient of the other classes is set to 0, i.e.

Wherein

The gradient of the jth cell of the l +1 th layer is shown.

2) Gradient transfer of fully connected layers

2.1) correcting the gradient of the last full-connection layer in the convolutional neural network, enhancing the weight of the negative-direction connection according to the contribution ratio of the positive-direction connection to the negative-direction connection, wherein the formula of gradient transfer is as follows

indicating that the weight value truncates negative values to 0,

indicating that the weight value truncates a positive value to 0, |, indicates an absolute value operation. The last fully-connected layer is directly connected with the output layer, and the above correction operation is to improve the target selectivity and better suppress the background irrelevant to the target by enhancing the negative connection.

3) Gradient delivery of convolutional layers

wherein

The ith characteristic of the ith layer is shown,

express get

Symbol of (a), u_ijIs a Boolean variable when

In that

When it is in the receptive field, u_ij1, otherwise u_ij0. Herein, the

Essentially belongs to the process of convolving input features with convolution kernels with all 1 elements. The transmitted correction gradient fully utilizes the information of input and output characteristics, and can more finely position the target. Herein, the

The sign of the gradient delivered will be automatically adjusted according to the sign of the input features in order to make the process of gradient delivery robust to models containing negative-valued features.

4) Correcting the gradients of the batch normalization layer, the local response normalization layer and the average pooling layer with the input features containing negative values by using a transfer formula

6) Transferring the corrected gradient to an intermediate feature layer or an input layer, multiplying the gradient element by the input feature, and summing in the channel direction to obtain the contribution of each input to the output, the formula being

A two-dimensional spatial heat map is obtained according to equation 5, which may be represented as a one-dimensional vector, S ═ S₁,s₂,...,s_n]And n is the number of pixels in the spatial dimension.

7) And taking a threshold value from the heat map obtained in the step 6, and taking the area higher than the threshold value as the positioning area of the target.

And (4) positioning the next target each time step 1 to step 7 are completed, wherein k is equal to k +1, repeating steps 1 to 7 until k is equal to m +1, ending the cycle, and completing positioning of the m types of targets.

Furthermore, according to the requirements of specific tasks in practical application, the positioning area can be directly output in a segmentation form, that is, a segmented mask is output, namely, each pixel of the m category target positioning areas is respectively marked as a numerical sign corresponding to a category, and each pixel of the area outside the positioning area is marked as a numerical sign of the background; or outputting the positioning area in a form of a bounding box, namely outputting positioning coordinates, namely taking m most compact rectangular bounding boxes of the m category target positioning areas respectively and outputting coordinates of vertexes of the bounding boxes.

Claims

1. The weak supervision target positioning method for correcting gradient by using the convolutional neural network is characterized by comprising the following specific steps of:

training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, outputting classification scores of all classes, then manually specifying the classes of targets to be positioned, or obtaining the top m classes as the classes of the targets to be positioned according to the classification scores output by the network, selecting one class of the targets to be positioned each time, carrying out convolutional neural network correction gradient reverse transmission, namely reversely transmitting gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation;

the method for carrying out the reverse transmission of the convolutional neural network correction gradient specifically comprises the following steps:

1) initializing output layer gradients

The initial gradient of the other classes is set to 0, i.e.

Wherein

Represents the gradient of the jth cell of the l +1 th layer;

2) gradient transfer of fully connected layers

indicating that the weight value truncates negative values to 0,

the weighted value is expressed to cut off the positive value to 0, | represents absolute value operation;

3) Gradient delivery of convolutional layers

wherein the content of the first and second substances,

the ith characteristic of the ith layer is shown,

express get

Symbol of (a), u_ijIs a Boolean variable when

In that

When it is in the receptive field, u_ij1, otherwise u_ij＝0；

5) the original gradient is used for carrying out reverse transfer on other layers, and the transfer formula is the same as the formula 2;

a two-dimensional spatial heat map is obtained according to equation 5, and is represented by a one-dimensional vector as S ═ S₁,s₂,...,s_n]N is the number of spatial pixels;

7) taking a threshold value from the heat map obtained in the step 6), and taking the area higher than the threshold value as the positioning area of the target;