CN112287999A - Weak supervision target positioning method utilizing convolutional neural network to correct gradient - Google Patents

Weak supervision target positioning method utilizing convolutional neural network to correct gradient Download PDF

Info

Publication number
CN112287999A
CN112287999A CN202011166826.7A CN202011166826A CN112287999A CN 112287999 A CN112287999 A CN 112287999A CN 202011166826 A CN202011166826 A CN 202011166826A CN 112287999 A CN112287999 A CN 112287999A
Authority
CN
China
Prior art keywords
gradient
layer
neural network
convolutional neural
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011166826.7A
Other languages
Chinese (zh)
Other versions
CN112287999B (en
Inventor
王菡子
程林
张辽
梁艳杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202011166826.7A priority Critical patent/CN112287999B/en
Publication of CN112287999A publication Critical patent/CN112287999A/en
Application granted granted Critical
Publication of CN112287999B publication Critical patent/CN112287999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A weak supervision target positioning method utilizing a convolutional neural network to correct gradient relates to the technical field of computer vision. Training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, then specifying the class of a target to be positioned, carrying out reverse transmission on the convolutional neural network to correct gradients, namely reversely transmitting the gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation. The reverse transfer of the convolutional neural network correction gradient comprises correction of the gradient transferred by a full connection layer, a convolutional layer and the like in the network. The generated heat map has clear target outline and high positioning accuracy, and simultaneously can distinguish different types of targets, and the positioned area contains less irrelevant backgrounds. And the method is robust to the model containing the negative value characteristic.

Description

Weak supervision target positioning method utilizing convolutional neural network to correct gradient
Technical Field
The invention relates to the technical field of computer vision, in particular to a weak supervision target positioning method for correcting gradient by using a convolutional neural network.
Background
In the field of computer vision, great success has been achieved in target localization using convolutional neural networks. However, a large category of existing target positioning methods is based on supervised target positioning methods, and such methods require a large amount of labeled data to train a convolutional neural network, where the training data needs to label target categories and position information of targets, and especially, a large amount of manpower and material resource costs are consumed for labeling target position information. Another method is a target positioning method based on weak supervision, for example, a convolutional neural network for classification tasks is obtained by training only using class label information of a target, then an approximate heat map focused on the target is obtained by transforming using internal characteristics of the trained classification network, and finally target positioning is realized. Chinese patent application CN202010405216.1 discloses a fine-grained image weak supervision target positioning method based on deep learning, which directly performs fine-grained semantic alignment between modalities on the pixel level of an image and a word described by a language. And inputting the image into a convolutional neural network to extract a characteristic vector, and simultaneously coding the language description to extract the characteristic vector of the language description. And performing feature matching on the convolution feature map and the language description feature vector, processing the feature matching map to obtain a saliency map of the target, and obtaining a final positioning result according to the feature matching map. Chinese patent application CN201810407386.6 discloses a data enhancement based weak supervision target positioning method, which mainly comprises the following contents: the construction of a reference network, the positioning of a target and the optimization of performance are carried out, for an input picture, a pre-activation residual error network is used for realizing the function of a classification network to serve as the reference network, then the classification network is trained by a network data set, meanwhile, the positioning performance is optimized through data enhancement, small batch processing scale and deep network depth, then a Class Activation Mapping (CAM) algorithm is applied to generate a heat map, and the classification (namely, object label) and positioning (namely, boundary box) results are output by the reference network through controlling the threshold value of the heat map. Currently, the positioning heatmap obtained by the target positioning method based on weak supervision either contains a lot of noises or cannot distinguish different targets, so that the positioning accuracy is far lower than that of the supervised target positioning method.
Disclosure of Invention
The invention aims to suppress noise, improve the discrimination capability of different targets, obtain higher target positioning accuracy, solve the gradient of a convolutional neural network, correct the gradient of each module, generate a high-quality target positioning heat map and realize a high-accuracy target, and provides a weakly supervised target positioning method for correcting the gradient by using the convolutional neural network.
The method comprises the following specific steps:
training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, outputting classification scores of all classes, then manually specifying the classes of targets to be positioned, or obtaining the top m classes as the classes of the targets to be positioned according to the classification scores output by the network, selecting one class of the targets to be positioned each time, carrying out convolutional neural network correction gradient reverse transmission, namely, reversely transmitting gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation.
The convolutional neural network correction gradient reverse transfer comprises the following steps:
1) initializing output layer gradients
According to the selected object class c to be positionedk(where k is 1,2, …, m), its initial gradient value is set to 1, i.e. it is set to
Figure BDA0002746078600000021
The initial gradient of the other classes is set to 0, i.e.
Figure BDA0002746078600000022
Wherein
Figure BDA0002746078600000023
The gradient of the jth cell of the l +1 th layer is shown.
2) Gradient transfer of fully connected layers
2.1) correcting the gradient of the last full-connection layer in the convolutional neural network, enhancing the weight of the negative connection according to the contribution ratio of the positive connection to the negative connection, wherein the gradient transfer formula is
Figure BDA0002746078600000024
Wherein wijTo connect the weight of the ith cell of the l-th layer with the jth cell of the l + 1-th layer,
Figure BDA0002746078600000025
indicating that the weight value truncates negative values to 0,
Figure BDA0002746078600000026
indicating that the weight value truncates a positive value to 0, |, indicates an absolute value operation.
2.2) other fully connected layers are counter-propagated using the original gradient, with the transfer formula
Figure BDA0002746078600000027
3) Gradient delivery of convolutional layers
Correcting the gradient of the convolution layer, and obtaining a corrected gradient transfer formula by utilizing the proportion of the sum of the output characteristic value and the absolute value of the input characteristic in the convolution receptive field:
Figure BDA0002746078600000028
wherein the content of the first and second substances,
Figure BDA00027460786000000213
the ith characteristic of the ith layer is shown,
Figure BDA0002746078600000029
express get
Figure BDA00027460786000000210
Symbol of (a), uijIs a Boolean variable when
Figure BDA00027460786000000211
In that
Figure BDA00027460786000000212
When it is in the receptive field, uij1, otherwise uij=0。
4) Correcting the gradients of the batch normalization layer, the local response normalization layer and the average pooling layer with the input features containing negative values, wherein the transfer formula is as follows:
Figure BDA0002746078600000031
5) the original gradient is used for the other layers to carry out reverse transfer, and the transfer formula is the same as the formula 2.
6) Transferring the corrected gradient to an intermediate feature layer or an input layer, multiplying the gradient by the input feature element by element, and summing in the channel direction to obtain the contribution value of each input to the output, wherein the formula is as follows:
Figure BDA0002746078600000032
a two-dimensional spatial heat map is obtained according to equation 5, which may be represented as a one-dimensional vector, S ═ S1,s2,...,sn]And n is the number of spatial pixels.
7) Taking a threshold value from the heat map obtained in the step 6), and taking the area higher than the threshold value as the positioning area of the target.
And (4) positioning the next target each time the step 1 to the step 7 are completed, wherein k is k +1, repeating the steps 1) to 7), ending the circulation until k is m +1, and completing positioning of the m types of targets.
The invention provides a weak supervision target positioning method for correcting gradient by using a convolutional neural network. The invention trains a convolutional neural network for classification function on a given data set only containing class labels, firstly transmits the network in the forward direction, then specifies the class of a target to be positioned, performs reverse transmission of the convolutional neural network to correct the gradient, namely reversely transmits the gradient from an output layer to an input layer by layer, and performs corresponding correction operation. The reverse transfer of the convolutional neural network correction gradient in the invention comprises correction of the gradient transferred by a full connection layer, a convolutional layer and the like in the network. The heat map generated by the invention has clear target outline and high acquired positioning precision, and can distinguish different types of targets, and the positioned area contains less irrelevant backgrounds. The heat map generated by the invention has clear target outline and high acquired positioning precision, and can distinguish different types of targets, and the positioned area contains less irrelevant backgrounds. In addition, the method disclosed by the invention has robustness on a model containing a negative value characteristic.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a flow chart of the reverse transfer portion of the convolutional neural network corrective gradient of the present invention.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention includes the steps of:
training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, outputting classification scores of all classes, then manually specifying the classes of targets to be positioned, or obtaining the top m classes as the classes of the targets to be positioned according to the classification scores output by the network, selecting one class of the targets to be positioned each time, carrying out convolutional neural network correction gradient reverse transmission, namely, reversely transmitting gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation. The gradient size of the output target to the intermediate layer or each dimension variable of the input layer reflects the importance degree of the variable to the output target, and then the variables which are important bases for the classification and prediction of the output target can be found out, so that the positioning area of the target is determined in the spatial dimension. However, the gradient obtained by direct solution is used for positioning, a certain deviation exists, correction operation needs to be performed on a special module in the convolutional neural network, and then the precision based on gradient positioning is improved.
Referring to fig. 2, the convolutional neural network correction gradient reverse transfer part specifically includes the following steps:
1) initializing output layer gradients
According to the selected object class c to be positionedk(where k is 1,2, …, m), its initial gradient value is set to 1, i.e. it is set to
Figure BDA0002746078600000041
The initial gradient of the other classes is set to 0, i.e.
Figure BDA0002746078600000042
Wherein
Figure BDA0002746078600000043
The gradient of the jth cell of the l +1 th layer is shown.
2) Gradient transfer of fully connected layers
2.1) correcting the gradient of the last full-connection layer in the convolutional neural network, enhancing the weight of the negative-direction connection according to the contribution ratio of the positive-direction connection to the negative-direction connection, wherein the formula of gradient transfer is as follows
Figure BDA0002746078600000044
Wherein wijTo connect the weight of the ith cell of the l-th layer with the jth cell of the l + 1-th layer,
Figure BDA0002746078600000045
indicating that the weight value truncates negative values to 0,
Figure BDA0002746078600000046
indicating that the weight value truncates a positive value to 0, |, indicates an absolute value operation. The last fully-connected layer is directly connected with the output layer, and the above correction operation is to improve the target selectivity and better suppress the background irrelevant to the target by enhancing the negative connection.
2.2) other fully connected layers are counter-propagated using the original gradient, with the transfer formula
Figure BDA0002746078600000047
3) Gradient delivery of convolutional layers
Correcting the gradient of the convolution layer, and obtaining a corrected gradient transfer formula by utilizing the proportion of the sum of the output characteristic value and the absolute value of the input characteristic in the convolution receptive field:
Figure BDA0002746078600000051
wherein
Figure BDA0002746078600000052
The ith characteristic of the ith layer is shown,
Figure BDA0002746078600000053
express get
Figure BDA0002746078600000054
Symbol of (a), uijIs a Boolean variable when
Figure BDA0002746078600000055
In that
Figure BDA0002746078600000056
When it is in the receptive field, uij1, otherwise uij0. Herein, the
Figure BDA0002746078600000057
Essentially belongs to the process of convolving input features with convolution kernels with all 1 elements. The transmitted correction gradient fully utilizes the information of input and output characteristics, and can more finely position the target. Herein, the
Figure BDA0002746078600000058
The sign of the gradient delivered will be automatically adjusted according to the sign of the input features in order to make the process of gradient delivery robust to models containing negative-valued features.
4) Correcting the gradients of the batch normalization layer, the local response normalization layer and the average pooling layer with the input features containing negative values by using a transfer formula
Figure BDA0002746078600000059
5) The original gradient is used for the other layers to carry out reverse transfer, and the transfer formula is the same as the formula 2.
6) Transferring the corrected gradient to an intermediate feature layer or an input layer, multiplying the gradient element by the input feature, and summing in the channel direction to obtain the contribution of each input to the output, the formula being
Figure BDA00027460786000000510
A two-dimensional spatial heat map is obtained according to equation 5, which may be represented as a one-dimensional vector, S ═ S1,s2,...,sn]And n is the number of pixels in the spatial dimension.
7) And taking a threshold value from the heat map obtained in the step 6, and taking the area higher than the threshold value as the positioning area of the target.
And (4) positioning the next target each time step 1 to step 7 are completed, wherein k is equal to k +1, repeating steps 1 to 7 until k is equal to m +1, ending the cycle, and completing positioning of the m types of targets.
Furthermore, according to the requirements of specific tasks in practical application, the positioning area can be directly output in a segmentation form, that is, a segmented mask is output, namely, each pixel of the m category target positioning areas is respectively marked as a numerical sign corresponding to a category, and each pixel of the area outside the positioning area is marked as a numerical sign of the background; or outputting the positioning area in a form of a bounding box, namely outputting positioning coordinates, namely taking m most compact rectangular bounding boxes of the m category target positioning areas respectively and outputting coordinates of vertexes of the bounding boxes.

Claims (1)

1. The weak supervision target positioning method for correcting gradient by using the convolutional neural network is characterized by comprising the following specific steps of:
training a convolutional neural network for classification function on a given data set only containing class labels, firstly carrying out forward transmission on the network, outputting classification scores of all classes, then manually specifying the classes of targets to be positioned, or obtaining the top m classes as the classes of the targets to be positioned according to the classification scores output by the network, selecting one class of the targets to be positioned each time, carrying out convolutional neural network correction gradient reverse transmission, namely reversely transmitting gradients from an output layer to an input layer by layer, and carrying out corresponding correction operation;
the method for carrying out the reverse transmission of the convolutional neural network correction gradient specifically comprises the following steps:
1) initializing output layer gradients
According to the selected object class c to be positionedk(where k is 1,2, …, m), its initial gradient value is set to 1, i.e. it is set to
Figure FDA0002746078590000011
The initial gradient of the other classes is set to 0, i.e.
Figure FDA0002746078590000012
Wherein
Figure FDA0002746078590000013
Represents the gradient of the jth cell of the l +1 th layer;
2) gradient transfer of fully connected layers
2.1) correcting the gradient of the last full-connection layer in the convolutional neural network, enhancing the weight of the negative connection according to the contribution ratio of the positive connection to the negative connection, wherein the gradient transfer formula is
Figure FDA0002746078590000014
Wherein wijTo connect the weight of the ith cell of the l-th layer with the jth cell of the l + 1-th layer,
Figure FDA0002746078590000015
indicating that the weight value truncates negative values to 0,
Figure FDA0002746078590000016
the weighted value is expressed to cut off the positive value to 0, | represents absolute value operation;
2.2) other fully connected layers are counter-propagated using the original gradient, with the transfer formula
Figure FDA0002746078590000017
3) Gradient delivery of convolutional layers
Correcting the gradient of the convolution layer, and obtaining a corrected gradient transfer formula by utilizing the proportion of the sum of the output characteristic value and the absolute value of the input characteristic in the convolution receptive field:
Figure FDA0002746078590000018
wherein the content of the first and second substances,
Figure FDA0002746078590000019
the ith characteristic of the ith layer is shown,
Figure FDA00027460785900000110
express get
Figure FDA00027460785900000111
Symbol of (a), uijIs a Boolean variable when
Figure FDA00027460785900000112
In that
Figure FDA00027460785900000113
When it is in the receptive field, uij1, otherwise uij=0;
4) Correcting the gradients of the batch normalization layer, the local response normalization layer and the average pooling layer with the input features containing negative values, wherein the transfer formula is as follows:
Figure FDA0002746078590000021
5) the original gradient is used for carrying out reverse transfer on other layers, and the transfer formula is the same as the formula 2;
6) transferring the corrected gradient to an intermediate feature layer or an input layer, multiplying the gradient by the input feature element by element, and summing in the channel direction to obtain the contribution value of each input to the output, wherein the formula is as follows:
Figure FDA0002746078590000022
a two-dimensional spatial heat map is obtained according to equation 5, and is represented by a one-dimensional vector as S ═ S1,s2,...,sn]N is the number of spatial pixels;
7) taking a threshold value from the heat map obtained in the step 6), and taking the area higher than the threshold value as the positioning area of the target;
and (4) positioning the next target each time the step 1 to the step 7 are completed, wherein k is k +1, repeating the steps 1) to 7), ending the circulation until k is m +1, and completing positioning of the m types of targets.
CN202011166826.7A 2020-10-27 2020-10-27 Weak supervision target positioning method for correcting gradient by using convolutional neural network Active CN112287999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011166826.7A CN112287999B (en) 2020-10-27 2020-10-27 Weak supervision target positioning method for correcting gradient by using convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011166826.7A CN112287999B (en) 2020-10-27 2020-10-27 Weak supervision target positioning method for correcting gradient by using convolutional neural network

Publications (2)

Publication Number Publication Date
CN112287999A true CN112287999A (en) 2021-01-29
CN112287999B CN112287999B (en) 2022-06-14

Family

ID=74372599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011166826.7A Active CN112287999B (en) 2020-10-27 2020-10-27 Weak supervision target positioning method for correcting gradient by using convolutional neural network

Country Status (1)

Country Link
CN (1) CN112287999B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
CN110689081A (en) * 2019-09-30 2020-01-14 中国科学院大学 Weak supervision target classification and positioning method based on bifurcation learning
CN110717534A (en) * 2019-09-30 2020-01-21 中国科学院大学 Target classification and positioning method based on network supervision
CN111008630A (en) * 2019-12-18 2020-04-14 郑州大学 Target positioning method based on weak supervised learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
CN110689081A (en) * 2019-09-30 2020-01-14 中国科学院大学 Weak supervision target classification and positioning method based on bifurcation learning
CN110717534A (en) * 2019-09-30 2020-01-21 中国科学院大学 Target classification and positioning method based on network supervision
CN111008630A (en) * 2019-12-18 2020-04-14 郑州大学 Target positioning method based on weak supervised learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANQING FENG ET AL.: "Weakly-Supervised Learning of a Deep Convolutional Neural Networks for Semantic Segmentation", 《IEEE ACCESS》 *
ZHANG, LIAO ET AL.: "Learning Object Scale With Click Supervision for Object Detection", 《IEEE SIGNAL PROCESSING LETTERS》 *
周以鹏 等: "基于多尺度特征卷积神经网络的目标定位", 《计算机工程与应用》 *

Also Published As

Publication number Publication date
CN112287999B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN109816725B (en) Monocular camera object pose estimation method and device based on deep learning
WO2021244079A1 (en) Method for detecting image target in smart home environment
CN109886121B (en) Human face key point positioning method for shielding robustness
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN108335303B (en) Multi-scale palm skeleton segmentation method applied to palm X-ray film
Lin et al. STAN: A sequential transformation attention-based network for scene text recognition
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
CN112446423B (en) Fast hybrid high-order attention domain confrontation network method based on transfer learning
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN111783772A (en) Grabbing detection method based on RP-ResNet network
CN113705769A (en) Neural network training method and device
CN110766041A (en) Deep learning-based pest detection method
CN113743417B (en) Semantic segmentation method and semantic segmentation device
CN108230330B (en) Method for quickly segmenting highway pavement and positioning camera
CN114359622A (en) Image classification method based on convolution neural network-converter hybrid architecture
CN113989340A (en) Point cloud registration method based on distribution
Pérez-Villar et al. Spacecraft pose estimation based on unsupervised domain adaptation and on a 3d-guided loss combination
CN117253044A (en) Farmland remote sensing image segmentation method based on semi-supervised interactive learning
CN112287999B (en) Weak supervision target positioning method for correcting gradient by using convolutional neural network
CN116958700A (en) Image classification method based on prompt engineering and contrast learning
Lv et al. Image semantic segmentation method based on atrous algorithm and convolution CRF
CN112784800B (en) Face key point detection method based on neural network and shape constraint
US11328179B2 (en) Information processing apparatus and information processing method
CN113409351A (en) Unsupervised field self-adaptive remote sensing image segmentation method based on optimal transmission

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant