CN112364747A - Target detection method under limited sample - Google Patents
- Publication number
- CN112364747A (application CN202011219061.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/00 — Scenes; scene-specific elements
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. likelihood ratio
- G06F18/2433 — Single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V2201/07 — Target detection
Abstract
The invention relates to a target detection method under a limited sample. First, picture samples of the new classes are fed into a backbone network for detection-target feature extraction and bounding-box regression; a binary classification task (target object present or not) is applied to the resulting candidate bounding boxes, boxes that clearly contain no detection target are removed, and the remainder are screened by classification score to obtain the candidate recommendation regions of each sample. Second, the convolutional features corresponding to the candidate regions are assembled into a fully connected graph, and a trained neural network processes this graph structure to produce a class label for each candidate region. The method is generic in the few-sample field and has wide potential application.
Description
Technical Field
The invention relates to the field of image detection and computation, and in particular to a target detection method under a limited sample.
Background
Over the past few years, deep learning algorithms based on convolutional neural networks have achieved remarkable performance in target detection, but this success depends on large target detection datasets with complete and accurate bounding-box annotations. In practical applications, fully annotated data may be limited for a given detection task. When data is scarce, a convolutional neural network overfits severely and fails to generalize, limiting the capacity of the detector. By contrast, humans are strikingly good at this task: a child can learn to recognize a new category from a few pictures. Domains such as medical imaging and endangered animals lack examples, or complete and accurate data is hard to obtain, so computer vision needs the ability to learn to detect objects from a small number of samples.
Because real-world target objects vary widely in illumination, shape, texture, and so on, detection from few samples is challenging. Current research on few-sample learning has made some progress, but these methods focus on image classification and rarely address target detection. For few-sample detection, the core problem is how to locate the target object in a cluttered background from a small number of samples. In this task, the goal is few-sample object detection as shown in fig. 1: given base classes with enough label-annotated samples and new classes with only a small amount of labeled data, obtain a model that can detect both the new classes and the base classes. To date, few methods exist. Recently, meta-learning has provided a reliable solution to similar problems such as few-sample classification. However, target detection is much harder: it involves not only class prediction but also localization of the target, so existing few-sample classification methods cannot be applied directly to few-sample detection. Taking matching networks and prototype networks as examples, it is unclear how to construct a target prototype for matching and localization, since an image may contain scattered objects of irrelevant classes, or no target object at all.
Disclosure of Invention
In view of the problems in the prior art, the technical problem to be solved by the invention is: how to detect targets rapidly and accurately with few samples.
To solve this technical problem, the invention adopts the following technical scheme: a target detection method under a limited sample comprises the following steps:
S100: inputting the labeled samples of all new classes into a backbone neural network and extracting the features of the detection target;
inputting the labeled samples of all new classes into a regional recommendation network, wherein the regional recommendation network adopts the bounding-box regression part of an SSD followed by a binary classification task (target object present or not), and a plurality of candidate recommendation regions is obtained for each labeled sample through the regional recommendation network;
performing binary classification processing on all bounding boxes of the labeled samples and removing the boxes that clearly do not include the detection target, to obtain all candidate regions of each labeled sample;
S200: constructing the candidate regions of each labeled sample and the corresponding detection-target features obtained in S100 into a complete graph, wherein each node of the complete graph represents the feature of one recommended region and each edge represents the probability that the two connected nodes belong to the same class;
S300: taking the complete graph obtained in S200 as the input of a graph convolutional neural network, wherein the node features of the complete graph are formed by the backbone convolutional features from S100 for the corresponding candidate regions, and the class relationships among the candidate recommendation regions serve as the edge features of the complete graph;
obtaining predicted values of the node features and edge features through multiple rounds of node-feature updating and edge-feature updating;
after N iterations, computing the loss between the predicted values and the true values, back-propagating gradients and updating the network parameters; when the computed loss is larger than a threshold, resetting the iteration count and continuing to update the network until the loss is no larger than the threshold, thereby obtaining a trained network model;
S400: obtaining the features of the target to be detected and all candidate regions to be detected from the image to be detected via the S100 method, obtaining the complete graph to be detected via the S200 method, and inputting it into the network model trained in S300; the model outputs all node features and corresponding edge features of the graph; for each query node and each class, applying a softmax over the edge features between that query node and all support nodes of the class to obtain the probability that the query node belongs to the class, and taking the class with the highest probability as the class of the target to be detected in the corresponding candidate recommendation region.
Preferably, the regional recommendation network in S100 is trained by the following method:
S110: training the regional recommendation network on the base-class dataset in an episodic (scene) learning manner;
S120: training the regional recommendation network obtained in S110 again on the new-class dataset, which contains only a small amount of labeled data;
The loss function L_total used in the training processes of S110 and S120 is:

L_total = L_main + λ_BD · L_BD

where L2 regularization is used to penalize the activation of F_BD:

L_BD = ||F_BD||_2

L_main = L_reg + L_cls

Here L_BD denotes the background suppression regularization term, F_BD the feature region corresponding to the image background, L_reg the regression loss of the target bounding box, L_cls the binary classification loss for the presence of a target object, and λ_BD the weighting coefficient of the background suppression regularization.
Preferably, in S200 the method for assembling all candidate regions of each labeled sample and their corresponding detection-target features into a complete graph is as follows: take the target feature of each category of the new classes as a support node, and take the detection-target feature of each candidate recommendation region obtained from the regional recommendation network as a query node; determine the edge features between nodes according to their categories, where the edge feature between support nodes of the same category is 1 and the edge feature between support nodes of different categories is 0.
Preferably, the training process of the S300 graph convolutional neural network is as follows:
S310: let G denote the complete graph, and let v_i and e_ij denote the i-th node feature in the node feature set and the edge feature between the i-th and j-th nodes in the edge feature set, respectively; the true value y_ij of each edge label is defined by the true values of the node features:

y_ij = 1 if y_i = y_j, and y_ij = 0 otherwise,

where y_i denotes the class label of the i-th node and y_j the class label of the j-th node;
each edge feature is a two-dimensional feature vector e_ij ∈ [0,1]^2; the node features are initialized with the mid-level features of the recommended regions, and each edge feature is initialized from its edge label in the following way:

e_ij^0 = [1, 0] if y_ij = 1, [0, 1] if y_ij = 0, and [0.5, 0.5] if y_ij is unknown,

where the first component e_ij,1 represents the similarity relationship between the two nodes and the second the dissimilarity;
S320: the graph convolutional neural network is composed of L layers, and forward propagation consists of alternating edge-feature updates and node-feature updates; given the node features v_j^(l-1) and edge features e_ij^(l-1) of layer l-1, the node features are updated first by a neighborhood aggregation process: the features of the other nodes are aggregated in proportion to the edge features, and the aggregated feature is converted by a feature transformation to give the node features of the layer:

v_i^(l) = f_v^(l)( [ Σ_j ẽ_ij,1^(l-1) v_j^(l-1) ; Σ_j ẽ_ij,2^(l-1) v_j^(l-1) ] ; θ_v^(l) ),

where f_v^(l) denotes the feature transformation network, ẽ_ij,1^(l-1) and ẽ_ij,2^(l-1) denote the normalized similarity and dissimilarity relationships between nodes i and j at layer l-1, v_j^(l-1) denotes the node feature of node j at layer l-1, and θ_v^(l) denotes the parameters of the layer-l feature transformation network;
S330: the edge-feature update is based on the updated node features: the node similarity score between every pair of nodes is obtained anew, and each edge feature is updated by combining the previous edge value with the updated node similarity score:

e_ij,1^(l) ∝ f_e^(l)(v_i^(l), v_j^(l); θ_e^(l)) · e_ij,1^(l-1),   e_ij,2^(l) ∝ (1 − f_e^(l)(v_i^(l), v_j^(l); θ_e^(l))) · e_ij,2^(l-1),

each normalized over all nodes k, where f_e^(l) is the metric network that computes the similarity score, θ_e^(l) denotes the parameters of the metric network, e_ij,1^(l-1) and e_ij,2^(l-1) denote the similarity and dissimilarity scores of nodes i and j at layer l-1, and v_k^(l) denotes the layer-l node feature of the k-th node;
S340: edge prediction labels are finally obtained from the edge features, i.e. ŷ_ij = e_ij,1^(L); each node V_i can be classified by simple weighted voting over the edge features connecting it to the support nodes of known class information added when constructing the complete graph: the edge features between the query node and the support nodes belonging to a class are summed, a softmax yields the normalized probability that the query node belongs to that class, and the class with the highest probability among all classes gives the final class label; the edge-label prediction probability is defined as:

P(y_i = C_k | T) = softmax( Σ_{j: y_j = C_k} ŷ_ij ),

where C_k denotes the k-th class, T denotes the classification task for the given complete graph, and P(y_i = C_k | T) denotes the probability that the i-th node belongs to the k-th class.
Preferably, the loss function in the training process of the S300 graph convolutional neural network is:

min Σ_{l=1}^{L} Σ_m L_e( Y_{m,e}, Ŷ_{m,e}^(l) ),

where Y_{m,e} denotes the true values of all edge labels and Ŷ_{m,e}^(l) denotes the layer-l prediction of all edge labels under the m-th task.
Compared with the prior art, the invention has at least the following advantages:
The invention proposes a new graph-convolution-based few-sample target detector to solve target detection with few samples. First, it fully exploits the advantages of the traditional SSD detection framework and introduces background suppression regularization to reduce the difficulty of fine-tuning for few-sample detection. Second, a complete graph is built over the proposed candidate regions, and this graph-structured data is processed by graph convolution to obtain the final detection result. Episodic (scene) learning on both classes of datasets simulates the few-sample learning task and fully improves the model's few-sample learning capability. In subsequent work, the correctness and rationality of the proposed method will be demonstrated by more detailed and thorough experiments.
Drawings
Fig. 1 shows target detection in the case of a small number of samples.
FIG. 2 is an overall block diagram of the method of the present invention.
Fig. 3 is a detector based on graph convolution.
Fig. 4 is a network structure of a node feature transformation network and a node similarity metric.
Detailed Description
The method of the present invention is described in further detail below with reference to the accompanying drawings.
Given a support image S containing target objects and a query image Q that may contain target objects, the task is to find all target objects in the query image that belong to the support categories and mark them with tight bounding boxes. If the support set contains N categories with K instances each, the problem is called N-way K-shot detection.
A few-sample target detection setting is defined in which two types of data are available for training: base classes and new classes. For the base classes, a large amount of annotated data is available, while the new classes provide only a few labeled examples. The goal is to learn to detect new objects by using knowledge in the base classes, with both base and new classes present.
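The N-way K-shot episodic setup described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the dataset structure (a dict mapping class name to a list of labeled images) and all function names are assumptions.

```python
# Illustrative N-way K-shot episode sampling ("scene/episodic learning" in the text).
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=2, seed=None):
    """Sample a support set (N classes x K instances) and a disjoint query set
    from a dict mapping class name -> list of labeled images (assumed layout)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)   # pick N classes
    support, query = {}, {}
    for c in classes:
        imgs = rng.sample(dataset[c], k_shot + n_query)
        support[c] = imgs[:k_shot]                 # K support instances
        query[c] = imgs[k_shot:]                   # held-out query instances
    return support, query
```

Each training episode then mimics the few-sample task the detector will face at test time, which is the point of the episodic training referred to in the text.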
Such a few-sample target detection setup is useful because it suits the practical situation where one wishes to deploy a pre-trained detector for a new class with only a few labeled examples. More specifically, large-scale target detection datasets (e.g., PASCAL VOC, MSCOCO) can be used to pre-train the detection model. However, the number of annotated target object classes is quite limited, especially compared to the huge number of object classes in the real world. Therefore, solving target detection with few samples is imperative.
Example: referring to figs. 2-4, a target detection method under a limited sample comprises the following steps:
S100: input the labeled samples of all new classes into a backbone neural network; the backbone uses the classical classification network VGG16 with the final fully connected layers removed, and extracts the features of the input image, mainly contour, texture, and color features.
Input the labeled samples of all new classes into the regional recommendation network, which adopts the bounding-box regression part of an SSD (single-shot multibox detector) followed by a binary classification task (target object present or not); training with the method of the invention described below yields a plurality of bounding-box recommendation regions for each labeled sample.
The regional recommendation network is trained as follows:
S110: train the regional recommendation network on the base-class dataset in an episodic (scene) learning manner; episodic training belongs to the prior art, and because it simulates the few-sample learning task, it reduces the difficulty of fine-tuning and improves few-sample learning capability.
S120: train the regional recommendation network obtained in S110 again on the new-class dataset, which has only a small amount of labeled data; this step is in effect fine-tuning.
The loss function L_total used in the training processes of S110 and S120 is:

L_total = L_main + λ_BD · L_BD

where L2 regularization is used to penalize the activation of F_BD:

L_BD = ||F_BD||_2

L_main = L_reg + L_cls

Here L_BD denotes the background suppression regularization term, F_BD the feature region corresponding to the image background, L_reg the regression loss of the target bounding box, L_cls the binary classification loss for the presence of a target object, and λ_BD the weighting coefficient of the background suppression regularization.
In the base-class training stage of the regional recommendation network, the loss function consists of two parts: the regression loss of the target bounding box and the binary classification loss for the presence of a target object:

L_main = L_reg + L_cls

The bounding-box regression loss is the same loss function used in SSD; the presence classification uses the common binary cross-entropy loss; the sum of the two parts serves as the loss function of this training stage of the regional recommendation network.
To further strengthen few-sample detection on the new classes, a new regularization scheme, background suppression regularization L_BD, is adopted during training; it reduces the interference of complex background information on localization performance. Background suppression (BD) regularization uses knowledge of the objects in the new classes, i.e., the true bounding boxes in the training images. Specifically, for a training image of the new classes, a convolutional feature cube is first generated from the middle convolutional layer of the backbone network. This cube is then masked with the true bounding boxes of all objects in the image, identifying the feature region corresponding to the image background, i.e., F_BD. To suppress background interference, L2 regularization penalizes the activation of F_BD:

L_BD = ||F_BD||_2
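A minimal sketch of the background suppression step just described: mask the mid-level feature cube with the (downscaled) true bounding boxes and penalize the L2 norm of what remains. The feature-map stride, array shapes, and function names are illustrative assumptions, not values from the patent.

```python
import numpy as np

def background_features(feat: np.ndarray, boxes, stride: int = 16) -> np.ndarray:
    """feat: (C, H, W) mid-level feature cube; boxes: list of (x1, y1, x2, y2)
    in image coordinates. Zeroes out the object regions and returns F_BD,
    the activations belonging to the background."""
    mask = np.ones(feat.shape[1:], dtype=feat.dtype)   # 1 = background
    for (x1, y1, x2, y2) in boxes:
        mask[y1 // stride:(y2 + stride - 1) // stride,
             x1 // stride:(x2 + stride - 1) // stride] = 0.0
    return feat * mask

def l_bd(feat: np.ndarray, boxes, stride: int = 16) -> float:
    """L_BD = ||F_BD||_2."""
    return float(np.linalg.norm(background_features(feat, boxes, stride)))
```

If the true boxes cover the whole image, L_BD is zero; the penalty only acts on activations outside every object box.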
by means of the background suppression regularization, the model can focus more on the region corresponding to the target object while suppressing the background region, which is particularly important for few-sample learning. The total loss function of the area recommendation network new class training stage is as follows:
Ltotal=Lmain+λBDLBD。
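The total loss above can be sketched numerically as follows. The λ_BD value and function names are illustrative assumptions, and L_reg and L_cls are taken as precomputed scalars rather than full SSD losses.

```python
import numpy as np

def background_suppression_loss(f_bd: np.ndarray) -> float:
    """L_BD: L2 norm of the feature activations in the background region F_BD."""
    return float(np.linalg.norm(f_bd))

def total_loss(l_reg: float, l_cls: float, f_bd: np.ndarray,
               lambda_bd: float = 0.1) -> float:
    """L_total = (L_reg + L_cls) + lambda_BD * ||F_BD||_2."""
    l_main = l_reg + l_cls
    return l_main + lambda_bd * background_suppression_loss(f_bd)
```

The weighting coefficient λ_BD trades off localization/classification accuracy against background suppression; its value is not specified in the text.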
and performing two-classification processing on all the boundary frames of each labeled sample, wherein the two-classification processing in the invention is the prior art, and removing the boundary frames obviously not including the detection target to obtain all the candidate areas of each labeled sample.
S200: and constructing each candidate region with the label sample and the corresponding detection target feature obtained in the step S100 into a complete graph, wherein each node in the complete graph represents the feature corresponding to each recommended region, and each edge represents the probability that two connected nodes belong to the same class.
The method for assembling all candidate regions of each labeled sample and their corresponding detection-target features into a complete graph is as follows: take the target feature of each category of the new classes as a support node, and take the detection-target feature of each candidate recommendation region obtained from the regional recommendation network as a query node; determine the edge features between nodes according to their categories, with the edge feature between support nodes of the same category set to 1, the edge feature between support nodes of different categories set to 0, and the edges between query nodes and support nodes initialized to 0.5.
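The edge initialization above can be sketched as a scalar similarity matrix, support nodes first and query nodes after; the names and layout are illustrative assumptions.

```python
import numpy as np

def init_edges(support_labels, n_query):
    """Initial scalar edge matrix for len(support_labels) support nodes followed
    by n_query query nodes: 1 for same-class support pairs, 0 for different-class
    support pairs, 0.5 for every edge touching a query node."""
    n_s = len(support_labels)
    n = n_s + n_query
    e = np.full((n, n), 0.5)                     # query edges are unknown: 0.5
    for i in range(n_s):
        for j in range(n_s):
            e[i, j] = 1.0 if support_labels[i] == support_labels[j] else 0.0
    return e
```

In the full method each edge is the 2-vector [similarity, dissimilarity]; the scalar here corresponds to the similarity component, with the dissimilarity component being its complement at initialization.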
S300: then the complete graph obtained in the step S200 is used as the input of the convolutional neural network, the node features of the complete graph are formed by the convolutional features of the trunk network in the step S100 corresponding to the candidate regions, and the class relationship among the candidate recommendation regions is used as the edge features of the complete graph; intra-cluster similarity and inter-cluster variability are directly exploited.
Predicted values of the node features and edge features are obtained through multiple rounds of node-feature updating and edge-feature updating; each round updates the node features and edge features of the complete graph, forming a new complete graph, and after each update the edge feature between a query node and a support node represents the probability that the two belong to the same class.
When fine-tuning on the new data, the new regularization method introduced in the feature extraction stage suppresses the activation of background features and reduces the difficulty of fine-tuning.
After N iterations, the loss between the predicted values and the true values is computed, gradients are back-propagated, and the network parameters are updated; when the computed loss is larger than the threshold, the iteration count is reset and the network continues to update until the loss is no larger than the threshold, yielding the trained network model. The training process of the graph convolutional neural network is as follows:
S310: let G denote the complete graph, and let v_i and e_ij denote the i-th node feature in the node feature set and the edge feature between the i-th and j-th nodes in the edge feature set, respectively; the true value y_ij of each edge label is defined by the true values of the node features:

y_ij = 1 if y_i = y_j, and y_ij = 0 otherwise,

where y_i denotes the class label of the i-th node and y_j the class label of the j-th node.
Each edge feature is a two-dimensional feature vector e_ij ∈ [0,1]^2 representing the strengths of the normalized intra-class and inter-class relationships between the two connected nodes, so that intra-cluster similarity and inter-cluster dissimilarity can be fully exploited. The node features are initialized with the mid-level features of the recommended regions, where the mid-level features are the convolutional features output by the convolutional layer at the middle of the backbone network; each edge feature is initialized from its edge label in the following way:

e_ij^0 = [1, 0] if y_ij = 1, [0, 1] if y_ij = 0, and [0.5, 0.5] if y_ij is unknown,

where the first component represents the similarity relationship between the two nodes and the second the dissimilarity.
S320: the graph convolutional neural network is composed of L layers, and forward propagation consists of alternating edge-feature updates and node-feature updates. Given the node features and edge features of layer l-1, the node features are updated first by a neighborhood aggregation process: the features of the other nodes are aggregated in proportion to the edge features, and the aggregated feature is converted by the feature transformation network, composed of a multi-layer perceptron (prior art), to give the node features of the layer:

v_i^(l) = f_v^(l)( [ Σ_j ẽ_ij,1^(l-1) v_j^(l-1) ; Σ_j ẽ_ij,2^(l-1) v_j^(l-1) ] ; θ_v^(l) ),

where f_v^(l) denotes the feature transformation network, ẽ_ij,1^(l-1) and ẽ_ij,2^(l-1) denote the normalized similarity and dissimilarity relationships between nodes i and j at layer l-1, v_j^(l-1) denotes the node feature of node j at layer l-1, and θ_v^(l) denotes the parameters of the layer-l feature transformation network.
This update considers not only intra-class aggregation but also inter-class aggregation, making full use of both the dissimilar-neighbor information and the similar-neighbor information provided for the target node.
S330: the edge-feature update is based on the updated node features: the node similarity score between every pair of nodes is obtained anew, and each edge feature is updated by combining the previous edge value with the updated node similarity score:

e_ij,1^(l) ∝ f_e^(l)(v_i^(l), v_j^(l); θ_e^(l)) · e_ij,1^(l-1),   e_ij,2^(l) ∝ (1 − f_e^(l)(v_i^(l), v_j^(l); θ_e^(l))) · e_ij,2^(l-1),

each normalized over all nodes k. Here f_e^(l) is the metric network that computes the similarity score, so node features flow into the edge features, and each element of an edge feature is updated separately from the normalized intra-class similarity and inter-class dissimilarity; that is, each edge-feature update considers not only the relationship of the corresponding node pair but also the relationships of the other node pairs. Two separate metric networks may optionally be used to compute the similarity and dissimilarity of node pairs. θ_e^(l) denotes the parameters of the metric network, e_ij,1^(l-1) and e_ij,2^(l-1) denote the similarity and dissimilarity scores of nodes i and j at layer l-1, and v_k^(l) denotes the layer-l node feature of the k-th node.
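A minimal sketch of the edge update described above, with the learned metric network replaced by an assumed Gaussian kernel on feature distance; the row normalization mirrors the "normalized over all node pairs" behavior in the text.

```python
import numpy as np

def update_edges(v, e_prev):
    """v: (N, D) updated node features; e_prev: (N, N) previous similarity
    edges. Re-scores each pair, multiplies by the previous edge value, and
    renormalizes each row."""
    d2 = ((v[:, None, :] - v[None, :, :]) ** 2).sum(-1)   # pairwise squared dist
    sim = np.exp(-d2)                                     # assumed metric network
    e = sim * e_prev                                      # combine with old edge
    return e / np.clip(e.sum(1, keepdims=True), 1e-8, None)
```

Because the new score multiplies the previous edge value before normalization, edge information accumulates across layers rather than being recomputed from scratch.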
S340: edge prediction labels are finally obtained from the edge features, i.e. ŷ_ij = e_ij,1^(L), which can be regarded as the probability that the two nodes V_i and V_j come from the same class. Each node V_i can be classified by simple weighted voting over the edge features connecting it to the support nodes of known class information added when constructing the complete graph: the edge features between the query node and the support nodes belonging to a class are summed, a softmax yields the normalized probability that the query node belongs to that class, and the class with the highest probability among all classes gives the final class label. The edge-label prediction probability is defined as:

P(y_i = C_k | T) = softmax( Σ_{j: y_j = C_k} ŷ_ij ),

where C_k denotes the k-th class, T denotes the classification task for the given complete graph, and P(y_i = C_k | T) denotes the probability that the i-th node belongs to the k-th class.
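The weighted voting above can be sketched as follows; the function and variable names are illustrative assumptions.

```python
import numpy as np

def classify_query(edge_to_support, support_labels, classes):
    """edge_to_support: similarity edges from one query node to each support
    node; support_labels: class of each support node. Sums the edges per class
    and takes a softmax over the class scores."""
    scores = np.array([sum(e for e, y in zip(edge_to_support, support_labels)
                           if y == c) for c in classes], dtype=float)
    p = np.exp(scores - scores.max())   # numerically stable softmax
    p /= p.sum()
    return classes[int(np.argmax(p))], p
```

The class receiving the largest total edge mass from the query node wins the vote, exactly the "simple weighted voting" of S340.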
The loss function in the training process of the convolutional neural network is as follows: in the training process of the S300 convolutional neural network, the node features and the edge features are trained as parameters by minimizing the loss function represented by the following formula:
wherein Y_{m,e} represents the true values of all edge labels, and the remaining term represents the layer-l predictions of all edge labels under the mth task. The edge loss L_e is defined as a binary cross-entropy loss. Edge predictions can thus be obtained not only from the last layer but also from the other layers, so the total loss is the sum of the losses computed at all layers, which improves the gradient flow in the lower layers of the network.
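The layer-summed binary cross-entropy just described can be sketched as follows (function names are illustrative; a real implementation would use an autodiff framework rather than numpy):

```python
import numpy as np

def edge_bce(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted edge similarities and
    ground-truth edge labels, averaged over all edges."""
    pred = np.clip(pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def total_edge_loss(layer_preds, target):
    """Total training loss of S300: the edge BCE is computed at every
    layer's edge predictions, not only the last, and summed so that
    gradients also flow directly into the lower layers."""
    return sum(edge_bce(p, target) for p in layer_preds)
```

Summing over layers means a poorly predicting intermediate layer contributes its own gradient signal instead of relying on backpropagation through the layers above it.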
S400: the features of the target to be detected and all candidate regions to be detected are obtained from the image to be detected by the method of S100; the complete graph to be detected is obtained by the method of S200 and input into the network model trained in S300, which outputs all node features and corresponding edge features of the graph to be detected. The final edge feature between two nodes represents the probability that they belong to the same category. For each query node, the edge features between it and the support nodes of each category are summed and a softmax is applied to obtain the probability of that category; the category with the highest probability is the category corresponding to the target to be detected in the candidate recommended region. The region recommendation results of S100 are then integrated to obtain the final detection result.
In the invention, the region recommendation network performs bounding-box regression in the SSD manner; the multi-scale convolution can locate target objects of different sizes. Under the limited-sample condition, where the amount of data is small and target objects of sufficiently varied sizes are lacking, the adopted method can still effectively obtain bounding boxes for target objects of different sizes in the scene. The subsequent binary classification of target versus non-target further improves the accuracy of the candidate boxes by removing candidate boxes that obviously do not contain the target object, so the localization accuracy under the limited-sample condition is improved overall. The classification stage of generic object detectors uses only the convolutional features of the corresponding target bounding box, whereas the graph structure designed here exploits not only the convolutional features of each candidate box but also the category relationships between candidate boxes. The graph edges encode both the similarity and the dissimilarity of the two connected bounding boxes; in a manner similar to an attention mechanism, this exploits both inter-class and intra-class aggregation and makes full use of the similar-neighbor and dissimilar-neighbor information provided by the nodes. When the edge features are updated, the node features flow into the edge features at the same time. By making full use of the available information under the limited-sample condition, the classification accuracy of the model on target regions can be greatly improved, and the whole framework achieves better target detection capability with limited samples.
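The graph construction described above (support nodes with known labels, query nodes from candidate boxes, edges encoding same-category probability) can be sketched as follows. The initialization of query-node edges to an uninformative 0.5 is an assumption of this sketch; the patent only specifies 1/0 edges between support nodes of the same/different categories:

```python
import numpy as np

def init_graph(support_feats, support_labels, query_feats):
    """Build the initial complete graph of S200.

    support_feats: (S, D) features of the per-category support targets.
    support_labels: length-S integer category of each support node.
    query_feats: (Q, D) features of the candidate recommendation regions.
    Returns (node features, edge features), where edges[i, j] is the
    2-D (similarity, dissimilarity) feature of the edge between nodes
    i and j.
    """
    nodes = np.concatenate([support_feats, query_feats], axis=0)
    n, s = nodes.shape[0], len(support_labels)
    # Edges touching a query node start uninformative (assumption).
    edges = np.full((n, n, 2), 0.5)
    # Support-support edges: 1 for same category, 0 otherwise (per S200).
    for i in range(s):
        for j in range(s):
            same = 1.0 if support_labels[i] == support_labels[j] else 0.0
            edges[i, j, 0] = same
            edges[i, j, 1] = 1.0 - same
    return nodes, edges
```

The GNN of S300 then refines these initial edges so that, at the last layer, each query-support edge approximates the probability that the candidate box belongs to that support category.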
Claims (5)
1. A method for detecting a target under a limited sample, characterized by comprising the following steps:
S100: inputting the labeled samples of all the new classes into a backbone neural network, and extracting the features of the detection target;
inputting the labeled samples of all the new classes into a region recommendation network, wherein the region recommendation network adopts the bounding-box regression part of the SSD and appends to it a binary classification task of target versus non-target, and a plurality of candidate recommendation regions are obtained for each labeled sample through the region recommendation network;
performing the binary classification processing on all bounding boxes of the labeled samples to remove bounding boxes that obviously do not contain the detection target, so as to obtain all candidate regions of each labeled sample;
S200: constructing all candidate regions of each labeled sample and the corresponding detection target features obtained in step S100 into a complete graph, wherein each node in the complete graph represents the feature corresponding to one recommended region, and each edge represents the probability that the two connected nodes belong to the same category;
S300: using the complete graph obtained in step S200 as the input of the convolutional neural network, wherein the node features of the complete graph are formed by the convolutional features of the candidate regions extracted by the backbone network of step S100, and the category relationships among the candidate recommendation regions serve as the edge features of the complete graph;
obtaining the predicted values of the node features and the edge features through multiple rounds of node feature updating and edge feature updating;
after N iterations, computing the loss from the predicted values and the true values, backpropagating the gradients, and updating the network parameters; when the computed loss is greater than a threshold value, resetting the iteration count and continuing to update the network until the loss is not greater than the threshold value, thereby obtaining a trained network model;
S400: obtaining the features of the target to be detected and all candidate regions to be detected from the image to be detected by the method of S100, obtaining the complete graph to be detected by the method of S200, inputting it into the network model trained in S300, and outputting all node features and corresponding edge features of the graph to be detected; for each query node, summing the edge features between it and the support nodes of each category and applying a softmax to obtain the probability that the query node belongs to that category, wherein the category with the highest probability is the category corresponding to the target to be detected in the candidate recommended region.
2. The method for detecting a target under a limited sample according to claim 1, wherein the region recommendation network in S100 is trained by the following method:
S110: training the region recommendation network on a base-class data set in an episodic learning manner;
S120: training the region recommendation network trained on the base-class data set in step S110 again on a new-class data set with a small amount of labeled data;
the loss function L_total used in the training processes of S110 and S120 is:

L_total = L_main + λ_BD · L_BD

wherein L2 regularization is used to penalize the activation of F_BD:

L_BD = ||F_BD||_2

L_main = L_reg + L_cls

wherein L_BD represents the background suppression regularization, F_BD represents the feature region corresponding to the background of the image, L_reg represents the regression loss of the target bounding box, L_cls represents the binary classification loss of target versus non-target, and λ_BD represents the weighting coefficient of the background suppression regularization.
3. The method for detecting a target under a limited sample according to claim 1 or 2, wherein the method in S200 for constructing all candidate regions of each labeled sample and the corresponding detection target features into a complete graph comprises: taking the target feature of each new category as a support node, taking the feature of the detection target corresponding to each candidate recommendation region obtained through the region recommendation network as a query node, and determining the edge features between nodes according to their categories, wherein the edge feature value between support nodes belonging to the same category is 1, and the edge feature value between support nodes not belonging to the same category is 0.
4. The method for detecting a target under a limited sample according to claim 3, wherein the training process of the convolutional neural network in S300 is as follows:
S310: let G denote the complete graph, let v_i and e_ij respectively denote the ith node feature in the node feature set and the edge feature between the ith node and the jth node in the edge feature set, and let the true value y_ij of each edge label be defined by the ground-truth node labels (y_ij = 1 if y_i = y_j, and y_ij = 0 otherwise):
wherein y_i represents the category label of the ith node, and y_j represents the category label of the jth node;
each edge feature is a two-dimensional feature vector e_ij ∈ [0,1]^2; the node features are initialized with the mid-level features of the recommended regions, and each edge feature is initialized from the edge label in the following manner:
wherein the term above represents the similarity relationship between the two nodes.
S320: the convolutional neural network is composed of L layers, forward propagation is composed of alternate edge feature update and node feature update, and the node features of L-1 layers are givenAnd edge characteristicsFirstly, updating node characteristics according to a field aggregation process, performing characteristic conversion on the obtained aggregated characteristics by aggregating the characteristics of other nodes and edge characteristics in proportion, and updating the node characteristics of the layer;
wherein the terms denote, respectively: the feature transformation network; the similarity relationship and the dissimilarity relationship between nodes i and j at layer l-1; the node feature of node j at layer l-1; and the parameters of the feature transformation network of layer l;
S330: updating the edge features based on the updated node features: the node similarity score between each pair of nodes is recomputed, and each edge feature is updated by combining its previous value with the updated node similarity score;
wherein a metric network is used to compute the similarity score, and the terms denote, respectively: the parameters of the metric network used to compute the similarity score; the similarity score of the ith node and the jth node at layer l-1; the dissimilarity score of the ith node and the jth node at layer l-1; and the node feature of the kth node at layer l;
S340: obtaining the edge prediction labels finally from the edge features, i.e. the edge feature can be regarded as the probability that two nodes come from the same category; each node V_i can be classified by simple weighted voting over the edge features connecting it to the support nodes of known category information added when the complete graph is constructed, wherein the simple weighted voting sums the edge features between the query node and the support nodes belonging to a given category, then applies a softmax to obtain the normalized probability that the query node belongs to that category, and the category with the maximum probability among all categories gives the final category label; the edge label prediction probability is defined as:
5. The method for detecting a target under a limited sample according to claim 4, wherein the loss function in the training process of the convolutional neural network in S300 is:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011219061.9A CN112364747B (en) | 2020-11-04 | 2020-11-04 | Target detection method under limited sample |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112364747A true CN112364747A (en) | 2021-02-12 |
CN112364747B CN112364747B (en) | 2024-02-27 |
Family
ID=74514257
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364747B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109934261A (en) * | 2019-01-31 | 2019-06-25 | 中山大学 | A kind of Knowledge driving parameter transformation model and its few sample learning method |
CN110097079A (en) * | 2019-03-29 | 2019-08-06 | 浙江工业大学 | A kind of privacy of user guard method based on classification boundaries |
CA3061717A1 (en) * | 2018-11-16 | 2020-05-16 | Royal Bank Of Canada | System and method for a convolutional neural network for multi-label classification with partial annotations |
CN111274981A (en) * | 2020-02-03 | 2020-06-12 | 中国人民解放军国防科技大学 | Target detection network construction method and device and target detection method |
CN111738318A (en) * | 2020-06-11 | 2020-10-02 | 大连理工大学 | Super-large image classification method based on graph neural network |
CN111860588A (en) * | 2020-06-12 | 2020-10-30 | 华为技术有限公司 | Training method for graph neural network and related equipment |
Non-Patent Citations (9)
Title |
---|
CHEN H等: "LSTD: A Low-Shot Transfer Detector for Object Detection", 《ARXIV:1803.01529V1》, pages 1 - 8 * |
KIM J等: "Edge-labeling graph neural network for few-shot learning", 《PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》, pages 11 - 20 * |
KIPF T N等: "Semi-supervised classification with graph convolutional networks", 《ARXIV:1609.02907V4 》, pages 1 - 5 * |
LIU W等: "Ssd: Single shot multibox detector", 《COMPUTER VISION–ECCV 2016: 14TH EUROPEAN CONFERENCE, AMSTERDAM》, pages 21 - 37 * |
MA C等: "ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks", 《PATTERN RECOGNITION》, vol. 111, pages 1 - 13 * |
YAN C等: "Semantics-preserving graph propagation for zero-shot object detection", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》, vol. 29, pages 8163 - 8176, XP011803387, DOI: 10.1109/TIP.2020.3011807 * |
WU Guojuan: "An Analysis of Few-Shot Learning", 《Fujian Quality Management》, pages 222 *
JIAN Yi et al.: "Face Recognition Algorithm Based on Genetically Optimized GRNN Neural Network", 《Journal of Ordnance Equipment Engineering》, vol. 39, no. 2, pages 131 - 135 *
HUANG Dan et al.: "Research on Object Detection Method Based on Graph Convolutional Neural Network with Limited Samples", 《Journal of Chongqing University of Technology (Natural Science)》, pages 1 - 10 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111804A (en) * | 2021-04-16 | 2021-07-13 | 北京房江湖科技有限公司 | Face detection method and device, electronic equipment and storage medium |
CN113111804B (en) * | 2021-04-16 | 2024-06-04 | 贝壳找房(北京)科技有限公司 | Face detection method and device, electronic equipment and storage medium |
CN113283514A (en) * | 2021-05-31 | 2021-08-20 | 高新兴科技集团股份有限公司 | Unknown class classification method, device and medium based on deep learning |
CN113283514B (en) * | 2021-05-31 | 2024-05-21 | 高新兴科技集团股份有限公司 | Unknown class classification method, device and medium based on deep learning |
CN114627437A (en) * | 2022-05-16 | 2022-06-14 | 科大天工智能装备技术(天津)有限公司 | Traffic target identification method and system |
CN114627437B (en) * | 2022-05-16 | 2022-08-05 | 科大天工智能装备技术(天津)有限公司 | Traffic target identification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||