CN112364747A - Target detection method under limited sample


Info

Publication number
CN112364747A
CN112364747A
Authority
CN
China
Prior art keywords
node
edge
network
representing
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011219061.9A
Other languages
Chinese (zh)
Other versions
CN112364747B (en)
Inventor
黄丹
冯欣
陈志�
吴浩铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing High Tech Zone Feima Innovation Research Institute
Original Assignee
Chongqing High Tech Zone Feima Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing High Tech Zone Feima Innovation Research Institute filed Critical Chongqing High Tech Zone Feima Innovation Research Institute
Priority to CN202011219061.9A priority Critical patent/CN112364747B/en
Publication of CN112364747A publication Critical patent/CN112364747A/en
Application granted granted Critical
Publication of CN112364747B publication Critical patent/CN112364747B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a target detection method under limited samples. First, picture samples from the new classes are input into a backbone network to extract detection-target features and regress bounding boxes; each candidate bounding box then passes through a binary classification task that decides whether a target object is present, boxes that clearly contain no detection target are removed, and the remaining boxes are screened by classification score to obtain the candidate recommendation regions of the sample. Second, the convolutional features corresponding to the candidate regions are assembled into a fully connected graph, and a trained graph convolutional neural network processes this graph structure to obtain a class label for each candidate region. The method is general within the few-sample field and has wide potential application.

Description

Target detection method under limited sample
Technical Field
The invention relates to the field of image detection and computation, and in particular to a target detection method under limited samples.
Background
In the past few years, deep learning algorithms based on convolutional neural networks have achieved remarkable performance in the field of target detection, and their success depends on large target detection data sets with complete and accurate bounding box annotations. In practical applications, data with complete annotation labels may be limited for a given target detection task. When data are scarce, a convolutional neural network overfits severely and fails to generalize, which limits the capability of the detector. In contrast, humans exhibit a powerful ability at this task: children can learn to identify new categories from a few pictures. For data such as medical images and endangered animals, examples are lacking or complete and accurate data are difficult to obtain, so computer vision needs the ability to learn to detect objects from a small number of samples.
Since, in the real world, target objects differ greatly in illumination, shape, texture, etc., few-sample detection is challenging. Current research on few-sample learning has made some progress, but these methods focus on image classification and rarely address the target detection problem. For few-sample detection, the core problem is how to locate the target object in a cluttered background from a small number of samples. In this task, our goal is to tackle the problem of few-sample object detection, as shown in fig. 1: given some base classes that have enough samples with label annotations and new classes that have only a small amount of labeled data, the objective is to obtain a model that can detect both the new classes and the base classes. To date, few such methods are available. Recently, meta-learning has provided a reliable solution to similar problems, such as the few-sample classification problem. However, target detection is much more difficult: it involves not only classification prediction of the target but also localization of the target, so existing few-sample classification methods cannot be directly applied to the few-sample detection problem. Taking the matching network and the prototype network as examples, it is unclear how to construct the target prototype for matching and localization, since an image may contain scattered objects of irrelevant classes, or no target object at all.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is: how to rapidly and accurately detect targets under the condition of few samples.
In order to solve the above technical problems, the invention adopts the following technical scheme: a target detection method under a limited sample comprises the following steps:
S100: inputting the labeled samples in all the new classes into a backbone neural network, and extracting the features of the detection target;
inputting the labeled samples in all the new classes into a regional recommendation network, wherein the regional recommendation network adopts the bounding box regression part of the SSD followed by a binary classification task of whether a target object is present, and a plurality of candidate recommendation regions are obtained for each labeled sample through the regional recommendation network;
performing the binary classification on all bounding boxes of the labeled samples to remove the bounding boxes that clearly do not include the detection target, thereby obtaining all candidate regions of each labeled sample;
s200: constructing each candidate region with a label sample and the corresponding detection target feature obtained in the step S100 into a complete graph, wherein each node in the complete graph represents the feature corresponding to each recommended region, and each edge represents the probability that two connected nodes belong to the same class;
s300: then the complete graph obtained in the step S200 is used as the input of the convolutional neural network, the node features of the complete graph are formed by the convolutional features of the trunk network in the step S100 corresponding to the candidate regions, and the class relationship among the candidate recommendation regions is used as the edge features of the complete graph;
obtaining the predicted values of the node characteristics and the edge characteristics through multiple times of node characteristic updating and edge characteristic updating;
after N iterations, the predicted value and the true value are used for calculating loss, gradient feedback is carried out, network parameters are updated, when the calculated loss is larger than a threshold value, the iteration times are reset, the network is continuously updated until the loss is not larger than the threshold value, and a trained network model is obtained;
S400: using the image to be detected, obtaining the features of the target to be detected and all candidate regions to be detected by the S100 method, and obtaining the complete graph to be detected by the S200 method; inputting this complete graph into the network model trained by the S300 method and outputting all node features and corresponding edge features; for each query node and the support nodes of each category, performing softmax on the edge features between the query node and all support nodes to obtain the probability that the query node belongs to the category, and taking the category with the highest probability as the category corresponding to the target to be detected in the candidate recommendation region to be detected.
Preferably, the area recommendation network in S100 is trained by the following method:
S110: training the regional recommendation network on a base-class data set in an episodic (scene) learning mode;
s120: the area recommendation network trained on the base class data set in the step S110 is trained again on a new class data set with a small amount of labeled data;
a loss function L_total used in the training processes of S110 and S120 is:

L_total = L_main + λ_BD · L_BD

where L2 regularization is used to penalize the activation of F_BD:

L_BD = ||F_BD||_2

L_main = L_reg + L_cls

said L_BD representing the background suppression regularization, F_BD representing the feature region corresponding to the image background, said L_reg representing the regression loss of the target bounding box, L_cls representing the binary classification loss of whether a target object is present, and λ_BD representing the weighting coefficient of the background suppression regularization.
Preferably, in S200 the method for processing all candidate regions of each labeled sample and the corresponding detection target features into a complete graph comprises: taking the target feature of each category of the new classes as a support node, taking the feature of the detection target corresponding to a candidate recommendation region obtained through the regional recommendation network as a query node, and determining the edge features between nodes according to the categories of the nodes, wherein the edge feature value between support nodes belonging to the same category is 1, and the edge feature value between support nodes not belonging to the same category is 0.
Preferably, the training process of the S300 convolutional neural network is as follows:
S310: let G denote the complete graph, and let v_i and e_ij respectively denote the i-th node feature in the node feature set and the edge feature between the i-th node and the j-th node in the edge feature set; the true value y_ij of each edge label is defined by the true values of the node labels:

y_ij = 1 if y_i = y_j, and y_ij = 0 otherwise

wherein y_i represents the category label of the i-th node and y_j represents the category label of the j-th node;

each edge feature is a two-dimensional feature vector e_ij = [e_ij1, e_ij2] ∈ [0,1]^2; the node features are initialized with the mid-level features of the recommended regions, and each edge feature is initialized from the edge labels in the following way:

e_ij^0 = [1, 0] if y_ij = 1, e_ij^0 = [0, 1] if y_ij = 0, and e_ij^0 = [0.5, 0.5] if y_ij is unknown

wherein e_ij1 represents the similarity relationship between the two nodes and e_ij2 the dissimilarity relationship.
S320: the convolutional neural network is composed of L layers, forward propagation is composed of alternate edge feature update and node feature update, and the node features of L-1 layers are given
Figure BDA0002761441190000034
And edge characteristics
Figure BDA0002761441190000035
Firstly, updating node characteristics according to a field aggregation process, performing characteristic conversion on the obtained aggregated characteristics by aggregating the characteristics of other nodes and edge characteristics in proportion, and updating the node characteristics of the layer;
edge characteristics of l-1 layer
Figure BDA0002761441190000036
Degree coefficient as corresponding node:
Figure BDA0002761441190000041
wherein the content of the first and second substances,
Figure BDA0002761441190000042
Figure BDA0002761441190000043
a representation of a feature transformation network is shown,
Figure BDA0002761441190000044
and
Figure BDA0002761441190000045
respectively representing similarity relation and dissimilarity relation between the l-1 level nodes i and j,
Figure BDA0002761441190000046
representing the node characteristics of level l-1 node j,
Figure BDA0002761441190000047
a parameter representing a feature transformation network of layer l;
S330: the edge feature update is based on the updated node features; the node similarity scores between every pair of nodes are obtained anew, and each edge feature is updated by combining the previous edge feature value with the updated node similarity score:

ē_ij1^l = f_e^l(v_i^l, v_j^l; θ_e^l)

e_ij1^l = ē_ij1^l·e_ij1^(l-1) / Σ_k ē_ik1^l·e_ik1^(l-1)

e_ij2^l = (1 - ē_ij1^l)·e_ij2^(l-1) / Σ_k (1 - ē_ik1^l)·e_ik2^(l-1)

wherein f_e^l is the metric network that computes the similarity score, θ_e^l represents the parameters of the metric network used to compute the similarity score, e_ij1^(l-1) represents the similarity score of the i-th node and the j-th node at layer l-1, e_ij2^(l-1) represents the dissimilarity score of the i-th node and the j-th node at layer l-1, and v_k^l represents the node feature of the k-th node at layer l;
S340: the edge prediction labels are finally obtained from the edge features, i.e. ŷ_ij = e_ij1^L; each node V_i can be classified by simple weighted voting over the edge features related to the support nodes of known category information added when constructing the complete graph; the simple weighted voting sums the edge features between the query node and the support nodes belonging to a category, obtains the normalized probability that the query node belongs to that category through softmax, and selects the category with the highest probability among all categories to obtain the final category label; the edge label prediction probability is defined as:

P(y_i = C_k | T) = softmax(Σ_{j≠i} δ(y_j = C_k)·ŷ_ij)

wherein C_k represents the k-th category, T represents the classification task for the given complete graph, δ(y_j = C_k) is 1 when node j belongs to category C_k and 0 otherwise, and P(y_i = C_k | T) represents the probability that the i-th node belongs to the k-th category.
Preferably, the loss function in the training process of the S300 convolutional neural network is:

L = Σ_{m=1}^{M} Σ_{l=1}^{L} λ_l·L_e(Y_{m,e}, Ŷ_{m,e}^l)

wherein Y_{m,e} represents the true values corresponding to all edge labels, Ŷ_{m,e}^l represents the predicted values at layer l of the network under the m-th task for all edge labels, and λ_l is the loss weight of layer l.
Compared with the prior art, the invention has at least the following advantages:
in the invention, a new target detector with less samples based on graph convolution is provided to solve the target detection problem under the condition of less samples. Firstly, the advantages of a traditional target detection framework SSD are fully utilized, background suppression regularization is introduced, and fine adjustment difficulty of few-sample detection is reduced. Secondly, a complete graph is constructed for the proposed candidate region, and the data of the graph structure is processed in a graph convolution mode to obtain a final detection result. And a scene learning mode is adopted on the two types of data sets, so that a few-sample learning task is simulated, and the few-sample learning capability of the model is fully improved. In the work that follows, the correctness and rationality of the proposed method will be demonstrated by more detailed, more thorough experiments.
Drawings
Fig. 1 shows target detection in the case of a small number of samples.
FIG. 2 is an overall block diagram of the method of the present invention.
Fig. 3 is a detector based on graph convolution.
Fig. 4 is a network structure of a node feature transformation network and a node similarity metric.
Detailed Description
The method of the present invention is described in further detail below with reference to the accompanying drawings.
Given a support image S with target objects and a query image Q that may contain target objects, the task is to find all target objects in the query image that belong to the support categories and mark them with tight bounding boxes. If the support set contains N categories, each of which contains K instances, such a problem is referred to as N-way K-shot detection.
A few-sample target detection setting is defined in which two types of data are available for training, namely the base classes and the new classes. For the base classes, a large amount of annotated data is available, while the new classes provide only a few labeled examples. Our goal is to learn to detect new objects by exploiting knowledge in the base classes, while remaining able to detect both the base classes and the new classes.
Such a few-sample target detection setup is useful because it fits practical situations where one may wish to deploy a pre-trained detector for a new class with only a few labeled examples. More specifically, large-scale target detection data sets (e.g., PASCAL VOC, MSCOCO) can be used to pre-train the detection model. However, the number of target object classes they cover is quite limited, especially compared with the huge number of object classes in the real world. Therefore, it is imperative to solve the problem of target detection with few samples.
Example: referring to figs. 2 to 4, a target detection method under a limited sample comprises the following steps:
S100: input the labeled samples in all the new classes into a backbone neural network; the backbone adopts the classical classification network VGG16 with the final fully connected layers removed, and extracts the features of the input image, which mainly comprise contour features, texture features, and color features.
Input the labeled samples in all the new classes into the regional recommendation network, which adopts the bounding box regression part of the SSD (single shot multibox detector) followed by a binary classification task of whether a target object is present; it is trained with the training method provided below by the invention, and a plurality of bounding box recommendation regions are obtained for each labeled sample.
The regional recommendation network is trained by the following method (a minimal episode-sampling sketch is given after step S120):
S110: train the regional recommendation network on the base-class data set in an episodic (scene) learning mode. Episodic training belongs to the prior art; it simulates few-sample learning tasks, which reduces the difficulty of fine-tuning and improves the few-sample learning ability.
S120: training the area recommendation network trained on the base class data set by the S110 on a new class data set with a small amount of labeled data; the training of this step is actually fine tuning.
A loss function L used in the training processes of S110 and S120totalComprises the following steps:
Ltotal=LmainBDLBD
where L2 regularization is used to penalize FBDActivation of (2):
LBD=||FBD||2
Lmain=Lreg+Lcls
said LBDRepresenting background suppression regularization, FBDRepresentation and imageCharacteristic region corresponding to background, LregRepresents the regression loss, L, of the target bounding boxclsRepresenting a loss of binary class, λ, of a non-target objectBDRepresenting the weighting coefficients of the background suppression regularization.
In the training stage of the regional recommendation network base class, the loss function mainly comprises two parts, wherein one part is regression loss of a target boundary box and two classification losses of whether a target object exists or not:
Lmain=Lreg+Lcls
the regression loss of the bounding box adopts the same loss function in the SSD, the common classification loss, namely the binary cross entropy loss is adopted for the target object, and the sum of the two parts is used as the loss function in the training stage of the regional recommendation network.
In order to further enhance the detection capability of few samples in a new class, a new regularization mode is adopted, and the background is inhibited and regularized LBDThe regularization method is adopted for training, so that the interference of complex background information on the positioning performance can be reduced. Background suppression (BD) regularization is performed by using knowledge of objects on the new class, i.e., the true bounding box in the training image. Specifically, for training images in the new class, we first generate a convolved feature cube from the middle convolutional layer of the backbone network. Then, I mask this convolution cube with the real bounding box of all objects in the image. Thus, we can identify the feature region corresponding to the image background, i.e., FBD. To suppress background interference, we penalize F using L2 regularizationBDActivation of (2):
LBD=||FBD||2
by means of the background suppression regularization, the model can focus more on the region corresponding to the target object while suppressing the background region, which is particularly important for few-sample learning. The total loss function of the area recommendation network new class training stage is as follows:
Ltotal=LmainBDLBD
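As a hedged illustration of background suppression, the sketch below computes L_BD by projecting the true bounding boxes onto the feature grid and taking the L2 norm of the remaining background activations; the tensor shapes, the nearest-pixel box projection, and all names are assumptions of this sketch.

    import torch

    def background_suppression_loss(feat, boxes, img_size):
        """L_BD = ||F_BD||_2: L2 norm of activations outside all true boxes.

        feat:     (C, H, W) mid-level convolutional feature cube
        boxes:    ground-truth boxes (x1, y1, x2, y2) in image coordinates
        img_size: (img_h, img_w), used to project boxes onto the feature grid
        """
        _, fh, fw = feat.shape
        img_h, img_w = img_size
        mask = torch.ones(fh, fw)                     # 1 = background cell
        for x1, y1, x2, y2 in boxes:                  # zero out object regions
            fx1, fy1 = int(x1 / img_w * fw), int(y1 / img_h * fh)
            fx2, fy2 = int(x2 / img_w * fw) + 1, int(y2 / img_h * fh) + 1
            mask[fy1:fy2, fx1:fx2] = 0.0
        f_bd = feat * mask                            # background-only activations
        return f_bd.norm(p=2)

    # total fine-tuning loss, with l_reg, l_cls and lambda_bd assumed given:
    # loss = l_reg + l_cls + lambda_bd * background_suppression_loss(feat, boxes, (h, w))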
Perform the binary classification (the binary classification processing in the invention is prior art) on all bounding boxes of each labeled sample and remove the bounding boxes that clearly do not include the detection target, obtaining all candidate regions of each labeled sample.
S200: and constructing each candidate region with the label sample and the corresponding detection target feature obtained in the step S100 into a complete graph, wherein each node in the complete graph represents the feature corresponding to each recommended region, and each edge represents the probability that two connected nodes belong to the same class.
The method comprises the following steps of processing all candidate areas and corresponding detection target features of each labeled sample into a complete graph, taking the target features of each category of a new category as support nodes, taking the features of the detection target corresponding to a candidate recommendation area obtained through a regional recommendation network as query nodes, determining edge features between the nodes according to the categories, setting edge feature values between the support nodes belonging to the same category as 1, setting edge feature values between the support nodes not belonging to the same category as 0, initializing edges between the query nodes and the support nodes as 0.5.
S300: then the complete graph obtained in the step S200 is used as the input of the convolutional neural network, the node features of the complete graph are formed by the convolutional features of the trunk network in the step S100 corresponding to the candidate regions, and the class relationship among the candidate recommendation regions is used as the edge features of the complete graph; intra-cluster similarity and inter-cluster variability are directly exploited.
Obtaining the predicted values of the node characteristics and the edge characteristics through multiple times of node characteristic updating and edge characteristic updating; each updating is to update the node features and the edge features in the complete graph, so that a new complete graph can be formed, and the edge features between the query node and the support nodes in the complete graph updated each time represent the probability that the query node and the support nodes belong to the same class;
when fine tuning is performed on new data, a new regularization method is introduced in the feature extraction stage, activation of background features is inhibited, and the difficulty of fine tuning is reduced.
After N iterations, the predicted value and the true value are used for calculating loss, gradient feedback is carried out, network parameters are updated, when the calculated loss is larger than a threshold value, the iteration times are reset, the network is continuously updated until the loss is not larger than the threshold value, and a trained network model is obtained; the training process of the convolutional neural network is as follows:
S310: let G denote the complete graph, and let v_i and e_ij respectively denote the i-th node feature in the node feature set and the edge feature between the i-th node and the j-th node in the edge feature set; the true value y_ij of each edge label is defined by the true values of the node labels:

y_ij = 1 if y_i = y_j, and y_ij = 0 otherwise

wherein y_i represents the category label of the i-th node and y_j represents the category label of the j-th node.

Each edge feature is a two-dimensional feature vector e_ij = [e_ij1, e_ij2] ∈ [0,1]^2 expressing the strength of the normalized intra-class and inter-class relationship between the two connected nodes, so that intra-cluster similarity and inter-cluster dissimilarity can be fully exploited. The node features are initialized with the mid-level features of the recommended regions, where the mid-level features are the convolutional features output by a convolutional layer in the middle of the backbone network; each edge feature is initialized from the edge labels in the following way:

e_ij^0 = [1, 0] if y_ij = 1, e_ij^0 = [0, 1] if y_ij = 0, and e_ij^0 = [0.5, 0.5] if y_ij is unknown

wherein e_ij1 represents the similarity relationship between the two nodes and e_ij2 the dissimilarity relationship;
S320: the convolutional neural network is composed of L layers, and forward propagation consists of alternating edge feature updates and node feature updates; given the node features v_i^(l-1) and edge features e_ij^(l-1) of layer l-1, the node features are first updated through a neighbourhood aggregation procedure: the features of the other nodes are aggregated in proportion to the edge features, the aggregated features are passed through a feature transformation network, and the node features of the current layer are updated; the feature transformation network is composed of a multi-layer perceptron and belongs to the prior art.

The edge features e_ij^(l-1) of layer l-1 serve as the degree coefficients of the corresponding nodes:

v_i^l = f_v^l([Σ_j ẽ_ij1^(l-1)·v_j^(l-1) ; Σ_j ẽ_ij2^(l-1)·v_j^(l-1)]; θ_v^l)

wherein ẽ_ijd^(l-1) = e_ijd^(l-1) / Σ_k e_ikd^(l-1) (d = 1, 2) are the normalized edge features, f_v^l represents the feature transformation network, e_ij1^(l-1) and e_ij2^(l-1) respectively represent the similarity relation and dissimilarity relation between nodes i and j at layer l-1, v_j^(l-1) represents the node feature of node j at layer l-1, and θ_v^l represents the parameters of the feature transformation network of layer l;
the method not only considers intra-class aggregation but also considers inter-class aggregation, and makes full use of the dissimilarity neighbor information and the similar neighbor information provided by the target node.
S330: the edge feature updating is based on the updated node features, the node similarity scores between any pair of nodes are obtained again, and each edge feature is updated by combining the previous edge feature value and the updated node similarity score;
Figure BDA0002761441190000091
Figure BDA0002761441190000092
Figure BDA0002761441190000093
wherein the content of the first and second substances,
Figure BDA0002761441190000094
to measure the network, a similarity score is calculated so that the node features flow into the edge features and each element of the edge features is updated separately from each normalized intra-class similarity and inter-class dissimilarity. That is, each edge feature update takes into account not only the relationship of the corresponding node pair, but also the relationship of other node pairs. We can choose to use two separate measurement networks to compute the similarity or dissimilarity of node pairs.
Figure BDA0002761441190000095
A parameter representing a metric network used to compute the similarity score,
Figure BDA0002761441190000096
representing the similarity scores of the ith node and the jth node at the l-1 level,
Figure BDA0002761441190000097
representing the dissimilarity score of the ith node and the jth node at the l-1 level,
Figure BDA0002761441190000098
representing the node characteristics at level i for the kth node.
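Correspondingly, a sketch of the edge update: a small metric network re-scores every node pair from the updated node features, and the scores are combined with the previous edge values and renormalized over the neighbours; feeding the absolute feature difference to the metric network is an assumption of this sketch.

    import torch
    import torch.nn as nn

    class EdgeUpdate(nn.Module):
        """One edge update: recompute pairwise similarity (f_e^l), then blend
        with the previous edge features and renormalize."""
        def __init__(self, dim):
            super().__init__()
            self.metric = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                        nn.Linear(dim, 1), nn.Sigmoid())

        def forward(self, v, e):
            # v: (N, D) updated node features, e: (N, N, 2) edges from layer l-1
            diff = (v.unsqueeze(1) - v.unsqueeze(0)).abs()  # pairwise |v_i - v_j|
            score = self.metric(diff).squeeze(-1)           # similarity in (0, 1)
            sim = score * e[..., 0]
            dis = (1.0 - score) * e[..., 1]
            sim = sim / sim.sum(dim=1, keepdim=True)        # renormalize over neighbours
            dis = dis / dis.sum(dim=1, keepdim=True)
            return torch.stack([sim, dis], dim=-1)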
S340: edge prediction labels are ultimately obtained from edge features, i.e.
Figure BDA0002761441190000099
Can be considered as two nodes ViAnd VjProbabilities from the same category. Each node ViThe classification can be carried out by simply weighting voting on the edge features related to the support nodes of the known category information added when the complete graph is constructed, the simple weighting voting is to sum the edge features of the support nodes belonging to the category and the query node, then a softmax is carried out to obtain the normalized probability of the query node belonging to the category, the category with the highest probability is selected from all the categories to obtain the final category label; the edge label prediction probability is defined as:
Figure BDA00027614411900000910
Figure BDA00027614411900000911
wherein, CkRepresenting the kth class, T representing the classification task for a given full graph,
Figure BDA00027614411900000912
representing the probability that the ith node belongs to the kth class.
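The weighted voting could look as follows, assuming the support nodes occupy the first indices of the node ordering and `edges` holds the final-layer edge features; all identifiers are assumptions of this sketch.

    import torch

    def classify_queries(edges, support_labels, n_classes):
        """Sum each query's similarity edges to the supports of every class,
        then softmax over classes (simple weighted voting)."""
        n_s = len(support_labels)
        sim = edges[n_s:, :n_s, 0]                  # (n_query, n_support) = y_hat_ij
        votes = torch.zeros(sim.size(0), n_classes)
        for j, label in enumerate(support_labels):
            votes[:, label] += sim[:, j]            # per-class vote sums
        probs = votes.softmax(dim=1)                # P(y_i = C_k | T)
        return probs.argmax(dim=1), probs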
The loss function in the training process of the convolutional neural network is as follows: in the training process of the S300 convolutional neural network, the node features and the edge features are obtained by training as parameters through a loss function represented by the following minimization formula:
Figure BDA00027614411900000913
wherein, Ym,eRepresenting the true values corresponding to all edge labels,
Figure BDA00027614411900000914
indicating the predicted value of the l layer of the network under the m task of all the edge labels. Edge loss LeDefined as a binary cross entropy loss. This makes it possible to obtain not only edge predictors from the last layer but also from other layers, so the total loss is the sum of all losses calculated in all layers to improve the gradient flow in the lower layers of the network.
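A sketch of this layer-weighted edge loss, assuming the per-layer edge predictions are collected during the forward pass and the layer weights λ_l are given hyperparameters:

    import torch.nn.functional as F

    def edge_loss(edge_preds_per_layer, y_true, layer_weights):
        """Weighted sum of binary cross-entropy on the similarity edge
        (e[..., 0]) at every layer, improving low-layer gradient flow.
        y_true: (N, N) matrix with y_ij = 1 iff nodes i, j share a class."""
        total = 0.0
        for w, e in zip(layer_weights, edge_preds_per_layer):
            total = total + w * F.binary_cross_entropy(e[..., 0], y_true)
        return total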
S400: the method comprises the steps of obtaining features of a target to be detected and all candidate areas to be detected by using an image to be detected through the S100 method, obtaining a complete image to be detected through the S200 method, inputting the complete image to be detected into a network model trained through the S300, outputting all node features and corresponding edge features of the image to be detected, enabling the final edge features between nodes to represent the probability that two nodes belong to the same category, adding the edge features of each query node and the support node of each category, and then conducting softmax to obtain the probability of the category, wherein the category with the largest probability is the category corresponding to the target to be detected in the candidate recommended area to be detected. And integrating the region recommendation result in the S100 to obtain a final detection result.
In the invention, the regional recommendation network performs bounding box regression in the SSD manner, and its multi-scale convolution can localize target objects of different sizes; under limited samples, where the data volume is small and sufficiently many target objects of various sizes are lacking, the adopted method can still effectively obtain bounding boxes for target objects of different sizes in a scene. The subsequent binary classification of whether a target is present further improves the accuracy of the candidate boxes by removing those that clearly contain no target object, so the localization accuracy of target objects under limited samples is improved as a whole. Generic target detectors classify a detected target using only the convolutional features of the corresponding target bounding box, whereas the graph-structure design can exploit not only the convolutional features of each candidate box but also the category relationships between candidate boxes. The graph edges comprise the similarity and dissimilarity relations of the two connected bounding boxes; this attention-like mechanism exploits both inter-class and intra-class aggregation and makes full use of the similar and dissimilar neighbour information provided by the nodes. When edge features are updated, node features flow into the edge features simultaneously. The available information is thus fully exploited under limited samples, which can greatly improve the classification accuracy of the model on target regions, and the whole framework exhibits better target detection capability under limited samples.

Claims (5)

1. A method for detecting a target under a limited sample is characterized by comprising the following steps:
s100: inputting the labeled samples in all the new classes into a backbone neural network, and extracting the characteristics of the detection target;
inputting the labeled samples in all the new classes into a regional recommendation network, wherein the regional recommendation network adopts the bounding box regression part of the SSD followed by a binary classification task of whether a target object is present, and a plurality of candidate recommendation regions are obtained for each labeled sample through the regional recommendation network;
performing the binary classification on all bounding boxes of the labeled samples to remove the bounding boxes that clearly do not include the detection target, thereby obtaining all candidate regions of each labeled sample;
s200: constructing each candidate region with a label sample and the corresponding detection target feature obtained in the step S100 into a complete graph, wherein each node in the complete graph represents the feature corresponding to each recommended region, and each edge represents the probability that two connected nodes belong to the same class;
s300: then the complete graph obtained in the step S200 is used as the input of the convolutional neural network, the node features of the complete graph are formed by the convolutional features of the trunk network in the step S100 corresponding to the candidate regions, and the class relationship among the candidate recommendation regions is used as the edge features of the complete graph;
obtaining the predicted values of the node characteristics and the edge characteristics through multiple times of node characteristic updating and edge characteristic updating;
after N iterations, the predicted value and the true value are used for calculating loss, gradient feedback is carried out, network parameters are updated, when the calculated loss is larger than a threshold value, the iteration times are reset, the network is continuously updated until the loss is not larger than the threshold value, and a trained network model is obtained;
S400: using the image to be detected, obtaining the features of the target to be detected and all candidate regions to be detected by the S100 method, and obtaining the complete graph to be detected by the S200 method; inputting this complete graph into the network model trained by the S300 method and outputting all node features and corresponding edge features; for each query node and the support nodes of each category, performing softmax on the edge features between the query node and all support nodes to obtain the probability that the query node belongs to the category, the category with the highest probability being the category corresponding to the target to be detected in the candidate recommendation region to be detected.
2. The method for detecting objects under a limited sample of claim 1, wherein: the area recommendation network in S100 is trained by the following method:
S110: training the regional recommendation network on a base-class data set in an episodic (scene) learning mode;
s120: the area recommendation network trained on the base class data set in the step S110 is trained again on a new class data set with a small amount of labeled data;
a loss function L_total used in the training processes of S110 and S120 is:

L_total = L_main + λ_BD · L_BD

where L2 regularization is used to penalize the activation of F_BD:

L_BD = ||F_BD||_2

L_main = L_reg + L_cls

said L_BD representing the background suppression regularization, F_BD representing the feature region corresponding to the image background, said L_reg representing the regression loss of the target bounding box, L_cls representing the binary classification loss of whether a target object is present, and λ_BD representing the weighting coefficient of the background suppression regularization.
3. The method for detecting objects under limited samples according to claim 1 or 2, wherein the method for processing all candidate regions and corresponding detection target features of each labeled sample into a complete graph in S200 comprises: taking the target feature of each category of the new classes as a support node, taking the feature of the detection target corresponding to a candidate recommendation region obtained through the regional recommendation network as a query node, and determining the edge features between nodes according to the categories of the nodes, wherein the edge feature value between support nodes belonging to the same category is 1, and the edge feature value between support nodes not belonging to the same category is 0.
4. The method for detecting the target under the limited sample according to claim 3, wherein the training process of the S300 convolutional neural network is as follows:
S310: let G denote the complete graph, and let v_i and e_ij respectively denote the i-th node feature in the node feature set and the edge feature between the i-th node and the j-th node in the edge feature set; the true value y_ij of each edge label is defined by the true values of the node labels:

y_ij = 1 if y_i = y_j, and y_ij = 0 otherwise

wherein y_i represents the category label of the i-th node and y_j represents the category label of the j-th node;

each edge feature is a two-dimensional feature vector e_ij = [e_ij1, e_ij2] ∈ [0,1]^2; the node features are initialized with the mid-level features of the recommended regions, and each edge feature is initialized from the edge labels in the following way:

e_ij^0 = [1, 0] if y_ij = 1, e_ij^0 = [0, 1] if y_ij = 0, and e_ij^0 = [0.5, 0.5] if y_ij is unknown

wherein e_ij1 represents the similarity relationship between the two nodes and e_ij2 the dissimilarity relationship.
S320: the convolutional neural network is composed of L layers, forward propagation is composed of alternate edge feature update and node feature update, and the node features of L-1 layers are given
Figure FDA0002761441180000024
And edge characteristics
Figure FDA0002761441180000025
Firstly, updating node characteristics according to a field aggregation process, performing characteristic conversion on the obtained aggregated characteristics by aggregating the characteristics of other nodes and edge characteristics in proportion, and updating the node characteristics of the layer;
edge characteristics of l-1 layer
Figure FDA0002761441180000031
Degree coefficient as corresponding node:
Figure FDA0002761441180000032
wherein the content of the first and second substances,
Figure FDA0002761441180000033
Figure FDA0002761441180000034
a representation of a feature transformation network is shown,
Figure FDA0002761441180000035
and
Figure FDA0002761441180000036
respectively representing similarity relation and dissimilarity relation between the l-1 level nodes i and j,
Figure FDA0002761441180000037
representing the node characteristics of level l-1 node j,
Figure FDA0002761441180000038
a parameter representing a feature transformation network of layer l;
S330: the edge feature update is based on the updated node features; the node similarity scores between every pair of nodes are obtained anew, and each edge feature is updated by combining the previous edge feature value with the updated node similarity score:

ē_ij1^l = f_e^l(v_i^l, v_j^l; θ_e^l)

e_ij1^l = ē_ij1^l·e_ij1^(l-1) / Σ_k ē_ik1^l·e_ik1^(l-1)

e_ij2^l = (1 - ē_ij1^l)·e_ij2^(l-1) / Σ_k (1 - ē_ik1^l)·e_ik2^(l-1)

wherein f_e^l is the metric network that computes the similarity score, θ_e^l represents the parameters of the metric network used to compute the similarity score, e_ij1^(l-1) represents the similarity score of the i-th node and the j-th node at layer l-1, e_ij2^(l-1) represents the dissimilarity score of the i-th node and the j-th node at layer l-1, and v_k^l represents the node feature of the k-th node at layer l;
S340: the edge prediction labels are finally obtained from the edge features, i.e. ŷ_ij = e_ij1^L; each node V_i can be classified by simple weighted voting over the edge features related to the support nodes of known category information added when constructing the complete graph; the simple weighted voting sums the edge features between the query node and the support nodes belonging to a category, then obtains the normalized probability that the query node belongs to that category through softmax, and the category with the highest probability among all categories is selected to obtain the final category label; the edge label prediction probability is defined as:

P(y_i = C_k | T) = softmax(Σ_{j≠i} δ(y_j = C_k)·ŷ_ij)

wherein C_k represents the k-th category, T represents the classification task for the given complete graph, δ(y_j = C_k) is 1 when node j belongs to category C_k and 0 otherwise, and P(y_i = C_k | T) represents the probability that the i-th node belongs to the k-th category.
5. The method for detecting the target under the limited sample as set forth in claim 4, wherein the loss function in the training process of the S300 convolutional neural network is:

L = Σ_{m=1}^{M} Σ_{l=1}^{L} λ_l·L_e(Y_{m,e}, Ŷ_{m,e}^l)

wherein Y_{m,e} represents the true values corresponding to all edge labels, Ŷ_{m,e}^l represents the predicted values at layer l of the network under the m-th task for all edge labels, and λ_l is the loss weight of layer l.
CN202011219061.9A 2020-11-04 2020-11-04 Target detection method under limited sample Active CN112364747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011219061.9A CN112364747B (en) 2020-11-04 2020-11-04 Target detection method under limited sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011219061.9A CN112364747B (en) 2020-11-04 2020-11-04 Target detection method under limited sample

Publications (2)

Publication Number Publication Date
CN112364747A (en) 2021-02-12
CN112364747B CN112364747B (en) 2024-02-27

Family

ID=74514257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011219061.9A Active CN112364747B (en) 2020-11-04 2020-11-04 Target detection method under limited sample

Country Status (1)

Country Link
CN (1) CN112364747B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3061717A1 (en) * 2018-11-16 2020-05-16 Royal Bank Of Canada System and method for a convolutional neural network for multi-label classification with partial annotations
CN109934261A (en) * 2019-01-31 2019-06-25 中山大学 A kind of Knowledge driving parameter transformation model and its few sample learning method
CN110097079A (en) * 2019-03-29 2019-08-06 浙江工业大学 A kind of privacy of user guard method based on classification boundaries
CN111274981A (en) * 2020-02-03 2020-06-12 中国人民解放军国防科技大学 Target detection network construction method and device and target detection method
CN111738318A (en) * 2020-06-11 2020-10-02 大连理工大学 Super-large image classification method based on graph neural network
CN111860588A (en) * 2020-06-12 2020-10-30 华为技术有限公司 Training method for graph neural network and related equipment

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
CHEN H et al.: "LSTD: A Low-Shot Transfer Detector for Object Detection", arXiv:1803.01529v1, pages 1-8
KIM J et al.: "Edge-labeling graph neural network for few-shot learning", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11-20
KIPF T N et al.: "Semi-supervised classification with graph convolutional networks", arXiv:1609.02907v4, pages 1-5
LIU W et al.: "SSD: Single shot multibox detector", Computer Vision - ECCV 2016: 14th European Conference, Amsterdam, pages 21-37
MA C et al.: "ReLaText: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks", Pattern Recognition, vol. 111, pages 1-13
YAN C et al.: "Semantics-preserving graph propagation for zero-shot object detection", IEEE Transactions on Image Processing, vol. 29, pages 8163-8176, XP011803387, DOI: 10.1109/TIP.2020.3011807
WU Guojuan (吴国娟): "An analysis of few-shot learning", Fujian Quality Management, page 222
JIAN Yi (简毅) et al.: "Face recognition algorithm based on genetically optimized GRNN neural network", Journal of Ordnance Equipment Engineering, vol. 39, no. 2, pages 131-135
HUANG Dan (黄丹) et al.: "Research on target detection methods based on graph convolutional neural networks under limited samples", Journal of Chongqing University of Technology (Natural Science), pages 1-10

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111804A (en) * 2021-04-16 2021-07-13 北京房江湖科技有限公司 Face detection method and device, electronic equipment and storage medium
CN113111804B (en) * 2021-04-16 2024-06-04 贝壳找房(北京)科技有限公司 Face detection method and device, electronic equipment and storage medium
CN113283514A (en) * 2021-05-31 2021-08-20 高新兴科技集团股份有限公司 Unknown class classification method, device and medium based on deep learning
CN113283514B (en) * 2021-05-31 2024-05-21 高新兴科技集团股份有限公司 Unknown class classification method, device and medium based on deep learning
CN114627437A (en) * 2022-05-16 2022-06-14 科大天工智能装备技术(天津)有限公司 Traffic target identification method and system
CN114627437B (en) * 2022-05-16 2022-08-05 科大天工智能装备技术(天津)有限公司 Traffic target identification method and system

Also Published As

Publication number Publication date
CN112364747B (en) 2024-02-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant