CN113159116A - Small sample image target detection method based on class interval balance - Google Patents

Small sample image target detection method based on class interval balance

Info

Publication number
CN113159116A
CN113159116A
Authority
CN
China
Prior art keywords
class
small sample
training
data
sample image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110262177.9A
Other languages
Chinese (zh)
Inventor
Ye Qixiang (叶齐祥)
Li Bohao (李博豪)
Jiao Jianbin (焦建彬)
Yang Boyu (杨博宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chinese Academy of Sciences
Original Assignee
University of Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chinese Academy of Sciences filed Critical University of Chinese Academy of Sciences
Priority to CN202110262177.9A priority Critical patent/CN113159116A/en
Publication of CN113159116A publication Critical patent/CN113159116A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample image target detection method based on class margin equilibrium, which comprises a training stage and a testing stage, wherein the training stage comprises the following steps: step 1, training a network model by using base class data; and step 2, carrying out fine-tuning training on the network model by using a combination of novel class data and base class data. The disclosed method does not require large amounts of data annotation, which reduces manual annotation cost; it reduces aliasing between prototype vectors of different classes and improves the target detection accuracy of the neural network on query images. The method is of significance for small sample learning, incremental learning and the like, and has application value for target detection in natural scene images and other fields.

Description

Small sample image target detection method based on class interval balance
Technical Field
The invention belongs to the technical field of small sample learning and computer vision, and particularly relates to a small sample image target detection method based on class margin equilibrium.
Background
In recent years, visual object detection has made great progress, mainly due to the availability of large-scale datasets with precise annotations and of convolutional neural networks (CNNs) capable of absorbing that annotation information. However, annotating a large number of objects is expensive and laborious, and it is inconsistent with cognitive learning, which can build accurate models with very little supervision.
Small sample object detection, which mimics the way humans learn, has attracted increasing attention. Given base classes with sufficient training data and novel classes with only a few supervised samples, small sample target detection trains a model to detect objects of both the base classes and the novel classes. To this end, most works divide the training process into two phases: base class training (representation learning) and novel class reconstruction (meta-training). In representation learning, sufficient base class training data are used to train the network and construct a representative feature space; in meta-training, the network is fine-tuned so that novel class objects can be represented within that feature space.
While small sample target detection methods have made significant advances, previous work has ignored the inherent contradiction between representation and classification. Namely: to separate classes and reduce aliasing between them, the distributions of base classes need to be far from each other (maximum margin); but to represent novel classes accurately, the distributions of base classes should be close to each other (minimum margin), which makes classification difficult.
Therefore, how to optimize the representation and classification of novel classes in the same feature space is a problem that needs to be solved.
Disclosure of Invention
In order to overcome the above problems, the present inventors have conducted intensive studies to design a small sample image object detection method based on class margin equilibrium, which constructs a feature space during base class training and maximizes the margin between novel classes by introducing a class margin loss; during novel class reconstruction, a feature disturbance module is introduced by truncating the gradient map; through multiple training iterations, feature reconstruction and object classification are simultaneously achieved in the same feature space by equalizing the class margins in an adversarial min-max manner, thereby completing the present invention.
Specifically, the present invention aims to provide the following:
in a first aspect, a small sample image target detection method based on class margin equilibrium is provided; the method includes a training phase and a testing phase, and the training phase includes the following steps:
step 1, training a network model by using base class data;
and step 2, carrying out fine-tuning training on the network model by using the combination of the novel class data and the base class data.
In a second aspect, a computer-readable storage medium is provided, which stores a small sample image target detection training program based on class margin equilibrium; when executed by a processor, the program causes the processor to execute the steps of the small sample image target detection method based on class margin equilibrium according to the first aspect.
In a third aspect, a computer device is provided, which includes a memory and a processor; the memory stores a small sample image target detection training program based on class margin equilibrium, and the program, when executed by the processor, causes the processor to execute the steps of the small sample image target detection method based on class margin equilibrium of the first aspect.
The invention has the advantages that:
(1) the invention provides a small sample image target detection method based on class margin equilibrium, which reveals the contradiction between feature representation and classification hidden in small sample target detection, provides a feasible method for alleviating this contradiction from the perspective of class margin equilibrium (CME), and realizes feature reconstruction and target classification in the same feature space;
(2) the small sample image target detection method based on class margin equilibrium provided by the invention proposes a max-margin loss and a feature disturbance module, realizes class margin equilibrium in an adversarial min-max manner, reduces aliasing between prototype vectors of different classes, and improves the target detection accuracy of the neural network on the query image;
(3) the small sample image target detection method based on class margin equilibrium converts the small sample target detection problem into a small sample classification problem by filtering the localization features out of the prototype vector, thereby reducing manual annotation cost; it is of great significance for small sample learning, incremental learning and the like, and has application value for target detection in natural scene images and other fields.
Drawings
FIG. 1 illustrates a flow diagram for training a network model using a large amount of base class data in accordance with a preferred embodiment of the present invention;
FIG. 2 illustrates a fine tuning training flow diagram in accordance with a preferred embodiment of the present invention;
FIG. 3 illustrates t-SNE visualizations of the prototype vectors generated by the baseline method and by the method of the invention in the examples, wherein the novel classes are shown in bold;
FIG. 4 shows the t-SNE visualization of the evolution of the prototype vector features of two classes in the fine-tuning phase in an example, where the dashed line represents the feature disturbance path (forward propagation) and the solid line segment represents the path of the max-margin loss (backward propagation);
FIG. 5 is a comparison of the detection results of the baseline method and the method of the present invention, wherein green boxes are correct detections, blue boxes are detections missed relative to the ground truth, and red boxes are erroneous detections.
Detailed Description
The present invention will be described in further detail below with reference to preferred embodiments and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The inventors found that, in the small sample image target detection method, a feature space can be constructed during base class training with the margin between novel classes maximized by introducing a class margin loss, and a feature disturbance module can be introduced during novel class reconstruction. The feature space partition for small sample target detection is optimized through adversarial class margin regularization, so that feature reconstruction and target classification are achieved in the same feature space.
Based on the above, the first aspect of the present invention provides a small sample image target detection method based on class margin equilibrium (CME); the method includes a training phase and a testing phase, and the training phase includes the following steps, as shown in FIGS. 1 and 2:
step 1, training a network model by using base class data;
and step 2, carrying out fine-tuning training on the network model by using the combination of the novel class data and the base class data.
In the invention, the training phase trains the neural network by detecting the targets of the query set using the support set, given a support set (support images) and a query set (query images) with the same categories.
The support set (support images) refers to images with semantic annotations, the query set (query images) refers to unannotated images to be detected, and the same category means the same semantic category, such as sheep, cattle and the like.
The steps involved in the training phase are described further below:
step 1, training a network model by using base class data.
As shown in FIG. 1, the network model is trained using a large amount of base class data.
Preferably, step 1 comprises the sub-steps of:
step 1-1, extracting a feature map of the query image.
According to a preferred embodiment of the invention, a convolutional neural network base network is used to extract a feature map from a query image.
The base network can be YOLO, Darknet-19, or the like, preferably Darknet-19. For example, when the base network for extracting the query image features is Darknet-19, the network input is a 3-channel image of size 416 × 416 and the network output is a 13 × 13 × 1024 feature map.
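For reference, the 416 → 13 spatial reduction corresponds to a 32× downsampling backbone. The following minimal PyTorch sketch is an illustrative stand-in with the same input/output shapes, not the actual Darknet-19 definition:

```python
import torch
import torch.nn as nn

# Five stride-2 stages give the 32x downsampling of a Darknet-19-style backbone,
# mapping a (3, 416, 416) query image to a (1024, 13, 13) feature map.
backbone = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                  nn.LeakyReLU(0.1))
    for c_in, c_out in [(3, 32), (32, 64), (64, 256), (256, 512), (512, 1024)]
])

feat = backbone(torch.randn(1, 3, 416, 416))
print(feat.shape)  # torch.Size([1, 1024, 13, 13])
```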
Step 1-2, extracting a feature map of the support image and its corresponding prototype vector.
According to a preferred embodiment of the invention, the feature map of the annotated support image is extracted using a lightweight convolutional neural network, and the prototype vector corresponding to the support image feature map is extracted using max pooling.
The lightweight convolutional neural network refers to a network whose computation is designed to be more efficient, reducing the number of network parameters without losing performance; it can be a Darknet network, a MobileNet network, or the like.
In a further preferred embodiment, the prototype vector corresponding to the feature map of the support image is obtained by the following formula:

$\mu_i^k = \mathrm{MaxPool}\big(f_{\theta_S}(I_S^{i,k} \oplus M_S^{i,k})\big)$

wherein $f_{\theta_S}(\cdot)$ is the lightweight convolutional neural network that extracts the features of the support image, $I_S^{i,k}$ is a support image, $M_S^{i,k}$ is the mask of the target in that image, and $\oplus$ is the embedding operation: $I_S$ and $M_S$ are embedded to generate a four-channel image to be input to the neural network.

The support data set S is composed of the pairs $\{(I_S^{i,k}, M_S^{i,k})\}$, wherein i is the index corresponding to the category, k is the index corresponding to the target within the category, $I_S^{i,k}$ is a three-channel image of size W × H, and $M_S^{i,k}$ is the corresponding single-channel mask map of size W × H; W and H are preferably 416.
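A minimal PyTorch sketch of this prototype extraction follows; it assumes that ⊕ concatenates the image and its mask into a four-channel input, and the `SupportCNN` below is an illustrative stand-in for the lightweight network $f_{\theta_S}$, not the patented architecture:

```python
import torch
import torch.nn as nn

class SupportCNN(nn.Module):
    """Illustrative stand-in for f_theta_S over the 4-channel (image + mask) input."""
    def __init__(self, out_dim=1024):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(256, out_dim, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.features(x)

def extract_prototype(f_theta_s, image, mask):
    """image: (B, 3, 416, 416); mask: (B, 1, 416, 416) binary object mask."""
    x = torch.cat([image, mask], dim=1)        # embedding I_S (+) M_S -> 4-channel input
    feat = f_theta_s(x)                        # (B, C, h, w) support feature map
    return feat.flatten(2).max(dim=2).values   # global max pooling -> (B, C) prototypes

# usage
f = SupportCNN()
img = torch.randn(2, 3, 416, 416)
msk = torch.randint(0, 2, (2, 1, 416, 416)).float()
mu = extract_prototype(f, img, msk)            # (2, 1024) prototype vectors
```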
Step 1-3, performing dimension reduction on the prototype vector to obtain the reduced prototype vector, and obtaining the loss value corresponding to the reduced prototype vector.
The inventors found through research that the prototype vector obtained in step 1-2 contains both the class features and the localization features of the target in the support image, which is the main cause of aliasing between prototype vectors of different classes. Therefore, according to a preferred embodiment of the invention, a fully connected convolution layer is used to reduce the dimension of the prototype vector, converting the target detection problem into a classification problem.
Preferably, the prototype vector is reduced in dimension by:

$\tilde{\mu}_i^k = f_{\mathrm{FC}}(\mu_i^k)$

wherein $f_{\mathrm{FC}}(\cdot)$ is a fully connected convolution layer.
In a further preferred embodiment, the loss value corresponding to the reduced prototype vector is obtained through the max-margin loss.
Preferably, the loss value corresponding to the reduced prototype vector is obtained by the following formula:

$\mathcal{L}_{margin} = \sum_i d_i^{intra} / d_i^{inter}$

wherein $\mathcal{L}_{margin}$ is the max-margin loss; $d_i^{intra}$ is the intra-class distance,

$d_i^{intra} = \frac{1}{K} \sum_{j=1}^{K} \|\tilde{\mu}_i^j - \mu_i'\|_2;$

$d_i^{inter}$ is the inter-class distance,

$d_i^{inter} = \min_{l \neq i} \|\mu_i' - \mu_l'\|_2;$

and $\mu_i'$ is the class-mean prototype vector,

$\mu_i' = \frac{1}{K} \sum_{j=1}^{K} \tilde{\mu}_i^j,$

wherein K is the number of prototype vectors corresponding to a class and j is the index of a prototype vector within the class.
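Under this reconstruction, the max-margin loss can be sketched as below. The Euclidean metric and the sum-of-ratios reduction are assumptions made for illustration, since the exact formula is rendered as an image in the original publication:

```python
import torch

def max_margin_loss(protos, eps=1e-8):
    """protos: (N, K, D) -- K reduced prototype vectors for each of N classes.
    Minimizing the ratio shrinks intra-class distances and grows inter-class ones."""
    class_means = protos.mean(dim=1)                        # mu'_i: (N, D)
    # intra-class distance: mean distance of each prototype to its class mean
    d_intra = (protos - class_means.unsqueeze(1)).norm(dim=2).mean(dim=1)  # (N,)
    # inter-class distance: distance of each class mean to the nearest other class mean
    pair = torch.cdist(class_means, class_means)            # (N, N) pairwise distances
    eye = torch.eye(pair.size(0), dtype=torch.bool, device=pair.device)
    d_inter = pair.masked_fill(eye, float("inf")).min(dim=1).values        # (N,)
    return (d_intra / (d_inter + eps)).sum()

loss = max_margin_loss(torch.randn(15, 5, 512))  # e.g. 15 base classes, K = 5
```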
In the present invention, it is preferable to decouple, by introducing the fully connected convolution layer, local features that may mislead the class margin in the feature space.
The method converts the small sample target detection problem into a small sample classification problem by filtering the localization features out of the prototype vector, which reduces manual annotation cost and is of significance for small sample learning, incremental learning and the like.
Step 1-4, performing feature activation on the query image to realize target detection on the query image.
Specifically, the prototype vector before dimension reduction is used to perform feature activation on the unannotated query image to be detected, realizing target detection on the query image.
Feature activation can be performed by methods commonly used in the prior art; preferably, the prototype vector before dimension reduction is multiplied channel by channel with the extracted features of the query image, so that the prototype vector and the features are fused to complete the feature activation.
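A minimal sketch of this channel-by-channel fusion (shapes follow the 13 × 13 × 1024 feature map above):

```python
import torch

def activate(query_feat, proto):
    """query_feat: (B, C, H, W) query feature map; proto: (C,) prototype vector.
    Channel-by-channel multiplication fuses the prototype into the query features."""
    return query_feat * proto.view(1, -1, 1, 1)

fq = torch.randn(1, 1024, 13, 13)
mu = torch.randn(1024)
activated = activate(fq, mu)  # (1, 1024, 13, 13), fed to the prediction head
```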
Step 1-5, updating the parameters of the network model.
According to a preferred embodiment of the present invention, the detection loss between the detection result and the annotation is calculated, the gradient of the loss function is computed, the error gradient is back-propagated through the network, and the network parameters are updated.
In a further preferred embodiment, the detection loss between the detection result and the annotation is obtained by the following formula:

$\mathcal{L}_{det} = \mathcal{L}\big(f_{\theta_P}(F_{\theta_Q}(I_Q) \otimes \mu_i),\ M_Q\big)$

wherein $F_{\theta_Q}(I_Q)$ represents the feature map extracted from the query image $I_Q$, $\theta_Q$ represents the network parameters used for query feature extraction, $\otimes$ denotes channel-by-channel multiplication, $f_{\theta_P}$ represents the prediction model, $\theta_P$ represents the parameters of the prediction model, $\mu_i$ represents the prototype vector corresponding to the support image, and $M_Q$ represents the annotation of the corresponding query picture.

In the invention, the output of the prediction model on the activated query feature map, $f_{\theta_P}(F_{\theta_Q}(I_Q) \otimes \mu_i)$, is the network's detection result for the query image.
In a still further preferred embodiment, the detection loss function is obtained by the following formula:

$\mathcal{L}_{det} = \mathcal{L}_{cls} + \mathcal{L}_{reg} + \mathcal{L}_{obj}$

wherein $\mathcal{L}_{cls}$ is the classification loss function, $\mathcal{L}_{reg}$ is the regression loss function, and $\mathcal{L}_{obj}$ is the anchor confidence loss function.
$\mathcal{L}_{cls} = \lambda_{class} \sum \big(\mathbb{1}_i^{obj} - \hat{p}_i\big)^2$

wherein $\mathbb{1}_i^{obj}$ is an indicator function determining whether the target in the current detection box is of class i, $\hat{p}_i$ is the predicted confidence for the corresponding class i, and $\lambda_{class}$ is the classification loss weight.
$\mathcal{L}_{reg} = \lambda_{coord} \sum_{cells} \sum_{n=1}^{l.n} \mathbb{1}^{obj}\big[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2\big]$

wherein $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i$ are the upper-left coordinates, width and height of the detection box predicted by the network, respectively; $x_i, y_i, w_i, h_i$ are the upper-left coordinates, width and height of the corresponding ground-truth box; $\lambda_{coord}$ is the regression loss weight; l.w and l.h are the width and height of the final output feature map, the outer sum running over its l.w × l.h cells; and l.n is the number of detection boxes corresponding to each cell of the final output feature map.
$\mathcal{L}_{obj} = \lambda_{noobj} \sum \mathbb{1}_i^{noobj}\big(C_i - \hat{C}_i\big)^2 + \lambda_{obj} \sum \mathbb{1}_i^{obj}\big(C_i - \hat{C}_i\big)^2$

wherein $\lambda_{noobj}$ is the weight in the anchor confidence loss for anchors that do not contain a target and $\lambda_{obj}$ is the weight for anchors that do; $\mathbb{1}_i^{noobj}$ is an indicator function determining whether the current detection box does not contain a target of class i; $C_i$ is the anchor confidence output by the network; and $\hat{C}_i$ is the true confidence of the anchor.
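As an illustration, the anchor confidence term, with separate weights for anchors that do and do not contain a target, can be sketched as follows; the weight values are placeholders, as the patent text does not publish them:

```python
import torch

def confidence_loss(pred_conf, true_conf, obj_mask,
                    lambda_obj=5.0, lambda_noobj=1.0):
    """pred_conf, true_conf: (B, A, H, W) predicted and target anchor confidences;
    obj_mask: boolean tensor marking anchors assigned to a ground-truth object."""
    sq_err = (pred_conf - true_conf) ** 2
    return (lambda_obj * sq_err[obj_mask].sum()
            + lambda_noobj * sq_err[~obj_mask].sum())
```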
Step 2, carrying out fine-tuning training on the network model by using the combination of the novel class data and the base class data.
As shown in FIG. 2, the network model is fine-tuned using a small amount of data containing the novel classes in combination with base class data.
Preferably, step 2 comprises the following sub-steps:
and 2-1, combining the novel class data and the base class data to form a new data set, then training a network model, and repeating the steps 1-1 to 1-5.
Because the novel class data and the base class data need to be balanced in quantity to obtain a better training effect, the combination of the novel class data and the base class data means: when there are only N novel class targets, the number of base class targets provided is also N.
Step 2-2, obtaining new training data.
Wherein, the step 2-2 comprises the following substeps:
and 2-2-1, obtaining a gradient map corresponding to the support image with the label according to the neural network back transmission.
According to a preferred embodiment of the present invention, the gradient map corresponding to the annotated support image is obtained by:

$G(x, y) = \partial \tilde{\mathcal{L}} / \partial I_S(x, y)$

wherein $\tilde{\mathcal{L}}$ represents the loss function in fine-tuning training, which includes the max-margin loss $\mathcal{L}_{margin}$; the gradient map G(x, y) has size W × H, the same as I(x, y).
Step 2-2-2, performing binarization processing on the gradient map and multiplying it by the mask map corresponding to the target in the support image to obtain a new mask map, which serves as new training data.
According to a preferred embodiment of the invention, the new mask map is obtained by:

$M_S'(x, y) = M_S(x, y) \cdot T(G(x, y))$

wherein $M_S(x, y)$ denotes the original mask map and $T(G(x, y))$ denotes the gradient map after binarization processing:

$T(G(x, y)) = \begin{cases} 0, & G(x, y) \geq \tau \\ 1, & \text{otherwise} \end{cases}$

where τ is a dynamic threshold, generally set as follows: the gradient map is divided into two parts, the region covered by the mask map and the region outside it; within the mask region, the mask values of the pixels whose gradient values rank in the top 15% of that region are set to 0 (originally 1), while the region outside the mask is left unchanged.
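A sketch of this gradient-truncation step, assuming G(x, y) is taken as the per-pixel gradient magnitude of the fine-tuning loss with respect to the support image:

```python
import torch

def disturbed_mask(loss, support_img, mask, top_frac=0.15):
    """loss: scalar fine-tuning loss that depends on support_img;
    support_img: (1, 3, H, W) tensor created with requires_grad=True;
    mask: (1, 1, H, W) binary object mask. Returns M_S' = M_S * T(G)."""
    grad, = torch.autograd.grad(loss, support_img, retain_graph=True)
    g = grad.abs().sum(dim=1, keepdim=True)   # per-pixel gradient magnitude G(x, y)
    g_in = g[mask.bool()]                     # gradient values inside the mask region
    k = max(1, int(top_frac * g_in.numel()))
    tau = g_in.topk(k).values.min()           # dynamic threshold: top 15% inside the mask
    keep = ~((g >= tau) & mask.bool())        # T(G): zero out the highest-gradient mask pixels
    return mask * keep.float()
```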
Step 2-3, training the network model with the new training data and repeating steps 1-1 to 1-5.
In the invention, during base class training, a feature space is constructed and the margin between novel classes is maximized by introducing the class margin loss; during novel class reconstruction, a feature disturbance module is introduced by truncating the gradient map. Through multiple training iterations, the class margins are equalized in an adversarial min-max manner, so that the class margins are regularized, and feature reconstruction and target classification are simultaneously realized in the same feature space.
According to a preferred embodiment of the present invention, the testing stage applies the trained network model to a small sample target detection task on the novel classes to verify the validity of the model.
In a further preferred embodiment, the data class of the testing phase is the same as the data class of the training phase, but the pictures are different.
In a still further preferred embodiment, the number of support images in each novel class of data (new category) is one (1-shot) or several (few-shot);
preferably, when there are multiple support images, the prototype vectors extracted for each class are averaged to obtain the corresponding class prototype vector for target detection.
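A minimal sketch of this prototype averaging:

```python
import torch

def class_prototype(shot_protos):
    """shot_protos: (K, D) prototypes extracted from the K support images of one
    class; the class prototype is their element-wise mean."""
    return shot_protos.mean(dim=0)

mu_class = class_prototype(torch.randn(5, 1024))  # 5-shot -> one (1024,) vector
```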
The small sample image target detection method based on class margin equilibrium establishes a feature disturbance module for the small number of annotated support images, so as to augment the data of the mask maps corresponding to the targets in the images; it extracts prototype vectors from the images through a convolutional neural network and uses the prototype vectors to perform feature activation on the unannotated query images to be detected, realizing target detection on the query images. The method does not require large amounts of data annotation, reducing manual annotation cost; it reduces aliasing between prototype vectors of different classes and improves the detection accuracy of the neural network on query images; and it is of significance for small sample learning, incremental learning and the like, and has application value for target detection in natural scene images and other fields.
The invention also provides a computer-readable storage medium storing a small sample image target detection training program based on class margin equilibrium; when executed by a processor, the program causes the processor to execute the steps of the small sample image target detection method based on class margin equilibrium.
The small sample image target detection method based on class margin equilibrium can be realized by means of software plus a necessary general hardware platform, wherein the software is stored in a computer-readable storage medium (including a ROM/RAM, a magnetic disk or an optical disk) and comprises instructions for enabling a terminal device (which can be a mobile phone, a computer, a server, a network device or the like) to execute the method.
The invention also provides a computer device, which comprises a memory and a processor; the memory stores a small sample image target detection training program based on class margin equilibrium, and the program, when executed by the processor, causes the processor to execute the steps of the small sample image target detection method based on class margin equilibrium.
Examples
The present invention is further described below by way of specific examples, which are merely exemplary and do not limit the scope of the present invention in any way.
Example 1
1. Data set
This example was evaluated on the commonly used Pascal VOC 2007, Pascal VOC 2012 and MS COCO datasets.
The object classes in the dataset are divided into two types: base classes with a large number of annotations and novel classes with K annotated instances (K-shot). When training the model with a large amount of base class data, the network is optimized using the base class training data; during fine-tuning, the network is optimized using K instances of each novel class and each base class.
For the Pascal VOC dataset, the whole dataset is divided into 3 groups for cross validation; in each group, 5 classes are selected as novel classes and the other 15 classes serve as base classes, and the number K of annotated instances is set to 1, 2, 3, 5 and 10. For the MS COCO dataset, 20 classes are selected as novel classes and the remaining 60 classes are set as base classes.
2. Task description
In the small sample target detection process, after network feature representation learning has been completed using the training set data, target detection on query images is realized using a small number of annotated images from the test set, i.e., the support set; after testing, performance is evaluated using mAP.
For mAP, see "Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740–755, 2014.".
Specifically, the training phase:
(1) Extracting the feature map of the query image: the base network for extracting the query image features is Darknet-19; the network input is a 3-channel image of size 416 × 416 and the network output is a 13 × 13 × 1024 feature map;
(2) Extracting the feature map of the support image and the corresponding prototype vector: the prototype vector of a support image is

$\mu_i^k = \mathrm{MaxPool}\big(f_{\theta_S}(I_S^{i,k} \oplus M_S^{i,k})\big)$

wherein $f_{\theta_S}(\cdot)$ is the lightweight convolutional neural network that extracts the features of the support image, $I_S^{i,k}$ is a support image and $M_S^{i,k}$ is the mask of the target in the image; the support data set S is composed of $\{(I_S^{i,k}, M_S^{i,k})\}$, wherein i is the index corresponding to the category, k is the index corresponding to the target within the category, $I_S^{i,k}$ is a three-channel image of size W × H, and $M_S^{i,k}$ is the corresponding single-channel mask map of size W × H; $\oplus$ is the embedding operation, embedding $I_S$ and $M_S$ to generate a four-channel image input to the neural network, with W and H both 416;
(3) Reducing the dimension of the prototype vector and obtaining the loss value after dimension reduction: the target detection problem is converted into a classification problem through

$\tilde{\mu}_i^k = f_{\mathrm{FC}}(\mu_i^k)$

wherein $f_{\mathrm{FC}}(\cdot)$ is a fully connected layer;

the max-margin loss is $\mathcal{L}_{margin} = \sum_i d_i^{intra} / d_i^{inter}$, wherein the class-mean prototype vector is $\mu_i' = \frac{1}{K}\sum_{j=1}^{K}\tilde{\mu}_i^j$, the intra-class distance is $d_i^{intra} = \frac{1}{K}\sum_{j=1}^{K}\|\tilde{\mu}_i^j - \mu_i'\|_2$, and the inter-class distance is $d_i^{inter} = \min_{l \neq i}\|\mu_i' - \mu_l'\|_2$; the max-margin loss can be calculated from the above equations.
(4) Performing feature activation on the query image to realize target detection on the query image.
(5) Calculating the detection loss between the detection result and the annotation, computing the gradient of the loss function, back-propagating the error gradient through the network, and updating the network parameters.
The detection loss between the detection result and the annotation is obtained by the following formula:

$\mathcal{L}_{det} = \mathcal{L}\big(f_{\theta_P}(F_{\theta_Q}(I_Q) \otimes \mu_i),\ M_Q\big)$

wherein $F_{\theta_Q}(I_Q)$ represents the feature map extracted from the query image $I_Q$, $\theta_Q$ represents the network parameters used for query feature extraction, $\otimes$ denotes channel-by-channel multiplication, $f_{\theta_P}$ represents the prediction model, $\theta_P$ represents the parameters of the prediction model, $\mu_i$ represents the prototype vector corresponding to the support image, and $M_Q$ represents the annotation of the corresponding query picture;

the detection loss function is

$\mathcal{L}_{det} = \mathcal{L}_{cls} + \mathcal{L}_{reg} + \mathcal{L}_{obj}$

wherein $\mathcal{L}_{cls}$ is the classification loss function, $\mathcal{L}_{reg}$ is the regression loss function, and $\mathcal{L}_{obj}$ is the anchor confidence loss function.
(6) Training the network model by combining the novel class data and the base class data, and repeating steps (1) to (5).
(7) Obtaining new training data:
The gradient map corresponding to the annotated support image is obtained by the following formula:

$G(x, y) = \partial \tilde{\mathcal{L}} / \partial I_S(x, y)$

wherein $\tilde{\mathcal{L}}$ represents the loss function in fine-tuning training, which includes the max-margin loss $\mathcal{L}_{margin}$; the gradient map G(x, y) has size W × H, the same as I(x, y);

the new mask map is $M_S'(x, y) = M_S(x, y) \cdot T(G(x, y))$, where $M_S(x, y)$ is the original mask map and

$T(G(x, y)) = \begin{cases} 0, & G(x, y) \geq \tau \\ 1, & \text{otherwise} \end{cases}$

where τ is a dynamic threshold, typically set as follows: the gradient map is divided into two parts, the mask region and the non-mask region; within the mask region, the mask values of the pixels whose gradient values rank in the top 15% are set to 0 (originally 1), while the non-mask region is left unchanged.
(8) Training the network model with the new training data, and repeating steps (1) to (5).
Testing stage: the number of support images in each novel class of data (new category) is one (1-shot) or several (few-shot); when there are multiple support images, the prototype vectors extracted for each class are averaged to obtain the corresponding class prototype vector for target detection.
3. Results and analysis
The invention uses Meta YOLO and MPSR as baselines and evaluates on the Pascal VOC and MS COCO datasets; the results are shown in Tables 1 and 2, respectively.
TABLE 1 Comparison of target detection test performance on the Pascal VOC dataset
[Table 1 content is rendered as an image in the original document.]
TABLE 2 Comparison of target detection test performance on the MS COCO dataset
[Table 2 content is rendered as an image in the original document.]
Here APS is the average precision for small targets, APM the average precision for medium targets, and APL the average precision for large targets; AR1 is the average recall with one proposal region per image, AR10 with ten proposal regions per image, and AR100 with one hundred proposal regions per image; ARS is the average recall for small targets, ARM for medium targets, and ARL for large targets.
Here LSTD, Meta YOLO, Meta R-CNN, MetaDet, TFA w/cos, Viewpoint and MPSR are state-of-the-art small sample target detection methods, and CME is the method provided by the invention.
LSTD is described in "Hao Chen, Yali Wang, Guoyou Wang, and Yu Qiao. LSTD: A low-shot transfer detector for object detection. In AAAI, pages 2836–2843, 2018.";
Meta YOLO is described in "Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, and Trevor Darrell. Few-shot object detection via feature reweighting. In IEEE ICCV, pages 8419–8428, 2019.";
MetaDet is described in "Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Meta-learning to detect rare objects. In IEEE ICCV, pages 9924–9933, 2019.";
Meta R-CNN is described in "Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, and Liang Lin. Meta R-CNN: Towards general solver for instance-level low-shot learning. In IEEE ICCV, pages 9576–9585, 2019.";
TFA w/cos is described in "Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, and Fisher Yu. Frustratingly simple few-shot object detection. CoRR, abs/2003.06957, 2020.";
Viewpoint is described in "Yang Xiao and Renaud Marlet. Few-shot object detection and viewpoint estimation for objects in the wild. In ECCV, 2020.";
MPSR is described in "Jiaxi Wu, Songtao Liu, Di Huang, and Yunhong Wang. Multi-scale positive sample refinement for few-shot object detection. In ECCV, pages 456–472, 2020.".
In Table 1, CME is compared on the Pascal VOC dataset with one-stage small sample target detectors based on the YOLO detector, including LSTD, Meta YOLO and MetaDet. As can be seen from Table 1, the CME detector proposed by the present invention shows a clear advantage over the other detectors. Specifically, for group 1 (Novel Set 1), CME achieves a 0.7% improvement at the 1-shot setting (17.8% versus 17.1%), a 7.0% improvement at the 2-shot setting (26.1% versus 19.1%), and a 2.6% improvement at the 3-shot setting (31.5% versus 28.9%). The improvement at the 5-shot setting is 9.8% (44.8% versus 35.0%), and the average improvement is 3.8%, a large gain for the small sample target detection domain. The average performance improvement for group 3 is also 0.7%.
In addition, Table 1 compares the proposed method with two-stage small sample target detectors, including MetaDet, Meta R-CNN, TFA, Viewpoint and MPSR, all of which are based on the Faster R-CNN framework. As can be seen from Table 1, CME performs better than detectors of the same type in most cases. For group 1, CME achieves a 5% improvement at the 2-shot setting (47.5% versus 42.5%) and a 6% improvement at the 5-shot setting (58.2% versus 52.2%). The average improvement reaches 1.2%. In addition, the average performance improvement is 1.5% for group 2 and 0.3% for group 3.
The MS COCO dataset has more object classes and images than the Pascal VOC dataset, which means that class margin equilibrium may benefit from richer feature representations. As shown in Table 2, the tests on the MS COCO dataset achieve a more significant relative improvement: at the 10-shot setting, CME improves AP by 5.3% over the baseline method MPSR, and at the 30-shot setting, AP improves by 2.8%.
Further, FIGS. 3 and 5 show comparisons of the target detection behavior of the method of the present invention and the baseline methods.
FIG. 3 shows the t-SNE visualization results of the prototype vectors generated by the method of the invention and by the baseline method, with the novel classes displayed in bold.
FIG. 4 shows a t-SNE visualization of the evolution of the prototype vector features of two classes during fine-tuning training. The dashed line represents the feature disturbance path (forward propagation) and the solid line segment represents the path of the max-margin loss (backward propagation). In the fine-tuning phase, the instance features of the novel classes form subspaces in the feature space learned on the base classes.
FIG. 5 shows the target detection results of the method of the present invention and the baseline methods (Meta YOLO and MPSR), where red boxes indicate erroneous detections, blue boxes indicate missed detections, and green boxes indicate correct detections.
As can be seen from FIG. 5, the method of the present invention (CME) can accurately detect more objects with fewer erroneous detection results.
With the max-margin loss, the small sample object detector can reduce erroneous detection results, because the increased distance between classes helps the classifier distinguish them and obtain correct classification results. However, the max-margin loss alone does not facilitate feature reconstruction, as the number of missed targets increases. By introducing class margin equilibrium, the method of the invention balances the contradiction between classification and representation.
Further, ablation experiments were performed for each module of CME on the novel classes of Pascal VOC split 1, with the results shown in Table 3:
TABLE 3
[Table 3 content is rendered as an image in the original document.]
Here MM is Max-Margin (MM), FF is Feature Filtering (FF) (i.e., the prototype vector dimension reduction in step (3)), and FD is Feature Disturbance (FD); "√" indicates that the module is included; Avg Imp represents the average improvement in accuracy of the method of the invention over the 1–10 shot settings.
As can be seen from Table 3, all three modules improve the accuracy of the model's target detection in the method described in this embodiment.
Comparative experimental results for the number of output channels of the feature filtering module on the novel classes of Pascal VOC split 1 are shown in Table 4:
TABLE 4
[Table 4 content is rendered as an image in the original document.]
As can be seen from Table 4, the model performance improvement is highest when the number of channels of the prototype vector is reduced from 1024 dimensions to 512 dimensions.
The results of experiments on the novel classes of Pascal VOC split 1 comparing the classes on which the feature disturbance module acts are shown in Table 5:
TABLE 5
[Table 5 content is rendered as an image in the original document.]
Here C_Novel denotes applying the module to the novel classes and C_Base denotes applying it to the base classes.
As can be seen from Table 5, performance improves most when the feature disturbance module acts only on the base classes.
The results of comparison experiments with different disturbance strategies of the feature disturbance module on the novel classes of Pascal VOC split 1 are shown in Table 6:
TABLE 6
[Table 6 content is rendered as an image in the original document.]
Here Random denotes truncation at randomly selected sample points, Random crop denotes truncation of a randomly selected image region, the feature-value strategy truncates according to the feature map values, and Gradient denotes truncation according to the gradient map values.
As can be seen from Table 6, gradient truncation as the disturbance mode of the feature disturbance module yields the highest performance improvement for the model.
The invention has been described in detail with reference to preferred embodiments and illustrative examples, but the description is not to be construed as limiting the invention. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the technical solution of the invention and its embodiments without departing from the spirit and scope of the invention, all of which fall within the scope of the invention.

Claims (10)

1. A small sample image target detection method based on class margin equilibrium, characterized by comprising a training phase and a testing phase, wherein the training phase comprises the following steps:
step 1, training a network model by using base class data;
and step 2, carrying out fine-tuning training on the network model by using the combination of the novel class data and the base class data.
2. The small sample image target detection method based on class margin equilibrium according to claim 1, wherein step 1 comprises the following substeps:
step 1-1, extracting a feature map of a query image;
step 1-2, extracting a feature map of a support image and a prototype vector corresponding to the feature map;
step 1-3, performing dimensionality reduction on the prototype vector to obtain a dimensionality-reduced prototype vector and obtain a loss value corresponding to the dimensionality-reduced prototype vector;
step 1-4, performing feature activation on the query image to realize target detection on the query image;
and step 1-5, updating the parameters of the network model.
3. The small sample image target detection method based on class margin equilibrium according to claim 2, wherein in step 1-2, the prototype vector corresponding to the feature map of the support image is obtained by the following formula:
$\mu_i^k = \mathrm{MaxPool}\big(f_{\theta_S}(I_S^{i,k} \oplus M_S^{i,k})\big)$

wherein $f_{\theta_S}(\cdot)$ is a lightweight convolutional neural network that extracts the features of the support image, $I_S^{i,k}$ is a support image, $M_S^{i,k}$ is the mask of the target in that image, and $\oplus$ is the embedding operation: $I_S$ and $M_S$ are embedded to generate a four-channel image to be input to the neural network;

the support data set S is composed of $\{(I_S^{i,k}, M_S^{i,k})\}$, wherein i is the index corresponding to the category, k is the index corresponding to the target within the category, $I_S^{i,k}$ is a three-channel image of size W × H, and $M_S^{i,k}$ is the corresponding single-channel mask map of size W × H; W and H are preferably 416.
4. The small sample image target detection method based on class margin equilibrium according to claim 2, wherein in step 1-3, the dimension of the prototype vector is reduced by a fully connected convolution layer, and the loss value corresponding to the reduced prototype vector is obtained through the max-margin loss.
5. The small sample image target detection method based on class margin equilibrium according to claim 4, wherein the loss value corresponding to the reduced prototype vector is obtained by the following formula:
$\mathcal{L}_{margin} = \sum_i d_i^{intra} / d_i^{inter}$

wherein $\mathcal{L}_{margin}$ is the max-margin loss; $d_i^{intra}$ is the intra-class distance, $d_i^{intra} = \frac{1}{K}\sum_{j=1}^{K}\|\tilde{\mu}_i^j - \mu_i'\|_2$; $d_i^{inter}$ is the inter-class distance, $d_i^{inter} = \min_{l \neq i}\|\mu_i' - \mu_l'\|_2$; $\mu_i'$ is the class-mean prototype vector, $\mu_i' = \frac{1}{K}\sum_{j=1}^{K}\tilde{\mu}_i^j$; K is the number of prototype vectors corresponding to a class, and j is the index of a prototype vector within the class.
6. The small sample image target detection method based on class margin equilibrium according to claim 1, wherein step 2 comprises the following substeps:
step 2-1, combining the novel class data and the base class data to form a new data set, then training the network model by repeating steps 1-1 to 1-5;
step 2-2, obtaining new training data;
and step 2-3, training the network model with the new training data and repeating steps 1-1 to 1-5.
7. The small sample image target detection method based on class margin equilibrium according to claim 6, wherein step 2-2 comprises the following substeps:
step 2-2-1, obtaining the gradient map corresponding to the annotated support image by back-propagation through the neural network;
and step 2-2-2, performing binarization processing on the gradient map and multiplying it by the mask map corresponding to the target in the support image to obtain new training data.
8. The small sample image target detection method based on class margin equilibrium according to claim 1, wherein the testing stage applies the trained network model to a small sample target detection task on the novel classes to verify the validity of the model;
preferably, the data categories used in the testing phase are the same as those used in the training phase, and the pictures are different.
9. A computer-readable storage medium, in which a small sample image target detection training program based on class margin equilibrium is stored, which, when executed by a processor, causes the processor to perform the steps of the small sample image target detection method based on class margin equilibrium according to one of claims 1 to 8.
10. A computer device comprising a memory and a processor, wherein the memory stores a small sample image target detection training program based on class margin equilibrium which, when executed by the processor, causes the processor to perform the steps of the small sample image target detection method based on class margin equilibrium according to one of claims 1 to 8.
CN202110262177.9A 2021-03-10 2021-03-10 Small sample image target detection method based on class interval balance Pending CN113159116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110262177.9A CN113159116A (en) 2021-03-10 2021-03-10 Small sample image target detection method based on class interval balance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110262177.9A CN113159116A (en) 2021-03-10 2021-03-10 Small sample image target detection method based on class interval balance

Publications (1)

Publication Number Publication Date
CN113159116A true CN113159116A (en) 2021-07-23

Family

ID=76886731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110262177.9A Pending CN113159116A (en) 2021-03-10 2021-03-10 Small sample image target detection method based on class interval balance

Country Status (1)

Country Link
CN (1) CN113159116A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963337A (en) * 2021-12-22 2022-01-21 中国科学院自动化研究所 Object image contour primitive extraction method and device
CN114219804A (en) * 2022-02-22 2022-03-22 汉斯夫(杭州)医学科技有限公司 Small sample tooth detection method based on prototype segmentation network and storage medium
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system
CN115424053A (en) * 2022-07-25 2022-12-02 北京邮电大学 Small sample image identification method, device and equipment and storage medium
WO2023053569A1 (en) * 2021-09-28 2023-04-06 株式会社Jvcケンウッド Machine learning device, machine learning method, and machine learning program
CN117152596A (en) * 2023-08-30 2023-12-01 广东皮阿诺科学艺术家居股份有限公司 Intelligent verification method for number and type of custom furniture hardware fitting bags

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766769A (en) * 2018-12-18 2019-05-17 四川大学 A kind of road target detection recognition method based on monocular vision and deep learning
CN111583284A (en) * 2020-04-22 2020-08-25 中国科学院大学 Small sample image semantic segmentation method based on hybrid model
CN112001428A (en) * 2020-08-05 2020-11-27 中国科学院大学 Anchor frame-free target detection network training method based on feature matching optimization
CN112329827A (en) * 2020-10-26 2021-02-05 同济大学 Increment small sample target detection method based on meta-learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766769A (en) * 2018-12-18 2019-05-17 四川大学 A kind of road target detection recognition method based on monocular vision and deep learning
CN111583284A (en) * 2020-04-22 2020-08-25 中国科学院大学 Small sample image semantic segmentation method based on hybrid model
CN112001428A (en) * 2020-08-05 2020-11-27 中国科学院大学 Anchor frame-free target detection network training method based on feature matching optimization
CN112329827A (en) * 2020-10-26 2021-02-05 同济大学 Increment small sample target detection method based on meta-learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BOHAO LI et al.: "Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection", arXiv:2103.04612v1 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023053569A1 (en) * 2021-09-28 2023-04-06 株式会社Jvcケンウッド Machine learning device, machine learning method, and machine learning program
CN113963337A (en) * 2021-12-22 2022-01-21 中国科学院自动化研究所 Object image contour primitive extraction method and device
CN113963337B (en) * 2021-12-22 2022-04-08 中国科学院自动化研究所 Object image contour primitive extraction method and device
CN114219804A (en) * 2022-02-22 2022-03-22 汉斯夫(杭州)医学科技有限公司 Small sample tooth detection method based on prototype segmentation network and storage medium
CN115424053A (en) * 2022-07-25 2022-12-02 北京邮电大学 Small sample image identification method, device and equipment and storage medium
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system
CN117152596A (en) * 2023-08-30 2023-12-01 广东皮阿诺科学艺术家居股份有限公司 Intelligent verification method for number and type of custom furniture hardware fitting bags
CN117152596B (en) * 2023-08-30 2024-04-19 广东皮阿诺科学艺术家居股份有限公司 Intelligent verification method for number and type of custom furniture hardware fitting bags

Similar Documents

Publication Publication Date Title
CN113159116A (en) Small sample image target detection method based on class interval balance
Gu et al. A review on 2D instance segmentation based on deep neural networks
CN109446889B (en) Object tracking method and device based on twin matching network
Zhong et al. Squeeze-and-excitation wide residual networks in image classification
CN109117781B (en) Multi-attribute identification model establishing method and device and multi-attribute identification method
Liu et al. TSingNet: Scale-aware and context-rich feature learning for traffic sign detection and recognition in the wild
CN113378710B (en) Layout analysis method and device for image file, computer equipment and storage medium
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
US20190034704A1 (en) Method and apparatus for face classification
CN110765882B (en) Video tag determination method, device, server and storage medium
CN110399895A (en) The method and apparatus of image recognition
CN109241816B (en) Image re-identification system based on label optimization and loss function determination method
CN108038515A (en) Unsupervised multi-target detection tracking and its storage device and camera device
CN113657087B (en) Information matching method and device
Vo et al. Active learning strategies for weakly-supervised object detection
Zhang et al. Learning to detect salient object with multi-source weak supervision
CN112818904A (en) Crowd density estimation method and device based on attention mechanism
Etezadifar et al. Scalable video summarization via sparse dictionary learning and selection simultaneously
Yuan et al. Few-shot scene classification with multi-attention deepemd network in remote sensing
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
CN115393666A (en) Small sample expansion method and system based on prototype completion in image classification
Jin et al. The Open Brands Dataset: Unified brand detection and recognition at scale
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data
CN116883740A (en) Similar picture identification method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210723