CN115661542A - Small sample target detection method based on feature relation migration - Google Patents

Small sample target detection method based on feature relation migration

Info

Publication number
CN115661542A
Authority
CN
China
Prior art keywords
sample
target detection
support set
training
class
Prior art date
Legal status
Pending
Application number
CN202211388184.4A
Other languages
Chinese (zh)
Inventor
欧凤婵
胡蓝青
阚美娜
山世光
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202211388184.4A
Publication of CN115661542A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a training method for a small sample target detection model based on feature relation migration, which comprises the following steps: S1, acquiring a training set and a support set, and training a pre-trained two-stage model with the training set to obtain an initialized target detection basic model and region-of-interest adjusting module; and S2, iteratively training, with the training set and the support set, a training model composed of the initialized target detection basic model, the region-of-interest adjusting module, a feature extraction network, a relation attention module, a category classification module, and a detection frame position regression adjusting module until convergence, so as to obtain a final small sample target detection model composed of the target detection basic model, the region-of-interest adjusting module, the feature extraction network, the detection frame position regression adjusting module, and the category classification module. The invention improves the ability of the small sample target detection model to localize targets, and can alleviate forgetting of the base classes while recognizing small sample classes.

Description

Small sample target detection method based on feature relation migration
Technical Field
The invention relates to the field of computer vision, in particular to the field of target detection in computer vision, and more particularly to a training method of a small sample target detection model based on feature relationship migration and a small sample target detection method based on the trained model.
Background
Computer vision is an important area of Artificial Intelligence (AI) that enables computers to obtain meaningful information from images, videos, or other visual input, and to act on that information to provide suggestions or references for downstream tasks. Computer vision comprises multiple task branches, such as object detection and image classification, which are important for downstream tasks in artificial intelligence. Among them, object detection is an important branch of the field of computer vision: it must find all objects of interest in an image and determine their corresponding categories and positions. A common practice is to train a high-accuracy model on a data set with abundant annotation resources, but annotation resources are often scarce, annotation is costly, and some samples (e.g., rare animal faces) are hard to obtain, so detection tasks over classes with few samples are common in practical applications. To detect such small sample classes, the usual approach combines the information of a source-domain data set (also called the training set), whose classes have sufficient training samples, with the information of a target-domain data set (also called the support set), whose classes have only a few training samples (small sample classes), to achieve small sample target detection.
Currently, detection of small sample classes is mostly based on deep neural network frameworks. For example, Chinese patent application publication No. CN108229658A discloses a method and an apparatus for implementing an object detector based on limited samples, a small sample class detection method based on regularized transfer learning. Its main technical means is to build a neural-network object detector (target detection model) and design a regularization-based transfer learning method that derives knowledge-transfer and background-suppression regular terms from the source-domain data set (training set) to the target-domain data set (support set) to train the object detector. However, this method only migrates knowledge at the level of classification results; its mining of the relations between the feature attributes of samples in the training set and the support set is not fine-grained enough, which limits both the classification ability of the target detection model and its ability to detect target positions.
For another example, Chinese patent application publication No. CN112364747A discloses a "target detection method under a limited sample", a small sample class detection method based on graph-structure modeling. Its main technical means are: using a model trained on the source domain (whose classes are called base classes) to screen predicted candidate frames in the picture samples of the support set (whose classes are called new classes and whose samples are called new class samples), obtaining candidate frames containing objects (i.e., candidate regions or regions of interest), convolving these candidate frames, and forming a graph structure from the convolution features of the obtained candidate regions so as to train a class label for each candidate region. This method only builds relations between base class and new class features and the classification result to detect the class of each candidate region; lacking direct enhancement of the few new class sample features, the resulting model generalizes poorly to new classes.
In summary, the prior art performs migration training of the target detection model only at the level of classification results and does not mine the relations between the feature attributes of samples in the source-domain training set and the target-domain support set. The relation between new class and base class features is therefore not fully exploited during training convergence, so the converged target detection model forgets what it learned on the base classes and localizes inaccurately when detecting small sample (new) classes.
Disclosure of Invention
Therefore, an object of the present invention is to overcome the above-mentioned drawbacks of the prior art, and provide a training method for a small sample target detection model based on feature relationship migration and a small sample target detection method based on a trained model.
According to a first aspect of the present invention, there is provided a training method for a small sample target detection model based on feature relationship migration, the method including: S1, acquiring a training set and a support set, and training a pre-trained two-stage model with the training set to obtain an initialized target detection basic model and region-of-interest adjusting module, wherein the training set comprises a plurality of base classes, each base class has a plurality of samples with base class labels, and a base class label comprises the class label of the base class and the position label of the target detection frame, while the support set comprises a plurality of new classes different from the base classes, each new class has samples meeting the small sample task requirement and carrying new class labels, and a new class label comprises the class label of the new class, the position label of the target detection frame, and a background label; and S2, iteratively training, with the training set and the support set, a training model composed of the initialized target detection basic model, the initialized region-of-interest adjusting module, a feature extraction network, a relation attention module, a class classification module, and a detection frame position regression adjusting module until convergence, so as to obtain a final small sample target detection model composed of the target detection basic model, the region-of-interest adjusting module, the feature extraction network, the detection frame position regression adjusting module, and the class classification module.
In some embodiments of the invention, each iterative training of the invention comprises: s21, respectively extracting an original region of interest of each sample in a training set and a support set by using an initialized target detection basic model; s22, adjusting the original region of interest of each sample in the training set and the support set extracted in the step S21 by using a region of interest adjusting module; s23, extracting the original characteristics of the region of interest of each sample in the training set and the support set adjusted in the step S22 by using a characteristic extraction network; s24, enhancing the original features of each sample in the support set by adopting a relational attention module based on the original features of each sample in the support set and the original features of each sample in the training set to obtain enhanced features of each sample in the support set; s25, outputting the predicted target detection frame position of the sample based on the original characteristics of each sample in the support set by adopting a detection frame position regression adjusting module, and calculating the target detection frame position regression loss based on the target detection frame position labels and the predicted target detection frame positions of all samples in the support set; s26, carrying out class classification on the samples by adopting a class classification module based on the enhanced features of each sample in the support set to obtain a prediction classification result, and calculating class classification loss based on the prediction classification results and class labels of all samples in the support set; and S27, updating parameters of the training model by adopting the position return loss and the class classification loss of the target detection frame.
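The iteration in steps S21-S27 can be condensed into a short numpy sketch. This is only an illustration of the data flow, not the patent's implementation: the ROI extraction and adjustment stages (S21-S22) are elided, the feature network and both heads are random linear stand-ins, and every name (`W_feat`, `W_box`, `W_cls`, `one_iteration`) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 8, 4                              # feature dimension, number of classes

# Random linear stand-ins for the learned sub-modules (illustrative only)
W_feat = rng.normal(size=(d, d)) * 0.1   # feature extraction network (S23)
W_box = rng.normal(size=(d, 8)) * 0.1    # detection frame regression head (S25)
W_cls = rng.normal(size=(d, C)) * 0.1    # class classification head (S26)

def one_iteration(rois_train, rois_support, box_labels, cls_labels):
    """One training iteration (S21-S27); S21-S22 are elided and their
    output is assumed to be the ROI feature vectors passed in."""
    F_s = rois_train @ W_feat                # S23: base class (training set) features
    F_q = rois_support @ W_feat              # S23: support set features
    U = F_q @ F_s.T                          # S24: relation attention over base samples
    A = np.exp(U - U.max(1, keepdims=True))
    A = A / A.sum(1, keepdims=True)
    F_qnew = F_q + A @ F_s                   # S24: enhanced support features
    pred_box = F_q @ W_box                   # S25: boxes from the ORIGINAL features
    loss_box = np.abs(pred_box - box_labels).mean()
    logits = F_qnew @ W_cls                  # S26: classes from the ENHANCED features
    z = logits - logits.max(1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))
    loss_cls = -np.mean(logp[np.arange(len(cls_labels)), cls_labels])
    return loss_box + loss_cls               # S27: total loss driving the update
```

Note how the sketch mirrors the asymmetry in the text: the regression branch consumes the original support features, while the classification branch consumes the attention-enhanced ones.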
In some embodiments of the present invention, the training model further comprises a background classification module, and each iteration of the training of the present invention further comprises: s26', outputting a background classification prediction result of the sample based on the original characteristics of each sample in the support set by adopting a background classification module, and calculating background classification loss based on the background classification prediction results and the background labels of all samples in the support set; and S27', updating parameters of the training model by adopting the position regression loss, the background classification loss and the class classification loss of the target detection frame.
In some embodiments of the invention, the invention updates parameters of the detection frame position regression adjusting module with the target detection frame position regression loss, and updates parameters of the category classification module, the feature extraction network, and the region-of-interest adjusting module with the category classification loss and the background classification loss.
In some embodiments of the invention, the pre-trained two-stage model is any one of the following networks: R-CNN, Fast R-CNN, Faster R-CNN, SPPNet, FPN.
In some embodiments of the invention, in step S24, the original features of each sample in the support set are enhanced by: S241, calculating an attention matrix of the sample in the support set to all samples in the training set based on the features of all samples in the training set extracted in step S21 and the features of the current sample in the support set; S242, based on the attention matrix obtained in step S241 and the features of the current sample, enhancing the original features of the sample through the following formula:
F_qknew=F_qk+Rk
wherein F_qknew is the enhanced feature of the kth sample in the support set, F_qk is the original feature of the kth sample in the support set, and Rk is the attention matrix of the kth sample in the support set to all samples in the training set.
In some embodiments of the invention, the invention calculates the attention matrix for each sample in the support set to all samples in the training set using the following formula:
Rk=softmax(Uk)*F_s={softmax(Ukv)*F_sv}
Uk={Ukv}
F_s={F_sv}
Ukv=F_qk·F_sv^T
softmax(Ukv)=exp(Ukv)/Σ_{i=1..n}exp(Uki)
wherein Rk is the attention matrix of the kth sample in the support set to all samples in the training set, Uk is the similarity matrix of the kth sample in the support set to all samples in the training set, softmax(Uk) denotes the normalization applied to each element of the similarity matrix, F_s is the set of features of all samples in the training set, F_sv is the feature of the v-th sample in the training set, F_qk is the feature of the kth sample in the support set, Ukv is the v-th element of Uk, Uki is the similarity between the kth sample in the support set and the i-th base class sample, and n is the number of base class samples in the training set.
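The attention enhancement of steps S241-S242 can be sketched in numpy as follows, assuming a dot-product similarity for Ukv (the exact similarity is an assumption; the published formula images are not legible). The function and argument names are illustrative:

```python
import numpy as np

def enhance_support_features(F_q, F_s):
    """Relation-attention enhancement sketch.

    F_q: (k, d) original features of the k support set samples.
    F_s: (n, d) features of the n base class (training set) samples.
    Returns the (k, d) enhanced features F_qnew = F_q + R, where
    R_k = softmax(U_k) @ F_s and U_kv is a dot-product similarity.
    """
    U = F_q @ F_s.T                       # (k, n) similarity matrix Uk
    U = U - U.max(axis=1, keepdims=True)  # shift for numerical stability
    A = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)  # softmax over base samples
    R = A @ F_s                           # attention-weighted base features Rk
    return F_q + R                        # F_qknew = F_qk + Rk
```

Each enhanced support feature is thus its original feature plus a convex combination of base-class features, which is how base-class knowledge is migrated into the few-shot branch.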
In some embodiments of the invention, the invention calculates the background classification loss of the support set using the following formula:
LB=bglabel*log(sigmoid(K_fb))
wherein bglabel is the background label of the new class, K_fb represents the set of original features of all support set samples input to the background classification module, and sigmoid(K_fb) is the background classification result corresponding to the feature set K_fb.
In some embodiments of the invention, the invention calculates the class classification loss for the support set using the following formula:
LF=objlabel*log(Softmax(Kc))=objlabel*log(Softmax({Kc_i}));
wherein objlabel is the class label of the new class, Kc_i represents the score with which the samples in the support set are classified as class i, and Softmax(Kc_i) is the normalized probability that the samples in the support set are classified as class i.
In some embodiments of the present invention, the regression loss is calculated in step S25 using the following formula:
L_reg=Σ_i Σ_j (|Bi_jx−Di_jx|+|Bi_jy−Di_jy|)
wherein Bi_jx is the x-axis coordinate of the jth vertex in the predicted target detection frame position obtained by the detection frame position regression adjusting module from the original features of the ith sample in the support set, Bi_jy is the corresponding y-axis coordinate, Di_jx is the x-axis coordinate of the jth vertex in the target detection frame position label of the ith sample in the support set, and Di_jy is the corresponding y-axis coordinate.
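A numpy sketch of this regression loss as an L1 distance summed over the vertex coordinates. The L1 form and the averaging over samples are assumptions (the published formula image is not legible), and the function name is illustrative:

```python
import numpy as np

def box_regression_loss(B, D):
    """Detection frame position regression loss sketch.

    B: (m, 4, 2) predicted vertex coordinates (Bi_jx, Bi_jy) per sample.
    D: (m, 4, 2) label vertex coordinates (Di_jx, Di_jy) per sample.
    Sums |Bi_jx - Di_jx| + |Bi_jy - Di_jy| over the 4 vertices of each
    box, then averages over the m support set samples.
    """
    B = np.asarray(B, dtype=float)
    D = np.asarray(D, dtype=float)
    return np.abs(B - D).sum(axis=(1, 2)).mean()
```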
According to a second aspect of the present invention, there is provided an object detection method comprising: f1, acquiring an input image; and F2, carrying out target detection on the input image obtained in the step F1 by adopting a small sample target detection model obtained by the method according to the first aspect of the invention so as to obtain the target position and the target classification in the image.
Compared with the prior art, the invention has the advantages that: aiming at the detection problem of small sample classes (such as object classes, animal face detection and other target classes), a detection frame position regression adjusting module is introduced to be used as a regression branch to predict the position of a target detection frame, the target detection frame position regression loss of all samples in a support set is calculated based on the target detection frame position labels and the predicted target detection frame positions of all samples in the support set, and the parameters of the detection frame position regression adjusting module are adjusted based on the regression loss, so that the target positioning capability in a small sample target detection task is improved; and a relation attention module is also introduced to enhance the original features of each sample in the support set based on the original features of each sample in the support set and the original features of each sample in the training set to obtain the enhanced features of each sample in the support set, a category classification module performs category classification prediction based on the enhanced features of each sample in the support set, calculates the category classification loss of the support set based on the prediction classification and the category label of all samples in the support set, and adjusts the parameters of the training model based on the category classification loss, so that the final small sample target detection model can retain the learning knowledge of the base class while having the capability of identifying the category of the small sample, and solves the problem that the existing model forgets to learn the base class.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
fig. 1 is a schematic flowchart of a training method of a small sample target detection model based on feature relationship migration according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a training method of a small sample target detection model based on feature relation migration according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As mentioned in the background art, the prior art performs migration training of the target detection model only at the level of classification results and does not mine the relations between the feature attributes of samples in the training set and the support set, so the relation between new class and base class features is not fully exploited by the converged target detection model; as a result, the converged model forgets what it learned on the base classes and its localization ability in small sample class target detection is insufficient.
To solve the above problems, the present invention introduces auxiliary training tasks (for example, a target regression detection task and a background classification task) alongside the basic target detection task when migration-training the target detection model into a small sample target detection model, thereby improving the model's target localization ability, giving it generalization capability, and avoiding forgetting of the base classes.
In summary, the present invention first provides a training method for a small sample target detection model based on feature relationship migration, as shown in fig. 1, the method including: S1, acquiring a training set and a support set, and training a pre-trained two-stage model (i.e., a two-stage model already trained in the prior art) with the training set to obtain an initialized target detection basic model and region-of-interest adjusting module; and S2, iteratively training, with the training set and the support set, the training model constructed on the basis of the initialized target detection basic model and the region-of-interest adjusting module until convergence, so as to obtain the final small sample target detection model.
According to one embodiment of the invention, step S2 comprises: and performing repeated iterative training on a training model (namely constructing the training model) consisting of the initialized target detection basic model, the initialized region-of-interest adjusting module, the feature extraction network, the relationship attention module, the class classification module and the detection frame position regression adjusting module by adopting a training set and a support set until convergence to obtain a final small sample target detection model consisting of the target detection basic model, the region-of-interest adjusting module, the feature extraction network, the detection frame position regression adjusting module and the class classification module, and updating parameters of the training model by adopting target detection frame position regression loss and class classification loss. The method comprises the steps of introducing a detection frame position regression adjusting module as a regression branch to predict the position of a target detection frame, calculating the position regression loss of the target detection frame of all samples in a support set based on the position labels of the target detection frame of all samples in the support set and the predicted position of the target detection frame, and adjusting the parameters of the detection frame position regression adjusting module based on the regression loss, so that the target positioning capability in a small sample target detection task is improved. 
The invention also introduces a relation attention module to enhance the original features of each sample in the support set based on the original features of each sample in the support set and the original features of each sample in the training set to obtain the enhanced features of each sample in the support set, a category classification module performs category classification prediction based on the enhanced features of each sample in the support set, calculates the category classification loss of the support set based on the prediction classification and the category labels of all samples in the support set, and adjusts the parameters of the training model based on the category classification loss, so that the final small sample target detection model can keep the learning knowledge of a base class while having the capability of identifying the category of the small sample, and solves the problem that the existing model forgets to learn the base class. In addition, in order to further improve the identification capability of the final small sample target detection model on the small sample class, a background classification module is introduced into the training model and used for outputting a background classification prediction result of the sample based on the original characteristics of each sample in the support set so as to obtain the background classification prediction of all samples in the support set, the background classification loss of the support set is calculated based on the background classification prediction of all samples and a background label, the model parameter is updated, the perception capability of the final small sample target detection model on the background is improved, and therefore the identification capability of the model on the small sample class target is improved.
According to one embodiment of the present invention, the target detection basic model and the region-of-interest adjusting module employ a pre-trained two-stage model. According to an embodiment of the present invention, the two-stage model is any one of the following: R-CNN, Fast R-CNN, Faster R-CNN, SPPNet, FPN; each of these can perform both basic target detection and region-of-interest adjustment. For ease of understanding, the part realizing the target detection function is called the target detection basic model, and the part realizing the region-of-interest adjustment is called the region-of-interest adjusting module. After the two-stage model is initialized by training on the training set, the initialized target detection basic model and region-of-interest adjusting module are obtained; during migration training, the parameters of the target detection basic model initialized by the training set are frozen, while the parameters of the region-of-interest adjusting module are updated iteratively. According to one embodiment of the invention, the feature extraction network employs a CNN. According to one embodiment of the invention, the category classification module and the background classification module each employ a fully connected layer. Since the network structures and functions adopted by the training model are known to those skilled in the art, the invention does not elaborate on the specific functions of each sub-module and sub-network, and only describes the training method of the small sample target detection model.
For a better understanding of the present invention, reference is made to the following detailed description of the invention in conjunction with the accompanying drawings and examples.
Fig. 2 is a schematic diagram of a training method for a small sample target detection model based on feature relationship migration, where the region within the dashed frame represents the constructed training model and the data flow within it. The training model comprises the target detection basic model, the region-of-interest adjusting module, the feature extraction network, the relation attention module, the category classification module, and the detection frame position regression adjusting module; the small sample target detection model comprises the target detection basic model, the region-of-interest adjusting module, the feature extraction network, the detection frame position regression adjusting module, and the category classification module. The arrows between modules represent the data passed between them. When training the constructed training model with the training set and the support set, both sets serve jointly as model input for repeated iterative training until convergence. The training set comprises a plurality of base classes, each with a number of samples carrying base class labels (samples in the training set are also called base class samples); a base class label comprises the class label of the base class and the position label of the target detection frame. The support set comprises a plurality of new classes different from the base classes; each new class has samples meeting the small sample task requirement and carrying new class labels (samples in the support set are also called new class samples), where a new class label comprises the class label of the new class, the position label of the target detection frame, and a background label.
According to one embodiment of the invention, each iterative training comprises: S21, extracting the original region of interest of each sample in the training set and the support set with the initialized target detection basic model; S22, adjusting the original region of interest of each sample in the training set and the support set extracted in step S21 with the region-of-interest adjusting module; S23, extracting, with the feature extraction network, the original features of the adjusted region of interest of each sample in the training set and the support set from step S22 (the original support set sample features in fig. 2 are the set formed by the original features of the adjusted regions of interest of all samples in the support set); S24, enhancing the original features of each sample in the support set with the relation attention module, based on the original features of each sample in the support set and the original features of each sample in the training set, to obtain the enhanced features of each sample in the support set (the enhanced support set features in fig. 2 are the set formed by the enhanced features of all samples in the support set); S25, outputting the predicted target detection frame position of each sample from its original features with the detection frame position regression adjusting module, and calculating the target detection frame position regression loss from the target detection frame position labels and predicted positions of all samples in the support set; S26, classifying each sample with the category classification module based on its enhanced features to obtain a predicted classification result, and calculating the class classification loss from the predicted classification results and class labels of all samples in the support set; and S27, updating the parameters of the training model with the target detection frame position regression loss and the class classification loss. According to an embodiment of the invention, the small sample target detection method further comprises a background classification module to improve the final model's perception of the background and thus its ability to recognize small sample classes. According to this embodiment, on the basis of steps S21 to S26, each iterative training further comprises: S26', outputting a background classification prediction for each sample from its original features with the background classification module, and calculating the background classification loss from the background classification predictions and background labels of all samples in the support set; and S27', updating the parameters of the training model with the target detection frame position regression loss, the background classification loss, and the class classification loss.
Since the processing in steps S21 to S23 is known to those skilled in the art, it is not described in detail here; the remaining steps are described in detail below for a better understanding of the invention.
In step S24, the relational attention module is used to enhance the original features of each sample in the support set, based on the original features of each sample in the support set and the original features of each sample in the training set, to obtain the enhanced features of each sample in the support set. According to one embodiment of the invention, the original features of each sample in the support set are enhanced by: S241, calculating the attention matrix of the sample in the support set to all samples in the training set based on the features of all samples in the training set extracted in step S21 and the features of the current sample in the support set; S242, enhancing the original features of the sample with the attention matrix obtained in step S241 and the features of the current sample through the following formula:
F_qknew=F_qk+Rk
wherein F_qknew is the enhanced feature of the kth sample in the support set, F_qk is the original feature of the kth sample in the support set, and Rk is the attention matrix of the kth sample in the support set to all samples in the training set.
According to one embodiment of the invention, the invention calculates the attention matrix for each sample in the support set to all samples in the training set using the following formula:
Rk=softmax(Uk)*F_s={softmax(Ukv)*F_sv}
Uk={Ukv}
F_s={F_sv}
[Two formulas are rendered as images in the source; per the definitions below, they give the similarity matrix Uk = {Uki}, where Uki is the similarity between the kth sample in the support set and the ith base class sample. The exact similarity expression is not recoverable from the text.]
wherein Rk is the attention matrix of the kth sample in the support set to all samples in the training set, Uk is the similarity matrix of the kth sample in the support set to all samples in the training set, softmax(Uk) is a normalization performed on each element of the similarity matrix, F_s is the set of features of all samples in the training set, F_sv is the feature of the vth sample in the training set, F_qk is the feature of the kth sample in the support set, Ukv is the vth element of Uk, Uki is the similarity between the kth sample in the support set and the ith base class sample, and n is the number of all base class samples in the training set.
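Steps S241 and S242 can be sketched in a few lines of numpy. Since the exact similarity expression for Uki is only an image in the source, a plain dot product is assumed here; the function name is likewise a hypothetical stand-in.

```python
import numpy as np

def relation_attention(f_qk, f_s):
    """Enhance one support-set feature f_qk (shape [d]) with the training-set
    features f_s (shape [n, d]), following Rk = softmax(Uk) * F_s and
    F_qknew = F_qk + Rk. The similarity Uki is assumed to be a dot product."""
    u_k = f_s @ f_qk                 # Uk = {Uki}: similarity to each base class sample
    w = np.exp(u_k - u_k.max())
    w = w / w.sum()                  # softmax over the n similarities
    r_k = w @ f_s                    # Rk: attention-weighted sum of training features
    return f_qk + r_k                # F_qknew

f_s = np.array([[1.0, 0.0], [0.0, 1.0]])   # two base class sample features
f_qk = np.array([1.0, 0.0])                # one support sample feature
f_qknew = relation_attention(f_qk, f_s)
```

Here the support feature is more similar to the first base sample, so the attention weight on it is e/(1+e) ≈ 0.73 and the enhanced feature is pulled mostly in its direction.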
In step S25, the detection frame position regression adjustment module is adopted to output the predicted target detection frame position of the sample based on the original feature of each sample in the support set, and calculate the target detection frame position regression loss based on the target detection frame position labels and the predicted target detection frame positions of all samples in the support set. The target detection frame position regression processing is an independent regression branch task and is used for performing target position regression processing based on the original characteristics of the support set so as to achieve the purpose of more accurately positioning the target position. According to one embodiment of the invention, the regression loss is calculated using the following formula:
[The regression loss formula is rendered as an image in the source; per the definitions below, it compares, for every sample i in the support set and every vertex j of its detection frame, the predicted coordinates (Bi_jx, Bi_jy) with the label coordinates (Di_jx, Di_jy).]
wherein Bi_jx is the x-axis coordinate and Bi_jy the y-axis coordinate of the jth vertex in the predicted target detection frame position output by the detection frame position regression adjusting module from the original features of the ith sample in the support set, and Di_jx is the x-axis coordinate and Di_jy the y-axis coordinate of the jth vertex in the target detection frame position label of the ith sample in the support set.
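Since the regression loss itself is only an image in the source, the sketch below assumes a plain L1 distance over the vertex coordinates named above; the loss form is an assumption, not the patent's exact formula.

```python
def box_regression_loss(pred_boxes, label_boxes):
    """pred_boxes[i][j] = (Bi_jx, Bi_jy), the jth predicted vertex of sample i;
    label_boxes[i][j] = (Di_jx, Di_jy), the corresponding label vertex.
    An L1 distance over all samples and vertices is assumed here."""
    loss = 0.0
    for pred, label in zip(pred_boxes, label_boxes):      # samples i
        for (bx, by), (dx, dy) in zip(pred, label):       # vertices j
            loss += abs(bx - dx) + abs(by - dy)
    return loss
```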
In step S26, a class classification module is used to classify the samples based on the enhanced features of each sample in the support set to obtain a predicted classification result, and a class classification loss is calculated based on the predicted classification results and class labels of all samples in the support set. According to one embodiment of the invention, the category classification loss of the support set is calculated using the following formula:
LF=objlabel*log(Softmax(Kc))=objlabel*log(Softmax({Kc_i}));
wherein objlabel is the class label of the new class, Kc_i represents the probability of all samples of each class in the support set being classified as class i, and Softmax(Kc_i) is the normalized classification probability of all samples of each class in the support set being classified as class i.
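The class classification loss LF = objlabel · log(Softmax(Kc)) can be sketched as follows. The source writes the formula without a leading minus sign; the usual negative cross-entropy, which is minimized during training, is assumed here.

```python
import math

def class_loss(kc, objlabel):
    """kc[i] is the raw score for class i; objlabel is the one-hot class label
    of the new class. Returns the negative log-probability of the true class."""
    m = max(kc)                                   # subtract max for stability
    exps = [math.exp(v - m) for v in kc]
    z = sum(exps)
    log_probs = [math.log(e / z) for e in exps]   # log(Softmax(Kc))
    return -sum(y * lp for y, lp in zip(objlabel, log_probs))
```

With two equal scores the true class gets probability 0.5, so the loss is log 2 ≈ 0.693.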
In order to improve classification accuracy, a background classification task is additionally introduced alongside the classification task: the background classification module performs background classification prediction on the samples based on the original features of the samples in the support set, and the background classification loss is calculated based on the background classification predictions of all new class samples. According to one embodiment of the invention, the background classification loss of the support set is calculated using the following formula:
LQ=bglabel*log(sigmoid(K_fb));
wherein bglabel is the background label of the new class, K_fb represents the set of original features of all support set samples input to the background classification module, and sigmoid(K_fb) is the background classification prediction corresponding to the feature set K_fb.
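The background classification loss LQ = bglabel · log(sigmoid(K_fb)) can be sketched per element as follows; as above, a leading minus sign is assumed so that the loss is minimized during training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def background_loss(k_fb, bglabel):
    """k_fb is the list of background scores for the support-set features;
    bglabel the corresponding background labels (1 = background)."""
    return -sum(y * math.log(sigmoid(v)) for v, y in zip(k_fb, bglabel))
```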
In step S27, the parameters of the training model are updated with the target detection frame position regression loss and the class classification loss, wherein the parameters of the training model are updated with the target detection frame position regression loss, the background classification loss and the class classification loss when the background classification task is introduced. According to one embodiment of the invention, the parameters of the detection frame position regression adjustment module are updated by adopting the target detection frame position regression loss, and the parameters of the category classification module, the feature extraction network and the region-of-interest adjustment module are updated by adopting the category classification loss and the background classification loss.
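The routing described above — the regression loss updating only the detection frame position regression adjusting module, and the class and background classification losses updating the class classification module, the feature extraction network and the region-of-interest adjusting module — can be sketched with scalar stand-in parameters (all values here are invented for illustration):

```python
# One scalar parameter per module, purely for illustration.
params = {"box_regressor": 1.0, "classifier": 1.0,
          "feat_net": 1.0, "roi_adjust": 1.0}
lr = 0.1

def apply_update(grads_by_module):
    # Only the modules that received a gradient are updated.
    for name, grad in grads_by_module.items():
        params[name] -= lr * grad

# The regression loss produces gradients only for the regression module:
apply_update({"box_regressor": 0.5})
# The class + background losses produce gradients for the other three modules:
apply_update({"classifier": 0.2, "feat_net": 0.2, "roi_adjust": 0.2})
```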
To better illustrate the technical effects of the present invention, the following experiments were performed to verify the effects.
In the experiment, the final small sample target detection model is first obtained with the training method of the small sample target detection model provided by the invention; it is then comparatively evaluated, on a selected experimental data set, against target detection models trained with existing methods.
Firstly, the common object detection image data set Pascal VOC is selected; one part of the categories (for example, 10 categories) is divided off as training set S, a small number of samples (set according to the number required by the small sample task) from another part of the categories (for example, 2 categories) is taken as support set Q, and other samples of the same categories as those in the support set Q are taken as verification set V. Secondly, the training set S is input into a trained two-stage model (in this experiment, preferably a pre-trained FasterRCNN) to obtain the initialized target detection basic model and region-of-interest adjusting module; in the experiment, FasterRCNN is trained for 15 rounds with the training set S, with an initial learning rate of 0.02 for rounds 1 to 10 that is decayed to 0.1 times the previous learning rate at rounds 11 and 13. The training set S and the support set Q are then input into the training model (such as the training model of FIG. 2), which is trained to convergence with the training method of the small sample target detection model provided by the invention; in this experiment, convergence is reached after 300 training rounds with an initial learning rate of 0.005 that is decayed to 0.1 times the previous learning rate at rounds 250 and 275. Finally, the small sample target detection model in the converged training model is selected as the final small sample target detection model.
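Both training phases above use the same step-decay schedule (multiply the learning rate by 0.1 at fixed rounds), which can be sketched as:

```python
def learning_rate(round_idx, base_lr, decay_rounds):
    """Return the learning rate for a 1-indexed training round: base_lr,
    multiplied by 0.1 at each round listed in decay_rounds."""
    lr = base_lr
    for r in decay_rounds:
        if round_idx >= r:
            lr *= 0.1
    return lr

# Pretraining: 0.02 for rounds 1-10, decayed at rounds 11 and 13.
pre = [learning_rate(r, 0.02, (11, 13)) for r in (1, 11, 13)]
# Fine-tuning: 0.005 over 300 rounds, decayed at rounds 250 and 275.
fine = [learning_rate(r, 0.005, (250, 275)) for r in (1, 250, 275)]
```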
Target detection models trained with the existing methods are then used for comparison, and the following test is performed: an image is randomly selected from the image data set Pascal VOC and input into the final small sample target detection model to obtain the target position and target classification in the image; if no object exists, the target in the image is output as the background category. Specifically, 15 classes of samples (as officially set) are randomly taken from the common object detection image data set Pascal VOC as base class samples, and 5 classes of samples (as officially set) are taken as new class samples, to form the experimental data set on which the detection results of the final small sample target detection model are evaluated. The commonly used metric mAP (mean Average Precision) is selected for the evaluation. In total, seven groups of comparison experiments are used to calculate the base class average precision and the new class average precision; the results of the first two groups are shown in Table 1. The first group of experiments (represented by "3 samples of the new class" in Table 1) tests on all classes in the experimental data set, taking 3 samples from each class to obtain the data from which the base class average precision and the new class average precision are calculated; the second group (represented by "10 samples of the new class" in Table 1) tests on all classes, taking from each class 10 samples different from those of the first group, and likewise calculates the base class average precision and the new class average precision.
In the experiment, five further groups of control experiments are used to calculate the new class average precision; the specific results are shown in Table 2. The third group tests on all new classes in the experimental data set, taking 1 sample per class; the fourth group takes 2 samples per class; the fifth group takes 3 samples per class; the sixth group takes 5 samples per class; and the seventh group takes 10 samples per class. The samples used in each group differ from those used in the other groups. The columns headed 1, 2, 3, 5 and 10 in Table 2 correspond in order to the third through seventh groups, and each row under the "method" column of Table 2 represents a target detection model trained with one method.
It should be noted that the calculation of average precision is well known in the art and is not described here. In Tables 1 and 2, FRTN denotes the final small sample target detection model obtained with the training method provided by the invention, and the remaining abbreviations, such as LSTD and Meta-YOLO, denote models trained with the corresponding existing methods, which are likewise not described here.
As can be seen from Table 1, the base class average precision of the final small sample target detection model obtained by the invention is 75.0 and the new class average precision is 61.4 in the first group of experiments, while in the second group of experiments the base class average precision is 79.5 and the new class average precision is 71.3. This shows that the final small sample target detection model obtained with the training method provided by the invention maintains the base class average precision, that is, it reduces the base class forgetting problem (forgetting meaning that the classification and detection capability on the base classes degrades during learning).
TABLE 1
[Table 1 is rendered as an image in the source; it lists, for each compared method, the base class average precision and the new class average precision under the 3-sample and 10-sample settings.]
TABLE 2
[Table 2 is rendered as an image in the source; it lists, for each compared method, the new class average precision with 1, 2, 3, 5 and 10 samples per new class.]
As can be seen from Table 2, the new class average precision of the final small sample target detection model obtained by the invention in the seventh group of experiments is 71.3, higher than that of the models obtained with the existing methods. This indicates that the training method provided by the invention, through the improvement in new class detection and classification precision, improves the ability of the final small sample target detection model to locate targets of small sample classes.
In summary, for the detection of small sample classes (for example, object classes such as objects and animal faces), the invention introduces the detection frame position regression adjusting module as a regression branch to predict the target detection frame position, calculates the target detection frame position regression loss based on the target detection frame position labels and the predicted target detection frame positions of all samples in the support set, and then adjusts the parameters of the detection frame position regression adjusting module based on the regression loss, thereby improving target positioning capability in the small sample target detection task. The invention also introduces the relation attention module to enhance the original features of each sample in the support set, based on the original features of each sample in the support set and the original features of each sample in the training set, to obtain the enhanced features of each sample in the support set. The class classification module performs class classification prediction based on the enhanced features of each sample in the support set, the class classification loss of the support set is calculated based on the predicted classifications and the class labels of all samples in the support set, and the parameters of the training model are adjusted based on the class classification loss. As a result, the final small sample target detection model retains the learned knowledge of the base classes while acquiring the capability of identifying small sample classes, which alleviates the base class forgetting problem of existing models and improves the model's perception of the background, thereby improving its ability to identify small sample class targets.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that holds and stores the instructions for use by the instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A training method of a small sample target detection model based on feature relation migration is characterized by comprising the following steps:
s1, acquiring a training set and a support set, and training a pre-trained two-stage model by using the training set to obtain an initialized target detection basic model and an interesting region adjusting module; the training set comprises a plurality of base classes, each base class is provided with a plurality of samples with base class labels, and the base class labels comprise class labels of the base classes and position labels of target detection frames; the supporting set comprises a plurality of new classes different from the base class, each new class has a sample meeting the task requirement of the small sample and having a new class label, and the new class label comprises a class label of the new class, a position label of the target detection frame and a background label;
s2, carrying out repeated iterative training on a training model consisting of the initialized target detection basic model, the initialized interested region adjusting module, the feature extraction network, the relation attention module, the category classification module and the detection frame position regressive adjusting module to convergence by adopting a training set and a support set so as to obtain a final small sample target detection model consisting of the target detection basic model, the interested region adjusting module, the feature extraction network, the detection frame position regressive adjusting module and the category classification module, wherein each iterative training comprises:
s21, respectively extracting an original region of interest of each sample in a training set and a support set by using an initialized target detection basic model;
s22, adjusting the original region of interest of each sample in the training set and the support set extracted in the step S21 by using a region of interest adjusting module;
s23, extracting the original characteristics of the region of interest of each sample in the training set and the support set adjusted in the step S22 by using a characteristic extraction network;
s24, enhancing the original features of each sample in the support set by adopting a relational attention module based on the original features of each sample in the support set and the original features of each sample in the training set to obtain enhanced features of each sample in the support set;
s25, outputting the predicted target detection frame position of the sample based on the original characteristics of each sample in the support set by adopting a detection frame position regression adjusting module, and calculating the target detection frame position regression loss based on the target detection frame position labels and the predicted target detection frame positions of all samples in the support set;
s26, carrying out class classification on the samples by adopting a class classification module based on the enhanced features of each sample in the support set to obtain a prediction classification result, and calculating class classification loss based on the prediction classification results and class labels of all samples in the support set;
and S27, updating parameters of the training model by adopting the position regression loss and the class classification loss of the target detection frame.
2. The method of claim 1, the training model further comprising a background classification module, wherein each iterative training further comprises:
s26', outputting a background classification prediction result of the sample based on the original characteristics of each sample in the support set by adopting a background classification module, and calculating background classification loss based on the background classification prediction results and the background labels of all samples in the support set;
and S27', updating parameters of the training model by adopting the position regression loss, the background classification loss and the class classification loss of the target detection frame.
3. The method of claim 2, wherein the parameters of the detection frame position regression adjusting module are updated with the target detection frame position regression loss; and the parameters of the class classification module, the feature extraction network and the region-of-interest adjusting module are updated with the class classification loss and the background classification loss.
4. The method of claim 3, wherein the pre-trained two-stage model is any one of the following networks: RCNN, FastRCNN, FasterRCNN, SPPNet, FPN.
5. The method according to claim 4, wherein in step S24, the original features of each sample in the support set are enhanced by:
s241, calculating an attention matrix of the sample in the support set to all samples in the training set based on the features of all samples in the training set extracted in the step S21 and the features of the current sample in the support set;
s242, obtaining the attention matrix and the characteristics of the current sample based on the step S241, and enhancing the original characteristics of the sample through the following formula:
F_qknew=F_qk+Rk
wherein F_qknew is the enhanced feature of the kth sample in the support set, F_qk is the original feature of the kth sample in the support set, and Rk is the attention matrix of the kth sample in the support set to all samples in the training set.
6. The method of claim 5, wherein the attention matrix for each sample in the support set to all samples in the training set is calculated using the following formula:
Rk=softmax(Uk)*F_s={softmax(Ukv)*F_sv}
Uk={Ukv}
F_s={F_sv}
[Two formulas are rendered as images in the source; per the definitions below, they give the similarity matrix Uk = {Uki}, where Uki is the similarity between the kth sample in the support set and the ith base class sample. The exact similarity expression is not recoverable from the text.]
wherein Rk is the attention matrix of the kth sample in the support set to all samples in the training set, Uk is the similarity matrix of the kth sample in the support set to all samples in the training set, softmax(Uk) is a normalization of each element of the similarity matrix, F_s is the set of features of all samples in the training set, F_sv is the feature of the vth sample in the training set, F_qk is the feature of the kth sample in the support set, Ukv is the vth element of Uk, Uki is the similarity of the kth sample in the support set to the ith base class sample, and n is the number of all base class samples in the training set.
7. The method of claim 6, wherein the background classification loss of the support set is calculated using the following formula:
LQ=bglabel*log(sigmoid(K_fb));
[A further formula is rendered as an image in the source and is not recoverable from the text.]
wherein bglabel is the background label of the new class, K_fb represents the set of original features of all support set samples input to the background classification module, and sigmoid(K_fb) is the background classification result corresponding to the feature set K_fb.
8. The method of claim 7, wherein the class classification loss of the support set is calculated using the following formula:
LF=objlabel*log(Softmax(Kc))=objlabel*log(Softmax({Kc_i}));
wherein objlabel is the class label of the new class, Kc_i represents the probability of all samples of each class in the support set being classified as class i, and Softmax(Kc_i) is the normalized classification probability of all samples of each class in the support set being classified as class i.
9. The method of claim 3, wherein the regression loss is calculated in step S25 using the following formula:
[The regression loss formula is rendered as an image in the source; per the definitions below, it compares, for every sample i in the support set and every vertex j of its detection frame, the predicted coordinates (Bi_jx, Bi_jy) with the label coordinates (Di_jx, Di_jy).]
wherein Bi_jx is the x-axis coordinate and Bi_jy the y-axis coordinate of the jth vertex in the predicted target detection frame position output by the detection frame position regression adjusting module from the original features of the ith sample in the support set, and Di_jx is the x-axis coordinate and Di_jy the y-axis coordinate of the jth vertex in the target detection frame position label of the ith sample in the support set.
10. An object detection method, characterized in that the detection method comprises:
f1, acquiring an input image;
f2, carrying out target detection on the input image obtained in the step F1 by using the small sample target detection model obtained according to the method of any one of claims 1 to 9 so as to obtain the target position and the target classification in the image.
11. A computer-readable storage medium, having stored thereon a computer program executable by a processor for performing the steps of the method of any one of claims 1 to 9, 10.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the steps of the method according to any one of claims 1 to 9, 10.
CN202211388184.4A 2022-11-08 2022-11-08 Small sample target detection method based on feature relation migration Pending CN115661542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211388184.4A CN115661542A (en) 2022-11-08 2022-11-08 Small sample target detection method based on feature relation migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211388184.4A CN115661542A (en) 2022-11-08 2022-11-08 Small sample target detection method based on feature relation migration

Publications (1)

Publication Number Publication Date
CN115661542A true CN115661542A (en) 2023-01-31

Family

ID=85015247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211388184.4A Pending CN115661542A (en) 2022-11-08 2022-11-08 Small sample target detection method based on feature relation migration

Country Status (1)

Country Link
CN (1) CN115661542A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409250A (en) * 2023-10-27 2024-01-16 北京信息科技大学 Small sample target detection method, device and medium
CN117409250B (en) * 2023-10-27 2024-04-30 北京信息科技大学 Small sample target detection method, device and medium

Similar Documents

Publication Publication Date Title
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN110738247B (en) Fine-grained image classification method based on selective sparse sampling
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN109993102B (en) Similar face retrieval method, device and storage medium
CN111160469B (en) Active learning method of target detection system
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN111027605A (en) Fine-grained image recognition method and device based on deep learning
CN110929802A (en) Information entropy-based subdivision identification model training and image identification method and device
CN113128478A (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN115482418B (en) Semi-supervised model training method, system and application based on pseudo-negative labels
CN115187772A (en) Training method, device and equipment of target detection network and target detection method, device and equipment
CN116977844A (en) Lightweight underwater target real-time detection method
CN115661542A (en) Small sample target detection method based on feature relation migration
CN115063664A (en) Model learning method, training method and system for industrial vision detection
CN114529552A (en) Remote sensing image building segmentation method based on geometric contour vertex prediction
CN114882204A (en) Automatic ship name recognition method
CN114359959A (en) Static gesture recognition method and device based on deep learning and automobile
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN110457155B (en) Sample class label correction method and device and electronic equipment
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN114330542A (en) Sample mining method and device based on target detection and storage medium
CN112861689A (en) Searching method and device of coordinate recognition model based on NAS technology
CN115393914A (en) Multitask model training method, device, equipment and storage medium
CN113409327A (en) Example segmentation improvement method based on ordering and semantic consistency constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination