WO2024032010A1 - Transfer learning strategy-based real-time few-shot object detection method - Google Patents

Info

Publication number
WO2024032010A1
Authority
WO
WIPO (PCT)
Prior art keywords
few
detection
sample
model
training
Application number
PCT/CN2023/086781
Other languages
French (fr)
Chinese (zh)
Inventor
李国权
夏瑞阳
林金朝
庞宇
朱宏钰
Original Assignee
重庆邮电大学
Application filed by 重庆邮电大学
Publication of WO2024032010A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • the invention belongs to the field of image processing and relates to a real-time detection method of few-sample targets based on a transfer learning strategy.
  • Object detection is one of the most important and fundamental tasks in computer vision.
  • CNN Convolutional Neural Network
  • visual Transformer with high detection performance.
  • the excellent detection performance of these models is achieved at the expense of large amounts of data. Due to the complexity of the object and the large number of model parameters, the detection accuracy will drop rapidly when the amount of data is limited. Therefore, few-shot target detection has received more and more attention in recent years.
  • the purpose of the method based on meta-learning strategy is to obtain the correlation between the current image and the few samples.
  • although the detection performance for the few samples has been improved, the computational complexity of the model is greatly increased by the feature extraction structure in the few-sample detection branch, the structure that relates input features to few-sample features, and the number of few-sample categories.
  • the purpose of the method based on the transfer learning strategy is to enable the detection model that already has feature representation capabilities to be well adapted to the few-sample target.
  • the purpose of the present invention is to provide a dual-path real-time target detection model based on the transfer learning strategy, using Darknet-53 combined with Spatial Pyramid Pooling (SPP) as the backbone and a Feature Pyramid Network (FPN) as the neck, which respectively extract image features and provide semantic features at different scales.
  • SPP Spatial Pyramid Pooling
  • FPN Feature Pyramid Network
  • the large-sample category detection branch is only used to detect large-sample category objects, while the few-sample category detection branch is used to detect all categories of objects.
  • after the detection results are output in parallel, the discriminator scans the two results and outputs the more appropriate result of the two parallel branches based on a metric criterion.
  • the main reason for using the dual-path combination structure is that when the model is trained on a small number of samples, the detection accuracy of objects in the large sample category will degrade, and the few sample detection branch will have false positive bounding boxes that actually belong to the large sample category.
  • the few-sample detection branch also learns the prediction differences of large-sample categories from the large-sample detection branch through knowledge distillation, thereby improving the generalization ability of the detection branch.
  • the present invention proposes the Attentive DropBlock regularization method, based on feature response, to guide the model to focus on the overall characteristics of the target, avoid being dominated by local salient features, and enhance the generalization ability of the model.
  • a real-time detection method of few-sample targets based on transfer learning strategy including the following steps:
  • S4 Fine-tune the few-sample category detection branch on the few-sample category data; use a new regularization method to guide the model to focus on the overall characteristics of the object during fine-tuning;
  • the detection network model includes: the backbone network, which is Darknet-53 combined with Spatial Pyramid Pooling (SPP) and is used to extract image features; the detection neck network, which is composed of a Feature Pyramid Network (FPN) and is used to provide semantic features of different scales to the detection head network; and the detection head network, which is a dual-path detection branch structure with a discriminator, in which the large-sample category detection branch is only used to detect targets of the large-sample categories, the few-sample category detection branch is used to detect targets of all categories, and the discriminator is used to scan the results of the two branches in sequence and obtain the final output result according to a metric criterion.
  • in step S2, limited data is processed by using random affine transformation, a multi-scale image training strategy, the MixUp data fusion strategy and the Label Smoothing label processing strategy.
  • in step S3, the backbone network is initialized with the weights trained on the ImageNet data set, and the network model except the few-sample detection branch is trained from scratch using large-sample category data.
  • L box is the additive combination of the GIoU loss function and smooth L1 loss of coordinate regression;
  • L cls and L obj are the Focal Loss function and the binary cross-entropy loss function respectively.
  • in step S4, the model parameters of the backbone of the detection model, the detection neck and the large-sample category detection branch are frozen, and only the few-sample category detection branch is fine-tuned.
  • the loss function at this stage involves the coordinates of the prediction box, the target confidence, the classification results and the difference from the large-sample category detection branch.
  • step S4 specifically includes the following steps:
  • N represents the batch size
  • l represents the absolute error function
  • λ is used to control the impact of base class distillation loss on model gradient update
  • O d (i, j) represents the discriminator output of a specific spatial grid.
  • the new regularization method is the Attentive DropBlock algorithm, which has a dynamic coefficient γ, as shown below:
  • the parameters keep_prob and block_size affect the frequency and range of the feature map being set to zero
  • σ represents the sigmoid function, which is used to control the response range
  • α represents the response amplification factor
  • the Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage. If the model is being fine-tuned, the channel response f_C and spatial response f_S of the few-sample category detection branch are obtained; then the parameter γ is calculated according to the parameters keep_prob, block_size and α, and each spatial position of each channel feature is set to zero with a probability that follows a Bernoulli distribution with parameter γ; finally, with each zeroed position as the center, a mask block whose length and width are block_size is constructed, so as to regularize the model.
  • in step S5, training and testing are performed on the PASCAL VOC and MS COCO data sets
  • the training set and the validation set are first merged into one set for training the detection model, and then the test set is selected for testing.
  • the test evaluation standard adopts the Intersection over Union (IoU) threshold of 0.5
  • the mean Average Precision (mAP) i.e. mAP@50
  • the average number of frames per second (mean Frames Per Second, mFPS) of multiple different small sample collections represent the detection accuracy and speed of the detection model;
  • mAP i.e. AP
  • FPS frames per second
  • in step S5, stochastic gradient descent is used as the optimization method of the network model, the initial learning rate is 1×10^-3, and the mini-batch size is set to 16 on the different data sets; for the PASCAL VOC and MS COCO data sets, the number of training rounds for both training from scratch and fine-tuning is 300, and the CosineLR learning rate schedule (from 0.001 to 0.00001) is used during training; during prediction, the length and width of the input image are fixed at 448×448; FPS is measured from the sum of the waiting time to obtain each result and the time to post-process the results, and mFPS is the average FPS over the different few-sample sets.
  • the present invention proposes an Attentive DropBlock regularization method based on feature response to guide the model to pay attention to the overall characteristics of the object, which avoids over-fitting of the model in the fine-tuning stage, avoids dominance by local salient features, and enhances the generalization ability of the model. The present invention can not only achieve accurate detection of few-sample category objects with fewer model parameters, but also achieve real-time detection of related targets.
  • Figure 1 is an overall flow chart of the model proposed by the present invention.
  • Figure 2 is a visual comparison chart of DropBlock algorithm and Attentive DropBlock algorithm
  • Figure 3 is a diagram showing the visual detection results of large sample and small sample category objects by the model proposed by the present invention.
  • Figure 4 shows the response to the target and the visual detection results of the large-sample category detection branch and the few-sample category detection branch of the model proposed by the present invention.
  • a real-time detection method of few-sample targets based on transfer learning strategy includes the following steps:
  • the S1 specifically includes the following steps:
  • a multi-scale image training strategy (320, 352, 384, 416, 448, 480, 512, 544, 576 and 608), the MixUp data fusion strategy and the Label Smoothing label processing strategy are used to process the limited data, thereby improving the generalization performance of the detection model on the samples.
  • L box is the additive combination of the GIoU loss function of coordinate regression and the smooth L1 loss.
  • L cls and L obj are the Focal Loss function and the binary cross-entropy loss function respectively.
  • the backbone, the detection neck and the large-sample detection branch are frozen to maintain strong generalization ability, and only the few-sample detection branch, the SPP layer and its adjacent convolutional layers are trained.
  • many false positive bounding boxes are generated, resulting in low detection accuracy due to the similarity between objects in the two classes. Therefore, we randomly sample K instances from the corresponding data for each large-sample category, so that the few-shot detection branch predicts all categories of objects.
  • the large-sample category detection branch has strong generalization ability
  • the few-sample detection branch should learn this branch to obtain better generalization ability. Therefore, we establish the base class distillation loss L b between the two branches, and the calculation formula is as follows:
  • N represents the batch size.
  • λ is used to control the impact of base class distillation loss on model gradient update.
  • O d (i, j) represents the discriminator output of a specific spatial grid.
  • the present invention proposes an Attentive DropBlock algorithm.
  • This algorithm is affected not only by the parameters keep_prob and block_size, but also by the response of the model's semantic features.
  • the DropBlock algorithm sets a constant coefficient for all locations within the feature map, as follows:
  • γ is a dynamic coefficient that relies on the feature map response extracted in the Attentive DropBlock algorithm.
  • for a feature map $F \in \mathbb{R}^{B \times C \times H \times W}$, the global max pooling function is applied to each channel feature to obtain the response $f_C \in \mathbb{R}^{B \times C \times 1 \times 1}$
  • the global average pooling function yields the response $f_S \in \mathbb{R}^{B \times 1 \times H \times W}$. Therefore, the calculation formula of γ in the Attentive DropBlock algorithm is as follows:
  • σ represents the sigmoid function used to control the response range
  • α represents the response amplification factor
  • the Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage. If the model is being fine-tuned, the channel response f_C and spatial response f_S of the few-sample category detection branch are obtained. Then, after the parameter γ is calculated based on the two responses, keep_prob, block_size and α, each spatial position of each channel feature is set to zero with a probability that follows a Bernoulli distribution with parameter γ. Finally, with each zeroed position as the center, a mask block with a length and width of block_size is constructed to regularize the model.
  • Figure 2 shows the difference between DropBlock and Attentive DropBlock. It can be observed that the γ value in Attentive DropBlock is related to the target response. Feature maps that contain more target responses have higher γ values, which means that the detection model can better avoid being dominated by locally obvious features and pay more attention to less obvious features during training, thereby obtaining better few-sample target detection accuracy.
  • in S5, for the PASCAL VOC data set, three different data splits are obtained such that 15 categories are large-sample categories and the remaining 5 categories are few-sample categories (the first few-sample set includes bird, bus, cow, motorcycle and sofa; the second includes airplane, bottle, cow, horse and sofa; the third includes boat, cat, motorcycle, sheep and sofa); for the MS COCO data set, the 20 categories shared with the PASCAL VOC data set are few-sample categories, and the remaining 60 categories are large-sample categories.
  • the present invention uses stochastic gradient descent as the optimization method of the network model, the initial learning rate is 1×10^-3, and the mini-batch size is set to 16 on the different data sets. For these two data sets, the number of training rounds for both training from scratch and fine-tuning is 300, and the CosineLR learning rate schedule (from 0.001 to 0.00001) is used during training.
  • the length and width of the input image are fixed at 448×448.
  • the present invention compares the detection accuracy and detection speed of various few-sample target detection models proposed in recent years on the PASCAL VOC 2007 and MS COCO 2014 data sets.
  • the detection model of the present invention was evaluated on the challenging PASCAL VOC 2007 and MS COCO 2014 data sets according to the evaluation criteria specified in the PASCAL VOC and MS COCO data.
  • These two benchmark data contain training sets, validation sets and test sets.
  • the PASCAL VOC 2007 data set contains 20 target categories
  • the MS COCO 2014 data set contains 80 categories.
  • the present invention first combines the PASCAL VOC 2007 and PASCAL VOC 2012 training sets and verification sets into one set for training the detection model, and selects the PASCAL VOC 2007 test set for testing.
  • the test evaluation standard uses the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (i.e. mAP@50) and the mean Frames Per Second (mFPS) over multiple different few-sample sets to represent the detection accuracy and speed of the detection model.
  • mAP mean Average Precision
  • IoU Intersection over Union
  • mFPS mean Frames Per Second
  • the present invention only uses the MS COCO 2014 training set for training, and uses its verification set for verification in the test phase, using the mAP (i.e. AP) of IoU from 0.5 to 0.95 (interval is 0.05) and the number of transmission frames per second (Frames Per Second, FPS) represents the detection accuracy and speed of the detection model.

Abstract

The present invention relates to the field of image processing, and relates to a transfer learning strategy-based real-time few-shot object detection method, comprising the following steps: S1: constructing a detection network model; S2: preprocessing input data; S3: training an object detection model from scratch by using large-sample class data; S4: fine-tuning a few-shot class detection branch by using few-shot class data; and during the fine-tuning, using a new regularization method to guide the model to pay attention to a global feature of an object; and S5: training the detection model by means of a training set, and carrying out a test by using a test set. The present invention avoids overfitting of a model in a fine-tuning stage, avoids dominance by local salient features, and enhances the generalization capability of the model. The present invention not only can achieve accurate detection on few-shot class objects using fewer model parameters, but also can achieve real-time detection of related objects.

Description

A real-time few-shot object detection method based on a transfer learning strategy
Technical Field
The present invention belongs to the field of image processing and relates to a real-time few-shot object detection method based on a transfer learning strategy.
Background Art
Object detection is one of the most important and fundamental tasks in computer vision. Many detectors based on the Convolutional Neural Network (CNN) or the vision Transformer achieve high detection performance. However, this performance comes at the cost of large amounts of data. Because objects are complex and the models have a large number of parameters, detection accuracy drops rapidly when the amount of data is limited. Few-shot object detection has therefore received increasing attention in recent years.
To better adapt to scenarios with a limited number of samples, several few-shot object detection models based on meta-learning and transfer learning strategies already exist. Methods based on a meta-learning strategy aim to capture the correlation between the current image and the few-shot samples. Although they improve detection performance on few-shot classes, the computational complexity of the model grows considerably because of the feature extraction structure in the few-shot detection branch, the structure that relates input features to few-shot features, and the number of few-shot classes. Methods based on a transfer learning strategy aim to adapt a detection model that already has feature representation capability to few-shot objects. However, to improve detection accuracy, most of these methods focus on two-stage detection models such as Faster RCNN or Cascade RCNN. Because the images input to these models are large and the proposal boxes must be generated by a Region Proposal Network (RPN), such detection models are time-consuming at the inference stage.
Summary of the Invention
In view of this, the purpose of the present invention is to provide a dual-path real-time object detection model based on a transfer learning strategy. Darknet-53 combined with Spatial Pyramid Pooling (SPP) serves as the backbone and a Feature Pyramid Network (FPN) serves as the neck; they extract image features and provide semantic features at different scales, respectively. For the detection head, a dual-path detection branch with a discriminator is proposed: the large-sample class detection branch detects only large-sample class objects, while the few-shot class detection branch detects objects of all classes. After the two branches output their detection results in parallel, the discriminator scans both results and outputs the more suitable one according to a metric criterion. The main reason for the dual-path structure is that, when the model is trained on few samples, detection accuracy on large-sample class objects degrades, and the few-shot detection branch produces false-positive bounding boxes that actually belong to large-sample classes. In addition, the few-shot detection branch learns the prediction differences on large-sample classes from the large-sample detection branch through knowledge distillation, which improves the generalization ability of this branch. Finally, to avoid overfitting in the fine-tuning stage, the present invention proposes the Attentive DropBlock regularization method, based on feature response, to guide the model to focus on the overall features of the object, avoid being dominated by locally salient features, and enhance the generalization ability of the model.
To achieve the above purpose, the present invention provides the following technical solution:
A real-time few-shot object detection method based on a transfer learning strategy, including the following steps:
S1: Construct a detection network model;
S2: Preprocess the input data;
S3: Train the object detection model from scratch on large-sample class data;
S4: Fine-tune the few-shot class detection branch on few-shot class data; during fine-tuning, use a new regularization method to guide the model to focus on the overall features of the object;
S5: Train the detection model on the training set, then test it on the test set.
Further, the detection network model includes: a backbone network, which is Darknet-53 combined with Spatial Pyramid Pooling (SPP) and is used to extract image features; a detection neck network, which consists of a Feature Pyramid Network (FPN) and is used to provide semantic features at different scales to the detection head network; and a detection head network, which is a dual-path detection branch structure with a discriminator. In the detection head, the large-sample class detection branch is used only to detect objects of the large-sample classes, the few-shot class detection branch is used to detect objects of all classes, and the discriminator scans the results of the two branches in turn and obtains the final output according to a metric criterion.
Further, the preprocessing in step S2 is specifically: processing the limited data by using random affine transformation, a multi-scale image training strategy, the MixUp data fusion strategy and the Label Smoothing label processing strategy.
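The preprocessing strategies named in step S2 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the patented implementation: the function names, the smoothing factor `eps`, the Beta parameter `mix_alpha`, and the classification-style mixing of label vectors (detection pipelines often keep both images' box lists instead, and the text does not say which variant is used). Only the list of training scales (320 to 608 in steps of 32) comes from the detailed description.

```python
import random

# Multi-scale training sizes listed in the detailed description: 320, 352, ..., 608.
SCALES = list(range(320, 609, 32))


def smooth_labels(one_hot, eps=0.1):
    """Label Smoothing: move a fraction eps of the mass to a uniform prior.

    eps=0.1 is an illustrative default, not a value from the source.
    """
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


def mixup(x1, y1, x2, y2, lam):
    """MixUp: convex combination of two samples and of their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_training_setup(rng=random, mix_alpha=1.5):
    """Pick an input size and a MixUp weight for one training batch.

    mix_alpha is a hypothetical Beta-distribution parameter.
    """
    size = rng.choice(SCALES)
    lam = rng.betavariate(mix_alpha, mix_alpha)
    return size, lam
```

A seeded generator such as `random.Random(0)` can be passed as `rng` to make the sampled size and mixing weight reproducible.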
Further, in step S3, the backbone network is initialized with the weights trained on the ImageNet dataset, and the network model except the few-shot detection branch is trained from scratch on large-sample class data. The loss function at this stage involves the predicted box coordinates, the object confidence and the classification result:
$$L_{\text{base training}} = L_{\text{box}} + L_{\text{cls}} + L_{\text{obj}} \tag{1}$$
where $L_{\text{box}}$ is the sum of the GIoU loss and the smooth L1 loss for coordinate regression, and $L_{\text{cls}}$ and $L_{\text{obj}}$ are the Focal Loss and the binary cross-entropy loss, respectively.
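To make formula (1) concrete, the following sketch computes each term for a single predicted box. It is a simplified illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples with positive area, the GIoU loss is taken as `1 - GIoU`, and the Focal Loss parameters `gamma=2.0` and `alpha=0.25` as well as the smooth L1 `beta=1.0` are common defaults, not values given in the source. All function names are hypothetical.

```python
import math


def giou(a, b):
    """Generalized IoU of two boxes (x1, y1, x2, y2) with positive area."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 distance over the coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary Focal Loss for one probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)
    pt = p if y == 1 else 1.0 - p
    at = alpha if y == 1 else 1.0 - alpha
    return -at * (1.0 - pt) ** gamma * math.log(pt)


def base_training_loss(pred_box, gt_box, cls_p, cls_y, obj_p, obj_y):
    """L_box + L_cls + L_obj for one matched prediction, as in formula (1)."""
    l_box = (1.0 - giou(pred_box, gt_box)) + smooth_l1(pred_box, gt_box)
    l_cls = sum(focal(p, y) for p, y in zip(cls_p, cls_y))
    l_obj = bce(obj_p, obj_y)
    return l_box + l_cls + l_obj
```

In a real detector these terms would be computed over all grid cells and anchors of a batch; the per-box form is used here only to keep the structure of the sum visible.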
Further, in step S4, the model parameters of the backbone, the detection neck and the large-sample class detection branch are frozen, and only the few-shot class detection branch is fine-tuned. The loss function at this stage involves the predicted box coordinates, the object confidence, the classification result and the difference from the large-sample class detection branch.
Further, step S4 specifically includes the following steps:
S41: Establish a base-class distillation loss $L_b$, given by formula (2), between the large-sample class detection branch and the few-shot detection branch. In formula (2), N denotes the batch size, l denotes the absolute error function, and the two output terms denote the outputs of the i-th image from the large-sample detection branch and the few-shot class detection branch, respectively;
S42: The loss function for fine-tuning on few samples is:
$$L_{\text{few-shot tuning}} = L_{\text{box}} + 2L_{\text{cls}} + L_{\text{obj}} + \lambda \cdot L_b \tag{3}$$
where λ controls how strongly the base-class distillation loss affects the gradient update of the model;
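Steps S41 and S42 can be sketched as follows. The form of the distillation loss is an assumption reconstructed from the definitions in the text (batch size N, absolute error l, per-image outputs of the two branches): a batch mean of the mean absolute error between the two branches' predictions for the base (large-sample) classes. The function names and the flat-list representation of branch outputs are hypothetical; the weights in `few_shot_tuning_loss` follow formula (3).

```python
def base_class_distillation_loss(base_outputs, few_outputs):
    """Assumed form of L_b: batch mean of the mean absolute error per image.

    base_outputs[i] and few_outputs[i] are flat lists holding the two
    branches' predictions for the base classes on image i. The frozen
    large-sample branch acts as the teacher.
    """
    n = len(base_outputs)
    total = 0.0
    for o_base, o_few in zip(base_outputs, few_outputs):
        total += sum(abs(b - f) for b, f in zip(o_base, o_few)) / len(o_base)
    return total / n


def few_shot_tuning_loss(l_box, l_cls, l_obj, l_b, lam=1.0):
    """Formula (3): L_box + 2 * L_cls + L_obj + lambda * L_b.

    lam=1.0 is an illustrative default; the source does not give a value.
    """
    return l_box + 2.0 * l_cls + l_obj + lam * l_b
```

For example, with two images whose per-image absolute errors both average 0.25, `base_class_distillation_loss` returns 0.25, and a larger `lam` pulls the few-shot branch more strongly toward the frozen branch's base-class predictions.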
S43: A discriminator is added after the large-sample class detection branch and the few-shot detection branch. According to its metric criterion, the discriminator selects the maximum of the large-sample class detection branch result and the few-shot class detection branch result as the final output, where $O_d(i, j)$ denotes the discriminator output at a specific spatial grid cell.
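The discriminator in S43 can be pictured as a per-cell selection between the two branches. In this sketch each branch output is an H x W grid of `(score, label)` tuples and the cell with the higher score wins; which quantity the patent actually compares (for instance objectness, class probability, or their product) is not stated in this text, so the use of a single score is an assumption, as are the names.

```python
def discriminate(base_grid, few_grid):
    """Return O_d: for every grid cell (i, j), keep the higher-scoring result.

    base_grid and few_grid are H x W nested lists of (score, label) tuples
    from the large-sample branch and the few-shot branch. Ties go to the
    large-sample branch, which is an arbitrary choice made here.
    """
    merged = []
    for row_base, row_few in zip(base_grid, few_grid):
        merged.append([b if b[0] >= f[0] else f for b, f in zip(row_base, row_few)])
    return merged
```

Because the selection is a cheap element-wise comparison, it adds almost no inference cost on top of running the two branches in parallel.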
Further, the new regularization method is the Attentive DropBlock algorithm, which has a dynamic coefficient γ. In the expression for γ, the parameters keep_prob and block_size determine how often and over what range the feature map is set to zero, σ denotes the sigmoid function, which is used to control the response range, and α denotes the response amplification factor.
Further, the Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage. If it is, the channel response $f_C$ and the spatial response $f_S$ of the few-shot class detection branch are obtained. The parameter γ is then calculated from keep_prob, block_size and α, and each spatial position of each channel feature is set to zero with a probability that follows a Bernoulli distribution with parameter γ. Finally, a mask block whose height and width equal block_size is built around each zeroed position, thereby regularizing the model.
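The steps above can be sketched for a single image as follows. The exact expression for γ is not reproduced in this text, so the sketch assumes one plausible form: the constant DropBlock coefficient scaled by σ(α · f_C · f_S). The channel response is taken as the per-channel global maximum and the spatial response as the per-position mean over channels, following the tensor shapes given for f_C and f_S in the detailed description. The function names, default values, clipping of blocks at the border and the omission of DropBlock's usual rescaling of kept activations are all simplifications made here.

```python
import math
import random


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-x))


def base_gamma(keep_prob, block_size, h, w):
    """Constant coefficient of the original DropBlock for an h x w map."""
    valid = (h - block_size + 1) * (w - block_size + 1)
    return (1.0 - keep_prob) / block_size ** 2 * (h * w) / valid


def attentive_dropblock(feat, keep_prob=0.9, block_size=3, alpha=1.0,
                        fine_tuning=True, rng=random):
    """feat: C x H x W nested lists for one image. Returns a masked copy."""
    out = [[row[:] for row in ch] for ch in feat]
    if not fine_tuning:  # the algorithm is only active during fine-tuning
        return out
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Channel response f_C: global max pooling over each channel.
    f_c = [max(max(row) for row in ch) for ch in feat]
    # Spatial response f_S: average over channels at each position.
    f_s = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    g0 = base_gamma(keep_prob, block_size, H, W)
    half = block_size // 2
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # ASSUMED dynamic coefficient; the patent's formula is not shown here.
                gamma = min(1.0, g0 * sigmoid(alpha * f_c[c] * f_s[i][j]))
                if rng.random() < gamma:  # Bernoulli(gamma) draw
                    # Zero a block_size x block_size block around (i, j).
                    for y in range(max(0, i - half), min(H, i - half + block_size)):
                        for x in range(max(0, j - half), min(W, j - half + block_size)):
                            out[c][y][x] = 0.0
    return out
```

Under this assumed form, positions and channels with stronger responses receive a larger γ and are dropped more often, which matches the stated intent of pushing the model away from locally salient features.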
Further, in step S5, training and testing are performed on the PASCAL VOC and MS COCO datasets;
For the PASCAL VOC dataset, the training set and the validation set are first merged into one set for training the detection model, and the test set is then used for testing. The evaluation uses the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (i.e. mAP@50) and the mean Frames Per Second (mFPS) over multiple different few-shot sets to represent the detection accuracy and speed of the detection model;
For the MS COCO dataset, only the training set is used for training and the validation set is used for validation. The mAP over IoU thresholds from 0.5 to 0.95 at intervals of 0.05 (i.e. AP) and the Frames Per Second (FPS) represent the detection accuracy and speed of the detection model.
Further, during training in step S5, stochastic gradient descent is used to optimize the network model, with an initial learning rate of 1×10^-3 and a mini-batch size of 16 on every dataset. For both the PASCAL VOC and MS COCO datasets, the number of training rounds is 300 for training from scratch and for fine-tuning alike, and a CosineLR learning-rate schedule (from 0.001 to 0.00001) is used during training. At prediction time, the input image size is fixed at 448×448. FPS is computed from the sum of the waiting time to obtain each result and the time to post-process it, and mFPS is the mean FPS over the different few-shot sets.
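The speed metrics above can be sketched as follows. This is a minimal illustration under an assumption the text does not state explicitly: FPS is taken as the reciprocal of the per-image waiting time plus post-processing time, both in seconds. The function names `fps` and `mean_fps` are hypothetical.

```python
def fps(latency_s, postprocess_s):
    """Frames per second from per-image waiting and post-processing time (seconds).

    Assumption: FPS is the reciprocal of the summed per-image time.
    """
    total = latency_s + postprocess_s
    if total <= 0:
        raise ValueError("total per-image time must be positive")
    return 1.0 / total


def mean_fps(per_set_times):
    """mFPS: average of the FPS values measured on several few-shot sets.

    per_set_times is a list of (latency_s, postprocess_s) pairs, one per set.
    """
    values = [fps(latency, post) for latency, post in per_set_times]
    return sum(values) / len(values)
```

For example, `mean_fps` would be called with one timing pair per PASCAL VOC few-shot set.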
The beneficial effects of the present invention are as follows: the invention proposes Attentive DropBlock, a regularization method based on feature responses, which guides the model to attend to the overall features of an object, prevents overfitting in the fine-tuning stage, keeps the model from being dominated by locally salient features, and strengthens its generalization ability. The invention can therefore detect few-shot category objects accurately with a small number of model parameters and can detect the relevant targets in real time.
Other advantages, objects and features of the present invention will be set forth in part in the description that follows and, in part, will become apparent to those skilled in the art upon examination of the following, or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained through the following description.
Description of the Drawings
To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:
Figure 1 is the overall flow chart of the model proposed by the present invention;
Figure 2 is a visual comparison of the DropBlock algorithm and the Attentive DropBlock algorithm;
Figure 3 shows the visualized detection results of the proposed model on large-sample and few-shot category objects;
Figure 4 shows the responses to targets, and the visualized detection results, of the large-sample category detection branch and the few-shot category detection branch of the proposed model.
Detailed Description
The embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that the illustrations provided in the following embodiments only explain the basic concept of the invention schematically, and that the following embodiments and their features can be combined with each other where they do not conflict.
The drawings are for illustration only; they are schematic diagrams rather than physical drawings and shall not be construed as limiting the invention. To better illustrate the embodiments, some components in the drawings may be omitted, enlarged or reduced and do not represent the size of an actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
Identical or similar reference numerals in the drawings of the embodiments denote identical or similar components. In the description of the invention, terms such as "upper", "lower", "left", "right", "front" and "rear" indicate orientations or positional relationships based on those shown in the drawings. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and shall not be construed as limiting the invention; those of ordinary skill in the art can understand their specific meaning according to the circumstances.
Referring to Figures 1 to 4, a real-time few-shot object detection method based on a transfer learning strategy comprises the following steps:
S1: preprocess the input data;
S2: train the object detection model (except the few-shot detection branch) from scratch on large-sample category data;
S3: fine-tune the few-shot category detection branch on few-shot category data;
S4: introduce a new regularization method in the fine-tuning stage to guide the model to attend to the overall features of an object;
S5: conduct experiments on the natural-image datasets PASCAL VOC 2007 and MS COCO 2014.
Optionally, S1 specifically comprises the following:
The limited data are processed using random affine transformations, a multi-scale image training strategy (320, 352, 384, 416, 448, 480, 512, 544, 576 and 608), the MixUp data fusion strategy and the Label Smoothing label processing strategy, thereby improving the detection model's generalization to the samples.
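A minimal NumPy sketch of three of the preprocessing ingredients named above. The smoothing strength `eps=0.1`, the mixing weight `lam` and all function names are assumptions made for illustration; the text gives only the list of training sizes. In a detector, MixUp would also keep the bounding boxes of both source images, which this sketch leaves out.

```python
import numpy as np


def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move a fraction eps of the mass to a uniform distribution."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / num_classes


def mixup(img_a, img_b, lam):
    """Blend two images of identical shape with weight lam in [0, 1]."""
    return lam * img_a + (1.0 - lam) * img_b


def pick_train_size(rng):
    """Draw one of the multi-scale training sizes 320, 352, ..., 608."""
    sizes = np.arange(320, 609, 32)
    return int(rng.choice(sizes))
```

Smoothing a one-hot vector over four classes with `eps=0.1` leaves 0.925 on the true class and 0.025 on each of the others.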
Optionally, in S2, to give the model a strong ability to represent targets, the entire network except the few-shot detection branch is trained from scratch on large-sample category data. The loss function for training the whole network in this first stage is therefore:
$$L_{\text{base training}} = L_{\text{box}} + L_{\text{cls}} + L_{\text{obj}} \tag{1}$$
where $L_{\text{box}}$, the coordinate regression loss, is the sum of the GIoU loss and the smooth L1 loss, and $L_{\text{cls}}$ and $L_{\text{obj}}$ are the Focal Loss and the binary cross-entropy loss, respectively.
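The GIoU term of the box loss can be illustrated with a short worked example. This sketch follows the standard definition of Generalized IoU for axis-aligned boxes; the `(x1, y1, x2, y2)` box format and the function names are assumptions, and how the patent weights GIoU against smooth L1 is not stated.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2) with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Area of the smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """GIoU loss: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: two unit squares separated by a gap of one unit give a GIoU of -1/3 rather than 0.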
Optionally, in S3, during few-shot fine-tuning, the backbone, the detection neck and the large-sample detection branch are frozen to preserve their strong generalization ability, and only the few-shot detection branch, the SPP layer and its adjacent convolutional layers are trained. However, if only novel-class objects are used, the similarity between objects of the two kinds of categories produces many false-positive bounding boxes and lowers detection accuracy. We therefore randomly sample K instances per large-sample category from the corresponding data, so that the few-shot detection branch predicts objects of all categories. In addition, because the large-sample category detection branch generalizes well, the few-shot detection branch should learn from it. We therefore establish a base-class distillation loss $L_b$ between the two branches, in which N denotes the batch size, l is the sum of absolute errors, and the two arguments of l are the outputs for the i-th image from the large-sample detection branch and the few-shot detection branch, respectively. The loss function for fine-tuning on the few-shot data can thus be summarized as:
$$L_{\text{few-shot tuning}} = L_{\text{box}} + 2L_{\text{cls}} + L_{\text{obj}} + \lambda \cdot L_b \tag{3}$$
where λ controls how strongly the base-class distillation loss affects the model's gradient updates.
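The fine-tuning objective can be sketched as below. The form of the distillation term is an assumption: it is taken as the batch mean of the summed absolute error between the two branches' outputs, which matches the verbal description (N is the batch size, l a sum of absolute errors) but is not copied from the patent's formula. The function names and the assumption that both outputs share one shape are hypothetical.

```python
import numpy as np


def base_distillation_loss(out_base, out_few):
    """Batch mean of the summed absolute error between the two branch outputs.

    Both arrays have shape (N, ...), with N the batch size (assumed layout).
    """
    n = out_base.shape[0]
    per_image = np.abs(out_base - out_few).reshape(n, -1).sum(axis=1)
    return float(per_image.mean())


def few_shot_tuning_loss(l_box, l_cls, l_obj, l_b, lam):
    """Equation (3): L_box + 2 * L_cls + L_obj + lambda * L_b."""
    return l_box + 2.0 * l_cls + l_obj + lam * l_b
```

The factor 2 on the classification term comes directly from Equation (3); the value of λ is not given in the text and has to be chosen by the user.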
In the inference stage, the two parallel branches jointly detect objects. However, analyzing the outputs of both branches at the same time would considerably lengthen inference. We therefore add a discriminator after the two branches to select the more likely of the two outputs. Specifically, the discriminator takes the maximum of the large-sample category detection branch result and the few-shot category detection branch result as the final output, and $O_d(i, j)$ denotes the discriminator output at a particular spatial grid cell.
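One way to read the discriminator is as a per-cell selection between the two branches. The sketch below is an interpretation, not the patent's exact criterion: it assumes each branch emits a score map of shape (H, W) and a prediction tensor of shape (H, W, D), and keeps, for every grid cell (i, j), the prediction of the branch with the higher score. All names are hypothetical.

```python
import numpy as np


def discriminate(pred_base, score_base, pred_few, score_few):
    """Keep, per grid cell, the prediction of the higher-scoring branch.

    pred_*: (H, W, D) prediction tensors; score_*: (H, W) confidence maps.
    """
    use_few = score_few > score_base                  # (H, W) boolean selector
    merged_score = np.maximum(score_base, score_few)  # element-wise maximum
    merged_pred = np.where(use_few[..., None], pred_few, pred_base)
    return merged_pred, merged_score
```

Because the selection is a single element-wise comparison, it adds little to inference time compared with post-processing both branches separately.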
Optionally, in S4, to further improve the model's generalization to few-shot categories, the present invention proposes the Attentive DropBlock algorithm, which is influenced not only by the parameters keep_prob and block_size but also by the model's response to semantic features. Specifically, the DropBlock algorithm sets a constant coefficient γ for all positions in the feature map, and its parameters keep_prob and block_size control how often and over how large a region features are set to zero. In Attentive DropBlock, by contrast, γ is a dynamic coefficient that depends on the responses of the extracted feature map. Consider a feature map $F \in \mathbb{R}^{B \times C \times H \times W}$: applying global max pooling to each channel yields the response $f_C \in \mathbb{R}^{B \times C \times 1 \times 1}$, and applying global average pooling at each spatial coordinate yields the response $f_S \in \mathbb{R}^{B \times 1 \times H \times W}$. In the resulting expression for γ, σ denotes the sigmoid function, which bounds the response range, and α denotes the response amplification factor.
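The quantities described above can be made concrete with a short NumPy sketch. The constant coefficient uses the formula from the original DropBlock paper (Ghiasi et al., 2018), assuming a square feature map of side `feat_size`. The two pooling helpers follow the shapes stated in the text. The sketch deliberately does not reproduce the dynamic γ of Attentive DropBlock, whose formula is not recoverable from this text; function names are hypothetical.

```python
import numpy as np


def dropblock_gamma(keep_prob, block_size, feat_size):
    """Constant drop rate of the original DropBlock for a square feature map."""
    valid = feat_size - block_size + 1  # positions where a full block fits
    return (1.0 - keep_prob) / block_size**2 * feat_size**2 / valid**2


def channel_response(feat):
    """f_C: global max pooling over H and W, shape (B, C, 1, 1)."""
    return feat.max(axis=(2, 3), keepdims=True)


def spatial_response(feat):
    """f_S: average over channels at each spatial position, shape (B, 1, H, W)."""
    return feat.mean(axis=1, keepdims=True)
```

The two responses broadcast against each other to the full (B, C, H, W) shape, which is what allows a per-channel, per-position coefficient.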
The Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage. If it is, the channel response $f_C$ and the spatial response $f_S$ of the few-shot category detection branch are obtained. Next, the parameter γ is computed from the two responses together with keep_prob, block_size and α, and each spatial position of each channel's features is set to zero with a probability given by a Bernoulli distribution with parameter γ. Finally, a mask block whose height and width equal block_size is built around each zeroed position, thereby regularizing the model.
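The last two steps, Bernoulli sampling followed by block expansion, can be sketched as follows. The per-position drop probabilities `gamma` are taken as a given (B, C, H, W) array, since the dynamic γ formula is not reproduced here, and an odd block_size is assumed so that each block is centred on its seed. The function name `block_mask` is hypothetical.

```python
import numpy as np


def block_mask(gamma, block_size, rng):
    """Sample Bernoulli(gamma) seeds and zero a block around each seed.

    gamma: (B, C, H, W) drop probabilities; returns a float mask (0 = dropped).
    Assumes an odd block_size so blocks are centred on their seed.
    """
    _, _, h, w = gamma.shape
    seeds = rng.random(gamma.shape) < gamma  # Bernoulli draw per position
    pad = block_size // 2
    padded = np.pad(seeds, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    dropped = np.zeros_like(seeds)
    # A position is dropped if any seed lies inside the window centred on it.
    for dy in range(block_size):
        for dx in range(block_size):
            dropped |= padded[:, :, dy:dy + h, dx:dx + w]
    return (~dropped).astype(np.float32)
```

The mask is applied by multiplying it with the feature map during fine-tuning only. The original DropBlock also rescales the surviving activations; whether the patented variant does so is not stated in the text.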
Figure 2 shows the difference between DropBlock and Attentive DropBlock. It can be observed that the γ value in Attentive DropBlock is related to the target response. Feature maps containing more target responses have higher γ values, which means the detection model is less likely to be dominated by locally salient features and pays more attention to less salient features during training, yielding better few-shot object detection accuracy.
Optionally, in S5, for the PASCAL VOC dataset, 15 categories are treated as large-sample categories and the remaining 5 as few-shot categories, giving three different data splits (the first few-shot set contains bird, bus, cow, motorbike and sofa; the second contains aeroplane, bottle, cow, horse and sofa; the third contains boat, cat, motorbike, sheep and sofa). For the MS COCO dataset, the 20 categories shared with PASCAL VOC are the few-shot categories and the remaining 60 are large-sample categories. During training, stochastic gradient descent is used to optimize the network model, with an initial learning rate of 1×10^-3 and a mini-batch size of 16 on every dataset. For both datasets, the number of training rounds is 300 for training from scratch and for fine-tuning alike, and a CosineLR learning-rate schedule (from 0.001 to 0.00001) is used during training. At prediction time, the input image size is fixed at 448×448.
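The learning-rate schedule can be sketched with the usual cosine annealing formula. Two assumptions are made here: the "300" in the text counts epochs, and the rate is updated once per epoch. The text states only the start and end values (0.001 and 0.00001), and the function name is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr_max=1e-3, lr_min=1e-5):
    """Cosine annealing from lr_max at epoch 0 to lr_min at the last epoch."""
    progress = epoch / (total_epochs - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The rate falls slowly at first, fastest around the middle of training, and flattens out again near the end.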
Experimental Results
In this example, the detection accuracy and speed of the present invention are compared with those of several few-shot object detection models proposed in recent years on the PASCAL VOC 2007 and MS COCO 2014 datasets. Specifically, the detection model is evaluated on the challenging PASCAL VOC 2007 and MS COCO 2014 datasets according to the evaluation criteria specified for PASCAL VOC and MS COCO. Both benchmarks contain training, validation and test sets; PASCAL VOC 2007 has 20 object categories and MS COCO 2014 has 80. For the former, the training and validation sets of PASCAL VOC 2007 and PASCAL VOC 2012 are first merged into a single set used to train the detection model, and the PASCAL VOC 2007 test set is used for testing. Detection accuracy is reported as the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (i.e. mAP@50), and detection speed as the mean Frames Per Second (mFPS) over several different few-shot sets. For the latter, only the MS COCO 2014 training set is used for training and its validation set is used for evaluation in the test phase. Detection accuracy is reported as the mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (i.e. AP), and detection speed as Frames Per Second (FPS).
Table 1

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions may be modified or replaced with equivalents without departing from their purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims (10)

  1. A real-time few-shot object detection method based on a transfer learning strategy, characterized by comprising the following steps:
    S1: build a detection network model;
    S2: preprocess the input data;
    S3: train the object detection model from scratch on large-sample category data;
    S4: fine-tune the few-shot category detection branch on few-shot category data, applying a new regularization method during fine-tuning to guide the model to attend to the overall features of an object;
    S5: train the detection model on the training set and then test it on the test set.
  2. The real-time few-shot object detection method based on a transfer learning strategy according to claim 1, characterized in that the detection network model comprises: a backbone network, namely Darknet-53 combined with a spatial pyramid pooling layer, for extracting image features; a detection neck network, formed by a feature pyramid network, for providing semantic features at different scales to the detection head network; and a detection head network with a dual-path detection branch structure and a discriminator, in which the large-sample category detection branch detects only targets of the large-sample categories, the few-shot category detection branch detects targets of all categories, and the discriminator scans the results of the two branches in turn and obtains the final output according to a metric criterion.
  3. The real-time few-shot object detection method based on a transfer learning strategy according to claim 1, characterized in that the preprocessing in step S2 specifically comprises processing the limited data using random affine transformations, a multi-scale image training strategy, the MixUp data fusion strategy and the Label Smoothing label processing strategy.
  4. The real-time few-shot object detection method based on a transfer learning strategy according to claim 2, characterized in that, in step S3, the backbone network is initialized with weights trained on the ImageNet dataset, and the network model except the few-shot detection branch is trained from scratch on large-sample category data; the loss function at this stage covers the predicted box coordinates, the object confidence and the classification result, and is:
    $$L_{\text{base training}} = L_{\text{box}} + L_{\text{cls}} + L_{\text{obj}} \tag{1}$$
    where $L_{\text{box}}$, the coordinate regression loss, is the sum of the GIoU loss and the smooth L1 loss, and $L_{\text{cls}}$ and $L_{\text{obj}}$ are the Focal Loss and the binary cross-entropy loss, respectively.
  5. The real-time few-shot object detection method based on a transfer learning strategy according to claim 2, characterized in that, in step S4, the model parameters of the backbone, the detection neck and the large-sample category detection branch are frozen and only the few-shot category detection branch is fine-tuned; the loss function at this stage covers the predicted box coordinates, the object confidence, the classification result and the degree of difference from the large-sample category detection branch.
  6. The real-time few-shot object detection method based on a transfer learning strategy according to claim 5, characterized in that step S4 specifically comprises the following steps:
    S41: a base-class distillation loss $L_b$ is established between the large-sample category detection branch and the few-shot detection branch, in which N denotes the batch size, l denotes the absolute error function, and the two arguments of l are the outputs for the i-th image from the large-sample detection branch and the few-shot category detection branch, respectively;
    S42: the loss function for fine-tuning on the few-shot data is:
    $$L_{\text{few-shot tuning}} = L_{\text{box}} + 2L_{\text{cls}} + L_{\text{obj}} + \lambda \cdot L_b \tag{3}$$
    where λ controls how strongly the base-class distillation loss affects the model's gradient updates;
    S43: a discriminator is added after the large-sample category detection branch and the few-shot detection branch, and the discriminator takes the maximum of the large-sample category detection branch result and the few-shot category detection branch result as the final output, where $O_d(i, j)$ denotes the discriminator output at a particular spatial grid cell.
  7. The real-time few-shot object detection method based on a transfer learning strategy according to claim 1, characterized in that the new regularization method is the Attentive DropBlock algorithm, whose drop coefficient γ is dynamic; in the expression for γ, the parameters keep_prob and block_size control how often and over how large a region the feature map is set to zero, σ denotes the sigmoid function, which bounds the response range, and α denotes the response amplification factor.
  8. The real-time few-shot object detection method based on a transfer learning strategy according to claim 7, characterized in that the Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage and, if it is, obtains the channel response $f_C$ and the spatial response $f_S$ of the few-shot category detection branch; next, the parameter γ is computed from keep_prob, block_size and α, and each spatial position of each channel's features is set to zero with a probability given by a Bernoulli distribution with parameter γ; finally, a mask block whose height and width equal block_size is built around each zeroed position, thereby regularizing the model.
  9. The real-time few-shot object detection method based on a transfer learning strategy according to claim 1, characterized in that, in step S5, training and testing are performed on the PASCAL VOC and MS COCO datasets;
    for the PASCAL VOC dataset, the training and validation sets are first merged into a single set used to train the detection model, and the test set is then used for testing; detection accuracy is reported as the mean Average Precision at an Intersection over Union threshold of 0.5, and detection speed as the mean frames per second over several different few-shot sets;
    for the MS COCO dataset, only the training set is used for training and the validation set is used for evaluation; detection accuracy is reported as the mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05, and detection speed as frames per second.
  10. The real-time few-shot object detection method based on a transfer learning strategy according to claim 9, characterized in that, during training in step S5, stochastic gradient descent is used to optimize the network model, with an initial learning rate of 1×10^-3 and a mini-batch size of 16 on every dataset; for both the PASCAL VOC and MS COCO datasets, the number of training rounds is 300 for training from scratch and for fine-tuning alike, and a CosineLR learning-rate schedule is used during training, with the learning rate going from 0.001 to 0.00001; at prediction time, the input image size is fixed at 448×448; FPS is computed from the sum of the waiting time to obtain each result and the time to post-process it, and mFPS is the mean FPS over the different few-shot sets.
PCT/CN2023/086781 2022-08-11 2023-04-07 Transfer learning strategy-based real-time few-shot object detection method WO2024032010A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210962295.5 2022-08-11
CN202210962295.5A CN115393634B (en) 2022-08-11 2022-08-11 Small sample target real-time detection method based on migration learning strategy

Publications (1)

Publication Number Publication Date
WO2024032010A1 true WO2024032010A1 (en) 2024-02-15

Family

ID=84118843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/086781 WO2024032010A1 (en) 2022-08-11 2023-04-07 Transfer learning strategy-based real-time few-shot object detection method

Country Status (2)

Country Link
CN (1) CN115393634B (en)
WO (1) WO2024032010A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393634B (en) * 2022-08-11 2023-12-26 重庆邮电大学 Small sample target real-time detection method based on migration learning strategy

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615016A (en) * 2018-12-20 2019-04-12 北京理工大学 A kind of object detection method of the convolutional neural networks based on pyramid input gain
CN110674866A (en) * 2019-09-23 2020-01-10 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN111223553A (en) * 2020-01-03 2020-06-02 大连理工大学 Two-stage deep migration learning traditional Chinese medicine tongue diagnosis model
AU2020100705A4 (en) * 2020-05-05 2020-06-18 Chang, Jiaying Miss A helmet detection method with lightweight backbone based on yolov3 network
US20220067335A1 (en) * 2020-08-26 2022-03-03 Beijing University Of Civil Engineering And Architecture Method for dim and small object detection based on discriminant feature of video satellite data
CN114663729A (en) * 2022-03-29 2022-06-24 南京工程学院 Cylinder sleeve small sample defect detection method based on meta-learning
CN115393634A (en) * 2022-08-11 2022-11-25 重庆邮电大学 Transfer learning strategy-based small-sample target real-time detection method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth
CN109977812B (en) * 2019-03-12 2023-02-24 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN113971815A (en) * 2021-10-28 2022-01-25 西安电子科技大学 Small sample target detection method based on singular value decomposition characteristic enhancement
CN114841257B (en) * 2022-04-21 2023-09-22 北京交通大学 Small sample target detection method based on self-supervision comparison constraint

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GHIASI GOLNAZ, TSUNG-YI LIN, LE QUOC V: "Dropblock: A regularization method for convolutional networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, ITHACA, 30 October 2018 (2018-10-30), XP093137589, [retrieved on 20240305], DOI: 10.48550/arXiv.1810.12890 *
XIA RUIYANG; LI GUOQUAN; HUANG ZHENGWEN; MENG HONGYING; PANG YU: "Bi-path Combination YOLO for Real-time Few-shot Object Detection", PATTERN RECOGNITION LETTERS, ELSEVIER, AMSTERDAM, NL, vol. 165, 1 December 2022 (2022-12-01), pages 91-97, XP087247996, ISSN: 0167-8655, DOI: 10.1016/j.patrec.2022.11.025 *

Also Published As

Publication number Publication date
CN115393634A (en) 2022-11-25
CN115393634B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
CN109961034B (en) Video target detection method based on convolution gating cyclic neural unit
WO2019228317A1 (en) Face recognition method and device, and computer readable medium
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN107463920A (en) A face recognition method that eliminates the influence of partial occlusions
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN109711422A (en) Image real time transfer, the method for building up of model, device, computer equipment and storage medium
CN111860587B (en) Detection method for small targets of pictures
Gao et al. YOLOv4 object detection algorithm with efficient channel attention mechanism
CN110348447A (en) A multi-model integrated object detection method with rich spatial information
CN115661943A (en) Fall detection method based on lightweight attitude assessment network
CN113052039B (en) Method, system and server for detecting pedestrian density of traffic network
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
WO2024032010A1 (en) Transfer learning strategy-based real-time few-shot object detection method
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN113205103A (en) Lightweight tattoo detection method
CN110163130B (en) Feature pre-alignment random forest classification system and method for gesture recognition
CN116580322A (en) Unmanned aerial vehicle infrared small target detection method under ground background
CN115564983A (en) Target detection method and device, electronic equipment, storage medium and application thereof
Chen et al. Ship Detection with Optical Image Based on Attention and Loss Improved YOLO
Jeevanantham et al. Deep Learning Based Plant Diseases Monitoring and Detection System
Tu et al. Toward automatic plant phenotyping: starting from leaf counting
JP7239002B2 (en) OBJECT NUMBER ESTIMATING DEVICE, CONTROL METHOD, AND PROGRAM
CN111401286B (en) Pedestrian retrieval method based on component weight generation network
CN113887455A (en) Face mask detection system and method based on improved FCOS
Qu et al. Visual tracking with genetic algorithm augmented logistic regression

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23851240

Country of ref document: EP

Kind code of ref document: A1