WO2023109208A1 - Few-shot object detection method and apparatus (小样本目标检测方法及装置) - Google Patents

Few-shot object detection method and apparatus (小样本目标检测方法及装置)

Info

Publication number
WO2023109208A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
learning
features
backbone network
different
Prior art date
Application number
PCT/CN2022/117896
Other languages
English (en)
French (fr)
Inventor
欧中洪
杨峻伟
康霄阳
范家伟
于勰
宋美娜
Original Assignee
北京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京邮电大学 (Beijing University of Posts and Telecommunications)
Priority to US18/551,919 (published as US20240177462A1)
Publication of WO2023109208A1 publication Critical patent/WO2023109208A1/zh

Classifications

    • G06V 10/7753: Generating sets of training patterns; incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/084: Learning methods for neural networks; backpropagation, e.g. using gradient descent
    • G06V 10/26: Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning; using neural networks

Definitions

  • The present disclosure relates to the field of deep learning, and in particular to a few-shot object detection method and apparatus.
  • Object detection is one of the research branches of computer vision; its main task is to classify and localize objects in images.
  • General object detection falls into two mainstream branches: two-stage detection networks and single-stage detection networks. Both require pre-training on large labeled datasets, which limits them to a narrow range of scenes; when migrated to a scene where labeled data is scarce, detection accuracy tends to drop.
  • The present disclosure aims to solve, at least to a certain extent, one of the technical problems in the related art.
  • To this end, the embodiment of the first aspect of the present disclosure proposes a few-shot object detection method, including:
  • sending the weights of a backbone network and the weights of a feature pyramid to a detection network, where the weights of the backbone network and the weights of the feature pyramid are derived from a visual representation backbone network generated by self-supervised training;
  • generating candidate regions, where the candidate regions come from the foreground-background classification and regression results produced by a region proposal network on the output features of the visual representation backbone network;
  • generating, according to the candidate regions, candidate-region features of a uniform size by means of a pooling operator, and performing position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features;
  • using the fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions, where the fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions;
  • forming a loss function according to the strategy in the fine-grained feature mining, and updating the detection network parameters through calculation of the loss function.
  • Generating the visual representation backbone network by self-supervised training includes: acquiring unlabeled raw image data; applying different data augmentations to the raw data to obtain image data of multiple corresponding views; and inputting the image data of each view into multiple backbone networks for contrastive learning between features of different granularities, generating the visual representation backbone network according to the contrastive learning of global features and local features.
  • This contrastive learning between views proceeds in three stages:
  • constructing unsupervised data pseudo-labels, where views generated from the same unlabeled raw image through different data augmentations are assigned the same pseudo-label after feature extraction by the multiple backbone networks, and views from different unlabeled images are assigned different pseudo-labels;
  • pre-training fine-grained features, where the pseudo-labeled unsupervised data presents different feature granularities at different levels of the feature pyramid, and a unified output is formed through pooling;
  • generating the visual representation backbone network, which includes contrastive learning jointly over global features and local features, using the contrastive-learning loss function to update the model parameters, and finally producing a backbone network capable of representing objects at different scales.
  • The multiple backbone networks include:
  • a first backbone network, used for parameter updates through backpropagation;
  • a second backbone network, for which the gradients of all parameters are discarded and the parameters are updated through momentum update.
  • The loss function adopts InfoNCE, and the overall loss comes from global feature learning and local feature learning. The specific formula is as follows:

$$L_{global}=-\log\frac{\exp(q_g\cdot k_{g+}/\tau)}{\sum_{i=0}^{K}\exp(q_g\cdot k_{gi}/\tau)},\qquad L_{local}=-\log\frac{\exp(q_l\cdot k_{l+}/\tau)}{\sum_{i=0}^{K}\exp(q_l\cdot k_{li}/\tau)},\qquad L_{total}=L_{global}+L_{local}$$

  • Here q_g denotes the first global feature learning result, k_g+ the second global feature learning result, and k_gi the third global feature learning result; q_l, k_l+, and k_li denote the corresponding local feature learning results; τ denotes the temperature hyperparameter; L_total denotes the overall loss, L_global the loss of global feature learning, and L_local the loss of local feature learning.
  • The whole few-shot object detection method contains three loss functions in total: the category loss function L_cls, the prediction-box regression loss function L_bbox, and the sample-feature-mining loss function L_feat. The overall loss is a mixture of the three loss functions in equal proportion, with the formula:

$$L_{total}=L_{cls}+L_{bbox}+L_{feat}$$
  • Generating the candidate regions further includes:
  • embedding the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
  • The embodiment of the second aspect of the present disclosure proposes a few-shot object detection apparatus, including:
  • a sending module, configured to send the weights of the backbone network and the weights of the feature pyramid to a basic detection network, where the weights are derived from the visual representation backbone network;
  • a first generation module, configured to generate candidate regions, which come from the foreground-background classification and regression results produced by the region proposal network on the output features of the visual representation backbone network;
  • a second generation module, configured to generate candidate-region features of a uniform size by means of a pooling operator according to the candidate regions, and to perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features;
  • a learning module, configured to use the fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions, where the fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions;
  • an update module, configured to form a loss function according to the strategy in the fine-grained feature mining, and to update the detection network parameters through calculation of the loss function.
  • The embodiment of the third aspect of the present application proposes a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
  • The embodiment of the fourth aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
  • The embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program or instructions; when the computer program or instructions are executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
  • In summary, the present disclosure proposes a few-shot object detection method, apparatus, computer device, and non-transitory computer-readable storage medium. By adding a sample-feature-mining learning module, richer features are provided for few-shot learning; through intra-sample and inter-sample feature mining, the attention to fine-grained features of intra-class samples, the ability to represent categories in the feature space, and the detection accuracy of the model are all improved.
  • FIG. 1 is a schematic flowchart of a few-shot object detection method provided by an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of the overall flow of self-supervised training provided by an embodiment of the present disclosure;
  • FIG. 3 is a schematic diagram of the overall flow of few-shot object detection provided by an embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of an intra-sample feature mining learning method provided by an embodiment of the present disclosure;
  • FIG. 5 is a schematic structural diagram of a few-shot object detection apparatus provided by an embodiment of the present disclosure.
  • In the related art, few-shot learning has gradually attracted the attention of academia and industry.
  • Few-shot learning aims to mine the latent features of samples from a small amount of labeled data, and then to fit a robust feature space from that small amount of data in order to complete the relevant visual perception tasks.
  • Few-shot object detection, one of the important branches of few-shot learning, aims to classify and localize the corresponding objects using only a small amount of labeled data for new classes. Current few-shot object detection schemes mainly fall into two kinds.
  • Few-shot object detection based on meta learning aims to let the model learn "how to learn", and can be divided into optimization-based methods and metric-learning-based methods. Optimization-based methods aim to construct effective parameter-update rules or good parameter-initialization rules; metric-learning-based methods focus on constructing a robust feature embedding space, forming representations of the different categories in the embedding space through similarity calculations.
  • Few-shot object detection based on transfer learning first trains sufficiently on base classes with abundant samples, then fine-tunes the model with a small amount of new-class data, achieving good generalization on the new classes while losing as little base-class knowledge as possible. However, meta learning introduces extra model parameters at training time and easily leads to overfitting, while transfer learning through fine-tuning leaves the model with too little new-class data to attend to spatial information, so the classification module is prone to errors and detection accuracy is low.
  • For this reason, the present disclosure proposes a few-shot object detection method that uses self-supervised learning in place of large labeled datasets: by designing self-supervised tasks, it constructs a robust visual representation backbone network that provides a good parameter-initialization direction for few-shot object detection. Moreover, based on inter-sample and intra-sample spatial attention mechanisms, it mines the essential features of the data to improve model performance on few-shot datasets.
  • FIG. 1 is a schematic flowchart of a few-shot object detection method provided by an embodiment of the present disclosure.
  • As shown in FIG. 1, the few-shot object detection method includes the following steps S10 to S50.
  • Step S10: Send the weights of the backbone network and the weights of the feature pyramid to the detection network, where the weights of the backbone network and the weights of the feature pyramid are derived from the visual representation backbone network generated by self-supervised training.
  • The weights of the ResNet-series backbone network used in self-supervised training can be migrated in full to a basic detection network such as Faster R-CNN or Cascade R-CNN, as the sketch below illustrates.
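A minimal sketch of such a weight transfer with torchvision. The checkpoint path "ssl_backbone.pth", its key layout ("resnet"/"fpn"), and the class count are illustrative assumptions, not details from the patent:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Hypothetical checkpoint from the self-supervised pre-training stage, assumed
# to hold a plain ResNet-50 state dict and an FPN state dict.
ckpt = torch.load("ssl_backbone.pth", map_location="cpu")

detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=21)

# detector.backbone is a BackboneWithFPN: .body is the ResNet trunk, .fpn the pyramid.
detector.backbone.body.load_state_dict(ckpt["resnet"], strict=False)
detector.backbone.fpn.load_state_dict(ckpt["fpn"], strict=False)
```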
  • In some embodiments, self-supervised training generates the visual representation backbone network as follows:
  • acquire unlabeled raw image data, where the raw image data comes from the output of an image acquisition device;
  • apply different data augmentations to the raw image data to obtain image data of multiple corresponding views;
  • input the image data of each view into multiple backbone networks for contrastive learning between features of different granularities, and generate the visual representation backbone network according to the contrastive learning of global features and local features.
  • This contrastive learning proceeds in three stages (a sketch of the view-generation step follows this list):
  • constructing unsupervised data pseudo-labels: the unlabeled raw images pass through different data augmentations to generate corresponding views, which are fed into the multiple backbone networks respectively; views originating from the same unlabeled raw image are assigned the same pseudo-label after feature extraction, and views originating from different unlabeled images are assigned different pseudo-labels;
  • pre-training fine-grained features: the pseudo-labeled unsupervised data presents different feature granularities at different levels of the feature pyramid, and a unified output is formed through pooling;
  • generating the visual representation backbone network: contrastive learning jointly over global and local features, using the contrastive-learning loss function to update the model parameters, finally producing a backbone network capable of representing objects at different scales.
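The patent does not spell out the augmentation recipe; the sketch below assumes a MoCo v2-style pipeline, under which two independent draws on the same unlabeled image yield the two views that share a pseudo-label, while views drawn from different images receive different pseudo-labels:

```python
from torchvision import transforms

# MoCo v2-style augmentations (an assumption; the patent does not list the transforms).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([transforms.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def make_views(image):
    """Two augmented views of one unlabeled image; they share a pseudo-label."""
    return augment(image), augment(image)
```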
  • The loss function uses InfoNCE, and the overall loss comes from global feature learning and local feature learning. The specific formula is as follows:

$$L_{global}=-\log\frac{\exp(q_g\cdot k_{g+}/\tau)}{\sum_{i=0}^{K}\exp(q_g\cdot k_{gi}/\tau)},\qquad L_{local}=-\log\frac{\exp(q_l\cdot k_{l+}/\tau)}{\sum_{i=0}^{K}\exp(q_l\cdot k_{li}/\tau)},\qquad L_{total}=L_{global}+L_{local}$$

  • Here q_g is the first global feature learning result, k_g+ is the second global feature learning result, and k_gi is the third global feature learning result; q_l, k_l+, and k_li are the corresponding local feature learning results; τ is the temperature hyperparameter; L_total is the overall loss, L_global is the loss of global feature learning, and L_local is the loss of local feature learning.
  • The first global feature learning result is the global feature learning result in FIG. 2; the second global feature learning result is the global feature encoding result in FIG. 3; the third global feature learning result is the global feature encoding result in FIG. 3 produced by different images. A runnable sketch of this combined loss follows.
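A PyTorch sketch of this loss, assuming MoCo-style L2-normalized query and key embeddings and a queue of K negative keys; the temperature 0.2 is borrowed from MoCo v2 and is not stated in the patent:

```python
import torch
import torch.nn.functional as F

def info_nce(q, k_pos, k_queue, tau=0.2):
    """InfoNCE over one positive key and a queue of negatives.
    q, k_pos: (N, D) L2-normalized; k_queue: (K, D) L2-normalized."""
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(-1)  # (N, 1)
    l_neg = torch.einsum("nd,kd->nk", q, k_queue)             # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

def pretrain_loss(q_g, k_g_pos, k_g_queue, q_l, k_l_pos, k_l_queue, tau=0.2):
    """L_total = L_global + L_local, matching the formula above."""
    return info_nce(q_g, k_g_pos, k_g_queue, tau) + info_nce(q_l, k_l_pos, k_l_queue, tau)
```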
  • The multiple backbone networks may include:
  • a first backbone network, used for parameter updates through backpropagation;
  • a second backbone network, for which the gradients of all parameters are discarded and the parameters are updated through momentum update;
  • the multiple backbone networks usually use ResNet, and the joint contrastive learning of global and local features performs image-level contrastive learning in the same way as MoCo v2. The momentum update is sketched below.
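A sketch of the momentum update for the second backbone; the coefficient m = 0.999 follows MoCo and is an assumption, since the patent does not state a value:

```python
import torch

@torch.no_grad()  # gradients of the second backbone are discarded entirely
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Parameters of encoder_k track encoder_q by exponential moving average."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```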
  • FIG. 3 is a schematic diagram of the overall flow of few-shot object detection provided by the present disclosure.
  • The figure follows the transfer-learning-based few-shot detection framework, where the base classes are categories with sufficient labeled data, and the new classes represent detection scenarios containing only a small number of labeled samples.
  • After the model is trained on the base-class data, transfer learning guides the transfer of knowledge from the base classes to the new classes, thereby achieving good model performance under the few-sample condition.
  • Step S20: Generate candidate regions, which come from the foreground-background classification and regression results produced by the region proposal network on the output features of the visual representation backbone network.
  • Generating the candidate regions also includes:
  • embedding the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions;
  • the different pyramid levels are respectively used to generate candidate regions with sizes ranging from 32x32 to 512x512, so as to perform position regression on objects of different sizes; a sketch of this per-level size assignment follows.
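With torchvision, this one-size-per-pyramid-level assignment might be expressed as follows; the aspect ratios are an assumption, since the patent only names the 32x32 to 512x512 size range:

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# One anchor size per FPN level, spanning 32x32 up to 512x512.
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,  # assumed; not specified in the patent
)
```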
  • Step S30: According to the candidate regions, use a pooling operator to generate candidate-region features of a uniform size, and perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features.
  • The pooling operator may include ROI Pooling and ROI Align.
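For instance, ROI Align as provided by torchvision pools every candidate box to the same spatial size; the feature stride of 16 and the dummy tensor shapes below are illustrative assumptions:

```python
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 50, 50)            # one feature map, stride 16 (assumed)
boxes = [torch.tensor([[ 40.,  40., 200., 120.],  # candidate boxes in image coordinates
                       [300., 100., 420., 380.]])]

# Uniform 7x7 candidate-region features regardless of box size.
pooled = roi_align(features, boxes, output_size=(7, 7),
                   spatial_scale=1 / 16, sampling_ratio=2)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```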
  • Step S40: Use fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions.
  • The fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions.
  • By extracting features from the different regions, a fine-grained feature representation of the local instance is formed; the final dimension of this feature is 128.
  • As shown in FIG. 4, the present disclosure provides an intra-sample feature mining learning head and an inter-sample feature mining learning head.
  • The intra-sample feature mining learning head slides a window over the input sample features to produce refined features at different positions of each candidate box, and uses pseudo-label assignment and contrastive learning to mine finer sample features, providing more fine-grained information for the classification of candidate boxes. A sketch of the region-division and labeling step follows.
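A minimal sketch of the equal-division mining step. The 2x2 grid and the lazy 128-d projection head are implementation assumptions; the returned embeddings and labels could then feed an InfoNCE-style contrastive loss like the one sketched earlier:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

proj = nn.Sequential(nn.Flatten(1), nn.LazyLinear(128))  # 128-d fine-grained embedding

def mine_fine_grained(roi_feats):
    """Divide each uniform-size RoI feature map evenly into 2x2 regions.
    roi_feats: (N, C, 7, 7). Returns (4N, 128) embeddings and (4N,) labels;
    regions cut from the same candidate box share a label, different boxes differ."""
    n, c, _, _ = roi_feats.shape
    quadrants = F.adaptive_avg_pool2d(roi_feats, (2, 2))       # (N, C, 2, 2)
    regions = quadrants.permute(0, 2, 3, 1).reshape(n * 4, c)  # one row per region
    embeddings = F.normalize(proj(regions), dim=1)             # unit-norm, 128-d
    labels = torch.arange(n).repeat_interleave(4)              # same box -> same label
    return embeddings, labels

emb, lab = mine_fine_grained(torch.randn(8, 256, 7, 7))
print(emb.shape, lab.shape)  # torch.Size([32, 128]) torch.Size([32])
```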
  • Step S50: Form a loss function according to the strategy in the fine-grained feature mining, and update the detection network parameters through calculation of the loss function.
  • The loss function may be composed as follows:
  • the whole few-shot object detection method contains three loss functions in total: the category loss function L_cls, the prediction-box regression loss function L_bbox, and the sample-feature-mining loss function L_feat; the overall loss is a mixture of the three loss functions in equal proportion, with the formula:

$$L_{total}=L_{cls}+L_{bbox}+L_{feat}$$
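In code, the equal-proportion mixture is a plain unweighted sum, a one-to-one rendering of the formula above:

```python
import torch

def total_loss(l_cls: torch.Tensor, l_bbox: torch.Tensor, l_feat: torch.Tensor) -> torch.Tensor:
    """L_total = L_cls + L_bbox + L_feat, mixed in equal proportion."""
    return l_cls + l_bbox + l_feat
```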
  • In summary, within the proposed few-shot object detection method, the present disclosure proposes using self-supervised pre-training in place of supervised pre-training.
  • In pre-training, intra-instance feature attention can mine the fine-grained features within samples and effectively preserve spatial information, while inter-instance spatial feature attention can widen the distance between data of different categories and improve the accuracy with which category information is represented in the feature space.
  • FIG. 5 is a schematic structural diagram of a few-shot object detection apparatus provided by an embodiment of the present disclosure.
  • As shown in FIG. 5, the few-shot object detection apparatus includes the following modules: a sending module 510, a first generation module 520, a second generation module 530, a learning module 540, and an update module 550.
  • The sending module 510 is configured to send the weights of the backbone network and of the feature pyramid to the basic detection network, where the weights are derived from the visual representation backbone network generated by self-supervised training.
  • The first generation module 520 is configured to generate candidate regions, which come from the foreground-background classification and regression results produced by the region proposal network on the output features of the visual representation backbone network.
  • The second generation module 530 is configured to generate candidate-region features of a uniform size by means of a pooling operator according to the candidate regions, and to perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features.
  • The learning module 540 is configured to use fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions.
  • The fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions.
  • The update module 550 is configured to form a loss function according to the strategy in the fine-grained feature mining, and to update the detection network parameters through calculation of the loss function.
  • In some embodiments, the first generation module 520 is configured to embed the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
  • In summary, the present disclosure proposes a few-shot object detection apparatus: it uses self-supervised pre-training in place of supervised pre-training, designs a learning task that combines global and local features, and adds a sample-feature-mining learning module intended to provide richer features for few-shot learning.
  • By mining intra-sample and inter-sample features, the module improves the attention to fine-grained features of intra-class samples, the ability to represent categories in the feature space, and the detection accuracy of the model.
  • The embodiment of the third aspect of the present application proposes a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the computer program is executed by the processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
  • The embodiment of the fourth aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
  • The embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program or instructions; when the computer program or instructions are executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features.
  • Thus, features defined as "first" and "second" may explicitly or implicitly include at least one of these features.
  • In the description of the present disclosure, "plurality" means at least two, such as two, three, etc., unless otherwise specifically defined.
  • For the purposes of this specification, a "computer-readable medium" may be any device that can contain, store, communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • It should be understood that the various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
  • In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they can be implemented by any one of, or a combination of, the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
  • In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules; if the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure proposes a few-shot object detection method, including: sending the weights of a backbone network and the weights of a feature pyramid to a detection network; generating candidate regions, where the candidate regions come from the foreground-background classification and regression results produced by a region proposal network on the output features of a visual representation backbone network; generating, according to the candidate regions, candidate-region features of a uniform size by means of a pooling operator, and performing position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features; using the fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions; and forming a loss function according to the strategy in the fine-grained feature mining, and updating the detection network parameters through calculation of the loss function.

Description

Few-shot object detection method and apparatus
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese patent application No. 202111535847.6, filed on December 15, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of deep learning, and in particular to a few-shot object detection method and apparatus.
BACKGROUND
In recent years, the development of deep convolutional neural networks has greatly advanced the algorithms of the computer vision field. Object detection, one of the research branches of computer vision, has as its main task the classification and localization of objects in images. General object detection falls into two mainstream branches: two-stage detection networks and single-stage detection networks. Both require pre-training on large labeled datasets, which limits them to a narrow range of scenes; when migrated to scenes where labeled data is scarce, detection accuracy tends to drop.
SUMMARY
The present disclosure aims to solve, at least to a certain extent, one of the technical problems in the related art.
To achieve the above purpose, the embodiment of the first aspect of the present disclosure proposes a few-shot object detection method, including:
sending the weights of a backbone network and the weights of a feature pyramid to a detection network, where the weights of the backbone network and the weights of the feature pyramid are derived from a visual representation backbone network generated by self-supervised training;
generating candidate regions, where the candidate regions come from the foreground-background classification and regression results produced by a region proposal network on the output features of the visual representation backbone network;
generating, according to the candidate regions, candidate-region features of a uniform size by means of a pooling operator, and performing position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features;
using the fine-grained feature mining to construct fine-grained positive sample pairs and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions, where the fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions;
forming a loss function according to the strategy in the fine-grained feature mining, and updating the detection network parameters through calculation of the loss function.
In some embodiments, generating the visual representation backbone network through self-supervised training includes:
acquiring unlabeled raw image data, where the raw image data comes from the output of an image acquisition device;
applying different data augmentations to the raw image data to obtain image data of multiple corresponding views;
inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features.
In some embodiments, inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features, includes:
constructing unsupervised data pseudo-labels: the unlabeled raw image data passes through different data augmentations to generate corresponding views, which are input into the multiple backbone networks respectively; views originating from the same unlabeled raw image are assigned the same pseudo-label after feature extraction by the multiple backbone networks, and views originating from different unlabeled data are assigned different pseudo-labels;
pre-training fine-grained features, including: the pseudo-labeled unsupervised data presents different feature granularities at different levels of the feature pyramid, and a unified output is formed through pooling;
generating the visual representation backbone network, including: contrastive learning jointly over global features and local features, using the contrastive-learning loss function to update the model parameters, and finally generating a backbone network capable of representing objects at different scales.
In some embodiments, the multiple backbone networks include:
a first backbone network, used for parameter updates through backpropagation;
a second backbone network, for which the gradients of all parameters are discarded and the parameters are updated through momentum update.
In some embodiments, the loss function includes:
the loss function adopts InfoNCE, and the overall loss comes from global feature learning and local feature learning; the specific formula is as follows:

$$L_{global}=-\log\frac{\exp(q_g\cdot k_{g+}/\tau)}{\sum_{i=0}^{K}\exp(q_g\cdot k_{gi}/\tau)},\qquad L_{local}=-\log\frac{\exp(q_l\cdot k_{l+}/\tau)}{\sum_{i=0}^{K}\exp(q_l\cdot k_{li}/\tau)},\qquad L_{total}=L_{global}+L_{local}$$

where q_g denotes the first global feature learning result, k_g+ denotes the second global feature learning result, k_gi denotes the third global feature learning result, q_l, k_l+ and k_li denote the corresponding local feature learning results, τ denotes a hyperparameter, L_total denotes the overall loss, L_global denotes the loss of global feature learning, and L_local denotes the loss of local feature learning;
the whole few-shot object detection method contains three loss functions in total: the category loss function L_cls, the prediction-box regression loss function L_bbox, and the sample-feature-mining loss function L_feat; the overall loss is a mixture of the three loss functions in equal proportion, with the formula:

$$L_{total}=L_{cls}+L_{bbox}+L_{feat}$$

In some embodiments, generating the candidate regions further includes:
embedding the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
To achieve the above purpose, the embodiment of the second aspect of the present disclosure proposes a few-shot object detection apparatus, including:
a sending module, configured to send the weights of the backbone network and the weights of the feature pyramid to a basic detection network, where the weights of the backbone network and the weights of the feature pyramid are derived from the visual representation backbone network;
a first generation module, configured to generate candidate regions, which come from the foreground-background classification and regression results produced by the region proposal network on the output features of the visual representation backbone network;
a second generation module, configured to generate candidate-region features of a uniform size by means of a pooling operator according to the candidate regions, and to perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features;
a learning module, configured to use the fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions, where the fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions;
an update module, configured to form a loss function according to the strategy in the fine-grained feature mining, and to update the detection network parameters through calculation of the loss function.
To achieve the above purpose, the embodiment of the third aspect of the present application proposes a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
To achieve the above purpose, the embodiment of the fourth aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
To achieve the above purpose, the embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program or instructions; when the computer program or instructions are executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
In summary, the present disclosure proposes a few-shot object detection method, apparatus, computer device, and non-transitory computer-readable storage medium. By adding a sample-feature-mining learning module, richer features are provided for few-shot learning; through intra-sample and inter-sample feature mining, the attention to fine-grained features of intra-class samples, the ability to represent categories in the feature space, and the detection accuracy of the model are all improved.
Additional aspects and advantages of the present disclosure will be given in part in the following description, will in part become apparent from the following description, or will be learned through practice of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and/or additional aspects and advantages of the present disclosure will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a few-shot object detection method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the overall flow of self-supervised training provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the overall flow of few-shot object detection provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an intra-sample feature mining learning method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a few-shot object detection apparatus provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the present disclosure, and should not be construed as limiting the present disclosure.
In the related art, few-shot learning has gradually begun to attract the attention of academia and industry. Few-shot learning aims to mine the latent features of samples from a small amount of labeled data, and then to fit a robust feature space from that small amount of data to complete the relevant visual perception tasks. Few-shot object detection, one of the important branches of few-shot learning, aims to classify and localize the corresponding objects using a small amount of labeled data for new classes. Current few-shot object detection schemes mainly include the following two kinds.
Few-shot object detection based on meta learning aims to let the model learn "how to learn", and can be divided into optimization-based methods and metric-learning-based methods. Optimization-based methods aim to construct effective parameter-update rules or good parameter-initialization rules; metric-learning-based methods focus on how to construct a robust feature embedding space, forming representations of the different categories in the embedding space through similarity calculations.
Few-shot object detection based on transfer learning first trains sufficiently on base classes with abundant samples, then fine-tunes the model with a small amount of new-class data, achieving good generalization performance on the new classes while losing as little base-class knowledge as possible.
However, meta learning introduces extra model parameters during training, which increases the spatial complexity of the model and easily leads to overfitting, so it fails to produce a good network initialization rule. In addition, transfer learning migrates the knowledge learned on the base classes to the new classes through fine-tuning; because the amount of data in the new classes is too small, the model pays insufficient attention to spatial information, and the object classification module is prone to misclassification, resulting in low detection accuracy. Therefore, a few-shot object detection method with higher generalization performance is urgently needed.
To this end, the present disclosure proposes a few-shot object detection method that uses self-supervised learning in place of large datasets and designs self-supervised tasks to construct a robust visual representation backbone network, providing a good parameter-initialization direction for few-shot object detection. Moreover, based on inter-sample and intra-sample spatial attention mechanisms, it mines the essential features of the data to improve the performance of the model on few-shot datasets.
The few-shot object detection method and apparatus of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a few-shot object detection method provided by an embodiment of the present disclosure.
As shown in FIG. 1, the few-shot object detection method includes the following steps S10 to S50.
Step S10: Send the weights of the backbone network and the weights of the feature pyramid to the detection network, where the weights of the backbone network and the weights of the feature pyramid are derived from the visual representation backbone network generated by self-supervised training.
It should be noted that, in one embodiment of the present disclosure, the weights of the ResNet-series backbone network used in self-supervised training can be migrated in full to a basic detection network such as Faster R-CNN or Cascade R-CNN.
In one embodiment of the present disclosure, generating the visual representation backbone network through self-supervised training includes:
acquiring unlabeled raw image data, where the raw image data comes from the output of an image acquisition device;
applying different data augmentations to the raw image data to obtain image data of multiple corresponding views;
inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features.
Moreover, in one embodiment of the present disclosure, inputting the image data of each view into multiple backbone networks and generating the visual representation backbone network includes:
constructing unsupervised data pseudo-labels: the unlabeled raw image data passes through different data augmentations to generate corresponding views, which are input into the multiple backbone networks respectively; views originating from the same unlabeled raw image are assigned the same pseudo-label after feature extraction by the multiple backbone networks, and views originating from different unlabeled data are assigned different pseudo-labels;
pre-training fine-grained features, including: the pseudo-labeled unsupervised data presents different feature granularities at different levels of the feature pyramid, and a unified output is formed through pooling;
generating the visual representation backbone network, including: contrastive learning jointly over global features and local features, using the contrastive-learning loss function to update the model parameters, and finally generating a backbone network capable of representing objects at different scales.
In one embodiment of the present disclosure, the loss function adopts InfoNCE, and the overall loss comes from global feature learning and local feature learning; the specific formula is as follows:

$$L_{global}=-\log\frac{\exp(q_g\cdot k_{g+}/\tau)}{\sum_{i=0}^{K}\exp(q_g\cdot k_{gi}/\tau)},\qquad L_{local}=-\log\frac{\exp(q_l\cdot k_{l+}/\tau)}{\sum_{i=0}^{K}\exp(q_l\cdot k_{li}/\tau)},\qquad L_{total}=L_{global}+L_{local}$$

where q_g is the first global feature learning result, k_g+ is the second global feature learning result, k_gi is the third global feature learning result, q_l, k_l+ and k_li are the corresponding local feature learning results, τ denotes a hyperparameter, L_total is the overall loss, L_global is the loss of global feature learning, and L_local is the loss of local feature learning.
Further, in one embodiment of the present disclosure, the first global feature learning result is the global feature learning result in FIG. 2, the second global feature learning result is the global feature encoding result in FIG. 3, and the third global feature learning result is the global feature encoding result in FIG. 3 produced by different images.
Further, in one embodiment of the present disclosure, the multiple backbone networks may include:
a first backbone network, used for parameter updates through backpropagation;
a second backbone network, for which the gradients of all parameters are discarded and the parameters are updated through momentum update.
It should be noted that, in one embodiment of the present disclosure, the multiple backbone networks usually use ResNet, and the joint contrastive learning of global and local features performs image-level contrastive learning in the same way as MoCo v2.
In addition, FIG. 3 is a schematic diagram of the overall flow of few-shot object detection provided by the present disclosure. The figure follows the transfer-learning-based few-shot detection framework, where the base classes are categories with sufficient labeled data, and the new classes represent detection scenarios containing only a small number of labeled samples. After the model is trained on the base-class data, transfer learning guides the transfer of knowledge from the base classes to the new classes, thereby achieving good model performance under the few-sample condition.
Step S20: Generate candidate regions, where the candidate regions come from the foreground-background classification and regression results produced by the region proposal network on the output features of the visual representation backbone network.
In one embodiment of the present disclosure, generating the candidate regions also includes:
embedding the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
Moreover, in one embodiment of the present disclosure, the pyramid levels are respectively used to generate candidate regions with sizes ranging from 32x32 to 512x512, so as to perform position regression on objects of different sizes.
Step S30: According to the candidate regions, use a pooling operator to generate candidate-region features of a uniform size, and perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features.
In one embodiment of the present disclosure, the pooling operator may include:
ROI Pooling;
ROI Align.
Step S40: Use the fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions. The fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions.
In one embodiment of the present disclosure, by extracting features from the different regions, a fine-grained feature representation of the local instance is formed; the final dimension of this feature is 128.
As shown in FIG. 4, the present disclosure provides an intra-sample feature mining learning head and an inter-sample feature mining learning head. The intra-sample feature mining learning head slides a window over the input sample features to produce refined features at different positions of each candidate box, and uses pseudo-label assignment and contrastive learning to mine finer sample features, providing more fine-grained information for the classification of candidate boxes.
Step S50: Form a loss function according to the strategy in the fine-grained feature mining, and update the detection network parameters through calculation of the loss function.
In one embodiment of the present disclosure, the loss function may include:
the whole few-shot object detection method contains three loss functions in total: the category loss function L_cls, the prediction-box regression loss function L_bbox, and the sample-feature-mining loss function L_feat; the overall loss is a mixture of the three loss functions in equal proportion, with the formula:

$$L_{total}=L_{cls}+L_{bbox}+L_{feat}$$
In summary, within the proposed few-shot object detection method, the present disclosure proposes using self-supervised pre-training in place of supervised pre-training: intra-instance feature attention in pre-training can mine fine-grained features within samples and effectively preserve spatial information, while inter-instance spatial feature attention can widen the distance between data of different categories and improve the accuracy with which category information is represented in the feature space.
FIG. 5 is a schematic structural diagram of a few-shot object detection apparatus provided by an embodiment of the present disclosure.
As shown in FIG. 5, the few-shot object detection apparatus includes the following modules: a sending module 510, a first generation module 520, a second generation module 530, a learning module 540, and an update module 550.
The sending module 510 is configured to send the weights of the backbone network and the weights of the feature pyramid to the basic detection network, where the weights of the backbone network and the weights of the feature pyramid are derived from the visual representation backbone network generated by self-supervised training.
The first generation module 520 is configured to generate candidate regions, where the candidate regions come from the foreground-background classification and regression results produced by the region proposal network on the output features of the visual representation backbone network.
The second generation module 530 is configured to generate candidate-region features of a uniform size by means of a pooling operator according to the candidate regions, and to perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features.
The learning module 540 is configured to use the fine-grained feature mining to construct fine-grained positive and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions; the fine-grained feature mining includes: dividing the uniform-size candidate-region features evenly into regions, extracting the features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions.
The update module 550 is configured to form a loss function according to the strategy in the fine-grained feature mining, and to update the detection network parameters through calculation of the loss function.
In some embodiments, the first generation module 520 is configured to embed the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
In summary, the present disclosure proposes a few-shot object detection apparatus: it proposes using self-supervised pre-training in place of supervised pre-training, designs a learning task that combines global and local features, and adds a sample-feature-mining learning module intended to provide richer features for few-shot learning; through intra-sample and inter-sample feature mining, it improves the attention to fine-grained features of intra-class samples, the ability to represent categories in the feature space, and the detection accuracy of the model.
To achieve the above purpose, the embodiment of the third aspect of the present application proposes a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the computer program is executed by the processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
To achieve the above purpose, the embodiment of the fourth aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
To achieve the above purpose, the embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program or instructions; when the computer program or instructions are executed by a processor, the few-shot object detection method described in the embodiment of the first aspect of the present application is implemented.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided they do not contradict one another.
In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In the description of the present disclosure, "plurality" means at least two, such as two, three, etc., unless otherwise specifically defined.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code including one or more executable instructions for implementing steps of a custom logic function or process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one of, or a combination of, the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
Those of ordinary skill in the art can understand that all or part of the steps carried by the method of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present disclosure have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.

Claims (15)

  1. A few-shot object detection method, comprising:
    sending weights of a backbone network and weights of a feature pyramid to a detection network, wherein the weights of the backbone network and the weights of the feature pyramid are derived from a visual representation backbone network generated by self-supervised training;
    generating candidate regions, wherein the candidate regions come from foreground-background classification and regression results produced by a region proposal network on output features of the visual representation backbone network;
    generating, according to the candidate regions, candidate-region features of a uniform size by means of a pooling operator, and performing position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features;
    using the fine-grained feature mining to construct fine-grained positive sample pairs and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions, wherein the fine-grained feature mining comprises: dividing the uniform-size candidate-region features evenly into regions, extracting features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions;
    forming a loss function according to the strategy in the fine-grained feature mining, and updating detection network parameters through calculation of the loss function.
  2. The method according to claim 1, wherein generating the visual representation backbone network by the self-supervised training comprises:
    acquiring unlabeled raw image data, wherein the raw image data comes from an output of an image acquisition device;
    applying different data augmentations to the raw image data to obtain image data of multiple corresponding views;
    inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features.
  3. The method according to claim 2, wherein inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features, comprises:
    constructing unsupervised data pseudo-labels, wherein the unlabeled raw image data passes through different data augmentations to generate corresponding views, which are respectively input into the multiple backbone networks; views originating from the same unlabeled raw image data are assigned the same pseudo-label after feature extraction by the multiple backbone networks, and views originating from different unlabeled data are assigned different pseudo-labels;
    pre-training fine-grained features, comprising: the pseudo-labeled unsupervised data presents different feature granularities at different levels of the feature pyramid, and a unified output is formed through pooling;
    generating the visual representation backbone network, comprising: contrastive learning jointly over global features and local features, using the loss function of the contrastive learning to update model parameters, and finally generating a backbone network capable of representing objects at different scales.
  4. The method according to claim 2 or 3, wherein the multiple backbone networks comprise:
    a first backbone network, used for parameter updates through backpropagation;
    a second backbone network, for which gradients of all parameters are discarded and parameters are updated through momentum update.
  5. The method according to any one of claims 1 to 3, wherein the loss function comprises:
    the loss function adopts InfoNCE, and the overall loss comes from global feature learning and local feature learning; the specific formula is as follows:

    $$L_{global}=-\log\frac{\exp(q_g\cdot k_{g+}/\tau)}{\sum_{i=0}^{K}\exp(q_g\cdot k_{gi}/\tau)},\qquad L_{local}=-\log\frac{\exp(q_l\cdot k_{l+}/\tau)}{\sum_{i=0}^{K}\exp(q_l\cdot k_{li}/\tau)},\qquad L_{total}=L_{global}+L_{local}$$

    wherein q_g denotes a first global feature learning result, k_g+ denotes a second global feature learning result, k_gi denotes a third global feature learning result, q_l, k_l+ and k_li denote the corresponding local feature learning results, τ denotes a hyperparameter, L_total denotes the overall loss, L_global denotes the loss of global feature learning, and L_local denotes the loss of local feature learning;
    the whole few-shot object detection method contains three loss functions in total: a category loss function L_cls, a prediction-box regression loss function L_bbox, and a sample-feature-mining loss function L_feat; the overall loss is a mixture of the three loss functions in equal proportion, with the formula:

    $$L_{total}=L_{cls}+L_{bbox}+L_{feat}$$
  6. The method according to claim 1, wherein generating the candidate regions further comprises:
    embedding the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
  7. A few-shot object detection apparatus, comprising:
    a sending module, configured to send weights of a backbone network and weights of a feature pyramid to a basic detection network, wherein the weights of the backbone network and the weights of the feature pyramid are derived from a visual representation backbone network generated by self-supervised training;
    a first generation module, configured to generate candidate regions, wherein the candidate regions come from foreground-background classification and regression results produced by a region proposal network on output features of the visual representation backbone network;
    a second generation module, configured to generate candidate-region features of a uniform size by means of a pooling operator according to the candidate regions, and to perform position regression, content classification, and fine-grained feature mining on the uniform-size candidate-region features;
    a learning module, configured to use the fine-grained feature mining to construct fine-grained positive sample pairs and negative sample pairs, forming contrastive learning between the fine-grained features of the candidate regions, wherein the fine-grained feature mining comprises: dividing the uniform-size candidate-region features evenly into regions, extracting features of the divided regions, and applying the strategy of assigning the same label to division results from the same candidate region and different labels to division results from different candidate regions;
    an update module, configured to form a loss function according to the strategy in the fine-grained feature mining, and to update detection network parameters through calculation of the loss function.
  8. The apparatus according to claim 7, wherein generating the visual representation backbone network by the self-supervised training comprises:
    acquiring unlabeled raw image data, wherein the raw image data comes from an output of an image acquisition device;
    applying different data augmentations to the raw image data to obtain image data of multiple corresponding views;
    inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features.
  9. The apparatus according to claim 8, wherein inputting the image data of each of the multiple views into multiple backbone networks for contrastive learning between features of different granularities, and generating the visual representation backbone network according to the contrastive learning of global features and local features, comprises:
    constructing unsupervised data pseudo-labels, wherein the unlabeled raw image data passes through different data augmentations to generate corresponding views, which are respectively input into the multiple backbone networks; views originating from the same unlabeled raw image data are assigned the same pseudo-label after feature extraction by the multiple backbone networks, and views originating from different unlabeled data are assigned different pseudo-labels;
    pre-training fine-grained features, comprising: the pseudo-labeled unsupervised data presents different feature granularities at different levels of the feature pyramid, and a unified output is formed through pooling;
    generating the visual representation backbone network, comprising: contrastive learning jointly over global features and local features, using the loss function of the contrastive learning to update model parameters, and finally generating a backbone network capable of representing objects at different scales.
  10. The apparatus according to claim 8 or 9, wherein the multiple backbone networks comprise:
    a first backbone network, used for parameter updates through backpropagation;
    a second backbone network, for which gradients of all parameters are discarded and parameters are updated through momentum update.
  11. The apparatus according to any one of claims 7 to 9, wherein the loss function comprises:
    the loss function adopts InfoNCE, and the overall loss comes from global feature learning and local feature learning; the specific formula is as follows:

    $$L_{global}=-\log\frac{\exp(q_g\cdot k_{g+}/\tau)}{\sum_{i=0}^{K}\exp(q_g\cdot k_{gi}/\tau)},\qquad L_{local}=-\log\frac{\exp(q_l\cdot k_{l+}/\tau)}{\sum_{i=0}^{K}\exp(q_l\cdot k_{li}/\tau)},\qquad L_{total}=L_{global}+L_{local}$$

    wherein q_g denotes a first global feature learning result, k_g+ denotes a second global feature learning result, k_gi denotes a third global feature learning result, q_l, k_l+ and k_li denote the corresponding local feature learning results, τ denotes a hyperparameter, L_total denotes the overall loss, L_global denotes the loss of global feature learning, and L_local denotes the loss of local feature learning;
    the whole few-shot object detection method contains three loss functions in total: a category loss function L_cls, a prediction-box regression loss function L_bbox, and a sample-feature-mining loss function L_feat; the overall loss is a mixture of the three loss functions in equal proportion, with the formula:

    $$L_{total}=L_{cls}+L_{bbox}+L_{feat}$$
  12. The apparatus according to claim 7, wherein the first generation module is configured to embed the feature pyramid network in the backbone network as a component of the detection network, to extract image features of different granularities for generating the candidate regions.
  13. A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein, when the processor executes the computer program, the method according to any one of claims 1 to 6 is implemented.
  14. A non-transitory computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.
  15. A computer program product, comprising a computer program or instructions, wherein, when the computer program or instructions are executed by a processor, the method according to any one of claims 1 to 6 is implemented.
PCT/CN2022/117896 2021-12-15 2022-09-08 Few-shot object detection method and apparatus WO2023109208A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/551,919 US20240177462A1 (en) 2021-12-15 2022-09-08 Few-shot object detection method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111535847.6 2021-12-15
CN202111535847.6A CN114399644A (zh) Few-shot object detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2023109208A1 true WO2023109208A1 (zh) 2023-06-22

Family

ID=81226553

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117896 WO2023109208A1 (zh) Few-shot object detection method and apparatus

Country Status (3)

Country Link
US (1) US20240177462A1 (zh)
CN (1) CN114399644A (zh)
WO (1) WO2023109208A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664825A (zh) * 2023-06-26 2023-08-29 北京智源人工智能研究院 Self-supervised contrastive learning method and system for large-scene point cloud object detection
CN116824274A (zh) * 2023-08-28 2023-09-29 江西师范大学 Few-shot fine-grained image classification method and system
CN117079075A (zh) * 2023-08-18 2023-11-17 北京航空航天大学 Few-shot object detection method based on pseudo-label generation and correction
CN117197475A (zh) * 2023-09-20 2023-12-08 南京航空航天大学 Object detection method for wide-area scenes with multiple interference sources
CN117237697A (zh) * 2023-08-01 2023-12-15 北京邮电大学 Few-shot image detection method, system, medium, and device
CN117292331A (zh) * 2023-11-27 2023-12-26 四川发展环境科学技术研究院有限公司 Deep-learning-based complex foreign-object detection system and method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399644A (zh) 2021-12-15 2022-04-26 北京邮电大学 Few-shot object detection method and apparatus


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200097771A1 (en) * 2018-09-25 2020-03-26 Nec Laboratories America, Inc. Deep group disentangled embedding and network weight generation for visual inspection
CN111611998A (zh) * 2020-05-21 2020-09-01 中山大学 Adaptive feature-block extraction method based on candidate-region area, width, and height
CN112464879A (zh) * 2020-12-10 2021-03-09 山东易视智能科技有限公司 Marine object detection method and system based on self-supervised representation learning
CN112861720A (zh) * 2021-02-08 2021-05-28 西北工业大学 Few-shot object detection method for remote sensing images based on prototypical convolutional neural networks
CN113392855A (zh) * 2021-07-12 2021-09-14 昆明理工大学 Few-shot object detection method based on attention and contrastive learning
CN113642574A (zh) * 2021-07-30 2021-11-12 中国人民解放军军事科学院国防科技创新研究院 Few-shot object detection method based on feature weighting and network fine-tuning
CN114399644A (zh) 2021-12-15 2022-04-26 北京邮电大学 Few-shot object detection method and apparatus

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664825A (zh) * 2023-06-26 2023-08-29 北京智源人工智能研究院 Self-supervised contrastive learning method and system for large-scene point cloud object detection
CN117237697A (zh) * 2023-08-01 2023-12-15 北京邮电大学 Few-shot image detection method, system, medium, and device
CN117237697B (zh) * 2023-08-01 2024-05-17 北京邮电大学 Few-shot image detection method, system, medium, and device
CN117079075A (zh) * 2023-08-18 2023-11-17 北京航空航天大学 Few-shot object detection method based on pseudo-label generation and correction
CN116824274A (zh) * 2023-08-28 2023-09-29 江西师范大学 Few-shot fine-grained image classification method and system
CN116824274B (zh) * 2023-08-28 2023-11-28 江西师范大学 Few-shot fine-grained image classification method and system
CN117197475A (zh) * 2023-09-20 2023-12-08 南京航空航天大学 Object detection method for wide-area scenes with multiple interference sources
CN117197475B (zh) * 2023-09-20 2024-02-20 南京航空航天大学 Object detection method for wide-area scenes with multiple interference sources
CN117292331A (zh) * 2023-11-27 2023-12-26 四川发展环境科学技术研究院有限公司 Deep-learning-based complex foreign-object detection system and method
CN117292331B (zh) * 2023-11-27 2024-02-02 四川发展环境科学技术研究院有限公司 Deep-learning-based complex foreign-object detection system and method

Also Published As

Publication number Publication date
US20240177462A1 (en) 2024-05-30
CN114399644A (zh) 2022-04-26

Similar Documents

Publication Publication Date Title
WO2023109208A1 (zh) Few-shot object detection method and apparatus
CN111814854B (zh) Unsupervised domain-adaptive object re-identification method
US11568245B2 (en) Apparatus related to metric-learning-based data classification and method thereof
AU2016201908B2 (en) Joint depth estimation and semantic labeling of a single image
CN105765609B (zh) Memory facilitation using directed acyclic graphs
KR20210122855A (ko) Detection model training method and apparatus, computer device, and storage medium
JP2018200685A (ja) Forming a dataset for fully supervised learning
CN109858505B (zh) Classification and recognition method, apparatus, and device
WO2019123451A1 (en) System and method for use in training machine learning utilities
TWI831016B (zh) Machine learning method, machine learning system, and non-transitory computer-readable medium
CN114600130A (zh) Processing for learning new image classes without labels
Ghosh et al. The class imbalance problem in deep learning
Mougeot et al. A deep learning approach for dog face verification and recognition
CN112861758A (zh) Action recognition method based on weakly supervised learning video segmentation
US20220156585A1 (en) Training point cloud processing neural networks using pseudo-element-based data augmentation
KR102223687B1 (ko) Machine learning data selection method and apparatus
CN113128565B (zh) Automatic image annotation system and apparatus agnostic to pre-training annotation data
US11816185B1 (en) Multi-view image analysis using neural networks
KR101919698B1 (ko) Group search optimization data clustering method and system applying silhouette
CN116188478A (zh) Image segmentation method and apparatus, electronic device, and storage medium
US20210365735A1 (en) Computer-implemented training method, classification method and system and computer-readable recording medium
JP2020181265A (ja) Information processing apparatus, system, information processing method, and program
Jabari et al. Semi-Automated X-ray Transmission Image Annotation Using Data-efficient Convolutional Neural Networks and Cooperative Machine Learning
KR102594480B1 (ko) MIM-based few-shot object detection model training method
Hill Modeling and analysis of mitochondrial dynamics using dynamic social network graphs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22905962

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18551919

Country of ref document: US