CN114612663B - Domain self-adaptive instance segmentation method and device based on weak supervision learning - Google Patents
Domain self-adaptive instance segmentation method and device based on weak supervision learning
- Publication number
- CN114612663B CN114612663B CN202210236149.4A CN202210236149A CN114612663B CN 114612663 B CN114612663 B CN 114612663B CN 202210236149 A CN202210236149 A CN 202210236149A CN 114612663 B CN114612663 B CN 114612663B
- Authority
- CN
- China
- Prior art keywords
- mask
- instance segmentation
- instance
- domain
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
Description
Technical Field

The present invention relates to the technical field of instance segmentation, and in particular to a domain-adaptive instance segmentation method and device based on weakly supervised learning.

Background Art

Instance segmentation has been an active area of research and engineering over the past decade. It is used in many signal-processing applications, such as image editing, scene understanding, autonomous driving, and human-computer interaction. With the rapid development of deep convolutional neural networks (DCNNs), current instance segmentation methods achieve satisfactory accuracy and efficiency. However, when it comes to domain adaptation, a segmentation model learned on a source domain suffers a sharp drop in performance when applied to a target domain because of the data drift between the two domains. Pixel-level annotation of large numbers of target-domain images is extremely time-consuming, while unsupervised methods that minimize a task-specific loss on the source domain together with a domain-adversarial loss are limited by the distribution overlap between the source and target domains. Other methods adopt a self-training strategy and fine-tune the segmentation model with target-specific pseudo-labels, but they obtain only limited improvements because the pseudo-labels are noisy or strong assumptions are introduced.
Summary of the Invention

In view of the deficiencies of the prior art, the purpose of the present invention is to provide a domain-adaptive instance segmentation method and device based on weakly supervised learning.

The objective of the present invention is achieved through the following technical solutions:
According to a first aspect of the present specification, a domain-adaptive instance segmentation method based on weakly supervised learning is provided, comprising the following steps:

(1) Train an initial instance segmentation model on the source domain and output the backbone network features and the semantic score tensor, where the semantic score tensor contains the probability of each pixel belonging to each instance;

(2) Perform instance segmentation on the target domain using the initial instance segmentation model trained on the source domain, and output the backbone network features and semantic score tensor of each image;

(3) Take the maximum over the instance dimension of the semantic score tensor obtained in step (2) to obtain the instance segmentation mask of each target-domain image; multiply the instance segmentation mask of each target-domain image with the target-domain backbone network features and the target-domain semantic score tensor, respectively, to obtain the mask feature and mask semantic score tensor of each target-domain instance;

(4) Concatenate the mask feature f_t of instance t obtained in step (3) and its mask semantic score tensor s_t to obtain the enhanced mask feature f_t^+ of instance t;

(5) Use hierarchical agglomerative clustering (HAC) to construct a semantic tree for each category: the enhanced mask feature of each instance belonging to the category is treated as a leaf node, and at each agglomeration step the two child nodes whose enhanced mask features have the smallest Euclidean distance are merged into a merged node; child nodes include leaf nodes and intermediate nodes, and the enhanced mask feature and mask semantic score tensor of the merged node are linear combinations of the corresponding enhanced mask features and mask semantic score tensors of its child nodes;

(6) For each semantic tree, sample its leaf nodes at a set sampling rate, quickly judge whether each sampled instance segmentation mask is accurate, and record the judgment result;

(7) Compare a statistic (for example, the mean) of the annotation results of all sampled instances on the semantic tree of category k with a set threshold: if the statistic is greater than the threshold, the predictions for category k are accurate, and the inaccurate sampled instances are mask-corrected using the accurate sampled instances; if the statistic is less than or equal to the threshold, the predictions for category k are inaccurate, and the corresponding semantic tree is split into two subtrees; each subtree re-samples instances, the statistic of their annotation results is computed and compared with the threshold again, and the split-and-compare process is repeated until a subtree cannot be split or contains no accurate sampled instance;

(8) Fine-tune the initial instance segmentation model with the mask correction results on the target domain, thereby improving the effectiveness of the instance segmentation model.
Further, step (5) is specifically as follows:

The child node corresponding to instance t and the child node corresponding to instance o are merged into node n_j. The enhanced mask feature f_j^+ and the mask semantic score tensor s_j of the merged node n_j are the linear combinations

f_j^+ = w_t·f_t^+ + w_o·f_o^+

s_j = w_t·s_t + w_o·s_o

where the weights w_t and w_o depend on the sizes of the child nodes:

w_t = P_t / (P_t + P_o),  w_o = P_o / (P_t + P_o)

with P_t and P_o being the numbers of instances contained in the corresponding child nodes; for leaf nodes, w_t = w_o = 1/2.

By repeatedly agglomerating and merging nodes, a semantic tree is finally constructed for each category. The root node of the semantic tree is denoted n_0 and the remaining intermediate nodes are denoted n_1, …, n_{J_k}, where J_k is the number of intermediate nodes of category k.
Further, in step (7), the statistic Q_k of the annotation results of all sampled instances on the semantic tree of category k is computed as

Q_k = (1/N) Σ_{i=1}^{N} l_{k_i}

where N is the number of instances sampled from the semantic tree of category k, the sampled instances are indexed k_1, …, k_N, and l_t ∈ {0, 1} is the judgment result of the instance segmentation mask in step (6).
Further, the backbone network of the initial instance segmentation model is a Swin Transformer. On the source domain, the training dataset consists of an image set {X_source} and the corresponding instance mask image set {Y_source}; on the target domain, the test dataset contains only an image set {X_target}.

Further, the initial instance segmentation model uses data augmentation during training, including horizontal/vertical flipping, translation, and scaling. The model is trained with the AdamW optimizer, with an initial learning rate of 0.001 following a polynomial decay schedule, a weight decay of 0.0001, and a batch size of 4 in the experiments.
Further, in step (3), the mask feature f_t and the mask semantic score tensor s_t of target-domain instance t are obtained as follows:

the mask feature f_t is the target-domain backbone network feature multiplied by the instance segmentation mask of target-domain instance t, and the mask semantic score tensor s_t is the target-domain semantic score tensor multiplied by the instance segmentation mask of target-domain instance t, with s_t ∈ R^(W×H×K), where K is the number of instances contained in the image and W×H is the image size.
Further, in step (6), the leaf nodes of the semantic tree are sampled as follows: for the semantic tree T_k constructed for category k, N instance segmentation masks {m_{k_1}, …, m_{k_N}} are randomly selected based on the set sampling rate R; the annotator quickly judges whether each selected instance segmentation mask is accurate, labeling the predicted instance segmentation mask m_t with 1 if it is accurate and 0 otherwise.
Further, the present invention is implemented on the Pascal VOC 2012 dataset and the COCO dataset. The Pascal VOC 2012 dataset consists of 1464 training images and 1449 validation images with segmentation annotations and contains 20 categories; the metric for evaluating the quality of predicted segmentation masks is the mean Intersection over Union (mIoU). The COCO dataset contains 80 classes, and the evaluation criteria are the box average precision AP_box and the mask average precision AP_mask.
According to a second aspect of the present specification, a domain-adaptive instance segmentation device based on weakly supervised learning is provided, comprising a memory and one or more processors, wherein executable code is stored in the memory; when executing the executable code, the processor implements the domain-adaptive instance segmentation method based on weakly supervised learning described in the first aspect.

According to a third aspect of the present specification, a computer-readable storage medium is provided, on which a program is stored; when the program is executed by a processor, the domain-adaptive instance segmentation method based on weakly supervised learning described in the first aspect is implemented.

Compared with the prior art, the present invention has the following beneficial effects:

1. An initial instance segmentation model is trained on the source domain and then used to perform instance segmentation on the target domain, outputting the mask feature and mask semantic score tensor of each target-domain instance; a semantic tree is built with hierarchical agglomerative clustering to hierarchically explore the appearance and semantic similarity between predictions.

2. The leaf nodes of the semantic tree are sampled so that the accuracy of the instance segmentation masks can be judged quickly; accurate samples are used to correct the masks of inaccurate samples, and the initial instance segmentation model is fine-tuned with the mask correction results. This addresses the problem that, for domain adaptation, although the segmentation model can be improved by introducing supervision signals from the target-domain dataset, manual annotation is tedious and time-consuming and self-training introduces too much noise through pseudo-labels.

3. Experimental results on the Pascal VOC 2012 dataset and the COCO dataset show that, compared with other state-of-the-art methods, the present invention spends only limited human effort on label verification and achieves effectiveness close to that of supervised learning methods, which is highly competitive.
Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and other drawings can be derived from them by those of ordinary skill in the art without creative effort.

FIG. 1 is a schematic illustration of the domain adaptation problem according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the overall framework of domain-adaptive instance segmentation based on weakly supervised learning according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of semantic tree construction according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of segmentation outputs of an embodiment of the present invention on the Pascal VOC 2012 dataset;

FIG. 5 is a schematic diagram of segmentation outputs of an embodiment of the present invention on the COCO dataset;

FIG. 6 is a structural block diagram of a domain-adaptive instance segmentation device based on weakly supervised learning according to an embodiment of the present invention.
Detailed Description

To make the above objects, features and advantages of the present invention more comprehensible, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention; however, the present invention can also be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the present invention. The present invention is therefore not limited to the specific embodiments disclosed below.

FIG. 1 illustrates the domain adaptation problem. The present invention provides a domain-adaptive instance segmentation method based on weakly supervised learning, which addresses the problem that, for domain adaptation, although the segmentation model can be improved by introducing supervision signals from the target-domain dataset, manual annotation is tedious and time-consuming and self-training introduces too much noise through pseudo-labels.

As shown in FIG. 2, the domain-adaptive instance segmentation method based on weakly supervised learning provided by an embodiment of the present invention includes the following steps:
1. Train the initial segmentation model on the source domain

An initial instance segmentation model is trained on the source domain. The training dataset consists of an image set {X_source} and the corresponding instance mask image set {Y_source}. The model outputs the backbone network features and a semantic score tensor, where the semantic score tensor contains the probability of each pixel belonging to each instance.

During training, data augmentation includes horizontal/vertical flipping, translation, and scaling.

The backbone network of the initial instance segmentation model is a Swin Transformer. The model is trained with the AdamW optimizer, with an initial learning rate of 0.001 following a polynomial decay schedule, a weight decay of 0.0001, and a batch size of 4 in the experiments.
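As a concrete illustration of this training configuration, the following is a minimal PyTorch-style sketch of the optimizer, learning-rate schedule, and data loader; the model and dataset objects are hypothetical placeholders, and the decay power of 0.9 is an assumed value not specified in the text.

```python
# Sketch of the source-domain training setup: AdamW, initial learning rate 0.001,
# polynomial decay, weight decay 0.0001, batch size 4.
import torch
from torch.utils.data import DataLoader, Dataset


def make_training_components(model: torch.nn.Module, dataset: Dataset,
                             max_iters: int, power: float = 0.9):
    # AdamW with the learning rate and weight decay given in the embodiment.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    # Polynomial decay: lr(t) = lr0 * (1 - t / max_iters) ** power
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda t: (1.0 - t / max_iters) ** power)
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    return optimizer, scheduler, loader
```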
2. Apply the initial segmentation model on the target domain

Instance segmentation is performed on the target domain with the initial instance segmentation model trained on the source domain. The test dataset contains only an image set {X_target}. The backbone network features and semantic score tensor of each image are output.

3. Extract target-domain features

Take the maximum over the instance dimension of the semantic score tensor obtained in step 2 to obtain the instance segmentation mask of each target-domain image; multiply the instance segmentation mask of each target-domain image with the target-domain backbone network features and the target-domain semantic score tensor, respectively, to obtain the mask feature and mask semantic score tensor of each target-domain instance.

Specifically, the mask feature f_t of target-domain instance t is the target-domain backbone network feature multiplied by the instance segmentation mask of instance t, and the mask semantic score tensor s_t is the target-domain semantic score tensor multiplied by the instance segmentation mask of instance t, with s_t ∈ R^(W×H×K), where K is the number of instances contained in the image and W×H is the image size.
4. Concatenate features

The mask feature f_t of instance t obtained in step 3 and its mask semantic score tensor s_t are concatenated to obtain the enhanced mask feature f_t^+ of instance t.
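A minimal NumPy sketch of steps 3 and 4 is given below, assuming backbone features of shape W×H×C (C being an assumed channel dimension) and semantic scores of shape W×H×K; the hard pixel-to-instance assignment via argmax and the channel-wise concatenation are one straightforward reading of the text, not the only possible one.

```python
# Sketch of steps 3-4: derive per-instance masks from the semantic score tensor,
# mask the backbone features and scores, and concatenate them into f_t^+.
import numpy as np


def per_instance_features(feat, scores):
    """feat: (W, H, C) backbone features; scores: (W, H, K) semantic score tensor."""
    K = scores.shape[-1]
    assign = scores.argmax(axis=-1)            # step 3: max over the instance dimension
    feats_plus, masked_scores = [], []
    for t in range(K):
        m_t = (assign == t).astype(feat.dtype)       # instance segmentation mask of instance t
        f_t = feat * m_t[..., None]                  # mask feature f_t
        s_t = scores * m_t[..., None]                # mask semantic score tensor s_t
        feats_plus.append(np.concatenate([f_t, s_t], axis=-1))   # step 4: enhanced feature f_t^+
        masked_scores.append(s_t)
    return feats_plus, masked_scores
```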
5. Construct the semantic tree

As shown in FIG. 3, hierarchical agglomerative clustering is used to construct a semantic tree for each category. The enhanced mask feature of each instance belonging to the category is treated as a leaf node. At each agglomeration step, the two child nodes whose enhanced mask features have the smallest Euclidean distance are merged into a merged node, and the enhanced mask feature and mask semantic score tensor of the merged node are linear combinations of the corresponding enhanced mask features and mask semantic score tensors of its child nodes. Specifically:

The child node corresponding to instance t and the child node corresponding to instance o are merged into node n_j. The enhanced mask feature f_j^+ and the mask semantic score tensor s_j of the merged node n_j are the linear combinations

f_j^+ = w_t·f_t^+ + w_o·f_o^+

s_j = w_t·s_t + w_o·s_o

where the weights w_t and w_o depend on the sizes of the child nodes:

w_t = P_t / (P_t + P_o),  w_o = P_o / (P_t + P_o)

with P_t and P_o being the numbers of instances contained in the corresponding child nodes; for leaf nodes, w_t = w_o = 1/2.

By repeatedly agglomerating and merging nodes, a semantic tree is finally constructed for each category. The root node of the semantic tree is denoted n_0 and the remaining intermediate nodes are denoted n_1, …, n_{J_k}, where J_k is the number of intermediate nodes of category k.
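The tree construction can be sketched as follows, assuming each instance's enhanced mask feature and mask semantic score tensor have been flattened to vectors so that Euclidean distances and linear combinations are well defined; the size-proportional weights are the reading of the weighting rule above (they reduce to 1/2 for leaf nodes), and the naive O(n^3) nearest-pair search is kept only for clarity.

```python
# Sketch of step 5: hierarchical agglomerative clustering into a semantic tree.
import numpy as np


class TreeNode:
    def __init__(self, feat, score, size, children=()):
        self.feat, self.score, self.size, self.children = feat, score, size, children


def build_semantic_tree(leaf_feats, leaf_scores):
    """leaf_feats / leaf_scores: lists of flattened f_t^+ and s_t vectors for one category."""
    nodes = [TreeNode(f, s, 1) for f, s in zip(leaf_feats, leaf_scores)]
    while len(nodes) > 1:
        # Find the pair of current nodes whose enhanced features are closest (Euclidean).
        best, best_d = None, np.inf
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = np.linalg.norm(nodes[i].feat - nodes[j].feat)
                if d < best_d:
                    best, best_d = (i, j), d
        i, j = best
        a, b = nodes[i], nodes[j]
        w_a = a.size / (a.size + b.size)   # size-proportional weights (assumed form,
        w_b = b.size / (a.size + b.size)   # consistent with w_t = w_o = 1/2 for leaves)
        merged = TreeNode(w_a * a.feat + w_b * b.feat,
                          w_a * a.score + w_b * b.score,
                          a.size + b.size, children=(a, b))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]   # root node n_0 of the semantic tree for this category
```

Weighting the combination by cluster size keeps a merged node's feature close to the average over all instances it covers, so large, well-populated clusters are not dominated by a single newly merged instance.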
6. Sampling for quick judgment of mask accuracy

For the semantic tree T_k constructed for category k, N instance segmentation masks {m_{k_1}, …, m_{k_N}} are randomly selected based on the set sampling rate R. The annotator quickly judges whether each selected instance segmentation mask is accurate: if the predicted instance segmentation mask m_t is accurate it is labeled 1, otherwise 0.
7. Mask correction

The statistic of the annotation results of all sampled instances on the semantic tree of category k is compared with the set threshold: if the statistic is greater than the threshold, the predictions for category k are accurate, and the inaccurate sampled instances are mask-corrected using the accurate sampled instances; if the statistic is less than or equal to the threshold, the predictions for category k are inaccurate, and the corresponding semantic tree is split into two subtrees; each subtree re-samples instances, the statistic of their annotation results is computed and compared with the threshold again, and the split-and-compare process is repeated until a subtree cannot be split or contains no accurate sampled instance.

The statistic Q_k of the annotation results of all sampled instances on the semantic tree of category k is computed as

Q_k = (1/N) Σ_{i=1}^{N} l_{k_i}

where N is the number of instances sampled from the semantic tree of category k, the sampled instances are indexed k_1, …, k_N, and l_t ∈ {0, 1} is the judgment result of the instance segmentation mask in step 6.
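A sketch of this sample-verify-split procedure is given below, reusing the TreeNode structure from the previous sketch; `annotate` stands in for the human judgment (1 = accurate, 0 = inaccurate), the default sampling rate and threshold are placeholders, and returning the accurate and inaccurate leaf sets in place of an explicit mask-correction step is a simplification of the described behaviour.

```python
# Sketch of steps 6-7: sample leaves, compute Q_k, and accept or split the tree.
import random


def leaves(node):
    return [node] if not node.children else [l for c in node.children for l in leaves(c)]


def verify_tree(node, annotate, rate=0.1, threshold=0.5):
    """annotate(leaf) -> 1 if the predicted mask is accurate, 0 otherwise."""
    leaf_nodes = leaves(node)
    n = max(1, int(rate * len(leaf_nodes)))          # sampling rate R -> N sampled masks
    sampled = random.sample(leaf_nodes, n)
    labels = [annotate(l) for l in sampled]
    q = sum(labels) / len(labels)                    # Q_k: mean of the 0/1 judgments
    if q > threshold:
        # Predictions for this (sub)tree are accurate; inaccurate samples get corrected
        # with the accurate ones (the correction itself is not detailed here).
        return {"accurate": [l for l, y in zip(sampled, labels) if y == 1],
                "to_correct": [l for l, y in zip(sampled, labels) if y == 0]}
    if not node.children or sum(labels) == 0:
        # Stop: the subtree cannot be split or contains no accurate sampled instance.
        return {"accurate": [], "to_correct": []}
    parts = [verify_tree(c, annotate, rate, threshold) for c in node.children]
    return {"accurate": sum((p["accurate"] for p in parts), []),
            "to_correct": sum((p["to_correct"] for p in parts), [])}
```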
8. Fine-tune the initial instance segmentation model

The initial instance segmentation model is fine-tuned according to the mask correction results on the target domain, thereby improving the effectiveness of the instance segmentation model.

The present invention is implemented on the Pascal VOC 2012 dataset and the COCO dataset. The Pascal VOC 2012 dataset consists of 1464 training images and 1449 validation images with segmentation annotations and contains 20 categories; the metric for evaluating the quality of predicted segmentation masks is the mean Intersection over Union (mIoU). The COCO dataset contains 80 classes, and the evaluation criteria are the box average precision AP_box and the mask average precision AP_mask.
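For reference, the mIoU metric used on Pascal VOC 2012 can be computed as in the sketch below; this is the generic definition rather than anything specific to the invention, and the COCO AP metrics (AP_box, AP_mask), which require matching predictions to ground truth across IoU thresholds, are omitted here.

```python
# Generic mean Intersection-over-Union over semantic classes.
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """pred and gt are integer label maps of the same shape."""
    ious = []
    valid = gt != ignore_index
    for c in range(num_classes):
        p, g = (pred == c) & valid, (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue                                  # class absent from both maps: skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```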
The applicant's experimental results on the Pascal VOC 2012 dataset and the COCO dataset are shown in FIG. 4 and FIG. 5. The results show that, compared with other state-of-the-art methods, the present invention spends only limited human effort on label verification and achieves effectiveness close to that of supervised learning methods, which is highly competitive.

Corresponding to the foregoing embodiments of the domain-adaptive instance segmentation method based on weakly supervised learning, the present invention also provides embodiments of a domain-adaptive instance segmentation device based on weakly supervised learning.

Referring to FIG. 6, a domain-adaptive instance segmentation device based on weakly supervised learning provided by an embodiment of the present invention includes a memory and one or more processors, wherein executable code is stored in the memory; when executing the executable code, the processor implements the domain-adaptive instance segmentation method based on weakly supervised learning of the above embodiments.
The embodiments of the domain-adaptive instance segmentation device based on weakly supervised learning of the present invention can be applied to any device with data processing capability, for example a computer. The device embodiments can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor of the device on which it resides reading the corresponding computer program instructions from a non-volatile memory into memory and running them. At the hardware level, FIG. 6 shows a hardware structure diagram of a device with data processing capability on which the domain-adaptive instance segmentation device based on weakly supervised learning of the present invention resides; in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 6, the device may also include other hardware according to its actual functions, which will not be described in detail here.

For the implementation process of the functions and roles of each unit in the above device, reference is made to the implementation process of the corresponding steps in the above method, which will not be repeated here.

Since the device embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, and those of ordinary skill in the art can understand and implement it without creative effort.

An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the domain-adaptive instance segmentation method based on weakly supervised learning of the above embodiments is implemented.

The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the foregoing embodiments, such as a hard disk or a memory. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card provided on the device. Furthermore, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. It is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
It should also be noted that the terms "comprise", "include" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

The foregoing describes specific embodiments of the specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results; in some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The terms used in one or more embodiments of the specification are for the purpose of describing specific embodiments only and are not intended to limit the one or more embodiments. The singular forms "a", "said" and "the" used in one or more embodiments of the specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the specification to describe various kinds of information, such information should not be limited to these terms, which are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the specification, first information may also be referred to as second information and, similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while" or "in response to determining".

The above are only preferred embodiments of one or more embodiments of the specification and are not intended to limit them; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of one or more embodiments of the specification shall be included within the scope of protection of one or more embodiments of the specification.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210236149.4A CN114612663B (en) | 2022-03-11 | 2022-03-11 | Domain self-adaptive instance segmentation method and device based on weak supervision learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210236149.4A CN114612663B (en) | 2022-03-11 | 2022-03-11 | Domain self-adaptive instance segmentation method and device based on weak supervision learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114612663A CN114612663A (en) | 2022-06-10 |
CN114612663B true CN114612663B (en) | 2024-09-13 |
Family
ID=81863866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210236149.4A Active CN114612663B (en) | 2022-03-11 | 2022-03-11 | Domain self-adaptive instance segmentation method and device based on weak supervision learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612663B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115578564B (en) * | 2022-10-25 | 2023-05-23 | 北京医准智能科技有限公司 | Training method and device for instance segmentation model, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112116599A (en) * | 2020-08-12 | 2020-12-22 | 南京理工大学 | Method and system for semantic segmentation of Mycobacterium tuberculosis in sputum smear based on weakly supervised learning |
AU2020103905A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784424B (en) * | 2019-03-26 | 2021-02-09 | 腾讯科技(深圳)有限公司 | An image classification model training method, image processing method and device |
US20210150281A1 (en) * | 2019-11-14 | 2021-05-20 | Nec Laboratories America, Inc. | Domain adaptation for semantic segmentation via exploiting weak labels |
CN112699892B (en) * | 2021-01-08 | 2024-11-08 | 北京工业大学 | An unsupervised domain adaptive semantic segmentation method |
Also Published As
Publication number | Publication date |
---|---|
CN114612663A (en) | 2022-06-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |