CN114612663B - Domain self-adaptive instance segmentation method and device based on weak supervision learning - Google Patents
Domain self-adaptive instance segmentation method and device based on weak supervision learning
- Publication number
- CN114612663B CN114612663B CN202210236149.4A CN202210236149A CN114612663B CN 114612663 B CN114612663 B CN 114612663B CN 202210236149 A CN202210236149 A CN 202210236149A CN 114612663 B CN114612663 B CN 114612663B
- Authority
- CN
- China
- Prior art keywords
- mask
- instance segmentation
- instance
- domain
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
Description
Technical Field

The present invention relates to the technical field of instance segmentation, and in particular to a domain-adaptive instance segmentation method and device based on weakly supervised learning.

Background Art

Instance segmentation has been an active area of research and engineering over the past decade. It is used in many signal-processing applications, such as image editing, scene understanding, autonomous driving, and human-computer interaction. With the rapid development of deep convolutional neural networks (DCNNs), current instance segmentation methods achieve satisfactory accuracy and efficiency. However, when it comes to domain adaptation, a segmentation model learned on a source domain suffers a sharp drop in performance when applied to a target domain because of the data drift between the two domains. Pixel-level annotation of large numbers of target-domain images is extremely time-consuming, while unsupervised methods that minimize a task-specific loss on the source domain together with a domain-adversarial loss are limited by the distribution overlap between the source and target domains. Other methods adopt a self-training strategy and fine-tune the segmentation model with target-specific pseudo-labels, but they obtain only limited improvements because the pseudo-labels are noisy or strong assumptions are introduced.
Summary of the Invention

In view of the deficiencies of the prior art, the purpose of the present invention is to provide a domain-adaptive instance segmentation method and device based on weakly supervised learning.

The objective of the present invention is achieved through the following technical solutions:
According to a first aspect of the present specification, a domain-adaptive instance segmentation method based on weakly supervised learning is provided, comprising the following steps:

(1) Train an initial instance segmentation model on the source domain and output the backbone network features and the semantic score tensor, where the semantic score tensor contains the probability of each pixel belonging to each instance;

(2) Perform instance segmentation on the target domain using the initial instance segmentation model trained on the source domain, and output the backbone network features and semantic score tensor of each image;

(3) Take the maximum over the instance dimension of the semantic score tensor obtained in step (2) to obtain the instance segmentation mask of each target-domain image; multiply the instance segmentation mask of each target-domain image with the target-domain backbone network features and the target-domain semantic score tensor, respectively, to obtain the mask feature and mask semantic score tensor of each target-domain instance;

(4) Concatenate the mask feature f_t of instance t obtained in step (3) and its mask semantic score tensor s_t to obtain the enhanced mask feature f_t^+ of instance t;

(5) Use hierarchical agglomerative clustering (HAC) to construct a semantic tree for each category: the enhanced mask feature of each instance belonging to the category is treated as a leaf node, and at each agglomeration step the two child nodes whose enhanced mask features have the smallest Euclidean distance are merged into a merged node; child nodes include leaf nodes and intermediate nodes, and the enhanced mask feature and mask semantic score tensor of the merged node are linear combinations of the corresponding enhanced mask features and mask semantic score tensors of its child nodes;

(6) For each semantic tree, sample its leaf nodes at a set sampling rate, quickly judge whether each sampled instance segmentation mask is accurate, and record the judgment result;

(7) Compare a statistic (for example, the mean) of the annotation results of all sampled instances on the semantic tree of category k with a set threshold: if the statistic is greater than the threshold, the predictions for category k are accurate, and the inaccurate sampled instances are mask-corrected using the accurate sampled instances; if the statistic is less than or equal to the threshold, the predictions for category k are inaccurate, and the corresponding semantic tree is split into two subtrees; each subtree re-samples instances, the statistic of their annotation results is computed and compared with the threshold again, and the split-and-compare process is repeated until a subtree cannot be split or contains no accurate sampled instance;

(8) Fine-tune the initial instance segmentation model with the mask correction results on the target domain, thereby improving the effectiveness of the instance segmentation model.
Further, step (5) is specifically as follows:

The child node corresponding to instance t and the child node corresponding to instance o are merged into node n_j. The enhanced mask feature f_j^+ and the mask semantic score tensor s_j of the merged node n_j are the linear combinations

f_j^+ = w_t·f_t^+ + w_o·f_o^+

s_j = w_t·s_t + w_o·s_o

where the weights w_t and w_o depend on the sizes of the child nodes:

w_t = P_t / (P_t + P_o),  w_o = P_o / (P_t + P_o)

with P_t and P_o being the numbers of instances contained in the corresponding child nodes; for leaf nodes, w_t = w_o = 1/2.

By repeatedly agglomerating and merging nodes, a semantic tree is finally constructed for each category. The root node of the semantic tree is denoted n_0 and the remaining intermediate nodes are denoted n_1, …, n_{J_k}, where J_k is the number of intermediate nodes of category k.
Further, in step (7), the statistic Q_k of the annotation results of all sampled instances on the semantic tree of category k is computed as

Q_k = (1/N) Σ_{i=1}^{N} l_{k_i}

where N is the number of instances sampled from the semantic tree of category k, the sampled instances are indexed k_1, …, k_N, and l_t ∈ {0, 1} is the judgment result of the instance segmentation mask in step (6).
Further, the backbone network of the initial instance segmentation model is a Swin Transformer. On the source domain, the training dataset consists of an image set {X_source} and the corresponding instance mask image set {Y_source}; on the target domain, the test dataset contains only an image set {X_target}.

Further, the initial instance segmentation model uses data augmentation during training, including horizontal/vertical flipping, translation, and scaling. The model is trained with the AdamW optimizer, with an initial learning rate of 0.001 following a polynomial decay schedule, a weight decay of 0.0001, and a batch size of 4 in the experiments.
Further, in step (3), the mask feature f_t and the mask semantic score tensor s_t of target-domain instance t are obtained as follows:

the mask feature f_t is the target-domain backbone network feature multiplied by the instance segmentation mask of target-domain instance t, and the mask semantic score tensor s_t is the target-domain semantic score tensor multiplied by the instance segmentation mask of target-domain instance t, with s_t ∈ R^(W×H×K), where K is the number of instances contained in the image and W×H is the image size.
Further, in step (6), the leaf nodes of the semantic tree are sampled as follows: for the semantic tree T_k constructed for category k, N instance segmentation masks {m_{k_1}, …, m_{k_N}} are randomly selected based on the set sampling rate R; the annotator quickly judges whether each selected instance segmentation mask is accurate, labeling the predicted instance segmentation mask m_t with 1 if it is accurate and 0 otherwise.
Further, the present invention is implemented on the Pascal VOC 2012 dataset and the COCO dataset. The Pascal VOC 2012 dataset consists of 1464 training images and 1449 validation images with segmentation annotations and contains 20 categories; the metric for evaluating the quality of predicted segmentation masks is the mean Intersection over Union (mIoU). The COCO dataset contains 80 classes, and the evaluation criteria are the box average precision AP_box and the mask average precision AP_mask.
According to a second aspect of the present specification, a domain-adaptive instance segmentation device based on weakly supervised learning is provided, comprising a memory and one or more processors, wherein executable code is stored in the memory; when executing the executable code, the processor implements the domain-adaptive instance segmentation method based on weakly supervised learning described in the first aspect.

According to a third aspect of the present specification, a computer-readable storage medium is provided, on which a program is stored; when the program is executed by a processor, the domain-adaptive instance segmentation method based on weakly supervised learning described in the first aspect is implemented.

Compared with the prior art, the present invention has the following beneficial effects:

1. An initial instance segmentation model is trained on the source domain and then used to perform instance segmentation on the target domain, outputting the mask feature and mask semantic score tensor of each target-domain instance; a semantic tree is built with hierarchical agglomerative clustering to hierarchically explore the appearance and semantic similarity between predictions.

2. The leaf nodes of the semantic tree are sampled so that the accuracy of the instance segmentation masks can be judged quickly; accurate samples are used to correct the masks of inaccurate samples, and the initial instance segmentation model is fine-tuned with the mask correction results. This addresses the problem that, for domain adaptation, although the segmentation model can be improved by introducing supervision signals from the target-domain dataset, manual annotation is tedious and time-consuming and self-training introduces too much noise through pseudo-labels.

3. Experimental results on the Pascal VOC 2012 dataset and the COCO dataset show that, compared with other state-of-the-art methods, the present invention spends only limited human effort on label verification and achieves effectiveness close to that of supervised learning methods, which is highly competitive.
Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and other drawings can be derived from them by those of ordinary skill in the art without creative effort.

FIG. 1 is a schematic illustration of the domain adaptation problem according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the overall framework of domain-adaptive instance segmentation based on weakly supervised learning according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of semantic tree construction according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of segmentation outputs of an embodiment of the present invention on the Pascal VOC 2012 dataset;

FIG. 5 is a schematic diagram of segmentation outputs of an embodiment of the present invention on the COCO dataset;

FIG. 6 is a structural block diagram of a domain-adaptive instance segmentation device based on weakly supervised learning according to an embodiment of the present invention.
Detailed Description

To make the above objects, features and advantages of the present invention more comprehensible, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention; however, the present invention can also be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the present invention. The present invention is therefore not limited to the specific embodiments disclosed below.

FIG. 1 illustrates the domain adaptation problem. The present invention provides a domain-adaptive instance segmentation method based on weakly supervised learning, which addresses the problem that, for domain adaptation, although the segmentation model can be improved by introducing supervision signals from the target-domain dataset, manual annotation is tedious and time-consuming and self-training introduces too much noise through pseudo-labels.

As shown in FIG. 2, the domain-adaptive instance segmentation method based on weakly supervised learning provided by an embodiment of the present invention includes the following steps:
1. Train the initial segmentation model on the source domain

An initial instance segmentation model is trained on the source domain. The training dataset consists of an image set {X_source} and the corresponding instance mask image set {Y_source}. The model outputs the backbone network features and a semantic score tensor, where the semantic score tensor contains the probability of each pixel belonging to each instance.

During training, data augmentation includes horizontal/vertical flipping, translation, and scaling.

The backbone network of the initial instance segmentation model is a Swin Transformer. The model is trained with the AdamW optimizer, with an initial learning rate of 0.001 following a polynomial decay schedule, a weight decay of 0.0001, and a batch size of 4 in the experiments.
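As a concrete illustration of this training configuration, the following is a minimal PyTorch-style sketch of the optimizer, learning-rate schedule, and data loader; the model and dataset objects are hypothetical placeholders, and the decay power of 0.9 is an assumed value not specified in the text.

```python
# Sketch of the source-domain training setup: AdamW, initial learning rate 0.001,
# polynomial decay, weight decay 0.0001, batch size 4.
import torch
from torch.utils.data import DataLoader, Dataset


def make_training_components(model: torch.nn.Module, dataset: Dataset,
                             max_iters: int, power: float = 0.9):
    # AdamW with the learning rate and weight decay given in the embodiment.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    # Polynomial decay: lr(t) = lr0 * (1 - t / max_iters) ** power
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda t: (1.0 - t / max_iters) ** power)
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    return optimizer, scheduler, loader
```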
2. Apply the initial segmentation model on the target domain

Instance segmentation is performed on the target domain with the initial instance segmentation model trained on the source domain. The test dataset contains only an image set {X_target}. The backbone network features and semantic score tensor of each image are output.

3. Extract target-domain features

Take the maximum over the instance dimension of the semantic score tensor obtained in step 2 to obtain the instance segmentation mask of each target-domain image; multiply the instance segmentation mask of each target-domain image with the target-domain backbone network features and the target-domain semantic score tensor, respectively, to obtain the mask feature and mask semantic score tensor of each target-domain instance.

Specifically, the mask feature f_t of target-domain instance t is the target-domain backbone network feature multiplied by the instance segmentation mask of instance t, and the mask semantic score tensor s_t is the target-domain semantic score tensor multiplied by the instance segmentation mask of instance t, with s_t ∈ R^(W×H×K), where K is the number of instances contained in the image and W×H is the image size.
4. Concatenate features

The mask feature f_t of instance t obtained in step 3 and its mask semantic score tensor s_t are concatenated to obtain the enhanced mask feature f_t^+ of instance t.
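A minimal NumPy sketch of steps 3 and 4 is given below, assuming backbone features of shape W×H×C (C being an assumed channel dimension) and semantic scores of shape W×H×K; the hard pixel-to-instance assignment via argmax and the channel-wise concatenation are one straightforward reading of the text, not the only possible one.

```python
# Sketch of steps 3-4: derive per-instance masks from the semantic score tensor,
# mask the backbone features and scores, and concatenate them into f_t^+.
import numpy as np


def per_instance_features(feat, scores):
    """feat: (W, H, C) backbone features; scores: (W, H, K) semantic score tensor."""
    K = scores.shape[-1]
    assign = scores.argmax(axis=-1)            # step 3: max over the instance dimension
    feats_plus, masked_scores = [], []
    for t in range(K):
        m_t = (assign == t).astype(feat.dtype)       # instance segmentation mask of instance t
        f_t = feat * m_t[..., None]                  # mask feature f_t
        s_t = scores * m_t[..., None]                # mask semantic score tensor s_t
        feats_plus.append(np.concatenate([f_t, s_t], axis=-1))   # step 4: enhanced feature f_t^+
        masked_scores.append(s_t)
    return feats_plus, masked_scores
```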
5. Construct the semantic tree

As shown in FIG. 3, hierarchical agglomerative clustering is used to construct a semantic tree for each category. The enhanced mask feature of each instance belonging to the category is treated as a leaf node. At each agglomeration step, the two child nodes whose enhanced mask features have the smallest Euclidean distance are merged into a merged node, and the enhanced mask feature and mask semantic score tensor of the merged node are linear combinations of the corresponding enhanced mask features and mask semantic score tensors of its child nodes. Specifically:

The child node corresponding to instance t and the child node corresponding to instance o are merged into node n_j. The enhanced mask feature f_j^+ and the mask semantic score tensor s_j of the merged node n_j are the linear combinations

f_j^+ = w_t·f_t^+ + w_o·f_o^+

s_j = w_t·s_t + w_o·s_o

where the weights w_t and w_o depend on the sizes of the child nodes:

w_t = P_t / (P_t + P_o),  w_o = P_o / (P_t + P_o)

with P_t and P_o being the numbers of instances contained in the corresponding child nodes; for leaf nodes, w_t = w_o = 1/2.

By repeatedly agglomerating and merging nodes, a semantic tree is finally constructed for each category. The root node of the semantic tree is denoted n_0 and the remaining intermediate nodes are denoted n_1, …, n_{J_k}, where J_k is the number of intermediate nodes of category k.
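The tree construction can be sketched as follows, assuming each instance's enhanced mask feature and mask semantic score tensor have been flattened to vectors so that Euclidean distances and linear combinations are well defined; the size-proportional weights are the reading of the weighting rule above (they reduce to 1/2 for leaf nodes), and the naive O(n^3) nearest-pair search is kept only for clarity.

```python
# Sketch of step 5: hierarchical agglomerative clustering into a semantic tree.
import numpy as np


class TreeNode:
    def __init__(self, feat, score, size, children=()):
        self.feat, self.score, self.size, self.children = feat, score, size, children


def build_semantic_tree(leaf_feats, leaf_scores):
    """leaf_feats / leaf_scores: lists of flattened f_t^+ and s_t vectors for one category."""
    nodes = [TreeNode(f, s, 1) for f, s in zip(leaf_feats, leaf_scores)]
    while len(nodes) > 1:
        # Find the pair of current nodes whose enhanced features are closest (Euclidean).
        best, best_d = None, np.inf
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = np.linalg.norm(nodes[i].feat - nodes[j].feat)
                if d < best_d:
                    best, best_d = (i, j), d
        i, j = best
        a, b = nodes[i], nodes[j]
        w_a = a.size / (a.size + b.size)   # size-proportional weights (assumed form,
        w_b = b.size / (a.size + b.size)   # consistent with w_t = w_o = 1/2 for leaves)
        merged = TreeNode(w_a * a.feat + w_b * b.feat,
                          w_a * a.score + w_b * b.score,
                          a.size + b.size, children=(a, b))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]   # root node n_0 of the semantic tree for this category
```

Weighting the combination by cluster size keeps a merged node's feature close to the average over all instances it covers, so large, well-populated clusters are not dominated by a single newly merged instance.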
6. Sampling for quick judgment of mask accuracy

For the semantic tree T_k constructed for category k, N instance segmentation masks {m_{k_1}, …, m_{k_N}} are randomly selected based on the set sampling rate R. The annotator quickly judges whether each selected instance segmentation mask is accurate: if the predicted instance segmentation mask m_t is accurate it is labeled 1, otherwise 0.
7. Mask correction

The statistic of the annotation results of all sampled instances on the semantic tree of category k is compared with the set threshold: if the statistic is greater than the threshold, the predictions for category k are accurate, and the inaccurate sampled instances are mask-corrected using the accurate sampled instances; if the statistic is less than or equal to the threshold, the predictions for category k are inaccurate, and the corresponding semantic tree is split into two subtrees; each subtree re-samples instances, the statistic of their annotation results is computed and compared with the threshold again, and the split-and-compare process is repeated until a subtree cannot be split or contains no accurate sampled instance.

The statistic Q_k of the annotation results of all sampled instances on the semantic tree of category k is computed as

Q_k = (1/N) Σ_{i=1}^{N} l_{k_i}

where N is the number of instances sampled from the semantic tree of category k, the sampled instances are indexed k_1, …, k_N, and l_t ∈ {0, 1} is the judgment result of the instance segmentation mask in step 6.
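A sketch of this sample-verify-split procedure is given below, reusing the TreeNode structure from the previous sketch; `annotate` stands in for the human judgment (1 = accurate, 0 = inaccurate), the default sampling rate and threshold are placeholders, and returning the accurate and inaccurate leaf sets in place of an explicit mask-correction step is a simplification of the described behaviour.

```python
# Sketch of steps 6-7: sample leaves, compute Q_k, and accept or split the tree.
import random


def leaves(node):
    return [node] if not node.children else [l for c in node.children for l in leaves(c)]


def verify_tree(node, annotate, rate=0.1, threshold=0.5):
    """annotate(leaf) -> 1 if the predicted mask is accurate, 0 otherwise."""
    leaf_nodes = leaves(node)
    n = max(1, int(rate * len(leaf_nodes)))          # sampling rate R -> N sampled masks
    sampled = random.sample(leaf_nodes, n)
    labels = [annotate(l) for l in sampled]
    q = sum(labels) / len(labels)                    # Q_k: mean of the 0/1 judgments
    if q > threshold:
        # Predictions for this (sub)tree are accurate; inaccurate samples get corrected
        # with the accurate ones (the correction itself is not detailed here).
        return {"accurate": [l for l, y in zip(sampled, labels) if y == 1],
                "to_correct": [l for l, y in zip(sampled, labels) if y == 0]}
    if not node.children or sum(labels) == 0:
        # Stop: the subtree cannot be split or contains no accurate sampled instance.
        return {"accurate": [], "to_correct": []}
    parts = [verify_tree(c, annotate, rate, threshold) for c in node.children]
    return {"accurate": sum((p["accurate"] for p in parts), []),
            "to_correct": sum((p["to_correct"] for p in parts), [])}
```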
8. Fine-tune the initial instance segmentation model

The initial instance segmentation model is fine-tuned according to the mask correction results on the target domain, thereby improving the effectiveness of the instance segmentation model.

The present invention is implemented on the Pascal VOC 2012 dataset and the COCO dataset. The Pascal VOC 2012 dataset consists of 1464 training images and 1449 validation images with segmentation annotations and contains 20 categories; the metric for evaluating the quality of predicted segmentation masks is the mean Intersection over Union (mIoU). The COCO dataset contains 80 classes, and the evaluation criteria are the box average precision AP_box and the mask average precision AP_mask.
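For reference, the mIoU metric used on Pascal VOC 2012 can be computed as in the sketch below; this is the generic definition rather than anything specific to the invention, and the COCO AP metrics (AP_box, AP_mask), which require matching predictions to ground truth across IoU thresholds, are omitted here.

```python
# Generic mean Intersection-over-Union over semantic classes.
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """pred and gt are integer label maps of the same shape."""
    ious = []
    valid = gt != ignore_index
    for c in range(num_classes):
        p, g = (pred == c) & valid, (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue                                  # class absent from both maps: skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```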
The applicant's experimental results on the Pascal VOC 2012 dataset and the COCO dataset are shown in FIG. 4 and FIG. 5. The results show that, compared with other state-of-the-art methods, the present invention spends only limited human effort on label verification and achieves effectiveness close to that of supervised learning methods, which is highly competitive.

Corresponding to the foregoing embodiments of the domain-adaptive instance segmentation method based on weakly supervised learning, the present invention also provides embodiments of a domain-adaptive instance segmentation device based on weakly supervised learning.

Referring to FIG. 6, a domain-adaptive instance segmentation device based on weakly supervised learning provided by an embodiment of the present invention includes a memory and one or more processors, wherein executable code is stored in the memory; when executing the executable code, the processor implements the domain-adaptive instance segmentation method based on weakly supervised learning of the above embodiments.
The embodiments of the domain-adaptive instance segmentation device based on weakly supervised learning of the present invention can be applied to any device with data processing capability, for example a computer. The device embodiments can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor of the device on which it resides reading the corresponding computer program instructions from a non-volatile memory into memory and running them. At the hardware level, FIG. 6 shows a hardware structure diagram of a device with data processing capability on which the domain-adaptive instance segmentation device based on weakly supervised learning of the present invention resides; in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 6, the device may also include other hardware according to its actual functions, which will not be described in detail here.

For the implementation process of the functions and roles of each unit in the above device, reference is made to the implementation process of the corresponding steps in the above method, which will not be repeated here.

Since the device embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, and those of ordinary skill in the art can understand and implement it without creative effort.

An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the domain-adaptive instance segmentation method based on weakly supervised learning of the above embodiments is implemented.

The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the foregoing embodiments, such as a hard disk or a memory. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card provided on the device. Furthermore, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. It is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
It should also be noted that the terms "comprise", "include" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

The foregoing describes specific embodiments of the specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results; in some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The terms used in one or more embodiments of the specification are for the purpose of describing specific embodiments only and are not intended to limit the one or more embodiments. The singular forms "a", "said" and "the" used in one or more embodiments of the specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the specification to describe various kinds of information, such information should not be limited to these terms, which are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the specification, first information may also be referred to as second information and, similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while" or "in response to determining".

The above are only preferred embodiments of one or more embodiments of the specification and are not intended to limit them; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of one or more embodiments of the specification shall be included within the scope of protection of one or more embodiments of the specification.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210236149.4A CN114612663B (en) | 2022-03-11 | 2022-03-11 | Domain self-adaptive instance segmentation method and device based on weak supervision learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210236149.4A CN114612663B (en) | 2022-03-11 | 2022-03-11 | Domain self-adaptive instance segmentation method and device based on weak supervision learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114612663A CN114612663A (en) | 2022-06-10 |
CN114612663B true CN114612663B (en) | 2024-09-13 |
Family
ID=81863866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210236149.4A Active CN114612663B (en) | 2022-03-11 | 2022-03-11 | Domain self-adaptive instance segmentation method and device based on weak supervision learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612663B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115578564B (en) * | 2022-10-25 | 2023-05-23 | 北京医准智能科技有限公司 | Training method and device for instance segmentation model, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112116599A (en) * | 2020-08-12 | 2020-12-22 | 南京理工大学 | Method and system for semantic segmentation of Mycobacterium tuberculosis in sputum smear based on weakly supervised learning |
AU2020103905A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784424B (en) * | 2019-03-26 | 2021-02-09 | 腾讯科技(深圳)有限公司 | An image classification model training method, image processing method and device |
US20210150281A1 (en) * | 2019-11-14 | 2021-05-20 | Nec Laboratories America, Inc. | Domain adaptation for semantic segmentation via exploiting weak labels |
CN112699892B (en) * | 2021-01-08 | 2024-11-08 | 北京工业大学 | An unsupervised domain adaptive semantic segmentation method |
Also Published As
Publication number | Publication date |
---|---|
CN114612663A (en) | 2022-06-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |