CN114332466B

CN114332466B - Continuous learning method, system, equipment and storage medium for image semantic segmentation network

Info

Publication number: CN114332466B
Application number: CN202210237914.4A
Authority: CN
Inventors: 王子磊; 林子涵
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-07-15
Anticipated expiration: 2042-03-11
Also published as: CN114332466A

Abstract

The invention discloses a continuous learning method, system, equipment and storage medium for image semantic segmentation network. On the one hand, the method of extracting old knowledge representations and aligning them through nonlinear transformation in feature space effectively maintains the invariance of old knowledge and improves the Ability to learn new knowledge. On the other hand, the topology structure of the new category is optimized in the embedding space, and the invariance of its topology structure is maintained for the old category, so as to reduce forgetting and prevent confusion between classes. This makes it unnecessary to provide labels of old categories in the continuous learning of semantic segmentation, reducing the cost of labeling. In general, as a universal continuous learning method for semantic segmentation, the present invention has no restrictions on application scenarios, and has strong generalization ability and practical value.

Description

Image semantic segmentation network continuous learning method, system, device and storage medium

技术领域technical field

本发明涉及图像语义分割技术领域，尤其涉及一种图像语义分割网络持续学习方法、系统、设备及存储介质。The present invention relates to the technical field of image semantic segmentation, in particular to a continuous learning method, system, device and storage medium for image semantic segmentation network.

背景技术Background technique

近年来，深度神经网络在语义分割任务上取得了巨大的成功。然而传统语义分割网络训练方法需要一次性获取全部的训练数据，且训练完成后难以更新。在实际应用中，常要求网络能从数据流中逐渐的学习、更新已学会的知识，从而有效降低数据存储代价与训练代价。但是令深度神经网络直接在新数据上学习会导致已学习知识的严重遗忘。而持续学习技术通过对学习过程中施加额外的约束，以达到学习新知识的同时不会遗忘已学会的知识的目的。In recent years, deep neural networks have achieved great success in semantic segmentation tasks. However, the traditional semantic segmentation network training method needs to obtain all the training data at one time, and it is difficult to update after the training is completed. In practical applications, the network is often required to gradually learn and update the learned knowledge from the data stream, thereby effectively reducing the cost of data storage and training. But having deep neural networks learn directly on new data can lead to severe forgetting of what has been learned. The continuous learning technology imposes additional constraints on the learning process to achieve the purpose of learning new knowledge without forgetting the learned knowledge.

持续学习的一般手段为使用知识蒸馏来保持新旧网络间知识的一致性。这一操作常在输出空间或者特征空间进行。具体到语义分割领域，除了使用上述手段防止旧知识的遗忘以外，还有两个新的挑战：首先，随着学习的进行，可能需要学习过去曾经忽略的类别，从而导致对于某一特定输入，其包含的语义信息并不是一成不变的，这要求网络有更强的新知识学习能力。其次，由于获取标注数据需要大量人力物力，因此希望在新增数据上只对需要学习的类别进行标注，这将导致被标注为背景类的区域可能包含已学习类别，引入的语义不一致性会给网络训练带来较大的挑战。因此，图像分类领域的持续学习方法无法胜任语义分割持续学习任务。A general approach to continuous learning is to use knowledge distillation to maintain knowledge consistency between old and new networks. This operation is usually performed in the output space or feature space. Specific to the field of semantic segmentation, in addition to using the above methods to prevent the forgetting of old knowledge, there are two new challenges: First, as learning progresses, it may be necessary to learn categories that have been ignored in the past, resulting in The semantic information it contains is not static, which requires the network to have a stronger ability to learn new knowledge. Secondly, since obtaining labeled data requires a lot of manpower and material resources, it is hoped that only the categories that need to be learned will be labeled on the newly added data, which will cause the regions labeled as background classes may contain learned categories, and the introduced semantic inconsistency will give Network training brings greater challenges. Therefore, continuous learning methods in the field of image classification are not competent for the continuous learning task of semantic segmentation.

具体的：公布号为CN111191709A的中国专利申请《深度神经网络的持续学习框架及持续学习方法》中，使用一生成网络生成旧类别的数据，并与新数据混合用以训练网络，但是其仅解决图像分类任务。此外，这种方式严重依赖生成器的生成质量，难以胜任大规模、复杂的数据，特别是图像语义分割任务。公布号为CN111368874A的中国专利申请《一种基于单分类技术的图像类别增量学习方法》中，采用输出空间的知识蒸馏和偏好纠正两个手段以实现图像分类任务的持续学习。但是，其仍无法解决前述语义分割持续学习任务中特有的挑战，因而无法在图像语义分割中直接应用。公布号为CN103366163A的中国专利申请《基于增量学习的人脸检测系统和方法》、公布号为CN106897705A的中国专利申请《一种基于增量学习的海洋观测大数据分布方法》、以及公布号为CN103593680A的中国专利申请《一种基于隐马尔科夫模型自增量学习的动态手势识别方法》均为在某一特定领域专用的方法，无法证明其具有泛化性与普适性。Specifically: In the Chinese patent application "Continuous Learning Framework and Continuous Learning Method for Deep Neural Networks" with the publication number of CN111191709A, a generation network is used to generate data of old categories, and it is mixed with new data to train the network, but it only solves the problem of image classification tasks. In addition, this method relies heavily on the generation quality of the generator, and is incompetent for large-scale and complex data, especially the task of image semantic segmentation. In the Chinese patent application "An Incremental Learning Method for Image Classes Based on Single Classification Technology" with the publication number of CN111368874A, two methods of knowledge distillation and preference correction of the output space are used to realize the continuous learning of image classification tasks. However, it still cannot solve the unique challenges in the aforementioned continuous learning task of semantic segmentation, so it cannot be directly applied in image semantic segmentation. The Chinese patent application "Face Detection System and Method Based on Incremental Learning" with the publication number CN103366163A, the Chinese patent application with the publication number CN106897705A "A method for the distribution of ocean observation big data based on the incremental learning", and the publication number of CN103593680A's Chinese patent application "A Dynamic Gesture Recognition Method Based on Hidden Markov Model Self-Incremental Learning" is a method dedicated to a specific field, and it cannot be proved that it has generalization and universality.

因此，针对语义分割持续学习任务设计一种泛用的，尽可能降低旧知识遗忘的同时解决语义分割持续学习任务中的前后语义不一致性的方法具有重要的实用价值和现实意义。Therefore, it is of great practical value and practical significance to design a general method for the continuous learning task of semantic segmentation, which can reduce the forgetting of old knowledge as much as possible while solving the semantic inconsistency in the continuous learning task of semantic segmentation.

发明内容SUMMARY OF THE INVENTION

本发明的目的是提供一种图像语义分割网络持续学习方法、系统、设备及存储介质，对应用场景均无限制，具有很强的泛化能力与实用价值，填补语义分割持续学习任务上的空白。The purpose of the present invention is to provide an image semantic segmentation network continuous learning method, system, equipment and storage medium, which have no restrictions on application scenarios, have strong generalization ability and practical value, and fill the gap in the continuous learning task of semantic segmentation. .

本发明的目的是通过以下技术方案实现的：The purpose of this invention is to realize through the following technical solutions:

一种图像语义分割网络持续学习方法，包括：A continuous learning method for image semantic segmentation network, including:

获取新增语义分割数据集及对应新增类别的标签，利用原始图像语义分割网络提取所述新增语义分割数据集中图像数据的原始特征图，通过特征变换模块对所述原始特征图进行变换，并利用变换结果重构的特征图与所述原始特征图的差异初步训练所述特征变换模块；Obtaining the newly added semantic segmentation data set and the label corresponding to the newly added category, using the original image semantic segmentation network to extract the original feature map of the image data in the newly added semantic segmentation data set, and transforming the original feature map through a feature transformation module, and utilize the difference between the reconstructed feature map of the transformation result and the original feature map to preliminarily train the feature transformation module;

使用所述原始图像语义分割网络与初步训练的特征变换模块初始化一个相同的图像语义分割网络与特征变换模块，将所述原始图像语义分割网络称为旧网络，初步训练的特征变换模块称为旧特征变换模块，初始化产生的图像语义分割网络称为新网络，初始化产生的特征变换模块称为新特征变换模块；固定所述旧网络与旧特征变换模块，训练所述新网络与新特征变换模块；Use the original image semantic segmentation network and the initially trained feature transformation module to initialize an identical image semantic segmentation network and feature transformation module, the original image semantic segmentation network is called the old network, and the initially trained feature transformation module is called the old network. Feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; the old network and the old feature transformation module are fixed, and the new network and the new feature transformation module are trained. ;

训练时，将新增语义分割数据集的图像数据同时输入至所述旧网络与新网络，所述旧网络与新网络中各自进行特征图提取、解码与语义分割，获得分割结果；其中，所述旧网络提取的特征图通过所述旧特征变换模块进行变换，所述新网络提取的特征图通过所述新特征变换模块进行变换，并计算两种变换结果的对齐损失；利用所述旧网络与新网络的分割结果及解码获得的特征向量对于旧类别分别独自构建相应的类间关系矩阵与类内关系集合，并利用所述旧网络与新网络的类间关系矩阵计算类间结构保持损失，利用所述旧网络与新网络的类内关系集合计算类内结构保持损失，所述类间结构保持损失与类内结构保持损失用于保持旧类别中类间结构与类内结构的一致性；同时，对于新增类别，利用所述新网络解码获得的特征向量计算初始结构优化损失，所述初始结构优化损失用于拉近相同新增类别的特征向量的分布，疏远不同新增类别的特征向量的分布，并且，利用逐类别动态阈值对旧网络的分割结果进行优化去噪，获得相应的伪标签，利用所述伪标签计算所述新网络的分类损失；结合所述对齐损失、类间结构保持损失、类内结构保持损失、初始结构优化损失及分类损失训练所述新网络与新特征变换模块。During training, the image data of the newly added semantic segmentation dataset is input into the old network and the new network at the same time, and feature map extraction, decoding and semantic segmentation are performed in the old network and the new network respectively to obtain segmentation results; The feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of the two transformation results is calculated; using the old network With the segmentation result of the new network and the eigenvector obtained by decoding, the corresponding inter-class relationship matrix and intra-class relationship set are independently constructed for the old class, and the inter-class structure retention loss is calculated by using the inter-class relationship matrix of the old network and the new network. , the intra-class structure preservation loss is calculated by using the intra-class relationship set of the old network and the new network, and the inter-class structure preservation loss and the intra-class structure preservation loss are used to maintain the consistency between the inter-class structure and the intra-class structure in the old class At the same time, for the newly added category, the initial structure optimization loss is calculated by using the feature vector obtained by the new network decoding, and the initial structure optimization loss is used to narrow the distribution of the feature vectors of the same newly added category, and alienate the different newly added categories. The distribution of feature vectors, and use the category-by-category dynamic threshold to optimize and denoise the segmentation results of the old network, obtain corresponding pseudo-labels, and use the pseudo-labels to calculate the classification loss of the new network; The new network and the new feature transformation module are trained with inter-class structure preservation loss, intra-class structure preservation loss, initial structure optimization loss, and classification loss.

一种图像语义分割网络持续学习系统，该系统包括：An image semantic segmentation network continuous learning system, the system includes:

数据收集与初步训练单元，用于获取新增语义分割数据集及对应新增类别的标签，利用原始图像语义分割网络提取所述新增语义分割数据集中图像数据的原始特征图，通过特征变换模块对所述原始特征图进行变换，并利用变换结果重构的特征图与所述原始特征图的差异初步训练所述特征变换模块；The data collection and preliminary training unit is used to obtain the newly added semantic segmentation data set and the labels corresponding to the newly added categories, use the original image semantic segmentation network to extract the original feature map of the image data in the newly added semantic segmentation data set, and pass the feature transformation module. Transforming the original feature map, and preliminarily training the feature transformation module by using the difference between the reconstructed feature map and the original feature map from the transformation result;

学习单元，用于使用所述原始图像语义分割网络与初步训练的特征变换模块初始化一个相同的图像语义分割网络与特征变换模块，将所述原始图像语义分割网络称为旧网络，初步训练的特征变换模块称为旧特征变换模块，初始化产生的图像语义分割网络称为新网络，初始化产生的特征变换模块称为新特征变换模块；固定所述旧网络与旧特征变换模块，训练所述新网络与新特征变换模块；训练时，将新增语义分割数据集的图像数据同时输入至所述旧网络与新网络，所述旧网络与新网络中各自进行特征图提取、解码与语义分割，获得分割结果；其中，所述旧网络提取的特征图通过所述旧特征变换模块进行变换，所述新网络提取的特征图通过所述新特征变换模块进行变换，并计算两种变换结果的对齐损失；利用所述旧网络与新网络的分割结果及解码获得的特征向量对于旧类别分别独自构建相应的类间关系矩阵与类内关系集合，并利用所述旧网络与新网络的类间关系矩阵计算类间结构保持损失，利用所述旧网络与新网络的类内关系集合计算类内结构保持损失，所述类间结构保持损失与类内结构保持损失用于保持旧类别中类间结构与类内结构的一致性；同时，对于新增类别，利用所述新网络解码获得的特征向量计算初始结构优化损失，所述初始结构优化损失用于拉近相同新增类别的特征向量的分布，疏远不同新增类别的特征向量的分布，并且，利用逐类别动态阈值对旧网络的分割结果进行优化去噪，获得相应的伪标签，利用所述伪标签计算所述新网络的分类损失；结合所述对齐损失、类间结构保持损失、类内结构保持损失、初始结构优化损失及分类损失训练所述新网络与新特征变换模块。The learning unit is used to initialize an identical image semantic segmentation network and feature transformation module using the original image semantic segmentation network and the initially trained feature transformation module, and the original image semantic segmentation network is called the old network, and the initially trained feature The transformation module is called the old feature transformation module, the image semantic segmentation network generated by initialization is called the new network, and the feature transformation module generated by initialization is called the new feature transformation module; the old network and the old feature transformation module are fixed, and the new network is trained. and the new feature transformation module; during training, input the image data of the newly added semantic segmentation data set into the old network and the new network at the same time, and perform feature map extraction, decoding and semantic segmentation in the old network and the new network respectively, and obtain Segmentation result; wherein, the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of the two transformation results is calculated. ; Use the segmentation results of the old network and the new network and the eigenvectors obtained by decoding to independently construct the corresponding inter-class relationship matrix and the intra-class relationship set for the old category, and use the old network and the new network. The relationship matrix between classes The inter-class structure preservation loss is calculated, and the intra-class structure preservation loss is calculated by using the intra-class relationship set of the old network and the new network. The inter-class structure preservation loss and the intra-class structure preservation loss are used to preserve the inter-class structure and Consistency of intra-class structure; at the same time, for newly added categories, the initial structure optimization loss is calculated by using the feature vector obtained by decoding the new network, and the initial structure optimization loss is used to narrow the distribution of feature vectors of the same newly added category, Alienate the distribution of feature vectors of different new categories, and use the category-by-category dynamic threshold to optimize and denoise the segmentation results of the old network, obtain corresponding pseudo-labels, and use the pseudo-labels to calculate the classification loss of the new network; The alignment loss, inter-class structure preservation loss, intra-class structure preservation loss, initial structure optimization loss, and classification loss train the new network and new feature transformation module.

一种处理设备，包括：一个或多个处理器；存储器，用于存储一个或多个程序；A processing device, comprising: one or more processors; a memory for storing one or more programs;

其中，当所述一个或多个程序被所述一个或多个处理器执行时，使得所述一个或多个处理器实现前述的方法。Wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the aforementioned method.

一种可读存储介质，存储有计算机程序，当计算机程序被处理器执行时实现前述的方法。A readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned method.

由上述本发明提供的技术方案可以看出，一方面，通过特征空间中的非线性变换提取旧知识表征进行对齐的手段，有效保持旧知识不变性的同时提高对新知识学习的能力。另一方面，在嵌入空间中对新类别优化其拓扑结构，对旧类别维护其拓扑结构的不变性，达到降低遗忘，防止类间混淆的效果；此外，联合伪标签及伪标签降噪技术，使得在语义分割持续学习中无需提供旧类别的标签，降低标注成本。总体来说，本发明作为一种通用性的语义分割持续学习方法，对应用场景均无限制，具有很强的泛化能力与实用价值。It can be seen from the technical solutions provided by the present invention that, on the one hand, the method of extracting old knowledge representations and aligning them through nonlinear transformation in the feature space effectively maintains the invariance of old knowledge and improves the ability to learn new knowledge. On the other hand, the topology structure of the new category is optimized in the embedding space, and the invariance of its topology structure is maintained for the old category, so as to reduce forgetting and prevent confusion between classes. This makes it unnecessary to provide labels of old categories in the continuous learning of semantic segmentation, reducing the cost of labeling. In general, as a universal continuous learning method for semantic segmentation, the present invention has no restrictions on application scenarios, and has strong generalization ability and practical value.

附图说明Description of drawings

为了更清楚地说明本发明实施例的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域的普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他附图。In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

图1为本发明实施例提供的一种图像语义分割网络持续学习方法的模型示意图；1 is a schematic diagram of a model of an image semantic segmentation network continuous learning method provided by an embodiment of the present invention;

图2为本发明实施例提供的初始结构优化部分原理示意图；FIG. 2 is a schematic diagram of the principle of an initial structure optimization part provided by an embodiment of the present invention;

图3为本发明实施例提供的类间与类内结构保持部分原理示意图；FIG. 3 is a schematic diagram of part of the principle of maintaining inter-class and intra-class structures according to an embodiment of the present invention;

图4为本发明实施例提供的不同图像语义分割网络分割结果的对比示意图；FIG. 4 is a schematic diagram of comparison of segmentation results of different image semantic segmentation networks according to an embodiment of the present invention;

图5为本发明实施例提供的一种图像语义分割网络持续学习系统的示意图；5 is a schematic diagram of an image semantic segmentation network continuous learning system provided by an embodiment of the present invention;

图6为本发明实施例提供的一种处理设备的示意图。FIG. 6 is a schematic diagram of a processing device according to an embodiment of the present invention.

具体实施方式Detailed ways

下面结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明的保护范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.

首先对本文中可能使用的术语进行如下说明：First a description of terms that may be used in this article:

术语“包括”、“包含”、“含有”、“具有”或其它类似语义的描述，应被解释为非排它性的包括。例如：包括某技术特征要素（如原料、组分、成分、载体、剂型、材料、尺寸、零件、部件、机构、装置、步骤、工序、方法、反应条件、加工条件、参数、算法、信号、数据、产品或制品等），应被解释为不仅包括明确列出的某技术特征要素，还可以包括未明确列出的本领域公知的其它技术特征要素。The terms "comprising", "comprising", "containing", "having" or other descriptions with similar meanings should be construed as non-exclusive inclusions. For example: including certain technical characteristic elements (such as raw materials, components, ingredients, carriers, dosage forms, materials, dimensions, parts, components, mechanisms, devices, steps, processes, methods, reaction conditions, processing conditions, parameters, algorithms, signals, data, products or products, etc.), should be construed to include not only certain technical feature elements explicitly listed, but also other technical feature elements known in the art that are not explicitly listed.

下面对本发明所提供的一种图像语义分割网络持续学习方法、系统、设备及存储介质进行详细描述。本发明实施例中未作详细描述的内容属于本领域专业技术人员公知的现有技术。本发明实施例中未注明具体条件者，按照本领域常规条件或制造商建议的条件进行。A method, system, device and storage medium for continuous learning of image semantic segmentation network provided by the present invention will be described in detail below. Contents that are not described in detail in the embodiments of the present invention belong to the prior art known to those skilled in the art. If the specific conditions are not indicated in the examples of the present invention, it is carried out according to the conventional conditions in the art or the conditions suggested by the manufacturer.

实施例一Example 1

本发明实施例提供一种图像语义分割网络持续学习方法，它是一种基于类别结构保持与特征对齐的语义分割持续学习方法。目前主流的图像语义分割网络由特征提取器、解码器与分类器组成，主要过程流程为：通过特征提取器提取输入的待分割图像的特征图，经解码器获得相应的特征向量，最后由分类器进行语义分割，获得每一像素的分类结果（即分割结果）。本发明针对图像语义分割网络设计对应的模块以防止旧知识遗忘。具体地，该方法核心内容包含特征变换模块、类别结构信息保持模块、伪标签生成模块和联合损失函数训练语义分割网络四个部分。特征变换模块通过对特征提取器输出特征图实施非线性变换，进而提取出旧知识的表征以对齐，达到为新知识的学习提供高自由度的同时仍有效保持旧知识完整性。类别结构信息保持模块使用解码器输出建立类内拓扑结构与类间拓扑结构，通过在学习过程中维持上述结构一致性，有效降低持续学习过程中类别拓扑结构破坏的现象，从而降低遗忘与类间混淆。进一步地，针对前后语义不一致问题，伪标签生成模块利用逐类别动态阈值对旧网络输出的分割结果进行优化去噪，从而生成高质量的伪标签以弥补缺失的旧类别标注。最后，联合上述模块的损失函数对语义分割网络进行训练，以达到持续学习的效果。The embodiment of the present invention provides a continuous learning method for image semantic segmentation network, which is a continuous learning method for semantic segmentation based on category structure keeping alignment with features. The current mainstream image semantic segmentation network consists of a feature extractor, a decoder and a classifier. The main process flow is: extract the input feature map of the image to be segmented through the feature extractor, obtain the corresponding feature vector through the decoder, and finally classify the image. Semantic segmentation is performed by the processor, and the classification result of each pixel (ie, the segmentation result) is obtained. The present invention designs corresponding modules for the image semantic segmentation network to prevent old knowledge from being forgotten. Specifically, the core content of the method includes four parts: feature transformation module, category structure information retention module, pseudo-label generation module and joint loss function training semantic segmentation network. The feature transformation module performs nonlinear transformation on the output feature map of the feature extractor, and then extracts the representation of the old knowledge for alignment, so as to provide a high degree of freedom for the learning of new knowledge while still effectively maintaining the integrity of the old knowledge. The class structure information retention module uses the decoder output to establish the intra-class topology and inter-class topology. By maintaining the above structural consistency during the learning process, the phenomenon of class topology destruction in the continuous learning process is effectively reduced, thereby reducing forgetting and inter-class topology. confused. Further, for the problem of inconsistent semantics before and after, the pseudo-label generation module uses a category-by-category dynamic threshold to optimize and denoise the segmentation results output by the old network, thereby generating high-quality pseudo-labels to make up for the missing old category labels. Finally, the loss function of the above modules is combined to train the semantic segmentation network to achieve the effect of continuous learning.

每一次持续学习的过程可以描述为：Each continuous learning process can be described as:

训练时，将新增语义分割数据集的图像数据同时输入至所述旧网络与新网络，所述旧网络与新网络中各自进行特征图提取、解码与语义分割，获得分割结果；其中，所述旧网络提取的特征图通过所述旧特征变换模块进行变换，所述新网络提取的特征图通过所述新特征变换模块进行变换，并计算两种变换结果的对齐损失；利用所述旧网络与新网络各自的分割结果及解码获得的特征向量，对于旧类别分别独自构建相应的类间关系矩阵与类内关系集合，并利用所述旧网络与新网络的类间关系矩阵计算类间结构保持损失，利用所述旧网络与新网络的类内关系集合计算类内结构保持损失，所述类间结构保持损失与类内结构保持损失用于保持旧类别中类间结构与类内结构的一致性；同时，对于新增类别，利用所述新网络解码获得的特征向量计算初始结构优化损失，所述初始结构优化损失用于拉近相同新增类别的特征向量的分布，疏远不同新增类别的特征向量的分布，并且，利用逐类别动态阈值为旧网络的分割结果进行优化去噪，获得相应的伪标签，利用所述伪标签计算所述新网络的分类损失；结合所述对齐损失、类间结构保持损失、类内结构保持损失、初始结构优化损失及分类损失训练所述新网络与新特征变换模块。During training, the image data of the newly added semantic segmentation dataset is input into the old network and the new network at the same time, and feature map extraction, decoding and semantic segmentation are performed in the old network and the new network respectively to obtain segmentation results; The feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of the two transformation results is calculated; using the old network With the respective segmentation results of the new network and the eigenvectors obtained by decoding, the corresponding inter-class relationship matrix and intra-class relationship set are independently constructed for the old class, and the inter-class relationship matrix of the old network and the new network is used to calculate the inter-class structure. Preservation loss, the intra-class structure preservation loss is calculated using the set of intra-class relationships between the old network and the new network, and the inter-class structure preservation loss and intra-class structure preservation loss are used to preserve the difference between the inter-class structure and the intra-class structure in the old class. Consistency; at the same time, for the newly added category, the initial structure optimization loss is calculated by using the feature vector obtained by the new network decoding, and the initial structure optimization loss is used to narrow the distribution of the feature vectors of the same newly added category, and alienate different new ones. The distribution of the feature vectors of the categories, and use the category-by-category dynamic threshold to optimize the denoising of the segmentation results of the old network, obtain the corresponding pseudo-labels, and use the pseudo-labels to calculate the classification loss of the new network; combine the alignment loss , inter-class structure preservation loss, intra-class structure preservation loss, initial structure optimization loss and classification loss to train the new network and the new feature transformation module.

为了便于理解，下面针对上述学习过程做进一步的介绍。In order to facilitate understanding, the following is a further introduction to the above learning process.

如图1所示，为本发明提供的一种图像语义分割网络持续学习方法的模型示意图，其展示了持续学习过程中所涉及的相关流程与损失函数，主要说明如下：As shown in FIG. 1, it is a schematic diagram of a model of an image semantic segmentation network continuous learning method provided by the present invention, which shows the relevant processes and loss functions involved in the continuous learning process, and is mainly described as follows:

一、特征变换模块及相关损失函数。1. Feature transformation module and related loss function.

本发明实施例中，利用新增语义分割数据集对图像语义分割网络进行学习之前，需要对特征变换模块部分进行初步训练。In the embodiment of the present invention, before learning the image semantic segmentation network by using the newly added semantic segmentation data set, it is necessary to perform preliminary training on the feature transformation module part.

如之前所述图像语义分割网络包含了特征提取器，使用特征提取器提取新增语义分割数据集中每一图像数据的原始特征图，记为

。然后，利用特征变换模块（FeatureProjector）进行非线性变换，生成关于旧知识的表征，通过训练特征变换模块引导其输出的表征中包含丰富、有效的信息。As mentioned earlier, the image semantic segmentation network includes a feature extractor, and the feature extractor is used to extract the original feature map of each image data in the newly added semantic segmentation dataset, denoted as

. Then, the feature transformation module (FeatureProjector) is used for nonlinear transformation to generate representations of old knowledge, and the output representations are guided by training the feature transformation module to contain rich and effective information.

本发明实施例中，使用自编码器结构初步训练特征变换模块，将特征变换模块记为P ^*；通过特征变换模块P ^*对原始特征图

进行变换包括：先通过卷积操作（例如，1*1卷积）进行通道降维，再通过若干空洞卷积操作（例如，两个3*3空洞卷积）进行局部空间信息混合，生成关于原始特征图

的表征。初步训练时，使用重构网络R ^*（例如，可采用两个3*3卷积构成）对变换结果

进行重构，通过尝试从特征变换模块的输出变换结果中重构原始特征图，并使用原始特征图与重构特征图的差异构造重构损失函数，利用重构损失初步训练所述特征变换模块P ^*，从而引导特征变换模块P ^*输出的表征中包含丰富的信息。In the embodiment of the present invention, the feature transformation module is preliminarily trained using the self-encoder structure, and the feature transformation module is denoted as P ^* ; the original feature map is transformed by the feature transformation module P ^*

The transformation includes: first performing channel dimensionality reduction through convolution operations (for example, 1*1 convolution), and then mixing local spatial information through several hole convolution operations (for example, two 3*3 hole convolutions) to generate information about original feature map

characterization. During initial training, use the reconstruction network R ^* (for example, it can be composed of two 3*3 convolutions) to transform the results

Reconstruct, by trying to reconstruct the original feature map from the output transformation result of the feature transformation module, and use the difference between the original feature map and the reconstructed feature map to construct a reconstruction loss function, and use the reconstruction loss to initially train the feature transformation module P ^* , which guides the feature transformation module P ^* output representations that contain rich information.

本领域技术人员可以理解，所述卷积操作与空洞卷积操作均为常规的两类卷积操作，相较而言，所述卷积操作为标准卷积操作，空洞卷积操作则能以较少的层数混合较大范围内的空间信息。Those skilled in the art can understand that the convolution operation and the hole convolution operation are both conventional two types of convolution operations. In comparison, the convolution operation is a standard convolution operation, and the hole convolution operation can be performed with Fewer layers mix spatial information in a larger range.

具体的，重构损失函数可以为重构的特征图

与所述原始特征图

的欧氏距离，表示为：Specifically, the reconstruction loss function can be the reconstructed feature map

with the original feature map

The Euclidean distance of , expressed as:

。

.

当特征变换模块P ^*能有效生成旧知识的表征时，完成初步训练，此时的特征变换模块记为

。When the feature transformation module P ^* can effectively generate the representation of the old knowledge, the preliminary training is completed, and the feature transformation module at this time is recorded as

.

之后，可以使用原始图像分割网络及特征变换模块

的参数初始化用于学习新知识的新网络与相应的特征变换模块

，学习新知识的过程中保持原始图像分割网络（即旧网络，Old Model）及其特征变换模块（即特征变换模块

）不变，只更新初始化产生的图像分割网络（即新网络，New Model）及其特征变换模块

（即初始化产生的特征变换模块）。学习阶段是增量学习的关键概念，初始阶段记为1，每新增一次类别集合，即为一个新的持续学习阶段。图1中，下标t-1、t代表不同学习阶段，相对而言，t-1学习阶段的网络为旧网络，t学习阶段的网络为新网络，E代表编码器（特征提取器），D代表解码器，G代表分类器。After that, the original image segmentation network and feature transformation module can be used

The parameters initialize the new network for learning new knowledge and the corresponding feature transformation module

, keep the original image segmentation network (ie old network, Old Model) and its feature transformation module (ie feature transformation module) in the process of learning new knowledge

) remains unchanged, only the image segmentation network (ie new network, New Model) and its feature transformation module generated by initialization are updated

(i.e. initialize the resulting feature transformation module). The learning stage is the key concept of incremental learning. The initial stage is denoted as 1, and each new category set is a new continuous learning stage. In Figure 1, the subscripts t -1 and t represent different learning stages. Relatively speaking, the network in the t -1 learning stage is the old network, the network in the t learning stage is the new network, and E represents the encoder (feature extractor), D stands for decoder and G stands for classifier.

本发明实施例中，通过对两个网络特征变换模块的输出施加一致性约束，以保证旧知识在持续学习的过程中保持不变，同时又给与特征图较高的变化自由度以良好的学习新知识。In the embodiment of the present invention, a consistency constraint is imposed on the outputs of the two network feature transformation modules to ensure that the old knowledge remains unchanged during the continuous learning process, and at the same time, the feature map is given a high degree of freedom of change with a good degree of freedom. learn new knowledge.

本发明实施例中，前述阶段原始图像分割网络并未更新，因此，其提取的特征图即为所述原始特征图，因此依旧记为

，旧特征变换模块

的变换结果表示为

；将所述新网络提取的特征图记为

，将所述新特征变换模块记为

，变换结果表示为

，对齐损失（Alignment Loss）为两种变换结果的L1距离，表示为：In the embodiment of the present invention, the original image segmentation network in the previous stage is not updated. Therefore, the extracted feature map is the original feature map, so it is still recorded as

, the old feature transformation module

The result of the transformation is expressed as

; mark the feature map extracted by the new network as

, denote the new feature transformation module as

, the transformation result is expressed as

, Alignment Loss (Alignment Loss) is the L1 distance of the two transformation results, expressed as:

。

.

二、类别结构信息保持模块及相关损失函数。2. The category structure information retention module and related loss function.

本发明实施例中，类别结构信息保持模块基于图像分割网络的解码器输出在嵌入空间中分别构建类内结构关系与类间结构关系。通过在持续学习的过程中保持上述两种关系，从而有效的保持网络对于旧类别的判别力。类别结构信息保持模块主要包括三个部分：初始结构优化部分、类间结构保持部分、类内结构保持部分三部分。其中，初始结构优化部分主要针对新增类别，计算初始结构优化损失，它属于对比损失（Contrastive Loss）；后两个部分主要针对旧类别，计算结构保持损失（Structure Preserving Loss），包含类间结构保持损失与类内结构保持损失；所述旧类别与新增类别是相对概念，即持续学习之前图像分割网络能够识别的类别。以上三个部分的原理及相关损失函数主要如下：In the embodiment of the present invention, the class structure information retention module respectively constructs the intra-class structure relationship and the inter-class structure relationship in the embedding space based on the decoder output of the image segmentation network. By maintaining the above two relationships in the process of continuous learning, the discriminative power of the network for the old categories is effectively maintained. The class structure information preservation module mainly includes three parts: the initial structure optimization part, the inter-class structure preservation part, and the intra-class structure preservation part. Among them, the initial structure optimization part mainly calculates the initial structure optimization loss for the new category, which belongs to the Contrastive Loss; the latter two parts are mainly for the old category, calculate the structure preservation loss (Structure Preserving Loss), including the inter-class structure The preservation loss and the intra-class structure preservation loss; the old category and the new category are relative concepts, that is, the categories that the image segmentation network can recognize before continuous learning. The principles and related loss functions of the above three parts are mainly as follows:

1、初始结构优化部分。1. The initial structure optimization part.

如图2所示，为初始结构优化部分的原理示意图。仅使用交叉熵训练时，不同类别（例如图中左侧的A，B类）在嵌入空间中的分布常常较为分散，且容易发生部分重叠，而这一分布较容易在后续学习的过程中引发类别混淆，进而产生遗忘。通过引导特征向量（图中三角形）尽量靠近其对应的特征原型（图中X字形），同时令不同类别原型间的距离不小于给定阈值（右侧图灰色圆形），达到优化类别分布，减少混淆的作用。As shown in Figure 2, it is a schematic diagram of the initial structure optimization part. When only using cross-entropy training, the distribution of different categories (such as A and B on the left side of the figure) in the embedding space is often scattered and prone to partial overlap, and this distribution is more likely to be induced in the subsequent learning process. Category confusion, and then forgetting. By guiding the feature vector (the triangle in the figure) as close as possible to its corresponding feature prototype (the X in the figure), and at the same time making the distance between the prototypes of different categories not less than a given threshold (gray circle in the right figure), the category distribution is optimized. Reduce the effect of confusion.

所述的初始结构优化模块为，在学习新类别的时候，对于其在嵌入空间中的分布进行引导，使相同类别的特征向量分布尽量紧凑，使相异类别的特征向量分布尽量离散。此部分能使模型的分类边界更加清晰，减少混淆，同时对于遗忘现象也有更强的鲁棒性。初由于始结构优化模块仅针对新增类别，因此，仅利用所述新网络的输出计算初始结构优化损失。The initial structure optimization module is to guide its distribution in the embedding space when learning a new category, so that the distribution of feature vectors of the same category is as compact as possible, and the distribution of feature vectors of different categories is as discrete as possible. This part makes the classification boundary of the model clearer, reduces confusion, and is more robust to forgetting. Initially, since the initial structure optimization module is only for newly added categories, only the output of the new network is used to calculate the initial structure optimization loss.

为了达到上述目的，初始结构优化部分的损失函数（初始结构优化损失）中使用两个损失函数分别引导类内结构与类间结构的学习，表示为：In order to achieve the above purpose, the loss function of the initial structure optimization part (initial structure optimization loss) uses two loss functions to guide the learning of the intra-class structure and the inter-class structure respectively, which are expressed as:

其中，

表示引导类内结构的损失，

表示引导类间结构的损失，

为

的权重（具体大小可根据实际情况或者经验设定）。in,

represents the loss of bootstrap intra-class structure,

represents the loss of bootstrap inter-class structure,

for

(the specific size can be set according to the actual situation or experience).

引导类内结构的损失

用于拉近相同新增类别的特征向量的分布，表示为：Loss of bootstrap intra-class structure

The distribution of feature vectors used to close the same newly added category is expressed as:

其中，

表示当前学习阶段t新增类别的集合，

表示当前学习阶段t新增类别的数量，所述当前学习阶段t表示训练所述新网络与新特征变换模块的阶段；

表示新增类别c对应的类别原型，

表示属于新增类别c的特征向量。in,

represents the set of newly added categories in the current learning stage t ,

Represents the number of new categories in the current learning stage t , and the current learning stage t represents the stage of training the new network and the new feature transformation module;

Indicates the category prototype corresponding to the newly added category c ,

represents the feature vector belonging to the newly added category c .

引导类间结构的损失

用于疏远不同新增类别的分布，表示为：loss of bootstrap inter-class structure

The distribution used to alienate different newly added categories, expressed as:

其中，

与

分别表示新增类别m与新增类别n对应的类别原型，

表示类别原型

与

的余弦相似度，

为预定义的距离（具体大小可根据实际情况或者经验设定）。in,

and

represent the category prototypes corresponding to the newly added category m and the newly added category n , respectively,

Represents a class prototype

and

The cosine similarity of ,

It is a predefined distance (the specific size can be set according to the actual situation or experience).

本发明实施例中，类别原型为相应类别下所有特征向量的平均，对于新增类别c，类别原型

表示为：In the embodiment of the present invention, the category prototype is the average of all feature vectors under the corresponding category, and for the newly added category c , the category prototype

Expressed as:

其中，y为当前阶段中新增类别的标签，|y=c|表示标签中属于新增类别c的像素的数量，

为指示函数，当y=c时，输出为1，其他情况输出为0。Among them, y is the label of the new category in the current stage, |y=c| indicates the number of pixels belonging to the new category c in the label,

For the indicator function, when y=c , the output is 1, and the output is 0 in other cases.

本领域技术人员可以理解，类别原型（Class Prototypes）是计算机视觉领域中的专有名词，表示对属于某个类别的一系列特征进行均值计算，用其均值表征整个类别的信息，后文所涉及的各个类别原型也是采用前述

类似方式计算。Those skilled in the art can understand that Class Prototypes is a proper term in the field of computer vision, which means that the mean value of a series of features belonging to a certain category is calculated, and the mean value is used to represent the information of the entire category. The prototypes of each category also use the aforementioned

Calculated in a similar way.

2、类间结构保持部分。2. Inter-class structure maintenance part.

训练良好的深度神经网络可将输入样本映射到一嵌入空间中，并依类别分布于嵌入空间中的不同区域。这是深度神经网络得以正确划分各个类别的重要特性。基于此特性，通过构建嵌入空间中的类间拓扑结构，并在持续学习的过程中维护此结构，以保持类别间线性可分。A well-trained deep neural network can map input samples into an embedding space and distribute them according to categories in different regions of the embedding space. This is an important property of deep neural networks to correctly classify each category. Based on this property, the inter-class topology is constructed in the embedding space, and this structure is maintained in the process of continuous learning to keep the classes linearly separable.

本发明实施例中，利用所述旧网络的分割结果以及解码获得的特征向量对于旧类别构建的类间关系矩阵

，利用所述新网络的分割结果以及解码获得的特征向量对于旧类别构建的类间关系矩阵

；其中，类间关系矩阵中的单个元素表示两个旧类别对应的类别原型之间的余弦距离；对于旧类别i与旧类别j，旧网络中对应的类别原型分别表示为

与

，新网络中对应的类别原型分别表示为

与

，则类间关系矩阵

与

中相应元素

与

的计算方式表示为：In the embodiment of the present invention, an inter-class relationship matrix is constructed for the old class by using the segmentation result of the old network and the feature vector obtained by decoding

, using the segmentation result of the new network and the eigenvectors obtained by decoding to construct the inter-class relationship matrix for the old class

; where a single element in the inter-class relationship matrix represents the cosine distance between the class prototypes corresponding to the two old classes; for the old class i and the old class j , the corresponding class prototypes in the old network are represented as

and

, the corresponding category prototypes in the new network are respectively expressed as

and

, then the inter-class relationship matrix

and

corresponding element in

and

is calculated as:

其中，

、

分别表示类别原型

与

余弦相似度、类别原型

与

的余弦相似度。in,

,

class prototype

and

Cosine similarity, class prototype

and

The cosine similarity of .

在持续学习的过程中，使用类间结构保持损失函数维护二者的一致性，表示为：In the process of continuous learning, the inter-class structure is used to maintain the consistency of the loss function, which is expressed as:

其中，||.||_F表示矩阵的F范数。where ||.|| _F represents the F-norm of the matrix.

3、类内结构保持部分。3. The structure of the class is maintained.

类内关系定义为每个特征向量与其类别原型相对关系的集合，利用所述旧网络的分割结果以及解码获得的特征向量对于旧类别构建的类内关系集合表示为

，利用所述新网络的分割结果以及解码获得的特征向量对于旧类别构建的类内关系集合表示为

，此处的D表示某一种距离度量函数，例如，欧式距离。类内关系集合反映了嵌入空间中细粒度的拓扑结构信息。在持续学习的过程中保持类内特征向量在嵌入空间中的拓扑结构保持不变，能有效维护单类别知识的完整性。建模类内结构时所选取的距离函数为欧式距离，以利用其敏感性反应反映类内结构的微小变化。The intra-class relationship is defined as the set of relative relationship between each feature vector and its class prototype, and the intra-class relationship set constructed for the old class using the segmentation result of the old network and the feature vector obtained by decoding is expressed as

, using the segmentation result of the new network and the feature vector obtained by decoding, the intra-class relationship set constructed for the old class is expressed as

, where D represents a certain distance metric function, such as Euclidean distance. The set of intra-class relations reflects fine-grained topology information in the embedding space. In the process of continuous learning, the topology structure of intra-class feature vectors in the embedding space remains unchanged, which can effectively maintain the integrity of single-class knowledge. The distance function selected when modeling intra-class structure is Euclidean distance, in order to use its sensitivity to reflect small changes in intra-class structure.

在持续学习的过程中，使用类内结构保持损失用于保持旧类别中类内结构（即类内关系集合

与

在位置信息与距离信息）的一致性，表示为：In the process of continuous learning, the intra-class structure preservation loss is used to preserve the intra-class structure in the old class (i.e. the set of intra-class relations)

and

In the consistency of position information and distance information), it is expressed as:

其中，

与

分别表示旧网络获得的属于旧类别i的特征向量与相应的类别原型；

与

表示新网络获得的属于旧类别i的特征向量与相应的类别原型；

表示旧类别集合（即已学习过的所有旧类别），

表示旧类别的数量。in,

and

respectively represent the feature vector belonging to the old category i obtained by the old network and the corresponding category prototype;

and

Represents the feature vector obtained by the new network belonging to the old category i and the corresponding category prototype;

represents the set of old categories (i.e. all old categories that have been learned),

Indicates the number of old categories.

类间结构保持部分与类内结构保持部分中所涉及的类别原型利用相应网络输出的分割结果与特征向量进行计算，计算公式可参见前文

的公式，区别主要在于，由于此部分是针对旧类别，因此，需要将分割结果带入

公式。The category prototypes involved in the inter-class structure retention part and the intra-class structure retention part are calculated using the segmentation results and feature vectors output by the corresponding network. The calculation formula can be found in the previous section.

The main difference is that since this part is for the old category, the segmentation result needs to be brought into

formula.

以类别原型

为例，其计算公式为：prototype by category

For example, the calculation formula is:

其中，

表示新网络输出的分割结果，

表示新网络输出的分割结果中预测类别为旧类别i的像素的数量。in,

represents the segmentation result of the new network output,

Represents the number of pixels whose predicted class is the old class i in the segmentation result output by the new network.

对于旧网络而言也是类似的，结合其分割结果带入上述式子计算相应的类别原型。The same is true for the old network, combining its segmentation results into the above formula to calculate the corresponding category prototype.

如图3所示，为类间与类内结构保持部分的原理示意图。当需要学习新类别时，类间拓扑结构维护仅保证类别间的相邻、相关关系在更新过程中保持不变，但是允许其在嵌入空间中进行旋转，平移等变化，因此在有效保持旧类别知识的同时，更有利于新类别的学习。当不对类内结构进行约束时，网络的更新常导致相同输入对于特征原型的相对关系发生较大变化（图3左下侧）。而类内结构保持损失则可以减少这种变化，进而在更细粒度上维护旧知识的完整性。As shown in Figure 3, it is a schematic diagram of the principle of maintaining the structure between classes and within classes. When a new category needs to be learned, the inter-class topology maintenance only ensures that the adjacent and related relationships between categories remain unchanged during the update process, but allows them to change in the embedding space such as rotation, translation, etc., so the old category is effectively maintained. At the same time, it is more conducive to the learning of new categories. When the intra-class structure is not constrained, the update of the network often results in a large change in the relative relationship of the same input to the feature prototype (lower left side of Figure 3). The intra-class structure preservation loss can reduce such changes, thereby maintaining the integrity of old knowledge at a finer granularity.

三、伪标签生成模块及相关损失函数。3. Pseudo-label generation module and related loss function.

本发明实施例中，利用逐类别动态阈值为旧网络的分割结果进行优化去噪，从而生成高质量的伪标签以弥补缺失的旧类别标注，该过程称为伪标签细化（Pseudo LabelRefinement）。原理如下：In the embodiment of the present invention, the segmentation result of the old network is optimized and denoised by the category-by-category dynamic threshold, so as to generate high-quality pseudo-labels to make up for the missing old category labels. This process is called pseudo-label refinement (Pseudo Label Refinement). The principle is as follows:

在持续学习过程中，旧类别的标签在当前学习阶段并不会给出，即已学习类别在给定的标签中被标记为背景类。因此当使用给定标签作为监督信号直接训练网络时，将会加剧已学习类别的遗忘效应。为此，使用旧网络的语义分割结果标记给定标签的背景类，从而为已学习类别提供伪标签。进一步地，在旧网络输出的分割结果中，难以避免的包含有错误的结果。针对这一问题，采用输出类别概率的熵作为置信度评价指标，并仅使用置信度较高的结果作为伪标签使用。由于网络对于不同类别学习的情况不一，本发明针对每个类别分别计算其输出的熵的分布情况，并依此选取阈值

使得相应旧类别i保留固定比例的伪标签，融合新增类别的真实标签（在前述阶段获取得到）后生成最终的监督标签，监督标签（伪标签）生成的方法表示为：In the process of continuous learning, the labels of the old classes are not given in the current learning stage, that is, the learned classes are labeled as background classes in the given labels. Therefore, when the network is directly trained with a given label as a supervision signal, the forgetting effect of the learned class will be exacerbated. To this end, the background classes for a given label are labeled with the semantic segmentation results of the old network, thus providing pseudo-labels for the learned classes. Further, the segmentation results output by the old network inevitably contain erroneous results. To solve this problem, the entropy of the output category probability is used as the confidence evaluation index, and only the results with higher confidence are used as pseudo labels. Since the network learns differently for different categories, the present invention calculates the distribution of the entropy output for each category, and selects the threshold accordingly

Make the corresponding old category i retain a fixed proportion of pseudo-labels, and fuse the real labels of the new categories (obtained in the previous stage) to generate the final supervised labels. The method of generating supervised labels (pseudo-labels) is expressed as:

其中，

表示当前学习阶段t输入图像

中像素k对应的新增类别的真实标签，

表示旧网络对像素k的分类置信度，

表示旧类别i对应的动态阈值，

表示旧类别集合，

表示旧网络

对输入图像

输出的分割结果，即每一像素的分类结果，

为最终生成的像素k的伪标签。in,

represents the current learning stage t input image

The ground truth label of the new category corresponding to the pixel k in the middle,

represents the classification confidence of the old network for pixel k ,

represents the dynamic threshold corresponding to the old category i ,

represents a collection of old categories,

Indicates the old network

for the input image

The output segmentation result, that is, the classification result of each pixel,

is the pseudo-label for the final generated pixel k .

之后，利用所述最终生成的伪标签计算所述新网络的分类损失，具体为交叉熵损失（Cross Entropy Loss），表示为：After that, the classification loss of the new network is calculated using the finally generated pseudo-label, specifically the cross entropy loss (Cross Entropy Loss), which is expressed as:

其中，

表示所述新网络对于输入图像

输出的分割结果。in,

represents the new network for the input image

The output segmentation result.

四、联合损失函数训练语义分割网络。Fourth, the joint loss function trains the semantic segmentation network.

本发明实施例中，联合前述一~三中的对齐损失、类间结构保持损失、类内结构保持损失、初始结构优化损失及分类损失训练所述新网络与新特征变换模块，最终达到在语义分割任务上实现持续学习的目的。训练的目标损失函数为以上损失函数的加权和：In the embodiment of the present invention, the new network and the new feature transformation module are trained in combination with the alignment loss, inter-class structure preservation loss, intra-class structure preservation loss, initial structure optimization loss, and classification loss in the above-mentioned one to three, and finally achieve a semantic The purpose of continuous learning is achieved on segmentation tasks. The target loss function for training is the weighted sum of the above loss functions:

其中，

及

分别为相应损失的权重。in,

and

are the weights of the corresponding losses, respectively.

本发明实施例提供一种图像语义分割网络持续学习方法，主要获得如下有益效果：The embodiment of the present invention provides a continuous learning method for image semantic segmentation network, which mainly obtains the following beneficial effects:

1）通过特征空间中的非线性变换提取旧知识表征进行对齐的手段，有效保持旧知识不变性的同时提高对新知识学习的能力。1) The method of extracting old knowledge representations and aligning them through nonlinear transformation in the feature space effectively maintains the invariance of old knowledge and improves the ability to learn new knowledge.

2）在嵌入空间中对新类别优化其拓扑结构，对旧类别维护其拓扑结构的不变性，达到降低遗忘，防止类间混淆的效果。2) Optimize the topology of the new category in the embedding space, and maintain the invariance of the topology of the old category, so as to reduce forgetting and prevent confusion between classes.

3）联合伪标签及伪标签降噪技术，使得在语义分割持续学习中无需提供旧类别的标签，降低标注成本。3) The combination of pseudo-label and pseudo-label noise reduction technology makes it unnecessary to provide labels of old categories in the continuous learning of semantic segmentation, reducing the cost of labeling.

总体来说，本发明作为一种通用性的语义分割持续学习方法，对应用场景均无限制，具有很强的泛化能力与实用价值。In general, as a universal continuous learning method for semantic segmentation, the present invention has no restrictions on application scenarios, and has strong generalization ability and practical value.

基于上述介绍，下面提供一个完整的实施流程，包括图像语义分割网络初始阶段学习、图像语义分割网络持续学习、以及图像语义分割网络测试。Based on the above introduction, a complete implementation process is provided below, including the initial stage learning of the image semantic segmentation network, the continuous learning of the image semantic segmentation network, and the testing of the image semantic segmentation network.

一、图像语义分割网络初始阶段学习。First, the initial stage of image semantic segmentation network learning.

1、准备初始语义分割数据集及对应的类别标签构成训练数据，通过随机裁剪的方式改变图像的空间分辨率，使得图像的宽与高均为512，并进行归一化处理。1. Prepare the initial semantic segmentation data set and the corresponding category labels to form the training data, change the spatial resolution of the image by random cropping, so that the width and height of the image are both 512, and normalize it.

2、使用深度学习框架建立基于类别结构保持与特征对齐的图像语义分割模型，包含全卷积语义分割网络、特征变换模块、类别结构信息保持模块和伪标签生成模块等。其中全卷积语义分割网络为DeeplabV3，其特征提取器可选择ResNet，MobileNet等。此处使用ResNet-101作为特征提取器。其解码器部分为ASPP模块。在特征提取器的输出部分设置特征变换模块对特征进行非线性变换及对齐操作。在语义分割网络的解码器部分设置类别结构信息保持模块。在语义分割网络的输出部分设置伪标签生成模块。2. Use the deep learning framework to establish an image semantic segmentation model based on category structure keeping and feature alignment, including fully convolutional semantic segmentation network, feature transformation module, category structure information preservation module and pseudo-label generation module, etc. Among them, the fully convolutional semantic segmentation network is DeeplabV3, and its feature extractor can choose ResNet, MobileNet, etc. ResNet-101 is used here as the feature extractor. Its decoder part is ASPP module. A feature transformation module is set in the output part of the feature extractor to perform nonlinear transformation and alignment operations on the features. The category structure information preservation module is set in the decoder part of the semantic segmentation network. A pseudo-label generation module is set in the output part of the semantic segmentation network.

3、在初始阶段学习过程中，每次随机从训练数据中选择一组数据输入网络，经由模型给出语义分割结果，使用交叉熵损失及初始结构优化损失训练网络。3. In the initial learning process, randomly select a set of data from the training data to input into the network, give the semantic segmentation results through the model, and use the cross-entropy loss and the initial structure optimization loss to train the network.

此部分所涉及的训练流程均为常规技术，故不再赘述；此外，上述流程所涉及的具体图像尺寸以及网络结构与类型均为举例，并非构成限制。The training procedures involved in this part are all conventional techniques, so they will not be described again; in addition, the specific image size, network structure and type involved in the above procedures are examples and do not constitute limitations.

二、图像语义分割网络持续学习。2. Continuous learning of image semantic segmentation network.

1、初始阶段训练完成后，准备新增语义分割数据集及对应新增类别的标签。通过随机裁剪的方式改变图像的空间分辨率，使得图像的宽与高均为512，并进行归一化处理。1. After the initial stage of training is completed, prepare to add new semantic segmentation datasets and labels corresponding to the new categories. The spatial resolution of the image is changed by random cropping, so that the width and height of the image are both 512, and normalized.

同样的，此处所涉及的具体图像尺寸仅为举例，并非构成限制。Likewise, the specific image size involved here is only an example, not a limitation.

本领域技术人员可以理解，新增语义分割数据集中包含了新增类别以及旧类别，当然，有可能少数图像中不包含旧类别，但对学习效果的影响较小。此外，对于新增类别会进行标注，旧类别无需进行标注。Those skilled in the art can understand that the newly added semantic segmentation dataset includes new categories and old categories. Of course, it is possible that a few images do not contain old categories, but the impact on the learning effect is small. In addition, new categories will be marked, and old categories do not need to be marked.

2、初步训练特征变换模块。每次迭代随机从训练数据中选择一组数据输入图像语义分割网络，得到特征提取器输出的特征图，使用损失函数

训练特征变换模块，使其在新增的数据上能完成特征变换操作。2. Preliminary training feature transformation module. Each iteration randomly selects a set of data from the training data to input the image semantic segmentation network, obtains the feature map output by the feature extractor, and uses the loss function

Train the feature transformation module so that it can complete the feature transformation operation on the newly added data.

3、使用图像语义分割网络及特征变换模块的权重初始化一相同的网络与特征变换模块（即新网络与新特征变换模块）用以学习新增类别，同时旧网络及其特征变换模块不再更新。每次迭代随机从训练数据中选择一组数据同时输入新、旧网络。二者特征提取器输出特征图分别经由新旧特征变换模块得到旧知识表征，计算对齐损失

。使用新、旧网络解码器输出对旧类别构建类间关系矩阵

、

及类内关系集合

、

。并计算类间结构保持损失

与类内结构保持损失

。同时对新类别计算初始结构优化损失

。最后，使用旧网络的输出经由伪标签生成模块生成完整的语义标签，在新网络的分割结果上计算交叉熵损失。3. Use the weights of the image semantic segmentation network and feature transformation module to initialize—the same network and feature transformation module (ie, new network and new feature transformation module) are used to learn new categories, and the old network and its feature transformation module are no longer updated. . Each iteration randomly selects a set of data from the training data and feeds both the new and old networks. The output feature maps of the two feature extractors respectively obtain the old knowledge representation through the new and old feature transformation modules, and calculate the alignment loss

. Build an inter-class relationship matrix for the old classes using the new and old network decoder outputs

,

and a collection of intra-class relationships

,

. and compute the inter-class structure preservation loss

maintain loss with intra-class structure

. Simultaneously compute the initial structural optimization loss for the new class

. Finally, complete semantic labels are generated via the pseudo-label generation module using the output of the old network, and the cross-entropy loss is calculated on the segmentation results of the new network.

4、根据上述步骤的损失函数，计算总损失函数L，通过反向传播算法以及梯度下降策略，使得损失函数最小化，更新语义分割网络及特征变换模块的参数权重。 4. Calculate the total loss function L according to the loss function in the above steps, minimize the loss function through the back-propagation algorithm and the gradient descent strategy, and update the parameter weights of the semantic segmentation network and the feature transformation module.

此阶段所涉及的反向传播算法以及梯度下降策略均可参照常规技术，故不做赘述。The back-propagation algorithm and gradient descent strategy involved in this stage can refer to conventional techniques, so they will not be described in detail.

当需要继续学习新增类别时，重复执行图像语义分割网络持续学习部分的步骤1~步骤4，直至所有感兴趣的类别全部学习完毕。When it is necessary to continue to learn new categories, repeat steps 1 to 4 of the continuous learning part of the image semantic segmentation network until all interested categories have been learned.

三、图像语义分割网络测试。3. Image Semantic Segmentation Network Test.

将测试数据集中的图像输入至经持续学习后的图像语义分割网络，依次通过其内部的特征提取器与解码器获得分割结果。通过设定指标可以对分割结果进行评估，以判断持续学习后的图像语义分割网络的语义分割性能。The images in the test dataset are input to the image semantic segmentation network after continuous learning, and the segmentation results are obtained through its internal feature extractor and decoder in turn. The segmentation results can be evaluated by setting indicators to judge the semantic segmentation performance of the image semantic segmentation network after continuous learning.

如图4所示，展示了不同图像语义分割网络分割结果的对比示意图；从左至右四列图像分别表示：输入的图像、现有方案的分割结果、本发明的分割结果、真实分割结果，从图4可以发现本发明的分割结果与真实分割结果接近，并远远优于现有方案的分割结果。As shown in Figure 4, a schematic diagram of the comparison of the segmentation results of different image semantic segmentation networks is shown; from left to right, the four columns of images represent: the input image, the segmentation result of the existing scheme, the segmentation result of the present invention, and the real segmentation result, It can be found from FIG. 4 that the segmentation result of the present invention is close to the real segmentation result, and far superior to the segmentation result of the existing scheme.

实施例二Embodiment 2

本发明还提供一种图像语义分割网络持续学习系统，其主要基于前述实施例一提供的方法实现，如图5所示，该系统主要包括：The present invention also provides an image semantic segmentation network continuous learning system, which is mainly implemented based on the method provided in the first embodiment. As shown in FIG. 5 , the system mainly includes:

学习单元，用于使用所述原始图像语义分割网络与初步训练的特征变换模块初始化一个相同的图像语义分割网络与特征变换模块，将所述原始图像语义分割网络称为旧网络，初步训练的特征变换模块称为旧特征变换模块，初始化产生的图像语义分割网络称为新网络，初始化产生的特征变换模块称为新特征变换模块；固定所述旧网络与旧特征变换模块，训练所述新网络与新特征变换模块；训练时，将新增语义分割数据集的图像数据同时输入至所述旧网络与新网络，所述旧网络与新网络中各自进行特征图提取、解码与语义分割，获得分割结果；其中，所述旧网络提取的特征图通过所述旧特征变换模块进行变换，所述新网络提取的特征图通过所述新特征变换模块进行变换，并计算两种变换结果的对齐损失；利用所述旧网络与新网络各自的分割结果及解码获得的特征向量，对于旧类别分别独自构建相应的类间关系矩阵与类内关系集合，并利用所述旧网络与新网络的类间关系矩阵计算类间结构保持损失，利用所述旧网络与新网络的类内关系集合计算类内结构保持损失，所述类间结构保持损失与类内结构保持损失用于保持旧类别中类间结构与类内结构的一致性；同时，对于新增类别，利用所述新网络解码获得的特征向量计算初始结构优化损失，所述初始结构优化损失用于拉近相同新增类别的特征向量的分布，疏远不同新增类别的特征向量的分布，并且，利用逐类别动态阈值对旧网络的分割结果进行优化去噪，获得相应的伪标签，利用所述伪标签计算所述新网络的分类损失；结合所述对齐损失、类间结构保持损失、类内结构保持损失、初始结构优化损失及分类损失训练所述新网络与新特征变换模块。The learning unit is used to initialize an identical image semantic segmentation network and feature transformation module using the original image semantic segmentation network and the initially trained feature transformation module, and the original image semantic segmentation network is called the old network, and the initially trained feature The transformation module is called the old feature transformation module, the image semantic segmentation network generated by initialization is called the new network, and the feature transformation module generated by initialization is called the new feature transformation module; the old network and the old feature transformation module are fixed, and the new network is trained. and the new feature transformation module; during training, input the image data of the newly added semantic segmentation data set into the old network and the new network at the same time, and perform feature map extraction, decoding and semantic segmentation in the old network and the new network respectively, and obtain Segmentation result; wherein, the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of the two transformation results is calculated. ; Use the respective segmentation results of the old network and the new network and the eigenvectors obtained by decoding, independently construct the corresponding inter-class relationship matrix and the intra-class relationship set for the old category, and use the old network and the new network. The relationship matrix calculates the inter-class structure preservation loss, and uses the intra-class relationship set of the old network and the new network to calculate the intra-class structure preservation loss. The inter-class structure preservation loss and the intra-class structure preservation loss are used to preserve the inter-class in the old category The consistency between the structure and the intra-class structure; at the same time, for the newly added category, the eigenvector obtained by the new network decoding is used to calculate the initial structural optimization loss, and the initial structural optimization loss is used to shorten the eigenvectors of the same newly added category. distribution, alienate the distribution of feature vectors of different new categories, and use the category-by-category dynamic threshold to optimize and denoise the segmentation results of the old network to obtain corresponding pseudo-labels, and use the pseudo-labels to calculate the classification loss of the new network ; train the new network and the new feature transformation module in combination with the alignment loss, the inter-class structure preservation loss, the intra-class structure preservation loss, the initial structure optimization loss and the classification loss.

所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将系统的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, only the division of the above-mentioned functional modules is used for illustration. In practical applications, the above-mentioned functions can be allocated to different functional modules as required. The internal structure of the system is divided into different functional modules to complete all or part of the functions described above.

需要说明的是，上述各单元所涉及的主要原理在之前的实施例一中已经做了详细的说明，故不再赘述。It should be noted that the main principles involved in the above-mentioned units have been described in detail in the previous Embodiment 1, so they are not repeated here.

实施例三Embodiment 3

本发明还提供一种处理设备，如图6所示，其主要包括：一个或多个处理器；存储器，用于存储一个或多个程序；其中，当所述一个或多个程序被所述一个或多个处理器执行时，使得所述一个或多个处理器实现前述实施例提供的方法。The present invention also provides a processing device, as shown in FIG. 6 , which mainly includes: one or more processors; a memory for storing one or more programs; wherein, when the one or more programs are described When executed by one or more processors, the one or more processors are caused to implement the methods provided by the foregoing embodiments.

进一步的，所述处理设备还包括至少一个输入设备与至少一个输出设备；在所述处理设备中，处理器、存储器、输入设备、输出设备之间通过总线连接。Further, the processing device further includes at least one input device and at least one output device; in the processing device, the processor, the memory, the input device, and the output device are connected through a bus.

本发明实施例中，所述存储器、输入设备与输出设备的具体类型不做限定；例如：In this embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:

输入设备可以为触摸屏、图像采集设备、物理按键或者鼠标等；The input device can be a touch screen, an image capture device, a physical button or a mouse, etc.;

输出设备可以为显示终端；The output device can be a display terminal;

存储器可以为随机存取存储器（Random Access Memory，RAM），也可为非不稳定的存储器（non-volatile memory），例如磁盘存储器。The memory may be random access memory (Random Access Memory, RAM), or may be non-volatile memory (non-volatile memory), such as disk memory.

实施例四Embodiment 4

本发明还提供一种可读存储介质，存储有计算机程序，当计算机程序被处理器执行时实现前述实施例提供的方法。The present invention also provides a readable storage medium storing a computer program, and when the computer program is executed by a processor, the methods provided by the foregoing embodiments are implemented.

本发明实施例中可读存储介质作为计算机可读存储介质，可以设置于前述处理设备中，例如，作为处理设备中的存储器。此外，所述可读存储介质也可以是U盘、移动硬盘、只读存储器（Read-Only Memory，ROM）、磁碟或者光盘等各种可以存储程序代码的介质。In this embodiment of the present invention, the readable storage medium, as a computer-readable storage medium, may be provided in the aforementioned processing device, for example, as a memory in the processing device. In addition, the readable storage medium may also be a U disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk, and other mediums that can store program codes.

以上所述，仅为本发明较佳的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明披露的技术范围内，可轻易想到的变化或替换，都应涵盖在本发明的保护范围之内。因此，本发明的保护范围应该以权利要求书的保护范围为准。The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to this. Substitutions should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be based on the protection scope of the claims.

Claims

1. an image semantic segmentation network continuous learning method, is characterized in that, comprises:

Obtaining the newly added semantic segmentation data set and the label corresponding to the newly added category, using the original image semantic segmentation network to extract the original feature map of the image data in the newly added semantic segmentation data set, and transforming the original feature map through a feature transformation module, and utilize the difference between the reconstructed feature map of the transformation result and the original feature map to preliminarily train the feature transformation module;

Use the original image semantic segmentation network and the initially trained feature transformation module to initialize an identical image semantic segmentation network and feature transformation module, the original image semantic segmentation network is called the old network, and the initially trained feature transformation module is called the old network. Feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; the old network and the old feature transformation module are fixed, and the new network and the new feature transformation module are trained. ;

During training, the image data of the newly added semantic segmentation dataset is input into the old network and the new network at the same time, and feature map extraction, decoding and semantic segmentation are performed in the old network and the new network respectively to obtain segmentation results; The feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of the two transformation results is calculated; using the old network With the segmentation result of the new network and the eigenvector obtained by decoding, the corresponding inter-class relationship matrix and intra-class relationship set are independently constructed for the old class, and the inter-class structure retention loss is calculated by using the inter-class relationship matrix of the old network and the new network. , the intra-class structure preservation loss is calculated by using the intra-class relationship set of the old network and the new network, and the inter-class structure preservation loss and the intra-class structure preservation loss are used to maintain the consistency between the inter-class structure and the intra-class structure in the old class At the same time, for the newly added category, the initial structure optimization loss is calculated by using the feature vector obtained by the new network decoding, and the initial structure optimization loss is used to narrow the distribution of the feature vectors of the same newly added category, and alienate the different newly added categories. The distribution of feature vectors, and the segmentation results of the old network are optimized by the category-by-category dynamic threshold to obtain the corresponding pseudo-labels, and the pseudo-labels are used to calculate the classification loss of the new network; combined with the alignment loss, the inter-class structure The preservation loss, the intra-class structure preservation loss, the initial structure optimization loss, and the classification loss train the new network and the new feature transformation module.

2. The method for continuous learning of an image semantic segmentation network according to claim 1, wherein the original image semantic segmentation network is used to extract the original feature map of the image data in the newly added semantic segmentation data set, and the feature transforms The module transforms the original feature map, and preliminarily trains the feature transformation module using the difference between the reconstructed feature map and the original feature map from the transformation result, including:

The feature transformation module is initially trained using the autoencoder structure, and the original feature map is denoted as

, denote the feature transformation module as P ^* ; the original feature map is transformed by the feature transformation module P ^*

The transformation includes: first reducing the channel dimension through convolution operations, and then mixing local spatial information through several hole convolution operations to generate the original feature map.

characterization;

Use the reconstruction network R ^* to transform the result

Perform reconstruction, reconstructed feature map

with the original feature map

The difference is the Euclidean distance between the two, expressed as:

The feature transformation module P ^* is initially trained with a reconstruction loss.

3 . The continuous learning method for image semantic segmentation network according to claim 1 , wherein the feature map extracted by the old network is transformed by the old feature transformation module, and the feature map extracted by the new network is transformed by 3 . The new feature transformation module performs transformation, and calculates the alignment loss of the two transformation results including:

The feature map extracted by the old network is the original feature map, denoted as

, denote the old feature transformation module as

, the transformation result is expressed as

; mark the feature map extracted by the new network as

, denote the new feature transformation module as

, the transformation result is expressed as

, the alignment loss is the L1 distance of the two transformation results, expressed as:

.

4. A kind of image semantic segmentation network continuous learning method according to claim 1, it is characterized in that, described utilizing the inter-class relationship matrix of described old network and new network to calculate the loss of inter-class structure retention is expressed as:

in,

Represents the inter-class relationship matrix constructed for the old class using the segmentation result of the old network and the feature vector obtained by decoding;

Represents the inter-class relationship matrix constructed by using the segmentation result of the new network and the feature vector obtained by decoding for the old class; ||.|| _F represents the F norm of the matrix;

A single element in the inter-class relationship matrix represents the cosine distance between the class prototypes corresponding to the two old classes; for the old class i and the old class j, the corresponding class prototypes in the old network are represented as

and

, then the inter-class relationship matrix

and

corresponding element in

and

is calculated as:

Among them, the category prototype is the average of all feature vectors under the corresponding category,

,

class prototype

and

Cosine similarity, class prototype

and

The cosine similarity of .

5. A kind of image semantic segmentation network continuous learning method according to claim 1, is characterized in that, described using the intra-class relationship set of the old network and the new network to calculate the intra-class structure retention loss is expressed as:

Among them, the intra-class relationship set constructed for the old class using the segmentation result of the old network and the feature vector obtained by decoding is expressed as

,

and

respectively represent the feature vector and the corresponding class prototype that belong to the old category i obtained by decoding the old network; use the segmentation result of the new network and the feature vector obtained by decoding to represent the intra-class relationship set constructed for the old category

,

and

respectively represent the feature vector of the old category i obtained by the new network decoding and the corresponding category prototype; the category prototype is the average of all feature vectors under the corresponding category, D represents the distance metric function,

represents a collection of old categories,

Indicates the number of old categories.

6. A kind of image semantic segmentation network continuous learning method according to claim 1, is characterized in that, described for the newly added category, utilizes the feature vector that described new network decoding obtains to calculate the initial structure optimization loss and expresses as:

in,

represents the loss of bootstrap intra-class structure,

represents the loss of bootstrap inter-class structure,

for

the weight of;

Loss of bootstrap intra-class structure

in,

represents the set of newly added categories in the current learning stage t,

Represents the number of new categories in the current learning stage t, and the current learning stage t represents the stage of training the new network and the new feature transformation module;

Indicates the category prototype corresponding to the newly added category c,

Represents the feature vector belonging to the newly added category c;

loss of bootstrap inter-class structure

The distribution of feature vectors used to alienate different newly added categories, expressed as:

in,

and

represent the category prototypes corresponding to the newly added category m and the newly added category n, respectively,

Represents a class prototype

and

The cosine similarity of ,

is a predefined distance;

The category prototype is the average of all feature vectors under the corresponding category. For the new category c, the category prototype

Expressed as:

Among them, y is the label of the new category in the current stage, |y=c| indicates the number of pixels belonging to the new category c in the label,

For the indicator function, when y=c, the output is 1, and the output is 0 in other cases.

7. A method for continuous learning of image semantic segmentation network according to claim 1, characterized in that, the segmentation result of the old network is optimized by using a category-by-category dynamic threshold to obtain a corresponding pseudo-label, and the pseudo-label is used Computing the classification loss for the new network consists of:

Use the category-by-category dynamic threshold to optimize the segmentation results of the old network, and fuse the acquired labels of the new categories to obtain the corresponding pseudo-labels, which are expressed as:

in,

Represents the input image obtained at the current learning stage t

The label of the newly added category corresponding to the pixel k, and the current learning stage t represents the stage of training the new network and the new feature transformation module;

represents the classification confidence of the old network for pixel k,

represents the dynamic threshold corresponding to the old category i,

represents a collection of old categories,

Indicates the old network

for the input image

is the pseudo-label of the generated pixel k;

The classification loss of the new network is calculated using pseudo-labels, expressed as:

in,

represents the new network for the input image

The output segmentation result.

8. An image semantic segmentation network continuous learning system, characterized in that, based on the method implementation of any one of claims 1 to 7, the system comprises:

The data collection and preliminary training unit is used to obtain the newly added semantic segmentation data set and the labels corresponding to the newly added categories, use the original image semantic segmentation network to extract the original feature map of the image data in the newly added semantic segmentation data set, and pass the feature transformation module. Transforming the original feature map, and preliminarily training the feature transformation module by using the difference between the reconstructed feature map and the original feature map from the transformation result;

The learning unit is used to initialize an identical image semantic segmentation network and feature transformation module using the original image semantic segmentation network and the initially trained feature transformation module, and the original image semantic segmentation network is called the old network, and the initially trained feature The transformation module is called the old feature transformation module, the image semantic segmentation network generated by initialization is called the new network, and the feature transformation module generated by initialization is called the new feature transformation module; the old network and the old feature transformation module are fixed, and the new network is trained. and the new feature transformation module; during training, input the image data of the newly added semantic segmentation data set into the old network and the new network at the same time, and perform feature map extraction, decoding and semantic segmentation in the old network and the new network respectively, and obtain Segmentation result; wherein, the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of the two transformation results is calculated. ; Use the segmentation results of the old network and the new network and the eigenvectors obtained by decoding to independently construct the corresponding inter-class relationship matrix and the intra-class relationship set for the old category, and use the old network and the new network. The relationship matrix between classes The inter-class structure preservation loss is calculated, and the intra-class structure preservation loss is calculated by using the intra-class relationship set of the old network and the new network. The inter-class structure preservation loss and the intra-class structure preservation loss are used to preserve the inter-class structure and Consistency of intra-class structure; at the same time, for newly added categories, the initial structure optimization loss is calculated by using the feature vector obtained by decoding the new network, and the initial structure optimization loss is used to narrow the distribution of feature vectors of the same newly added category, Alienate the distribution of feature vectors of different new categories, and optimize the segmentation results of the old network by using a category-by-category dynamic threshold to obtain corresponding pseudo-labels, and use the pseudo-labels to calculate the classification loss of the new network; Alignment loss, inter-class structure preservation loss, intra-class structure preservation loss, initial structure optimization loss, and classification loss train the new network and new feature transformation module.

9. A processing device, comprising: one or more processors; a memory for storing one or more programs;

Wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-7.

10. A readable storage medium storing a computer program, wherein the method according to any one of claims 1 to 7 is implemented when the computer program is executed by a processor.