CN114332466B - Continuous learning method, system, equipment and storage medium for image semantic segmentation network - Google Patents
Continuous learning method, system, equipment and storage medium for image semantic segmentation network
- Publication number: CN114332466B (application CN202210237914.4A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a continuous learning method, system, device and storage medium for an image semantic segmentation network. On one hand, a representation of old knowledge is extracted through a nonlinear transformation in the feature space and aligned, which effectively keeps old knowledge invariant while improving the ability to learn new knowledge. On the other hand, the topological structure of new classes is optimized in the embedding space while the topological structure of old classes is kept invariant, thereby reducing forgetting and preventing confusion between classes. In addition, by combining pseudo labels with a pseudo-label denoising technique, labels for old classes need not be provided during continuous learning of semantic segmentation, which reduces labeling cost. Overall, as a universal continuous learning method for semantic segmentation, the method places no limitation on application scenarios and has strong generalization capability and practical value.
Description
Technical Field
The invention relates to the technical field of image semantic segmentation, in particular to a continuous learning method, a system, equipment and a storage medium for an image semantic segmentation network.
Background
In recent years, deep neural networks have enjoyed great success in the task of semantic segmentation. However, the traditional semantic segmentation network training method needs to acquire all training data at once and is difficult to update after training is completed. In practical applications, the network is often required to gradually learn from data streams and update its learned knowledge, which effectively reduces data storage cost and training cost. But letting a deep neural network learn directly on new data leads to catastrophic forgetting of the learned knowledge. Continuous learning techniques exert additional constraints on the learning process so that new knowledge is learned without forgetting what has already been learned.
The general approach to continuous learning is to use knowledge distillation to maintain consistency of knowledge between the old and new networks, typically in the output space or the feature space. In the field of semantic segmentation in particular, besides preventing the forgetting of old knowledge with the above measures, there are two new challenges. First, as learning progresses, it may be necessary to learn classes that were previously ignored, so the semantic meaning of a given input region is not constant over time, which requires the network to have a stronger capacity for learning new knowledge. Second, because obtaining labeled data requires substantial manpower and material resources, it is desirable to label only the categories to be learned on newly added data; as a result, regions labeled as the background category may contain already-learned categories, and the semantic inconsistency thus introduced poses a great challenge to network training. Therefore, continuous learning methods from the image classification field cannot handle the semantic segmentation continuous learning task.
Specifically: in chinese patent application CN111191709A, "continuous learning framework and continuous learning method of deep neural network", a generative network is used to generate data of old categories, which is mixed with new data to train the network, but it only addresses the image classification task. Moreover, the method depends heavily on the generation quality of the generator and can hardly cope with large-scale, complex data, especially the image semantic segmentation task. Chinese patent application CN111368874A, "an incremental learning method for image classification based on single classification technology", uses knowledge distillation in the output space and preference correction to realize continuous learning for the image classification task. However, it still fails to address the aforementioned challenges specific to semantic segmentation continuous learning and thus cannot be directly applied to image semantic segmentation. Chinese patent applications CN103366163A, "face detection system and method based on incremental learning", CN106897705A, "ocean observation big data distribution method based on incremental learning", and CN103593680A, "dynamic gesture recognition method based on hidden Markov model self-incremental learning", are all methods dedicated to a specific field, with no demonstrated generality or universality.
Therefore, a method designed for the semantic segmentation continuous learning task that resolves the semantic inconsistency across learning stages while reducing the forgetting of old knowledge as much as possible has important practical value and significance.
Disclosure of Invention
The invention aims to provide a continuous learning method, system, device and storage medium for an image semantic segmentation network, which places no limitation on application scenarios, has strong generalization capability and practical value, and fills a gap in the semantic segmentation continuous learning task.
The purpose of the invention is realized by the following technical scheme:
a continuous learning method of an image semantic segmentation network comprises the following steps:
acquiring a newly added semantic segmentation data set and labels corresponding to newly added categories, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
initializing the same image semantic segmentation network and feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, wherein the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module;
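The initialize-and-freeze step above can be sketched as follows; `SegModel` and its fields are illustrative stand-ins for the network plus feature transformation module, not the patent's actual implementation:

```python
import copy

class SegModel:
    """Illustrative stand-in for an image semantic segmentation network
    together with its feature transformation module (names are assumptions)."""
    def __init__(self, params):
        self.params = dict(params)
        self.trainable = True

def start_stage(old_model):
    # The new network / new feature transformation module begin as exact
    # copies of the old ones; the old pair is then frozen and never updated.
    new_model = copy.deepcopy(old_model)
    old_model.trainable = False
    return new_model
```

Because the copy is deep, subsequent training updates to the new pair leave the frozen old pair untouched, which is what lets the old network serve as a stable reference during the stage.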
during training, inputting image data of the newly added semantic segmentation data set to the old network and the new network at the same time, and performing feature map extraction, decoding and semantic segmentation on the old network and the new network respectively to obtain segmentation results; transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss of the two transformation results; constructing, independently for the old classes, corresponding inter-class relationship matrices and intra-class relationship sets by using the segmentation results of the old network and the new network and the feature vectors obtained by decoding, calculating the inter-class structure retention loss by using the inter-class relationship matrices of the old network and the new network, and calculating the intra-class structure retention loss by using the intra-class relationship sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used to keep the inter-class structures and intra-class structures of the old classes consistent; meanwhile, for the newly added classes, calculating an initial structure optimization loss by using the feature vectors obtained by decoding in the new network, wherein the initial structure optimization loss is used to draw together the distributions of feature vectors of the same newly added class and push apart the distributions of feature vectors of different newly added classes; optimizing and denoising the segmentation result of the old network by using class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network by using the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
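A minimal sketch of how the five loss terms named in the claim might be combined into one training objective; the scalar weights are an assumption, as this passage does not specify a weighting scheme:

```python
def total_loss(cls_loss, align_loss, inter_loss, intra_loss, init_struct_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    # Weighted sum of the five terms: classification loss on pseudo/new labels
    # plus the four regularizers. The weights are illustrative assumptions,
    # not values stated in the patent text.
    w_align, w_inter, w_intra, w_init = weights
    return (cls_loss + w_align * align_loss + w_inter * inter_loss
            + w_intra * intra_loss + w_init * init_struct_loss)
```

In practice such weights would be tuned on a validation split; setting a weight to zero ablates the corresponding term.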
An image semantic segmentation network continuous learning system, comprising:
the data collection and preliminary training unit is used for acquiring a newly-added semantic segmentation data set and labels corresponding to newly-added categories, extracting an original feature map of image data in the newly-added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
the learning unit is used for initializing the same image semantic segmentation network and feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module; during training, inputting image data of a newly-added semantic segmentation data set to the old network and the new network at the same time, and respectively performing feature map extraction, decoding and semantic segmentation on the old network and the new network to obtain segmentation results; the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of two transformation results is calculated; respectively and independently constructing corresponding inter-class relation matrixes and intra-class relation sets for old classes by utilizing the segmentation results of the old network and the new network and the feature vectors obtained by decoding, calculating inter-class structure retention loss by utilizing the inter-class relation matrixes of the old network and the new network, and calculating intra-class structure retention loss by utilizing the intra-class relation sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used for keeping consistency of 
inter-class structures and intra-class structures in the old classes; meanwhile, for the newly added classes, calculating an initial structure optimization loss by using the feature vectors obtained by decoding in the new network, wherein the initial structure optimization loss is used to draw together the distributions of feature vectors of the same newly added class and push apart the distributions of feature vectors of different newly added classes, optimizing and denoising the segmentation result of the old network by using class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network by using the pseudo labels; training the new network and the new feature transformation module in combination with the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
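As a hedged illustration of the inter-class structure retention described above: one plausible concrete form (an assumption — the exact relation definition is not given in this excerpt) builds the inter-class relationship matrix from per-class prototype features and penalizes its drift between the old and new networks:

```python
import numpy as np

def class_prototypes(feats, labels, classes):
    # Mean feature vector (prototype) per class; feats: (N, D), labels: (N,).
    return np.stack([feats[labels == c].mean(axis=0) for c in classes])

def relation_matrix(protos):
    # Inter-class relationship matrix: pairwise cosine similarity of
    # class prototypes (cosine similarity is an illustrative choice).
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return p @ p.T

def inter_class_retention_loss(r_old, r_new):
    # Keep the old-class relation matrix of the new network close to
    # that of the frozen old network.
    return np.abs(r_old - r_new).mean()
```

The retention loss is zero when the new network preserves the old pairwise class geometry exactly, and grows as the inter-class structure drifts.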
A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned method.
A readable storage medium, storing a computer program which, when executed by a processor, implements the aforementioned method.
According to the technical scheme provided by the invention, on one hand, the old knowledge representation is extracted through nonlinear transformation in the feature space for alignment, so that the invariance of the old knowledge is effectively kept, and the learning capacity of new knowledge is improved. On the other hand, the topological structure of the new class is optimized in the embedding space, the invariance of the topological structure of the old class is maintained, and the effects of reducing forgetting and preventing confusion among the classes are achieved; in addition, the pseudo label and the pseudo label noise reduction technology are combined, so that the labels of old categories do not need to be provided in the continuous learning of semantic segmentation, and the labeling cost is reduced. Generally, the method is used as a universal semantic segmentation continuous learning method, has no limitation on application scenes, and has strong generalization capability and practical value.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic model diagram of a continuous learning method of an image semantic segmentation network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a part of the principle of initial structure optimization provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a portion of the inter-class and intra-class structure maintenance provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a comparison of segmentation results of different image semantic segmentation networks according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a continuous learning system of an image semantic segmentation network according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The terms that may be used herein are first described as follows:
the terms "comprising," "including," "containing," "having," or other similar terms of meaning should be construed as non-exclusive inclusions. For example: including a feature (e.g., material, component, ingredient, carrier, formulation, material, dimension, part, component, mechanism, device, step, process, method, reaction condition, processing condition, parameter, algorithm, signal, data, product, or article, etc.) that is not specifically recited, should be interpreted to include not only the specifically recited feature but also other features not specifically recited and known in the art.
The following describes a method, a system, a device and a storage medium for continuous learning of an image semantic segmentation network provided by the present invention in detail. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to a person skilled in the art. The examples of the present invention, in which specific conditions are not specified, were carried out according to the conventional conditions in the art or conditions suggested by the manufacturer.
Example one
The embodiment of the invention provides a continuous learning method for an image semantic segmentation network, namely a semantic segmentation continuous learning method based on class structure keeping and feature alignment. A mainstream image semantic segmentation network consists of a feature extractor, a decoder and a classifier, and its main processing flow is as follows: the feature extractor extracts the feature map of the input image to be segmented, the decoder produces the corresponding feature vectors, and finally the classifier performs semantic segmentation to obtain the classification result (i.e., the segmentation result) of each pixel. The invention designs corresponding modules for the image semantic segmentation network to prevent old knowledge from being forgotten. Specifically, the core of the method comprises a feature transformation module, a category structure information keeping module, a pseudo label generation module, and training of the semantic segmentation network with a joint loss function. The feature transformation module applies a nonlinear transformation to the feature map output by the feature extractor to extract a representation of old knowledge for alignment, effectively maintaining the integrity of old knowledge while leaving a high degree of freedom for learning new knowledge. The class structure information retaining module uses the decoder output to establish intra-class and inter-class topological structures, and by maintaining structural consistency during learning it effectively reduces the damage to class topological structures in the continuous learning process, thereby reducing both forgetting and inter-class confusion.
Further, aiming at the problem of inconsistent semantics, the pseudo label generation module utilizes a class-by-class dynamic threshold value to optimally denoise the segmentation result output by the old network, so that a high-quality pseudo label is generated to make up for the missing old class label. And finally, training the semantic segmentation network by combining the loss functions of the modules so as to achieve the effect of continuous learning.
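A hedged sketch of the pseudo-label generation just described; interpreting the class-by-class dynamic threshold as a per-class confidence quantile over the old network's predictions is an assumption made for illustration:

```python
import numpy as np

def pseudo_labels(old_probs, quantile=0.5, ignore_index=255):
    # old_probs: (C_old, H, W) softmax output of the frozen old network.
    # For each old class, keep only the pixels whose confidence exceeds a
    # class-wise dynamic threshold (here: the quantile of confidences among
    # pixels the old network assigns to that class); the rest are ignored.
    pred = old_probs.argmax(axis=0)   # (H, W) hard prediction
    conf = old_probs.max(axis=0)      # per-pixel confidence
    labels = np.full(pred.shape, ignore_index, dtype=np.int64)
    for c in range(old_probs.shape[0]):
        mask = pred == c
        if mask.any():
            thr = np.quantile(conf[mask], quantile)  # dynamic per-class threshold
            labels[mask & (conf >= thr)] = c
    return labels
```

Because the threshold adapts per class, a class the old network predicts with uniformly low confidence is not wiped out wholesale, while low-confidence outliers within each class are still discarded as noise.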
Each round of continuous learning can be described as follows:
acquiring a newly added semantic segmentation data set and a label corresponding to a newly added category, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and primarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, wherein the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module;
during training, inputting image data of the newly added semantic segmentation data set to the old network and the new network at the same time, and performing feature map extraction, decoding and semantic segmentation on the old network and the new network respectively to obtain segmentation results; transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss of the two transformation results; constructing, independently for the old classes, corresponding inter-class relationship matrices and intra-class relationship sets by using the respective segmentation results of the old network and the new network and the feature vectors obtained by decoding, calculating the inter-class structure retention loss by using the inter-class relationship matrices of the old network and the new network, and calculating the intra-class structure retention loss by using the intra-class relationship sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used to keep the inter-class structures and intra-class structures of the old classes consistent; meanwhile, for the newly added classes, calculating an initial structure optimization loss by using the feature vectors obtained by decoding in the new network, wherein the initial structure optimization loss is used to draw together the distributions of feature vectors of the same newly added class and push apart the distributions of feature vectors of different newly added classes; optimizing and denoising the segmentation result of the old network by using class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network by using the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
For ease of understanding, the learning process described above is further described below.
As shown in fig. 1, a model schematic diagram of an image semantic segmentation network continuous learning method provided by the present invention shows a related flow and a loss function involved in a continuous learning process, which is mainly described as follows:
Firstly, the feature transformation module and its related loss function.
In the embodiment of the invention, before the newly added semantic segmentation data set is used for learning the image semantic segmentation network, the characteristic transformation module part needs to be preliminarily trained.
As mentioned above, the image semantic segmentation network comprises a feature extractor, which is used to extract the original feature map of each image in the newly added semantic segmentation data set. A feature transformation module (Feature Projection) then applies a nonlinear transformation to generate a representation of old knowledge, and training the feature transformation module guides this output representation to contain rich and effective information.
In the embodiment of the invention, the feature transformation module is preliminarily trained with an auto-encoder structure and is denoted P*. Transforming the original feature map F with the feature transformation module P* comprises: first performing channel dimensionality reduction through a convolution operation (for example, a 1x1 convolution), and then mixing local spatial information through several dilated convolution operations (for example, two 3x3 dilated convolutions), thereby generating a representation of the original feature map F. During preliminary training, a reconstruction network R* (for example, two 3x3 convolutions may be used) reconstructs a feature map from the transformation result P*(F); a reconstruction loss function is constructed from the difference between the original feature map and the reconstructed feature map, and the feature transformation module P* is preliminarily trained with this reconstruction loss, guiding the representation output by P* to contain abundant information.
It will be understood by those skilled in the art that the convolution and dilated (hole) convolution operations are both conventional; compared with a standard convolution, a dilated convolution can mix spatial information over a larger range with fewer layers.
Specifically, the reconstruction loss function may be a distance (for example, an L1 distance) between the reconstructed feature map R*(P*(F)) and the original feature map F, expressed as: L_rec = || R*(P*(F)) - F ||_1.
feature transformation moduleP * When the representation of the old knowledge can be effectively generated, the initial training is completed, and the characteristic transformation module at the moment is recorded as。
The original image segmentation network and its feature transformation module P_{t-1} can then be used to initialize a new network for learning new knowledge and its corresponding feature transformation module P_t. In the process of learning new knowledge, the original image segmentation network (old network) and its feature transformation module P_{t-1} are kept unchanged, and only the image segmentation network generated by initialization (new network) and its feature transformation module P_t are updated. The learning stage is a key concept of incremental learning: the initial stage is marked as 1, and each time a category set is added, a new continuous learning stage begins. In FIG. 1, the subscripts t-1 and t represent different learning stages; relatively speaking, the network of learning stage t-1 is the old network and the network of learning stage t is the new network; E represents the encoder (feature extractor), D represents the decoder, and G represents the classifier.
In the embodiment of the invention, a consistency constraint is applied to the outputs of the two networks' feature transformation modules, so that old knowledge is kept unchanged during continuous learning while the feature map itself is given a higher degree of freedom to change, allowing new knowledge to be learned well.
In the embodiment of the invention, the original image segmentation network is not updated after the previous stage, so the feature map it extracts is the original feature map, still denoted F^{t−1}; the transformation result of the old feature transformation module P_{t−1} is denoted P_{t−1}(F^{t−1}). The feature map extracted by the new network is denoted F^t, the new feature transformation module is denoted P_t, and its transformation result is denoted P_t(F^t). The Alignment Loss is the L1 distance between the two transformation results, expressed as: L_align = ‖P_t(F^t) − P_{t−1}(F^{t−1})‖₁.
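A minimal sketch of the alignment loss between the two transformation outputs follows; the 1x1 convolutions stand in for the actual feature transformation modules, and the drift added to the new network's features is synthetic, purely to make the computation concrete.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
transform_old = nn.Conv2d(256, 64, 1)   # stand-in for the frozen P_{t-1}
transform_new = nn.Conv2d(256, 64, 1)   # stand-in for the trainable P_t
transform_new.load_state_dict(transform_old.state_dict())  # initialized from old

feat_old = torch.randn(2, 256, 16, 16)                    # old-network feature map
feat_new = feat_old + 0.1 * torch.randn_like(feat_old)    # new network's features drift

with torch.no_grad():                   # old branch is fixed: no gradients
    z_old = transform_old(feat_old)
z_new = transform_new(feat_new)

align_loss = (z_new - z_old).abs().mean()   # L1 distance between the two representations
align_loss.backward()
```

Note that the old branch runs under `no_grad`, matching the requirement that the old network and old feature transformation module are not updated.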
And secondly, a category structure information holding module and related loss functions.
In the embodiment of the invention, the category structure information holding module constructs the intra-class structural relationship and the inter-class structural relationship in the embedding space based on the decoder output of the image segmentation network. By keeping these two relationships during continuous learning, the discrimination of the network for the old categories is effectively preserved. The category structure information holding module mainly includes three parts: an initial structure optimization part, an inter-class structure retention part and an intra-class structure retention part. The initial structure optimization part mainly aims at the newly added categories and calculates the initial structure optimization loss, which belongs to the family of contrastive losses (Contrastive Loss); the latter two parts mainly aim at the old categories and calculate the Structure Preserving Loss, including the inter-class structure preserving loss and the intra-class structure preserving loss. The old categories and the new categories are relative concepts: the old categories are those the image segmentation network could already identify before the current continuous learning stage, and the new categories are those added in the current stage. The principles and related loss functions of the above three parts are mainly as follows:
1. Initial structure optimization part.
Fig. 2 is a schematic diagram of the principle of the initial structure optimization part. When only cross-entropy training is used, the distributions of different classes (such as class A and class B on the left side of the figure) in the embedding space are often dispersed and easy to partially overlap; such distributions readily cause class confusion in the subsequent learning process and thus produce forgetting. By guiding each feature vector (triangles in the figure) as close as possible to its corresponding category prototype (crosses in the figure), while keeping the distance between prototypes of different classes no less than a given threshold (gray circles on the right side of the figure), an optimized class distribution is obtained and the confusion effect is reduced.
The initial structure optimization part guides the distribution of the new categories in the embedding space when learning them, so that the feature vectors of the same category are distributed as compactly as possible and the feature vectors of different categories are distributed as far apart as possible. This makes the classification boundaries of the model clearer, reduces confusion, and makes the model more robust against forgetting. Since the initial structure optimization part only concerns the newly added categories, the initial structure optimization loss is calculated using only the output of the new network.
In order to achieve the above object, the loss function of the initial structure optimization part (the initial structure optimization loss) combines two loss terms that guide the learning of the intra-class structure and the inter-class structure respectively, expressed as: L_init = L_intra + λ · L_inter.
where L_intra denotes the loss guiding the intra-class structure, L_inter denotes the loss guiding the inter-class structure, and λ is the weight of L_inter (the specific value can be set according to actual conditions or experience).
The loss L_intra guiding the intra-class structure is used to pull together the distribution of feature vectors of the same newly added category, expressed as: L_intra = (1/|C^t|) · Σ_{c∈C^t} (1/N_c) · Σ_{f∈c} ‖f − p_c‖₂².
where C^t denotes the set of newly added categories at the current learning stage t, |C^t| denotes the number of newly added categories, and the current learning stage t is the stage of training the new network and the new feature transformation module; p_c denotes the category prototype corresponding to the new category c, N_c the number of feature vectors of category c, and f denotes a feature vector belonging to the new category c.
The loss L_inter guiding the inter-class structure is used to push apart the distributions of different new categories, expressed as: L_inter = Σ_{m,n∈C^t, m≠n} max(0, cos(p_m, p_n) − (1 − δ)).
where p_m and p_n respectively represent the category prototypes corresponding to the newly added categories m and n, cos(p_m, p_n) is the cosine similarity of the category prototypes p_m and p_n, and δ is a predefined distance (the specific value can be set according to actual conditions or experience).
In the embodiment of the invention, a category prototype is the average of all the feature vectors under the corresponding category; for a newly added category c, the category prototype p_c is expressed as: p_c = ( Σ_k 1[y_k = c] · f_k ) / |y = c|.
where y denotes the labels of the newly added categories at the current stage, y_k the label of pixel k, f_k the feature vector of pixel k, |y = c| denotes the number of pixels of the new category c in the label, and 1[·] is the indicator function, which outputs 1 when y_k = c and 0 otherwise.
As will be understood by those skilled in the art, Class Prototypes is a term of art in the field of computer vision, referring to averaging a series of features belonging to a certain class and using their average to characterize the whole class; the category prototypes mentioned later are calculated in a similar manner.
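The prototype computation and the two initial-structure terms above can be sketched as follows. The margin value, the inter-class weight, and the hinge formulation over cosine similarity are assumptions consistent with the description (prototype pairs kept at least a predefined distance apart), not values given by the patent.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
emb = torch.randn(500, 32)              # decoded per-pixel feature vectors
labels = torch.randint(0, 3, (500,))    # newly added category per pixel (3 new classes)

# Category prototype: mean of all feature vectors labelled with that class.
prototypes = torch.stack([emb[labels == c].mean(dim=0) for c in range(3)])

# Intra-class term: pull each feature toward its own prototype.
intra = ((emb - prototypes[labels]) ** 2).sum(dim=1).mean()

# Inter-class term: hinge that penalizes prototype pairs which are too similar
# (margin 0.5 is an assumed value).
margin = 0.5
inter, pairs = 0.0, 0
for m in range(3):
    for n in range(3):
        if m != n:
            sim = F.cosine_similarity(prototypes[m], prototypes[n], dim=0)
            inter = inter + torch.clamp(sim - margin, min=0.0)
            pairs += 1
inter = inter / pairs

init_loss = intra + 0.1 * inter   # 0.1 stands in for the weight of the inter-class term
```

Only the new network's decoder output would feed this loss, since the initial structure optimization part concerns newly added categories alone.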
2. Inter-class structure retention part.
A well-trained deep neural network can map input samples into an embedding space and distribute the samples into different regions of the embedding space according to their categories. This property is essential for the network to classify each class correctly. Based on it, the inter-class topological structure in the embedding space is constructed, and this structure is maintained during continuous learning so as to keep the classes linearly separable.
In the embodiment of the invention, an inter-class relationship matrix R^{t−1} is constructed for the old classes using the segmentation results of the old network and the feature vectors obtained by its decoder, and an inter-class relationship matrix R^t is constructed for the old classes using the segmentation results of the new network and the feature vectors obtained by its decoder. A single element of an inter-class relationship matrix represents the cosine similarity between the category prototypes corresponding to two old classes. For old classes i and j, the corresponding category prototypes in the old network are denoted p^{t−1}_i and p^{t−1}_j, and the corresponding category prototypes in the new network are denoted p^t_i and p^t_j; then the corresponding elements of the inter-class relationship matrices R^{t−1} and R^t are expressed as: R^{t−1}_{ij} = cos(p^{t−1}_i, p^{t−1}_j) and R^t_{ij} = cos(p^t_i, p^t_j).
where cos(·, ·) denotes the cosine similarity of the two corresponding category prototypes.
During the continuous learning process, the consistency of the two is maintained by using the structure keeping loss function between the classes, which is expressed as:
wherein, I F Representing the F-norm of the matrix.
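A minimal sketch of the inter-class structure preserving computation: build a cosine-similarity matrix over old-class prototypes for each network, then take the Frobenius norm of their difference. The prototype tensors are synthetic placeholders for the decoder-derived prototypes of the old and new networks.

```python
import torch
import torch.nn.functional as F

def relation_matrix(prototypes):
    """Pairwise cosine similarities between old-class prototypes."""
    p = F.normalize(prototypes, dim=1)
    return p @ p.t()

torch.manual_seed(0)
proto_old = torch.randn(5, 32)                      # prototypes from the old network
proto_new = proto_old + 0.05 * torch.randn(5, 32)   # slightly drifted new-network prototypes

r_old = relation_matrix(proto_old)
r_new = relation_matrix(proto_new)

# Inter-class structure preserving loss: F-norm of the matrix difference.
inter_keep = torch.norm(r_old - r_new, p='fro')
```

If the new network's prototypes keep exactly the same pairwise relations, the loss is zero; only relative inter-class geometry is constrained, not absolute prototype positions.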
3. Intra-class structure retention part.
The intra-class relationship is defined as the set of relative relationships between each feature vector and its category prototype. The intra-class relationship set constructed for the old classes using the segmentation results of the old network and the feature vectors obtained by its decoder is denoted S^{t−1} = {D(f^{t−1}, p^{t−1})}, and the intra-class relationship set constructed for the old classes using the segmentation results of the new network and the feature vectors obtained by its decoder is denoted S^t = {D(f^t, p^t)}, where D denotes some distance metric function, e.g., the Euclidean distance. The intra-class relationship set reflects fine-grained topological structure information in the embedding space. Keeping the topological structure of the intra-class feature vectors in the embedding space unchanged during continuous learning can effectively maintain the integrity of single-class knowledge. The distance function selected when modeling the intra-class structure is the Euclidean distance, so that its sensitivity can be used to reflect small changes of the intra-class structure.
In the process of continuous learning, the intra-class structure retention loss is used to preserve the intra-class structure of the old classes (i.e., the correspondence between the intra-class relationship sets S^{t−1} and S^t), expressed as: L_intra-keep = (1/|C^{1:t−1}|) · Σ_{i∈C^{1:t−1}} E[ | D(f^{t−1}_i, p^{t−1}_i) − D(f^t_i, p^t_i) | ].
where f^{t−1}_i and p^{t−1}_i respectively denote a feature vector belonging to old class i obtained from the old network and the corresponding category prototype; f^t_i and p^t_i denote a feature vector belonging to old class i obtained from the new network and the corresponding category prototype; C^{1:t−1} denotes the set of old classes (i.e., all old classes that have been learned), and |C^{1:t−1}| denotes the number of old classes.
The category prototypes involved in the inter-class structure retention part and the intra-class structure retention part are calculated using the segmentation results and feature vectors output by the corresponding networks; the calculation is similar to the formula for p_c above, the main difference being that, since these parts are for the old classes, the segmentation result has to be substituted into the formula in place of the label: p^t_i = ( Σ_k 1[ŷ^t_k = i] · f^t_k ) / |ŷ^t = i|.
where ŷ^t denotes the segmentation result output by the new network, and |ŷ^t = i| denotes the number of pixels whose predicted class in the segmentation result output by the new network is the old class i.
The same is true for the old network, and the corresponding class prototypes are calculated by substituting the segmentation results into the above formula.
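Putting the pieces of this part together, the intra-class retention computation can be sketched as below: for each old class, measure each feature's Euclidean distance to its class prototype in both networks, and penalize changes in those distances. The feature tensors and the pseudo-labels are synthetic stand-ins for the decoder outputs and old-network predictions.

```python
import torch

torch.manual_seed(0)
n_old_classes = 4
feats_old = torch.randn(200, 32)                     # old-network features of old-class pixels
feats_new = feats_old + 0.1 * torch.randn_like(feats_old)
labels = torch.randint(0, n_old_classes, (200,))     # old class assigned per pixel

intra_keep = 0.0
for i in range(n_old_classes):
    mask = labels == i
    p_old = feats_old[mask].mean(dim=0)              # class prototype in the old network
    p_new = feats_new[mask].mean(dim=0)              # class prototype in the new network
    d_old = (feats_old[mask] - p_old).norm(dim=1)    # Euclidean relations to the prototype
    d_new = (feats_new[mask] - p_new).norm(dim=1)
    intra_keep = intra_keep + (d_old - d_new).abs().mean()
intra_keep = intra_keep / n_old_classes
```

Because only distances to the prototype are compared, the whole class may still move as a block in the embedding space (rotation, translation), while its internal fine-grained geometry is preserved.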
Fig. 3 is a schematic diagram of the inter-class and intra-class structure retention parts. When a new category needs to be learned, the inter-class topology retention only requires that the adjacency and correlation relations among the old categories remain unchanged during the update, while allowing them to change as a whole in the embedding space, e.g., by rotation or translation, which helps learn the new category while effectively maintaining the old-category knowledge. When the intra-class structure is unconstrained, updating the network often causes large changes in the relative relation between the same input and its feature prototype (lower left of Fig. 3); the intra-class structure retention loss reduces such changes, thereby maintaining the integrity of the old knowledge at a finer granularity.
And thirdly, a pseudo label generating module and a related loss function.
In the embodiment of the invention, the segmentation result of the old network is optimized and denoised using class-wise dynamic thresholds, so that high-quality pseudo labels are generated to make up for the missing old-class labels; this process is called Pseudo Label Refinement. The principle is as follows:
In the continuous learning process, the labels of the old classes are not given in the current learning stage, i.e., the learned classes are marked as background in the given labels. The forgetting of the learned classes will be exacerbated if the network is trained directly with a given label as the supervisory signal. To this end, the semantic segmentation results of the old network are used to label the background regions of a given label, thereby providing pseudo labels for the learned classes. Further, among the segmentation results output by the old network, erroneous results are hard to avoid. For this problem, the entropy of the output class probabilities is used as a confidence index, and only the results with higher confidence are used as pseudo labels. Because the network learns different categories to different degrees, the invention calculates the distribution of the output entropy for each category separately and selects a threshold τ_i accordingly, so that a fixed proportion of the pseudo labels of the corresponding old category i is retained; the final supervision labels are generated after fusing the real labels of the newly added categories (given with the newly added data set), and the method for generating the supervision labels (pseudo labels) is expressed as follows:
ỹ_k = y_k, if pixel k is labelled with a newly added category; ỹ_k = ŷ^{t−1}_k, if ŷ^{t−1}_k = i ∈ C^{1:t−1} and the classification confidence of the old network for pixel k satisfies the dynamic threshold τ_i; otherwise ỹ_k is the background class. Here, y_k denotes the real label of the newly added category corresponding to pixel k of the input image at the current learning stage t, ŷ^{t−1}_k denotes the classification result of the old network for pixel k (i.e., the segmentation result output by the old network for the input image), τ_i denotes the dynamic threshold corresponding to old class i, C^{1:t−1} denotes the set of old classes, and ỹ_k is the finally generated pseudo label for pixel k.
Then, the classification loss of the new network is calculated using the finally generated pseudo labels; specifically, it is the Cross Entropy Loss, expressed as: L_cls = −(1/|K|) · Σ_{k∈K} log q^t_k(ỹ_k), where q^t_k(ỹ_k) denotes the probability predicted by the new network that pixel k belongs to class ỹ_k, and K denotes the set of supervised pixels.
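A hedged sketch of the pseudo-label refinement and the resulting cross-entropy follows. The retained proportion (70%), the use of label value 255 for unlabelled background, and the toy label layout are assumptions for illustration; the patent only specifies per-class entropy distributions with dynamic thresholds retaining a fixed proportion.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_old, n_new = 3, 2                        # old classes 0..2, new classes 3..4
logits_old = torch.randn(1, n_old, 8, 8)   # old-network output (old classes only)

probs = logits_old.softmax(dim=1)
entropy = -(probs * probs.log()).sum(dim=1)    # per-pixel confidence measure
pred_old = probs.argmax(dim=1)

# Ground truth: new classes annotated, old classes left unlabelled (255 here).
gt = torch.full((1, 8, 8), 255, dtype=torch.long)
gt[:, :4, :] = 3                               # some pixels carry a new-class label

# Class-wise dynamic threshold: keep the 70% most confident pixels of each old class.
pseudo = gt.clone()
for c in range(n_old):
    mask = (gt == 255) & (pred_old == c)
    if mask.any():
        thresh = torch.quantile(entropy[mask], 0.7)
        keep = mask & (entropy <= thresh)
        pseudo[keep] = c

# Cross entropy for the new network over old + new classes, ignoring rejected pixels.
logits_new = torch.randn(1, n_old + n_new, 8, 8, requires_grad=True)
ce = F.cross_entropy(logits_new, pseudo, ignore_index=255)
ce.backward()
```

Pixels whose old-network prediction is too uncertain keep the 255 value and are simply excluded from the loss via `ignore_index`.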
And fourthly, training a semantic segmentation network by combining the loss functions.
In the embodiment of the invention, the new network and the new feature transformation module are trained by combining the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss of steps one to three, finally achieving continuous learning on the semantic segmentation task. The target loss function for training is the weighted sum of the above loss functions: L = L_cls + λ_1 · L_align + λ_2 · L_inter-keep + λ_3 · L_intra-keep + λ_4 · L_init, where λ_1 to λ_4 are weighting coefficients.
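The combination step can be sketched as a plain weighted sum; the individual loss values and all the weights below are hypothetical, since the patent specifies a weighted sum but not its coefficients.

```python
import torch

# Placeholder loss values from the modules above; weights are assumed
# hyper-parameters, to be tuned in practice.
losses = {
    'classification': torch.tensor(1.20),
    'alignment': torch.tensor(0.30),
    'inter_class_keep': torch.tensor(0.15),
    'intra_class_keep': torch.tensor(0.10),
    'initial_structure': torch.tensor(0.25),
}
weights = {'classification': 1.0, 'alignment': 1.0, 'inter_class_keep': 0.5,
           'intra_class_keep': 0.5, 'initial_structure': 0.1}

total = sum(weights[k] * losses[k] for k in losses)
```

In training, `total.backward()` would propagate through the new network and new feature transformation module only, the old branch being frozen.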
The embodiment of the invention provides a continuous learning method of an image semantic segmentation network, which mainly has the following beneficial effects:
1) old knowledge representation is extracted through nonlinear transformation in a feature space to carry out alignment, so that the invariance of old knowledge is effectively maintained, and the learning capacity of new knowledge is improved.
2) The topological structure of the new class is optimized in the embedding space, the invariance of the topological structure of the old class is maintained, and the effects of reducing forgetting and preventing confusion among classes are achieved.
3) By combining the pseudo label and the pseudo label denoising technology, the labels of old categories are not required to be provided in the continuous learning of semantic segmentation, and the labeling cost is reduced.
Generally, the method is used as a universal semantic segmentation continuous learning method, has no limitation on application scenes, and has strong generalization capability and practical value.
Based on the above description, the following provides a complete implementation process, which includes the initial stage learning of the image semantic segmentation network, the continuous learning of the image semantic segmentation network, and the testing of the image semantic segmentation network.
Firstly, learning an image semantic segmentation network in an initial stage.
1. Prepare an initial semantic segmentation data set and corresponding category labels to form training data; change the spatial resolution of the images by random cropping so that the width and height of each image are 512, and perform normalization processing.
2. Using a deep learning framework, build an image semantic segmentation model based on class structure keeping and feature alignment, comprising a full convolution semantic segmentation network, a feature transformation module, a category structure information holding module, a pseudo label generation module and the like. The full convolution semantic segmentation network is DeepLabV3; the feature extractor can be ResNet, MobileNet, etc., and ResNet-101 is used here as the feature extractor. The decoder part is an ASPP module. A feature transformation module is arranged at the output of the feature extractor to perform the nonlinear transformation and alignment operations on the features. A category structure information holding module is arranged at the decoder part of the semantic segmentation network, and a pseudo label generation module is arranged at the output of the semantic segmentation network.
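The encoder-decoder-classifier layout (E, D, G of Fig. 1) can be sketched with a deliberately tiny stand-in model; this is not DeepLabV3/ResNet-101, merely an assumed minimal architecture showing where the feature transformation module and the category structure information holding module would attach.

```python
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    """Stand-in for the ASPP decoder: parallel dilated branches fused by a 1x1 conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in (1, 6, 12)
        ])
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)
    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class SegNet(nn.Module):
    """Encoder E -> decoder D (ASPP-like) -> classifier G, as in Fig. 1."""
    def __init__(self, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(                      # E (ResNet-101 in practice)
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = MiniASPP(64, 64)                    # D
        self.classifier = nn.Conv2d(64, num_classes, 1)    # G
    def forward(self, x):
        feat = self.encoder(x)      # the feature transformation module attaches here
        emb = self.decoder(feat)    # category structure info is kept on this output
        logits = self.classifier(emb)
        return nn.functional.interpolate(logits, size=x.shape[-2:],
                                         mode='bilinear', align_corners=False)

net = SegNet(num_classes=21)
out = net(torch.randn(1, 3, 64, 64))
```

The forward pass returns per-pixel class logits at input resolution, which is the interface all the losses above consume.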
3. In the initial-stage learning process, a group of data is randomly selected from the training data and input into the network each time, the model gives a semantic segmentation result, and the network is trained by optimizing the cross entropy loss and the initial structure optimization loss.
The training processes involved in this part are all conventional techniques, and are not described again; in addition, the specific image size and the network structure and type related to the above flow are examples and are not limited.
And secondly, continuously learning the image semantic segmentation network.
1. After the initial-stage training is completed, a newly added semantic segmentation data set and labels corresponding to the new categories are prepared. Change the spatial resolution of the images by random cropping so that the width and height are 512, and perform normalization processing.
Likewise, the specific image sizes referred to herein are exemplary only and not limiting.
As will be understood by those skilled in the art, the newly added semantic segmentation data set includes both the newly added categories and the old categories; a few images may not contain any old category, but this has little influence on the learning effect. In addition, only the new categories need to be labeled; the old categories do not need to be labeled.
2. Preliminarily train the feature transformation module. At each iteration, a group of data is randomly selected from the training data and input into the image semantic segmentation network to obtain the feature map output by the feature extractor, and the feature transformation module is trained using the reconstruction loss so that it can complete the feature transformation operation on the newly added data.
3. The weights of the image semantic segmentation network and of the feature transformation module are used to initialize an identical network and feature transformation module (i.e., the new network and the new feature transformation module) for learning the new categories; the old network and its feature transformation module are not updated. At each iteration, a group of data is randomly selected from the training data and input into the new network and the old network simultaneously. The feature maps output by the two feature extractors pass through the old and new feature transformation modules respectively to obtain the old knowledge representations, and the alignment loss is calculated. Using the decoder outputs of the new and old networks, the inter-class relationship matrices and intra-class relationship sets are constructed for the old classes, and the inter-class structure retention loss and the intra-class structure retention loss are calculated. The initial structure optimization loss is calculated for the new classes at the same time. Finally, complete semantic labels are generated from the output of the old network through the pseudo label generation module, and the cross entropy loss is calculated on the segmentation result of the new network.
4. And calculating a total loss function L according to the loss function in the steps, minimizing the loss function through a back propagation algorithm and a gradient descent strategy, and updating the parameter weights of the semantic segmentation network and the feature transformation module.
The back propagation algorithm and the gradient descent strategy involved in this stage can refer to the conventional techniques, and are not described in detail.
And when new categories need to be continuously learned, repeating the steps 1-4 of the continuous learning part of the image semantic segmentation network until all interested categories are completely learned.
And thirdly, testing the image semantic segmentation network.
Input the images in the test data set into the image semantic segmentation network after continuous learning, and obtain the segmentation results sequentially through the internal feature extractor and decoder. The segmentation results can be evaluated with set indexes so as to judge the semantic segmentation performance of the image semantic segmentation network after continuous learning.
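The evaluation index is not fixed by the patent; mean intersection-over-union (mIoU), the usual semantic segmentation metric, is one reasonable choice and can be sketched as:

```python
import torch

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().item()
        union = ((pred == c) | (target == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 4x4 label maps with three classes and a single mislabelled pixel.
target = torch.tensor([[0, 0, 1, 1],
                       [0, 0, 1, 1],
                       [2, 2, 2, 2],
                       [2, 2, 2, 2]])
pred = target.clone()
pred[0, 2] = 0
miou = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 4/5, 3/4 and 1, giving an mIoU of 0.85; on real data the same function would be applied to the argmax of the network's output logits.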
As shown in Fig. 4, a schematic diagram comparing the segmentation results of different image semantic segmentation networks is given; the four columns of images from left to right show the compared results. From Fig. 4 it can be found that the segmentation result of the present invention is close to the real segmentation result and far better than the segmentation result of the existing scheme.
Example two
The invention further provides an image semantic segmentation network continuous learning system, which is implemented mainly based on the method provided by the first embodiment, as shown in fig. 5, the system mainly includes:
the data collection and preliminary training unit is used for acquiring a newly added semantic segmentation data set and labels corresponding to newly added categories, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
the learning unit is used for initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module; during training, inputting image data of a newly-added semantic segmentation data set to the old network and the new network simultaneously, and performing feature map extraction, decoding and semantic segmentation on the old network and the new network respectively to obtain segmentation results; the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of two transformation results is calculated; respectively and independently constructing corresponding inter-class relationship matrixes and intra-class relationship sets for old classes by utilizing the respective segmentation results of the old network and the new network and feature vectors obtained by decoding, calculating inter-class structure retention loss by utilizing the inter-class relationship matrixes of the old network and the new network, and calculating intra-class structure retention loss by utilizing the intra-class relationship sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used for 
keeping consistency of inter-class structures and intra-class structures in the old classes; meanwhile, for the newly added classes, calculating initial structure optimization loss by using the eigenvectors obtained by decoding the new network, wherein the initial structure optimization loss is used for enhancing the distribution of the eigenvectors of the same newly added class, distancing the distribution of the eigenvectors of different newly added classes, optimizing and denoising the segmentation result of the old network by using class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network by using the pseudo labels; training the new network and the new feature transformation module in combination with the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the above division of each functional module is only used for illustration, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to complete all or part of the above described functions.
It should be noted that, the main principles related to the above units have been described in detail in the first embodiment, and therefore, detailed descriptions thereof are omitted.
Example three
The present invention also provides a processing apparatus, as shown in fig. 6, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, a processor, a memory, an input device and an output device are connected through a bus.
In the embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:
the input device can be a touch screen, an image acquisition device, a physical key or a mouse and the like;
the output device may be a display terminal;
the Memory may be a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as a disk Memory.
Example four
The present invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium in the embodiment of the present invention may be provided in the foregoing processing device as a computer readable storage medium, for example, as a memory in the processing device. The readable storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A continuous learning method of an image semantic segmentation network is characterized by comprising the following steps:
acquiring a newly added semantic segmentation data set and a label corresponding to a newly added category, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and primarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, wherein the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module;
during training, inputting image data of a newly-added semantic segmentation data set to the old network and the new network at the same time, and respectively performing feature map extraction, decoding and semantic segmentation on the old network and the new network to obtain segmentation results; the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of two transformation results is calculated; respectively and independently constructing corresponding inter-class relation matrixes and intra-class relation sets for old classes by utilizing the segmentation results of the old network and the new network and feature vectors obtained by decoding, calculating inter-class structure retention loss by utilizing the inter-class relation matrixes of the old network and the new network, calculating intra-class structure retention loss by utilizing the intra-class relation sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used for keeping consistency of inter-class structures and intra-class structures in the old classes; meanwhile, for the new classes, calculating initial structure optimization loss by using the eigenvector obtained by decoding the new network, wherein the initial structure optimization loss is used for approximating the distribution of the eigenvector of the same new class, distancing the distribution of the eigenvector of different new classes, optimizing the segmentation result of the old network by using a class-by-class dynamic threshold to obtain a corresponding pseudo label, and calculating the classification loss of the new network by using the pseudo label; training the new network and the new feature transformation module in combination with the alignment loss, the inter-class structure 
retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
2. The method as claimed in claim 1, wherein the extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using a difference between a feature map reconstructed from a transformation result and the original feature map comprises:
using an autoencoder structure to preliminarily train the feature transformation module, denoting the original feature map as F and the feature transformation module as P*; transforming the original feature map F by the feature transformation module P* comprises: performing channel dimensionality reduction through a convolution operation, and performing local spatial information mixing through a plurality of dilated (atrous) convolution operations, so as to generate the characterization P*(F) of the original feature map F;
reconstructing a feature map R*(P*(F)) from the transformation result P*(F) by using a reconstruction network R*; the difference between the reconstructed feature map and the original feature map (denoted F) is the Euclidean distance between the two, expressed as:

$$\mathcal{L}_{rec} = \left\| R^{*}\!\left(P^{*}(F)\right) - F \right\|_{2}$$
preliminarily training the feature transformation module P* by using the reconstruction loss.
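The reconstruction objective of claim 2 (the Euclidean distance between the original feature map and the feature map reconstructed from the transformation result) can be sketched in NumPy; the function name and the array shapes are illustrative assumptions:

```python
import numpy as np

def reconstruction_loss(original, reconstructed):
    # Euclidean (L2) distance between the original feature map and the
    # feature map reconstructed by the reconstruction network from the
    # output of the feature transformation module.
    return float(np.linalg.norm(original - reconstructed))
```

Minimizing this loss trains the transformation module so that its compressed characterization still carries enough information to recover the original feature map.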
3. The method as claimed in claim 1, wherein the step of continuously learning the image semantic segmentation network comprises transforming the feature maps extracted from the old network by the old feature transformation module, transforming the feature maps extracted from the new network by the new feature transformation module, and calculating the alignment loss of two transformation results, comprising:
the feature map extracted by the old network is the original feature map and is denoted F_old; the old feature transformation module is denoted P_old, and its transformation result is denoted P_old(F_old); the feature map extracted by the new network is denoted F_new, the new feature transformation module is denoted P_new, and its transformation result is denoted P_new(F_new); the alignment loss is the L1 distance between the two transformation results, expressed as:

$$\mathcal{L}_{align} = \left\| P_{old}(F_{old}) - P_{new}(F_{new}) \right\|_{1}$$
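A minimal NumPy sketch of the alignment loss of claim 3, the L1 distance between the outputs of the old and new feature transformation modules; treating the transformed feature maps as plain arrays and summing (rather than averaging) the absolute differences are assumptions:

```python
import numpy as np

def alignment_loss(old_transformed, new_transformed):
    # L1 distance between the transformation result of the old feature
    # transformation module and that of the new one (sum of absolute
    # element-wise differences).
    return float(np.abs(old_transformed - new_transformed).sum())
```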
4. The continuous learning method for an image semantic segmentation network according to claim 1, wherein the inter-class structure retention loss calculated by using the inter-class relation matrices of the old network and the new network is expressed as:

$$\mathcal{L}_{inter} = \left\| A^{old} - A^{new} \right\|_{F}$$

wherein A^old denotes the inter-class relation matrix constructed for the old classes by using the segmentation result of the old network and the feature vectors obtained by decoding; A^new denotes the inter-class relation matrix constructed for the old classes by using the segmentation result of the new network and the feature vectors obtained by decoding; and ‖·‖_F denotes the F-norm of a matrix;

a single element of the inter-class relation matrix is the cosine distance between the class prototypes of two old classes; for old class i and old class j, the corresponding class prototypes in the old network are denoted μ_i^old and μ_j^old, and the corresponding class prototypes in the new network are denoted μ_i^new and μ_j^new; the corresponding elements A_ij^old and A_ij^new of the inter-class relation matrices A^old and A^new are then expressed as:

$$A_{ij}^{old} = 1 - \frac{\langle \mu_i^{old}, \mu_j^{old} \rangle}{\lVert \mu_i^{old} \rVert \, \lVert \mu_j^{old} \rVert}, \qquad A_{ij}^{new} = 1 - \frac{\langle \mu_i^{new}, \mu_j^{new} \rangle}{\lVert \mu_i^{new} \rVert \, \lVert \mu_j^{new} \rVert}$$
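The inter-class relation matrix and its retention loss (claim 4) can be sketched as follows; representing the prototypes as one row per old class, the function names, and the use of NumPy's default Frobenius matrix norm are assumptions:

```python
import numpy as np

def cosine_distance(a, b):
    # cosine distance = 1 - cosine similarity of two class prototypes
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relation_matrix(prototypes):
    # prototypes: (num_old_classes, dim) array; element (i, j) is the
    # cosine distance between the prototypes of old classes i and j
    n = len(prototypes)
    return np.array([[cosine_distance(prototypes[i], prototypes[j])
                      for j in range(n)] for i in range(n)])

def interclass_retention_loss(protos_old, protos_new):
    # Frobenius norm of the difference between the relation matrices
    # built from the old network and from the new network
    return float(np.linalg.norm(relation_matrix(protos_old)
                                - relation_matrix(protos_new)))
```

Because only the *relations* between prototypes are matched, the new network may move the prototypes themselves while the loss still preserves the old classes' relative structure.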
5. The continuous learning method for an image semantic segmentation network according to claim 1, wherein the intra-class structure retention loss calculated by using the intra-class relation sets of the old network and the new network is expressed as:

$$\mathcal{L}_{intra} = \frac{1}{|C^{old}|} \sum_{i \in C^{old}} \frac{1}{|S_i^{old}|} \sum_{k} \left| D\!\left(f_{i,k}^{old}, \mu_i^{old}\right) - D\!\left(f_{i,k}^{new}, \mu_i^{new}\right) \right|$$

wherein the intra-class relation set constructed for old class i by using the segmentation result of the old network and the feature vectors obtained by decoding is denoted S_i^old = { D(f_{i,k}^old, μ_i^old) }; f_{i,k}^old and μ_i^old respectively denote a feature vector belonging to old class i obtained by decoding in the old network and the corresponding class prototype; the intra-class relation set constructed for old class i by using the segmentation result of the new network and the feature vectors obtained by decoding is denoted S_i^new = { D(f_{i,k}^new, μ_i^new) }; f_{i,k}^new and μ_i^new respectively denote a feature vector belonging to old class i obtained by decoding in the new network and the corresponding class prototype; a class prototype is the average of all feature vectors under the corresponding class; D denotes a distance metric function; C^old denotes the set of old classes; and |C^old| denotes the number of old classes.
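A sketch of the intra-class relation sets and retention loss of claim 5, assuming the distance metric D is the Euclidean distance and that the old and new networks yield the same number of decoded feature vectors per old class; both are illustrative assumptions, not details stated in the patent:

```python
import numpy as np

def intra_relations(features, prototype):
    # intra-class relation set: the distance from every feature vector of
    # one old class to that class's prototype (the mean feature vector)
    return np.linalg.norm(features - prototype, axis=1)

def intraclass_retention_loss(feats_old, feats_new):
    # feats_old / feats_new: dicts mapping old-class id -> (n_i, dim)
    # arrays of decoded feature vectors from the old and new network
    per_class = []
    for cls in feats_old:
        d_old = intra_relations(feats_old[cls], feats_old[cls].mean(axis=0))
        d_new = intra_relations(feats_new[cls], feats_new[cls].mean(axis=0))
        per_class.append(float(np.abs(d_old - d_new).mean()))
    return sum(per_class) / len(per_class)
```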
6. The continuous learning method for an image semantic segmentation network according to claim 1, wherein for the new classes, the initial structure optimization loss calculated by using the feature vectors obtained by decoding in the new network is expressed as:

$$\mathcal{L}_{init} = \mathcal{L}_{g\text{-}intra} + \lambda \, \mathcal{L}_{g\text{-}inter}$$

wherein L_g-intra denotes the guided intra-class structure loss, L_g-inter denotes the guided inter-class structure loss, and λ is the weight of L_g-inter;

the guided intra-class structure loss L_g-intra is used to pull together the distribution of feature vectors of the same new class and is expressed as:

$$\mathcal{L}_{g\text{-}intra} = \frac{1}{|C^{t}|} \sum_{c \in C^{t}} \frac{1}{N_c} \sum_{f \in c} \left\| f - \mu_c \right\|$$

wherein C^t denotes the set of new classes at the current learning stage t, |C^t| denotes the number of new classes at the current learning stage t, and the current learning stage t denotes the stage of training the new network and the new feature transformation module; μ_c denotes the class prototype corresponding to new class c, f denotes a feature vector belonging to new class c, and N_c denotes the number of such feature vectors;

the guided inter-class structure loss L_g-inter is used to push apart the distributions of feature vectors of different new classes and is expressed as:

$$\mathcal{L}_{g\text{-}inter} = \frac{1}{|C^{t}| \left( |C^{t}| - 1 \right)} \sum_{m \neq n} \max\!\left( 0, \cos\!\left( \mu_m, \mu_n \right) - \delta \right)$$

wherein μ_m and μ_n respectively denote the class prototypes corresponding to new class m and new class n, cos(μ_m, μ_n) denotes the cosine similarity of the class prototypes μ_m and μ_n, and δ is a predefined distance;

a class prototype is the average of all feature vectors under the corresponding class; for new class c, the class prototype μ_c is expressed as:

$$\mu_c = \frac{1}{N_c} \sum_{f \in c} f$$
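The guided intra-class and inter-class structure losses of claim 6 can be sketched as follows; the hinge form of the inter-class term, the averaging scheme, and the default values of λ and δ are assumptions for illustration:

```python
import numpy as np

def guided_intra_loss(feats_by_class):
    # pull the feature vectors of each new class toward its prototype
    per_class = []
    for feats in feats_by_class.values():
        proto = feats.mean(axis=0)  # class prototype = mean feature vector
        per_class.append(float(np.linalg.norm(feats - proto, axis=1).mean()))
    return sum(per_class) / len(per_class)

def guided_inter_loss(protos_by_class, delta=0.5):
    # push the prototypes of different new classes apart: penalise any
    # cosine similarity exceeding the predefined distance delta
    ids = list(protos_by_class)
    total, pairs = 0.0, 0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            a, b = protos_by_class[ids[i]], protos_by_class[ids[j]]
            cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            total += max(0.0, cos - delta)
            pairs += 1
    return total / max(pairs, 1)

def initial_structure_loss(feats_by_class, lam=1.0, delta=0.5):
    protos = {c: f.mean(axis=0) for c, f in feats_by_class.items()}
    return guided_intra_loss(feats_by_class) + lam * guided_inter_loss(protos, delta)
```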
7. The method as claimed in claim 1, wherein the optimizing the segmentation result of the old network by using a class-by-class dynamic threshold to obtain a corresponding pseudo label, and the calculating the classification loss of the new network by using the pseudo label includes:
optimizing the segmentation result of the old network by using the class-by-class dynamic thresholds, and fusing the obtained labels of the new classes to obtain the corresponding pseudo label, expressed as:

$$\tilde{y}_k = \begin{cases} y_k^{t}, & \text{pixel } k \text{ is annotated with a new class in } x^{t}, \\ \hat{y}_k^{old}, & p_k \geq \tau_{\hat{y}_k^{old}}, \ \hat{y}_k^{old} \in C^{old}, \\ \text{ignore}, & \text{otherwise}, \end{cases}$$

wherein x^t denotes an input image acquired at the current learning stage t, and the current learning stage t denotes the stage of training the new network and the new feature transformation module; p_k denotes the classification confidence of the old network for pixel k; τ_i denotes the dynamic threshold corresponding to old class i; C^old denotes the set of old classes; ŷ_k^old denotes the segmentation result output by the old network for the input image x^t, i.e., the classification result for pixel k; and ỹ_k is the generated pseudo label for pixel k;

calculating the classification loss of the new network by using the pseudo labels, expressed as:

$$\mathcal{L}_{cls} = -\frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \log q_k\!\left( \tilde{y}_k \right)$$

wherein K denotes the set of pixels with a valid pseudo label, and q_k(·) denotes the classification probability predicted by the new network for pixel k.
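The pseudo-label construction of claim 7 can be sketched in NumPy; the ignore index of 255 and the rule that new-class annotations take priority over the old network's predictions are assumptions consistent with the claim's description, not values stated in the patent:

```python
import numpy as np

IGNORE = 255  # ignore index for unlabeled pixels (an assumption)

def pseudo_labels(old_probs, new_class_gt, thresholds):
    # old_probs: (num_old_classes, H, W) softmax output of the old network
    # new_class_gt: (H, W) ground truth for the new classes (IGNORE elsewhere)
    # thresholds: (num_old_classes,) class-by-class dynamic thresholds
    conf = old_probs.max(axis=0)     # classification confidence per pixel
    pred = old_probs.argmax(axis=0)  # old network's segmentation result
    # keep an old-class prediction only where it clears its class threshold
    labels = np.where(conf >= thresholds[pred], pred, IGNORE)
    # pixels annotated with a new class override the old network's output
    return np.where(new_class_gt != IGNORE, new_class_gt, labels)
```

The resulting label map is then fed to a standard pixel-wise cross-entropy loss (with the ignore index excluded) to compute the new network's classification loss.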
8. An image semantic segmentation network continuous learning system, which is realized based on the method of any one of claims 1 to 7, and comprises:
the data collection and preliminary training unit is used for acquiring a newly added semantic segmentation data set and labels corresponding to newly added categories, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
the learning unit is used for initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module; during training, inputting image data of a newly-added semantic segmentation data set to the old network and the new network simultaneously, and performing feature map extraction, decoding and semantic segmentation on the old network and the new network respectively to obtain segmentation results; the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of two transformation results is calculated; respectively and independently constructing corresponding inter-class relation matrixes and intra-class relation sets for old classes by utilizing the segmentation results of the old network and the new network and the feature vectors obtained by decoding, calculating inter-class structure retention loss by utilizing the inter-class relation matrixes of the old network and the new network, and calculating intra-class structure retention loss by utilizing the intra-class relation sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used for keeping consistency of 
inter-class structures and intra-class structures in the old classes; meanwhile, for the new classes, calculating an initial structure optimization loss by using the feature vectors obtained by decoding in the new network, wherein the initial structure optimization loss is used to pull together the distributions of feature vectors of the same new class and to push apart the distributions of feature vectors of different new classes; optimizing the segmentation result of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating a classification loss of the new network by using the pseudo labels; and training the new network and the new feature transformation module in combination with the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
9. A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A readable storage medium, storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210237914.4A CN114332466B (en) | 2022-03-11 | 2022-03-11 | Continuous learning method, system, equipment and storage medium for image semantic segmentation network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114332466A CN114332466A (en) | 2022-04-12 |
CN114332466B true CN114332466B (en) | 2022-07-15 |
Family
ID=81034081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210237914.4A Active CN114332466B (en) | 2022-03-11 | 2022-03-11 | Continuous learning method, system, equipment and storage medium for image semantic segmentation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114332466B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114898098B (en) * | 2022-06-27 | 2024-04-19 | 北京航空航天大学 | Brain tissue image segmentation method |
CN116977635B (en) * | 2023-07-19 | 2024-04-16 | 中国科学院自动化研究所 | Category increment semantic segmentation learning method and semantic segmentation method |
CN117036790B (en) * | 2023-07-25 | 2024-03-22 | 中国科学院空天信息创新研究院 | Instance segmentation multi-classification method under small sample condition |
CN117875407B (en) * | 2024-03-11 | 2024-06-04 | 中国兵器装备集团自动化研究所有限公司 | Multi-mode continuous learning method, device, equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2954540B1 (en) * | 2009-12-23 | 2018-11-16 | Thales | Method for classifying objects in a surveillance system by imaging |
US9704257B1 (en) * | 2016-03-25 | 2017-07-11 | Mitsubishi Electric Research Laboratories, Inc. | System and method for semantic segmentation using Gaussian random field network |
US11847819B2 (en) * | 2019-12-19 | 2023-12-19 | Brainlab Ag | Medical image analysis using machine learning and an anatomical vector |
CN111047548B (en) * | 2020-03-12 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Attitude transformation data processing method and device, computer equipment and storage medium |
CN112559784B (en) * | 2020-11-02 | 2023-07-04 | 浙江智慧视频安防创新中心有限公司 | Image classification method and system based on incremental learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||