CN114332466B - Continuous learning method, system, equipment and storage medium for image semantic segmentation network - Google Patents


Info

Publication number
CN114332466B
Authority
CN
China
Prior art keywords
network
class
old
new
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210237914.4A
Other languages
Chinese (zh)
Other versions
CN114332466A (en)
Inventor
王子磊
林子涵
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210237914.4A
Publication of CN114332466A
Application granted
Publication of CN114332466B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a continuous learning method, system, device and storage medium for an image semantic segmentation network. On one hand, a representation of old knowledge is extracted through a nonlinear transformation in the feature space and aligned, which effectively preserves old knowledge while improving the ability to learn new knowledge. On the other hand, the topological structure of the new classes is optimized in the embedding space while the topological structure of the old classes is kept invariant, which reduces forgetting and prevents confusion between classes. In addition, by combining pseudo labels with a pseudo-label denoising technique, labels for old classes need not be provided during continuous learning of semantic segmentation, reducing labeling cost. Overall, as a general continuous learning method for semantic segmentation, the method places no restriction on application scenarios and has strong generalization ability and practical value.

Description

Continuous learning method, system, equipment and storage medium for image semantic segmentation network
Technical Field
The invention relates to the technical field of image semantic segmentation, in particular to a continuous learning method, a system, equipment and a storage medium for an image semantic segmentation network.
Background
In recent years, deep neural networks have enjoyed great success in the task of semantic segmentation. However, the traditional training method for semantic segmentation networks requires acquiring all training data at once, and the network is difficult to update after training is complete. In practical applications, the network is often required to gradually learn from data streams and update its learned knowledge, which effectively reduces data storage and training costs. But training a deep neural network directly on new data leads to severe forgetting of previously learned knowledge. Continuous learning techniques impose additional constraints on the learning process so that new knowledge is learned without forgetting what has already been learned.
A common approach to continuous learning is to use knowledge distillation to maintain consistency of knowledge between the old and new networks, typically in the output space or the feature space. In the field of semantic segmentation specifically, besides preventing the forgetting of old knowledge with the above measures, there are two new challenges. First, as learning progresses, classes that were previously ignored may need to be learned, so the semantic information of a given input is not constant, which requires the network to have a stronger ability to learn new knowledge. Second, because obtaining annotated data requires substantial manpower and material resources, it is desirable to annotate only the classes to be learned in the newly added data; as a result, regions labeled as background may actually contain already-learned classes, and the resulting semantic inconsistency poses a great challenge to network training. Therefore, continuous learning methods from the image classification field cannot handle the continuous learning task for semantic segmentation.
Specifically: Chinese patent application CN111191709A, "Continuous learning framework and continuous learning method of a deep neural network", uses a generative network to produce data of old classes and mixes it with new data to train the network, but it addresses only the image classification task. Moreover, the method depends heavily on the generation quality of the generator and can hardly cope with large-scale, complex data, especially for image semantic segmentation. Chinese patent application CN111368874A, "An incremental learning method for image classification based on single-classification technology", uses knowledge distillation in the output space and preference correction to realize continuous learning for image classification, but it still fails to address the challenges specific to continuous learning for semantic segmentation and therefore cannot be applied directly to image semantic segmentation. Chinese patent applications CN103366163A, "Face detection system and method based on incremental learning", CN106897705A, "Ocean observation big data distribution method based on incremental learning", and CN103593680A, "Dynamic gesture recognition method based on hidden Markov model self-incremental learning", are all methods dedicated to a specific field, and their generality and universality cannot be demonstrated.
Therefore, designing a general method for the continuous learning task of semantic segmentation that resolves the semantic inconsistency between learning stages while minimizing the forgetting of old knowledge has important practical value and significance.
Disclosure of Invention
The invention aims to provide a continuous learning method, system, device and storage medium for an image semantic segmentation network, which place no restriction on application scenarios, have strong generalization ability and practical value, and fill a gap in the continuous learning task for semantic segmentation.
The purpose of the invention is realized by the following technical scheme:
a continuous learning method of an image semantic segmentation network comprises the following steps:
acquiring a newly added semantic segmentation data set and labels corresponding to newly added categories, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
initializing the same image semantic segmentation network and feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, wherein the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module;
during training, inputting image data of the newly added semantic segmentation data set to the old network and the new network simultaneously, where the old network and the new network each perform feature map extraction, decoding and semantic segmentation to obtain segmentation results; transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss of the two transformation results; for the old classes, separately constructing corresponding inter-class relationship matrices and intra-class relationship sets from the segmentation results and decoded feature vectors of the old network and the new network, calculating an inter-class structure preservation loss from the two inter-class relationship matrices and an intra-class structure preservation loss from the two intra-class relationship sets, the two losses being used to keep the inter-class and intra-class structures of the old classes consistent; meanwhile, for the newly added classes, calculating an initial structure optimization loss from the feature vectors decoded by the new network, the loss being used to tighten the feature-vector distribution of each newly added class and to separate the distributions of different newly added classes; optimizing and denoising the segmentation results of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network with the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure preservation loss, the intra-class structure preservation loss, the initial structure optimization loss and the classification loss.
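The joint training objective described above can be sketched as a weighted sum of the five terms. The weights and the function name are illustrative assumptions; the patent does not state concrete values:

```python
def total_loss(cls_loss, align, inter, intra, init,
               lam=(1.0, 1.0, 1.0, 1.0)):
    """Joint objective sketch: classification loss on new/pseudo labels plus
    the four auxiliary terms. The weights lam are illustrative placeholders."""
    l_align, l_inter, l_intra, l_init = lam
    return (cls_loss + l_align * align + l_inter * inter
            + l_intra * intra + l_init * init)

print(total_loss(1.0, 1.0, 1.0, 1.0, 1.0))  # 5.0
```

In practice each term would be computed per batch and the weighted sum backpropagated through the new network and new feature transformation module only, since the old pair is frozen.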
An image semantic segmentation network continuous learning system, comprising:
the data collection and preliminary training unit is used for acquiring a newly-added semantic segmentation data set and labels corresponding to newly-added categories, extracting an original feature map of image data in the newly-added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
the learning unit is used for initializing an identical image semantic segmentation network and feature transformation module from the original image semantic segmentation network and the preliminarily trained feature transformation module, where the original image semantic segmentation network is called the old network, the preliminarily trained feature transformation module is called the old feature transformation module, the image semantic segmentation network generated by initialization is called the new network, and the feature transformation module generated by initialization is called the new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module; during training, inputting image data of the newly added semantic segmentation data set to the old network and the new network simultaneously, where each network performs feature map extraction, decoding and semantic segmentation to obtain segmentation results; transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss of the two transformation results; for the old classes, separately constructing corresponding inter-class relationship matrices and intra-class relationship sets from the segmentation results and decoded feature vectors of the old network and the new network, calculating an inter-class structure preservation loss from the two inter-class relationship matrices and an intra-class structure preservation loss from the two intra-class relationship sets, the two losses being used to keep the inter-class and intra-class structures of the old classes consistent; meanwhile, for the newly added classes, calculating an initial structure optimization loss from the feature vectors decoded by the new network, the loss being used to tighten the feature-vector distribution of each newly added class and to separate the distributions of different newly added classes; optimizing and denoising the segmentation results of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network with the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure preservation loss, the intra-class structure preservation loss, the initial structure optimization loss and the classification loss.
A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned method.
A readable storage medium, storing a computer program which, when executed by a processor, implements the aforementioned method.
According to the technical scheme provided by the invention, on the one hand, a representation of old knowledge is extracted through a nonlinear transformation in the feature space and aligned, which effectively preserves old knowledge while improving the ability to learn new knowledge. On the other hand, the topological structure of the new classes is optimized in the embedding space and the topological structure of the old classes is kept invariant, which reduces forgetting and prevents confusion between classes. In addition, by combining pseudo labels with a pseudo-label denoising technique, labels of old classes need not be provided during continuous learning of semantic segmentation, reducing labeling cost. Overall, as a general continuous learning method for semantic segmentation, the method places no restriction on application scenarios and has strong generalization ability and practical value.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic model diagram of a continuous learning method of an image semantic segmentation network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a part of the principle of initial structure optimization provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a portion of the inter-class and intra-class structure maintenance provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a comparison of segmentation results of different image semantic segmentation networks according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a continuous learning system of an image semantic segmentation network according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The terms that may be used herein are first described as follows:
the terms "comprising," "including," "containing," "having," or other similar terms of meaning should be construed as non-exclusive inclusions. For example: including a feature (e.g., material, component, ingredient, carrier, formulation, material, dimension, part, component, mechanism, device, step, process, method, reaction condition, processing condition, parameter, algorithm, signal, data, product, or article, etc.) that is not specifically recited, should be interpreted to include not only the specifically recited feature but also other features not specifically recited and known in the art.
The following describes a method, a system, a device and a storage medium for continuous learning of an image semantic segmentation network provided by the present invention in detail. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to a person skilled in the art. The examples of the present invention, in which specific conditions are not specified, were carried out according to the conventional conditions in the art or conditions suggested by the manufacturer.
Example one
The embodiment of the invention provides a continuous learning method for an image semantic segmentation network, namely a continuous learning method for semantic segmentation based on class-structure preservation and feature alignment. A mainstream image semantic segmentation network consists of a feature extractor, a decoder and a classifier, and its main processing flow is as follows: the feature extractor extracts a feature map of the input image to be segmented, the decoder produces the corresponding feature vectors, and the classifier performs semantic segmentation to obtain the classification result of each pixel (i.e. the segmentation result). The invention designs corresponding modules for the image semantic segmentation network to prevent old knowledge from being forgotten. Specifically, the core of the method comprises a feature transformation module, a class structure information preservation module, a pseudo-label generation module, and a joint loss function for training the semantic segmentation network. The feature transformation module applies a nonlinear transformation to the feature map output by the feature extractor to extract a representation of old knowledge for alignment, which effectively maintains the integrity of old knowledge while leaving a high degree of freedom for learning new knowledge. The class structure information preservation module uses the decoder output to establish intra-class and inter-class topological structures; by maintaining structural consistency during learning, it effectively reduces the damage to class topology during continuous learning, thereby reducing forgetting and inter-class confusion.
Further, to address the problem of inconsistent semantics, the pseudo-label generation module uses class-by-class dynamic thresholds to optimize and denoise the segmentation results output by the old network, generating high-quality pseudo labels to compensate for the missing labels of old classes. Finally, the semantic segmentation network is trained by combining the loss functions of these modules to achieve continuous learning.
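As a minimal sketch of the class-by-class dynamic-threshold idea described above (the concrete thresholding rule, a per-class confidence quantile, and the function name are assumptions, since the text here does not give an exact formula):

```python
import numpy as np

def pseudo_labels_dynamic_threshold(old_probs, quantile=0.6, ignore_index=255):
    """Denoise the old network's predictions into pseudo labels.

    old_probs: (C, H, W) softmax output of the old network.
    For each predicted class, a pixel keeps its label only if its confidence
    reaches a class-wise threshold (here: a per-class confidence quantile,
    an assumed concrete rule); all other pixels are marked ignore_index.
    """
    conf = old_probs.max(axis=0)      # per-pixel confidence
    pred = old_probs.argmax(axis=0)   # per-pixel predicted class
    labels = np.full(pred.shape, ignore_index, dtype=np.int64)
    for c in np.unique(pred):
        mask = pred == c
        thresh = np.quantile(conf[mask], quantile)  # dynamic, class-by-class
        labels[mask & (conf >= thresh)] = c
    return labels

probs = np.array([[[0.1, 0.45, 0.2, 0.49]],
                  [[0.9, 0.55, 0.8, 0.51]]])           # C=2, H=1, W=4
print(pseudo_labels_dynamic_threshold(probs).tolist())  # [[1, 255, 1, 255]]
```

The low-confidence pixels fall back to the ignore index, so the classification loss on old regions is computed only where the old network is reliable.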
The process of each continuous learning can be described as:
acquiring a newly added semantic segmentation data set and a label corresponding to a newly added category, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and primarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, wherein the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module;
during training, inputting image data of the newly added semantic segmentation data set to the old network and the new network simultaneously, where the old network and the new network each perform feature map extraction, decoding and semantic segmentation to obtain segmentation results; transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss of the two transformation results; for the old classes, separately constructing corresponding inter-class relationship matrices and intra-class relationship sets from the segmentation results and decoded feature vectors of the old network and the new network, calculating an inter-class structure preservation loss from the two inter-class relationship matrices and an intra-class structure preservation loss from the two intra-class relationship sets, the two losses being used to keep the inter-class and intra-class structures of the old classes consistent; meanwhile, for the newly added classes, calculating an initial structure optimization loss from the feature vectors decoded by the new network, the loss being used to tighten the feature-vector distribution of each newly added class and to separate the distributions of different newly added classes; optimizing and denoising the segmentation results of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network with the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure preservation loss, the intra-class structure preservation loss, the initial structure optimization loss and the classification loss.
For ease of understanding, the learning process described above is further described below.
As shown in fig. 1, a model schematic diagram of an image semantic segmentation network continuous learning method provided by the present invention shows a related flow and a loss function involved in a continuous learning process, which is mainly described as follows:
First: the feature transformation module and its related loss function.
In the embodiment of the invention, before the newly added semantic segmentation data set is used for learning the image semantic segmentation network, the characteristic transformation module part needs to be preliminarily trained.
As mentioned above, the image semantic segmentation network comprises a feature extractor, which is used to extract the original feature map of each image in the newly added semantic segmentation data set, denoted $F$. A feature transformation module is then used to apply a nonlinear transformation that generates a representation of old knowledge, and the feature transformation module is trained so that the representation it outputs contains rich and effective information.
In the embodiment of the invention, the feature transformation module, denoted $P^{*}$, is initially trained with an autoencoder structure. Transforming the original feature map $F$ with $P^{*}$ comprises: first reducing the channel dimension with a convolution (for example, a 1x1 convolution), and then mixing local spatial information with several dilated convolutions (for example, two 3x3 dilated convolutions), producing the representation $P^{*}(F)$ of the original feature map. During initial training, a reconstruction network $R^{*}$ (for example, two 3x3 convolutions) reconstructs the original feature map from the transformation result $P^{*}(F)$; the difference between the original feature map and the reconstructed feature map is used to construct a reconstruction loss, and the feature transformation module $P^{*}$ is preliminarily trained with this loss so that its output representation contains abundant information.
It will be understood by those skilled in the art that both operations are conventional convolutions; compared with a standard convolution, a dilated (atrous) convolution can mix spatial information over a larger range with fewer layers.
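This larger range can be quantified: for stride-1 convolutions, each layer adds $(k-1)\cdot d$ to the receptive field, where $k$ is the kernel size and $d$ the dilation. A small sketch (the helper name is ours) comparing the example configuration from the text with its undilated counterpart:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 convolutions: each layer
    adds (k - 1) * d pixels to a field that starts at 1."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# 1x1 channel reduction followed by two 3x3 convolutions with dilation 2
# (the example configuration in the text) vs. the same stack undilated:
print(receptive_field([1, 3, 3], [1, 2, 2]))  # 9
print(receptive_field([1, 3, 3], [1, 1, 1]))  # 5
```

With dilation 2 the two 3x3 layers cover a 9-pixel span instead of 5, with no extra parameters.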
Specifically, the reconstruction loss may be the distance between the reconstructed feature map $\hat{F} = R^{*}(P^{*}(F))$ and the original feature map $F$, expressed as:
$\mathcal{L}_{rec} = \| \hat{F} - F \|_2^2$
When the feature transformation module $P^{*}$ can effectively generate the representation of old knowledge, the initial training is complete, and the feature transformation module at this point is denoted $P^{t-1}$.
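The preliminary training objective can be sketched as follows, assuming the squared-L2 form of the reconstruction loss with a mean reduction; the toy transform/reconstruct pair merely stands in for $P^{*}$ and $R^{*}$:

```python
import numpy as np

def reconstruction_loss(feat, transform, reconstruct):
    """Squared-L2 reconstruction loss (mean over elements) between the
    original feature map and its reconstruction from the transformed
    representation."""
    recon = reconstruct(transform(feat))
    return np.mean((recon - feat) ** 2)

# Toy stand-ins for P* and R*: an exact inverse pair gives zero loss.
feat = np.ones((8, 4, 4))  # C x H x W feature map
print(reconstruction_loss(feat, lambda f: f * 0.5, lambda z: z * 2.0))  # 0.0
```

Minimizing this loss drives $P^{*}$ to retain enough information that $R^{*}$ can recover the original feature map, which is what makes the transformed representation a usable summary of old knowledge.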
The original image segmentation network and the feature transformation module $P^{t-1}$ can then be used to initialize a new network for learning new knowledge and its corresponding feature transformation module $P^{t}$. While learning new knowledge, the original image segmentation network (old network) and its feature transformation module $P^{t-1}$ are kept unchanged, and only the image segmentation network generated by initialization (new network) and its feature transformation module $P^{t}$ are updated. The learning stage is a key concept of incremental learning: the initial stage is numbered 1, and each time a set of classes is added, a new continuous learning stage begins. In fig. 1, the subscripts $t-1$ and $t$ denote different learning stages; relatively speaking, the network of stage $t-1$ is the old network and the network of stage $t$ is the new network. $E$ denotes the encoder (feature extractor), $D$ the decoder, and $G$ the classifier.
In the embodiment of the invention, a consistency constraint is applied to the outputs of the two networks' feature transformation modules, so that old knowledge is kept unchanged during continuous learning while the feature map itself is given a higher degree of freedom to change, allowing new knowledge to be learned well.
In the embodiment of the invention, since the original image segmentation network is not updated after the previous stage, the feature map it extracts is still the original feature map, denoted $F^{t-1}$, and the transformation result of the old feature transformation module $P^{t-1}$ is denoted $P^{t-1}(F^{t-1})$. The feature map extracted by the new network is denoted $F^{t}$, and the transformation result of the new feature transformation module $P^{t}$ is denoted $P^{t}(F^{t})$. The alignment loss is the L1 distance between the two transformation results, expressed as:
$\mathcal{L}_{align} = \| P^{t}(F^{t}) - P^{t-1}(F^{t-1}) \|_1$
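A minimal sketch of this alignment loss, assuming a mean reduction over all elements (the reduction is not specified in the text):

```python
import numpy as np

def alignment_loss(z_new, z_old):
    """L1 distance between the two transformed feature maps,
    mean-reduced over elements (the reduction is an assumption)."""
    return np.mean(np.abs(z_new - z_old))

# Transformed outputs of the new and (frozen) old feature transformation
# modules, shape C x H x W:
print(alignment_loss(np.ones((16, 4, 4)), np.zeros((16, 4, 4))))  # 1.0
```

During training the gradient of this term flows only into the new network and $P^{t}$, since the old pair is frozen.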
Second: the class structure information preservation module and its related loss functions.
In the embodiment of the invention, the class structure information preservation module constructs intra-class and inter-class structural relationships in the embedding space based on the decoder output of the image segmentation network. By preserving these two relationships during continuous learning, the network's ability to discriminate the old classes is effectively maintained. The module mainly comprises three parts: initial structure optimization, inter-class structure preservation, and intra-class structure preservation. The initial structure optimization part mainly targets the newly added classes and computes the initial structure optimization loss, which is a contrastive loss; the latter two parts mainly target the old classes and compute the structure preservation loss, comprising an inter-class structure preservation loss and an intra-class structure preservation loss. Old and new classes are relative concepts: the old classes are those the image segmentation network could already recognize before the current continuous learning stage. The principles and related loss functions of the three parts are as follows:
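One plausible concrete form of the inter-class relationship matrix and its preservation loss is sketched below; using class prototypes and cosine similarity is an assumption, since the text defines the relationship matrices only abstractly:

```python
import numpy as np

def class_prototypes(feats, labels, classes):
    """Mean decoded feature vector per class. feats: (N, D); labels: (N,)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in classes])

def relation_matrix(protos):
    """Inter-class relationship matrix: pairwise cosine similarities."""
    normed = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return normed @ normed.T

def inter_class_structure_loss(r_new, r_old):
    """Penalise drift of the old classes' relation matrix (L1, assumed)."""
    return np.mean(np.abs(r_new - r_old))

feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0, 1])
r = relation_matrix(class_prototypes(feats, labels, [0, 1]))
print(inter_class_structure_loss(r, r))  # 0.0
```

The matrix would be built once from the old network's (frozen) features and once from the new network's, and the loss keeps the two matrices close so relative positions of old classes survive continued training.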
1. Initial structure optimization part.
Fig. 2 is a schematic diagram of the initial structure optimization part. When only cross-entropy training is used, the distributions of different classes in the embedding space (such as class A and class B on the left of the figure) are often dispersed and prone to partial overlap, which easily causes class confusion in subsequent learning and thus produces forgetting. By guiding each feature vector (triangles in the figure) as close as possible to its corresponding class prototype (crosses in the figure), while keeping the distance between prototypes of different classes no less than a given threshold (gray circle on the right of the figure), the class distributions are optimized and confusion is reduced.

The initial structure optimization module guides the distribution of the new classes in the embedding space while they are being learned, so that feature vectors of the same class are as compact as possible and feature vectors of different classes are as dispersed as possible. This makes the model's classification boundaries clearer, reduces confusion, and makes the model more robust to forgetting. Since the initial structure optimization module only targets the newly added categories, the initial structure optimization loss is calculated using only the output of the new network.
In order to achieve the above object, the loss function of the initial structure optimization part (the initial structure optimization loss) combines two terms that respectively guide the learning of the intra-class structure and the inter-class structure, expressed as:

$$\mathcal{L}_{init} = \mathcal{L}_{intra} + \lambda\,\mathcal{L}_{inter}$$

wherein $\mathcal{L}_{intra}$ denotes the loss guiding the intra-class structure, $\mathcal{L}_{inter}$ denotes the loss guiding the inter-class structure, and $\lambda$ is the weight of $\mathcal{L}_{inter}$ (the specific value can be set according to actual conditions or experience).
The intra-class structure guidance loss $\mathcal{L}_{intra}$ pulls together the distribution of feature vectors of the same newly added category, and is expressed as:

$$\mathcal{L}_{intra} = \frac{1}{|C^t|} \sum_{c \in C^t} \operatorname{mean}_{f_c}\left\| f_c - p_c \right\|_2$$

wherein $C^t$ denotes the set of categories newly added at the current learning stage $t$, and $|C^t|$ denotes the number of newly added categories, the current learning stage $t$ being the stage of training the new network and the new feature transformation module; $p_c$ denotes the class prototype corresponding to new category $c$, and $f_c$ denotes a feature vector belonging to new category $c$.
The inter-class structure guidance loss $\mathcal{L}_{inter}$ pushes apart the distributions of different new categories, and is expressed as:

$$\mathcal{L}_{inter} = \frac{1}{|C^t|\left(|C^t|-1\right)} \sum_{\substack{m, n \in C^t \\ m \neq n}} \max\!\left(0,\ \cos(p_m, p_n) - \delta\right)$$

wherein $p_m$ and $p_n$ respectively denote the class prototypes corresponding to newly added categories $m$ and $n$, $\cos(p_m, p_n)$ denotes the cosine similarity between $p_m$ and $p_n$, and $\delta$ is a predefined margin (the specific value can be set according to actual conditions or personal experience).
In the embodiment of the invention, a class prototype is the average of all feature vectors belonging to the corresponding category. For a newly added category $c$, the class prototype $p_c$ is expressed as:

$$p_c = \frac{\sum_k \mathbb{1}[y_k = c]\, f_k}{|y = c|}$$

wherein $y$ denotes the labels of the newly added categories at the current stage, $|y = c|$ denotes the number of pixels labeled as new category $c$, $f_k$ denotes the feature vector of pixel $k$, and $\mathbb{1}[y_k = c]$ is the indicator function, which outputs 1 when $y_k = c$ and 0 otherwise.

As will be understood by those skilled in the art, Class Prototypes are a term of art in computer vision: a set of features belonging to a certain class is averaged, and the average is used to characterize the whole class. The class prototypes mentioned later are calculated in a similar manner.
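The prototype computation and the two guidance terms above can be sketched in NumPy over flattened pixel embeddings; the shapes, the margin value, and the assumption that every class appears in the batch are illustrative, not the patent's exact implementation:

```python
import numpy as np

def class_prototypes(features, labels, classes):
    """Mean embedding per class (assumes every listed class appears in labels)."""
    return {c: features[labels == c].mean(axis=0) for c in classes}

def intra_class_loss(features, labels, prototypes):
    """Pull every feature of a new class towards its class prototype (L2)."""
    losses = [np.linalg.norm(features[labels == c] - p, axis=1).mean()
              for c, p in prototypes.items()]
    return float(np.mean(losses))

def inter_class_loss(prototypes, delta=0.5):
    """Penalise prototype pairs whose cosine similarity exceeds the margin delta."""
    protos = list(prototypes.values())
    penalties = []
    for i in range(len(protos)):
        for j in range(i + 1, len(protos)):
            cos = protos[i] @ protos[j] / (
                np.linalg.norm(protos[i]) * np.linalg.norm(protos[j]))
            penalties.append(max(0.0, cos - delta))
    return float(np.mean(penalties))

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 16))      # 100 pixel embeddings, 16-dim
labs = rng.integers(0, 3, size=100)         # three toy "new" classes
protos = class_prototypes(feats, labs, [0, 1, 2])
init_loss = intra_class_loss(feats, labs, protos) + 0.1 * inter_class_loss(protos)
```

In practice the prototypes would be recomputed (or updated with a running mean) every batch, and only pixels of the newly added categories contribute, mirroring the statement that this loss uses only the new network's output.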
2. Inter-class structure holding part.
A well-trained deep neural network maps input samples into an embedding space and distributes them into different regions of that space according to their categories; this property is what allows the network to classify each class correctly. Based on this property, the inter-class topological structure in the embedding space is constructed, and this structure is maintained during continuous learning to keep the classes linearly separable.
In the embodiment of the invention, the inter-class relation matrix constructed for the old classes from the segmentation result of the old network and the feature vectors obtained by its decoder is denoted $A^{t-1}$, and the inter-class relation matrix constructed for the old classes from the segmentation result of the new network and the feature vectors obtained by its decoder is denoted $A^t$. A single element of an inter-class relation matrix is the cosine similarity between the class prototypes of two old classes. For old classes $i$ and $j$, the corresponding class prototypes in the old network are denoted $p_i^{t-1}$ and $p_j^{t-1}$, and the corresponding class prototypes in the new network are denoted $p_i^t$ and $p_j^t$; the corresponding elements $A^t_{ij}$ and $A^{t-1}_{ij}$ of the inter-class relation matrices $A^t$ and $A^{t-1}$ are then expressed as:

$$A^t_{ij} = \cos(p_i^t, p_j^t), \qquad A^{t-1}_{ij} = \cos(p_i^{t-1}, p_j^{t-1})$$

wherein $\cos(p_i^t, p_j^t)$ and $\cos(p_i^{t-1}, p_j^{t-1})$ respectively denote the cosine similarity between class prototypes $p_i^t$ and $p_j^t$, and between class prototypes $p_i^{t-1}$ and $p_j^{t-1}$.
During the continuous learning process, the inter-class structure holding loss keeps the two matrices consistent, expressed as:

$$\mathcal{L}_{sp}^{inter} = \left\| A^t - A^{t-1} \right\|_F$$

wherein $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
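A minimal NumPy sketch of the relation matrices and the Frobenius-norm loss (prototype shapes and the drift model are illustrative assumptions):

```python
import numpy as np

def relation_matrix(prototypes: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between class prototypes.

    prototypes: (K, D) matrix, one row per old class.
    """
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return normed @ normed.T

def inter_class_structure_loss(protos_old, protos_new) -> float:
    """Frobenius norm of the difference between the two relation matrices."""
    return float(np.linalg.norm(relation_matrix(protos_new)
                                - relation_matrix(protos_old), ord="fro"))

rng = np.random.default_rng(0)
p_old = rng.standard_normal((5, 16))                  # prototypes, frozen old network
p_new = p_old + 0.05 * rng.standard_normal((5, 16))   # slightly drifted new network

# a global rotation of the embedding space leaves all cosine similarities intact
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
p_rot = p_old @ Q
```

Because only pairwise similarities are compared, a global rotation of all prototypes leaves this loss at (numerically) zero, which is exactly the freedom the inter-class holding part is described as allowing.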
3. Intra-class structure holding part.
The intra-class relationship is defined as the set of relative relations between each feature vector and its class prototype. The intra-class relation set constructed for the old classes from the segmentation result of the old network and the feature vectors obtained by its decoder is denoted $S^{t-1} = \{ D(f_i^{t-1}, p_i^{t-1}) \}$, and the one constructed for the old classes from the segmentation result of the new network and the feature vectors obtained by its decoder is denoted $S^t = \{ D(f_i^t, p_i^t) \}$, where $D$ denotes some distance metric function, e.g., the Euclidean distance. The intra-class relation sets reflect fine-grained topological structure information in the embedding space. Keeping the topological structure of the intra-class feature vectors in the embedding space unchanged during continuous learning effectively maintains the integrity of single-class knowledge. The distance function chosen when modeling the intra-class structure is the Euclidean distance, whose sensitivity is exploited to reflect small changes of the intra-class structure.
In the process of continuous learning, the intra-class structure holding loss maintains for the old classes the correspondence between the intra-class relation sets $S^t$ and $S^{t-1}$, expressed as:

$$\mathcal{L}_{sp}^{intra} = \frac{1}{|C^{1:t-1}|} \sum_{i \in C^{1:t-1}} \left| D(f_i^t, p_i^t) - D(f_i^{t-1}, p_i^{t-1}) \right|$$

wherein $f_i^{t-1}$ and $p_i^{t-1}$ respectively denote the feature vector belonging to old class $i$ obtained by the old network and the corresponding class prototype; $f_i^t$ and $p_i^t$ denote the feature vector belonging to old class $i$ obtained by the new network and the corresponding class prototype; $C^{1:t-1}$ denotes the set of old classes (i.e., all old classes already learned), and $|C^{1:t-1}|$ denotes the number of old classes.
The class prototypes involved in the inter-class and intra-class structure holding parts are calculated from the segmentation results and feature vectors output by the corresponding networks, following the formula for $p_c$ given above; the main difference is that, since these parts target the old classes whose labels are unavailable, the segmentation result takes the place of the label in that formula. Taking the class prototype $p_i^t$ in the new network as an example, the calculation formula is:

$$p_i^t = \frac{\sum_k \mathbb{1}[\hat{y}^t_k = i]\, f_k}{|\hat{y}^t = i|}$$

wherein $\hat{y}^t$ denotes the segmentation result output by the new network, and $|\hat{y}^t = i|$ denotes the number of pixels predicted as old class $i$ in that segmentation result.
The same is true for the old network, and the corresponding class prototypes are calculated by substituting the segmentation results into the above formula.
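The intra-class holding term can be sketched in NumPy as follows; the per-class feature layout and the toy translation demo are illustrative assumptions:

```python
import numpy as np

def intra_class_structure_loss(feats_old, protos_old, feats_new, protos_new, old_classes):
    """Keep each old-class feature at the same Euclidean distance from its prototype.

    feats_*[c]:  (N_c, D) features of old class c from the old / new network.
    protos_*[c]: (D,) prototype of class c in the corresponding network.
    """
    diffs = []
    for c in old_classes:
        d_old = np.linalg.norm(feats_old[c] - protos_old[c], axis=1)
        d_new = np.linalg.norm(feats_new[c] - protos_new[c], axis=1)
        diffs.append(np.abs(d_new - d_old).mean())
    return float(np.mean(diffs))

rng = np.random.default_rng(0)
old_classes = [0, 1]
f_old = {c: rng.standard_normal((20, 8)) for c in old_classes}
p_old = {c: f_old[c].mean(axis=0) for c in old_classes}

# a pure translation of the embedding space keeps every relative distance intact,
# so the loss stays (numerically) zero, matching the behavior sketched in Fig. 3
shift = rng.standard_normal(8)
f_new = {c: f_old[c] + shift for c in old_classes}
p_new = {c: p_old[c] + shift for c in old_classes}
```

Scaling or individually perturbing the features, by contrast, changes the feature-to-prototype distances and drives the loss above zero, which is the fine-grained change this term is meant to suppress.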
Fig. 3 is a schematic diagram of the inter-class and intra-class structure holding parts. When a new category must be learned, inter-class topology holding only requires the adjacency and affinity relations among categories to remain unchanged during the update, while still allowing the whole configuration to move in the embedding space (e.g., by rotation or translation); this effectively maintains old-category knowledge while facilitating the learning of new categories. When the intra-class structure is unconstrained, updating the network often causes large changes in the relative relation between the same input and its class prototype (lower left of Fig. 3). The intra-class structure holding loss reduces such changes, thereby maintaining the integrity of old knowledge at a finer granularity.
And thirdly, a pseudo label generating module and a related loss function.
In the embodiment of the invention, the segmentation result of the old network is denoised and refined using class-by-class dynamic thresholds, generating high-quality pseudo labels (Pseudo Labels) to compensate for the missing old-class labels; this process is called Pseudo Label Refinement. The principle is as follows:
In the continuous learning process, labels of the old classes are not given at the current learning stage; i.e., already-learned classes are marked as background in the given labels. Training the network directly with such labels as the supervisory signal aggravates forgetting of the learned classes. To this end, the semantic segmentation result of the old network is used to label the background region of the given label, thereby providing pseudo labels for the learned classes. Furthermore, erroneous results are hard to avoid in the segmentation output of the old network. For this problem, the entropy of the output class probabilities is used as a confidence measure, and only results with sufficiently high confidence are used as pseudo labels. Because the network learns different categories to different degrees, the invention computes the distribution of the output entropy separately for each category and selects from it a threshold $\tau_i$ such that a fixed proportion of the pseudo labels of the corresponding old category $i$ is retained. The final supervision label is generated after fusing the real labels of the newly added categories (given in the current stage), and the generation of the supervision label (pseudo label) is expressed as:

$$\tilde{y}_k = \begin{cases} y^t_k, & \text{pixel } k \text{ has a newly added category label} \\ \hat{y}^{t-1}_k, & \hat{y}^{t-1}_k = i \in C^{1:t-1} \ \text{and}\ e_k \le \tau_i \\ \text{ignored}, & \text{otherwise} \end{cases}$$

wherein $y^t_k$ denotes the real label of the newly added category corresponding to pixel $k$ in the input image $x^t$ at the current learning stage $t$, $e_k$ denotes the classification confidence (output entropy) of the old network at pixel $k$, $\tau_i$ denotes the dynamic threshold corresponding to old class $i$, $C^{1:t-1}$ denotes the set of old classes, $\hat{y}^{t-1} = M^{t-1}(x^t)$ denotes the segmentation result, i.e., the per-pixel classification result, output by the old network $M^{t-1}$ for the input image $x^t$, and $\tilde{y}_k$ is the finally generated pseudo label for pixel $k$.
Then, the classification loss of the new network is calculated with the finally generated pseudo labels; specifically, the cross entropy loss (Cross Entropy Loss), expressed as:

$$\mathcal{L}_{ce} = CE\big(\hat{y}^t, \tilde{y}\big)$$

wherein $\hat{y}^t = M^t(x^t)$ denotes the segmentation result output by the new network $M^t$ for the input image $x^t$.
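The pseudo-label refinement step can be sketched in NumPy as follows. The background id (0), the ignore id (255), the keep ratio, and the use of a per-class entropy quantile as the dynamic threshold are illustrative assumptions consistent with, but not identical to, the patent's description:

```python
import numpy as np

def refine_pseudo_labels(new_labels, old_pred, old_probs, old_classes,
                         keep_ratio=0.8, ignore_index=255):
    """Fuse new ground-truth labels with denoised old-network predictions.

    new_labels: (N,) current-stage labels (old classes appear as 0 = background).
    old_pred:   (N,) argmax prediction of the frozen old network.
    old_probs:  (N, K) class probabilities of the old network.
    A background pixel predicted as old class c is kept only if its prediction
    entropy is below a class-wise threshold chosen so that `keep_ratio` of that
    class's pixels survive; the rest are marked `ignore_index`.
    """
    entropy = -(old_probs * np.log(old_probs + 1e-12)).sum(axis=1)
    pseudo = np.full_like(new_labels, ignore_index)
    pseudo[new_labels > 0] = new_labels[new_labels > 0]   # trust new ground truth
    for c in old_classes:
        mask = (new_labels == 0) & (old_pred == c)
        if mask.any():
            tau_c = np.quantile(entropy[mask], keep_ratio)  # class-wise threshold
            pseudo[mask & (entropy <= tau_c)] = c
    return pseudo

# toy setup: class 3 is the newly added class; classes 1 and 2 are old
rng = np.random.default_rng(0)
N, K = 1000, 4
new_labels = np.where(rng.random(N) < 0.3, 3, 0)
logits = rng.standard_normal((N, K))
old_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
old_pred = old_probs.argmax(axis=1)
pseudo = refine_pseudo_labels(new_labels, old_pred, old_probs, old_classes=[1, 2])
```

The resulting `pseudo` array would then be used as the target of the cross-entropy loss, with `ignore_index` pixels excluded from the loss.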
And fourthly, training a semantic segmentation network by combining the loss functions.
In the embodiment of the invention, the new network and the new feature transformation module are trained by combining the alignment loss, the inter-class structure holding loss, the intra-class structure holding loss, the initial structure optimization loss, and the classification loss from steps one to three, finally achieving continuous learning on the semantic segmentation task. The target loss function of training is a weighted sum of the above losses:

$$\mathcal{L} = \mathcal{L}_{ce} + \lambda_1 \mathcal{L}_{align} + \lambda_2 \mathcal{L}_{sp}^{inter} + \lambda_3 \mathcal{L}_{sp}^{intra} + \lambda_4 \mathcal{L}_{init}$$

wherein $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ respectively denote the weights of the corresponding losses.
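As a trivial illustration of the weighted combination (all loss values and weights below are hypothetical placeholders, not values from the patent):

```python
# hypothetical per-batch loss values and hyper-parameter weights
losses = {"ce": 0.9, "align": 0.4, "inter_sp": 0.2, "intra_sp": 0.1, "init": 0.3}
weights = {"ce": 1.0, "align": 0.5, "inter_sp": 1.0, "intra_sp": 1.0, "init": 0.1}

# total objective: classification loss plus weighted auxiliary losses
total = sum(weights[k] * losses[k] for k in losses)
```

A single backward pass through `total` then updates the new network and the new feature transformation module jointly.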
The embodiment of the invention provides a continuous learning method of an image semantic segmentation network, which mainly has the following beneficial effects:
1) old knowledge representation is extracted through nonlinear transformation in a feature space to carry out alignment, so that the invariance of old knowledge is effectively maintained, and the learning capacity of new knowledge is improved.
2) The topological structure of the new class is optimized in the embedding space, the invariance of the topological structure of the old class is maintained, and the effects of reducing forgetting and preventing confusion among classes are achieved.
3) By combining the pseudo label and the pseudo label denoising technology, the labels of old categories are not required to be provided in the continuous learning of semantic segmentation, and the labeling cost is reduced.
Generally, the method is used as a universal semantic segmentation continuous learning method, has no limitation on application scenes, and has strong generalization capability and practical value.
Based on the above description, the following provides a complete implementation process, which includes the initial stage learning of the image semantic segmentation network, the continuous learning of the image semantic segmentation network, and the testing of the image semantic segmentation network.
Firstly, learning an image semantic segmentation network in an initial stage.
1. Prepare an initial semantic segmentation data set and corresponding class labels to form the training data; change the spatial resolution of the images by random cropping so that both width and height are 512, and apply normalization.
2. An image semantic segmentation model based on class structure preservation and feature alignment is built with a deep learning framework, comprising a fully convolutional semantic segmentation network, a feature transformation module, a category structure information holding module, a pseudo label generation module, etc. The fully convolutional semantic segmentation network is DeepLabV3, and the feature extractor can be ResNet, MobileNet, etc.; ResNet-101 is used here as the feature extractor. The decoder part is an ASPP module. A feature transformation module is attached to the output of the feature extractor to perform the nonlinear transformation and alignment operations on the features; the category structure information holding module is attached to the decoder part of the semantic segmentation network; and the pseudo label generation module is attached to the output of the semantic segmentation network.
3. In the initial-stage learning process, a group of data is randomly selected from the training data each time and input to the network, the model gives a semantic segmentation result, and the network is trained with the cross entropy loss and the initial structure optimization loss.
The training processes involved in this part are all conventional techniques, and are not described again; in addition, the specific image size and the network structure and type related to the above flow are examples and are not limited.
And secondly, continuously learning the image semantic segmentation network.
1. After the initial-stage training is completed, a newly added semantic segmentation data set and labels corresponding to the new categories are prepared. The spatial resolution of the images is changed by random cropping so that both width and height are 512, and normalization is applied.
Likewise, the specific image sizes referred to herein are exemplary only and not limiting.
As will be understood by those skilled in the art, the newly added semantic segmentation data set contains both the newly added categories and the old categories; a few images may contain no old category, but this has little influence on the learning effect. In addition, only the new categories need to be labeled; the old categories do not.
2. Preliminarily train the feature transformation module. In each iteration, a group of data is randomly selected from the training data and input into the image semantic segmentation network to obtain the feature map output by the feature extractor, and the reconstruction loss $\mathcal{L}_{rec}$ is used to train the feature transformation module so that it completes the feature transformation operation on the newly added data.
3. The weights of the image semantic segmentation network and the feature transformation module are used to initialize an identical network and feature transformation module (i.e., the new network and the new feature transformation module) for learning the new categories, while the old network and its feature transformation module are not updated. In each iteration, a group of data is randomly selected from the training data and input into the new network and the old network simultaneously. The feature maps output by the two feature extractors are passed through the old and new feature transformation modules respectively to obtain the old knowledge representations, and the alignment loss $\mathcal{L}_{align}$ is calculated. Using the decoder outputs of the new and old networks, the inter-class relation matrices $A^t$ and $A^{t-1}$ and the intra-class relation sets $S^t$ and $S^{t-1}$ are constructed for the old classes, and the inter-class structure holding loss $\mathcal{L}_{sp}^{inter}$ and the intra-class structure holding loss $\mathcal{L}_{sp}^{intra}$ are calculated. At the same time, the initial structure optimization loss $\mathcal{L}_{init}$ is calculated for the new classes. Finally, the output of the old network is passed through the pseudo label generation module to generate complete semantic labels, and the cross entropy loss is calculated on the segmentation result of the new network.
4. The total loss function L is calculated from the losses in the above steps and minimized through the back propagation algorithm and a gradient descent strategy, updating the parameter weights of the semantic segmentation network and the feature transformation module.
The back propagation algorithm and the gradient descent strategy involved in this stage can refer to the conventional techniques, and are not described in detail.
And when new categories need to be continuously learned, repeating the steps 1-4 of the continuous learning part of the image semantic segmentation network until all interested categories are completely learned.
And thirdly, testing the image semantic segmentation network.
The images in the test data set are input into the image semantic segmentation network after continuous learning, and the segmentation results are obtained sequentially through the internal feature extractor and decoder. The segmentation results can be evaluated with chosen metrics to judge the semantic segmentation performance of the image semantic segmentation network after continuous learning.
As shown in fig. 4, a schematic diagram illustrating comparison of segmentation results of different image semantic segmentation networks is shown; the four columns of images from left to right represent: from fig. 4, it can be found that the segmentation result of the present invention is close to the real segmentation result and is far better than the segmentation result of the existing scheme.
Example two
The invention further provides an image semantic segmentation network continuous learning system, which is implemented mainly based on the method provided by the first embodiment, as shown in fig. 5, the system mainly includes:
the data collection and preliminary training unit is used for acquiring a newly added semantic segmentation data set and labels corresponding to newly added categories, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
the learning unit is used for initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module; during training, inputting image data of a newly-added semantic segmentation data set to the old network and the new network simultaneously, and performing feature map extraction, decoding and semantic segmentation on the old network and the new network respectively to obtain segmentation results; the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of two transformation results is calculated; respectively and independently constructing corresponding inter-class relationship matrixes and intra-class relationship sets for old classes by utilizing the respective segmentation results of the old network and the new network and feature vectors obtained by decoding, calculating inter-class structure retention loss by utilizing the inter-class relationship matrixes of the old network and the new network, and calculating intra-class structure retention loss by utilizing the intra-class relationship sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used for 
keeping consistency of inter-class structures and intra-class structures in the old classes; meanwhile, for the newly added classes, calculating initial structure optimization loss by using the eigenvectors obtained by decoding the new network, wherein the initial structure optimization loss is used for enhancing the distribution of the eigenvectors of the same newly added class, distancing the distribution of the eigenvectors of different newly added classes, optimizing and denoising the segmentation result of the old network by using class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network by using the pseudo labels; training the new network and the new feature transformation module in combination with the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the above division of each functional module is only used for illustration, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to complete all or part of the above described functions.
It should be noted that, the main principles related to the above units have been described in detail in the first embodiment, and therefore, detailed descriptions thereof are omitted.
EXAMPLE III
The present invention also provides a processing apparatus, as shown in fig. 6, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, a processor, a memory, an input device and an output device are connected through a bus.
In the embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:
the input device can be a touch screen, an image acquisition device, a physical key or a mouse and the like;
the output device may be a display terminal;
the Memory may be a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as a disk Memory.
Example four
The present invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium in the embodiment of the present invention may be provided in the foregoing processing device as a computer readable storage medium, for example, as a memory in the processing device. The readable storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A continuous learning method of an image semantic segmentation network is characterized by comprising the following steps:
acquiring a newly added semantic segmentation data set and a label corresponding to a newly added category, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and primarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, wherein the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module;
during training, inputting image data of the newly added semantic segmentation data set to the old network and the new network simultaneously, each network performing feature map extraction, decoding and semantic segmentation to obtain segmentation results; transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss between the two transformation results; separately constructing, for the old classes, corresponding inter-class relation matrices and intra-class relation sets from the segmentation results and decoded feature vectors of the old network and of the new network; calculating an inter-class structure retention loss from the inter-class relation matrices of the old network and the new network, and an intra-class structure retention loss from the intra-class relation sets of the old network and the new network, the inter-class structure retention loss and the intra-class structure retention loss being used to keep the inter-class and intra-class structures of the old classes consistent; meanwhile, for the newly added classes, calculating an initial structure optimization loss from the feature vectors obtained by decoding by the new network, the initial structure optimization loss being used to pull together the distributions of feature vectors of the same newly added class and to push apart the distributions of feature vectors of different newly added classes; optimizing the segmentation result of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network with the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
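The five losses of the claim above are combined into one training objective. The sketch below is illustrative only; the weighting coefficients `w_*` are hypothetical hyper-parameters that the claim does not specify.

```python
def total_loss(l_cls, l_align, l_inter, l_intra, l_init,
               w_align=1.0, w_inter=1.0, w_intra=1.0, w_init=1.0):
    """Combine the classification loss with the four continual-learning
    regularizers of claim 1 (weights are hypothetical, not from the claim)."""
    return (l_cls
            + w_align * l_align
            + w_inter * l_inter
            + w_intra * l_intra
            + w_init * l_init)
```

In practice each term would be computed per batch and the weighted sum back-propagated through the new network and the new feature transformation module only, since the old network is fixed.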
2. The method as claimed in claim 1, wherein extracting the original feature map of image data in the newly added semantic segmentation data set with the original image semantic segmentation network, transforming the original feature map with the feature transformation module, and preliminarily training the feature transformation module with the difference between the feature map reconstructed from the transformation result and the original feature map comprises:

preliminarily training the feature transformation module with a self-encoder structure: denote the original feature map as $F$ and the feature transformation module as $P^*$; transforming the original feature map $F$ with the feature transformation module $P^*$ comprises: performing channel dimensionality reduction through a convolution operation, then mixing local spatial information through several dilated convolution operations, to generate the representation $P^*(F)$ of the original feature map $F$;

reconstructing a feature map $\hat{F} = R^*(P^*(F))$ from the transformation result $P^*(F)$ with a reconstruction network $R^*$; the difference between the reconstructed feature map $\hat{F}$ and the original feature map $F$ is the Euclidean distance between the two, expressed as:

$$L_{rec} = \left\| \hat{F} - F \right\|_2$$

and preliminarily training the feature transformation module $P^*$ with the reconstruction loss $L_{rec}$.
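The self-encoder pre-training step of claim 2 can be sketched as follows. The 1×1-convolution channel reduction is shown as a per-pixel linear projection over the channel axis, the dilated-convolution mixing is omitted for brevity, and all shapes and weight values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

C, H, W = 8, 4, 4
F = rng.standard_normal((C, H, W))            # original feature map F

# P*: channel dimensionality reduction (a 1x1 convolution is exactly
# a per-pixel linear projection over the channel axis).
W_p = rng.standard_normal((C // 2, C)) * 0.1
Z = np.einsum('dc,chw->dhw', W_p, F)          # transformation result P*(F)

# R*: reconstruction network mapping the representation back to C channels.
W_r = rng.standard_normal((C, C // 2)) * 0.1
F_hat = np.einsum('cd,dhw->chw', W_r, Z)      # reconstructed feature map

# Reconstruction loss: Euclidean distance between F_hat and F.
loss_rec = np.linalg.norm(F_hat - F)
```

In the actual method `W_p` and `W_r` would be trained by gradient descent on `loss_rec` before the continual-learning stage begins.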
3. The method as claimed in claim 1, wherein transforming the feature map extracted by the old network with the old feature transformation module, transforming the feature map extracted by the new network with the new feature transformation module, and calculating the alignment loss of the two transformation results comprises:

the feature map extracted by the old network is the original feature map, denoted $F^{o}$; the old feature transformation module is denoted $P^{o}$, with transformation result $P^{o}(F^{o})$; the feature map extracted by the new network is denoted $F^{n}$; the new feature transformation module is denoted $P^{n}$, with transformation result $P^{n}(F^{n})$; the alignment loss is the L1 distance between the two transformation results, expressed as:

$$L_{align} = \left\| P^{o}(F^{o}) - P^{n}(F^{n}) \right\|_1$$
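A minimal sketch of the alignment loss of claim 3; function name, shapes, and the element-wise averaging convention are assumptions.

```python
import numpy as np

def alignment_loss(z_old, z_new):
    """L1 distance between the two transformation results P_o(F_o)
    and P_n(F_n), averaged over all elements."""
    return float(np.mean(np.abs(z_old - z_new)))

rng = np.random.default_rng(1)
z_old = rng.standard_normal((16, 8, 8))       # transformed old-network features
z_new = z_old + 0.1                           # new-network features, slightly drifted
loss = alignment_loss(z_old, z_new)
```

Because the loss compares the *transformed* representations rather than the raw feature maps, the new network can change its internal features while the nonlinear transformations keep the distilled old knowledge aligned.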
4. The continuous learning method for image semantic segmentation networks according to claim 1, wherein the inter-class structure retention loss calculated from the inter-class relation matrices of the old network and the new network is expressed as:

$$L_{inter} = \left\| A^{o} - A^{n} \right\|_F$$

wherein $A^{o}$ denotes the inter-class relation matrix constructed for the old classes from the segmentation result of the old network and the feature vectors obtained by decoding; $A^{n}$ denotes the inter-class relation matrix constructed for the old classes from the segmentation result of the new network and the feature vectors obtained by decoding; $\left\| \cdot \right\|_F$ denotes the Frobenius norm of a matrix;

a single element of an inter-class relation matrix is the cosine distance between the class prototypes of the two old classes concerned; for old class $i$ and old class $j$, the corresponding class prototypes in the old network are denoted $\mu_i^{o}$ and $\mu_j^{o}$, and the corresponding class prototypes in the new network are denoted $\mu_i^{n}$ and $\mu_j^{n}$; the corresponding elements $A_{ij}^{o}$ and $A_{ij}^{n}$ of the inter-class relation matrices $A^{o}$ and $A^{n}$ are expressed as:

$$A_{ij}^{o} = 1 - \cos\!\left(\mu_i^{o}, \mu_j^{o}\right), \qquad A_{ij}^{n} = 1 - \cos\!\left(\mu_i^{n}, \mu_j^{n}\right)$$

wherein a class prototype is the average of all feature vectors of the corresponding class, and $\cos(\mu_i^{o}, \mu_j^{o})$ and $\cos(\mu_i^{n}, \mu_j^{n})$ respectively denote the cosine similarity of class prototypes $\mu_i^{o}$ and $\mu_j^{o}$, and of class prototypes $\mu_i^{n}$ and $\mu_j^{n}$.
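The inter-class structure retention loss of claim 4 can be sketched directly: build a prototype per old class, form the pairwise cosine-distance matrix in each network, and penalize their Frobenius-norm difference. Shapes and helper names are hypothetical.

```python
import numpy as np

def class_prototypes(feats, labels, classes):
    """Class prototype = mean of all feature vectors of that class."""
    return np.stack([feats[labels == c].mean(axis=0) for c in classes])

def relation_matrix(protos):
    """A[i, j] = cosine distance 1 - cos(mu_i, mu_j) between prototypes."""
    normed = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return 1.0 - normed @ normed.T

def inter_class_loss(protos_old, protos_new):
    """Frobenius norm of the difference of the two relation matrices."""
    return np.linalg.norm(relation_matrix(protos_old)
                          - relation_matrix(protos_new), ord='fro')

rng = np.random.default_rng(2)
feats = rng.standard_normal((100, 16))        # decoded feature vectors
labels = np.arange(100) % 3                   # old-class assignment per vector
protos = class_prototypes(feats, labels, [0, 1, 2])
loss_same = inter_class_loss(protos, protos)  # identical geometry -> 0
```

Because only *relative* prototype geometry is matched, the new network may move its features freely as long as the angular layout of the old classes is preserved, which is what prevents inter-class confusion.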
5. The continuous learning method for image semantic segmentation networks according to claim 1, wherein the intra-class structure retention loss calculated from the intra-class relation sets of the old network and the new network is expressed as:

$$L_{intra} = \frac{1}{\left|C^{old}\right|} \sum_{i \in C^{old}} D\!\left(S_i^{o},\, S_i^{n}\right)$$

wherein the intra-class relation set constructed for old class $i$ from the segmentation result of the old network and the feature vectors obtained by decoding is denoted $S_i^{o} = \left\{ \left( f^{o}, \mu_i^{o} \right) \right\}$, where $f^{o}$ and $\mu_i^{o}$ respectively denote a feature vector decoded by the old network that belongs to old class $i$ and the corresponding class prototype; the intra-class relation set constructed for old class $i$ from the segmentation result of the new network and the feature vectors obtained by decoding is denoted $S_i^{n} = \left\{ \left( f^{n}, \mu_i^{n} \right) \right\}$, where $f^{n}$ and $\mu_i^{n}$ respectively denote a feature vector decoded by the new network that belongs to old class $i$ and the corresponding class prototype; a class prototype is the average of all feature vectors of the corresponding class, $D$ denotes a distance metric function, $C^{old}$ denotes the set of old classes, and $\left|C^{old}\right|$ denotes the number of old classes.
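A sketch of the intra-class structure retention loss of claim 5. The claim leaves the distance metric $D$ abstract; here each feature's distance to its class prototype is taken as the intra-class relation, and $D$ is taken as the mean absolute difference between the old and new relation values — both choices are assumptions.

```python
import numpy as np

def intra_class_loss(feats_old, feats_new, labels, old_classes):
    """For each old class, compare every feature's distance to the class
    prototype in the old network with the same quantity in the new
    network, then average over the old classes."""
    total = 0.0
    for c in old_classes:
        f_o, f_n = feats_old[labels == c], feats_new[labels == c]
        mu_o, mu_n = f_o.mean(axis=0), f_n.mean(axis=0)
        rel_o = np.linalg.norm(f_o - mu_o, axis=1)   # intra-class relations, old
        rel_n = np.linalg.norm(f_n - mu_n, axis=1)   # intra-class relations, new
        total += np.mean(np.abs(rel_o - rel_n))      # assumed metric D
    return total / len(old_classes)

rng = np.random.default_rng(3)
feats_old = rng.standard_normal((60, 8))
labels = np.arange(60) % 2
loss_zero = intra_class_loss(feats_old, feats_old, labels, [0, 1])
```

Matching feature-to-prototype distances (rather than raw features) lets the embedding translate or rotate while the spread of each old class around its prototype stays unchanged.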
6. The continuous learning method for image semantic segmentation networks according to claim 1, wherein, for the newly added classes, the initial structure optimization loss calculated from the feature vectors obtained by decoding by the new network is expressed as:

$$L_{init} = L_{g\text{-}intra} + \lambda\, L_{g\text{-}inter}$$

wherein $L_{g\text{-}intra}$ denotes the guiding intra-class structure loss, $L_{g\text{-}inter}$ denotes the guiding inter-class structure loss, and $\lambda$ is the weight of $L_{g\text{-}inter}$;

the guiding intra-class structure loss $L_{g\text{-}intra}$, used to pull together the distribution of feature vectors of the same newly added class, is expressed as:

$$L_{g\text{-}intra} = \frac{1}{\left|C^{t}\right|} \sum_{c \in C^{t}} \frac{1}{N_c} \sum_{f_c} \left\| f_c - \mu_c \right\|_2$$

wherein $C^{t}$ denotes the set of newly added classes of the current learning stage $t$, and $\left|C^{t}\right|$ denotes the number of newly added classes of the current learning stage $t$, the current learning stage $t$ being the stage of training the new network and the new feature transformation module; $\mu_c$ denotes the class prototype corresponding to newly added class $c$, $f_c$ denotes a feature vector belonging to newly added class $c$, and $N_c$ denotes the number of such feature vectors;

the guiding inter-class structure loss $L_{g\text{-}inter}$, used to push apart the distributions of feature vectors of different newly added classes, is expressed as:

$$L_{g\text{-}inter} = \frac{1}{\left|C^{t}\right|\left(\left|C^{t}\right|-1\right)} \sum_{m \neq n} \max\!\left(0,\; \cos\!\left(\mu_m, \mu_n\right) - \delta\right)$$

wherein $\mu_m$ and $\mu_n$ respectively denote the class prototypes corresponding to newly added class $m$ and newly added class $n$, $\cos(\mu_m, \mu_n)$ denotes the cosine similarity of class prototypes $\mu_m$ and $\mu_n$, and $\delta$ is a predefined distance;

a class prototype is the average of all feature vectors of the corresponding class; for newly added class $c$, the class prototype $\mu_c$ is expressed as:

$$\mu_c = \frac{1}{\left| y = c \right|} \sum_{k} \mathbb{1}\!\left( y_k = c \right) f_k$$

wherein $y$ is the label of the newly added classes at the current stage, $\left| y = c \right|$ denotes the number of pixels belonging to newly added class $c$ in the label, and $\mathbb{1}(\cdot)$ is the indicator function, outputting 1 when $y_k = c$ and 0 otherwise.
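A sketch of the initial structure optimization loss of claim 6: the intra term pulls each new-class feature toward its prototype, and the inter term applies a hinge on pairwise prototype cosine similarity. The values of `delta` and `lam` and the exact loss shapes are assumptions for illustration.

```python
import numpy as np

def init_structure_loss(feats, labels, new_classes, delta=0.5, lam=1.0):
    """Guiding intra-class loss pulls features of a new class toward its
    prototype; guiding inter-class loss pushes prototypes of different
    new classes apart until their cosine similarity falls below the
    predefined margin delta (delta and lam are hypothetical values)."""
    protos = {c: feats[labels == c].mean(axis=0) for c in new_classes}
    # L_g-intra: mean feature-to-prototype distance per new class.
    l_intra = np.mean([
        np.linalg.norm(feats[labels == c] - protos[c], axis=1).mean()
        for c in new_classes])
    # L_g-inter: hinge on pairwise prototype cosine similarity.
    l_inter, n_pairs = 0.0, 0
    cs = list(new_classes)
    for i in range(len(cs)):
        for j in range(i + 1, len(cs)):
            a, b = protos[cs[i]], protos[cs[j]]
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            l_inter += max(0.0, cos - delta)
            n_pairs += 1
    return l_intra + lam * l_inter / max(n_pairs, 1)

rng = np.random.default_rng(4)
feats = rng.standard_normal((40, 8))
labels = np.arange(40) % 2
loss = init_structure_loss(feats, labels, [0, 1])
```

Giving each new class a compact, well-separated initial structure makes it less likely to overlap the preserved old-class topology in later learning stages.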
7. The method as claimed in claim 1, wherein optimizing the segmentation result of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network with the pseudo labels, comprises:

optimizing the segmentation result of the old network with the class-by-class dynamic thresholds, and fusing in the labels of the newly added classes, to obtain the corresponding pseudo labels, expressed as:

$$\tilde{y}_k = \begin{cases} y_k, & y_k \in C^{t} \\ \hat{y}_k^{o}, & \hat{y}_k^{o} = i \in C^{old} \ \text{and} \ p_k^{o} \geq \tau_i \\ \text{ignore}, & \text{otherwise} \end{cases}$$

wherein $y_k$ denotes the label of pixel $k$ of an input image $x^{t}$ acquired at the current learning stage $t$, the current learning stage $t$ being the stage of training the new network and the new feature transformation module; $p_k^{o}$ denotes the classification confidence of the old network for pixel $k$; $\tau_i$ denotes the dynamic threshold corresponding to old class $i$; $C^{old}$ denotes the set of old classes; $\hat{y}_k^{o}$ denotes the segmentation result output by the old network $M^{o}$ for the input image $x^{t}$, namely the classification result for each pixel; and $\tilde{y}_k$ is the generated pseudo label for pixel $k$;

calculating the classification loss of the new network with the pseudo labels, expressed as:

$$L_{cls} = -\frac{1}{\left|K\right|} \sum_{k \in K} \log \hat{p}_k^{n}\!\left( \tilde{y}_k \right)$$

wherein $\hat{p}_k^{n}$ denotes the segmentation result output by the new network for the input image $x^{t}$ at pixel $k$, and $K$ is the set of pixels whose pseudo label is not ignore.
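The pseudo-label denoising of claim 7 can be sketched as follows: keep the old network's prediction for a pixel only when its confidence exceeds that class's dynamic threshold, then fuse in the ground-truth labels of the newly added classes. The ignore value 255 and the array shapes are conventional assumptions, not from the claim.

```python
import numpy as np

IGNORE = 255  # common ignore-index convention, not specified by the claim

def pseudo_labels(old_probs, new_labels, thresholds):
    """old_probs: (num_old_classes, H, W) softmax output of the old network;
    new_labels: (H, W) ground truth for newly added classes (IGNORE elsewhere);
    thresholds: per-old-class dynamic thresholds tau_i."""
    pred = old_probs.argmax(axis=0)                 # old network's class per pixel
    conf = old_probs.max(axis=0)                    # its confidence
    y = np.where(conf >= thresholds[pred], pred, IGNORE)   # denoise by threshold
    y = np.where(new_labels != IGNORE, new_labels, y)      # fuse new-class labels
    return y

# Toy example: 2 old classes on a 1x4 image.
probs = np.array([[[0.9, 0.2, 0.55, 0.5]],
                  [[0.1, 0.8, 0.45, 0.5]]])
new_lab = np.full((1, 4), IGNORE)
new_lab[0, 3] = 7                                   # pixel 3 belongs to a new class
y = pseudo_labels(probs, new_lab, np.array([0.7, 0.7]))   # -> [[0, 1, 255, 7]]
```

Low-confidence old-class pixels fall to the ignore index and are excluded from the classification loss, which is what removes the need for old-class annotations in the incremental stage.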
8. An image semantic segmentation network continuous learning system, which is realized based on the method of any one of claims 1 to 7, and comprises:
the data collection and preliminary training unit is used for acquiring a newly added semantic segmentation data set and labels corresponding to newly added categories, extracting an original feature map of image data in the newly added semantic segmentation data set by using an original image semantic segmentation network, transforming the original feature map by using a feature transformation module, and preliminarily training the feature transformation module by using the difference between a feature map reconstructed by a transformation result and the original feature map;
the learning unit is used for initializing a same image semantic segmentation network and a same feature transformation module by using the original image semantic segmentation network and the preliminarily trained feature transformation module, the original image semantic segmentation network is called an old network, the preliminarily trained feature transformation module is called an old feature transformation module, the image semantic segmentation network generated by initialization is called a new network, and the feature transformation module generated by initialization is called a new feature transformation module; fixing the old network and the old feature transformation module, and training the new network and the new feature transformation module; during training, inputting image data of a newly-added semantic segmentation data set to the old network and the new network simultaneously, and performing feature map extraction, decoding and semantic segmentation on the old network and the new network respectively to obtain segmentation results; the feature map extracted by the old network is transformed by the old feature transformation module, the feature map extracted by the new network is transformed by the new feature transformation module, and the alignment loss of two transformation results is calculated; respectively and independently constructing corresponding inter-class relation matrixes and intra-class relation sets for old classes by utilizing the segmentation results of the old network and the new network and the feature vectors obtained by decoding, calculating inter-class structure retention loss by utilizing the inter-class relation matrixes of the old network and the new network, and calculating intra-class structure retention loss by utilizing the intra-class relation sets of the old network and the new network, wherein the inter-class structure retention loss and the intra-class structure retention loss are used for keeping consistency of 
inter-class structures and intra-class structures in the old classes; meanwhile, for the newly added classes, calculating an initial structure optimization loss from the feature vectors obtained by decoding by the new network, the initial structure optimization loss being used to pull together the distributions of feature vectors of the same newly added class and to push apart the distributions of feature vectors of different newly added classes; optimizing the segmentation result of the old network with class-by-class dynamic thresholds to obtain corresponding pseudo labels, and calculating the classification loss of the new network with the pseudo labels; and training the new network and the new feature transformation module by combining the alignment loss, the inter-class structure retention loss, the intra-class structure retention loss, the initial structure optimization loss and the classification loss.
9. A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A readable storage medium, storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1 to 7.
CN202210237914.4A 2022-03-11 2022-03-11 Continuous learning method, system, equipment and storage medium for image semantic segmentation network Active CN114332466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210237914.4A CN114332466B (en) 2022-03-11 2022-03-11 Continuous learning method, system, equipment and storage medium for image semantic segmentation network

Publications (2)

Publication Number Publication Date
CN114332466A CN114332466A (en) 2022-04-12
CN114332466B true CN114332466B (en) 2022-07-15

Family

ID=81034081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210237914.4A Active CN114332466B (en) 2022-03-11 2022-03-11 Continuous learning method, system, equipment and storage medium for image semantic segmentation network

Country Status (1)

Country Link
CN (1) CN114332466B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898098B (en) * 2022-06-27 2024-04-19 北京航空航天大学 Brain tissue image segmentation method
CN116977635B (en) * 2023-07-19 2024-04-16 中国科学院自动化研究所 Category increment semantic segmentation learning method and semantic segmentation method
CN117036790B (en) * 2023-07-25 2024-03-22 中国科学院空天信息创新研究院 Instance segmentation multi-classification method under small sample condition
CN117875407B (en) * 2024-03-11 2024-06-04 中国兵器装备集团自动化研究所有限公司 Multi-mode continuous learning method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2954540B1 (en) * 2009-12-23 2018-11-16 Thales METHOD FOR CLASSIFYING OBJECTS IN A SLEEPING SYSTEM BY IMAGING.
US9704257B1 (en) * 2016-03-25 2017-07-11 Mitsubishi Electric Research Laboratories, Inc. System and method for semantic segmentation using Gaussian random field network
US11847819B2 (en) * 2019-12-19 2023-12-19 Brainlab Ag Medical image analysis using machine learning and an anatomical vector
CN111047548B (en) * 2020-03-12 2020-07-03 腾讯科技(深圳)有限公司 Attitude transformation data processing method and device, computer equipment and storage medium
CN112559784B (en) * 2020-11-02 2023-07-04 浙江智慧视频安防创新中心有限公司 Image classification method and system based on incremental learning


Similar Documents

Publication Publication Date Title
CN114332466B (en) Continuous learning method, system, equipment and storage medium for image semantic segmentation network
Wang et al. Detect globally, refine locally: A novel approach to saliency detection
CN113627482B (en) Cross-modal image generation method and device based on audio-touch signal fusion
CN110175251A (en) The zero sample Sketch Searching method based on semantic confrontation network
CN114359526B (en) Cross-domain image style migration method based on semantic GAN
CN112347995B (en) Unsupervised pedestrian re-identification method based on fusion of pixel and feature transfer
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN110879974B (en) Video classification method and device
CN111967533B (en) Sketch image translation method based on scene recognition
CN110378911B (en) Weak supervision image semantic segmentation method based on candidate region and neighborhood classifier
CN112819689B (en) Training method of human face attribute editing model, human face attribute editing method and human face attribute editing equipment
Tang et al. Attribute-guided sketch generation
CN104036296A (en) Method and device for representing and processing image
CN113361646A (en) Generalized zero sample image identification method and model based on semantic information retention
CN112668608A (en) Image identification method and device, electronic equipment and storage medium
CN117152459A (en) Image detection method, device, computer readable medium and electronic equipment
CN118196231A (en) Lifelong learning draft method based on concept segmentation
Ghorai et al. An image inpainting method using pLSA-based search space estimation
CN118381980A (en) Intelligent video editing and abstract generating method and device based on semantic segmentation
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN112802048B (en) Method and device for generating layer generation countermeasure network with asymmetric structure
CN115984949B (en) Low-quality face image recognition method and equipment with attention mechanism
CN114841887B (en) Image recovery quality evaluation method based on multi-level difference learning
CN111046958A (en) Image classification and recognition method based on data-dependent kernel learning and dictionary learning
CN114429648B (en) Pedestrian re-identification method and system based on contrast characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant