CN113159049A

CN113159049A - Training method and device for a weakly supervised semantic segmentation model, storage medium and terminal

Info

Publication number: CN113159049A
Application number: CN202110443179.8A
Authority: CN
Inventors: 朱政; 王仪琦; 黄冠
Original assignee: Shanghai Xinyi Intelligent Technology Co ltd
Current assignee: Shanghai Xinyi Intelligent Technology Co ltd
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2021-07-23

Abstract

A training method and device for a weakly supervised semantic segmentation model, a storage medium and a terminal are provided. The training method comprises: classifying weakly labeled sample data with a classification network to obtain first class features, the weakly labeled sample data comprising a plurality of sample pictures; classifying the weakly labeled sample data with an initial semantic segmentation model to obtain second class features; aligning the first class features with the second class features; obtaining combined features according to the feature alignment result of the first class features and the second class features, the combined features comprising classification features that the initial semantic segmentation model cannot recognize but the classification network can recognize; and training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model. The scheme sharpens the boundaries of the segmentation results produced by the weakly supervised semantic segmentation model.

Description

Training method and device for a weakly supervised semantic segmentation model, storage medium and terminal

Technical Field

Embodiments of the invention relate to the technical field of semantic segmentation, and in particular to a training method and device for a weakly supervised semantic segmentation model, a storage medium and a terminal.

Background

Semantic segmentation uses a deep neural network to predict the category of every pixel in a two-dimensional image, thereby providing fine-grained labels for downstream tasks. However, pixel-level annotation of semantic segmentation datasets requires a great deal of manual labor. Consequently, more and more methods attempt to train models on data with cheaper annotation, for example annotation that only records which classes appear in the whole image.

Existing methods use visualization techniques to infer pixel-level strong labels from the classification results obtained with weak labels, and then train a semantic segmentation model with those strong labels. However, the boundaries in the segmentation results produced by a model trained in this way are poorly defined.

Disclosure of Invention

The technical problem solved by embodiments of the invention is how to sharpen the boundaries of the segmentation results produced by a weakly supervised semantic segmentation model.

To solve the above technical problem, an embodiment of the present invention provides a training method for a weakly supervised semantic segmentation model, including: classifying weakly labeled sample data with a classification network to obtain first class features, the weakly labeled sample data comprising a plurality of sample pictures; classifying the weakly labeled sample data with an initial semantic segmentation model to obtain second class features; aligning the first class features with the second class features; obtaining combined features according to the feature alignment result of the first class features and the second class features, the combined features comprising classification features that the initial semantic segmentation model cannot recognize but the classification network can recognize; and training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model.

Optionally, obtaining the combined features according to the feature alignment result of the first class features and the second class features includes: training the classification network according to that feature alignment result; and classifying the weakly labeled sample data with the trained classification network and combining the resulting class features to obtain the combined features.

Optionally, training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model includes: fixing the backbone network parameters of the initial semantic segmentation model and training its classification head using at least the combined features; and obtaining the weakly supervised semantic segmentation model from the semantic segmentation model obtained once training of the classification head is completed.

Optionally, obtaining the weakly supervised semantic segmentation model from the semantic segmentation model obtained after training the classification head includes: taking the model obtained after training the classification head as an intermediate semantic segmentation model, fixing the parameters of its classification head, and training its backbone network using at least the combined features; and obtaining the weakly supervised semantic segmentation model once training of the backbone network is completed.

Optionally, aligning the first class features with the second class features includes: aligning the first class features with the second class features using a square root distance.

Optionally, classifying the weakly labeled sample data with the classification network to obtain the first class features includes: inputting the weakly labeled sample data into the convolutional neural network of the classification network to obtain a full-image feature map for each sample picture; adjusting the dimensionality of each full-image feature map through a fully connected layer of the classification network to obtain an N-dimensional vector representing class features, where N is a positive integer equal to the feature dimensionality output by the weakly supervised semantic segmentation model; and obtaining the first class features from the N-dimensional vectors of all the sample pictures.

Optionally, training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model includes: training the initial semantic segmentation model using the combined features together with labeled semantic segmentation sample data to obtain the weakly supervised semantic segmentation model.

An embodiment of the present invention further provides a training device for a weakly supervised semantic segmentation model, including: a first classification unit, configured to classify weakly labeled sample data with a classification network to obtain first class features, the weakly labeled sample data comprising a plurality of sample pictures; a second classification unit, configured to classify the weakly labeled sample data with an initial semantic segmentation model to obtain second class features; a feature alignment unit, configured to align the first class features with the second class features; a combined feature determining unit, configured to obtain combined features according to the feature alignment result of the first class features and the second class features, the combined features including classification features that the initial semantic segmentation model cannot recognize but the classification network can recognize; and a training unit, configured to train the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model.

An embodiment of the present invention further provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of any one of the above methods for training the weakly supervised semantic segmentation model.

The embodiment of the present invention further provides a terminal, which includes a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor executes the steps of any one of the above training methods for the weakly supervised semantic segmentation model when running the computer program.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

A classification network classifies the weakly labeled sample data to obtain first class features, an initial semantic segmentation model classifies the same data to obtain second class features, the first class features are aligned with the second class features, and combined features are obtained from the alignment result. Because the combined features include classification features that the initial semantic segmentation model cannot recognize but the classification network can, training the initial semantic segmentation model with them yields the weakly supervised semantic segmentation model. The combined features draw on both the first class features produced by the classification network and the result of aligning them with the second class features, so the weakly supervised semantic segmentation model trained with them produces segmentation maps with sharper boundaries.

Drawings

FIG. 1 is a flowchart of a method for training a weakly supervised semantic segmentation model in an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a training apparatus for a weakly supervised semantic segmentation model in an embodiment of the present invention.

Detailed Description

As described above, existing methods use visualization techniques to infer pixel-level strong labels from the classification results of weak labels and then train a semantic segmentation model with those strong labels. In practice, however, visualization techniques that rely only on weak labels tend to cover only the most discriminative regions of an image, making it difficult to infer strong labels with clear and accurate boundaries. As a result, the segmentation results produced by the trained model have poorly defined boundaries.

To solve this problem, in embodiments of the present invention a classification network classifies the weakly labeled sample data to obtain first class features, an initial semantic segmentation model classifies the same data to obtain second class features, the first class features are aligned with the second class features, and combined features are obtained according to the alignment result. Because the combined features include classification features that the initial semantic segmentation model cannot recognize but the classification network can, the initial semantic segmentation model is trained with them to obtain the weakly supervised semantic segmentation model. Since the combined features draw on both the first class features from the classification network and their alignment with the second class features, the weakly supervised semantic segmentation model trained with them can construct the corresponding segmentation maps.

In order to make the aforementioned objects, features and advantages of the embodiments of the present invention more comprehensible, specific embodiments accompanied with figures are described in detail below.

An embodiment of the invention provides a training method for a weakly supervised semantic segmentation model. FIG. 1 shows a flowchart of this method, which specifically includes the following steps:

Step S101, classifying the weakly labeled sample data with a classification network to obtain first class features.

In a specific implementation, weakly labeled sample data may refer to data in which only some of the categories are labeled and the information is not completely annotated; to data that provides all category labels but not the specific positions of objects in the sample pictures; or to data annotated at the image level, and so on.

In some embodiments, the weakly labeled sample data comprises a number of sample pictures. To simplify training of the weakly supervised semantic segmentation model, each sample picture in the weakly labeled sample data may contain objects of only one category. For example, a sample picture containing a cat corresponds to the category cat, and a sample picture containing a dog corresponds to the category dog.

It is understood that a sample picture may also contain objects of multiple categories, such as both a cat and a dog.

In a specific implementation, the classification network may be obtained by training a deep neural network, such as a convolutional neural network.

In some non-limiting embodiments, taking a classification network obtained by training a convolutional neural network as an example, classifying the weakly labeled sample data with the classification network to obtain the first class features may specifically include: inputting the weakly labeled sample data into the convolutional neural network of the classification network to obtain a full-image feature map for each sample picture; adjusting the dimensionality of each full-image feature map through a fully connected layer of the classification network to obtain an N-dimensional vector representing class features, where N is a positive integer equal to the feature dimensionality output by the weakly supervised semantic segmentation model; and obtaining the first class features from the N-dimensional vectors of all the sample pictures. Matching the dimensionality of the classification network's output vectors to that of the features output by the weakly supervised semantic segmentation model makes the subsequent alignment of the first class features with the second class features straightforward.
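The dimensionality adjustment in step S101 can be pictured with a small NumPy sketch. It rests on assumptions the text does not state: the convolutional backbone is replaced by precomputed feature maps of shape (C, H, W) per picture, each full-image feature map is collapsed by global average pooling before the fully connected layer, and the names `first_class_features`, `fc_weight` and `fc_bias` are hypothetical.

```python
import numpy as np

def first_class_features(feature_maps, fc_weight, fc_bias):
    """Map full-image feature maps to N-dimensional class features (hypothetical helper).

    feature_maps: (P, C, H, W) array, one backbone feature map per sample picture.
    fc_weight:    (N, C) weights of the fully connected layer; N is chosen to equal
                  the feature dimensionality of the segmentation model.
    fc_bias:      (N,) bias of the fully connected layer.
    Returns a (P, N) array: one N-dimensional class feature per sample picture.
    """
    pooled = feature_maps.mean(axis=(2, 3))   # assumed global average pooling -> (P, C)
    return pooled @ fc_weight.T + fc_bias     # dimensionality adjustment -> (P, N)

# Usage: 4 pictures with 64-channel 7x7 maps, projected to N = 16 dimensions.
rng = np.random.default_rng(0)
maps = rng.normal(size=(4, 64, 7, 7))
feats = first_class_features(maps, rng.normal(size=(16, 64)), np.zeros(16))
print(feats.shape)  # (4, 16)
```

Stacking the per-picture vectors row by row gives the first class features as a single (P, N) matrix, which is convenient for the later alignment against segmentation-model features of the same width N.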

In some embodiments, when each sample picture in the weakly labeled sample data corresponds to a single category, each sample picture yields one vector, and each vector represents the features of one category.

Specifically, the classification network may include a backbone network and a classification head. After the weakly labeled sample data is input into the convolutional neural network of the classification network, the backbone network extracts features from each sample picture to obtain its full-image feature map. The classification head then classifies each full-image feature map extracted by the backbone network to obtain the category of each sample picture.

Step S102, classifying the weakly labeled sample data with the initial semantic segmentation model to obtain second class features.

In a specific implementation, the weakly supervised semantic segmentation model may be obtained by training a deep neural network, such as a convolutional neural network.

The weakly labeled sample data is input into the convolutional neural network of the initial semantic segmentation model. The backbone network of the initial semantic segmentation model extracts features from each sample picture to obtain its full-image feature map, and the classification head of the initial semantic segmentation model classifies each full-image feature map to obtain the second class features.
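A possible reading of step S102 is sketched below in NumPy. The text leaves several details open, so these are assumptions: the backbone of the initial segmentation model yields per-pixel features with N channels, the classification head is a per-pixel linear classifier (equivalent to a 1x1 convolution), the image-level second class feature of a picture is the spatial mean of its per-pixel features, and, because each sample picture is assumed to contain one category, the picture's predicted category is a majority vote over pixels. `second_class_features` is a hypothetical name.

```python
import numpy as np

def second_class_features(pixel_feats, head_weight):
    """Image-level class features and predictions from a segmentation model (sketch).

    pixel_feats: (P, N, H, W) per-pixel backbone features of the segmentation model.
    head_weight: (K, N) per-pixel classification head, i.e. a 1x1 convolution.
    Returns (features, labels): a (P, N) array of image-level features and a (P,)
    array holding the dominant predicted class of each picture.
    """
    logits = np.einsum("kn,pnhw->pkhw", head_weight, pixel_feats)  # (P, K, H, W)
    pixel_labels = logits.argmax(axis=1)                           # (P, H, W)
    features = pixel_feats.mean(axis=(2, 3))                       # (P, N)
    # Majority vote over pixels, assuming one object category per picture.
    num_classes = head_weight.shape[0]
    labels = np.array([np.bincount(m.ravel(), minlength=num_classes).argmax()
                       for m in pixel_labels])
    return features, labels

# Usage: two 2x2 pictures whose features light up channel 1 and channel 2 respectively.
pf = np.zeros((2, 3, 2, 2))
pf[0, 1] = 1.0
pf[1, 2] = 1.0
feats, labels = second_class_features(pf, np.eye(3))
print(labels)  # [1 2]
```

Pooling the backbone features rather than the head's class scores keeps the second class features in the same N-dimensional space as the first class features, which is what the alignment step needs.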

It should be noted that the terms "first" and "second" only distinguish whether class features were obtained by the classification network or by the initial semantic segmentation model. Neither the first class features nor the second class features refer to one particular feature; each may include features of multiple categories.

Step S103, aligning the first class features with the second class features.

In a specific implementation, the first class features output by the classification network may be aligned mathematically with the second class features output by the initial semantic segmentation model.

In some embodiments, the first class features may be aligned with the second class features using a square root distance. It is understood that they may also be aligned using the Euclidean distance or other measures, which are not enumerated here.
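The text does not define "square root distance" precisely. The sketch below assumes one plausible reading: for each picture, take the square root of the summed squared differences between its first and second class features, then average over pictures. Under this reading the per-picture term coincides with a Euclidean norm; if a different quantity is intended (for example, a distance between element-wise square roots of non-negative features), only this function would change. `sqrt_distance_loss` is a hypothetical name.

```python
import numpy as np

def sqrt_distance_loss(first_feats, second_feats, eps=1e-12):
    """Alignment loss between (P, N) first and second class features.

    Assumed reading of "square root distance": for each picture, the square root
    of the summed squared differences between its two feature vectors, averaged
    over the P pictures. `eps` keeps the square root differentiable at zero.
    """
    sq = ((first_feats - second_feats) ** 2).sum(axis=1)
    return float(np.sqrt(sq + eps).mean())

a = np.array([[3.0, 4.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [6.0, 8.0]])
print(round(sqrt_distance_loss(a, b), 4))  # 7.5
```

The two pictures contribute distances of 5 and 10, so the averaged loss is 7.5.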

Step S104, obtaining combined features according to the feature alignment result of the first class features and the second class features.

The combined features may include classification features that the initial semantic segmentation model cannot recognize but that the classification network can recognize.

In some embodiments, the classification network is trained according to the feature alignment result of the first class features and the second class features; the trained classification network then classifies the weakly labeled sample data, and the resulting class features are combined to obtain the combined features. By aligning the first class features with the second class features and training the classification network on the alignment result, the classification network learns how the semantic segmentation model distinguishes objects of different classes and what the second class features produced by that model look like, so that it can generate features similar to those of the semantic segmentation model for any class.
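Training the classification network on the alignment result can be illustrated with a minimal gradient-descent sketch. It assumes, beyond the text, that only the fully connected layer is updated, that its inputs are pooled backbone features held fixed, that the second class features from the segmentation model act as fixed targets, and that the loss is the assumed square root distance (the mean over pictures of the per-picture square root of summed squared differences). `align_step` and the toy data are hypothetical.

```python
import numpy as np

def align_step(fc_weight, pooled, seg_feats, lr, eps=1e-12):
    """One gradient-descent step on the assumed square-root-distance alignment loss.

    pooled:    (P, C) pooled classification-network features (backbone frozen here).
    seg_feats: (P, N) second class features from the segmentation model (fixed targets).
    fc_weight: (N, C) fully connected layer of the classification network, updated in place.
    Returns the loss measured before the update.
    """
    resid = pooled @ fc_weight.T - seg_feats           # (P, N)
    dist = np.sqrt((resid ** 2).sum(axis=1) + eps)     # per-picture distance, (P,)
    # d(mean dist)/d(pred) = resid / dist / P, chained through pred = pooled @ W.T
    grad = (resid / dist[:, None] / len(pooled)).T @ pooled   # (N, C)
    fc_weight -= lr * grad
    return float(dist.mean())

rng = np.random.default_rng(0)
pooled = rng.normal(size=(8, 16))
seg_feats = pooled @ rng.normal(size=(4, 16)).T        # targets reachable by a linear map
fc_weight = np.zeros((4, 16))
losses = [align_step(fc_weight, pooled, seg_feats, lr=0.01) for _ in range(2000)]
print(losses[0] > losses[-1])  # True
```

Once the loss is small, the fully connected layer emits features that resemble what the segmentation model produces, which is the property the combined features rely on.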

When the class features obtained by the classification network are combined, the feature vectors corresponding to all of these class features may be concatenated in random order to obtain the combined features. Other ways of combining the class features may also be adopted, as long as the resulting combined features include all the class features obtained by the classification network.
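The text says the feature vectors are "spliced randomly" without fixing the layout. The sketch below assumes this means stacking the per-class vectors generated by the trained classification network into one set of training samples in random order, keeping each vector paired with its class label so that a classification head can later be trained on them. `combine_class_features` and the dictionary input format are hypothetical.

```python
import numpy as np

def combine_class_features(features_by_class, seed=None):
    """Stack per-class feature vectors into one shuffled set of combined features.

    features_by_class: dict mapping class id -> (M_c, N) array of class features
                       produced by the trained classification network.
    Returns (features, labels): an (M, N) array and an (M,) label array whose rows
    are in random order, each row still paired with its class id.
    """
    feats = np.concatenate(list(features_by_class.values()), axis=0)
    labels = np.concatenate([np.full(len(v), c) for c, v in features_by_class.items()])
    order = np.random.default_rng(seed).permutation(len(feats))
    return feats[order], labels[order]

by_class = {0: np.zeros((2, 3)), 1: np.ones((3, 3))}
feats, labels = combine_class_features(by_class, seed=0)
print(sorted(labels.tolist()))  # [0, 0, 1, 1, 1]
```

Shuffling the rows while carrying the labels along satisfies the stated requirement that every class feature from the classification network appears in the combined features.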

Step S105, training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model.

In a specific implementation, training the initial semantic segmentation model with the combined features guides it to recognize features of classes that are not labeled in the available segmentation data. Features that the classification network can recognize but the initial semantic segmentation model cannot yet recognize are thereby transferred into the initial semantic segmentation model, so that the trained weakly supervised semantic segmentation model can recognize all classes.

In some embodiments, the weakly supervised semantic segmentation model may be trained as follows: fix the backbone network parameters of the initial semantic segmentation model and train its classification head using at least the combined features; then obtain the weakly supervised semantic segmentation model from the semantic segmentation model obtained once training of the classification head is completed.

To further improve the robustness of the weakly supervised semantic segmentation model, the model obtained after training the classification head is taken as an intermediate semantic segmentation model; the parameters of its classification head are fixed, and its backbone network is trained using at least the combined features; the weakly supervised semantic segmentation model is obtained once training of the backbone network is completed. Fixing the trained classification head and then training the backbone network lets the backbone adapt to the newly trained classification head, which improves the stability and robustness of the segmentation results of the weakly supervised semantic segmentation model.

Further, in addition to the combined features, the training data used to train the weakly supervised semantic segmentation model also includes labeled semantic segmentation data. The combined features have corresponding categories for the classification network, but the initial semantic segmentation model has no corresponding segmentation categories for them. When the model is trained with both the combined features and the labeled semantic segmentation data, the labeled data is passed through the initial semantic segmentation model to obtain feature maps that contain only the labeled categories, which guides the classification head of the initial semantic segmentation model to identify those labeled categories. The combined features are formed from features generated by the classification network, which was trained on the alignment result of the first class features and the second class features; the classification network has therefore learned the characteristics of the features produced by the initial semantic segmentation model, so the combined features provide supervision for all categories and guide the classification head to identify categories that are not labeled in the segmentation data. Training the classification head of the initial semantic segmentation model on the combined features together with the features the model can already identify enables the head to identify all classes.
After its classification head has been trained, the initial semantic segmentation model can identify all classes. To make the results more stable, however, the backbone network should adapt to the new classification head, so the parameters of the classification head are fixed while the backbone network of the initial semantic segmentation model is trained. Once training of the backbone network is completed, the weakly supervised semantic segmentation model is obtained.
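The two-stage schedule described above can be mimicked with a toy NumPy model, under assumptions that go well beyond the text: the backbone is a single linear map from a D-dimensional descriptor to the N-dimensional feature space, the classification head is a linear map to K class scores, each labeled segmentation sample is reduced to one descriptor and one label, and the combined features are fed directly to the head because they already live in the N-dimensional space. In this toy the combined features therefore influence only the head; the text does not specify how they enter the backbone stage. All names are hypothetical.

```python
import numpy as np

def softmax_xent(logits, labels):
    """Mean softmax cross-entropy and its gradient with respect to the logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    n = len(labels)
    loss = -np.log(p[np.arange(n), labels]).mean()
    grad = p.copy()
    grad[np.arange(n), labels] -= 1.0
    return float(loss), grad / n

def train_step(params, trainable, x_img, y_img, comb_feats, y_comb, lr=0.1):
    """One gradient step that updates only the parameter groups listed in `trainable`."""
    wb, wh = params["backbone"], params["head"]
    h = x_img @ wb.T                                   # backbone features of labeled data, (B, N)
    loss_img, g_img = softmax_xent(h @ wh.T, y_img)    # labeled data: backbone -> head
    # Combined features are already N-dimensional, so they go straight to the head.
    loss_comb, g_comb = softmax_xent(comb_feats @ wh.T, y_comb)
    grads = {
        "head": g_img.T @ h + g_comb.T @ comb_feats,   # (K, N)
        "backbone": (g_img @ wh).T @ x_img,            # (N, D)
    }
    for name in trainable:                             # frozen groups are simply skipped
        params[name] -= lr * grads[name]
    return loss_img + loss_comb

def two_stage_training(params, data, steps=100):
    # Stage 1: backbone fixed, classification head trained.
    for _ in range(steps):
        train_step(params, ["head"], *data)
    # Stage 2: head fixed, backbone trained to adapt to the new head.
    for _ in range(steps):
        train_step(params, ["backbone"], *data)
    return params

# Toy usage: labeled data covers classes 0 and 1 only; combined features supply class 2.
rng = np.random.default_rng(0)
D, N, K = 12, 6, 3
params = {"backbone": 0.1 * rng.normal(size=(N, D)), "head": 0.1 * rng.normal(size=(K, N))}
x_img, y_img = rng.normal(size=(10, D)), rng.integers(0, 2, size=10)
comb, y_comb = rng.normal(size=(5, N)), np.full(5, 2)
params = two_stage_training(params, (x_img, y_img, comb, y_comb))
```

Freezing is expressed by leaving a parameter group out of `trainable`, so each stage computes all gradients but applies only the ones for the group being trained.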

As can be seen from the above, a classification network classifies the weakly labeled sample data to obtain first class features, an initial semantic segmentation model classifies the same data to obtain second class features, the two are aligned, and combined features are obtained from the alignment result. Because the combined features include classification features that the initial semantic segmentation model cannot recognize but the classification network can, training the initial semantic segmentation model with them yields the weakly supervised semantic segmentation model. Since the combined features draw on both the first class features from the classification network and their alignment with the second class features, the weakly supervised semantic segmentation model trained with them produces segmentation maps with sharper boundaries when performing semantic segmentation.

In order to facilitate those skilled in the art to better understand and implement the embodiment of the present invention, the embodiment of the present invention further provides a schematic structural diagram of a training apparatus for a weakly supervised semantic segmentation model.

Referring to FIG. 2, which shows a schematic structural diagram of a training apparatus for a weakly supervised semantic segmentation model in an embodiment of the present invention, the training apparatus 20 may include:

a first classification unit 21, configured to classify weakly labeled sample data with a classification network to obtain first class features, the weakly labeled sample data comprising a plurality of sample pictures;

a second classification unit 22, configured to classify the weakly labeled sample data with an initial semantic segmentation model to obtain second class features;

a feature alignment unit 23, configured to perform feature alignment on the first class features and the second class features;

a combined feature determining unit 24, configured to obtain a combined feature according to a feature alignment result of the first class feature and the second class feature, where the combined feature includes a classification feature that cannot be identified by the initial semantic segmentation model but can be identified by the classification network;

and a training unit 25, configured to train the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model.

In a specific implementation, for the working principle and workflow of the training apparatus 20 for the weakly supervised semantic segmentation model, refer to the description of the training method provided in any of the embodiments of the present invention; the details are not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the training method for the weakly supervised semantic segmentation model in any of the above embodiments.

The embodiment of the present invention further provides a terminal, which includes a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor executes the steps of the training method for the weakly supervised semantic segmentation model in any of the above embodiments when running the computer program.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in any computer-readable storage medium, and the storage medium may include ROM, RAM, magnetic disks, optical disks, and the like.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A training method of a weakly supervised semantic segmentation model is characterized by comprising the following steps:

classifying weakly labeled sample data with a classification network to obtain first class features, wherein the weakly labeled sample data comprises a plurality of sample pictures;

classifying the weakly labeled sample data with an initial semantic segmentation model to obtain second class features;

aligning the first class features with the second class features;

obtaining combined features according to the feature alignment result of the first class features and the second class features, wherein the combined features comprise classification features that the initial semantic segmentation model cannot recognize but the classification network can recognize;

and training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model.

2. The method for training the weakly supervised semantic segmentation model according to claim 1, wherein obtaining the combined features according to the feature alignment result of the first class features and the second class features comprises:

training the classification network according to the feature alignment result of the first class features and the second class features;

and classifying the weakly labeled sample data with the trained classification network, and combining the resulting class features to obtain the combined features.

3. The method for training a weakly supervised semantic segmentation model according to claim 1 or 2, wherein training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model comprises:

fixing backbone network parameters of the initial semantic segmentation model, and training a classification head of the initial semantic segmentation model using at least the combined features;

and obtaining the weakly supervised semantic segmentation model from the semantic segmentation model obtained after training of the classification head is completed.

4. The training method for the weakly supervised semantic segmentation model according to claim 3, wherein obtaining the weakly supervised semantic segmentation model from the semantic segmentation model obtained after training of the classification head is completed comprises:

obtaining an intermediate semantic segmentation model after training of the classification head is completed, fixing the parameters of the classification head of the intermediate semantic segmentation model, and training the backbone network of the intermediate semantic segmentation model using at least the combined features;

and obtaining the weakly supervised semantic segmentation model after training of the backbone network is completed.
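Claims 3 and 4 describe a two-stage schedule: first the backbone is frozen and only the classification head is trained, then the head of the resulting intermediate model is frozen and only the backbone is trained. The toy sketch below illustrates just that freezing order under loudly stated assumptions: the "model" is a hypothetical two-scalar stand-in (`backbone`, `head`), the loss is a plain squared error, and the `(x, target)` pairs merely stand in for training targets derived from the combined features. It is not the patented network or loss.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class ToySegModel:
    """Hypothetical stand-in: one scalar per parameter group."""
    backbone: float
    head: float

    def forward(self, x: float) -> float:
        return self.head * (self.backbone * x)


def squared_error_grads(model: ToySegModel, x: float, target: float) -> Tuple[float, float]:
    """Gradients of (forward(x) - target)**2 with respect to (backbone, head)."""
    feature = model.backbone * x
    err = model.head * feature - target
    return 2.0 * err * model.head * x, 2.0 * err * feature


def train_stage(
    model: ToySegModel,
    data: List[Tuple[float, float]],
    trainable: Set[str],
    lr: float = 0.01,
    epochs: int = 50,
) -> ToySegModel:
    """Plain SGD that updates only the parameter groups named in `trainable`."""
    for _ in range(epochs):
        for x, target in data:
            g_backbone, g_head = squared_error_grads(model, x, target)
            if "backbone" in trainable:
                model.backbone -= lr * g_backbone
            if "head" in trainable:
                model.head -= lr * g_head
    return model


def two_stage_training(model: ToySegModel, data: List[Tuple[float, float]]) -> ToySegModel:
    # Stage 1 (claim 3): backbone frozen, train the classification head.
    train_stage(model, data, trainable={"head"})
    # Stage 2 (claim 4): head of the intermediate model frozen, train the backbone.
    train_stage(model, data, trainable={"backbone"})
    return model
```

In a real framework the same effect is usually obtained by disabling gradients for the frozen parameter group (or leaving it out of the optimizer) for each stage.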

5. The training method for the weakly supervised semantic segmentation model according to claim 1, wherein performing feature alignment between the first class features and the second class features comprises:

performing feature alignment between the first class features and the second class features using a square root distance.
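Claim 5 names a "square root distance" without defining it. The sketch below assumes it means the square root of the summed squared differences between two N-dimensional class-feature vectors (the Euclidean, or L2, distance), and assumes the alignment objective averages that distance over all sample pictures. Both readings, and the helper names, are assumptions for illustration.

```python
import math
from typing import Sequence


def square_root_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed reading: sqrt of the summed squared differences (L2 distance)."""
    if len(a) != len(b):
        raise ValueError("class features must share the same dimensionality N")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def alignment_loss(
    first_features: Sequence[Sequence[float]],
    second_features: Sequence[Sequence[float]],
) -> float:
    """Hypothetical objective: mean distance between the classifier's and the
    segmentation model's class features, one pair per sample picture."""
    if len(first_features) != len(second_features) or not first_features:
        raise ValueError("need one non-empty feature pair per sample picture")
    total = sum(
        square_root_distance(f, s) for f, s in zip(first_features, second_features)
    )
    return total / len(first_features)
```

For example, `square_root_distance([0.0, 0.0], [3.0, 4.0])` is `5.0`, and averaging that with a perfectly aligned second pair gives an alignment loss of `2.5`.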

6. The training method for the weakly supervised semantic segmentation model according to claim 1, wherein classifying the weakly labeled sample data using the classification network to obtain the first class features comprises:

inputting the weakly labeled sample data into a convolutional neural network of the classification network to obtain a full-image feature map corresponding to each sample picture;

adjusting the dimensionality of the full-image feature map through a fully connected layer of the classification network to obtain an N-dimensional vector, wherein the N-dimensional vector represents class features, its dimensionality is the same as the feature dimensionality output by the weakly supervised semantic segmentation model, and N is a positive integer;

and obtaining the first class features from the N-dimensional vectors corresponding to all the sample pictures.
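The dimensionality adjustment in claim 6 can be sketched as follows. The CNN itself is omitted: each sample picture's full-image feature map is assumed to be given as C channels of H x W values. The claim only says a fully connected layer maps the feature map to an N-dimensional vector; the global average pooling step before that layer is an assumption, as are the helper names and the example weights.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # C channels, each an H x W grid


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Assumed step: collapse each H x W channel to its mean (C-dim vector)."""
    return [
        sum(map(sum, channel)) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def fully_connected(
    x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """y = W x + b, with W of shape N x C, mapping C features to N class features."""
    if len(weight) != len(bias) or any(len(row) != len(x) for row in weight):
        raise ValueError("weight must be N x C and bias must have length N")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def first_class_features(
    feature_maps: Sequence[FeatureMap],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[List[float]]:
    """One N-dimensional class-feature vector per sample picture."""
    return [fully_connected(global_average_pool(fm), weight, bias) for fm in feature_maps]
```

With a 2-channel map whose channel means are 1.0 and 2.0, the weights `[[1, 0], [0, 1], [1, 1]]` and bias `[0, 0, 0.5]` give the N = 3 vector `[1.0, 2.0, 3.5]`. N would be chosen to match the output feature dimensionality of the segmentation model, as the claim requires.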

7. The training method for the weakly supervised semantic segmentation model according to claim 1, wherein training the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model comprises:

training the initial semantic segmentation model using the combined features and labeled semantic segmentation sample data to obtain the weakly supervised semantic segmentation model.

8. A training device for a weakly supervised semantic segmentation model, characterized by comprising:

a first classification unit, configured to classify weakly labeled sample data using a classification network to obtain first class features, wherein the weakly labeled sample data comprises a plurality of sample pictures;

a second classification unit, configured to classify the weakly labeled sample data using an initial semantic segmentation model to obtain second class features;

a feature alignment unit, configured to perform feature alignment between the first class features and the second class features;

a combined feature determining unit, configured to obtain combined features according to the result of aligning the first class features with the second class features, wherein the combined features comprise classification features that the initial semantic segmentation model cannot identify but the classification network can identify;

and a training unit, configured to train the initial semantic segmentation model using at least the combined features to obtain the weakly supervised semantic segmentation model.

9. A computer-readable storage medium, being a non-volatile or non-transitory storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the training method for a weakly supervised semantic segmentation model according to any one of claims 1 to 7.

10. A terminal comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, performs the steps of the training method for a weakly supervised semantic segmentation model according to any one of claims 1 to 7.