CN115082676A - Method, device and equipment for training pseudo label model and storage medium - Google Patents


Info

Publication number: CN115082676A
Authority: CN (China)
Prior art keywords: edge, prediction result, training, network, information
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202210681690.6A
Other languages: Chinese (zh)
Inventors: 王钰超, 费敬敬, 李韡, 吴立威
Current Assignee: Shanghai Sensetime Intelligent Technology Co Ltd (the listed assignee may be inaccurate)
Original Assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd, with priority to CN202210681690.6A

Classifications

All under G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V 10/00 Arrangements for image or video recognition or understanding:

    • G06V 10/26 (under G06V 10/20, Image preprocessing): Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/764 (under G06V 10/70, using pattern recognition or machine learning): using classification, e.g. of video objects
    • G06V 10/774 (under G06V 10/70 and G06V 10/77, processing image or video features in feature spaces): Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 (under G06V 10/70): using neural networks

Abstract

The embodiments of the disclosure provide a training method, apparatus, device, and storage medium for a pseudo label model. The method includes: acquiring label information and edge information of a sample image; obtaining a first classification prediction result and a first edge prediction result of the sample image, both predicted by a pseudo label model to be trained; determining a first network loss according to the difference between the first classification prediction result and the label information; determining a second network loss according to the difference between the first edge prediction result and the edge information; and adjusting the network parameters of the pseudo label model to be trained according to the first network loss and the second network loss until a model training end condition is reached, obtaining the trained pseudo label model. For label-free images, the pseudo label model obtained by this method can use the predicted edge information to guide and correct the predicted label information, producing more reliable pseudo labels.

Description

Method, device and equipment for training pseudo label model and storage medium
Technical Field
The disclosure relates to the technical field of deep learning, and in particular to a training method, apparatus, device, and storage medium for a pseudo label model.
Background
With the development of deep learning, semantic segmentation tasks have been widely studied, and supervised semantic segmentation algorithms continually push the reported accuracy higher on benchmark datasets such as Cityscapes (an urban scene dataset) and Pascal VOC (an object detection dataset). However, for the supervised semantic segmentation task, the large number of sample images in the training dataset must carry high-quality pixel-level annotations, which implies expensive labeling cost and long labeling time.
Semi-Supervised Learning (SSL) is a learning approach combining supervised and unsupervised learning: a small number of labeled images and a large number of unlabeled images are used during training. Its core idea is to effectively exploit the large number of unlabeled images as a supplement to the labeled images, thereby improving the accuracy of the trained model. Semi-supervised semantic segmentation applies semi-supervised learning to the semantic segmentation task to alleviate the dependence of supervised semantic segmentation on high-quality labeled training datasets. The key to semi-supervised semantic segmentation is to assign a pseudo label (Pseudo Label) to each pixel of the unlabeled images in the training dataset; the more accurate and reliable the pseudo labels, the higher the accuracy of the trained model.
In the related art, a common approach is self-training (also called self-learning): a model is first trained on the available labeled images and then used to predict labels for the unlabeled images, with highly confident predictions selected as pseudo labels. However, a certain number of unreliable pseudo labels remain among those obtained, and training a semantic segmentation model on a dataset labeled with such pseudo labels causes model degradation, may even push the model in a wrong direction, and constrains the accuracy of semi-supervised semantic segmentation algorithms.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide at least a training method, an apparatus, a device, and a storage medium for a pseudo label model.
Specifically, the embodiment of the present disclosure is implemented by the following technical solutions:
in a first aspect, a method for training a pseudo label model is provided, where the method includes:
obtaining label information and edge information of a sample image, wherein the label information comprises information of a segmentation mask of the sample image, the edge information comprises information of the edge of the segmentation mask, and the segmentation mask is used for marking the category of each pixel in the sample image;
obtaining a first classification prediction result and a first edge prediction result of the sample image, wherein the first classification prediction result and the first edge prediction result are obtained by prediction of a pseudo label model to be trained;
determining a first network loss according to the difference between the first classification prediction result and the label information;
determining a second network loss according to the difference between the first edge prediction result and the edge information;
and adjusting the network parameters of the pseudo label model to be trained according to the first network loss and the second network loss until a model training end condition is reached, obtaining the trained pseudo label model.
In a second aspect, a training method for a semi-supervised semantic segmentation network is provided, where a training sample set of the semantic segmentation network includes: a plurality of unlabeled images; the method comprises the following steps:
acquiring a pseudo label model obtained by training through the training method of the pseudo label model;
inputting the label-free image into the pseudo label model to obtain a corresponding second classification prediction result and a second edge prediction result;
according to the second edge prediction result, correcting the second classification prediction result to obtain pseudo label information corresponding to the label-free image;
and training the semantic segmentation network according to the unlabeled image and the corresponding pseudo label information thereof.
In a third aspect, an apparatus for training a pseudo tag model is provided, the apparatus comprising:
a training data acquisition module to: obtaining label information and edge information of a sample image, wherein the label information comprises information of a segmentation mask of the sample image, the edge information comprises information of the edge of the segmentation mask, and the segmentation mask is used for marking the category of each pixel in the sample image; obtaining a first classification prediction result and a first edge prediction result of the sample image, wherein the first classification prediction result and the first edge prediction result are obtained by prediction of a pseudo label model to be trained;
a network loss determination module to: determining a first network loss according to the difference between the first classification prediction result and the label information; determining a second network loss according to the difference between the first edge prediction result and the edge information;
a network parameter adjustment module to: adjust the network parameters of the pseudo label model to be trained according to the first network loss and the second network loss until a model training end condition is reached, obtaining the trained pseudo label model.
In a fourth aspect, a training apparatus for a semi-supervised semantic segmentation network is provided, where a training sample set of the semantic segmentation network includes: a plurality of unlabeled images; the device comprises:
a model acquisition module to: acquiring a pseudo label model obtained by training through the training method of the pseudo label model;
a model prediction module to: input the label-free image into the pseudo label model to obtain a corresponding second classification prediction result and a second edge prediction result;
a label correction module to: according to the second edge prediction result, correcting the second classification prediction result to obtain pseudo label information corresponding to the label-free image;
a network training module to: and training the semantic segmentation network according to the unlabeled image and the corresponding pseudo label information thereof.
In a fifth aspect, an electronic device is provided, including a memory for storing computer instructions executable on a processor, the processor being configured to implement, when executing the computer instructions, the training method of the pseudo label model or the training method of the semi-supervised semantic segmentation network according to any one of the embodiments of the present disclosure.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements a method for training a pseudo label model or a method for training a semi-supervised semantic segmentation network according to any of the embodiments of the present disclosure.
According to the training method of the pseudo label model provided by the embodiments of the present disclosure, the label information and the edge information of the sample image are used as supervision information to train the pseudo label model, and the pseudo label model predicts both a classification prediction result and an edge prediction result. For label-free images, the predicted edge prediction result can therefore guide and correct the predicted classification prediction result, yielding more reliable pseudo labels; the information provided by the limited labels of the sample images is fully utilized, the algorithm accuracy is improved, and labeling cost is greatly reduced.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in one or more embodiments of the present disclosure; those skilled in the art can obtain other drawings based on them without inventive effort.
FIG. 1 is a schematic diagram illustrating a semantic segmentation network training method in accordance with at least one embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method for training a pseudo label model, according to at least one embodiment of the present disclosure;
FIG. 3 illustrates a sample image and its corresponding segmentation mask in accordance with at least one embodiment of the present disclosure;
FIG. 4 illustrates a segmentation mask and its corresponding edges in accordance with at least one embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating training of a pseudo label model based on an encoder-decoder network architecture in accordance with at least one embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method of pseudo label generation in accordance with at least one embodiment of the present disclosure;
FIG. 7 is an illustration of an unlabeled image and its corresponding pseudo label in accordance with at least one embodiment of the present disclosure;
FIG. 8 is a flow diagram illustrating a method of training a semi-supervised semantic segmentation network in accordance with at least one embodiment of the present disclosure;
FIG. 9 is a block diagram of a training apparatus for pseudo label models, according to at least one embodiment of the present disclosure;
FIG. 10 is a block diagram of a training apparatus of a semi-supervised semantic segmentation network in accordance with at least one embodiment of the present disclosure;
FIG. 11 is a schematic hardware structure diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
Semantic segmentation classifies an image at the pixel level, labeling each pixel with its corresponding class. As an example of a general semantic segmentation method, as shown in fig. 1, a labeled image used to train a semantic segmentation network generally consists of two parts: a three-channel RGB image and the truth label for semantic segmentation. The truth label comprises a segmentation mask of the labeled image; the segmentation mask marks the category of each pixel of the labeled image and is generally used as supervision information, i.e. as the prediction result the semantic segmentation network is expected to output, to supervise the training of the semantic segmentation network.
In the semi-supervised semantic segmentation task, image data in a large proportion in a training data set are non-label images, the non-label images lack true labels, and only a small part of labeled images contain true labels. In view of this, at least one embodiment of the present disclosure provides a training method for a pseudo tag model and a training method for a semi-supervised semantic segmentation network, so as to fully utilize tag information of a tagged sample image, and provide a high-quality pseudo tag with higher accuracy and reliability for a non-tagged image, thereby improving the performance of the whole semantic segmentation task.
As shown in fig. 2, fig. 2 is a flowchart illustrating a training method of a pseudo label model according to at least one embodiment of the present disclosure, where the training method may include the following steps:
in step 102, label information and edge information of the specimen image are acquired.
In this embodiment, the label information and the edge information of the sample image are used as the supervision information to train the pseudo label model.
The sample image is an image labeled with a truth label, the truth label is a category label labeled to each pixel in the image based on semantic understanding in advance, and the truth label can be represented by using a segmentation mask. The label information of the truth label includes information of a segmentation mask of the sample image, and the segmentation mask is used for marking the category of each pixel in the sample image.
Specifically, the segmentation mask may be a two-dimensional matrix whose size corresponds to the sample image, with the different categories of the pixels represented by different colors. As shown in fig. 3, the left side of fig. 3 is a sample image of a rider on a horse, and the right side is the segmentation mask of that image, in which three categories of pixels are marked: the rider category, the horse category, and the background category. Different classes are represented by different gray values, and pixels with the same gray value belong to the same class.
The edge information includes information about the edges of the segmentation mask and identifies the locations of the true edges in the segmentation mask. On the segmentation mask, a boundary, i.e. an edge, exists between regions composed of pixels of different classes.
This embodiment does not limit how the label information and the edge information of the sample image are acquired. For example, they may be obtained by manually labeling the image, or from an existing training data set.
In one embodiment, the information of the edges of the segmentation mask may be obtained by applying image-processing algorithms to the segmentation mask. To avoid introducing additional training overhead, this step can directly process the segmentation mask with image-graphics methods that carry no training cost to generate the edges, which then provide additional supervision information for the next step.
In one example, conventional image processing algorithms may be used: the segmentation mask can be processed with morphological operations to obtain the information of its edges; specifically, erosion and dilation operations are applied to the segmentation mask to generate the edge. Alternatively, an existing edge extraction model may be used, taking the segmentation mask as input and outputting the edge. Fig. 4 shows an edge obtained by applying such morphological processing to the segmentation mask.
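As a minimal sketch of the training-free edge extraction described above, the following numpy snippet marks as edge every pixel whose 4-neighbourhood contains a different class. This is an illustrative assumption, a morphological-gradient-style rule rather than the patent's exact erosion-and-dilation procedure:

```python
import numpy as np

def mask_edges(mask: np.ndarray) -> np.ndarray:
    # Mark every pixel whose 4-neighbourhood contains a different class;
    # this approximates the morphological-gradient edge of the mask.
    edge = np.zeros(mask.shape, dtype=bool)
    edge[:-1, :] |= mask[:-1, :] != mask[1:, :]   # compare with pixel below
    edge[1:, :]  |= mask[1:, :]  != mask[:-1, :]  # compare with pixel above
    edge[:, :-1] |= mask[:, :-1] != mask[:, 1:]   # compare with pixel to the right
    edge[:, 1:]  |= mask[:, 1:]  != mask[:, :-1]  # compare with pixel to the left
    return edge.astype(np.uint8)

# A toy 5x5 mask: background class 0 with a block of class 1.
mask = np.zeros((5, 5), dtype=int)
mask[2:, 2:] = 1
edges = mask_edges(mask)
```

An existing library (e.g. OpenCV's erode/dilate) would serve the same purpose on real masks; the point is only that the edge map is produced without any training cost.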
In step 104, a first classification prediction result and a first edge prediction result of the sample image are obtained.
And the first classification prediction result and the first edge prediction result are obtained by predicting through a pseudo label model to be trained.
In this embodiment, the pseudo label model to be trained may be obtained by modifying the architecture of an existing semantic segmentation model. The difference lies in the output: whereas an existing semantic segmentation model outputs a single predicted truth label, i.e. the classification prediction result of the sample image, the pseudo label model outputs two branches: a first classification prediction result and a first edge prediction result of the sample image.
This embodiment does not limit the specific network structure of the pseudo label model. For example, an encoder-decoder network structure may be adopted, where the encoder may be a VGG network, a ResNet (residual neural network), or another type of neural network, and the decoder, which semantically projects the discriminative features learned by the encoder onto the pixel space to obtain a dense classification, may be DeepLabv3 or another network. As another example, the structure of a fully convolutional neural network may be employed.
The following describes a training method of the pseudo tag model according to this embodiment, taking the pseudo tag model of the encoder-decoder network structure as an example, with reference to the training flow of the pseudo tag model shown in fig. 5.
In this step, the results may be obtained as follows: the sample image is input into the pseudo label model to be trained; the model simultaneously performs segmentation prediction and edge prediction and outputs two prediction results, segmentation prediction yielding the first classification prediction result of the sample image and edge prediction yielding the first edge prediction result. Alternatively, the first classification prediction result and the first edge prediction result obtained by another device performing the above processing on the sample image may be acquired.
The first classification prediction result includes information of a segmentation mask obtained by predicting the sample image, and the predicted segmentation mask is used for marking prediction classes of each pixel of the sample image. The first edge prediction result includes information of edges of a segmentation mask obtained by predicting the sample image.
Note that the edge positions in the first edge prediction result are not necessarily identical to the edge positions of the predicted segmentation mask; the two are predicted separately from the sample image.
In step 106, a first network loss is determined based on a difference between the first classification prediction result and the label information.
For the segmentation prediction, the label information of the truth label is used as supervision. The first network loss may be calculated by a loss function, which measures the gap between the first classification prediction result actually output by the network and the label information expected to be output; this embodiment does not limit which loss function is used. For example, a quantile loss function, a mean squared error loss function, or a cross-entropy loss function may be used. By optimizing the first network loss, the first classification prediction result predicted by the model gradually approaches the label information in the truth label.
In one example, a cross-entropy loss function may be used for the optimization. The first network loss L_seg is calculated as:

    L_seg = -(1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} log P_ij(Y_ij)        (1)

where P_ij(c) denotes the probability, predicted by the model, that the pixel at row i, column j of the sample image belongs to class c of the segmentation mask, and Y_ij is the class to which that pixel actually belongs according to the label information of its truth label. H denotes the height of the sample image and W its width, both in pixels.
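A minimal numpy sketch of equation (1): the per-pixel cross-entropy averaged over all H·W positions. The tensor shapes (`probs` as (C, H, W) softmax outputs, `labels` as an (H, W) integer class map) and the small epsilon are illustrative assumptions:

```python
import numpy as np

def segmentation_ce_loss(probs: np.ndarray, labels: np.ndarray) -> float:
    # probs: (C, H, W) softmax outputs; labels: (H, W) truth class per pixel.
    # Averages -log P_ij(Y_ij) over all H*W pixels, as in equation (1).
    _, h, w = probs.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    p_true = probs[labels, ii, jj]                 # P_ij(Y_ij) per pixel
    return float(-np.log(p_true + 1e-12).mean())   # epsilon avoids log(0)

# Toy check: 2 classes on a 2x2 image.
labels = np.array([[0, 1], [1, 0]])
perfect = np.stack([(labels == 0).astype(float), (labels == 1).astype(float)])
uniform = np.full((2, 2, 2), 0.5)
loss_perfect = segmentation_ce_loss(perfect, labels)
loss_uniform = segmentation_ce_loss(uniform, labels)
```

A perfect prediction drives the loss toward zero, while a maximally uncertain one yields log 2 per pixel, matching the behaviour the text describes for optimizing the first network loss.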
In step 108, a second network loss is determined based on a difference between the first edge prediction result and the edge information.
For the edge prediction, the previously acquired edge information, which contains the information of the real edges of the segmentation mask, is used as supervision. The second network loss may be calculated by a loss function, which measures the gap between the first edge prediction result actually output by the network and the edge information expected to be output; this embodiment does not limit which loss function is used. For example, a quantile loss function, a mean squared error loss function, or a cross-entropy loss function may be used. By optimizing the second network loss, the first edge prediction result predicted by the model gradually approaches the expected edge information.
In one embodiment, for the pixels of the sample image located at an edge of the segmentation mask, the second network loss is determined according to the difference between the first edge prediction result of those pixels and the edge information. In another embodiment, the second network loss may be determined, for the pixels not belonging to the background class of the segmentation mask, according to the difference between the first edge prediction result of those pixels and the edge information.
In one example, a modified cross-entropy loss function can be used for supervision, where only the pixels of the sample image lying at the edge positions of the segmentation mask are supervised. The second network loss L_edge is calculated as:

    L_edge = -(1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} I(Y_ij = C_edge) · log P_ij(C_edge)        (2)

where H denotes the height of the sample image and W its width, both in pixels. Unlike equation (1), the classes in equation (2) are not the individual classes identified by the segmentation mask but an edge class and a background class: pixels of the sample image at the edge positions identified in the edge information belong to the edge class, and all other pixels, outside and inside the edges, belong to the background class C_background, whose meaning differs from that of the background class in the segmentation mask. P_ij(C_edge) represents the probability that the pixel at row i, column j of the sample image belongs to the edge class, and Y_ij is the class, edge or background, to which the pixel actually belongs according to the edge information. I(·) is an indicator function: it equals 1 when the condition in parentheses holds, i.e. when Y_ij indicates that the current pixel is an edge, and 0 otherwise.
Through the indicator function, prediction errors of the background class are not counted and only the prediction of edges is attended to; this avoids having too many pixels participate in the loss computation, which would make L_edge too large and the training unstable.
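A minimal numpy sketch of equation (2): the cross-entropy restricted by the indicator to edge pixels. The 1/(H·W) normalisation mirrors equation (1) but is an assumption here, since the original formula image is not recoverable:

```python
import numpy as np

def edge_ce_loss(edge_probs: np.ndarray, edge_labels: np.ndarray) -> float:
    # edge_probs:  (H, W) predicted probability of the edge class per pixel.
    # edge_labels: (H, W) binary map from the edge information (1 = edge).
    # The indicator I(.) zeroes out background-class pixels, as in equation (2).
    h, w = edge_probs.shape
    indicator = (edge_labels == 1)
    return float(-(np.log(edge_probs + 1e-12) * indicator).sum() / (h * w))

# Toy check on a 2x2 map with a single edge pixel at (0, 0).
edge_labels = np.array([[1, 0], [0, 0]])
edge_probs = np.array([[0.5, 0.9], [0.1, 0.2]])
loss = edge_ce_loss(edge_probs, edge_labels)   # only the (0, 0) pixel counts
```

Note how the confident background predictions at the other three positions contribute nothing: exactly the effect the indicator function is described as having.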
In step 110, the network parameters of the pseudo label model to be trained are adjusted according to the first network loss and the second network loss until a model training end condition is reached, yielding the trained pseudo label model.
In one example, in each training round, the first network loss and the second network loss are subjected to weighted summation to obtain a total network loss; and adjusting the network parameters of the pseudo label model according to the total network loss.
For example, the first network loss and the second network loss may be weighted and summed in a certain ratio to obtain the total loss value:

    L_total = L_seg + λ · L_edge        (3)

where λ is a coefficient for balancing the magnitudes of the two. By optimizing, i.e. minimizing, the total loss L_total, the pseudo label model acquires a preliminary prediction capability; for example, iterations of gradient descent via back-propagation adjust the network parameters of the pseudo label model.
When the network iteration end condition is reached, the network training ends and the trained pseudo label model is obtained. The end condition may be that the iterations reach a certain number, or that the loss value falls below a certain threshold.
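The update loop with its two end conditions can be sketched as follows. Here `step_fn` is a stand-in for one optimisation round (forward pass, total loss, back-propagation, parameter update), and the default limits are illustrative assumptions:

```python
def train_pseudo_label_model(step_fn, max_iters=1000, loss_threshold=1e-3):
    # step_fn() performs one optimisation round and returns the current
    # total loss value (L_seg + lambda * L_edge in the scheme above).
    loss = float("inf")
    for it in range(max_iters):          # end condition 1: iteration count
        loss = step_fn()
        if loss < loss_threshold:        # end condition 2: loss under threshold
            break
    return it + 1, loss

# Dummy optimisation step whose loss shrinks each round.
losses = iter([1.0, 0.5, 0.01, 1e-4])
rounds, final_loss = train_pseudo_label_model(lambda: next(losses))
```

Whichever condition fires first ends training; the dummy run above stops on the fourth round, when the loss first drops below the threshold.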
In one example, in each round of the first-stage training, the network parameters of the pseudo label model are adjusted according to the first network loss until the first stage is completed, yielding an adjusted pseudo label model; then, in each round of the second-stage training, the network parameters of the pseudo label model obtained from the first stage are adjusted according to the second network loss until the second stage is completed.
For example, the network parameters are adjusted during each training in the first training stage, so that the loss value of the first network loss is gradually reduced until the loss value is lower than the preset loss value or reaches the preset iteration number, the pseudo label model after the training in the first stage is obtained, and the training in the second stage is continued by using the pseudo label model. And then, adjusting the network parameters during each round of training in the second training stage to gradually reduce the loss value of the second network loss until the loss value is lower than the preset loss value or reaches the preset iteration times, so as to obtain the trained pseudo label model.
In another example, the network parameters may be adjusted in the first training phase to gradually decrease the loss value of the second network loss, and then the network parameters may be adjusted in the second training phase to gradually decrease the loss value of the first network loss.
According to the training method of a pseudo label model provided by the technical solution of the embodiments of the present disclosure, the label information and the edge information of the sample image are used as supervision information to train a pseudo label model, and the pseudo label model predicts a classification prediction result and an edge prediction result. For an unlabeled image, the predicted edge prediction result can therefore guide and correct the predicted classification prediction result, yielding a more reliable pseudo label. In this way, the information provided by the limited labels of the sample images is fully utilized, the algorithm accuracy is improved, and the labeling cost is greatly reduced.
The following describes an application method of the pseudo tag model obtained in the above embodiment.
As shown in fig. 6, fig. 6 is a flowchart of a pseudo label generation method according to at least one embodiment of the present disclosure. The method uses the pseudo label model obtained by training in the foregoing embodiments and may include the following steps:
in step 202, a label-free image is input into a pseudo label model, and a classification prediction result and an edge prediction result of the label-free image are obtained.
The unlabeled image is input into the pseudo label model, and the pseudo label model performs segmentation prediction and edge prediction on the unlabeled image to obtain a classification prediction result and an edge prediction result of the unlabeled image, respectively.
The classification prediction result includes information of a predicted pseudo segmentation mask of the unlabeled image, and the pseudo segmentation mask is used to mark the category of each pixel in the unlabeled image. The edge prediction result includes information of a predicted pseudo edge of the segmentation mask. The position of the pseudo edge in the edge prediction result is not necessarily the same as the position of the edge of the predicted pseudo segmentation mask, although both are predicted from the unlabeled image.
In step 204, the classification prediction result is corrected according to the edge prediction result, so as to obtain pseudo label information of the label-free image.
The classification prediction result output by the pseudo label model is not reliable enough. In this step, the edge prediction result is used to correct it: unreasonable boundary areas are optimized and filtered out to obtain an optimized pseudo label. Using the auxiliary edge guidance as additional supervision information in this way can significantly improve the accuracy and reliability of the pseudo label.
The edges of the pseudo segmentation mask may be inconsistent with the predicted pseudo edges; in this step, the information given by both can be combined for optimization.
For example, the part of the pseudo segmentation mask in the classification prediction result that extends beyond the pseudo edge in the edge prediction result is removed to obtain a corrected pseudo segmentation mask, so that the segmentation prediction is limited to the range enclosed by the predicted edge. For example, referring to the unlabeled image and the optimized pseudo label shown in fig. 7, the unlabeled image on the left shows an airplane. In that image, a white gap exists between the pseudo edge and the pseudo segmentation mask; when the classification prediction result is corrected with the edge prediction result, only the area within the pseudo edge is kept, so the resulting pseudo label is closer to the shape of the airplane in the unlabeled image.
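The correction in this step can be sketched as follows; the row-wise interior fill is a crude, assumed stand-in for a proper region fill, and all names are illustrative:

```python
import numpy as np

def interior_from_edge(edge):
    """Crude interior estimate of a pseudo edge map: per row, keep
    the span between the first and last edge pixel (a stand-in for
    a proper region fill)."""
    inside = np.zeros_like(edge, dtype=bool)
    for r in range(edge.shape[0]):
        cols = np.flatnonzero(edge[r])
        if cols.size >= 2:
            inside[r, cols[0]:cols[-1] + 1] = True
    return inside

def correct_mask(pseudo_mask, pseudo_edge):
    """Remove the part of the pseudo segmentation mask that lies
    outside the predicted pseudo edge."""
    return pseudo_mask & interior_from_edge(pseudo_edge)

# Toy 5x5 example: the mask spills one column past the edge box.
edge = np.zeros((5, 5), dtype=bool)
edge[1:4, 1] = True    # left pseudo edge
edge[1:4, 3] = True    # right pseudo edge
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:5] = True  # spills into column 4

corrected = correct_mask(mask, edge)  # column 4 is filtered out
```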
After obtaining the pseudo label, the training of the neural network model may be performed using the pseudo label information of the unlabeled image as the supervision information, and the pseudo label information of the unlabeled image may be regarded as equivalent to the label information of the true label of the labeled image. Generally speaking, unlabeled images are easier and less expensive to acquire than labeled images, which allows for a larger number of training samples.
In one example, when training the neural network model, the optimization objective or loss function may be the pixel-wise cross-entropy:

$$\mathcal{L} = -\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\log P_{ij}\!\left(Y_{ij}\right)$$

where $\mathcal{L}$ denotes the loss value of an unlabeled image, $P_{ij}(Y_{ij})$ denotes the predicted probability that the pixel in the $i$-th row and $j$-th column of the unlabeled image belongs to the segmentation-mask class $Y_{ij}$, $Y_{ij}$ is the class to which the pixel actually belongs according to its pseudo label information, $H$ denotes the height of the unlabeled image, and $W$ denotes its width, both in pixels.
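A minimal numpy sketch of this per-pixel loss, assuming `probs` holds the predicted class probabilities P and `pseudo_labels` holds the classes Y (the names and shapes are illustrative):

```python
import numpy as np

def pseudo_label_loss(probs, pseudo_labels):
    """Mean pixel-wise cross-entropy against the pseudo labels.

    probs: (H, W, C) predicted class probabilities per pixel.
    pseudo_labels: (H, W) integer class index Y per pixel.
    """
    H, W, _ = probs.shape
    rows, cols = np.indices((H, W))
    p_true = probs[rows, cols, pseudo_labels]  # P_ij for class Y_ij
    return -np.log(p_true).mean()

# Toy 2x2 image, 2 classes, fairly confident correct predictions.
probs = np.array([[[0.9, 0.1], [0.8, 0.2]],
                  [[0.2, 0.8], [0.1, 0.9]]])
labels = np.array([[0, 0], [1, 1]])
loss = pseudo_label_loss(probs, labels)  # small, since P is high
```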
When the neural network model is actually trained, the unlabeled image and the labeled sample image can be mixed together for training.
According to the pseudo label generation method provided by the technical solution of the embodiments of the present disclosure, the unlabeled image is predicted by the pseudo label model to obtain a classification prediction result and an edge prediction result, and the edge prediction result is used to guide and correct the classification prediction result. This yields a more reliable pseudo label, assigns the unlabeled image a high-quality pseudo label with higher accuracy and reliability, and improves the performance of the whole semantic segmentation task.
The complete training method of the semantic segmentation network based on semi-supervised learning is explained below.
FIG. 8 is a flowchart of a training method of a semi-supervised semantic segmentation network. The training sample set of the semantic segmentation network includes a plurality of unlabeled images.
For semi-supervised learning, only a small part of the training sample set contains sample images with true-value labels, while a large proportion consists of unlabeled images without true-value labels. In order to fully utilize the small number of labeled sample images, this embodiment uses edge guidance obtained from the true-value labels as additional auxiliary supervision information, so as to improve the performance of the whole semantic segmentation task.
The method may comprise the steps of:
in step 302, a pseudo label model trained by the training method of the pseudo label model according to the above embodiments is obtained.
The pseudo label model is used for predicting label information and edge information of the image.
The pseudo label model may be obtained by modifying the semantic segmentation network to be trained in this embodiment; of course, other network models may also be used instead of this modification. For example, for a semantic segmentation network with an encoder-decoder structure, a binary classification head may be added to the output of the original decoder, so that a branch outputting edge information is added in addition to the original branch that outputs only label information.
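The added branch described above can be viewed as a second 1x1-convolution-style head over the decoder's output features; the shapes, random weights, and names below are illustrative assumptions, not the disclosed network:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, num_classes = 16, 8, 8, 21

# Decoder output features for one image: (C, H, W).
features = rng.normal(size=(C, H, W))

# Original head: label information (per-pixel class logits).
w_seg = rng.normal(size=(num_classes, C))
# Added head: binary edge/background logits (the new branch).
w_edge = rng.normal(size=(2, C))

def head(weights, feats):
    """A 1x1 convolution expressed as a matrix product over the
    channel axis."""
    c, h, w = feats.shape
    return (weights @ feats.reshape(c, h * w)).reshape(-1, h, w)

seg_logits = head(w_seg, features)    # (num_classes, H, W)
edge_logits = head(w_edge, features)  # (2, H, W)
```

Both heads share the same decoder features, so the edge branch adds only a small number of parameters to the original network.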
In the pseudo label model training process of this embodiment, the label information and the edge information are used for supervision at the same time, and the network parameters of the pseudo label model are continuously adjusted, so that the first classification prediction result predicted by the trained pseudo label model becomes closer to the label information and the first edge prediction result becomes closer to the edge information. When the training end condition is reached, the pseudo label model is obtained; thereafter, the weight parameters of each network layer in the pseudo label model are fixed and no longer change.
In step 304, the label-free image is input into the pseudo label model to obtain a corresponding second classification prediction result and a second edge prediction result.
And inputting the label-free image into the pseudo label model obtained in the last step, and outputting a second classification prediction result obtained by segmentation prediction and a second edge prediction result obtained by edge prediction.
And the second classification prediction result comprises the information of a pseudo segmentation mask of the predicted label-free image, and the pseudo segmentation mask is used for marking the category of each pixel in the label-free image. The second edge prediction result includes information of a predicted pseudo edge of the segmentation mask.
In step 306, the second classification prediction result is corrected according to the second edge prediction result, so as to obtain pseudo label information corresponding to the label-free image.
The second classification prediction result output by the pseudo label model is not reliable enough. In this step, the second edge prediction result is used to correct it: unreasonable boundary areas are optimized and filtered out to obtain optimized pseudo label information. Using the auxiliary edge guidance as additional supervision information can significantly improve the accuracy and reliability of the pseudo label.
The edges of the pseudo segmentation mask may be inconsistent with the predicted pseudo edges; in this step, the information given by both can be combined for optimization. For example, the part of the pseudo segmentation mask in the classification prediction result that extends beyond the pseudo edge in the edge prediction result is removed to obtain a corrected pseudo segmentation mask, so that the segmentation prediction is limited to the range enclosed by the predicted edge.
Through the joint processing of the second edge prediction result and the second classification prediction result, unreasonable boundary areas of the segmentation mask are optimized and filtered out, yielding the optimized pseudo labels. In this way, a large number of unlabeled images obtain corresponding pseudo label information with higher reliability.
In step 308, the semantic segmentation network is trained according to the unlabeled image and the corresponding pseudo label information.
The optimized pseudo label is then used as the supervision information of the unlabeled image for training.
For example, the unlabeled image is input into the semantic segmentation network to be trained to obtain a predicted segmentation result. The semantic segmentation network used in this embodiment may be selected by a person skilled in the art according to actual needs and is not limited herein. The network parameters of the semantic segmentation network are then adjusted according to the difference between the predicted segmentation result and the pseudo label information: the difference is calculated to obtain a network loss, and the network parameters are continuously adjusted to make the network loss smaller and smaller until an end condition is reached. This process makes the predicted segmentation result of the semantic segmentation network gradually approach the pseudo label information.
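The adjust-until-the-loss-shrinks loop of this step can be illustrated with a deliberately tiny stand-in, in which a single scalar parameter plays the role of the network and the pseudo labels are scalars; everything here is an assumption for illustration:

```python
def train_on_pseudo_labels(pseudo_targets, lr=0.1, max_rounds=200,
                           tol=1e-4):
    """Toy stand-in for the adjustment loop: one scalar parameter
    is fitted so the 'predicted segmentation result' approaches the
    pseudo label information."""
    theta = 0.0
    loss = float("inf")
    for _ in range(max_rounds):
        # Network loss: mean squared difference between prediction
        # and pseudo label over the (toy) training samples.
        loss = sum((theta - t) ** 2 for t in pseudo_targets) \
            / len(pseudo_targets)
        if loss < tol:                 # end condition reached
            break
        grad = sum(2 * (theta - t) for t in pseudo_targets) \
            / len(pseudo_targets)
        theta -= lr * grad             # adjust the network parameter
    return theta, loss

theta, loss = train_on_pseudo_labels([1.0, 1.2, 0.8])
```

The parameter converges to the mean of the pseudo targets, and the residual loss is the spread of the pseudo labels themselves, mirroring how noisy pseudo labels bound the achievable loss.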
The semantic segmentation network obtained by training can be applied to any scenario with a segmentation task; the accuracy of the semantic segmentation algorithm is improved while the labeling cost is greatly reduced.
For example, in scenarios such as high-speed rail inspection and automobile quality inspection, the method of the embodiments of the present disclosure can be used to train defect detection algorithms such as nut-loosening detection and automobile stamping quality inspection, improving the accuracy of the defect detection algorithms at low cost.
As shown in fig. 9, fig. 9 is a block diagram of a training apparatus for a pseudo tag model according to at least one embodiment of the present disclosure, where the apparatus includes:
a training data acquisition module 41 configured to: obtaining label information and edge information of a sample image, wherein the label information comprises information of a segmentation mask of the sample image, the edge information comprises information of the edge of the segmentation mask, and the segmentation mask is used for marking the category of each pixel in the sample image; and obtaining a first classification prediction result and a first edge prediction result of the sample image, wherein the first classification prediction result and the first edge prediction result are obtained by prediction of a pseudo label model to be trained.
A network loss determination module 42 for: determining a first network loss according to the difference between the first classification prediction result and the label information; and determining a second network loss according to the difference between the first edge prediction result and the edge information.
A network parameter adjustment module 43, configured to: adjust the network parameters of the pseudo label model to be trained according to the first network loss and the second network loss until a model training end condition is reached, so as to obtain the pseudo label model.
In some optional embodiments, the training data obtaining module 41 is further configured to: and filling the segmentation mask according to the pattern to obtain the information of the edge of the segmentation mask.
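One plausible realization of this edge-extraction step (the exact filling scheme is not specified here) is to take, as the edge, the foreground pixels of the segmentation mask that touch the background in 4-connectivity; a numpy sketch under that assumption:

```python
import numpy as np

def mask_edge(mask):
    """Edge of a binary segmentation mask: foreground pixels that
    touch the background (or the image border) in 4-connectivity."""
    padded = np.pad(mask, 1, constant_values=False)
    up = padded[:-2, 1:-1]
    down = padded[2:, 1:-1]
    left = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    # Interior pixels: foreground with all four neighbours foreground.
    interior = up & down & left & right & mask
    return mask & ~interior

mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True      # 3x3 foreground square
edge = mask_edge(mask)     # its one-pixel-wide boundary
```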
In some optional embodiments, the network loss determining module 42, when configured to determine the second network loss according to the difference between the first edge prediction result and the edge information, is specifically configured to: for pixels located at the edge of the segmentation mask among the pixels of the sample image, determine the second network loss according to the difference between the first edge prediction result of the pixel and the edge information of the pixel.
In some optional embodiments, the first classification prediction result includes a predicted probability that each pixel in the sample image belongs to each class in a class set; the network loss determining module 42, when configured to determine a first network loss according to a difference between the first classification prediction result and the label information, is specifically configured to: for each pixel in the sample image, calculate a first network loss according to the class to which the pixel is labeled as belonging in the label information and the probability, contained in the first classification prediction result, that the pixel belongs to that class.
In some optional embodiments, the first edge prediction result includes a predicted probability that each pixel in the sample image belongs to an edge class or a background class; the network loss determining module 42, when configured to determine a second network loss according to a difference between the first edge prediction result and the edge information, is specifically configured to: for each pixel in the sample image, calculate a second network loss according to the edge class or background class to which the pixel is labeled as belonging in the edge information and the probability, contained in the first edge prediction result, that the pixel belongs to the edge class or the background class.
In some optional embodiments, the network parameter adjusting module 43, when configured to adjust the network parameters of the pseudo label model according to the first network loss and the second network loss, is specifically configured to: in each round of training, carry out weighted summation of the first network loss and the second network loss to obtain a total network loss, and adjust the network parameters of the pseudo label model according to the total network loss; or, in each round of training of the first-stage training, adjust the network parameters of the pseudo label model according to the first network loss until the first-stage training is completed to obtain an adjusted pseudo label model, and in each round of training of the second-stage training, adjust the network parameters of the pseudo label model after the first-stage training according to the second network loss until the second-stage training is completed.
As shown in fig. 10, fig. 10 is a block diagram of a training apparatus of a semi-supervised semantic segmentation network according to at least one embodiment of the present disclosure, where a training sample set of the semantic segmentation network includes: a plurality of unlabeled images; the device comprises:
a model acquisition module 51 for: and acquiring the pseudo label model obtained by training through the training method of the pseudo label model.
A model prediction module 52 to: and inputting the label-free image into the pseudo label model to obtain a corresponding second classification prediction result and a second edge prediction result.
A label modification module 53 configured to: and correcting the second classification prediction result according to the second edge prediction result to obtain pseudo label information corresponding to the label-free image.
A network training module 54 for: and training the semantic segmentation network according to the unlabeled image and the corresponding pseudo label information thereof.
In some optional embodiments, when the network training module 54 is configured to train the semantic segmentation network according to the unlabeled image and the corresponding pseudo label information thereof, it is specifically configured to: inputting the unlabeled image into the semantic segmentation network to be trained to obtain a prediction segmentation result; and adjusting the network parameters of the semantic segmentation network according to the difference between the prediction segmentation result and the pseudo label information.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
The embodiment of the present disclosure further provides an electronic device, as shown in fig. 11, where the electronic device includes a memory 11 and a processor 12, the memory 11 is configured to store computer instructions executable on the processor, and the processor 12 is configured to implement the training method of the pseudo tag model or the training method of the semi-supervised semantic segmentation network according to any embodiment of the present disclosure when executing the computer instructions.
Embodiments of the present disclosure also provide a computer program product, which includes a computer program/instruction, and when executed by a processor, the computer program/instruction implements a training method of a pseudo tag model or a training method of a semi-supervised semantic segmentation network according to any embodiment of the present disclosure.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a training method for a pseudo tag model or a training method for a semi-supervised semantic segmentation network according to any of the embodiments of the present disclosure.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (12)

1. A training method of a pseudo label model is characterized by comprising the following steps:
obtaining label information and edge information of a sample image, wherein the label information comprises information of a segmentation mask of the sample image, the edge information comprises information of the edge of the segmentation mask, and the segmentation mask is used for marking the category of each pixel in the sample image;
obtaining a first classification prediction result and a first edge prediction result of the sample image, wherein the first classification prediction result and the first edge prediction result are obtained by prediction of a pseudo label model to be trained;
determining a first network loss according to the difference between the first classification prediction result and the label information;
determining a second network loss according to the difference between the first edge prediction result and the edge information;
and adjusting the network parameters of the pseudo label model to be trained according to the first network loss and the second network loss until a model training end condition is reached, and obtaining the pseudo label model.
2. The method of claim 1, further comprising:
and filling the segmentation mask according to the pattern to obtain the information of the edge of the segmentation mask.
3. The method of claim 1,
the determining a second network loss according to a difference between the first edge prediction result and the edge information includes:
and determining a second network loss for pixels positioned at the edge of the segmentation mask in each pixel of the sample image according to the difference between the first edge prediction result of the pixel and the edge information.
4. The method according to claim 1, wherein the first classification prediction result comprises a predicted probability that each pixel in the sample image belongs to each class in a class set;
the determining a first network loss according to a difference between the first classification prediction result and the label information includes:
and for each pixel in the sample image, calculating to obtain a first network loss according to the class to which the pixel labeled in the label information belongs and the probability of the pixel contained in the first classification prediction result belonging to the class.
5. The method according to claim 1, wherein the first edge prediction result comprises a predicted probability that each pixel in the sample image belongs to an edge class or a background class;
the determining a second network loss according to a difference between the first edge prediction result and the edge information includes:
and for each pixel in the sample image, calculating to obtain a second network loss according to the edge class or the background class to which the pixel labeled by the edge information belongs and the probability that the pixel contained in the first edge prediction result belongs to the edge class or the background class.
6. The method of claim 1,
adjusting network parameters of the pseudo label model according to the first network loss and the second network loss, including:
in each round of training, carrying out weighted summation on the first network loss and the second network loss to obtain total network loss;
adjusting network parameters of the pseudo label model according to the total network loss;
or, in each round of training of the first-stage training, adjusting the network parameters of the pseudo label model according to the first network loss until the first-stage training is completed to obtain an adjusted pseudo label model;
and in each round of training of the second-stage training, adjusting the network parameters of the pseudo label model after the first-stage training according to the second network loss until the second-stage training is completed.
7. A training method of a semi-supervised semantic segmentation network is characterized in that a training sample set of the semantic segmentation network comprises the following steps: a plurality of unlabeled images;
the method comprises the following steps:
acquiring a pseudo label model obtained by training through the training method of the pseudo label model according to any one of claims 1 to 6;
inputting the label-free image into the pseudo label model to obtain a corresponding second classification prediction result and a second edge prediction result;
according to the second edge prediction result, correcting the second classification prediction result to obtain pseudo label information corresponding to the label-free image;
and training the semantic segmentation network according to the unlabeled image and the corresponding pseudo label information thereof.
8. The method of claim 7, wherein the training the semantic segmentation network according to the unlabeled image and the corresponding pseudo-label information thereof comprises:
inputting the unlabeled image into the semantic segmentation network to be trained to obtain a prediction segmentation result;
and adjusting the network parameters of the semantic segmentation network according to the difference between the prediction segmentation result and the pseudo label information.
9. An apparatus for training a pseudo label model, the apparatus comprising:
a training data acquisition module to: obtaining label information and edge information of a sample image, wherein the label information comprises information of a segmentation mask of the sample image, the edge information comprises information of the edge of the segmentation mask, and the segmentation mask is used for marking the category of each pixel in the sample image; obtaining a first classification prediction result and a first edge prediction result of the sample image, wherein the first classification prediction result and the first edge prediction result are obtained by prediction of a pseudo label model to be trained;
a network loss determination module to: determining a first network loss according to the difference between the first classification prediction result and the label information; determining a second network loss according to the difference between the first edge prediction result and the edge information;
a network parameter adjustment module to: and adjusting the network parameters of the pseudo label model to be trained according to the first network loss and the second network loss until a model training end condition is reached, and obtaining the pseudo label model.
10. A training device for semi-supervised semantic segmentation network is characterized in that a training sample set of the semantic segmentation network comprises: a plurality of unlabeled images;
the device comprises:
a model acquisition module to: acquiring a pseudo label model obtained by training through the training method of the pseudo label model according to any one of claims 1 to 6;
a model prediction module to: inputting the label-free image into the pseudo label model to obtain a corresponding second classification prediction result and a second edge prediction result;
a label correction module to: correcting the second classification prediction result according to the second edge prediction result to obtain pseudo label information corresponding to the label-free image;
a network training module to: and training the semantic segmentation network according to the unlabeled image and the corresponding pseudo label information.
11. An electronic device, comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of any one of claims 1 to 6 or the method of any one of claims 7 to 8 when executing the computer instructions.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 6, or carries out the method of any one of claims 7 to 8.
CN202210681690.6A 2022-06-15 2022-06-15 Method, device and equipment for training pseudo label model and storage medium Pending CN115082676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210681690.6A CN115082676A (en) 2022-06-15 2022-06-15 Method, device and equipment for training pseudo label model and storage medium


Publications (1)

Publication Number Publication Date
CN115082676A true CN115082676A (en) 2022-09-20


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188511A (en) * 2023-04-26 2023-05-30 湖南马栏山视频先进技术研究院有限公司 Method and device for optimizing human labels based on edge detection



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination